Discovering a new dataset always makes for an exciting time at the Clark Library and provides an excellent opportunity to experiment. We recently learned about the newest version of the Institute of Museum and Library Services (IMLS) Museum Universe Data File, which describes over 35,000 museums throughout the United States. For each museum it records location, type of museum, rural/urban status and, where available, tax information (e.g. revenue, income). There is much more that could be done with these data, but we initially just wanted to get a sense of what they look like on a map.
A full-sized image is also available.
We decided to make a series of static small multiples, one for each category of museum in the data, in order to see the national distribution of each. Seeing the maps side by side makes it easy to compare the distributions visually. We created our initial maps in ArcGIS for Desktop, producing a single small map for each of the nine categories of museum. We then exported the maps to a (rather unwieldy) Adobe Illustrator file; Illustrator was able to handle the 35,000 points, but just barely. Finally, we changed the colors to a palette from ColorBrewer. Despite working with these tools and methodologies on a regular basis, we were (as we often are!) surprised at how long it took to go from raw data to concept to final visualization.
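The grouping step behind a set of small multiples can be sketched in a few lines of Python. This is only an illustration and not the ArcGIS workflow we used: the column names `category`, `latitude` and `longitude` are placeholders rather than the actual field names in the IMLS file, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def group_points_by_category(csv_file, category_col="category",
                             lat_col="latitude", lon_col="longitude"):
    """Return {category: [(lon, lat), ...]} for rows with usable coordinates.

    The column names are hypothetical placeholders, not the IMLS field names.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric coordinates
        groups[row.get(category_col) or "Unknown"].append((lon, lat))
    return dict(groups)


def panel_positions(categories, columns=3):
    """Assign each category a (row, column) cell in a grid of small maps."""
    return {name: divmod(i, columns)
            for i, name in enumerate(sorted(categories))}


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "category,latitude,longitude\n"
    "Art,42.28,-83.74\n"
    "History,40.0,-75.0\n"
    "Art,,\n"
    "Art,35.99,-78.9\n"
)
groups = group_points_by_category(sample)
positions = panel_positions(groups)
```

With nine categories and three columns, `panel_positions` fills a 3 x 3 grid, so each category's points can be drawn in its own panel using the same extent and scale.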
We were initially interested in some of the tax data, but found them too messy to work with in a reasonable way. Museums that are part of larger institutions most likely had their revenue and income data drawn from the parent institution's tax information; the Museum of Art at Duke, for example, is listed with an income of $12.5 billion. Stanford University as a whole is listed as a museum, with no delineation from the rest of the university, and has the highest income in the dataset at $17.6 billion. We often find ourselves trying things that don't ultimately work, and this is part of the process of building an understanding of a dataset's possibilities and limitations.
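A quick screen for this kind of problem might look like the sketch below, which lists records whose income exceeds a cutoff so they can be checked by hand. The column names `name` and `income` and the one-billion-dollar threshold are assumptions made for illustration, not fields or rules from the IMLS file; the sample rows reuse the two figures mentioned above plus one invented small museum.

```python
import csv
import io

# Hypothetical cutoff: any sensible value is a judgment call.
SUSPICIOUS_INCOME = 1_000_000_000  # one billion dollars


def flag_suspicious_income(csv_file, name_col="name", income_col="income",
                           threshold=SUSPICIOUS_INCOME):
    """Return (name, income) pairs at or above the threshold, largest first.

    The column names are hypothetical placeholders, not the IMLS field names.
    """
    flagged = []
    for row in csv.DictReader(csv_file):
        try:
            income = float(row[income_col])
        except (KeyError, TypeError, ValueError):
            continue  # income missing or not numeric: nothing to flag
        if income >= threshold:
            flagged.append((row.get(name_col, ""), income))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Illustrative rows only; the last museum is invented.
sample = io.StringIO(
    "name,income\n"
    "Museum of Art at Duke,12500000000\n"
    "Stanford University,17600000000\n"
    "Example County Historical Society,85000\n"
    "Museum With No Filing,\n"
)
flagged = flag_suspicious_income(sample)
```

A screen like this only surfaces candidates. It cannot tell whether a large figure belongs to the museum or to a parent institution, so the flagged records would still need to be reviewed individually.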
As we worked on creating the small multiples pictured above, we discussed next steps for exploring and visualizing this dataset. In the coming weeks we plan to produce a similar but interactive visualization using Leaflet or other open-source web mapping technologies. The underlying data will be the same CSV file of 35,000 records, but the choices we make with these tools will produce a very different visualization.
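One possible first step toward a web map, sketched here rather than decided, is converting the CSV to GeoJSON, a format Leaflet can load through its `L.geoJSON` layer. The column names `latitude` and `longitude` are placeholders for whatever the file actually uses, and the sample row is made up.

```python
import csv
import io
import json


def csv_to_geojson(csv_file, lat_col="latitude", lon_col="longitude"):
    """Build a GeoJSON FeatureCollection of points from CSV rows.

    The column names are hypothetical placeholders, not the IMLS field names.
    Rows without usable coordinates are skipped.
    """
    features = []
    for row in csv.DictReader(csv_file):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, TypeError, ValueError):
            continue
        # Carry every other named column along as feature properties.
        properties = {key: value for key, value in row.items()
                      if key and key not in (lat_col, lon_col)}
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "name,category,latitude,longitude\n"
    "Example Art Museum,Art,42.28,-83.74\n"
    "Museum Without Coordinates,History,,\n"
)
collection = csv_to_geojson(sample)
geojson_text = json.dumps(collection)
```

At 35,000 points the resulting file would be fairly large, so an interactive map would probably also need clustering or per-category layers; that is one of the choices still to be made.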
Justin Joque and Nicole Scholtz