Understanding thorny file formats - the netCDF Data Curation Format Profile (DCFP)

First page of netCDF DCFP

In the UMich Research Data Services (RDS) group, we see and work with all sorts of data. One particularly thorny variety is netCDF. Deep Blue Data has been receiving regular deposits of data in this format, and we didn't know much about it. We had many questions: How do we open it? What is its structure? How do researchers create these files, and why does their size vary so widely, from hundreds of megabytes to hundreds of gigabytes or even terabytes?

Jake Carlson, Director of RDS, and I hashed out the idea of creating "profiles" for file formats. These would be quick-reference resources to help RDS, as well as others in the data curation field, do our jobs more easily and consistently. We decided to pilot the idea by creating a "Data Curation Format Profile" (DCFP) for netCDF data files, since it seemed like an interesting file format and we were likely to receive more of them in the future. Sam Sciolla, our graduate student intern from the University of Michigan School of Information, did the research and pulled together resources and information about netCDF files to create our first DCFP, with a bit of guidance and shaping from me.

The first page (screenshot above) is a quick-reference summary of the file format: MIME type, file extension(s), structure, versions, primary fields of use, affiliation and source, metadata standards, key questions for curation, and tools for review. The rest of the profile expands on each of these areas, providing the details that inform how we conduct a curation review of netCDF data files. Particularly useful is the "tools for review" section, which explains which tools to use, the basics of how to use them, and why. You can download the netCDF DCFP here.
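One early curation question for a deposit is simply which netCDF variant it is. The following is not taken from the DCFP. It is a minimal, standard-library-only sketch that assumes only the publicly documented file signatures. Classic-family netCDF files begin with the bytes `CDF` followed by a version byte (1 = classic, 2 = 64-bit offset, 5 = 64-bit data). netCDF-4 files are stored as HDF5 and begin with the HDF5 signature. The function names `classify_header` and `guess_netcdf_variant` are hypothetical and are not part of any netCDF tool. For a fuller look at dimensions, variables, and attributes, a dedicated tool such as `ncdump -h` is the better choice.

```python
# Hypothetical helper for a first-pass curation check (not from the DCFP).
# Identifies the netCDF variant from the file's leading "magic" bytes.

CLASSIC_SIGNATURES = {
    b"CDF\x01": "netCDF classic (CDF-1)",
    b"CDF\x02": "netCDF 64-bit offset (CDF-2)",
    b"CDF\x05": "netCDF 64-bit data (CDF-5)",
}

# netCDF-4 files are HDF5 files, which start with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def classify_header(header: bytes) -> str:
    """Return a label for the format implied by the first bytes of a file."""
    if header[:4] in CLASSIC_SIGNATURES:
        return CLASSIC_SIGNATURES[header[:4]]
    # Assumption: the HDF5 signature sits at offset 0. HDF5 also permits it at
    # 512, 1024, 2048, ... bytes when a user block is present; not checked here.
    if header[:8] == HDF5_SIGNATURE:
        # A plain (non-netCDF) HDF5 file would also match, so confirm with a
        # netCDF-aware tool before recording it as netCDF-4.
        return "netCDF-4 / HDF5 (confirm with a netCDF tool)"
    return "not recognized as netCDF"


def guess_netcdf_variant(path: str) -> str:
    """Read only the first 8 bytes, so even very large files are cheap to check."""
    with open(path, "rb") as f:
        return classify_header(f.read(8))
```

Because it reads only eight bytes, this kind of check stays fast even on the multi-gigabyte or terabyte-scale deposits described above. It identifies the container format only. Reviewing the actual structure and metadata still requires the tools covered in the profile.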