Data Management and Archiving Tips
These general guidelines help you store data properly, and following them is a step towards good archiving. A dataset that is archived with a proper description can be reused in the future, even when you are no longer available to explain the data. So think about what information someone who finds the data in the future would need in order to use it.
In general
- Make use of existing standards, like ISO norms, the SDN vocabularies, or field-specific standards, for field labels, units and content
- Provide units for all fields, even when you think your units are perfectly logical
- For dates and times use ISO 8601
  - Format dates from the largest unit of time to the smallest: yyyy-mm-dd
    This makes it very easy to sort by date, and a fixed order prevents mixing up day and month
  - Use the 24-hour time format
    This also helps when sorting and makes clear which time is meant
  - Using UTC is preferred; when not using UTC, an offset from UTC is required on each timestamp
- For describing locations use ISO 6709
  - Use the WGS84 coordinate reference system (EPSG:4326)
  - Use decimal degrees, positive for North/East, negative for South/West
  - When combining latitude and longitude in a single field, give latitude first
- Make clear which decimal separator you use
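The date, time and location conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `to_iso8601_utc` and `to_iso6709` are hypothetical, the coordinates and timestamp are made-up sample values, and the four-decimal precision of the position string is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601_utc(dt: datetime) -> str:
    """Format an offset-aware datetime as an ISO 8601 timestamp in UTC."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        # A timestamp without an offset is ambiguous, so refuse it rather than guess.
        raise ValueError("timestamp needs an explicit UTC offset")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_iso6709(lat: float, lon: float) -> str:
    """Format decimal degrees as a latitude-first, ISO 6709 style string."""
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    # The sign is always written: + for North/East, - for South/West.
    return f"{lat:+08.4f}{lon:+09.4f}/"


# Hypothetical sample taken at 14:30 local time, one hour ahead of UTC.
local = datetime(2024, 3, 5, 14, 30, 0, tzinfo=timezone(timedelta(hours=1)))
print(to_iso8601_utc(local))   # 2024-03-05T13:30:00Z
print(to_iso6709(53.0, 4.79))  # +53.0000+004.7900/
```

Because the timestamp runs from the largest unit to the smallest, sorting the strings alphabetically also sorts them chronologically.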
For spreadsheets or flat text tables (e.g. xlsx, csv or txt)
- Put the metadata in the first sheet or above the table with data
- Describe every sheet and every column, including any calculations
- Separate raw data from processed and analysed data
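As a rough sketch of "metadata above the table" for a flat text file, the Python below writes descriptive lines ahead of the CSV header. The `# ` prefix for metadata lines is an assumed convention, not a standard, and the function name, metadata keys and sample values are all hypothetical.

```python
import csv
import io


def write_table_with_metadata(metadata, columns, rows):
    """Return CSV text with '# key: value' metadata lines above the table.

    columns is a list of (name, unit, description) tuples.
    """
    out = io.StringIO()
    for key, value in metadata.items():
        out.write(f"# {key}: {value}\n")
    # Describe every column, including its unit, before the data starts.
    for name, unit, description in columns:
        out.write(f"# column {name} [{unit}]: {description}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([name for name, _, _ in columns])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical example values, for illustration only.
text = write_table_with_metadata(
    {"title": "Example CTD cast", "decimal_separator": "."},
    [
        ("time", "UTC, ISO 8601", "sample time"),
        ("temperature", "degC", "in situ water temperature"),
    ],
    [("2024-03-05T13:30:00Z", 7.25)],
)
print(text)
```

Readers that support comment lines (for example a CSV parser with a configurable comment character) can then skip the metadata and load the table directly.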
For relational databases (e.g. PostgreSQL or MS Access)
- First make a data model in which:
  - All table fields are clearly described
  - The relational structure (referential integrity) is described
  - Redundancy and null values are avoided
- Ask a professional database expert or the data manager to check your data model
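A very small data model along these lines might look as follows. This sketch uses SQLite through Python's built-in `sqlite3` module purely as a stand-in for PostgreSQL or MS Access; the table and column names are hypothetical. Each field is described in a comment, `NOT NULL` avoids null values, and the foreign key enforces referential integrity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE station (
    station_id  INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,  -- station name as used in the field log
    latitude    REAL NOT NULL,         -- decimal degrees, WGS84, positive North
    longitude   REAL NOT NULL          -- decimal degrees, WGS84, positive East
);
CREATE TABLE sample (
    sample_id   INTEGER PRIMARY KEY,
    station_id  INTEGER NOT NULL REFERENCES station(station_id),
    sampled_at  TEXT NOT NULL,         -- ISO 8601 timestamp in UTC
    depth_m     REAL NOT NULL          -- sampling depth in metres
);
"""


def open_database(path=":memory:"):
    """Create the example schema and return an open connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this is switched on per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Station coordinates are stored once in `station` and referenced from `sample`, so they are not repeated on every row. With this schema, inserting a sample that points at a non-existent station is rejected with `sqlite3.IntegrityError`.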
For device/instrument data
- Describe the raw data in a separate file and describe every column of data
- Describe the instrument/device (manufacturer, type) and software (version!) which was used to acquire the data
- Save the manual and protocol together with the (raw) data
- Save the raw data files in a structured directory layout organised by date and time (ISO 8601 formatting)
- Save the data in a sustainable format, preferably an open or otherwise well-documented format (so not a format that only closed-source software package X can read)
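One possible way to organise raw files by date and time is sketched below. The directory layout (`root/year/date/instrument_timestamp.ext`) and the function name `raw_data_path` are assumptions made for illustration. The file name uses the ISO 8601 basic format without colons, because colons are not allowed in file names on some systems.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def raw_data_path(root, instrument, acquired_at, extension):
    """Build a date-structured path for one raw data file (hypothetical layout)."""
    if acquired_at.tzinfo is None:
        raise ValueError("acquisition time needs an explicit UTC offset")
    utc = acquired_at.astimezone(timezone.utc)
    day = utc.strftime("%Y-%m-%d")           # extended format for directories
    stamp = utc.strftime("%Y%m%dT%H%M%SZ")   # basic format, safe in file names
    return PurePosixPath(root) / utc.strftime("%Y") / day / f"{instrument}_{stamp}.{extension}"


# Hypothetical acquisition time and instrument label.
acquired = datetime(2024, 3, 5, 13, 30, 0, tzinfo=timezone.utc)
print(raw_data_path("raw", "ctd", acquired, "csv"))
# raw/2024/2024-03-05/ctd_20240305T133000Z.csv
```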
Treat your data with care:
- Back up your data at key moments (after collection, after analysis, at the end of the project, etc.)
- Keep backups at a different location than the original data (spread in such a way that it is unlikely that all locations would burn down or flood at the same time)
- Perform quality control: standardize taxonomy (WoRMS/ITIS), flag good and bad data, and add remarks
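When making a backup, it can help to confirm that the copy is identical to the original. The sketch below copies a file and compares SHA-256 checksums; the function names are hypothetical, and a real setup would point `backup_dir` at storage in a different physical location.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and check that the copy matches."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target
```

Storing the checksum alongside the backup also makes it possible to detect later corruption of either copy.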
For describing metadata, make sure these items are present:
- Parameters, units and sampling frequency, according to applicable vocabularies (the GCMD vocabularies for polar research data, the SDN vocabularies for oceanographic research)
- Instruments used, protocols and information on calibration (link to publication)
- Georeferencing of stations (coordinate system (WGS84), depth, height)
- Date and time ("YYYY-MM-DD HH:MM:SS") in UTC
- The people involved
- Publications or other information sources linked with data
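The checklist above could be turned into a simple completeness check before a dataset is archived. The field names below are hypothetical labels for the listed items, not part of any metadata standard, and the sample record is invented.

```python
# Hypothetical field names, one per item in the checklist above.
REQUIRED_FIELDS = (
    "parameters",
    "units",
    "sampling_frequency",
    "instruments",
    "protocols",
    "calibration",
    "coordinate_system",
    "stations",
    "time_coverage_utc",
    "people",
    "publications",
)


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent or empty in record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Invented, incomplete record used only to show the check.
record = {
    "parameters": ["sea water temperature"],
    "units": ["degC"],
    "coordinate_system": "WGS84",
    "people": [],
}
print(missing_metadata(record))
```

An empty result means every item is present; anything listed still needs to be filled in.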
Data Manager
Marten Tacoma
Royal Netherlands Institute for Sea Research (NIOZ)
Coordinator
Taco de Bruin
Royal Netherlands Institute for Sea Research (NIOZ)