Data documentation


Data documentation is both a product and a process. As a process, it is the active and continuous recording of relevant information about the data collected or processed throughout the research cycle. As a product, data documentation is an umbrella term for several categories of facts and details about your research data. These can include:

  • Documentation about data objects or datasets, e.g.: the formats of the data, the software necessary to read them, codebooks, and the meaning of codes and/or variables.
  • Documentation about the process of data collection, e.g.: lab notebooks, questionnaires, manuals, diaries, etc.
  • Administrative metadata: information that details the origin, purpose, time, geographic location, creator, access, and terms of use of datasets. This information is used to retrieve or index data in repositories or archives.


Data documentation brings several benefits. The most important are:

  • Helping you remember the details of your data.
  • Increasing the reproducibility of your research.
  • Helping to produce high-quality data.
  • Making it easier for your collaborators to use the data during collaborative projects.
  • Increasing the findability and visibility of your data in repositories.
  • Serving as strong evidence of scientific integrity.

Data documentation or metadata?

One term that is often used in relation to data documentation is metadata, or ‘data about data’. In practice, these terms are often used interchangeably. It is useful, however, to distinguish between unstructured and structured documentation/metadata.

Unstructured documentation or metadata is any human-readable information about the research data that investigators choose to provide as context for their dataset. This can include general README files describing the entire research project, including its research questions and methodology, as well as detailed information about individual data objects, such as interviews or fieldwork notes.

Structured documentation or metadata is a special type of metadata object that can be written either in plain language or in XML and contains a set of information fields, each of which takes a particular value (e.g., temperature: 30°C). As such, structured metadata is computer-readable and used for indexing or retrieving datasets in repositories or archives, in the same way that metadata in a library catalogue helps you retrieve the books you need from the library search engine. Structured metadata is created in accordance with metadata standards.
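
To make the idea of fields that take a particular value more concrete, here is a minimal Python sketch that writes a small metadata record as XML and reads it back. The field names (title, creator, temperature), their values, and the root element name "metadata" are assumptions made up for illustration; they do not come from any metadata standard, and a real record would use the fields that your chosen standard or repository prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical field names and values, chosen only for illustration.
# A real record would follow the fields defined by a metadata standard.
fields = {
    "title": "Example field measurements",
    "creator": "Jane Doe",
    "temperature": "30°C",
}


def to_xml(record):
    """Serialise a flat field/value mapping to an XML string."""
    root = ET.Element("metadata")  # root element name is an assumption
    for name, value in record.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Read the field/value pairs back from the XML string."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


xml_text = to_xml(fields)
print(xml_text)
print(from_xml(xml_text))
```

Because every value sits in a named field, a program can look up the temperature or the creator directly. That is what allows a repository to index and retrieve datasets, which is much harder to do reliably with a free-text README.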

This page was last updated in January 2023.
