Citing data

The authority of a researcher, research group or research field is currently measured by the citation scores of published work: how often do other scientists refer to it? This impact is not an end in itself. A higher impact creates opportunities for promotion and for obtaining research funding. Within this structure, it is important that the publication of underlying datasets (data publication) can also count as a legitimate, citable contribution to a researcher's track record. DataCite is committed to helping achieve this goal.

To make datasets citable, they should be easy to find at a persistent location on the internet. This can be done by assigning a so-called Digital Object Identifier (DOI). DOIs are already widely used in the scientific literature to link to journal articles. Assigning a DOI to a dataset makes its origin traceable and citable.



All DOIs start with 10. This "10." is the directory indicator: it signals that what follows is a DOI. A DOI consists of two parts separated by a slash: a prefix and a suffix. The prefix consists of the directory indicator followed by a code that represents the party that registered the dataset (the registrant). After the slash follows the suffix, which identifies the dataset itself.
DOIs are not case-sensitive: 10.123/ABC is the same as 10.123/abc.
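The structure just described can be made concrete with a minimal Python sketch. It assumes a DOI is given as plain text, optionally preceded by a "doi:" label, and splits it at the first slash. The names `Doi`, `parse_doi` and `same_doi` are hypothetical helpers for illustration, not part of any DOI or DataCite software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doi:
    """A DOI split into its two parts (hypothetical helper, for illustration)."""
    prefix: str  # starts with "10.", followed by the registrant code, e.g. "10.1594"
    suffix: str  # chosen by the registrant to identify the dataset

    @property
    def registrant(self) -> str:
        # The registrant code is whatever follows "10." in the prefix.
        return self.prefix[len("10."):]

    def normalized(self) -> str:
        # DOIs are case-insensitive (for ASCII letters), so compare them in one case.
        return f"{self.prefix}/{self.suffix}".upper()


def parse_doi(text: str) -> Doi:
    """Split a DOI such as '10.1594/PANGAEA.726855' at its first slash."""
    text = text.strip()
    if text.lower().startswith("doi:"):
        text = text[len("doi:"):]
    prefix, sep, suffix = text.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return Doi(prefix, suffix)


def same_doi(a: str, b: str) -> bool:
    """True if two DOI strings name the same object, ignoring case."""
    return parse_doi(a).normalized() == parse_doi(b).normalized()
```

For example, `same_doi("10.123/ABC", "10.123/abc")` is `True`, and `parse_doi("doi:10.1594/PANGAEA.726855").registrant` is `"1594"`. Splitting at the first slash only matters because a suffix may itself contain slashes, while the prefix never does.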

A URL and a DOI are both identifiers. However, a URL indicates where a document or piece of information can be found, while a DOI identifies the document itself, regardless of where it is located. URLs often stop working when websites are reorganized. A DOI does not have this problem: the citation of a dataset is persistent, so following the link always takes you to wherever the dataset is currently located. This guarantee is essential for building confidence in the value of data citation after data publication.
To resolve a DOI, put "" in front of it; this always takes you to the right place. You can also use the "resolve a DOI" option. The resolver itself must of course also be preserved for the long term; this is the responsibility of the International DOI Foundation. There is little concern that this resolver will fail: it is "too big to fail".
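Turning a DOI into a clickable link can be sketched in a few lines of Python. The resolver base address below is an assumption (the public proxy at https://doi.org/), made a parameter so that another resolver can be substituted; `doi_to_url` is a hypothetical helper name. The sketch only builds the link and does not contact the network.

```python
from urllib.parse import quote

# Assumed resolver base address; pass a different one if needed.
DEFAULT_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str, resolver: str = DEFAULT_RESOLVER) -> str:
    """Build a resolver link for a DOI (hypothetical helper)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    # Percent-encode characters such as '#', '?', '%' and spaces that would
    # otherwise change the meaning of the URL; keep '/', ':', ';' and
    # parentheses readable.
    return resolver.rstrip("/") + "/" + quote(doi, safe="/:;()")
```

For instance, `doi_to_url("doi:10.1594/PANGAEA.726855")` gives `https://doi.org/10.1594/PANGAEA.726855`. The link stays the same when the dataset moves: only the resolver's record of the current location needs to be updated, and the resolver redirects the reader there.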

Data citation

DataCite advises¹ on how to cite a dataset when you mention it in a publication. Members of its metadata working group recommend the following notation:

Creator (PublicationYear): Title. Publisher. Identifier 

It might look like this:

  • Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP site 127-797. Geological Institute, University of Tokyo. doi:10.1594/PANGAEA.726855
  • For a dataset in 3TU.datacentrum it looks like this:
    Keen, A.S (2011): Erosive Bar Migration Using Density and Diameter Scaled Sediment Erosive Profile Set-Prototype Scale (Actual Scale 1:10). TU Delft. doi:10.4121/uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f
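The recommended notation is simple enough to generate automatically. The Python sketch below assumes the creators are already written as "Surname, Initials" and joins several creators with semicolons, as in the first example above; `DatasetCitation` and `format_citation` are hypothetical names, not part of the DataCite metadata schema.

```python
from dataclasses import dataclass


@dataclass
class DatasetCitation:
    """Hypothetical container for the fields of the recommended notation."""
    creators: list[str]  # each already written as "Surname, Initials"
    year: int            # PublicationYear
    title: str
    publisher: str
    doi: str             # bare DOI, without a "doi:" label


def format_citation(c: DatasetCitation) -> str:
    """Render: Creator (PublicationYear): Title. Publisher. Identifier"""
    creators = "; ".join(c.creators)
    title = c.title.rstrip(".")  # avoid a doubled full stop after the title
    return f"{creators} ({c.year}): {title}. {c.publisher}. doi:{c.doi}"
```

Called with the fields of the Irino and Tada dataset, `format_citation` reproduces the first example citation character for character.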

More information?

  1. Building a culture of data citation²
  2. Cite Datasets and Link to Publications³



The days when research impact was measured by scientific publications alone seem to be over. If sharing datasets leads to greater visibility and impact of research, this can give data publication momentum. In addition, initiatives within the research community aim to measure the total impact of research.


1. DataCite. (2011). DataCite Metadata Schema for the Publication and Citation of Research Data. Retrieved 18-12-2012 from
2. ANDS. (2011). Building a culture of data citation [poster]. Retrieved 9-12-2011 from
3. Ball, A., Duke, M. (2011). How to Cite Datasets and Link to Publications. DCC How-to Guides. Edinburgh: Digital Curation Centre. Retrieved 9-12-2011 from