Searching for data
Statistical data, for example unemployment figures or life expectancy in a particular country, is gathered by national statistical agencies, like the CBS (Statistics Netherlands) or the United States Census Bureau, and by statistical divisions of supranational organizations, like Eurostat (of the EU) and UNdata (of the United Nations).
A collection of links to the homepages of these statistical agencies can be found here. In the list of ‘Databases for the social sciences’ of the Erasmus Data Service Centre you can find links to databases like SourceOECD and World Development Indicators. These databases collect statistical data series for multiple countries, which makes the data comparable across countries.
Other figures are collected by professional and branch organizations and by market research companies. For example, in the Netherlands BOVAG publishes figures on car sales, and GfK collects data on the turnover of Dutch supermarkets.
Finding these figures can be difficult. A good starting point is to consider which organization might collect the data you’re looking for. Sometimes the organization’s website has a ‘Facts and figures’ section, which is not indexed by search engines like Google. Newspaper articles can also make you aware of the existence of certain data and of the organization that gathers them. Sometimes it takes some ‘Google digging’ to find the data.
Please be aware that historical data series have not always been digitized (yet). Sometimes you’ll have to use the original hard copy publications to collect the data. To find these publications you can use the UL-catalogue.
Most researchers collect data themselves for their scientific research, for example by using surveys or by collecting data from population registers or the logbooks of medieval ships. Developments on the internet create more possibilities to store data and to make it accessible to other researchers for re-use, sometimes for a completely different purpose. A famous example of re-use of data is the use of the records of whales caught in the Antarctic: these records also showed the decline of sea ice (because there can’t be whales where there is ice).
Some funding agencies already make the preservation of research data a condition of funding the research. However, preserving large datasets and keeping them available for the next generations of researchers and software (!) requires large investments that can’t be made by a single researcher or university. There are already data archives where a researcher can deposit datasets (and where other researchers can access them). In the Netherlands we have DANS (Data Archiving and Networked Services), an institute of KNAW and NWO. ICPSR (Inter-university Consortium for Political and Social Research) is an international consortium of about 700 academic institutions and research organizations. ICPSR maintains about 500,000 files of research data. Access to datasets in DANS and ICPSR differs: some sets are available to everyone, for others you have to register first, and sometimes you have to ask the creator of the dataset for permission to re-use it.
The Research Data Forum (a collaboration of SURFshare, the 3TU Data Centre, DANS, and the Netherlands Coalition for Digital Preservation (NCDD)) works on national guidelines for the preservation of research data. It has also published a useful guide that clarifies the legal protection applying to research data, intended for researchers who need to know what they can do with other people’s data.
RePub has also saved a couple of datasets of EUR-researchers.
Source: Nomura, D. K., Long, J. Z., Niessen, S., Hoover, H. S., Ng, S., & Cravatt, B. F. (2010). Monoacylglycerol lipase regulates a fatty acid network that promotes cancer pathogenesis. Cell, 140(1), 49-61. doi:10.1016/j.cell.2009.11.027
Financial data, like stock prices and annual report figures of public companies, is mostly collected by commercial companies, like Thomson Reuters. The Erasmus Data Service Centre provides access to these databases, including Bloomberg, CompuStat, CRSP, Datastream, Orbis and SDC Platinum, and supports researchers in using these databases.