Digital information and data

Digital information and data play complex roles in research in the humanities and social sciences (SWR 2003; Arzberger, Schroeder et al. 2004; Boonstra, Breure et al. 2004). This creates particular challenges for the application of e-research methods and techniques, especially if complex and fuzzy data sets are involved (e.g. visual data, music, complex texts). The increased availability of digital resources, data and collections, partly the result of the digitisation of cultural heritage and of administrative databases, promises to open up more possibilities for comparative research. There may be more scope for interdisciplinary research that is based on the combination of data from very different types of sources. Questions that until recently could only be dealt with in a speculative way may now be approached by data-oriented empirical research. Re-use of data may become more prominent (SWR 2003). The capacity to process and visualise huge datasets is moreover expected to create additional opportunities for empirical research with the help of new computational research methods. In short, both in the humanities and in the social sciences new objects of research, which we call "epistemic objects" (Rheinberger 1997), will emerge. This development parallels the creation of new experimental arrangements in e-science.

The research in this theme will address the question of what characteristics these new epistemic objects will and should have, and how they may reconfigure scholarly research. What types of questions will be foregrounded and which questions may become less central? Which assumptions are built into the new epistemic objects and how may they influence the boundaries between scientific specialties? We will also pay attention to the specificity of qualitative data. They are often fuzzier and less easy to standardise. This also influences the development of research traditions of sharing qualitative data for comparative (re-)analysis (Wouters and Schroder 2003). The Studio research will strive to complement existing research into scientific and scholarly data and data standards by focusing on the epistemic and social role of data and data sources in the humanities and social sciences. Purely technical research into data and meta-data formats is the domain of expertise of computer and data science departments in the universities. Where a joint effort seems fruitful, we will seek cooperative research with research teams in information and computer science (e.g. CWI and the Telematics Institute). In the area of informatics for the humanities, we will seek collaboration with humanities computing research groups in the Netherlands and abroad, and with the R&D departments of data archives and repositories.

To provide a sharper focus on the particularities of data handling in the social sciences and humanities (Hockey 2000; SWR 2003; Boonstra, Breure et al. 2004), the research in this theme will maintain a firm comparative perspective with the natural and technical sciences. This will also enable the Studio researchers to be alert to new developments in data science and technology. For example, in those fields that have undertaken major digitisation projects, how does e-research change the way data is conceptualised, handled and shared? And how do disciplinary communities organise their work around digitised data: do practices become standardised, or do field differences persist? In this respect, the comparison of the development of data initiatives in the humanities with data grids in the social sciences seems relevant.

The data theme will also pay specific attention to the issue of data sharing and data sharing policies. This research is based on the completed Nerdi projects on data sharing (Wouters 2000; Beaulieu 2003; Wouters and Schroder 2003; Arzberger, Schroeder et al. 2004). The emergence of e-research creates specific tensions for data sharing, partly because it may no longer be clear who has control over the data sets. Increased attention to data sharing, also in the framework of the organisation of new data archives in the social sciences and humanities, may create tensions with established research practices and routines that are often not oriented to data sharing. The Studio will therefore not only study data sharing but also resistance to data sharing.

The flood of Web data poses a new challenge to social science and cultural analysis that cuts across the divide between quantitative and qualitative data. The Studio will organise a Webometrics Collaboratory within the theme Data and Digital Information to enable the rapid mobilisation of existing international expertise in this area.

The last decade has witnessed an increase in quantitative methods using Web data and in sophisticated quantitative analyses of the structure of the Web and the internet (Ebeling and Feistel 1990; Adamic 1999; Watts 1999; Albert and Barabasi 2002; Scharnhorst 2003). This has even led to the establishment of a new field in the information sciences, "webometrics" (Almind and Ingwersen 1997; Rousseau 1997; Boudourides, Sigrist et al. 1999; Bjorneborn and Ingwersen 2001). Web data can be used to analyse the internet and the Web as a complex information space in which communication patterns emerge and self-organise (Leydesdorff 2002). Webometrics can also be used to study the change of institutional structures (by means of hyperlink analysis) and the emergence of new institutional structures and infrastructures. Changes in scientific production and communication can be studied in so far as they can be represented in Web-based indicators. We expect that webometrics will also contribute to our understanding of the emergence of new forms of Web-based scientific communication and collaboration, such as those related to e-journals, collaboratories, online databases, file sharing and collaborative simulations. Indicators developed on the basis of Web data can have both an evaluative and a descriptive role. In this collaboratory, they should primarily provide insights into the nature of knowledge production in e-research.

The research in this theme builds further on recent European research projects in webometrics, in particular on WISER and EICSTES. It will extend the research questions in these projects toward a "reflexive webometrics". It aims to develop novel methods for automated data gathering (with open source web crawlers, commercial software, Web page annotation schemes, and search engine tools) and to contribute to the development of professional standards for observing the dynamic Web. We expect that this will lead to analytic tools that can be used by other researchers in the social sciences and humanities without the need for additional programming expertise (Thelwall 2001; Thelwall 2002). We expect that these methods will be particularly successful if they are intimately related to qualitative and quantitative content analysis of Web phenomena. For instance, hyperlink network analysis has revealed interesting topological features in graph-theoretical terms. It is, however, still far from clear how these graph-theoretical structures can be interpreted. An important aspect of future research in webometrics will be the development of dynamic observation based on the self-organising and fluid nature of the Web as a medium. New insights from complexity theory into the description of complex structures will have to be taken into account in this research.
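
As an illustration of the kind of hyperlink network analysis referred to above, the following minimal sketch counts links between sites and derives a simple in-degree measure. It is not a method taken from WISER or EICSTES: the function names, the use of the host name as a proxy for an institution, and the example URLs are all assumptions made here for illustration only.

```python
from collections import Counter
from urllib.parse import urlparse


def site_of(url):
    """Reduce a URL to its host name (assumed here to stand for one institution)."""
    return urlparse(url).netloc.lower()


def inter_site_links(page_links):
    """Count hyperlinks between distinct sites from (source_url, target_url) pairs."""
    counts = Counter()
    for source, target in page_links:
        s, t = site_of(source), site_of(target)
        # Skip malformed URLs and links that stay within one site.
        if s and t and s != t:
            counts[(s, t)] += 1
    return counts


def in_degree(site_links):
    """For each site, count how many distinct other sites link to it."""
    degree = Counter()
    for (_source, target) in site_links:
        degree[target] += 1
    return degree


# Hypothetical page-level links, e.g. as collected by a crawler.
links = [
    ("http://a.example/page1", "http://b.example/"),
    ("http://a.example/page2", "http://b.example/about"),
    ("http://c.example/", "http://b.example/"),
    ("http://b.example/", "http://a.example/"),
    ("http://a.example/page1", "http://a.example/page2"),
]

site_links = inter_site_links(links)
degrees = in_degree(site_links)
```

Such counts describe only the topology of the link network. What an inter-site link means, for example endorsement, collaboration or mere navigation, is precisely the kind of interpretive question that calls for the accompanying content analysis.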