18 February 2020

How to cope with an expanding universe of scientific data? “The Hitchhiker’s guide to the semantic web galaxy”


23 January 2015
10:30 to 12:00

Yann LE FRANC

Universiteit Antwerpen, Belgium

In all fields of science, researchers face increasingly large and complex datasets and the need to integrate data from multiple sources and scales. With the emergence of the Big Data approach, cutting-edge technologies have been developed to store large datasets and perform complex analyses on distributed storage and computing architectures at lower cost and with greater efficiency.

Despite these technical developments, one of the main issues hampering the use of Big Data analysis in science is the lack of access to most scientific data. Several domain-specific data-sharing initiatives have been developed or are currently under development. European Research Infrastructure projects like EUDAT currently offer the scientific community a data-sharing platform linked with Big Data services. At the international level, the Research Data Alliance brings together experts from academia, industry and government to define standards and policies and to propose common technical solutions for sharing research data efficiently and creating cross-disciplinary bridges.

One of the main challenges faced by these initiatives is to give the data a flexible structure that makes it reusable and linkable with other data. The Linked Data approach, based on semantic web technologies, offers a solid foundation to address this major challenge of structuring scientific data.

In this presentation, I will provide an overview of the Semantic Web and Linked Data approach and describe how it is applied to provide a flexible structure to scientific data. I will then discuss the limitations and issues raised by this approach. In particular, I will emphasize the need to build common terminologies, also called ontologies. Finally, I will focus on the challenges of seamlessly integrating these new tools into scientists’ everyday workflow so that data is structured during the first stages of the scientific process.