More visibility for FMI METIS datasets through EUDAT-Fairdata integration

Written by: Erja Kortelainen, Sonja Sipponen, Anssi Kainulainen (CSC) and Anca Hienola (FMI)

Fairdata Services now include dataset metadata from METIS, the Finnish Meteorological Institute’s (FMI) EUDAT B2SHARE instance. The catalogue is available in the Etsin service. FMI has been publishing its research datasets in METIS since 2020. According to FMI’s research data policy, all research datasets are openly accessible to the general public, although an embargo of at most two years is allowed when needed. FMI’s METIS datasets have been in Fairdata since December 8th 2022, and the integration is now fully active: all updates and new datasets are synced from FMI to Fairdata daily. Bringing the datasets into Fairdata also makes them automatically visible in the Research.fi portal through an existing integration. Together with the existing harvesting to EUDAT B2FIND, this considerably improves the visibility and findability of the datasets. Integrating with multiple services in this way makes information reliably findable with little work after the initial project, creating impact with less effort in the long run.

FMI METIS research datasets from EUDAT B2SHARE to Fairdata Metax and to Research.fi Portal

This post is an overview of the steps taken to bring the research dataset metadata from EUDAT B2SHARE to Fairdata Services and thus also to the Research.fi portal. We will describe the challenges we faced in different aspects of the project, what we learned, and how we could use that knowledge and know-how in future projects.

First we will lay out the basics of our project and collaboration. After that we will go into more technical details of the implementation. Lastly we’ll speculate on the future uses of this type of integration.

Collaboration

The project was organized by CSC and carried out in close collaboration between CSC’s Fairdata Team and EUDAT Team. The Research.fi team was also involved to ensure a smooth transition to the Research.fi portal.

The Finnish Meteorological Institute (FMI) produces observation and research data on the atmosphere, the near space and the seas, as well as weather, sea, air quality and climate services for the needs of public safety, business life and citizens. The Finnish Meteorological Institute is an administrative branch of the Ministry of Transport and Communications.

The Fairdata Services are integrated services for storing, sharing and publishing research data. The Fairdata Services, organized by the Ministry of Education and Culture of Finland and produced by CSC, are offered free of charge to users in Finnish higher education institutions and research institutes.

EUDAT Collaborative Data Infrastructure (EUDAT CDI) is a pan-European consortium of more than 25 research organisations and data and computing centers, with its origin in CSC-led EU projects. EUDAT partners develop and provide a portfolio of interoperable services for different stages of the data lifecycle. CSC provides customised services for organisations based on the EUDAT components, as well as the public EUDAT CDI B2SHARE service.

The Research.fi portal is a service offered by the Ministry of Education and Culture that collects and shares information on research conducted in Finland. Research.fi contains information on the Finnish research system, publications by Finnish organizations, projects funded by public and private research funders, information on researchers operating in Finland and their research activities, and statistical information on the development of research resources and impact. The service makes it easier to locate information and experts on research and increases the visibility and societal impact of Finnish research.

Fairdata Services, as well as Research.fi, have a search function which makes finding the datasets easy. All of FMI’s METIS datasets can be listed at once, but datasets can also be filtered and searched with more detailed search terms. Having the datasets in B2SHARE, Fairdata and Research.fi gives them more visibility and makes them more findable.

METIS research datasets

The Finnish Meteorological Institute conducts internationally recognized science, the results of which are applied in society and used for supporting decision-making. In addition to measurement and experimental data, the work involves the use of scientific calculation models that make use of supercomputers. FMI produces observations and research data on the atmosphere, the near space and the seas. It also provides services on weather, sea, air quality, climate and near space for the needs of public safety, business life and citizens.

Open and reproducible science is one of FMI’s strategic objectives. As such, FMI uses METIS to make publicly funded research data available to the widest possible audience (under a CC BY license), as the best way to maximize the impact of the data but also to do justice to all the hard labor put into collecting, cleaning, and analyzing it. Even when the data cannot be published for reasons listed in the institute’s Open Research Data policy, FMI seeks to publish the metadata, acknowledging the data’s existence, topic, contact information and, whenever possible, ways to obtain the data. METIS allows researchers to self-archive their research data, which can increase the visibility, usage, and impact of research conducted at FMI. Knowledge management, research and openness assessment, open access to scholarly research, and avoiding fragmentation are among the other functions of the repository. METIS is evolving into a potent tool for hosting and disseminating accumulated knowledge, highlighting FMI’s accomplishments in research, and gaining peer recognition.

About the project

The aim of the integration project was to create a solution that automatically copies dataset metadata from FMI’s B2SHARE instance to Fairdata, working in the background without any manual steps. An integration project is never a simple case of “we get this 100% right the first time”. It’s more like “make the first configurations and mappings and see how it works”, and then we iterate from there. An integration is rarely absolutely perfect: compromises and agreements are needed in order to get two different systems to understand each other and their underlying data structures.

Although both Fairdata and B2SHARE already had APIs in place, we still needed to get them to talk to and understand each other. When specifying the technical solution, one important question was which service would be the active party in the interaction: would B2SHARE push the data to Fairdata, or would Fairdata pull the metadata from B2SHARE? This was also a question of responsibilities: which party would implement the active solution to push or pull the metadata, and also commit to maintaining it? After negotiations, it was decided that B2SHARE would push its data to Fairdata by using the Fairdata Metax API.
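To make the push model concrete, here is a minimal sketch of what the active party's side of such an interaction might look like. It is not the actual integration code: the target URL, the record fields and the `send` callable are all assumptions made for illustration, and the HTTP client is replaced by a stand-in function so that the example runs without network access.

```python
import json

# Hypothetical endpoint for illustration only; not the real Metax API URL.
TARGET_URL = "https://metax.example.org/datasets"


def push_records(records, send):
    """Push each record to the target service.

    `send` is any callable taking (url, payload) and returning an HTTP
    status code. Returns the ids of the records the target rejected.
    """
    failed = []
    for record in records:
        payload = json.dumps(record)
        status = send(TARGET_URL, payload)
        if not 200 <= status < 300:
            failed.append(record["id"])
    return failed


def fake_send(url, payload):
    # Stand-in for a real HTTP client: rejects records without a title.
    return 201 if json.loads(payload).get("title") else 400


records = [
    {"id": "a1", "title": "Sea ice thickness"},
    {"id": "b2", "title": ""},
]
failed = push_records(records, fake_send)
print(failed)
```

In a push model like this, the source system owns the scheduling and the error handling, which is why the choice between push and pull is also a choice about who maintains the integration.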

Key steps in the project were:

  • Mapping the data model structures and value sets used in EUDAT to those used in Fairdata
  • Setting up the accounts and connections between the systems, and agreeing on the API endpoints
  • Building the actual integration solution
  • Testing and reviewing
  • Deployment and go-live
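The mapping step in the list above can be illustrated with a small sketch. The field names and value sets here are hypothetical and do not reflect the real B2SHARE or Fairdata schemas; the point is only to show the two layers involved, a field-to-field mapping and a value-set mapping, and how anything left unmapped can be reported for review.

```python
# Hypothetical source-field -> target-field mapping (illustrative names only).
FIELD_MAP = {
    "title": "title",
    "description": "description",
    "creators": "creator",
    "license": "access_license",
}

# Hypothetical value-set mappings for fields with controlled vocabularies.
VALUE_MAPS = {
    "license": {"Creative Commons Attribution": "cc-by"},
}


def map_record(source):
    """Return (target_record, unmapped) for one source metadata record."""
    target = {}
    unmapped = []
    for src_field, value in source.items():
        if src_field not in FIELD_MAP:
            # No agreed target field: report it instead of guessing.
            unmapped.append(src_field)
            continue
        value_map = VALUE_MAPS.get(src_field)
        if value_map is not None:
            if value not in value_map:
                # Known field, but a value outside the agreed value set.
                unmapped.append(f"{src_field}={value}")
                continue
            value = value_map[value]
        target[FIELD_MAP[src_field]] = value
    return target, unmapped


record = {
    "title": "Sea ice thickness",
    "license": "Creative Commons Attribution",
    "community": "FMI",
}
mapped, unmapped = map_record(record)
print(mapped)
print(unmapped)
```

Collecting unmapped fields and values, rather than silently dropping them, is one way to support the kind of iterative review of mapping rules that an integration like this requires.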

This project was no exception to how integration projects usually go. We spent several hours specifying the field mappings, fine-tuning the mappings for different value sets and, overall, getting everything right.

The outcome was first tested in Fairdata’s demo environment. We brought all dataset metadata into the environment and reviewed it closely, thoroughly comparing how the content looked in B2SHARE with how it now looked in Fairdata. Adjustments were made, mostly to the mapping rules, and then the result was reviewed again. When the project group agreed that the result was finally right, the final review was done by FMI’s representatives: the hard work had paid off and we had a production-ready solution!

Mapping

Integration solution

What’s next?

Creating and showcasing links between research entities is important in order to create a big picture of Finnish research and its impact on a national and international level. Funders and decision makers need reliable information and transparency in order to make steering decisions. Finding links between research outputs, funding, infrastructures and organisations is also vital for the development of research information management. Many Finnish research organisations aim to define what research is and what kinds of entities relate to it.

Since services like Research.fi include information on publications, research funding, researchers and research organisations in addition to research datasets, they can be used in the future to showcase links between the entities and provide the links through an API for further use. In addition, the B2INST service is being developed for cataloguing instruments used in science.

We have been investigating the possibility of bringing Finnish research dataset metadata into Fairdata from other Finnish research data repositories, including other B2SHARE instances. As the Fairdata services are meant for increasing the visibility and societal impact of Finnish research and research datasets, the question has been, and still is, how to identify the Finnish datasets and distinguish them from the others. What does it actually mean for a dataset to be Finnish? That the person describing the dataset is Finnish, or that the creator is affiliated with a Finnish higher education institution or research institute? That the research group is Finnish, and if so, what makes a group Finnish? That the research is financed by a Finnish funder? Or something else entirely? We are still evaluating these questions and the possibilities in collaboration with the Finnish Fairdata network.

In short, we are prepared for national instances becoming available: we now have a functional EUDAT Core & Extended mapping to Fairdata, so we would “only” need to add an extension mapping. In addition, with this existing mapping, publishing an existing dataset catalog from Fairdata to, for example, B2FIND would be a fairly easy task, at least on a technical level. New research entities can also be published, catalogued, and linked to datasets through future integrations. Further integrations from both B2FIND and Research.fi to international infrastructures such as OpenAIRE, which provides services for open science, are possible in the future.

If you are interested in learning more about Fairdata integrations, please visit the Fairdata Network’s page “Metax integration for organisations”. The page is available only in Finnish.

FMI B2SHARE and Fairdata Metax metadata flows