A recent PwC survey (1) revealed that a fifth (20%) of 1,000 business executives said their organisation plans to implement Artificial Intelligence (AI) enterprise-wide in 2019. If these ambitious plans pan out, many companies will become AI-enhanced — not just in some pockets of the organisation, but across the entire business.

Data is a crucial aspect of any AI-enhanced company, and is often referred to as the lifeblood of AI. The more data, the better. AI systems can process a tremendous amount of data, and their precision tends to increase as the quantity of data grows. Consider the Healthcare and Life Sciences industries, arguably among the sectors with the biggest growth in AI usage. Both industries have an abundance of data, but using this data to gain insights can be challenging.

This is especially true when data quality is subpar. Obtaining a dataset of sufficiently high quality to train an AI model is often mission impossible. Datasets may not be findable or available (e.g. because of undocumented data or patchy metadata), and even when they are, the data may be neither interoperable nor reusable (e.g. because it comes in a myriad of unstructured data types or doesn’t use a common vocabulary).

Good data and AI go hand-in-hand. Unfortunately, the current digital ecosystem prevents us from extracting maximum benefit from existing data. To help resolve this, science funders, publishers and governmental agencies are beginning to ask for data management and stewardship plans for data generated in publicly funded experiments. Such plans would facilitate and simplify the ongoing process of discovery, evaluation and reuse in downstream studies (2). The FAIR principles exist to guide data producers in generating good data and thus maximising its reuse.

The FAIR principles (3), which stand for Findable, Accessible, Interoperable and Reusable, contribute to the quality of science and research data. Good data isn’t a goal in itself, but rather the key conduit to knowledge discovery and innovation (4). In this respect, the FAIR principles support a move towards better data management practices. A recent European Commission study, authored by PwC, estimated the value of research data, both in economic and non-economic terms, and compared it against the current situation where a majority of research data doesn’t adhere to the FAIR principles.

We found that having FAIR research data could save the European economy at least 10.2 billion euros each year, indicating that the expected benefits will clearly outweigh the implementation costs (5). We also listed a number of beneficial outcomes from FAIR, such as the positive impact on research quality, economic turnover and machine readability of research data.

FAIR research data will make a significant contribution to research, innovation and business growth in Europe. FAIR data and its impact on AI could therefore be a game-changer for data-intensive sciences such as Healthcare, Life Sciences and Astronomy.

The quality of insights produced by AI systems relies heavily on the quality of the data processed, but are we able to assess data quality adequately and easily? With enhanced metadata that follows best practices, including vocabularies and other semantics, it’ll be easier to appraise data quality and determine its reusability. Better-described data will be useful not only for organisations that carry out research, but also for AI purposes. Well-documented, high-quality data not only reduces the risk of bias and allows for large-scale AI model testing, but also greatly contributes to the trust and explainability (6) of AI.
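As a rough illustration of how richer metadata makes data easier to appraise, the sketch below scores a metadata record by how many expected fields are filled in for each FAIR facet. The field names, the mapping of fields to facets and the example record are all hypothetical assumptions made for this illustration; they are not an official FAIR metric and are not taken from the study.

```python
# Hypothetical mapping of FAIR facets to metadata fields.
# This is an illustrative assumption, not an official FAIR metric.
FAIR_FIELDS = {
    "findable": ["identifier", "title", "description"],
    "accessible": ["access_url", "access_protocol"],
    "interoperable": ["format", "vocabulary"],
    "reusable": ["license", "provenance"],
}


def is_filled(record, field):
    """A field counts as filled when it is present and not blank."""
    return bool(str(record.get(field) or "").strip())


def fair_completeness(record):
    """Return, per facet, the fraction of expected fields that are filled in."""
    return {
        facet: sum(is_filled(record, f) for f in fields) / len(fields)
        for facet, fields in FAIR_FIELDS.items()
    }


def missing_fields(record):
    """List the expected fields that are absent or blank, in facet order."""
    return [
        f
        for fields in FAIR_FIELDS.values()
        for f in fields
        if not is_filled(record, f)
    ]


# Made-up record; the identifier and URL are placeholders.
example = {
    "identifier": "doi:10.0000/example",
    "title": "Example clinical measurements",
    "description": "",
    "access_url": "https://example.org/data.csv",
    "access_protocol": "https",
    "format": "text/csv",
    "license": "CC-BY-4.0",
}

scores = fair_completeness(example)
gaps = missing_fields(example)
print(scores)
print(gaps)
```

A completeness score like this only says whether metadata is present, not whether it is accurate, so it would be a first filter before a closer review of a dataset rather than a measure of data quality in itself.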

In addition, in a world where humans increasingly rely on computational support to deal with data (7), machines could help researchers find and analyse research data more easily. This would benefit the science ecosystem by freeing up time, increasing precision, volume and velocity, and yielding new insights.

To help research organisations, funders and infrastructures assess the investments required to put FAIR data management in place, we also developed (as part of the same study) a measuring mechanism for the direct and indirect costs and benefits that derive from the implementation of FAIR research data. The European Commission makes the mechanism freely available to the EU research ecosystem.

(1) https://www.pwc.be/en/news-publications/publications/2019/artificial-intelligence-ai-predictions.html
(2) https://www.nature.com/articles/sdata201618
(3) https://www.force11.org/group/fairgroup/fairprinciples
(4) https://www.nature.com/articles/sdata201618
(5) https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1/language-en
(6) https://www.pwc.com/us/en/services/consulting/library/artificial-intelligence-predictions/explainable-ai.html
(7) https://www.go-fair.org/fair-principles/

FAIR research data: a high-value foundation for AI
