Standards Review and Survey Summary


The value of standards for health datasets in artificial intelligence-based applications

Dr Anmol Arora


The STANDING Together initiative aims to create a robust set of recommendations to encourage transparency in health datasets used for healthcare AI development, that is, documentation of 'who' is represented and 'how' they are represented.

What was this study about?

This Nature Medicine study is the first stage of the STANDING Together project and explores existing standards, frameworks and best practices relating to data diversity in health datasets. The study comprises two parts: a systematic review of existing literature and a survey of expert stakeholder views. In the literature search, we screened over 15,000 publications and identified 30 studies that proposed relevant standards or frameworks, from which we drew out the key principles recognised as important for data diversity. The stakeholder survey supplemented the review with practical insights into how these principles can be operationalised.

What did we find?

Key topics were identified: there is a need to document how data is collected, how missing data is handled and how datapoints are labelled. As expected, the need for dataset diversity was well established in the literature. The surveyed experts generally favoured the development of a robust set of guidelines, but views were mixed on how these could be implemented in practice. For example, there were varying views on how the recommendations could realistically be enforced and who might be responsible for overseeing them. It became clear that there is a need to develop a set of consensus-based recommendations that would be widely accepted, and indeed used, by the AI research community. The outputs of this study directly informed the development of the STANDING Together recommendations for transparency of health datasets.