2 - Dataset Use Standards
These standards are in two parts:
Dataset Documentation Standards (also accessible using the button at the bottom of this page)
Dataset Use Standards (this page)
The primary purpose of Dataset Use Standards is to promote best practice for how datasets should be selected and used throughout the AI and digital health technology lifecycle, including recording how risks to relevant subgroups have been identified and mitigated. They are primarily for Data Users, as they require context and should be considered with a specific use case in mind.
2.1 - PROVIDE SUFFICIENT DATASET DOCUMENTATION
2.1a - Provide sufficient information about dataset(s) to allow traceability and auditability
Datasets used in the lifecycle of AI health technologies should be accompanied by documentation which conforms to Dataset Documentation Standards, enabling audit against these standards.
2.2 - EVALUATE IN THE CONTEXT OF RELEVANT SUBGROUPS
2.2a - Identify contextualised subgroups of interest who are particularly at risk of harm from the AI health technology under development
Data Users should identify contextualised subgroups of interest in advance: these are subgroups with shared attributes, identified as being relevant and important for the use case, and where they are known to have worse health outcomes or are subject to other systems driving health inequity related to the use case. Contextualised subgroups of interest may be discovered via multiple sources, including literature review, evidence from the development or use of similar AI health technologies, consultation with experts in health inequity, clinical practice, etc.
2.2b - Use appropriate datasets to support the intended use population, and intended purpose of the AI health technology
The intended use population should be adequately represented in the datasets used in an AI Health Technology. The contextualised subgroups of interest (see item 2.2a) should also be included where possible, and if not included this should be explicitly stated. Areas of under-representation should be identified and transparently reported.
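The standards do not prescribe how representation should be measured, but the check described above can be operationalised in a straightforward way. The sketch below is a hypothetical illustration: the function name, group labels, and the 50%-of-population-share tolerance are illustrative assumptions, not part of the standards.

```python
from collections import Counter

def representation_report(dataset_groups, population_shares, tolerance=0.5):
    """Flag subgroups whose share in the dataset falls below a fraction
    (`tolerance`) of their share in the intended use population."""
    counts = Counter(dataset_groups)
    total = sum(counts.values())
    report = {}
    for group, pop_share in population_shares.items():
        data_share = counts.get(group, 0) / total
        report[group] = {
            "dataset_share": round(data_share, 3),
            "population_share": pop_share,
            "under_represented": data_share < tolerance * pop_share,
        }
    return report

# Hypothetical example: subgroup labels in a dataset of 1,000 records,
# compared against (assumed) shares in the intended use population.
dataset = ["A"] * 820 + ["B"] * 150 + ["C"] * 30
population = {"A": 0.70, "B": 0.20, "C": 0.10}
print(representation_report(dataset, population))
```

A report of this kind makes areas of under-representation explicit so they can be transparently documented, as the standard requires.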
2.2c - Identify discrepant performance of the AI health technology for contextualised subgroups of interest
Data Users should:
• Report performance of the AI Health Technology for contextualised subgroups of interest identified in 2.2a.
• Compare performance for contextualised subgroups of interest to aggregate performance in the overall study population.
• Report performance of the AI Health Technology for subgroup(s) who have the best pre-existing health outcomes in this clinical area, and compare this to performance for contextualised subgroups of interest.
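The comparisons above can be computed directly from evaluation outputs. The sketch below is a hypothetical illustration only: the standards do not mandate any particular metric, and accuracy stands in here for whatever clinically relevant measure applies to the technology.

```python
def subgroup_performance(y_true, y_pred, groups):
    """Report a performance metric (accuracy, for illustration) per subgroup,
    and each subgroup's gap relative to aggregate performance."""
    overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    results = {"overall": {"accuracy": overall, "gap": 0.0}}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        acc = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
        results[g] = {"accuracy": acc, "gap": acc - overall}
    return results

# Hypothetical labels, predictions, and subgroup memberships:
results = subgroup_performance([1, 0, 1, 0], [1, 0, 0, 0], ["x", "x", "y", "y"])
print(results)
```

Reporting the gap alongside the subgroup metric makes discrepant performance visible at a glance, including against the best-performing subgroup.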
2.2d - Evaluate performance of the AI health technology for subgroups experiencing vulnerability
If not already addressed by 2.2c, Data Users should report evaluation results across certain attributes (including age, gender identity, sex, race, ethnicity, socioeconomic status and sexual orientation), due to known associations with health outcomes and interactions with wider social factors. This may not always be possible or appropriate, in which case the reasons for not doing so should be documented.
2.3 - ACKNOWLEDGE KNOWN BIASES AND LIMITATIONS OF THE DATASET AND ANY IMPLICATIONS FOR THE INTENDED USE OF THE AI HEALTH TECHNOLOGY
2.3a - Report limitations of datasets used, and any implications for the AI health technology
Data Users should report limitations of datasets used, and the implications of these for the target AI health technology. Data Users should investigate whether limitations are systematically different across relevant population subgroups, including those categorised as ‘unknown’ or ‘other’, and report differences which could result in worse performance of the AI Health Technology across groups.
2.3b - Report differences between the intended purposes of the AI health technology and datasets used during development, including the implications of discordance
Data Users should report any intended purposes of datasets used (item 1.1c), and how these differ from the intended purpose of the AI Health Technology (item 2.2b). State the implications of any discordance and provide justification regarding the suitability of the dataset, including assumptions made and aspects of the dataset which are not directly applicable.
2.3c - Report level of uncertainty for performance in subgroups when sample size is insufficient
Should sufficient sample size not be achieved in minority and/or intersectional subgroups, Data Users should report the level of uncertainty for performance in these subgroups (e.g. with confidence intervals). Where this may suggest additional risk, describe whether mitigation plans are in place to avoid harm to these groups.
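One common way to express this uncertainty for a proportion-based metric (e.g. sensitivity) is the Wilson score interval, which behaves better than the normal approximation at the small sample sizes typical of minority and intersectional subgroups. The sketch below is illustrative only; the standards do not mandate a particular interval method.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a proportion; preferable to the normal
    approximation when the subgroup sample size is small."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (centre - half, centre + half)

# Hypothetical example: 18 of 20 cases detected in a small subgroup.
lo, hi = wilson_interval(18, 20)
print(f"sensitivity 0.90, 95% CI ({lo:.2f}, {hi:.2f})")
```

A wide interval despite a favourable point estimate is exactly the signal that should trigger the mitigation-plan discussion described above.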
2.3d - Report findings of pre-existing dataset assessments
Data Users should review any pre-existing assessments of the datasets which are available (e.g. algorithmic impact assessments [1], equality impact assessments [2], data protection impact assessments [3], Datasheet [4], Healthsheet [5]) and report how the findings may translate to harm for subgroups within the intended use population.
2.4 - ADDRESS UNCERTAINTIES AND RISKS WITH MITIGATION PLANS
2.4a - Address uncertainties and risks with mitigation plans
Where Data Users have identified uncertainty or potentially variable performance in subgroups, any clinical implications resulting from these findings must be clearly stated and reported as risks. The Data User should document plans to monitor these risks as part of the post-market clinical follow-up and post-market surveillance.
Proposed additional item in Dataset Use Standards
Early feedback has suggested adding an additional item to the draft Dataset Use Standards. The suggested wording is below:
Report any statistical approaches (including ‘fairness methods/metrics’) used to intentionally modify performance across subgroups.
Data Users should document any approaches used during the development and evaluation of the AI health technology that are intended to make predictions more equitable across subgroups. Describe:
• The rationale and goals for doing so.
• The methods and metrics used.
• How thresholds were set, including whether these vary between subgroups of people.
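One way to make such threshold choices explicit and reportable is sketched below: for each subgroup, it selects the highest decision threshold that still meets a target sensitivity within that subgroup. This is a hypothetical illustration under assumed inputs; the standards neither endorse subgroup-specific thresholds nor any particular method for setting them, and the decision to vary thresholds between subgroups is itself something to document and justify.

```python
import math

def per_group_thresholds(scores, labels, groups, target_sensitivity=0.9):
    """For each subgroup, choose the highest decision threshold that still
    achieves the target sensitivity within that subgroup (score >= threshold
    counts as a positive prediction)."""
    thresholds = {}
    for g in set(groups):
        # Scores of true positives belonging to this subgroup, highest first.
        pos = sorted([s for s, l, grp in zip(scores, labels, groups)
                      if grp == g and l == 1], reverse=True)
        if not pos:
            thresholds[g] = None
            continue
        # Number of positives that must be flagged to hit the target.
        k = math.ceil(target_sensitivity * len(pos))
        thresholds[g] = pos[k - 1]
    return thresholds

# Hypothetical scores, labels, and subgroup memberships:
print(per_group_thresholds([0.9, 0.5, 0.8, 0.3], [1, 1, 1, 0], ["x", "x", "y", "y"]))
```

Recording the resulting per-subgroup thresholds, together with the rationale and metrics above, satisfies the reporting the proposed item asks for.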
1. Groves L, Brennan J, Peppin A, Strait A. Algorithmic impact assessment: a case study in healthcare [internet]. Ada Lovelace Institute, UK; 2022.
2. Pyper D. The Public Sector Equality Duty and Equality Impact Assessments. House of Commons Library, UK; 2020.
3. Data protection impact assessments [internet]. Information Commissioner’s Office, UK; 2022. Available from: https://ico.org.uk/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr/accountability-and-governance/data-protection-impact-assessments/
4. Gebru T, Morgenstern J, Vecchione B, et al. Datasheets for Datasets. arXiv; 2021. Available from: http://arxiv.org/abs/1803.09010
5. Rostamzadeh N, Mincu D, Roy S, et al. Healthsheet: Development of a Transparency Artifact for Health Datasets. arXiv; 2022. Available from: http://arxiv.org/abs/2202.13028