About the Project

The Opportunity of Artificial Intelligence (AI) in Healthcare

The power of AI lies in its ability to learn patterns from large amounts of data, in a way that exceeds human abilities. However, this also means the reliability of AI algorithms is closely linked to the data they are trained on, and they may perform poorly when confronted with new data examples – a failure of ‘AI generalisability’. To be sure that algorithms work for everybody, we need to test them on datasets that represent the diverse range of people they are intended to be used for.

A thought bubble containing dots of different skin tones, with arrows connecting the dots

Dots of different skin tones, with some arrows connecting the dots, but some arrows unable to connect to a dot because there are empty spaces where a dot should be

The Problem

There is concern that many health datasets do not adequately represent minoritised ethnic groups. The extent of the problem is not yet known because many datasets do not provide detailed demographic information. This problem has arisen partly because the creators of large datasets for AI often prioritise quantity of data over quality, inclusivity or fairness. To address this, the gathering of health data needs to be designed with inclusivity and diversity in mind. We need standards to guide how AI datasets should be composed (‘who’ is represented in the data) and to ensure transparency around data composition (‘how’ they are represented).

Addressing the Problem Together

This project will develop standards that ensure datasets for training and testing AI systems are diverse, inclusive, and promote AI generalisability.

Patients, the public, health professionals, researchers, ethicists and policy-makers will work together to agree what the essential criteria for datasets should be: both who is represented (dataset composition) and how this information is provided (dataset reporting). We will develop new recommendations for AI datasets, to help gatekeepers (regulators, commissioners, policy-makers and health data institutions) assess whether datasets, and the algorithms developed using them, are suitable for the target population. This means we will have better datasets for the development and testing of AI and, in the long term, better health outcomes for all, and in particular for minoritised populations.

By getting the data foundation right, STANDING Together ensures that ‘no-one is left behind’ as we seek to unlock the benefits of AI in healthcare.