A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability

Dr Saad Khan

The availability of health datasets has accelerated digital health research. Ophthalmology has been one of the leading areas of innovation, and several public ophthalmic imaging datasets have been used in machine learning research. Datasets are a critical component of machine learning algorithm development, so they need careful scrutiny before use. Before our review, it was unknown how many ophthalmic datasets existed, how accessible they were, and what they contained. We therefore undertook a global review of all publicly available ophthalmic imaging datasets to create a central directory, detail their accessibility, describe which diseases and populations are represented, and report on the completeness of associated metadata.

What did we find?

  • 94 “open access” datasets, containing 507,724 images and 125 videos from at least 122,364 subjects

  • Most datasets originated from the USA and China

  • The most commonly represented disease was diabetic eye disease (35 datasets), and the most common imaging type was fundus photographs (54 datasets)

  • Clinical characteristics such as patient demographics were missing in >80% of datasets

  • We found no public ophthalmic datasets from 172 countries, which together account for nearly 45% of the global population

Why do these findings matter?

  • Our review provides greater visibility of a diverse collection of publicly available ophthalmic datasets

  • We highlight the unequal representation of diseases and poor reporting of accompanying clinical metadata across datasets

  • We identify the large proportion of the global population that is not represented across these datasets. This health data poverty results in a digital health divide, which exacerbates health inequalities as certain population groups would not be able to benefit from digital advances in healthcare

How can we address the gaps?

  • Update and expand existing datasets

  • Ensure future datasets cover underrepresented eye diseases and populations, with adequate reporting of metadata

  • Encourage underrepresented groups to participate in research

  • Increase data sharing and collaboration between institutions and countries

Bar chart showing number of datasets associated with publication date. World map showing geographical distribution of datasets.
Bar chart showing diseases represented by datasets. Bar chart showing imaging modalities represented across datasets.

Figure 2 from the paper: Information associated with the publication date (A), geographical distribution (B), represented diseases (C), and image types (D) of the study datasets

Source: Khan S, Liu X, Nath S, Korot E, Faes L, Wagner S et al. A global review of publicly available datasets for ophthalmological imaging: barriers to access, usability, and generalisability. The Lancet Digital Health. 2021;3(1):e51-e66.