1.1a - Dataset summary
Dataset curators should provide documentation for datasets they produce. This documentation should include a description of the contents, source and purpose of the dataset, and should be written in accessible language. The summary should help data users assess whether the dataset meets their needs.
1.1b - Dataset identity and access
Dataset documentation should include: dataset name, accessibility, date of release, version, licensing arrangements, and details of the data custodian(s). Where possible, this documentation should adhere to the FAIR (Findable, Accessible, Interoperable, Reusable) principles [1].
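The identity fields listed above could be captured in a small machine-readable record so that gaps are easy to spot. The following is a minimal sketch only: the class name `DatasetIdentity`, its field names, and the helper `missing_fields` are hypothetical and are not prescribed by these recommendations.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetIdentity:
    """Hypothetical record mirroring the fields named in 1.1b."""
    name: str = ""
    accessibility: str = ""   # e.g. open, or available on request
    release_date: str = ""    # ISO 8601 date string is assumed here
    version: str = ""
    licence: str = ""
    custodians: str = ""


def missing_fields(record: DatasetIdentity) -> List[str]:
    """Return the names of fields that are empty or whitespace only."""
    return [
        f.name
        for f in fields(record)
        if not str(getattr(record, f.name)).strip()
    ]


# Illustrative values only; this is not a real dataset.
example = DatasetIdentity(
    name="Example imaging dataset",
    release_date="2024-01-15",
    version="1.0",
)
print(missing_fields(example))  # ['accessibility', 'licence', 'custodians']
```

A check like this only confirms that each field has been filled in; it does not assess whether the content is accurate or written in accessible language.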
1.1c - Motivations for dataset creation and intended purpose(s)
Dataset documentation should state why the dataset was created, including any intended benefit(s) and any purposes for which the dataset should not be used, as well as who created the dataset and who funded it.
1.1d - Assumptions and preconceptions of the dataset curation team
Dataset documentation should describe how the curation team has considered the impact of their prior assumptions and preconceptions on biases in the dataset. This may include reflecting on the experiences of the dataset curators themselves, as well as any advice from governing and consultation groups (e.g. advisory boards, patient and public involvement and engagement groups).
1.1e - Origin and purpose of source data
Dataset documentation should describe the original source of the data (e.g. patient records created to provide clinical care, a clinical trial, a biobank) and what individuals expected to happen to their data (e.g. administrative action, participation in a research study).
1.1f - Data sampling, and aggregation from multiple sources
Dataset documentation should describe how data were sampled from the original data source, including an explanation of sampling strategies and their rationale. If the dataset has been compiled from multiple data sources, dataset documentation should describe how datasets were selected, and how decisions were made during data aggregation, particularly in the case of grouping populations and modification of demographic coding.
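One way to make decisions about demographic recoding transparent is to keep the mapping itself, together with a count of how many records each rule affected, alongside the dataset. The sketch below assumes a simple one-to-one lookup; the function `recode_with_log`, the fallback label, and the category names are invented for illustration and do not represent a recommended coding scheme.

```python
from collections import Counter
from typing import Dict, List, Tuple


def recode_with_log(
    values: List[str],
    mapping: Dict[str, str],
    fallback: str = "Unmapped",
) -> Tuple[List[str], Counter]:
    """Apply a category mapping and count each (original, recoded) pair.

    Values absent from the mapping are assigned `fallback`, so that they
    stay visible in the log rather than being dropped silently.
    """
    recoded = [mapping.get(v, fallback) for v in values]
    log = Counter(zip(values, recoded))
    return recoded, log


# Invented source codes and grouping, for illustration only.
source_values = ["Group A1", "Group A2", "Group B1", "Not stated"]
grouping = {"Group A1": "Group A", "Group A2": "Group A", "Group B1": "Group B"}

recoded, log = recode_with_log(source_values, grouping)
for (original, new), count in sorted(log.items()):
    print(f"{original} -> {new}: {count}")
```

Publishing the resulting log with the documentation lets data users see which original categories were merged and how many records had no mapping.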
1.1g - Data shifts
For longitudinal datasets or datasets with versions, dataset documentation should describe any known or suspected changes over time relating to the population, medical practice, or how data were collected, which may contribute to data shifts.
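As a rough illustration of how a suspected shift might be surfaced before it is described in the documentation, the sketch below compares the proportion of each category in a variable between two periods or versions. The function names and the example values are hypothetical, and a difference in proportions is only a prompt for investigation, not evidence of a shift in itself.

```python
from collections import Counter
from typing import Dict, List


def proportions(values: List[str]) -> Dict[str, float]:
    """Return the share of records falling in each category."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}


def proportion_changes(earlier: List[str], later: List[str]) -> Dict[str, float]:
    """Later share minus earlier share, for every category seen in either period."""
    p_earlier, p_later = proportions(earlier), proportions(later)
    categories = sorted(set(p_earlier) | set(p_later))
    return {c: p_later.get(c, 0.0) - p_earlier.get(c, 0.0) for c in categories}


# Invented example: the acquisition method recorded in two dataset versions.
version_1 = ["method_x", "method_x", "method_x", "method_y"]
version_2 = ["method_x", "method_y", "method_y", "method_y"]

changes = proportion_changes(version_1, version_2)
print(changes)  # {'method_x': -0.5, 'method_y': 0.5}
```

The same comparison can be repeated for population characteristics or coding practices; any change found would still need to be explained in prose, including its likely cause.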