Metadata Elements to Include in Your Data Submission
A list of the most important items to include in your data submission is presented below.
- Principal investigator name(s) and affiliation(s) at time of data collection.
- Funding sources: names of funders, including grant numbers and related acknowledgments.
- Data collector/producer: persons or organizations responsible for data collection, and the date and location of data production.
- Project description: a description of the project, its intellectual goals, and how the data articulate with related datasets.
- Sample and sampling procedures: a description of the target population investigated and the methods used to sample it (unless the entire population is studied). The discussion of the sampling procedure should indicate whether standard errors based on simple random sampling are appropriate or whether more complex methods are required.
- Weighting: if weights are required, information on weight variables, how they were constructed, and how they should be used.
- Data source(s): if a dataset draws on resources other than surveys, citations to the original sources or documents from which data were obtained.
- Unit(s) of analysis/observation: a description of who or what is being studied.
- Variables: For each variable, the following information should be provided:
  - The exact question wording or the exact meaning of the variable.
  - The text of the question integrated into the variable text, if possible.
  - Exact meaning of codes: the documentation should show the interpretation of the codes assigned to each variable.
  - Missing data codes: codes assigned to represent data that are missing. Different types of missing data should have distinct codes.
  - Unweighted frequency distribution or summary statistics: these distributions should show both valid and missing cases.
  - Imputation and editing information: documentation should identify data that have been estimated or extensively edited.
  - Details on constructed and weight variables: datasets often include variables constructed using other variables. Detailed information on the construction of weights should also be provided.
  - Variable groupings: particularly for large datasets, it is useful to categorize variables into conceptual groupings.
- Related publications: citations to publications based on the data, by the principal investigators or others.
- Technical information on files: information on file formats, file linking, and similar information.
- Data collection instruments: copies of the original data collection forms and instruments. Other researchers often want to know the context in which a particular question was asked, and it is helpful to see the survey instrument as a whole. Copyrighted survey questions should be acknowledged with a citation so that users may access and give credit to the original survey and its author.
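
As an illustration of the variable-level items in the list above (question text, meaning of codes, distinct missing data codes, and an unweighted frequency distribution showing both valid and missing cases), here is a minimal Python sketch. It is not a required submission format. The `VariableDoc` class, the `unweighted_frequencies` function, the `EMPLOYED` variable, its codes, and the sample responses are all hypothetical and invented for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class VariableDoc:
    """Documentation for one variable (field names are illustrative only)."""
    name: str
    question_text: str
    codes: dict[int, str]          # valid code -> meaning
    missing_codes: dict[int, str]  # one distinct code per type of missing data
    constructed_from: list[str] = field(default_factory=list)


def unweighted_frequencies(values, doc):
    """Return (code, label, status, count) rows for valid and missing cases."""
    counts = Counter(values)
    rows = []
    for code, label in doc.codes.items():
        rows.append((code, label, "valid", counts.get(code, 0)))
    for code, label in doc.missing_codes.items():
        rows.append((code, label, "missing", counts.get(code, 0)))
    # Flag any value found in the data that the documentation does not explain.
    documented = set(doc.codes) | set(doc.missing_codes)
    for code in sorted(set(counts) - documented):
        rows.append((code, "UNDOCUMENTED", "unknown", counts[code]))
    return rows


# Hypothetical survey item and responses, invented for illustration.
employed = VariableDoc(
    name="EMPLOYED",
    question_text="Did you do any work for pay last week?",
    codes={1: "Yes", 2: "No"},
    missing_codes={-8: "Don't know", -9: "Refused"},
)
responses = [1, 2, 1, -9, 1, -8, 2, 1]

freq_rows = unweighted_frequencies(responses, employed)
for code, label, status, count in freq_rows:
    print(f"{code:>3}  {label:<12} {status:<8} {count}")
```

Keeping "Don't know" and "Refused" under separate codes lets the frequency table report each type of missing data on its own row, and the undocumented-value check is one simple way to catch codes that appear in the data but not in the documentation.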