NIH Genomic Data Sharing

On August 27, 2014, the National Institutes of Health (NIH) published its final Genomic Data Sharing (GDS) Policy. This policy is designed to promote sharing, for research purposes, of large-scale human and non-human genomic data generated from NIH-funded research.

The GDS Policy applies to all competing NIH grant applications and proposals for NIH contracts submitted for the January 25, 2015 deadline and thereafter if the proposed research will generate large-scale human or non-human genomic data or will use these data for subsequent research. In such cases, the GDS Policy applies regardless of the funding level.

The Policy will also affect research performance progress reports for large-scale human or non-human genomic NIH-funded studies awarded prior to the Policy’s effective date. Investigators will be expected to provide an updated genomic data sharing plan to the NIH funding Institute or Center (IC) when the research performance progress report is submitted.

Research Affected by the GDS Policy

NIH provides examples of large-scale genomic research projects that are subject to the GDS Policy on its page “Does the Genomic Data Sharing Policy Apply to My Research?”

The GDS Policy does not apply to:

  • Institutional training grants (T32s, T34s, T35s, and TL2s);
  • K12 career development awards (KL2s);
  • Individual fellowships (Fs);
  • Resource grants and contracts (Ss);
  • Linked awards derived from previously reviewed applications (KL1, KL2, RL1, RL2, RL5, RL9, TL1, UL1);
  • Facilities or coordinating centers funded through related initiatives to provide genotyping, sequencing, or other core services in support of Genomic Data Sharing; and
  • Smaller studies (e.g., sequencing the genomes of fewer than 100 human research participants), which are generally not subject to this Policy.
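
The exclusions above can be read as a rough first-pass screen. The following is a minimal sketch only, not an NIH or campus tool: the function name `gds_screen`, its parameters, and the returned labels are hypothetical, and the code assumes that "Fs" and "Ss" mean activity codes consisting of F or S followed by two digits. The 100-participant figure is the example given in the list, not a fixed threshold, and the facilities and coordinating centers exclusion cannot be inferred from an activity code at all, so the result is never a substitute for consulting the Program Official.

```python
import re

# Activity codes named in the exclusion list above.
EXCLUDED_CODES = {
    "T32", "T34", "T35", "TL2",                                # institutional training grants
    "K12", "KL2",                                              # career development awards
    "KL1", "RL1", "RL2", "RL5", "RL9", "TL1", "UL1",           # linked awards
}

# Assumption: "Fs" and "Ss" are codes such as F31 or S10 (letter plus two digits).
FELLOWSHIP_OR_RESOURCE = re.compile(r"[FS]\d{2}")


def gds_screen(activity_code, large_scale_genomic, human_genomes_sequenced=None):
    """Return a rough, non-authoritative label for whether the GDS Policy may apply."""
    code = activity_code.strip().upper()
    if code in EXCLUDED_CODES or FELLOWSHIP_OR_RESOURCE.fullmatch(code):
        return "listed exclusion"
    if not large_scale_genomic:
        return "policy not triggered"
    # The list gives "fewer than 100 human research participants" as an example only.
    if human_genomes_sequenced is not None and human_genomes_sequenced < 100:
        return "smaller study: generally not subject"
    return "likely subject: confirm with Program Official"


if __name__ == "__main__":
    print(gds_screen("R01", True, human_genomes_sequenced=5000))
    print(gds_screen("T32", True))
```

A string label is returned rather than a yes/no value because several of the exclusions are stated with "generally" and depend on judgment by the funding IC.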

Application Stage

Unless the Funding Opportunity Announcement states otherwise, applicants preparing NIH grant applications are expected to:

  • Contact the appropriate NIH Institute or Center (IC) Program Official or Project Officer as early as possible to discuss Genomic Data Sharing expectations and timelines that would apply to their proposed research.
  • State in the cover letter with the application that the research proposed will generate large-scale human and/or non-human genomic data.
  • Select “NIH Genomic Data” under the Special Review tab in Phoebe.
  • Under the Questions tab in Phoebe, answer “Yes” to the following questions:
    • Will this project involve large-scale human or non-human genomic data?
    • Will large-scale human or non-human genomic data be generated/used from NIH-funded research?
  • Include a Genomic Data Sharing plan in the Resource Sharing Plan section of the funding application or proposal. (A more detailed genomic data sharing plan will need to be provided to the funding IC prior to award.)
  • Outline in the budget section of their funding application the resources they will need to prepare the data for submission to appropriate repositories. NIH will provide additional guidance on these resources, as necessary.

Note: In situations in which the sharing of human data is not possible, applicants should provide a justification explaining why they cannot share these data and provide an alternative data sharing plan. Exceptions to NIH expectations for data submission to an NIH-designated data repository will be considered on a case-by-case basis by the NIH.

After the application has been submitted, the principal investigator (PI) should contact the Office for Protection of Human Subjects (OPHS) to discuss next steps for ensuring that the PI has an approved protocol that encompasses the Data Sharing Plan as submitted and that the Informed Consent documents permit the resulting data to be shared as described in that plan. A PI without an existing protocol that covers the proposed work should work with OPHS staff to prepare a new protocol or amendment in advance, so that it can be completed, submitted, and reviewed and approved by CPHS quickly once the institution and PI are notified that the proposal has reached the Just-in-Time (JIT) stage.

Applicants who wish to use controlled-access human genomic data from NIH-designated data repositories (e.g., dbGaP) as a secondary user to achieve the specific aim(s) of the research proposed in the grant application should briefly address their plans for requesting access to the data and state their intention to abide by the NIH Genomic Data User Code of Conduct in the Research Plan of the application.

Note: Researchers should be aware that access to these data is dependent on an approval process that involves the relevant NIH Data Access Committee(s). Researchers may wish to secure access to the data prior to submitting their application for NIH support. Secondary users of controlled-access data are not expected to deposit their findings into NIH-designated data repositories, unless appropriate.

Research Funded Before the Effective Date of the GDS Policy

Although the GDS Policy does not apply to research submitted prior to the Policy’s effective date, NIH, nonetheless, strongly encourages investigators to comply with the expectations outlined in the Policy. Investigators should provide an updated genomic data sharing plan to the funding IC in the submission of the research performance progress report. For studies involving human participants that were initiated before the Policy’s effective date and used consents that do not meet the expectations of the GDS Policy, investigators are expected to plan to transition to a consent for future research uses and broad sharing, if possible, particularly for new or additional collections of specimens. There will be reasonable accommodation, determined on a case-by-case basis by the funding IC, for long-term projects ongoing at the time of the Policy’s effective date to come into alignment with NIH’s expectations for consent and data sharing. The goal is to bring these projects into alignment, to the extent possible, in a reasonable timeframe.

Determining if the GDS Policy Applies

Investigators with questions about whether the GDS Policy applies to their current or proposed research should consult the relevant Program Official or Program Officer or the IC’s Genomic Program Administrator (GPA). Names and contact information for GPAs are available through the NIH GDS website.

Peer Review Stage

During peer review, reviewers will be asked to comment on the genomic data sharing plan but will not factor the plan into the Overall Impact score, unless specified in the Funding Opportunity Announcement. After initial peer review, NIH Program Officers may accept the plan as provided, recommend changes to the applicant, or request additional information.

Just-in-Time/Award Stage

Following initial peer review and prior to award of a project that will involve the collection and sharing of human data, potential grantee institutions will be asked to submit an Institutional Certification Form through the standard Just-in-Time process. The responsible Institutional Signing Official in SPO will sign and provide the Institutional Certification to the funding IC after the PI provides SPO with the CPHS-approved protocol number or an explanation of why the PI does not have a protocol number.

If the PI does not have a CPHS-approved protocol number, SPO will complete and submit a Provisional Institutional Certification Form. The PI will then have to obtain CPHS review and approval of the protocol.

Note: When an investigator is only using anonymized cells/data with no identifiers or codes (i.e., does not involve human subjects), CPHS will make a “not human subjects research” determination and inform the PI and SPO that an Institutional Certification does not need to be provided to NIH.

For information on the preparation of Informed Consent materials for new studies involving human genomic research as well as studies initiated prior to the effective date of the NIH GDS Policy, see the Committee for Protection of Human Subjects (CPHS) page on Informed Consent.

Post-Award Stage

Compliance with the GDS policy will be included as a special term and condition in the Notice of Award or the Contract Award. PIs should address compliance with genomic data sharing plans with their IC scientific leadership prior to initiating applicable research. PIs also are encouraged to contact their IC leadership or the Office of Intramural Research for guidance.

The funding NIH IC will typically review compliance with genomic data sharing plans at the time of annual progress reports or other appropriate scientific project reviews, or at other times, depending on the reporting requirements specified by the IC for specific programs or projects.

To ensure that the penalties for the misuse of data are clear for all data submitters, users, and research participants, the GDS Policy has been revised to clarify that secondary users in violation of the Policy or the Data Use Certification may face enforcement actions.

Sharing Non-Human Genomic Data

Large-scale non-human genomic data, including data from microbes, microbiomes, and model organisms, as well as relevant associated data (e.g., phenotype and exposure data), are to be shared in a timely manner. Genomic data undergo different levels of data processing, which provides the basis for NIH’s expectations for data submission. These expectations are provided in the Supplemental Information to the GDS Policy.

Non-human data may be made available through any widely used data repository, whether NIH-funded or not. NIH expects investigators to continue submitting data types to the same repositories that they submitted the data to before the effective date of the GDS Policy. Data types not previously submitted to any repositories may be submitted to these or other widely used repositories as agreed to by the funding IC.

Sharing Human Genomic Data

Respect for, and protection of the interests of, research participants are fundamental to NIH’s stewardship of human genomic data. Before submitting genomic data to a data repository, the PI and the Authorized Institutional Official (in SPO) must sign and submit an Institutional Certification indicating that the data being submitted meet the expectations of the GDS Policy. The Institutional Certification must also describe any data use limitations and state whether the aggregate-level data provided would be appropriate for general research use. NIH provides single-site and multicenter Institutional Certification Forms.

Investigators should de-identify human genomic data that they submit to NIH-designated data repositories according to the standards set forth in the HHS Regulations for the Protection of Human Subjects to ensure that the identities of research subjects cannot be readily ascertained with the data. Investigators should also strip the data of identifiers according to the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The investigator should assign random, unique codes to the de-identified data, and the key linking those codes to other study identifiers should be held by the submitting institution.
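
The coding step described above can be illustrated with a short sketch. It covers only the assignment of random, unique codes and the separation of the key from the data to be shared; it does not by itself satisfy the HHS or HIPAA de-identification standards, which also govern the other fields in each record. The function names, the `participant_id` and `subject_code` field names, and the code length are all hypothetical choices made for illustration.

```python
import secrets


def assign_random_codes(participant_ids, code_bytes=8):
    """Map each local participant ID to a random, unique code.

    The returned dict is the key; it stays with the submitting institution.
    """
    key = {}
    used = set()
    for pid in participant_ids:
        if pid in key:
            continue  # the same participant always keeps one code
        code = secrets.token_hex(code_bytes)
        while code in used:  # regenerate on the (very unlikely) collision
            code = secrets.token_hex(code_bytes)
        used.add(code)
        key[pid] = code
    return key


def recode_records(records, key, id_field="participant_id"):
    """Return copies of the records with the local ID replaced by its random code."""
    recoded = []
    for record in records:
        shared = {k: v for k, v in record.items() if k != id_field}
        shared["subject_code"] = key[record[id_field]]
        recoded.append(shared)
    return recoded


if __name__ == "__main__":
    records = [
        {"participant_id": "MRN-001", "genotype": "AG"},
        {"participant_id": "MRN-002", "genotype": "GG"},
    ]
    key = assign_random_codes(r["participant_id"] for r in records)
    for row in recode_records(records, key):
        print(row)
```

The codes come from the `secrets` module rather than from a hash of the original identifier, so a code cannot be recomputed from a known identifier; the only link between the two is the key itself.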

Although the data in the NIH database of Genotypes and Phenotypes (dbGaP) are de-identified by both the HHS Regulations for Protection of Human Subjects and HIPAA Privacy Rule standards, NIH has obtained a Certificate of Confidentiality for dbGaP as an additional precaution because genomic data can be re-identified. NIH encourages investigators and institutions submitting large-scale human genomic datasets to NIH-designated data repositories to seek a Certificate of Confidentiality as an additional safeguard to prevent compelled disclosure of any personally identifiable information they may hold.

Investigators should submit large-scale human genomic data as well as relevant associated data (e.g., phenotype and exposure data) to an NIH-designated data repository in a timely manner. Investigators should also submit any information necessary to interpret the submitted genomic data, such as study protocols, data instruments, and survey tools.

Genomic data undergo different levels of data processing, which provides the basis for NIH’s expectations for data submission and timelines for the release of the data for access by investigators. These expectations and timelines are provided in the Supplemental Information to the GDS Policy.

Requests for Controlled-Access Data

Controlled-access data in NIH-designated data repositories are made available for secondary research only after investigators have obtained approval from NIH to use the requested data for a particular project. Data in unrestricted-access repositories are publicly available to anyone.

To obtain access to controlled-access data, investigators must complete a Data Use Certification as well as a Data Access Request, co-signed by the investigators requesting the data and their Institutional Signing Official. UC Berkeley PIs should contact the Industry Alliances Office (IAO) for assistance with the dbGaP Data Use Certificate and the Data Access Request.

This certification specifies the conditions for the secondary research use of controlled-access data. NIH provides a model and modifiable Data Use Certificate.

By signing the Data Use Certificate, investigators are agreeing to:

  • The terms of access and user responsibilities specified in the Data Use Certificate;
  • Public posting of the user’s research use statement;
  • Refrain from identifying or contacting individual participants without specific IRB approval;
  • Retain control of the data and refrain from distributing the data to any entity or individual not covered in the Data Access Request;
  • Handle the requested dataset(s) according to the current dbGaP Security Best Practices;
  • Keep the data secure and confidential at all times;
  • Adhere to information technology practices in all aspects of data management to assure that only authorized individuals can gain access to NIH genomic datasets; and
  • Notify the appropriate Data Access Committee (DAC) of any unauthorized data sharing, breaches of data security, violations in the presentation and publication embargo period, or inadvertent data releases that may compromise data confidentiality within 24 hours of when the incident is identified.

Requests for controlled-access data are reviewed by NIH Data Access Committees (DACs). DACs will accept requests for proposed research uses beginning one month prior to the anticipated data release date. DAC decisions are based primarily upon conformance of the proposed research as described in the access request to the data use limitations established by the submitting institution’s Institutional Certification.

The access period for all controlled-access data is one year; at the end of each approved period, data users can request an additional year of access or close out the project. Although data are de-identified, approved users of controlled-access data are encouraged to consider whether a Certificate of Confidentiality could serve as an additional safeguard to prevent compelled disclosure of any genomic data they may hold. Investigators approved to download controlled-access data from NIH-designated data repositories and their institutions are expected to abide by the NIH Genomic Data User Code of Conduct through their agreement to the Data Use Certification.
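
The two timing rules above (requests accepted beginning one month before the anticipated release date, and a one-year access period) can be tracked with simple date arithmetic. This is a minimal sketch under stated assumptions: "one month" and "one year" are treated as calendar months, the access period is assumed to run to the anniversary of the approval date, and the function names are hypothetical. The actual dates that govern a project are those set by the DAC.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def earliest_request_date(anticipated_release):
    """First day a DAC would accept a request, assuming one calendar month before release."""
    return shift_months(anticipated_release, -1)


def access_period_end(approval_date):
    """Date by which to request another year of access or close out the project (assumed anniversary)."""
    return shift_months(approval_date, 12)


if __name__ == "__main__":
    print(earliest_request_date(date(2024, 3, 31)))
    print(access_period_end(date(2024, 2, 29)))
```

Clamping the day keeps month-end dates valid, for example when the approval date falls on February 29 and the following year is not a leap year.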

Conditions for Use of Unrestricted-Access Data

Investigators who download unrestricted-access data from NIH-designated data repositories should not attempt to identify individual human research participants from whom the data were obtained. Investigators also are required to acknowledge in all oral or written presentations, disclosures, or publications the specific dataset(s) or applicable accession number(s) and the NIH-designated data repositories through which the investigator accessed any data.

Intellectual Property

NIH encourages patenting of technology suitable for subsequent private investment that may lead to the development of products that address public needs without impeding research. However, it is important to note that naturally occurring DNA sequences are not patentable in the United States. Therefore, basic sequence data and certain related information (e.g., genotypes, haplotypes, p-values, allele frequencies) are pre-competitive. Such data made available through NIH-designated data repositories, and all conclusions derived directly from them, should remain freely available, without any licensing requirements.

NIH encourages broad use of NIH-funded genomic data that is consistent with a responsible approach to management of intellectual property derived from downstream discoveries, as outlined in the NIH Best Practices for the Licensing of Genomic Inventions and Section 8.2.3, Sharing Research Resources, of the NIH Grants Policy Statement. NIH discourages the use of patents to prevent the use of or to block access to genomic or genotype-phenotype data developed with NIH support.