Nucleic Acids Res. 1993 Aug 11;21(16):3829-38.
doi: 10.1093/nar/21.16.3829.

A quality control algorithm for DNA sequencing projects

O White et al. Nucleic Acids Res.

Abstract

Heterologous DNA sequences from rearrangements with the genomes of host cells, genomic fragments from hybrid cells, or impure tissue sources can threaten the purity of libraries derived from RNA or DNA. Hybridization methods can detect contaminants only from known or suspected heterologous sources, and whole-library screening is technically very difficult. Detecting contaminating heterologous clones by sequence alignment is possible only when related sequences are present in a known database. We have developed a statistical test to identify heterologous sequences, based on the differences in hexamer composition of DNA from different organisms. This test does not require that sequences similar to potential heterologous contaminants be present in the database, and can in principle detect contamination by previously unknown organisms. We have applied this test to the major public expressed sequence tag (EST) data sets to evaluate its utility as a quality control measure and a peer evaluation tool. There is detectable heterogeneity in most human and C. elegans EST data sets, but it does not appear to be associated with cross-species contamination. However, there is direct evidence for both yeast and bacterial sequence contamination in some public database sequences annotated as human. Results obtained with the hexamer test have been confirmed by similarity searches using sequences from the relevant data sets.
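
The abstract does not give the form of the statistic, so the following is only an illustrative sketch of how a hexamer-composition comparison might look, not the authors' test. It assumes a simple average log-odds score between two hexamer frequency tables, with a pseudocount for unseen hexamers; the function names, the pseudocount, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import log

K = 6  # hexamer length


def hexamer_counts(seqs):
    """Count overlapping hexamers over a collection of DNA strings."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            kmer = s[i:i + K]
            # Skip windows containing ambiguity codes such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def hexamer_frequencies(seqs, pseudocount=1.0):
    """Smoothed frequency for every one of the 4**6 possible hexamers.

    The pseudocount is an assumption of this sketch; it keeps unseen
    hexamers from producing a zero frequency.
    """
    counts = hexamer_counts(seqs)
    total = sum(counts.values()) + pseudocount * 4 ** K
    freqs = {}
    for p in product("ACGT", repeat=K):
        kmer = "".join(p)
        freqs[kmer] = (counts[kmer] + pseudocount) / total
    return freqs


def log_odds_score(seq, expected_freq, other_freq):
    """Mean log-odds per hexamer: positive values resemble the expected
    source, negative values resemble the other source."""
    s = seq.upper()
    score, n = 0.0, 0
    for i in range(len(s) - K + 1):
        kmer = s[i:i + K]
        if kmer in expected_freq:
            score += log(expected_freq[kmer] / other_freq[kmer])
            n += 1
    return score / n if n else 0.0


# Toy reference sets (hypothetical, not real genomic data).
expected = hexamer_frequencies(["ATATATATATATATATAT" * 5])
other = hexamer_frequencies(["GCGCGCGCGCGC" * 5])
```

In this sketch, a clone whose score falls well below zero would be flagged for follow-up, for example with a similarity search. Any real threshold would have to be calibrated on actual reference data, which the abstract does not specify.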
