Genome Sequencing – Methods and Applications

Genome sequencing is the process of determining the order of the nucleotide bases (adenine, guanine, cytosine, and thymine) in a molecule of DNA or in the whole genome of an organism. Sequencing methods have transformed modern biology and medicine. DNA sequencing has accelerated biological research and discovery and has improved medical diagnostics and the treatment of diseases. Knowledge of exact nucleotide sequences has supported many applied fields of biology, such as molecular and forensic biology, virology, medicine, recombinant DNA technology, biological systematics, and bioinformatics.

The earliest successful attempts at DNA sequencing date back to the early 1970s. In 1973, Gilbert and Maxam reported a sequence of 24 base pairs using a method known as wandering-spot analysis. These earliest DNA sequences were obtained by laborious methods based on two-dimensional chromatography. The first complete DNA genome to be sequenced was that of bacteriophage ΦX174, which was achieved by Sanger et al. in 1977. Over the following decades, with the development of dye-based sequencing methods and automated sequencing and analysis instruments, DNA sequencing has become much easier and faster, with huge reductions in cost.

Methods of DNA sequencing

The main methods of DNA sequencing are grouped below by generation.

First Generation Sequencing

First-generation sequencing methods were the earliest sequencing technologies to be developed and are regarded as the basic methods of sequencing. Two techniques from this generation were particularly significant.

1. Maxam-Gilbert Sequencing

This method was developed by Allan Maxam and Walter Gilbert in 1977 and is based on the chemical modification of DNA and subsequent cleavage at specific bases. For this reason, it is also known as the Chemical Cleavage Method. One end of the DNA fragment is first radioactively labeled. Chemical treatment then creates breaks at a small proportion of the sites occupied by one or two of the four nucleotide bases, with a separate reaction for each base specificity. This produces a series of fragments, each radiolabeled at one end. The fragments from the four reactions are then separated by size using gel electrophoresis, run side by side. The fragments are visualized by autoradiography, and the sequence is inferred from the pattern of bands. The method is no longer in widespread use because more advanced methods have been developed.
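The way the sequence is inferred from the four lanes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the lane specificities (G, A+G, C+T, C) are the commonly described Maxam-Gilbert reactions rather than something stated above, the function names are hypothetical, and the gel is idealized so that every fragment length is resolved, including the shortest one.

```python
# Assumed lane specificities: lane name -> bases cleaved in that reaction.
LANES = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}


def cleave(seq):
    """Return, for each lane, the set of labeled fragment lengths.

    Cleaving at 0-based position i destroys that base and leaves a
    5'-end-labeled fragment containing the i bases before it.
    """
    return {
        lane: {i for i, base in enumerate(seq) if base in bases}
        for lane, bases in LANES.items()
    }


def read_gel(bands, length):
    """Infer the sequence from which lanes show a band at each length."""
    calls = []
    for i in range(length):
        if i in bands["G"]:
            calls.append("G")      # band in G (and also in A+G)
        elif i in bands["A+G"]:
            calls.append("A")      # band in A+G only
        elif i in bands["C"]:
            calls.append("C")      # band in C (and also in C+T)
        elif i in bands["C+T"]:
            calls.append("T")      # band in C+T only
        else:
            calls.append("N")      # no band: base cannot be called
    return "".join(calls)


sequence = "GATTACA"
print(read_gel(cleave(sequence), len(sequence)))  # GATTACA
```

Reading the lanes from the shortest fragment to the longest recovers the bases in order from the labeled end.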

2. Sanger Sequencing

The method requires a single-stranded DNA (ssDNA) template, a DNA primer, DNA polymerase, dNTPs, and ddNTPs. The ddNTPs may be radioactively or fluorescently labeled for detection in automated sequencing methods. The DNA sample is divided into four separate sequencing reactions, each containing all four of the standard deoxynucleotides (dATP, dGTP, dCTP, dTTP) and the DNA polymerase. Each reaction mix contains all the chemicals required but only one of the four ddNTPs. When the primer, polymerase, and dNTPs are available, the polymerase starts to extend the DNA. Once a ddNTP has been incorporated, however, the polymerase can extend the strand no further, and the chain terminates. This is because ddNTPs lack the 3’ OH group to which the next incoming nucleotide would be joined by a phosphodiester bond. For this reason, the process is also known as the Chain Termination Method. Its main benefits are that it is much simpler than Maxam-Gilbert sequencing and uses fewer toxic chemicals.
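The chain-termination logic can be sketched in a few lines of Python. This is an illustrative model, not a description of any instrument's software: the function names are hypothetical, primer length is ignored, and every possible termination product is assumed to be present and perfectly resolved by size.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Strand built by the polymerase, written 5'->3' (reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def chain_termination(template):
    """Map each ddNTP reaction to the lengths of its terminated fragments.

    In the ddATP reaction, for example, chains stop wherever the new
    strand would contain an A.
    """
    new_strand = synthesized_strand(template)
    reactions = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        reactions[base].append(length)
    return reactions


def read_ladder(reactions):
    """Read the new strand from the shortest fragment to the longest."""
    by_length = {
        length: base
        for base, lengths in reactions.items()
        for length in lengths
    }
    return "".join(by_length[n] for n in sorted(by_length))


fragments = chain_termination("ATGC")
print(fragments)               # {'A': [3], 'C': [2], 'G': [1], 'T': [4]}
print(read_ladder(fragments))  # GCAT
```

Each fragment length appears in exactly one of the four reactions, which is why ordering the fragments by size gives the sequence of the newly synthesized strand.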

Next-Generation Sequencing

In recent years, first-generation sequencing methods have been supplemented by next-generation sequencing technologies, particularly for large-scale, automated genome analyses. Advanced sequencers generate enormous volumes of data cheaply, so these technologies are also called high-throughput sequencing.

All of these technologies involve steps that are broadly grouped as template preparation, sequencing and imaging, and data analysis. The particular combination of protocols used at each stage distinguishes one technology from another and determines the type of data it produces.

1. Illumina/Solexa Sequencing

Illumina sequencing technology uses the principle of sequencing by synthesis. Sequencing templates are immobilized on a proprietary flow cell surface. Unlabeled nucleotides and enzymes are added to initiate solid-phase bridge amplification. The enzyme incorporates nucleotides to build double-stranded bridges on the solid-phase substrate. Denaturation leaves single-stranded templates anchored to the substrate. Several million dense clusters of double-stranded DNA are generated in each channel of the flow cell. The first sequencing cycle begins by adding four labeled reversible terminators, primers, and DNA polymerase. After laser excitation, the emitted fluorescence from each cluster is captured, and the first base is identified. The next cycle repeats the incorporation of four labeled reversible terminators, primers, and DNA polymerase. After laser excitation, the image is captured as before, and the identity of the second base is recorded. The sequencing cycles are repeated to determine the sequence of bases in a fragment, one base at a time. The data are aligned and compared to a reference, and sequencing differences are identified.

During each sequencing cycle, a single labeled dNTP is added to the nucleic acid chain. The nucleotide label serves as a terminator for polymerization, so after each dNTP incorporation, the fluorescent dye is imaged to identify the base and then enzymatically cleaved to allow incorporation of the next nucleotide.

Its optimum sequencing performance is 1000 Mb/run.
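The cycle-by-cycle base calling and the comparison against a reference described above can be sketched as follows. This is a simplified illustration: the channel order, the intensity values, and the function names are assumptions made for the example, and a real pipeline would also handle signal correction, quality scores, and read alignment rather than taking the read position as given.

```python
CHANNELS = "ACGT"  # assumed channel order, for illustration only


def call_base(intensities):
    """Call the base whose fluorescence channel is brightest."""
    brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
    return CHANNELS[brightest]


def call_read(cycle_intensities):
    """Call one base per sequencing cycle for a single cluster."""
    return "".join(call_base(cycle) for cycle in cycle_intensities)


def differences(read, reference, offset=0):
    """List (reference position, reference base, read base) mismatches.

    The read is assumed to be already placed at `offset` on the reference.
    """
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if reference[offset + i] != base
    ]


# Made-up intensities for one cluster over three cycles (A, C, G, T channels).
cycles = [
    (0.90, 0.10, 0.05, 0.10),
    (0.10, 0.10, 0.80, 0.20),
    (0.00, 0.10, 0.10, 0.95),
]
read = call_read(cycles)
print(read)                                      # AGT
print(differences(read, "CCACTGG", offset=2))    # [(3, 'C', 'G')]
```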

2. 454 Pyrosequencing

Pyrosequencing is based on single-nucleotide addition, a form of sequencing by synthesis. The process uses a bioluminescence method that measures the release of inorganic pyrophosphate by proportionally converting it into visible light through a series of enzymatic reactions. DNA polymerase is controlled by adding a single type of dNTP at a time, in limiting amounts. Upon incorporation of the complementary dNTP, DNA polymerase extends the primer and pauses. DNA synthesis is re-initiated following the addition of the next complementary dNTP in the dispensing cycle.

In 454 pyrosequencing, each picotiter well contains a single clonally amplified template bead, which is surrounded by sulfurylase and luciferase. Individual dNTPs are then streamed across the wells and dispensed in a predetermined sequential order. The bioluminescence is imaged with a charge-coupled device (CCD) camera. The order and intensity of the light peaks are recorded as flowgrams, which reveal the underlying DNA sequence. Its sequencing performance is 400 Mb/run.
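How a flowgram encodes a sequence can be shown with a short simulation. This is a sketch under idealized assumptions: the dispensing order "TACG" and the function names are chosen for illustration, and the light signal is taken to be exactly proportional to the number of identical bases incorporated in a row, with no noise.

```python
def flowgram(new_strand, flow_order="TACG"):
    """Simulate the signal for each dNTP dispensation.

    `new_strand` is the strand being synthesized. Each entry is
    (dispensed base, number of bases incorporated), so a run of
    identical bases gives one proportionally stronger peak.
    """
    unknown = set(new_strand) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")
    signals = []
    pos = 0
    while pos < len(new_strand):
        for base in flow_order:
            run = 0
            while pos < len(new_strand) and new_strand[pos] == base:
                run += 1
                pos += 1
            signals.append((base, run))
    return signals


def decode(signals):
    """Rebuild the synthesized strand from the order and size of the peaks."""
    return "".join(base * run for base, run in signals)


peaks = flowgram("TTAGC")
print(peaks[:4])      # [('T', 2), ('A', 1), ('C', 0), ('G', 1)]
print(decode(peaks))  # TTAGC
```

A dispensation with a signal of 0 means the dispensed dNTP was not complementary to the next template base, so the polymerase stayed paused.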

3. SOLiD Sequencing

SOLiD stands for Sequencing by Oligonucleotide Ligation and Detection. It is based on the principle of sequencing by ligation. As in the 454 technology, the DNA template fragments are clonally amplified on beads. However, the beads are placed on the solid phase of a flow cell, so a greater density is achieved than in other approaches. In sequencing by ligation, a mixture of different fluorescently labeled dinucleotide probes is pumped into the flow cell. When the correct dinucleotide probe hybridizes to the template DNA, it is ligated onto the pre-built primer on the solid phase. After the unincorporated probes are washed out, fluorescence is captured and recorded. Each fluorescence color corresponds to a particular set of dinucleotide combinations. The fluorescent dye is then removed and washed away, and the next sequencing cycle starts. Its sequencing performance is 2000 Mb/run.
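The link between dinucleotides and fluorescence colors can be illustrated with the two-base color code commonly described for SOLiD, in which four colors each represent four of the sixteen possible dinucleotides. The numeric base codes, the XOR formulation, and the function names below are assumptions made for this sketch, not details given above.

```python
# Assumed 2-bit base codes; the color of a dinucleotide is the XOR of its bases.
BASE_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_BASE = "ACGT"


def encode_colors(seq):
    """Convert a base sequence into one color (0-3) per adjacent base pair."""
    return [BASE_BITS[a] ^ BASE_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Recover the bases from the colors, given the known first base."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_BASE[BASE_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Because every base after the first is covered by two overlapping dinucleotides, decoding needs one known starting base, and a single miscalled color changes all downstream bases in the decoded read.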

Applications of Genome Sequencing

Some of the major applications of genome sequencing are listed below.

1. Diagnostics and Medicine

DNA sequencing is widely applied in screening for the risk of genetic diseases and supports gene therapy-based treatments, genetic engineering, and gene manipulation.

2. Evolutionary biology

The ability to sequence the whole genomes of many related organisms has allowed large-scale comparative genomic, phylogenetic, and evolutionary studies.

3. Forensic Science

DNA sequencing has widespread applications in DNA profiling, forensic sampling and identification, and paternity testing.

4. Metagenomics

Shotgun sequencing of complex communities of microorganisms, metagenome sequencing of environmental or human microbiomes, and environmental profiling.

5. Agriculture

Sequencing of microorganisms to engineer resistance genes in crops. Mapping and whole-genome sequencing of food plants to increase productivity, nutritional content, and environmental tolerance.

6. Molecular Biology

Study of genotypes, genes, and proteins; gene-based studies of cancers; construction of restriction endonuclease maps; detection of mutations; construction of molecular evolution maps; and transcriptome profiling.

In Conclusion

Apart from the sequencing methods mentioned above, many others are available that modify the protocols but rely on the same basic principles. With possible applications in every field of the life sciences, genome sequencing has had a major impact on modern biotechnology. High-throughput sequencing technologies in particular have become very useful in molecular biological studies. With the advent of next-generation methods, sequencing has become simpler, faster, and cheaper in terms of process, data handling, time, and cost.
