Biotechnology and Society---Part XVI
Sequencing the genes
Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning. -Winston Churchill (1874-1965), British statesman
Midway through the Second World War, the then British prime minister described
the Allied war effort in the words quoted above. The same words now apply to
the new biology, namely the Human Genome Project (HGP). The year 2003 marked the end of the
beginning of the project. The entire human genome has been sequenced and the
data are available in a Genome Data Bank, thanks to the dedicated efforts of an
international team of scientists. The new discipline of genomics (certain terms
used in this article are defined under ‘Definitions’ at the end) will change
our understanding of the origins of diseases and enable us to predict, prevent,
diagnose, and treat or cure them. It is also a powerful tool in the drug
discovery process.
History of HGP: The HGP began in 1990 as a cooperative venture between the
National Institutes of Health (NIH) and the Department of Energy (DOE) of the
US government and was completed in 2003, two years ahead of schedule. The
goals of the project were to identify some 30,000 genes in human DNA, sequence
the 3 billion base pairs, store the information properly, improve the tools for
data analysis (bioinformatics), transfer the technology to the private sector,
and address the ethical, legal and social issues that might arise. The budget
was $3 billion, and 16 institutions participated. However, the bulk of the work
was done by five institutions: Washington University (St. Louis), the Sanger
Centre (UK), Baylor College of Medicine (Houston), the Whitehead Institute, and
the DOE Joint Genome Institute operated by the University of California.
A working draft of the 24 chromosomes* in the human genome was announced in
June 2000, followed by a published analysis of the genes in February 2001.
While the project funded by the US government drew on an international team of
scientists, a private enterprise, Celera Genomics, ran a simultaneous venture
that employed in-house scientists and raced to finish the sequencing
independently of the public effort. The announcement and publication of the
sequence were made jointly by Celera Genomics and the US government. The
achievement is akin to the moon landings of the 1960s. However, whereas
interest in the moon waned because of its lifelessness, interest in the genome
remains immense and is sustained by the development of new medical
applications.
All diseases have a genetic component, whether inherited or arising from the
body’s responses to environmental stresses such as viruses or toxic substances.
The completion of the HGP enables us to identify the genetic errors that
contribute to diseases. Once such errors are correctly identified, it becomes
possible to treat, cure, or even prevent the diseases in the future. Rational
drug design based on gene identification offers a way to develop therapeutics,
and it can progress to pharmacogenomics, which tailors a therapeutic to
different patient populations. Several diagnostic tests are already available
to detect the defective (mutated) or missing genes that cause diseases. Gene
testing can progress to gene therapy, which replaces missing or defective genes
as a remedial measure. Gene therapy is still in a nascent state but is expected
to develop into a viable tool in the future.
HGP Timeline: The HGP did not sequence the human genome alone. Other genomes
were also sequenced along the way, including those of bacteria, yeast, worms,
flies and mice. We indicated in a previous article that the worm C. elegans has
a gene that induces cooperative behaviour, while a single mutation in that gene
makes the worms anti-social. Understanding this and other gene characteristics
in species other than humans is crucial to understanding overall genetic
function.
The genome sequences of Mycoplasma genitalium (the smallest bacterium in terms
of genome size) and Haemophilus influenzae were completed in 1995. In 1996 the
sequence of Saccharomyces cerevisiae (the common yeast) was published, followed
by that of Escherichia coli (a bacterium found in the gut) in 1997. The year
1998 saw the completion of the genome sequences of C. elegans (a tiny worm) and
Mycobacterium tuberculosis (the organism that causes tuberculosis, as the name
implies).
The complete sequence of chromosome 22 was published in 1999. The year 2000
witnessed the completion of the genome sequence of Drosophila melanogaster (the
fruit fly) as well as the working draft of the entire human genome. Several
other chromosomes were sequenced in the ensuing period, and with the sequencing
of the remaining chromosomes the HGP was completed in April 2003. In addition
to the sequencing work, new technologies for the synthesis of genes and the
separation of DNA, as well as new computational methods, were developed as part
of the project.
The human genome has 3.12 billion base pairs (~30,000 genes). It is an enormous
number. But let us take note that some genomes are bigger (be not proud, man!)
than the human genome (see figure): those of certain plants such as the trumpet
lily (90 billion base pairs) and corn (5 billion base pairs and 50,000 genes),
some fish such as the marbled lungfish (139 billion base pairs), and certain
amphibians such as the warty newt (18.6 billion base pairs) and the salamander
(50 billion base pairs). Most of the DNA in all these species, however, has no
known function (so-called junk DNA) and is merely a carryover. Who said Nature
is efficient? Humans have only 2 or 3 times as many genes (~30,000) as the
simple worm and 5-10 times as many as the simplest microbe. Our place at the
pinnacle of the evolutionary pyramid must therefore owe more to the complicated
architecture of the regulatory network of genes in the human genome than to the
number of genes.
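To put the genome sizes quoted above in proportion, the short Python sketch
below expresses each one as a multiple of the human genome. It is only an
illustration: the figures are those given in this article, and the names
GENOME_SIZE_GBP and size_relative_to_human are made up for the example.

```python
# Genome sizes in billions of base pairs, as quoted in this article.
# The dictionary and function names are illustrative only.
GENOME_SIZE_GBP = {
    "human": 3.12,
    "corn": 5.0,
    "warty newt": 18.6,
    "salamander": 50.0,
    "trumpet lily": 90.0,
    "marbled lungfish": 139.0,
}


def size_relative_to_human(sizes):
    """Return each genome size as a multiple of the human genome size."""
    human = sizes["human"]
    return {name: size / human for name, size in sizes.items()}


ratios = size_relative_to_human(GENOME_SIZE_GBP)

# List the organisms from the smallest genome to the largest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:>16}: {ratio:5.1f} times the human genome")
```

On these figures the marbled lungfish carries well over forty times as much DNA
as a human, which underlines that genome size alone says little about
biological complexity.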
*The human haploid genome contains 23 chromosomes: 22 autosomes, which are
common to both female and male, and one sex chromosome. The two sex chromosomes
are called X and Y. The egg cell contains only the X chromosome, while the
sperm cell contains either X or Y. Every somatic cell (any cell other than a
germ cell) in the human body contains 46 chromosomes: 22 autosomes from each of
the parents and a pair of sex chromosomes, either X-X (female) or X-Y (male).
Since there are two sets of the 22 autosomes, it is enough to sequence only one
set of 22, together with the two sex chromosomes X and Y, for a total of 24
chromosomes to be sequenced.
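The chromosome arithmetic in this note can be checked in a few lines of Python.
This is a minimal sketch; the variable names are chosen here purely for
illustration.

```python
# Chromosome counts as described in the footnote; names are illustrative.
AUTOSOMES = 22                  # autosomes in one haploid set
SEX_CHROMOSOMES = ("X", "Y")    # the two kinds of sex chromosome

# A haploid cell (egg or sperm) has 22 autosomes plus one sex chromosome.
haploid_count = AUTOSOMES + 1

# A somatic cell has two haploid sets, one from each parent.
somatic_count = 2 * haploid_count

# Sequencing needs one copy of each autosome plus both X and Y.
to_sequence = AUTOSOMES + len(SEX_CHROMOSOMES)

print(haploid_count, somatic_count, to_sequence)
```

The three numbers printed correspond to the 23, 46 and 24 chromosomes mentioned
in the note.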
Definitions:
Autosome: Any chromosome other than a sex chromosome.
Gene: An ordered sequence of nucleotides located in a particular position on a particular chromosome that encodes a specific functional product such as an enzyme, protein, or RNA molecule.
Gene sequencing: Determination of the order in which the nucleotides are strung together in a gene.
Genome: All the genetic material in the chromosomes of a particular organism. Its size is given as the total number of base pairs (nucleotides).
Genomics: The study of genes and their functions.
Genotype: The genetic characteristics or description of an organism, defined by the nucleotide sequence of the genome.
Haploid: A cell with half the usual number of chromosomes, that is, only one chromosome set. In humans this is 23 chromosomes. Sperm cells and egg cells are haploid.
Pharmacogenomics: The science of understanding the correlation between an individual patient’s genetic make-up (genotype) and his/her response to drug treatment. The efficacy of a given drug can vary between patient populations depending on the genotype.