The human genome is the genome of Homo sapiens, which is stored on 23 chromosome pairs. Twenty-two of these are autosomal chromosome pairs, while the remaining pair is sex-determining. The haploid human genome comprises just over 3 billion DNA base pairs and contains approximately 23,000 protein-coding genes, far fewer than had been expected before its sequencing. In fact, only about 1.5% of the genome codes for proteins, while the rest consists of non-coding RNA genes, regulatory sequences, introns, and (controversially named) "junk" DNA.
The Human Genome Project (HGP) produced a reference sequence of the euchromatic human genome, which is used worldwide in biomedical sciences.
Surprisingly, the number of human genes seems to be less than a factor of two greater than that of many much simpler organisms, such as the roundworm and the fruit fly. However, human cells make extensive use of alternative splicing to produce several different proteins from a single gene, and the human proteome is thought to be much larger than those of the aforementioned organisms. In addition, most human genes have multiple exons, and human introns are frequently much longer than the flanking exons.
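To make the effect of alternative splicing concrete, the following minimal sketch enumerates the transcripts a single gene could yield under a deliberately simplified exon-skipping model. The assumptions are ours, not the article's: the first and last exons are always retained, every internal exon may be independently kept or skipped, and the function name `exon_skipping_isoforms` and the exon labels are hypothetical. Real splicing is regulated and far more constrained.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """List isoforms under a toy model: terminal exons are always kept,
    each internal exon is either kept or skipped, and exon order is preserved."""
    if len(exons) < 2:
        return ["-".join(exons)]
    first, *internal, last = exons
    isoforms = []
    # Go from "keep every internal exon" down to "skip every internal exon".
    for kept_count in range(len(internal), -1, -1):
        for kept in combinations(internal, kept_count):
            isoforms.append("-".join([first, *kept, last]))
    return isoforms


# Hypothetical gene with four exons: two internal exons give 2**2 = 4 isoforms.
for isoform in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print(isoform)
```

Under this model a gene with n exons can give up to 2^(n-2) transcripts, which illustrates how a modest gene count can still correspond to a much larger proteome.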
Human genes are distributed unevenly across the chromosomes. Each chromosome contains various gene-rich and gene-poor regions, which appear to correlate with chromosome bands and GC-content. The significance of these nonrandom patterns of gene density is not well understood. In addition to protein-coding genes, the human genome contains thousands of RNA genes, including tRNA, ribosomal RNA, microRNA, and other non-coding RNA genes.
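GC-content, mentioned above as a correlate of gene density, is simply the fraction of bases in a region that are G or C. The sketch below computes it over fixed-size windows of a sequence. The function names, the window and step sizes, and the toy sequence are illustrative assumptions; real analyses use windows of many kilobases over chromosome-scale data.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """Return (start position, GC fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence: an AT-only stretch, a GC-only stretch, then a mixed stretch.
toy = "ATATATATAT" + "GCGCGCGCGC" + "ATGCATGCAT"
for start, gc in windowed_gc(toy, window=10, step=10):
    print(start, gc)
```

Plotting such window values along a chromosome next to gene annotations is one simple way to look at the kind of correlation described here.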
The human genome has many different regulatory sequences which are crucial to controlling gene expression. These are typically short sequences that appear near or within genes. A systematic understanding of these regulatory sequences and how they together act as a gene regulatory network is only beginning to emerge from computational, high-throughput expression and comparative genomics studies. Some types of non-coding DNA are genetic "switches" that do not encode proteins, but do regulate when and where genes are expressed.
Identification of regulatory sequences relies in part on evolutionary conservation. The evolutionary branch between the human and mouse lineages, for example, occurred 70–90 million years ago. Non-coding sequences that computer comparisons show to be conserved between the two genomes over that span are therefore likely to be important for functions such as gene regulation.
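The idea of scanning for conserved sequence can be sketched as a sliding-window identity calculation over two sequences that have already been aligned. This is a toy illustration under stated assumptions: the sequences are pre-aligned and of equal length, gaps are written as "-", and the names `percent_identity` and `conserved_windows`, the window size, and the threshold are all hypothetical choices. Real pipelines rely on dedicated alignment and conservation-scoring tools.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of aligned columns where both sequences carry the same base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    # A column counts as a match only if the bases agree and it is not a gap.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return matches / len(a)


def conserved_windows(a: str, b: str, window: int, threshold: float) -> list[tuple[int, float]]:
    """Return (start, identity) for every window whose identity meets the threshold."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(0, len(a) - window + 1):
        identity = percent_identity(a[start:start + window], b[start:start + window])
        if identity >= threshold:
            hits.append((start, identity))
    return hits


# Toy aligned pair: the first ten columns are identical, the rest have diverged.
species_a = "ACGTACGTAC" + "TTTTTTTTTT"
species_b = "ACGTACGTAC" + "GGGGGGGGGG"
print(conserved_windows(species_a, species_b, window=10, threshold=0.9))
```

Windows that pass the threshold but do not overlap known coding exons would be the candidates for conserved non-coding sequence.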
Another comparative genomic approach to locating regulatory sequences in humans is the sequencing of the puffer fish genome. These vertebrates have essentially the same genes and regulatory sequences as humans, but with only one-eighth the "junk" DNA. The compact DNA sequence of the puffer fish makes it much easier to locate the regulatory sequences.
Protein-coding sequences (specifically, coding exons) comprise less than 1.5% of the human genome. Aside from genes and known regulatory sequences, the human genome contains vast regions of DNA the function of which, if any, remains unknown. These regions in fact comprise the vast majority, by some estimates 97%, of the human genome size. Much of this is composed of:
Tandem repeat: Tandem repeats occur in DNA when a pattern of two or more nucleotides is repeated and the repetitions are directly adjacent to each other. An example would be:
A-T-T-C-G-A-T-T-C-G-A-T-T-C-G
in which the sequence A-T-T-C-G is repeated three times.
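A tandem repeat such as the A-T-T-C-G example can be detected with a regular expression that uses a backreference. The sketch below is one minimal way to do it, assuming an uppercase DNA string without separators; the function name `find_tandem_repeats` and the `min_unit` parameter are hypothetical. It reports the shortest repeating unit found at each position and is not a substitute for dedicated repeat-finding software.

```python
import re


def find_tandem_repeats(seq: str, min_unit: int = 2) -> list[tuple[str, int, int]]:
    """Return (unit, copy count, start index) for each run in which a unit of
    at least min_unit bases is immediately followed by one or more copies of itself."""
    # ([ACGT]{n,}?) lazily captures the shortest candidate unit;
    # \1+ then requires at least one directly adjacent copy of that unit.
    pattern = re.compile(r"([ACGT]{%d,}?)\1+" % min_unit)
    repeats = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        repeats.append((unit, copies, match.start()))
    return repeats


# The example from the text, written without separators.
print(find_tandem_repeats("ATTCGATTCGATTCG"))
```

For the example sequence this reports the unit ATTCG with three copies starting at position 0.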
Interspersed repetitive DNA is found in all eukaryotic genomes. Certain classes of these sequences propagate themselves by RNA-mediated transposition and are called retrotransposons.
The major difference of class II transposons from retrotransposons is that their transposition mechanism does not involve an RNA intermediate. Class II transposons usually move by a mechanism analogous to cut and paste, rather than copy and paste, using the transposase enzyme. Different types of transposase work in different ways. Some can bind to any part of the DNA molecule, and the target site can therefore be anywhere, while others bind to specific sequences. Transposase makes a staggered cut at the target site producing sticky ends, cuts out the transposon and ligates it into the target site.
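The contrast between cut and paste and copy and paste can be illustrated with a toy string model. This sketch is only an analogy under loud assumptions: the genome is a plain string, the element is given by start and end indices, and the staggered cuts, sticky ends, and the RNA intermediate of retrotransposons are all ignored. The function names and the toy genome are hypothetical.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Excise genome[start:end] and reinsert it at index `target`,
    where `target` is measured in the genome after excision."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]


def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Leave genome[start:end] in place and insert a second copy at index `target`."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]


# Toy genome: lowercase flanking sequence, uppercase mobile element at 3..13.
genome = "aaaTRANSPOSONbbbccc"
moved = cut_and_paste(genome, 3, 13, 6)
copied = copy_and_paste(genome, 3, 13, 16)
print(moved, len(moved) == len(genome))       # element relocated, length unchanged
print(copied, copied.count("TRANSPOSON"))     # element duplicated, genome longer
```

The cut-and-paste result keeps the genome the same length with the element at a new site, whereas the copy-and-paste result grows by the length of the element, which is why copy-and-paste elements can accumulate to high copy numbers.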
However, there is also a large amount of sequence that does not fall under any known classification. Much of this sequence may be an evolutionary artifact that serves no present-day purpose, and these regions are sometimes collectively referred to as "junk" DNA. There are, however, a variety of emerging indications that many sequences within these regions are likely to function in ways that are not fully understood. Recent experiments using microarrays have revealed that a substantial fraction of non-genic DNA is in fact transcribed into RNA, which raises the possibility that the resulting transcripts have some unknown function. Also, the evolutionary conservation across mammalian genomes of much more sequence than can be explained by protein-coding regions indicates that many, and perhaps most, functional elements in the genome remain unknown. The investigation of the vast quantity of sequence in the human genome whose function remains unknown is currently a major avenue of scientific inquiry.
Author: Dr. Sujata Roy
Rajalakshmi Engineering College, Chennai