KEY CONCEPTS:
- Only 1% of the human genome consists of coding regions.
- The exons comprise ~5% of each gene, so genes (exons plus introns) comprise ~25% of the genome.
- The human genome has 30,000-40,000 genes.
- ~60% of human genes are alternatively spliced.
- Up to 80% of the alternative splices change protein sequence, so the proteome has ~50,000-60,000 members.
The human genome was the first vertebrate genome to be
sequenced (Venter et al., 2001; International Human Genome Sequencing Consortium,
2001). This massive task has revealed a wealth of information about the
genetic makeup of our species, and about the evolution of the genome in general.
(Methods used for genome sequencing are reviewed in 32.12 Genome mapping.) Our
understanding is deepened further by the ability to compare the human genome
sequence with the more recently sequenced mouse genome (Waterston et al., 2002).
Mammal and rodent genomes generally fall into a narrow size
range, ~3 × 10⁹ bp (see 3.5 Why are genomes so large?). The
mouse genome is ~14% smaller than the human genome, probably because it has had
a higher rate of deletion. The genomes contain similar gene families and genes,
with most genes having an ortholog in the other genome, but with differences in
the number of members of a family, especially in those cases where the functions
are specific to the species (see 3.10 The conservation of genome organization helps to identify
genes). The estimate of 30,000 genes for the mouse genome is at the lower
end of the range of estimates for the human genome. Figure
3.20 plots the distribution of the mouse genes. The 30,000 protein-coding
genes are accompanied by ~4000 pseudogenes. There are ~800 genes representing
RNAs that do not code for proteins; these are generally small (aside from the
rRNAs). Almost half of these genes code for tRNAs, for which a large number of
pseudogenes also have been identified.
The human (haploid) genome contains 22 autosomes plus the X
or Y. The chromosomes range in size from 45 to 279 Mb of DNA, making a total genome
content of 3,286 Mb (~3.3 × 10⁹ bp). On
the basis of chromosome structure, the overall genome can be divided into
regions of euchromatin (potentially containing active genes) and heterochromatin
(see 19.7 Chromatin is divided into
euchromatin and heterochromatin). The euchromatin comprises the majority of
the genome, ~2.9 × 10⁹ bp. The
identified genome sequence represents ~90% of the euchromatin. In addition to
providing information on the genetic content of the genome, the sequence also
identifies features that may be of structural importance (see 19.8 Chromosomes have banding
patterns).
Figure 3.21 shows that a tiny proportion (~1%) of the
human genome is accounted for by the exons that actually code for proteins. The
introns that constitute the remaining sequences in the genes bring the total of
DNA concerned with producing proteins to ~25%. As shown in Figure 3.22, the average human gene is 27 kb long, with 9
exons that include a total coding sequence of 1,340 bp. The average coding
sequence is therefore only 5% of the length of the gene.
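The percentages above can be checked with a few lines of arithmetic. The sketch below uses only figures quoted in this section; the gene count of 30,000 is an assumption (the value the text describes as generally accepted), and scaling up from the average gene is an illustrative cross-check, not necessarily how the published figures were obtained.

```python
# Figures quoted in the text.
GENOME_SIZE_BP = 3_286_000_000     # total genome content, 3,286 Mb
AVERAGE_GENE_LENGTH_BP = 27_000    # average human gene, 27 kb
AVERAGE_CODING_LENGTH_BP = 1_340   # total coding sequence per average gene

# Assumption: ~30,000 genes, the value the text calls generally accepted.
ASSUMED_GENE_COUNT = 30_000

# Share of an average gene that is coding sequence.
coding_fraction_of_gene = AVERAGE_CODING_LENGTH_BP / AVERAGE_GENE_LENGTH_BP

# Share of the genome occupied by genes (exons plus introns).
genes_fraction_of_genome = (
    ASSUMED_GENE_COUNT * AVERAGE_GENE_LENGTH_BP / GENOME_SIZE_BP
)

# Share of the genome occupied by coding sequence alone.
coding_fraction_of_genome = (
    ASSUMED_GENE_COUNT * AVERAGE_CODING_LENGTH_BP / GENOME_SIZE_BP
)

print(f"coding share of an average gene: {coding_fraction_of_gene:.1%}")
print(f"genes as share of the genome:    {genes_fraction_of_genome:.1%}")
print(f"coding as share of the genome:   {coding_fraction_of_genome:.1%}")
```

Under these assumptions the three values come out close to 5%, 25%, and 1%, in line with the figures quoted above.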
Based on comparisons with other species and with known
protein-coding genes, there are ~24,000 clearly identifiable genes. Sequence
analysis identifies ~12,000 more potential genes. Two independent analyses have
produced estimates of ~30,000 and ~40,000 genes, respectively (Venter et al., 2001; International Human Genome Sequencing Consortium,
2001). One measure of the accuracy of the analyses is whether they
identify the same genes. The surprising answer is that the overlap between the
two sets of genes is only ~50%, as summarized in Figure
3.23 (Hogenesch et al., 2001). An earlier analysis of the
human gene set based on RNA transcripts had identified ~11,000 genes, almost all
of which are present in both the large human gene sets, and which account for
the major part of the overlap between them. So there is no question about the
authenticity of half of each human gene set, but we have yet to establish the
relationship between the other half of each set. The discrepancies illustrate
the pitfalls of large-scale sequence analysis! As the sequence is analyzed
further (and as other genomes are sequenced with which it can be compared), the
number of valid genes seems to decline, and is now generally thought to be
~30,000.
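The overlap comparison described above amounts to intersecting two sets of gene identifiers. The following is a minimal sketch of that idea; the identifiers and the `overlap_report` helper are hypothetical and purely illustrative, and real comparisons also have to decide when two predicted gene models count as the same gene, which this sketch does not attempt.

```python
def overlap_report(set_a, set_b):
    """Summarize how many genes two non-empty gene sets share."""
    shared = set_a & set_b
    return {
        "shared": len(shared),
        "fraction_of_a": len(shared) / len(set_a),
        "fraction_of_b": len(shared) / len(set_b),
    }


# Toy identifiers (hypothetical), chosen so that half of each set is shared.
analysis_1 = {"g1", "g2", "g3", "g4", "g5", "g6"}
analysis_2 = {"g4", "g5", "g6", "g7", "g8", "g9"}

report = overlap_report(analysis_1, analysis_2)
print(report)
```

Reporting the shared fraction relative to each set separately matters when the two sets differ in size, as the ~30,000 and ~40,000 gene estimates do.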
By any measure, the total human gene number is much lower
than we had expected; most previous estimates had been ~100,000. It shows a
relatively small increase over flies and worms (13,600 and 18,500,
respectively), not to mention the plant Arabidopsis (25,000) (see Figure 3.9). However, we should not be particularly surprised
by the notion that it does not take a great number of additional genes to make a
more complex organism. The difference in DNA sequences between humans and
chimpanzees is extremely small (there is >99%
similarity), so it is clear that the functions and interactions between a
similar set of genes can produce very different results. The functions of
specific groups of genes may be especially important, because detailed
comparisons of orthologous genes in humans and chimpanzees suggest that there has
been accelerated evolution of certain classes of genes, including some involved
in early development, olfaction, and hearing, all functions that are relatively
specific for the species (Clark et al., 2003).
The number of genes is less than the number of potential
proteins because of alternative splicing. The extent of alternative splicing is
greater in humans than in flies or worms; it may affect as many as 60% of the genes,
so the increase in size of the human proteome relative to the other eukaryotes
may be larger than the increase in the number of genes. A sample of genes from
two chromosomes suggests that the proportion of the alternative splices that
actually result in changes in the protein sequence may be as high as 80%. This
could increase the size of the proteome to 50,000-60,000 members.
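A rough sense of how these percentages translate into a proteome size can be had from a simple model. The text does not say how the 50,000-60,000 figure was derived, so the sketch below is one possible back-of-the-envelope reading, not the source's calculation. The function name and the `extra_isoforms_per_gene` parameter are assumptions introduced for illustration.

```python
def estimate_proteome_size(
    gene_count,
    spliced_fraction=0.60,           # genes that are alternatively spliced (from the text)
    protein_changing_fraction=0.80,  # splices that change protein sequence (from the text)
    extra_isoforms_per_gene=1,       # ASSUMPTION: extra proteins per affected gene
):
    """Estimate the number of distinct proteins under a simple model."""
    extra_proteins = (
        gene_count
        * spliced_fraction
        * protein_changing_fraction
        * extra_isoforms_per_gene
    )
    return round(gene_count + extra_proteins)


# Gene counts spanning the 30,000-40,000 range given in the text.
for gene_count in (30_000, 40_000):
    print(gene_count, estimate_proteome_size(gene_count))
```

With one extra protein per affected gene, this model gives 44,400 proteins for 30,000 genes and 59,200 for 40,000 genes. Larger values of the assumed `extra_isoforms_per_gene` raise the estimate accordingly.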
In terms of the number of gene families,
however, the discrepancy between humans and the other eukaryotes may not be so
great. Many of the human genes belong to families. An analysis of ~25,000 genes
identified 3,500 unique genes and 10,300 gene pairs. As can be seen from Figure 3.15, this extrapolates to a number of gene families
only slightly larger than worm or fly.