Inside the code: our DNA contains layers of "extra" information that constrains the direction evolution can take

Natural History, March, 2008 by Olivia Judson

AUGCUGCUGCUGGCG ... So begins the human version of the genetic instruction to make "sonic hedgehog," a protein essential for an embryo to grow properly. But how does that instruction work? The basic answer is that it codes for a string of amino acids--the small molecules that, linked together in long chains, form proteins. But there's a more complex answer, too. The instructions, it turns out, are more specific: they spell out how fast a protein should be made, which of several possible shapes it will take, and even which bits of a protein will be needed in a particular cell. Over the past several years, it's become clear that this "extra" information is extensive and important, both in terms of the role it can play in causing diseases, and for biologists' broader understanding of evolutionary processes.

[ILLUSTRATION OMITTED]

To understand where all the extra information comes from and why it is important, it helps to know exactly how the machinery of the cell translates the different genes in a length of DNA into their corresponding proteins. So let's take a moment to brush up on the workings of what I consider to be one of the seven wonders of the natural world: the genetic code. Trust me, this will come in handy.

The genetic code reached its present form early in the history of life, sometime before the evolution of the last universal common ancestor--the being from which all life forms on Earth today are descended. As a result, almost all living organisms use the same genetic code to translate their genes into proteins. (Among the few exceptions are the yeast Candida albicans, famous for causing infections in humans, and that staple of Bio 101, the single-celled, lozenge-shaped Paramecium; but such exceptions, without exception, use codes that are plainly derived from the regular code.) The near universality of the genetic code is the reason genetic engineering is possible. Put a gene from a jellyfish into a rabbit and the rabbit will begin to make the jellyfish protein, because jellyfish and rabbits--along with just about everything else--read their genes the same way.

Making a protein from a gene is a two-step process. Genes are written in DNA, but before they can be used to manufacture proteins, they have to be copied, or "transcribed," into a related molecule known as RNA. The cellular machinery takes that RNA template, known as messenger RNA (mRNA), and "translates" it into an actual protein. (The message given at the start for the sonic hedgehog protein is, accordingly, presented above in its RNA form.)

The differences between RNA and DNA are small but significant. Whereas DNA usually exists as a stable double-stranded molecule--the iconic double helix--RNA usually exists as a single strand, is relatively unstable, and has a tendency to knot up on itself. In DNA, the struts of the double helix are made of pairs of flat molecules known as bases. There are four bases in DNA: adenine and thymine, cytosine and guanine, which are usually referred to by the shorthand of A, T, C, and G. In RNA, the bases occur not in pairs but singly, and thymine is replaced by uracil--hence the presence of the letter U in the message above.
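For readers who like to see the bookkeeping, the change of alphabet can be sketched in a few lines of Python. This is only an illustration, not a model of the cell's machinery: the function name `transcribe` is made up for the example, and the DNA string is an assumption, obtained by swapping each U in the opening message back to a T so that it reads like the gene's coding strand.

```python
def transcribe(coding_strand: str) -> str:
    """Spell a DNA coding-strand sequence in the RNA alphabet (T becomes U)."""
    dna = coding_strand.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("expected only the DNA bases A, C, G, and T")
    return dna.replace("T", "U")


# Assumed input: the article's opening message, respelled with T in place of U.
dna_message = "ATGCTGCTGCTGGCG"
rna_message = transcribe(dna_message)
print(rna_message)  # AUGCUGCUGCUGGCG
```

In a real cell the RNA is assembled against the complementary strand of the double helix, so this one-line substitution is a shortcut that gives the same spelling rather than a description of the chemistry.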

The way the RNA code works is simple. The message is read in groups of three bases, called codons; each codon specifies either an amino acid, or a "stop"--a punctuation mark that says, "the gene ends here." Thus, the beginning of the sonic hedgehog sequence specifies a chain of five amino acids--methionine (AUG), three leucines (the three CUGs), then alanine (GCG).

The codons are read by small RNA molecules, called "transfer RNAs" (tRNAs). The cell deploys an array of tRNAs to carry the various types of amino acids. A tRNA molecule is loaded up with a particular amino acid at one end; at the other, it has a three-base sequence, the anticodon, that complements a given codon. By means of its anticodon, each tRNA molecule attaches to an appropriate place on the long mRNA molecule. The amino acid it carries is thus dragged into the correct position to add to the growing protein.
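The reading of the message can be sketched the same way. The Python below is a minimal illustration under stated assumptions: `PARTIAL_CODE` holds only the three codons named in the text (it is not the full code), the helper names are hypothetical, and `anticodon` simply applies textbook base pairing (A with U, C with G), written in the conventional 5'-to-3' direction. Real tRNAs often carry chemically modified bases, which this ignores.

```python
# Only the three codons mentioned in the text; a deliberately partial lookup.
PARTIAL_CODE = {"AUG": "methionine", "CUG": "leucine", "GCG": "alanine"}

# Textbook RNA base pairing: A pairs with U, C pairs with G.
PAIRING = str.maketrans("ACGU", "UGCA")


def split_codons(mrna: str) -> list[str]:
    """Cut a message into consecutive three-base codons, dropping any remainder."""
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def anticodon(codon: str) -> str:
    """Return the complementary three-base sequence, written 5' to 3'."""
    return codon.translate(PAIRING)[::-1]


message = "AUGCUGCUGCUGGCG"
for codon in split_codons(message):
    print(codon, anticodon(codon), PARTIAL_CODE[codon])
```

Each printed line shows a codon, the anticodon that would pair with it, and the amino acid delivered; for the opening message, the amino acids come out as methionine, three leucines, and alanine, matching the chain described above.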

So much for the review. Now let's look at the source of all that extra information. The RNA code has two striking features. First, it has a lot of redundancy: most amino acids can be specified by more than one codon. Both UUU and UUC specify the amino acid phenylalanine, for example [see table above]. The reason for the redundancy is that there are sixty-four possible codons--each of the four bases can occur in any of the three positions--but only twenty-one entities (twenty amino acids and "stop") to be assigned to them. Second, the redundancy has a pattern: codons that correspond to a particular amino acid tend to be related. For instance, all four of the codons that start with AC specify threonine. That means a cell doesn't need sixty-four different tRNAs: for some amino acids the tRNAs are quite generic, matching only the first two letters of the codon. This redundancy has an important consequence for the impact that mutations--random changes to DNA--have on the amino acid composition of proteins.
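The arithmetic of the redundancy is easy to check by machine. The sketch below assumes the standard genetic code, written in the compact one-letter layout commonly used for codon tables (bases ordered U, C, A, G; `*` marks a stop; F is phenylalanine and T is threonine). The names `CODE` and `silent_changes` are invented for this illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon, in UUU, UUC, UUA, UUG, UCU, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# How many codons are assigned to each amino acid (or to "stop").
degeneracy = Counter(CODE.values())
print(len(CODE), "codons for", len(degeneracy), "entities")

# Both phenylalanine codons, and the four codons that start with AC.
print(CODE["UUU"], CODE["UUC"])
print({CODE["AC" + base] for base in BASES})


def silent_changes(position: int) -> int:
    """Count single-base changes at one codon position that leave the meaning unchanged."""
    count = 0
    for codon, meaning in CODE.items():
        for base in BASES:
            if base != codon[position]:
                mutant = codon[:position] + base + codon[position + 1:]
                count += CODE[mutant] == meaning
    return count


# Compare the first and third codon positions (indices 0 and 2).
print(silent_changes(0), silent_changes(2))
```

Because related codons tend to differ in their last letter, far more changes at the third position than at the first leave the amino acid untouched, which is the pattern the paragraph above describes.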


 

Content provided in partnership with Thomson Gale