Understanding Nature's Machines

by Niles Donegan

    Proteins play an enormous role in creating and sustaining cellular function, and through evolution they have come to perform functions that mimic almost every aspiration of nanotechnology. Like the miniature robots of our future, they scour our bodies for imperfections, cut and paste themselves to create greater structures, turn incoming sensory signals into useful information, and regulate themselves to suit the conditions of the task. Unlike the nanotech machines that are just beginning to be sketched out, we already carry within us a powerful contingent of molecules well tailored to their jobs. Yet understanding how these molecules are constructed is as great a challenge as building a nanomachine, for the multitude of subtle protein designs has been improved and retooled by eons of natural selection, and scientists are only now beginning the enormous task of comprehending these blueprints.

    By understanding how a protein is built, scientists hope to learn how modifying known substructures might change specific qualities of a protein. Altered properties might change the efficiency of a reaction or facilitate the transport of a molecule. A combination of such adjustments might then yield proteins with entirely new shapes and functions. These results ultimately depend on understanding the shape and interactions of a protein's components. Such understanding provides a view of the protein's workings, showing how particular spatial arrangements of atoms interact with each other, and it hints at which sites in a protein are best suited for modification. Deciding what to modify remains very difficult, and much of protein modification today is educated guesswork; even so, these guesses contribute to a mosaic of information on how exactly proteins work.

    The background to these guesses starts with an understanding of the lowest level of protein structure, the amino acids. Every protein is composed of long chains of these molecules, and every amino acid shares the same backbone of carbon, nitrogen, oxygen, and hydrogen. What distinguishes one amino acid from another is its side chain, which gives it particular qualities of size, electric charge, polarity, and affinity for water. Side chains range from a single hydrogen atom to entire carbon rings, and the interactions of these different groups hung on a common backbone give a protein its specific function.

    In the cell, proteins are assembled according to the information coded in messenger RNA by a piece of cellular machinery called the ribosome, a complex of RNAs and proteins that translates the nucleotide sequence of RNA into a sequence of amino acids. A poly-amino acid chain grows when two amino acids are brought close to each other in the ribosome and the lone pair of electrons on the amine nitrogen of one is attracted to the electron-poor carbon at the activated carboxyl end of the other, forming a peptide bond. This process repeats many times, and as amino acids are attached and the chain lengthens, it wraps around itself into a shape specific to its amino acid sequence.
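Whatever the cellular route, joining two amino acids by a peptide bond amounts overall to removing one molecule of water, so the mass of a chain is not simply the sum of its free amino acids. The short Python sketch below illustrates that bookkeeping; the small table holds approximate average masses for only a few amino acids and is an illustrative subset, not a complete reference.

```python
# Approximate average masses (g/mol) of a few free amino acids; an illustrative subset only.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "L": 131.17,  # leucine
    "K": 146.19,  # lysine
    "E": 147.13,  # glutamic acid
    "M": 149.21,  # methionine
}
WATER_MASS = 18.02  # g/mol, removed once per peptide bond formed


def peptide_mass(sequence: str) -> float:
    """Average mass of a linear peptide: free amino acids minus one water per peptide bond."""
    if not sequence:
        raise ValueError("sequence must contain at least one residue")
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    bonds = len(sequence) - 1  # n residues are joined by n - 1 peptide bonds
    return total - bonds * WATER_MASS


print(round(peptide_mass("GA"), 2))  # glycylalanine, prints 146.14
```

A chain of n residues loses n - 1 waters, so for long proteins the correction adds up to roughly 18 g/mol per residue.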

A Gentle Balance
    The chain is held in this bunched-up form by a combination of forces. Chief among them is the aversion of polar and apolar molecules to mixing, analogous to the separation of oil and water. This drives the oily (or hydrophobic) side chains to pack as far away from the polar water as possible, leaving the remaining polar and neutral groups exposed to water. The result bundles the protein into different shapes, from rods to sheets to amorphous globs, and these forms are often equipped with creases and niches, hidden away from the water, that bind specific molecules. The packing is very efficient in most proteins, and because so many atoms in the interior lie close to one another, attractive electrostatic interactions arise between atoms of differing electron density. An electron-dense atom is attracted to a less dense, more positively charged one and resists any attempt to break this interaction, which fixes the distances between the atoms. Stabilization of this kind is seen in van der Waals interactions, salt bridges, and hydrogen bonds. Individually these interactions are weak, far weaker than the covalent bonds of the chain itself; however, they are numerous, and in large numbers they contribute greatly to the stability of a conformation.
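How weak a single electrostatic contact is, and how much it depends on its surroundings, can be estimated with Coulomb's law screened by a dielectric constant. The Python sketch below uses the common chemistry form E = 332.06 · q1 · q2 / (ε · r), giving kcal/mol for charges in units of the electron charge and r in ångströms. The dielectric values (about 4 for a protein interior, about 80 for water) are rough approximations assumed for illustration; real environments are not uniform.

```python
COULOMB_CONSTANT = 332.06  # kcal·Å/(mol·e²): charges in e, distance in Å -> energy in kcal/mol


def coulomb_energy(q1: float, q2: float, distance_angstrom: float, dielectric: float) -> float:
    """Screened Coulomb energy between two point charges in kcal/mol; negative means attractive."""
    if distance_angstrom <= 0 or dielectric <= 0:
        raise ValueError("distance and dielectric must be positive")
    return COULOMB_CONSTANT * q1 * q2 / (dielectric * distance_angstrom)


# A salt bridge between +1 and -1 charges 4 Å apart, in two rough environments.
in_water = coulomb_energy(+1, -1, 4.0, dielectric=80.0)    # about -1.0 kcal/mol
in_interior = coulomb_energy(+1, -1, 4.0, dielectric=4.0)  # about -20.8 kcal/mol
print(in_water, in_interior)
```

Exposed to water, the pair contributes only about one kilocalorie per mole, compared with roughly 80 kcal/mol for a single carbon-carbon covalent bond. A buried pair looks much stronger here, but this simple formula ignores the cost of stripping water away from the charges, which offsets much of that gain.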

    Opposing these forces is the tendency of the protein's chain to wiggle around and change shape, which is measured by its conformational entropy. The observed shape of a protein is just one of an enormous number of shapes the chain of amino acids can rotate into, so holding it in that one shape carries a large entropic cost that must be paid for by favorable energy. The packing and electrostatic interactions mentioned above pay this cost, and in most proteins the stabilizing and destabilizing contributions nearly cancel. The small difference between them shows up in how stable a protein is against heat. At higher temperatures molecules move faster, and in a protein the main chain moves about more. At a particular temperature, the packing and electrostatic forces can no longer restrain the extra motion of the amino acid chain, and the chain unfolds.
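The near-balance described above is often summarized with a two-state model, in which each molecule is either folded (F) or unfolded (U), and the fraction folded follows from the free energy of unfolding ΔG(T) through the equilibrium constant K = [U]/[F] = exp(-ΔG/RT). The Python sketch below assumes a temperature-independent enthalpy of unfolding, so that ΔG(T) = ΔH_m(1 - T/T_m), and ignores the heat-capacity change real proteins show; the parameter values are hypothetical, chosen only for illustration.

```python
import math

R = 8.314  # gas constant, J/(mol·K)


def fraction_folded(temp_k: float, delta_h_m: float, t_m: float) -> float:
    """Two-state fraction of molecules folded at temp_k.

    delta_h_m: enthalpy of unfolding at the midpoint (J/mol), assumed constant.
    t_m: melting temperature (K), where half the molecules are unfolded.
    """
    delta_g = delta_h_m * (1.0 - temp_k / t_m)    # free energy of unfolding, G_U - G_F
    k_unfold = math.exp(-delta_g / (R * temp_k))  # equilibrium constant [U]/[F]
    return 1.0 / (1.0 + k_unfold)


# Hypothetical protein: ΔH_m = 400 kJ/mol, T_m = 330 K (about 57 °C).
for t in (300.0, 320.0, 330.0, 340.0):
    print(t, round(fraction_folded(t, 400e3, 330.0), 3))
```

With these assumed numbers the net stability at 300 K is only about 36 kJ/mol, and the population swings from nearly all folded to nearly all unfolded between 320 K and 340 K: because ΔG is a small difference between large opposing terms, it changes sign quickly near T_m.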

Levels of Understanding
    Given this balance of energies, a protein achieves its stability by wrapping its hydrophilic amino acids around the hydrophobic ones. This center, or core, can stabilize itself further by folding into regular patterns rather than random coils. These patterns provide extra stability because more of the polar groups, including those of the backbone, find a partner to pair with, so that few unpaired polar groups disrupt the hydrophobic core. This in turn leads to structures that are organized, strong, and stable enough to carry out particular functions, such as specific interactions with other molecules.

    Interactions and folding of this sort are organized into several levels. The most basic is the amino acid sequence itself, as if the chain were pulled out into a line; this is called the primary structure. At this level, many bonds that link distant side chains cannot be seen, since they form only when the chain twists to bring those side chains close to one another. The first level of this twisting is the secondary structure, in which the chain folds into conformations that pair up the nitrogen-hydrogen and carbon-oxygen groups of the backbone. Hydrogen bonds between these groups bunch the amino acids into structures called helices, strands, and turns. Certain amino acid sequences favor helices or strands because of the size and charge of their side chains. When these small units fold around one another, the overall arrangement is called the tertiary structure, and this packing can in turn reshape some of the secondary structure. The final level, called the quaternary structure, is the assembly of several separately folded chains into one complex. At this level, the component parts may have separate enzymatic or structural functions, or may combine to create a single function.
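In the α-helix, the best known of these patterns, the backbone C=O of residue i hydrogen bonds to the N-H of residue i + 4, and the helix rises about 1.5 Å per residue with about 3.6 residues per turn. The Python sketch below simply enumerates that pairing and geometry for an idealized helix; real helices fray at their ends and bend, which this ignores.

```python
RISE_PER_RESIDUE = 1.5   # Å along the axis, idealized α-helix
RESIDUES_PER_TURN = 3.6  # idealized α-helix


def helix_hbond_pairs(n_residues: int) -> list[tuple[int, int]]:
    """(C=O residue, N-H residue) hydrogen-bond pairs for an ideal α-helix, 0-based indices."""
    return [(i, i + 4) for i in range(n_residues - 4)]


def helix_length(n_residues: int) -> float:
    """Approximate axial length in Å of an ideal helix of n residues."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues: int) -> float:
    """Number of helical turns spanned by n residues."""
    return n_residues / RESIDUES_PER_TURN


print(helix_hbond_pairs(6))  # [(0, 4), (1, 5)]
```

By this idealization, an 18-residue helix makes 14 backbone hydrogen bonds and spans about 27 Å in five turns.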

Visions of the Microscopic
    Knowing how these levels of structure relate to one another is important when attempting to predict one level from another. Primary structure is now perhaps the easiest to ascertain, for it can be learned either by sequencing the protein's messenger RNA or by chemically breaking the protein down residue by residue and identifying each piece. To predict what secondary structure a chain might fold into, its sequence is compared with the sequences of proteins whose structures have already been solved; where similar stretches of residues exist, their folding can be predicted. For example, a sequence with hydrophobic side chains every three or four residues may be interpreted as a helix, or a set of helices, presenting a hydrophobic face that packs against a complementary hydrophobic area. Such predictions are not certain, however, for some sequences fold into either sheets or helices depending on their tertiary environment.
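The "every three or four residues" rule follows from helix geometry: consecutive side chains point about 100° apart around the helix axis, so residues spaced three or four apart land on the same face. One common way to quantify this is the hydrophobic moment, the length of the vector sum of residue hydrophobicities placed at 100° intervals around a helical wheel. The Python sketch below uses a crude binary scale (1 for an assumed set of apolar residues, 0 otherwise) as a stand-in for a published hydrophobicity scale, and the example peptide is hypothetical.

```python
import cmath
import math

# Crude binary hydrophobicity: an assumption standing in for a published scale.
APOLAR = set("AILMFVWC")


def hydrophobicity(residue: str) -> float:
    return 1.0 if residue in APOLAR else 0.0


def hydrophobic_moment(sequence: str, degrees_per_residue: float = 100.0) -> float:
    """Magnitude of the vector sum of hydrophobicities placed around a helical wheel."""
    step = math.radians(degrees_per_residue)
    total = sum(hydrophobicity(res) * cmath.exp(1j * step * k) for k, res in enumerate(sequence))
    return abs(total)


amphipathic = "LKKLLKKLLKKLLKKLLK"  # hypothetical peptide, apolar residues every 3-4 positions
uniform = "L" * 18                  # apolar everywhere: the vectors cancel around the wheel
print(hydrophobic_moment(amphipathic), hydrophobic_moment(uniform))
```

The amphipathic peptide gathers its leucines on one face and scores several units, while the uniformly apolar one scores essentially zero: being hydrophobic is not the signal, being hydrophobic on one side is.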

    Extending this approach, proteins can be compared by their entire sequences. This comparison, based on protein homology, can be used to approximate an unknown tertiary structure. The idea is that two copies of a protein that once arose by gene duplication gradually diverge from each other as mutations accumulate. In general, the hydrophobic core of a protein tends to be conserved, because most mutations there make the protein less stable by disrupting bonds and packing and exposing the core to water. On the exterior, the match between two proteins' hydrophilic and neutral amino acids is often poor, because many amino acids have nearly the same affinity for water, so mutations there can drift without consequence. With this in mind, the sequence of a protein of unknown tertiary structure can be searched against a database, as in secondary structure prediction, and scanned for similarities. Related proteins need only about thirty percent of their sequence in common to have similar tertiary structures, so if a match is found, a rough description of the protein can be extrapolated.
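The "thirty percent" figure is usually expressed as sequence identity: the share of aligned positions holding the same amino acid. Producing the alignment is the hard part and is done by dedicated tools; the Python sketch below assumes two sequences that are already aligned, with "-" marking gaps, and counts identity only over columns where neither sequence has a gap, which is one of several conventions in use. The fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, not real proteins.
print(percent_identity("MKT-LLVAG", "MRTALLIAG"))  # 6 matches in 8 gap-free columns: 75.0
```

Because the denominator convention changes the answer, identity values from different tools are only roughly comparable.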

    However, a new amino acid sequence often does not match the limited set of solved structures, and other techniques must be used to determine its structure. One such method is x-ray crystallography, in which x-rays are diffracted by an ordered lattice of protein molecules. The method is analogous to a light microscope, with one crucial difference: a microscope's lens recombines the light scattered by a specimen into an image, but no lens can focus x-rays this way. Instead, the pattern produced by the interference and reinforcement of x-rays scattered off a protein crystal is recorded, and with considerable effort it is recombined, by computer and by human interpretation, into an electron density map. Many complications stand in the way. Proteins are very difficult to crystallize because of their aqueous nature and irregular shape, and each protein requires its own particular concentrations of buffer and ions and its own pH to crystallize. Once crystals are obtained, their packing might not be regular enough to give the high resolution needed for a sharp map. And part of the information carried by the diffracted x-rays, their phases, cannot be detected at all and must be inferred indirectly, a task made harder by the several thousand atoms present. Overcoming these limitations is possible, but they have restricted researchers to solving fewer than one structure for every twenty known protein sequences.
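Why the undetectable part of the diffraction data matters can be shown with a one-dimensional toy model. A crystal's electron density repeats from cell to cell, so it can be written as a Fourier series whose complex coefficients, the structure factors, each carry an amplitude and a phase; the detector records only intensities, the squared amplitudes. The Python sketch below, using a hand-written discrete Fourier transform and three hypothetical Gaussian "atoms", rebuilds the density once with the true phases and once with every phase set to zero.

```python
import cmath
import math

N = 64  # grid points across one unit cell


def toy_density(positions, width=0.02):
    """Sum of Gaussian 'atoms' at fractional positions in a periodic 1D cell."""
    rho = []
    for x in range(N):
        frac = x / N
        value = 0.0
        for p in positions:
            d = (frac - p + 0.5) % 1.0 - 0.5  # shortest periodic distance to the atom
            value += math.exp(-(d / width) ** 2)
        rho.append(value)
    return rho


def structure_factors(rho):
    """Discrete Fourier transform: each complex F_h carries an amplitude and a phase."""
    return [sum(rho[x] * cmath.exp(-2j * math.pi * h * x / N) for x in range(N)) for h in range(N)]


def fourier_synthesis(factors):
    """Inverse transform back to a real-valued density map."""
    return [(sum(factors[h] * cmath.exp(2j * math.pi * h * x / N) for h in range(N)) / N).real
            for x in range(N)]


rho = toy_density([0.20, 0.35, 0.70])
F = structure_factors(rho)
amplitudes = [abs(f) for f in F]                 # what measured intensities |F_h|^2 give directly
with_phases = fourier_synthesis(F)               # reproduces the original map
without_phases = fourier_synthesis(amplitudes)   # all phases zero: a symmetric, misleading map
```

With the true phases the map is recovered exactly; with the phases discarded, the same amplitudes produce a symmetric map with its largest peak at the origin, where there is no atom at all. In practice crystallographers recover phases experimentally, for example from heavy-atom derivatives, or borrow them from a related known structure.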

    A second tool for determining a protein's structure uses magnetic fields to reveal each atom's neighbors. With this technique, called nuclear magnetic resonance (NMR), a picture of the environments of the hydrogen atoms inside the molecule can be obtained from just milligrams of very pure protein. The protein is placed in a strong magnetic field and exposed to radio-frequency pulses, and each hydrogen nucleus resonates at a particular frequency that depends on its environment. From these resonances, neighboring atoms can be identified, both along the sequence and in space, and a three-dimensional model can be built. However, the signals increasingly blur together as the number of residues grows, so NMR tends to be most useful for smaller proteins.
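Identifying neighbors in space commonly relies on the nuclear Overhauser effect (NOE), whose intensity falls off roughly as the inverse sixth power of the distance between two protons, so it is detectable only out to about 5 Å. The Python sketch below converts NOE intensities into approximate distances by calibrating against a reference pair of known separation; the reference values are hypothetical, and in practice such estimates are used as loose distance bounds rather than exact values.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate a proton-proton distance in Å from I ∝ r^-6, calibrated on a reference pair."""
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical calibration: a reference pair 2.5 Å apart gives intensity 100.
for peak in (100.0, 10.0, 1.0):
    print(peak, round(noe_distance(peak, ref_intensity=100.0, ref_distance=2.5), 2))
```

The sixth-power dependence cuts both ways: a hundredfold drop in intensity only about doubles the estimated distance, which makes the estimates forgiving of intensity errors but means distant pairs quickly vanish into the noise.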

Improvements on Nature
    These obstacles to determining a protein's structure are the limiting factor in modifying and designing new proteins. But despite the considerable effort required, many structures are being solved, and with each newly identified folding pattern, the information can be applied elsewhere and the next search made easier.

    From this knowledge of how a protein interacts with itself come protein design and synthesis. Proteins may be modified or created by several techniques. In a process called site-directed mutagenesis, a gene coding for the protein of interest is obtained, either by synthesizing DNA that codes for its amino acids or by isolating the native DNA that encodes the protein. The gene is then inserted into a plasmid, a circular piece of DNA that replicates autonomously. The nucleotide sequence encoding the protein can now be modified to make virtually any variant of the original protein: the cloned gene is paired with a short strand of complementary DNA carrying the desired change, which works as long as the DNA surrounding the change still matches the gene closely enough to hybridize to it. Various techniques then eliminate the original strand, so that the plasmids carry only the mutant gene, which is over-expressed to produce the mutant protein.
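The short strand of complementary DNA is a mutagenic primer, and its core logic fits in a few lines. The Python sketch below assumes a short, hypothetical coding sequence, swaps one codon, and returns the mutated stretch with a fixed number of matching bases on each side; real primer design also weighs length, melting temperature, and GC content, which this ignores. A translation check uses the standard genetic code.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def mutagenic_primer(cds: str, codon_index: int, new_codon: str, flank: int = 12) -> tuple[str, str]:
    """Return (primer, mutant_cds) replacing the codon at 0-based codon_index."""
    start = 3 * codon_index
    if new_codon not in CODON_TABLE or start + 3 > len(cds):
        raise ValueError("invalid codon or codon index")
    mutant = cds[:start] + new_codon + cds[start + 3:]
    primer = mutant[max(0, start - flank): start + 3 + flank]  # change plus matching flanks
    return primer, mutant


cds = "ATGGCTAAAGGTCTGGAATAA"  # hypothetical gene: M A K G L E stop
primer, mutant = mutagenic_primer(cds, 2, "GAA", flank=6)  # lysine (AAA) -> glutamate (GAA)
print(primer, translate(cds), translate(mutant))  # ATGGCTGAAGGTCTG MAKGLE* MAEGLE*
```

The matching flanks are what let the primer hybridize despite the mismatch in the middle; a single-base change inside a long primer anneals almost as well as a perfect match.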

    A second method assembles a protein artificially in a cell-free environment. By joining amino acids one at a time in a laboratory protein synthesis machine, small chains of up to about twenty residues can be made. By linking these chains with the help of specialized enzymes, researchers can produce whatever sequence they require. As in site-directed mutagenesis, modifications may be placed anywhere, but because this approach builds the chain piece by piece, almost anything resembling an amino acid may be substituted. This feature can be used to insert residues not found in nature, to probe how a protein copes with such odd changes or, once these effects are better understood, to use unnatural building blocks that outperform their natural counterparts.
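The length limit of piece-wise synthesis follows from simple compounding: if each coupling step succeeds with probability p, a chain of n residues built by n - 1 couplings comes out full length with probability p^(n-1), and the failures accumulate as shorter, hard-to-separate byproducts. The per-step efficiencies in the Python sketch below are illustrative assumptions, not measured values for any particular chemistry.

```python
def full_length_yield(per_step: float, n_residues: int) -> float:
    """Fraction of chains that are full length after n_residues - 1 couplings."""
    if not 0.0 <= per_step <= 1.0 or n_residues < 1:
        raise ValueError("per_step must be in [0, 1] and n_residues >= 1")
    return per_step ** (n_residues - 1)


for p in (0.98, 0.99):
    for n in (20, 50, 100):
        print(p, n, round(full_length_yield(p, n), 3))
```

Even at an assumed 99 percent per step, about four in five 20-residue chains are complete but only a little over a third of 100-residue chains are, which is why longer proteins are assembled by linking shorter synthetic pieces.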

Figure: An example of nature's machinery at work, a RecA protein catalyzing recombination in a triple helix of DNA. Courtesy of Zhurkin, Raghunathan, Ulyanov, Camerini-Otero, Jernigan (NIH).

Dawn's Next Color
    The branch of science that arises from this research is true protein engineering: given the desired shape or function of a protein, choosing a string of amino acids that will fold into that form. Although this goal is far from accomplished, great steps are being taken. Catalytic behavior is being changed by substituting one amino acid for another in a protein's active site. Side chains are altered, for instance by swapping a nitrogen atom for a carbon, to observe the resulting structural change. Even in these apparently single-variable experiments, hosts of unexpected interactions arise. As mentioned above, the forces that fold or disrupt a protein are closely balanced and, because of the large number of atoms involved, are difficult to calculate exactly. Calculating them requires many approximations, such as estimating the entropy from the huge number of rotations into which a chain may fold. Water and ions may also play unexpected roles in folding by carrying charges and repelling side chains. Small approximations and oversights like these can make a computed stability value inaccurate. In reality, a changed residue might physically push against a neighboring side chain and cause it to rotate in an entirely different direction to minimize its energy, taking the rest of the secondary structure with it. The result often degrades or destroys the structure of the protein and, consequently, its function.
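The scale of the entropy approximation mentioned above can be seen with a deliberately crude counting model: if each residue could adopt some number of backbone states independently, a chain of n residues has states^n conformations, and confining it to a single fold costs a conformational entropy of about n·R·ln(states). The Python sketch below assumes three states per residue, a common textbook simplification; real estimates differ from residue to residue and must also account for side chains and water.

```python
import math

R = 8.314  # gas constant, J/(mol·K)


def log10_conformations(n_residues: int, states_per_residue: float = 3.0) -> float:
    """log10 of the number of chain conformations in an independent-states model."""
    return n_residues * math.log10(states_per_residue)


def entropy_cost_kj(n_residues: int, temp_k: float = 298.0, states_per_residue: float = 3.0) -> float:
    """T·ΔS in kJ/mol lost by fixing every residue into one state: n·R·T·ln(states)."""
    return n_residues * R * temp_k * math.log(states_per_residue) / 1000.0


print(log10_conformations(100), entropy_cost_kj(100))
```

Under these assumptions a 100-residue chain has on the order of 10^47 conformations, and fixing it costs roughly 270 kJ/mol, many times the few tens of kJ/mol by which a typical folded protein is stable. A modest percentage error in an estimate this large can therefore exceed the very quantity being predicted.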

    Progress in smaller systems, however, is being achieved. Small proteins with new sequences are synthesized to act as off switches for genes. Helices are designed that combine to span the cell membrane, to probe how a cell regulates the passage of ions. Entire folded units from different proteins are joined at the DNA level into single proteins in which common enzymatic functions gain novel regulatory mechanisms.

    When a protein is modified using the techniques described above, its functions become open to analysis and can potentially be both changed and improved. Even allowing for the difficulty of predicting the effects of altering an amino acid sequence, the range of possible improvements to existing proteins is enormous. The ability to either weaken or reinforce bonds and electrostatic forces opens up a vast number of possibilities, including adjusting how durable a protein is in its environment. A therapeutic drug's lifespan could be regulated, an industrial enzyme strengthened for adverse conditions, or photosynthesis in plants made more efficient by improving carbon dioxide uptake.

    Of all the sciences striving to create the first nanomachine, protein engineering may be closest to that goal, for these machines already exist in nature in a magnificent diversity of form and function. In understanding this vast array, however, the biologist's dilemma is one of discovering the shape of the puzzle pieces from the completed picture. While other methods attempt to create nanomachines atom by atom, biologists must wade through several billion years of redundancy, obsolescence, and innovation to discover the keys to their construction. But with increasing computer power for modeling and growing knowledge of how proteins fold, that goal might not be far off at all.

About the Author... The artist formerly known as Niles Donegan is a senior studying biochemistry in the College of Arts & Sciences, happy to be surrounded by zinging protons of ambient knowledge here at Cornell.