Overview

The field of structural genomics aims to determine the three-dimensional structure of every protein encoded by a genome, using both experimental and modeling methods. It depends on high-throughput approaches to structure determination, and recent technological advances have made these methods more efficient than ever before. The field has grown rapidly over the past decade. This growth comes from advances in technology and software that allow large numbers of structures to be determined, stored in databases, and shared publicly.

Protein structure is closely related to protein function, so understanding structure has great potential to advance drug development and medical technology. Computer-generated models of protein structures can also reveal novel folds, which can inform the development of new pharmaceuticals.

Structural genomics is closely related to structural bioinformatics, which includes predicting the function of a protein from its 3D structure. There is currently a large gap between the number of known protein sequences and the number of solved protein structures. As more data accumulates, however, the likelihood that a new sequence has a homologue of known structure increases.

Methods

Structural genomics relies on several methods to determine protein structure, and they are typically divided into two categories. Experimental methods determine a new structure directly from the expressed protein. Modeling methods predict structure computationally, most often by using homology to proteins whose structures have already been determined.

Experimental


X-ray crystallography: a narrow, parallel beam of X-rays is directed at a crystallized protein sample. The X-rays scattered by the crystal produce spots, and the collection of these spots forms a diffraction pattern. Together with the protein sequence, this pattern can be used to build a molecular model.
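The positions of the diffraction spots are governed by Bragg's law, nλ = 2d sin θ. This law relates the X-ray wavelength λ and the scattering angle θ to the spacing d between lattice planes in the crystal. The short Python sketch below simply evaluates that relation. The function name and the example values are illustrative assumptions. Building a real model also requires the spot intensities and phases, which this sketch does not handle.

```python
import math


def bragg_d_spacing(wavelength_angstrom: float, two_theta_deg: float, order: int = 1) -> float:
    """Return the lattice-plane spacing d (in angstroms) from Bragg's law, n*lambda = 2*d*sin(theta).

    two_theta_deg is the angle 2*theta between the incident and diffracted beams.
    Illustrative helper only (hypothetical name), not part of any crystallography package.
    """
    theta = math.radians(two_theta_deg / 2.0)
    if not 0.0 < theta < math.pi / 2:
        raise ValueError("2*theta must lie strictly between 0 and 180 degrees")
    # Rearranged Bragg's law: d = n * lambda / (2 * sin(theta))
    return order * wavelength_angstrom / (2.0 * math.sin(theta))
```

Spots recorded at larger angles correspond to smaller plane spacings. For this reason, diffraction data that extend to higher angles support a higher-resolution model.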

The experimental approach to determining 3D structure is referred to as de novo ("from the beginning"). It involves cloning every open reading frame (ORF) in the genome sequence and expressing each one as a protein. The proteins are then purified, and their structures are determined by one of two methods. X-ray crystallography requires the protein to be crystallized, while nuclear magnetic resonance (NMR) spectroscopy is carried out on proteins in solution. Because this whole-genome approach aims to cover every protein in the genome, it takes a great deal of time and resources. As a result, it is carried out in only a few labs.
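The first step of this pipeline, locating every ORF, can be illustrated computationally. The Python sketch below scans all six reading frames (three on each strand) for stretches that begin with an ATG start codon and end at the next in-frame stop codon (TAA, TAG, or TGA). It is a simplified illustration under stated assumptions. It uses the standard genetic code and ignores alternative start codons and introns. The `min_codons` cutoff of 100 is an arbitrary placeholder, and real genome annotation uses dedicated gene-prediction tools.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[str, int, str]]:
    """Return (strand, start, orf) for ATG...stop ORFs in all six reading frames.

    start is a 0-based index on the strand the ORF was found on, and orf includes
    the stop codon. min_codons (hypothetical default) counts codons excluding the stop.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None  # index of the first ATG seen since the last stop codon
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = seq[start:i + 3]
                    if len(orf) // 3 - 1 >= min_codons:
                        orfs.append((strand, start, orf))
                    start = None  # look for a new start codon after this stop
    return orfs
```

Each reported sequence would be a candidate for cloning and expression in the experimental pipeline described above.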

Modeling

Modeling methods predict structure computationally rather than measuring it. Two main modeling methods are used in the field. The first relies on similarity between the sequence of an unknown protein and the sequence of a protein whose structure has already been solved and deposited in a database. The second predicts structure when no such homologue is available.

Sequence-based modeling compares the sequence of the unknown protein with that of a protein of known structure. Where the two are similar, the solved structure is used as a template for the new one. The accuracy of the resulting model depends largely on how similar the two amino acid sequences are. However, the overall accuracy of this approach is limited because many sequences have no homologue of known structure. The quality of this and other structure prediction methods is assessed by the Critical Assessment of protein Structure Prediction (CASP).
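Because the usefulness of a template depends on how similar the two amino acid sequences are, the first step in sequence-based modeling is to align them and measure their similarity. The Python sketch below uses the textbook Needleman-Wunsch global alignment algorithm with a deliberately simple scoring scheme: match +1, mismatch -1, gap -2. These scores are illustrative assumptions. Practical tools use substitution matrices such as BLOSUM62 and affine gap penalties. Percent identity is computed here as identical columns divided by alignment length, which is only one of several conventions in use.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty; returns one optimal alignment."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover the aligned strings.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap aligned positions as a percentage of all alignment columns."""
    if not aligned_a:
        return 0.0
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)
```

For example, aligning `"MKTAYIAK"` with `"MKTAYAK"` places a single gap opposite the `I`. Under this convention, the pair is 87.5% identical.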

The other main modeling method is ab initio prediction, which is used when the sequence has no known homologues. It uses the physical and chemical interactions between the encoded amino acids to predict structure. A leading example is Rosetta@home, a distributed computing project that uses volunteers' computers to predict the three-dimensional structures of proteins from their amino acid sequences.
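Real ab initio methods such as Rosetta use detailed energy functions and sophisticated search strategies. The underlying idea, searching over possible conformations for the one with the most favorable energy, can be shown with a much simpler stand-in. The Python sketch below uses the two-dimensional hydrophobic-polar (HP) lattice model, a classic toy model that is not what Rosetta@home uses. In this model, each residue is classed as hydrophobic (H) or polar (P), and the chain is placed on a square grid. Each pair of H residues that touch on the grid without being neighbours in the chain contributes -1 to the energy. The function names are hypothetical. Exhaustive enumeration like this is only feasible for very short chains.

```python
from itertools import product

# Unit steps on a 2D square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def fold_coordinates(moves):
    """Turn a sequence of lattice moves into grid coordinates, or None if the chain overlaps itself."""
    coords = [(0, 0)]
    occupied = {(0, 0)}
    for step in moves:
        dx, dy = MOVES[step]
        x, y = coords[-1]
        nxt = (x + dx, y + dy)
        if nxt in occupied:
            return None
        coords.append(nxt)
        occupied.add(nxt)
    return coords


def hp_energy(sequence, coords):
    """Score -1 for each pair of H residues that are grid neighbours but not chain neighbours."""
    index_at = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # check right and up only, so each contact is counted once
            j = index_at.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


def best_fold(sequence):
    """Try every self-avoiding fold (4**(n-1) move strings) and return (lowest energy, moves)."""
    best_energy, best_moves = 0, None
    for moves in product("UDLR", repeat=len(sequence) - 1):
        coords = fold_coordinates(moves)
        if coords is None:
            continue
        energy = hp_energy(sequence, coords)
        if best_moves is None or energy < best_energy:
            best_energy, best_moves = energy, "".join(moves)
    return best_energy, best_moves
```

Even this toy version shows why ab initio prediction is computationally demanding: the number of candidate folds grows exponentially with chain length.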

Goals

Most of the information generated in the field is deposited in open databases, where it is immediately available to the research community. One difficulty is that structural genomics produces a very large number of structures and associated data, much of which has not been independently verified. However, as these databases grow and computing power increases, the chance that a newly sequenced protein has a homologue in the database also increases. Finding homologous proteins is key to understanding how a gene sequence relates to the structure and function of the protein it encodes.
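Searching a database for homologues of a new sequence is usually done with heuristic tools such as BLAST, which first look for short word matches before extending them into alignments. The Python sketch below shows only that seeding idea. It ranks the entries of a small in-memory database by the number of length-k words (k-mers) they share with the query. The database contents, the entry identifiers, and the choice of k = 3 are hypothetical, and this is not how BLAST actually scores hits.

```python
from collections import Counter


def kmers(sequence: str, k: int = 3) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def rank_by_shared_kmers(query: str, database: dict[str, str], k: int = 3) -> list[tuple[str, int]]:
    """Return (entry_id, shared k-mer count) pairs, highest count first."""
    query_kmers = kmers(query, k)
    scores = []
    for entry_id, seq in database.items():
        # Counter & Counter keeps the minimum count of each shared k-mer.
        shared = sum((query_kmers & kmers(seq, k)).values())
        scores.append((entry_id, shared))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


# Hypothetical toy database; real searches run against repositories such as the PDB.
toy_db = {"protA": "MKTAYIAKQR", "protB": "GGSGGSGGS", "protC": "MKTAYIAK"}
```

Entries with many shared words are the most promising candidates for a full alignment such as the one used in sequence-based modeling.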
