PSI Structural Biology Knowledgebase

Predicting Protein Crystal Candidates

SBKB [doi:10.1038/sbkb.2014.225]
Technical Highlight - October 2014
Short description: An improved algorithm predicts whether a protein will be amenable to crystallization for structural studies.

Bioinformatic analyses can recognize biophysical features favoring well diffracting crystals (blue bars) and those leading to poorly diffracting ones (gray bars). Figure courtesy of Adam Godzik.

Inordinate amounts of time and money are spent on failed attempts to produce diffraction-quality protein crystals. Software such as the XtalPred server helps avoid wasted effort by predicting the likelihood of successful crystal formation based on a protein's physicochemical properties. Homologs can often be selected or variants engineered with better chances of forming high-quality crystals. XtalPred uses an “expert pooling” method to combine probabilities of success based on seven protein properties.
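To make the idea of pooling per-property probabilities concrete, here is a minimal sketch of one common pooling rule, a logarithmic opinion pool. That XtalPred's "expert pooling" takes exactly this form is an assumption, as are the function name `log_opinion_pool`, the equal default weights, and the seven example probabilities, none of which come from the article.

```python
import math


def log_opinion_pool(probs, weights=None):
    """Combine per-property success probabilities into one score.

    Assumption: a weighted logarithmic opinion pool (normalized weighted
    geometric mean). The article does not spell out the pooling rule.
    """
    if weights is None:
        # Hypothetical default: every property counts equally.
        weights = [1.0 / len(probs)] * len(probs)
    eps = 1e-12  # guards against log(0)
    log_yes = sum(w * math.log(max(p, eps)) for p, w in zip(probs, weights))
    log_no = sum(w * math.log(max(1.0 - p, eps)) for p, w in zip(probs, weights))
    yes, no = math.exp(log_yes), math.exp(log_no)
    # Renormalize so "success" and "failure" sum to one.
    return yes / (yes + no)


# Made-up probabilities for seven properties of one hypothetical protein.
example_probs = [0.7, 0.6, 0.8, 0.4, 0.55, 0.9, 0.65]
pooled = log_opinion_pool(example_probs)
```

A property that strongly predicts failure pulls the pooled score down more sharply under this rule than under a simple average, which is one reason such pools are used to combine independent "experts".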

Godzik and colleagues (PSI JCSG and Sanford-Burnham Medical Research Institute) have now tested a number of machine-learning methods and updated XtalPred with a random forest classifier that yields superior performance. A random forest is an ensemble method that builds a large number of decision trees and combines their votes into a single prediction; each tree is grown from a bootstrap sample of the training data using random subsets of the variables. When trained on the same features, XtalPred-RF (XtalPred with random forest) doubles the performance of its predecessor, as measured by the Matthews correlation coefficient.
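The two ingredients named above, bootstrap samples with random variable subsets and the Matthews correlation coefficient, can be sketched in a few lines. This is a toy illustration, not the XtalPred-RF implementation: each "tree" is reduced to a single split (a decision stump), and all function names are hypothetical. A real random forest grows deeper trees and draws a fresh variable subset at every split.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def fit_stump(X, y, features):
    """Find the single-threshold rule with the fewest training errors."""
    best = None  # (errors, feature, threshold, label_above)
    for f in features:
        for row in X:
            thr = row[f]
            for above in (0, 1):
                errors = sum(
                    1 for xi, yi in zip(X, y)
                    if (above if xi[f] > thr else 1 - above) != yi
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, above)
    return best[1:]


def predict_stump(stump, row):
    f, thr, above = stump
    return above if row[f] > thr else 1 - above


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Train stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # sample rows with replacement
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0
```

Scoring the forest's predictions with `mcc` gives a value between -1 and 1, where 0 corresponds to chance-level agreement; this makes it a more informative measure than accuracy when successful and failed targets are unevenly represented.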

XtalPred-RF also exploits a much larger training data set from the PSI TargetTrack database and incorporates additional surface features, including hydrophobicity, surface “ruggedness,” side-chain entropy and amino acid composition of the protein surface. The authors define ruggedness as the ratio of surface area (the sum of solvent accessibilities of individual residues) to the total accessible area estimated for a protein of a given mass.
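The ruggedness ratio defined above can be written out directly. In this sketch the function names are hypothetical, and the power-law form used to estimate the expected area from mass is an assumption; the article does not say how the authors estimate it, so the coefficient and exponent are left as caller-supplied parameters.

```python
def power_law_area(mass_da, coeff, exponent):
    """Expected accessible area for a protein of a given mass.

    Assumption: a power law, coeff * mass ** exponent. The article does
    not state the estimator, so both parameters must be supplied.
    """
    return coeff * mass_da ** exponent


def ruggedness(residue_accessibilities, expected_area):
    """Ratio of observed surface area to the area expected for the mass.

    residue_accessibilities: per-residue solvent accessibilities, in the
    same area units as expected_area.
    """
    observed = sum(residue_accessibilities)
    return observed / expected_area
```

Under this definition, a value above 1 means the protein exposes more surface than a typical protein of the same mass, and a value below 1 means it is more compact.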

To convert binary classifications into a ranking system, XtalPred-RF includes a number of independent classifiers, each trained on a data set with a different proportion of successful and failed attempts at structure determination (the failed proteins are undersampled to differing extents to balance the data).
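One way to picture this scheme is a set of classifiers of differing stringency whose positive votes are counted. The sketch below is an illustration under stated assumptions: a one-dimensional score with a simple threshold stands in for each random forest, and ranking by vote count is an assumed rule, since the article does not say how the classifier outputs are combined. All names are hypothetical.

```python
import random


def fit_cut(pos, neg):
    """Pick the threshold with the fewest training errors.

    Stand-in classifier (assumption): predict success when score >= cut.
    """
    candidates = sorted(set(pos + neg))

    def errors(cut):
        return sum(1 for s in pos if s < cut) + sum(1 for s in neg if s >= cut)

    return min(candidates, key=errors)


def train_ensemble(pos, neg, ratios, seed=0):
    """Train one classifier per failures-to-successes ratio."""
    rng = random.Random(seed)
    cuts = []
    for ratio in ratios:
        # Undersample the failed attempts to the requested proportion.
        n_neg = min(len(neg), max(1, round(ratio * len(pos))))
        cuts.append(fit_cut(pos, rng.sample(neg, n_neg)))
    return cuts


def rank_class(score, cuts):
    """Number of classifiers that vote 'success' for this score."""
    return sum(1 for cut in cuts if score >= cut)
```

With, say, `ratios=[1, 2, 4]`, a target can land in any class from 0 to 3, so the binary votes of the individual classifiers become an ordinal ranking.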

The authors demonstrate XtalPred-RF on target selection from 271 Pfam families studied by the PSI JCSG. They estimate that, using the new software, 30% fewer structures would have been attempted, without affecting the number of families represented by solved structures. Thus, this new software promises to help identify individual targets, as well as facilitate high-throughput structural biology.

Tal Nawy

References

  1. S. Jahandideh, L. Jaroszewski & A. Godzik. Improving the chances of successful protein structure determination with a random forest classifier.
    Acta Crystallogr D Biol Crystallogr. 70, 627-35 (2014). doi:10.1107/S1399004713032070

Structural Biology Knowledgebase ISSN: 1758-1338
Funded by a grant from the National Institute of General Medical Sciences of the National Institutes of Health