PSI Structural Biology Knowledgebase




Assigning protein function: GeMMA

PSI-SGKB [doi:10.1038/th_psisgkb.2010.14]
Technical Highlight - April 2010
Short description: A new approach to automated functional subfamily classification works on huge superfamilies and is well suited to structural genomics.

Grouping proteins by function is not easy. The main obstacle is the lack of good experimental data for many proteins, but missing traceable author statements and errors from function-prediction software also contribute. The new GeMMA (Genome Modelling and Model Annotation) protocol, developed by Christine Orengo in partnership with PSI MCSG, can nevertheless classify very large and diverse superfamilies into functional subfamilies.

Although computational prediction of function has improved greatly over the past few years, most approaches still rely on sequence homology, and it is not clear what level of similarity is needed. As a result, predicted annotations are often not very specific, and the error rate for the annotation of complete genomes is hard to determine: some workers estimate that it is greater than 40%, others that it is less than 5%.

There are three ways to predict protein function: phylogenomics, pattern recognition and clustering. Phylogenomics relies on evolutionary relationships within a family of proteins and so compares whole protein sequences. Pattern recognition classifies proteins using locally conserved sequence patterns; an example of this approach is Pfam, a comprehensive collection of protein families that is used extensively to guide target selection in structural genomics. Clustering groups together sequences on the basis of their similarity and displays them as a hierarchical tree.
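The clustering idea can be illustrated with a minimal Python sketch. It is a toy, not GeMMA or any published method: the similarity measure (fraction of identical positions between equal-length strings), the average-linkage rule, the 0.6 threshold, and the example sequences are all assumptions chosen for illustration. Real tools compare sequences with far more sensitive methods.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions (toy measure; assumes equal length)."""
    if len(a) != len(b):
        raise ValueError("toy measure expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def average_similarity(c1, c2) -> float:
    """Mean pairwise identity between two clusters (average linkage)."""
    return sum(identity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def cluster(sequences, threshold):
    """Repeatedly merge the two most similar clusters.

    Stops when the best remaining pair falls below `threshold`.
    Returns the final clusters and the merge history; read in order,
    the history describes the hierarchical tree from the leaves up.
    """
    clusters = [(s,) for s in sequences]
    merges = []
    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (i, j), best = max(
            (((i, j), average_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merges.append((clusters[i], clusters[j], best))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sequences: three similar to each other, two unrelated to them.
toy = ["ACDEFG", "ACDEFA", "ACDQFA", "WYWYWY", "WYWYWA"]
clusters, merges = cluster(toy, threshold=0.6)
for members in clusters:
    print(sorted(members))
```

With these toy inputs the procedure stops at two clusters, because the similarity between the two remaining groups is well below the threshold. Lowering the threshold would merge them into one group; raising it would leave more, smaller groups.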

GeMMA combines two of these methods: pattern recognition and clustering. It is not the first hybrid method, as SCI-PHY (Subfamily Classification In PHYlogenomics) is also a hybrid, but it is the first that does not require an initial multiple alignment of all sequences. The upshot is that much larger and more diverse superfamilies can be handled than before. In addition, GeMMA can be 'trained' on annotated protein families to establish similarity thresholds that can then be applied to poorly annotated families.

When GeMMA was compared with SCI-PHY, Orengo's team found that SCI-PHY was optimized for high specificity at the expense of sensitivity. GeMMA, by contrast, achieves a balance between sensitivity and specificity. In future, it might well be that SCI-PHY and GeMMA are routinely used together, combining GeMMA's ability to handle large data sets and SCI-PHY's high specificity. A high-throughput version of GeMMA has also been developed.
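To make the trade-off between sensitivity and specificity concrete, here is a small Python sketch that scores a subfamily classification by counting pairs of sequences. The pair-counting definition is an assumption made for this illustration, and the article does not say how Orengo's team measured either quantity. The function labels and cluster assignments are hypothetical.

```python
from itertools import combinations


def pair_counts(true_labels, predicted_labels):
    """Count sequence pairs as TP, FP, TN or FN.

    A pair is 'positive' when both sequences are placed in the same
    predicted subfamily, and 'true' when that agrees with whether they
    share the same known function.
    """
    tp = fp = tn = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = predicted_labels[i] == predicted_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_true:
            fn += 1  # same function, but split apart
        elif same_pred:
            fp += 1  # different functions, but merged
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of same-function pairs that were grouped together."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    """Share of different-function pairs that were kept apart."""
    return tn / (tn + fp) if tn + fp else 0.0


# Hypothetical known functions for six sequences.
known = ["kinase", "kinase", "kinase", "lyase", "lyase", "lyase"]
# A cautious classifier that splits families into small groups.
over_split = [0, 0, 1, 2, 2, 3]
# A more permissive classifier that merges across functions.
over_merged = [0, 0, 0, 0, 1, 1]

for name, predicted in [("over_split", over_split), ("over_merged", over_merged)]:
    tp, fp, tn, fn = pair_counts(known, predicted)
    print(name, round(sensitivity(tp, fn), 2), round(specificity(tn, fp), 2))
```

In this toy case the cautious classifier never merges sequences with different functions, so its specificity is perfect, but it separates most same-function pairs and so has low sensitivity. The permissive classifier recovers more same-function pairs at the cost of some incorrect merges.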

Related articles

Deducing function from small structural clues

Spot the pore

How does Dali work?

Maria Hodges


  1. David A. Lee, Robert Rentzsch & Christine Orengo. GeMMA: functional subfamily classification within superfamilies of predicted protein structure domains.
    Nucleic Acids Res. 38, 720-737 (2009). doi:10.1093/nar/gkp1049

Structural Biology Knowledgebase ISSN: 1758-1338
Funded by a grant from the National Institute of General Medical Sciences of the National Institutes of Health