Obtaining diffraction-quality crystals can be frustrating. For the PSI, with its focus on solving structural representatives of all major protein families, a lack of usable crystals is a significant stumbling block.
Faced with such a huge task, structural genomics centers now routinely use a 'genome pool' strategy, whereby multiple members of a protein family are simultaneously purified and crystallized right from the start. Jaroszewski et al. from the PSI JCSG show that this approach works well and propose that only if it fails should new versions of the protein be designed and produced.
The authors examined crystallization information recorded in the PSI database TargetDB and produced a two-dimensional distribution of crystallization success rates plotted against sequence identity to the closest crystallized homolog and against sequence identity to the closest non-crystallized homolog. They show that features associated with difficulties in crystallization can be predicted because they are conserved between homologs, and that crystallization success is correlated only for the closest homologs.
For instance, only targets with greater than 75% sequence identity to a crystallized target have a very high chance of success, whereas targets with even 40% sequence identity to ones that failed are unlikely to yield crystals.
The chance of solving at least one structure from a family increases with the number of sequences available per family.
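The effect of pool size can be illustrated with a rough calculation. The sketch below is not the authors' model: it assumes that each target crystallizes independently with a known probability, which is a simplification given that success is correlated among close homologs. The function name `p_at_least_one` and the 10% per-target success rate are hypothetical, chosen only for illustration.

```python
from math import prod


def p_at_least_one(success_probs):
    """Probability that at least one target in a pool yields a structure.

    Assumes the targets succeed or fail independently (a simplification;
    close homologs are expected to behave similarly).
    """
    # P(at least one success) = 1 - P(every target fails)
    return 1.0 - prod(1.0 - p for p in success_probs)


if __name__ == "__main__":
    # Hypothetical per-target success rate; not a value from the study.
    per_target = 0.10
    for pool_size in (1, 5, 10, 20):
        chance = p_at_least_one([per_target] * pool_size)
        print(f"pool of {pool_size:2d} targets: {chance:.3f}")
```

Under these assumptions the chance of obtaining at least one structure rises steadily with pool size but never reaches certainty. Adding close homologs of a failed target would help less than the independent model suggests.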
Using data mining of TargetDB, coupled with experimental validation, the authors grouped target protein sequences into five crystallization classes from 1 (optimal) to 5 (very difficult) and analyzed known microbial genomes using this system. Their results showed, surprisingly, that there is little difference in the overall distribution of these classes among genomes, and most individual families with sufficient diversity exhibit a range of crystallization classes. But there are some families that have more 'optimal' targets and others that have very few. These 'difficult' families can still be dealt with by increasing the size of the 'genome pool' (see Figure), an approach that is feasible now that more than 1,000 genomes have been sequenced.
The genome pool approach of targeting multiple members of a given protein family is a simple way to improve the success rate of structure determination for that family. The likelihood of crystallization can be further increased by selecting target sequences with favorable characteristics, for example with guidance from XtalPred, a tool developed by the same group.