Metrics User Documentation

The PSI:Biology Metrics resource tracks the research efforts made by the PSI:Biology Network. This user guide describes the source of the data presented on this resource [1]. Much of the information is taken from the experimental data tracking database, TargetTrack, as well as the PSI:Biology Materials Repository, the Structural Biology Knowledgebase (SBKB), the SBKB Technology Portal, the SBKB Publications Portal, and the Protein Data Bank archive.

PSI:Biology targets are identified in TargetTrack as satisfying these base requirements:

  • a Project List containing PSI:Biology or PSI:3
  • a target Date Updated entry later than June 30, 2010 to indicate that it has been actively pursued during PSI:Biology.
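
The two base requirements above can be sketched as a simple filter. This is only an illustration: the record layout (a dict with `project_list` and `date_updated` keys) is hypothetical and is not the TargetTrack schema, and exact matching of the project names is an assumption.

```python
from datetime import date

# Cutoff quoted from the text: Date Updated must be later than June 30, 2010.
PSI_BIOLOGY_CUTOFF = date(2010, 6, 30)

# Project names quoted from the text; exact-match comparison is an assumption.
PSI_BIOLOGY_PROJECTS = {"PSI:Biology", "PSI:3"}


def meets_base_requirements(target):
    """Return True if a target record satisfies both base requirements.

    `target` is a hypothetical dict with keys "project_list" (list of str)
    and "date_updated" (datetime.date); these names are illustrative only.
    """
    in_project = any(p in PSI_BIOLOGY_PROJECTS for p in target["project_list"])
    updated_after_cutoff = target["date_updated"] > PSI_BIOLOGY_CUTOFF
    return in_project and updated_after_cutoff
```

Under these assumptions, a target last updated exactly on June 30, 2010 would not qualify, because the requirement is "later than" that date.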

1. Quantitative Metrics

  1. 1.1 Number of total targets (Table 1, col 1) TT

    Counts the number of distinct targets from each center that satisfy the base requirements.

  2. 1.2 Number of proteins cloned (Table 1, col 2) TT

    Counts the number of distinct targets from each center that satisfy the base requirements above as well as the following information in TargetTrack:

    • a trial Status History List with a status value of "cloned", indicating the target has been successfully cloned
  3. 1.3 Number of proteins tested for expression (Table 1, coming soon) coming soon

    Counts the number of distinct targets from each center that satisfy the base requirements above as well as the following information in TargetTrack:

    • a trial Status History List with a status value of "expressed" OR a trial Status History List with a status value of "Work Stopped" and trial Stop Status value "expression failed".
  4. 1.4 Number of proteins successfully expressed (Table 1, col 3) TT

    Counts the number of distinct targets from each center that satisfy the base requirements above as well as the following information in TargetTrack:

    • a trial Status History List with a status value of "expressed", indicating the target has been successfully expressed
  5. 1.5 Number of proteins purified (Table 1, col 4) TT

    Counts the number of distinct targets from each center that satisfy the base requirements above as well as the following information in TargetTrack:

    • a trial Status History List with a status value of "purified", indicating the target has been successfully purified
  6. 1.6 Number of proteins in structural pipeline and progress through various stages - X-ray/NMR/EM characterization through structure determination. (Table 1, col 5) TT

    Counts the number of distinct targets from each center that satisfy the base requirements as well as the following information in TargetTrack:

    • For Targets undergoing X-ray Crystallography studies: a Protocol Reference to a crystallization protocol OR a trial Status History List with any one of the following status values: "crystallized", "diffraction-quality crystals", "diffraction", "native diffraction-data", "phasing diffraction-data", or "crystal structure".

    • For Targets in Nuclear Magnetic Resonance (NMR) studies: a Protocol Reference to an NMR protocol OR a trial Status History List with any one of the following status values: "HSQC Satisfactory", "NMR assigned", "NMR backbone resonances assigned", "NMR sidechain resonances assigned", "NMR structure", or "in BMRB".

    • For Targets in Electron Cryo-Microscopy studies: a Protocol Reference to an EM protocol OR a trial Status History List with any one of the following status values: "EM images", "EM reconstruction", "EM reconstruction in EMDB", or "EM fitted model".
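
The three "protocol OR status" rules above could be expressed roughly as follows. The status strings are quoted from the bullets; the function names, the input shapes, and the protocol type labels in `PROTOCOL_BY_METHOD` are assumptions made for illustration, not TargetTrack identifiers.

```python
# Status values quoted from the three bullets above, grouped by method.
STATUSES_BY_METHOD = {
    "X-ray": {
        "crystallized", "diffraction-quality crystals", "diffraction",
        "native diffraction-data", "phasing diffraction-data", "crystal structure",
    },
    "NMR": {
        "HSQC Satisfactory", "NMR assigned", "NMR backbone resonances assigned",
        "NMR sidechain resonances assigned", "NMR structure", "in BMRB",
    },
    "EM": {
        "EM images", "EM reconstruction", "EM reconstruction in EMDB",
        "EM fitted model",
    },
}

# Hypothetical protocol type labels; the text names the protocols only loosely.
PROTOCOL_BY_METHOD = {"X-ray": "crystallization", "NMR": "NMR", "EM": "EM"}


def structural_methods(status_history, protocol_types):
    """Return the methods for which a target counts as in the structural pipeline.

    A method qualifies if the target references a protocol of that type OR
    has at least one of the method's status values in its status history.
    """
    statuses = set(status_history)
    protocols = set(protocol_types)
    return {
        method
        for method, qualifying in STATUSES_BY_METHOD.items()
        if PROTOCOL_BY_METHOD[method] in protocols or statuses & qualifying
    }


def in_structural_pipeline(status_history, protocol_types):
    """True if the target qualifies under at least one method."""
    return bool(structural_methods(status_history, protocol_types))
```

In this sketch a target that qualifies under more than one method is still a single target, which is consistent with the metric counting distinct targets.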

  7. 1.7 Number of targets in PDB (Table 1, column 6) TT PDB

    Counts the number of distinct targets from each center that satisfy the base requirements as well as the following information in TargetTrack:

    • a trial Status History List with a status value of "in PDB"
    • a PDB PSI:Biology structure deposition date (not release date) after June 30, 2010.

    Note: This list may count targets with unreleased structures, since reports to TargetTrack often precede the release of the structure from the PDB.

  8. 1.8 Number of structures in PDB (Table 1, column 7) PDB

    Counts the number of structures in the PDB archive with an SG_Project tag of PSI-Biology, filtered by center using the SG_Center tag. This tally includes only released structures.

  9. 1.9 - 1.10 Numbers of targets produced (Table 2, cols 1-6) TT

    Counts the number of distinct targets from each center that satisfy the base requirements as well as the following information in TargetTrack:

    • a Target Category List with a Target Category Name of:
      • PSI Biology Partnership (column 1) - indicating the target is under study with a PSI:Biology Partnership group

      • Biomedical (column 2) - indicating the target is of biomedical research relevance

      • Structural Coverage (column 3) - indicating the target is part of a protein domain family or proteome coverage study

      • Community Nominated (column 4) - indicating the target was nominated outside the depositing center by the scientific community

      • Metagenomic (column 5) - indicating the target was derived from an environmental sample

      • Membrane Protein (column 6) - indicating the target is a membrane protein
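
A tally along the lines of Table 2 might look like the sketch below. The category names and column order come from the list above; the record layout (`center`, `target_id`, `categories`) is hypothetical, and counting a target once in every category it lists is an assumption. Records are assumed to have already passed the base requirements.

```python
from collections import defaultdict

# Category names in column order, as given in the list above.
TABLE2_COLUMNS = [
    "PSI Biology Partnership",
    "Biomedical",
    "Structural Coverage",
    "Community Nominated",
    "Metagenomic",
    "Membrane Protein",
]


def tally_categories(targets):
    """Count distinct targets per center for each Table 2 column.

    `targets` is an iterable of hypothetical dicts with keys "center",
    "target_id" and "categories" (list of Target Category Names).
    Returns {center: [count for column 1, ..., count for column 6]}.
    """
    seen = defaultdict(set)  # (center, category) -> set of target IDs
    for t in targets:
        for category in t["categories"]:
            if category in TABLE2_COLUMNS:
                seen[(t["center"], category)].add(t["target_id"])
    centers = sorted({center for center, _ in seen})
    return {c: [len(seen[(c, col)]) for col in TABLE2_COLUMNS] for c in centers}
```

Using sets of target IDs keeps the count "distinct", so a target reported in several trial records is not counted twice within a column.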

2. Structure Determination

This section counts the number of different structural and biological topics targeted by the PSI:Biology network as categorized in TargetTrack.

  1. 2.1 Types of protein targets (Table 3)

    This section counts the types of proteins that were structurally determined by the PSI:Biology Network as a whole. At the moment, these values are calculated using the methods below; obtaining the counts from the Target Protein Type data element in TargetTrack is under development.

    1. Single-domain proteins (row 1) SBKB

      This item is derived from the sequence in the PDB data. All target structures are analyzed at the SBKB using a program called ProteinDomainParser [2]. The number shown counts the structures for which the program returned exactly 1 domain. Structures that return a value of 0 domains are manually inspected.

    2. Multi-domain proteins (row 2) SBKB

      This item is derived from the sequence in the PDB data. All target structures are analyzed at the SBKB using a program called ProteinDomainParser [2]. The number shown counts the structures for which the program returned 2 or more domains.

    3. Oligomeric Proteins (row 3) PDB

      This item is derived from PDB data (REMARK 350) that states the non-crystallographic, biological symmetries within the protein structure. The PDB experimental evidence includes that provided by the depositing user as well as the computed oligomeric annotation obtained from the program PISA [3]. All structures with a value greater than monomeric (1) are counted.

    4. Membrane Proteins (row 4) PDB

      This item is derived from PDB data. It counts the PSI:Biology structures that have "membrane protein" as a keyword.

    5. Eukaryotic Proteins (row 5) PDB

      This item is derived from PDB data. It counts the PSI:Biology structures whose reported source organism is eukaryotic, using NCBI Taxonomy as the reference.

    6. Protein-protein complexes (row 6) PDB

      This item is derived from PDB. All PSI:Biology structures that contain more than one unique protein entity are counted.

    7. Protein-nucleic acid complexes (row 7) PDB

      This item is derived from PDB. All PSI:Biology structures that contain at least one protein entity and one nucleic acid entity are counted.

    8. Protein-ligand complexes (row 8) PDB

      This item is derived from PDB. All PSI:Biology structures that have at least one protein entity and one ligand record are counted.

    9. de novo/designed proteins (row 9) PDB

      This item is derived from PDB. All PSI:Biology structures that have the words "de novo" or "designed" in the title and/or keywords are counted.
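
The row definitions in 2.1 can be read as a set of independent tests applied to each structure. The sketch below assumes a hypothetical per-structure summary dict; none of the field names are PDB or SBKB identifiers, and the case-insensitive text matching is an assumption.

```python
def protein_type_rows(s):
    """Return the Table 3 rows (in row order) that one structure contributes to.

    `s` is a hypothetical summary dict with keys: "domain_count",
    "oligomeric_count", "keywords", "title", "is_eukaryotic",
    "protein_entities", "nucleic_acid_entities" and "ligand_count".
    """
    rows = []
    if s["domain_count"] == 1:
        rows.append("Single-domain proteins")
    elif s["domain_count"] >= 2:
        rows.append("Multi-domain proteins")
    # A domain count of 0 falls through: the text says these are inspected manually.
    if s["oligomeric_count"] > 1:
        rows.append("Oligomeric proteins")
    if "membrane protein" in s["keywords"].lower():
        rows.append("Membrane proteins")
    if s["is_eukaryotic"]:
        rows.append("Eukaryotic proteins")
    if s["protein_entities"] > 1:
        rows.append("Protein-protein complexes")
    if s["protein_entities"] >= 1 and s["nucleic_acid_entities"] >= 1:
        rows.append("Protein-nucleic acid complexes")
    if s["protein_entities"] >= 1 and s["ligand_count"] >= 1:
        rows.append("Protein-ligand complexes")
    text = (s["title"] + " " + s["keywords"]).lower()
    if "de novo" in text or "designed" in text:
        rows.append("de novo/designed proteins")
    return rows
```

Because the tests are independent, one structure can contribute to several rows, so the row totals are not expected to sum to the number of structures.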

  2. 2.2 Coverage of a defined field (Table 4, coming soon) coming soon TT

    PSI:Biology is a high-throughput structural biology (structural genomics) project which is suited for studies that cover large areas of biological space. This section counts the number of structures solved within the defined target categories.

    Counts the number of distinct targets from each center that satisfy the base requirements as well as the following information in TargetTrack:

    • a Target Category List with a Target Category Name of:
      • first structure of class (row 1)
      • eukaryotic domain family (row 2)
      • general domain family (row 3)
      • protein family of high biological importance (row 4)
      • individual organism (row 5)
      • disease (row 6)
      • conformational state (row 7)
      • functional mutant (row 8)
      • complex with biological interaction partner (row 9)
  3. 2.3 Methods used for structure determination PDB

    This item is derived from PDB data (EXPDTA) that identifies the technique used to solve PSI:Biology protein structures (i.e. X-ray crystallography, nuclear magnetic resonance, small angle X-ray scattering, electron cryo-microscopy, or others).

3. Network Building

One of the goals of the PSI:Biology network is to be a collaborative network for the greater biological community to rely on for structures and resources. This section counts the level of collaboration within the PSI:Biology Network and beyond it.

  1. 3.1 Collaboration

    The PSI:Biology project is a network of ~100 scientists within 25 centers and partnerships, and they also collaborate with other scientists outside of the Network.

    1. Internal to PSI TT coming soon
      1. Center & biological partners
      2. Membrane Network centers
      3. Centers for High-Throughput Structural Determination

      These values will be calculated in the future using TargetTrack data from the data element Target Partnership List.

    2. External to PSI coming soon
      1. Structural Community
      2. Biological Community

      Requirements are being determined for these metrics items for future development.

    3. Collaborations created by Community Target nominations SBKB

      Counts the number of proposals assigned to each participating center from the SBKB's community-nominated targets portal (http://sbkb.org/cnt/) after June 30, 2010.

    Note: If an investigator submitted separate proposals for separate projects, these are counted as separate collaborations.

  2. 3.2 Outreach

    The impact of PSI:Biology can be increased through outreach to the greater scientific community. This section counts the number of ways in which the Network has

    1. Website hits - under development. SBKB coming soon

      This section will report the number of visits, unique visitors, and new-visitor and returning-visitor rates for the SBKB (http://sbkb.org).

4. Resources

The PSI:Biology Network makes all of its research resources available to the public. The following resource counts come from center depositions and TargetTrack.

  1. 4.1 Data coming soon
    1. Expression
    2. Purification
    3. Crystallization results
    4. NMR solution conditions (HSQC spectra)
  2. 4.2 Protocols used by the PSI:Biology program TT

    Counts the number of protocols that are referenced (used by a center) by trials within TargetTrack. In addition to the base requirements, this section looks for:

    A Trial Protocol List that contains a Protocol Reference with one of the following Protocol Type values:

    1. expression (column 1)
    2. purification (column 2)
    3. crystallization (column 3)

    Note: The centers have deposited additional protocols for all steps of the protein production and structure determination pipeline. Visit http://sbkb.org/tt/protocolStats.html for further details.

  3. 4.3 Clones available from PSI-MR Materials Repository

    Counts the number of DNA plasmids available from the PSI:Biology-Materials Repository (http://psimr.asu.edu). This data is supplied by the PSI-MR.

  4. 4.4 Expressed proteins distributed from centers - Future development coming soon

    These data will be provided through a new TargetTrack element to be developed in the future.

5. Technologies

This section counts the technological advances developed by the Network to break through typical (and not so typical) experimental bottlenecks. This data is collected from the PSI Technology Portal and is refreshed each Tuesday night.

  1. 5.0 PSI-1 and PSI-II Tech

    The first column shows technologies deposited by the PSI Center in previous PSI phases, since those technologies are likely still in use during PSI:Biology. These totals come from the declaration of project (selection: PSI-1 or PSI-2).

  2. 5.1 Devices Tech

    PSI:Biology technology totals come from the declaration of (1) project (selection: PSI:Biology) and (2) technology type (selection: device) in the Technology Portal report deposition form (http://technology.sbkb.org/portal/; log-in required).

  3. 5.2 Software and Servers Tech

    PSI:Biology technology totals come from the declaration of (1) project (selection: PSI:Biology) and (2) technology type (selection: software) in the Technology Portal report deposition form (http://technology.sbkb.org/portal/; log-in required).

  4. 5.3 New Technologies for Proteins and Protein Complexes Tech

    PSI:Biology technology totals come from the declaration of (1) project (selection: PSI:Biology) and (2) technology type (selection: protein) in the Technology Portal report deposition form (http://technology.sbkb.org/portal/; log-in required).

  5. 5.4 New Technologies for Eukaryotic Proteins Tech

    PSI:Biology technology totals come from the declaration of (1) project (selection: PSI:Biology) and (2) technology type (selection: eukaryotic protein) in the Technology Portal report deposition form (http://technology.sbkb.org/portal/; log-in required).

  6. 5.5 New Technologies for Membrane Proteins Tech

    PSI:Biology technology totals come from the declaration of (1) project (selection: PSI:Biology) and (2) technology type (selection: membrane protein) in the Technology Portal report deposition form (http://technology.sbkb.org/portal/; log-in required).

  7. 5.6 New Technologies for Post-Translational Modifications Tech

    PSI:Biology technology totals come from the declaration of (1) project (selection: PSI:Biology) and (2) technology type (selection: post-translational modification) in the Technology Portal report deposition form (http://technology.sbkb.org/portal/; log-in required).

6. Publications

Another mode of outreach is the primary scholarly activity of publications and PDB downloads. The Metrics site retrieves this information from the Publications Portal each Tuesday.

  1. 6.1 Number of Publications Publications

    Each peer-reviewed PSI:Biology publication (including journal articles, letters to the editor, and book chapters) is reported by each center to the PSI SBKB Publications Portal. This value is counted from the number of publication entries that have a project assignment of PSI:Biology.

  2. 6.2 Number of Times an Article is Cited Publications

    The number of citations for each PSI:Biology publication (i.e. the number of subsequent peer-reviewed publications that cite the given work) is automatically extracted weekly from the Web of Knowledge database (Thomson Reuters; http://apps.webofknowledge.com) and stored by the Publications Portal.

  3. 6.3 Total Journal Impact Publications

    Most of the scientific journals publishing peer-reviewed PSI:Biology publications have known impact factors. The impact factor of a journal for a given year (e.g. 2010) is the mean number of times papers published in the journal during the previous two years (e.g. 2008 and 2009) are cited by other peer-reviewed publications in the given year. The impact factors for each journal are published in Journal Citation Reports yearly (Thomson Reuters). It should be noted that not all journals publishing peer-reviewed PSI:Biology publications have recorded impact factors, most notably the Journal of Structural and Functional Genomics (Springer).

    The Publications Portal tracks the most recent impact factors available (currently, the 2010 edition of Journal Citation Reports) for the publishing journal of each PSI:Biology paper and makes that information available to the Metrics report. The third column (Total Journal Impact) of the Metrics Publication table displays the weighted sum of the journal impact factors for all PSI:Biology publications by center (i.e. if two or more papers appear in the same journal, the impact factor for that journal is summed multiple times).
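
The impact factor definition and the Total Journal Impact sum described above can be illustrated with a short sketch. The function names and input shapes are hypothetical, and treating journals without a recorded impact factor as contributing nothing to the sum is an assumption; the text does not say how such journals are handled.

```python
def impact_factor(citations_in_year, papers_in_previous_two_years):
    """Mean number of citations received in a given year by the papers a
    journal published during the previous two years."""
    return citations_in_year / papers_in_previous_two_years


def total_journal_impact(publication_journals, impact_factors):
    """Sum journal impact factors over a center's publications.

    `publication_journals` lists one journal name per publication, so a
    journal that published several papers is added once per paper.
    `impact_factors` maps journal name -> impact factor. Journals with no
    recorded impact factor add 0.0 here (an assumption).
    """
    return sum(impact_factors.get(journal, 0.0) for journal in publication_journals)
```

For example, with made-up impact factors of 2.5 for "Journal A" and 4.0 for "Journal B", a center with two papers in Journal A and one in Journal B would have a Total Journal Impact of 2.5 + 2.5 + 4.0 = 9.0.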

  4. 6.4 Impact of publications that cite PSI structures. under development coming soon Publications
  5. 6.5 PDB entries
    1. 6.5.1 Number of downloads from PDB to assess impact PDB

      This item counts the total number of times a structure solved by each PSI:Biology center was downloaded from the wwPDB, based on wwPDB download statistics data.

For future development:

The following metrics items are in the planning stages for data capture and metrics development.

3. Network Building

3.2. Outreach
3.2.1. Website hits - available soon from SBKB/Google Analytics
3.2.2. Meeting presence
3.2.3. Workshops
3.2.4. Recruitment of scientists of PSI network
3.3. Training
3.3.1 Collaborators
3.3.2 Students/postdocs
3.3.3 Visiting PIs

4. Resources

4.1 Data - a committee has been charged
4.1.1 Expression
4.1.2 Purification
4.1.3 Crystallization results
4.1.4 NMR solution conditions (HSQC spectra)

7. Discovery

7.1. Discovery and biochemical characterization of novel proteins and their functions
7.2. Discovery and biochemical characterization of protein complexes and their functions
7.3. Biophysical characterization enabled by large-scale protein expression and purification
7.3.1. Biophysical characterization enabled by large-scale protein expression/purification - structure determination or analysis by crystallography, NMR, SAXS, CryoEM, and other biophysical methods.
7.3.2. Biophysical characterization enabled by large-scale protein expression/purification - study of structure/function relationships.

8. Biology

8.1. Enabling of biological or biochemical studies of function through the availability of structure information or purified protein
8.1.1. Mutagenesis
8.1.2. Structure/function studies
8.1.3. Biochemical and cell-based assays
8.1.4. Structure-based (or protein identification-based) modeling studies, identification of new compound classes, drug leads, docking of binding partners, modeling of related proteins
8.1.5 Elucidation of new biology/chemistry
8.2. Fulfillment of specific aims of the biological partners



[1] A note about formatting: Words in italics are the names of TargetTrack data elements. Please visit http://sbkb.org/tt/guidelines.html for more information about proper use of data elements.

[2] Alexandrov N, Shindyalov I. PDP: protein domain parser. Bioinformatics (2003) 19(3):429-430.

[3] Krissinel E, Henrick K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. (2007) 372:774-797.