SUPERFAMILY 1.75 HMM library and genome assignments server

Domain-centric Phenotype Annotations and Structural Domain Phenotype Ontology (SDPO)


This document explains how phenotypic annotations are derived for structural domains classified in the Structural Classification of Proteins (SCOP) database (Andreeva, et al., 2008). Like GO and other OBO-format ontologies, the phenotype ontologies (PO) listed below classify and organize phenotypic information for human and model organisms in a directed acyclic graph (DAG), from very general terms at the top to more specific terms further down.

  • Disease Ontology (DO) is a standardized ontology for human disease that semantically integrates disease and medical vocabularies through extensive cross-mapping of DO terms to MeSH, ICD, NCI’s thesaurus, SNOMED and OMIM (Schriml, et al., 2012). Mappings of DO terms onto the human genome are also available (Osborne, et al., 2009).

  • Human Phenotype Ontology (HP) captures phenotypic abnormalities described in OMIM, along with the corresponding disease-causing genes (Robinson, et al., 2008). It comprises three complementary sub-ontologies: Mode of Inheritance (MI), Onset and clinical course (ON), and Phenotypic Abnormality (PA).

  • Mammalian/Mouse Phenotype Ontology (MP) describes the phenotypes of the mouse after a specific gene has been genetically disrupted (Smith, et al., 2009). Mouse Genome Informatics (MGI) uses it to provide high-coverage gene-level phenotypes for the mouse.

  • Worm Phenotype Ontology (WP) classifies and organizes phenotype descriptions for C. elegans and other nematodes (Schindelman, et al., 2011). WormBase uses it to provide the primary resource of phenotype annotations for C. elegans.

  • Yeast Phenotype Ontology (YP) is the major contributor to the ‘Ascomycete phenotype ontology’. The Saccharomyces Genome Database (SGD) uses it to provide single-mutant phenotypes for every gene in the yeast genome (Engel, et al., 2010).

  • Fly Phenotype Ontology (FP) refers to the FlyBase controlled vocabulary: a structured controlled vocabulary that FlyBase uses to annotate alleles (for example, with their mutagen) (Grumbling, et al., 2006).

  • Fly Anatomy Ontology (FA) is a structured controlled vocabulary of the anatomy of Drosophila melanogaster, used to describe phenotypes and where genes are expressed (Grumbling, et al., 2006).

  • Zebrafish Anatomy Ontology (ZA) displays anatomical terms of the zebrafish using standard anatomical nomenclature, together with affected genes (Bradford, et al., 2011).

  • Xenopus Anatomy Ontology (XA) represents the lineage of tissues and the timing of development for frogs (Xenopus laevis and Xenopus tropicalis). It is used to annotate Xenopus gene expression patterns and mutant and morphant phenotypes (Bowes, et al., 2009).

  • Arabidopsis Plant Ontology (AP) is a major contributor to the Plant Ontology, which describes plant anatomical and morphological structures (PAN) and growth and developmental stages (PDE). The Arabidopsis Information Resource (TAIR) provides Plant Ontology annotations for the model higher plant Arabidopsis thaliana (Ilic, et al., 2006; Pujar, et al., 2006).

The approach also applies to non-OBO-formatted ontologies that have a fixed-length or much-simplified hierarchy:

  • Enzyme Commission (EC) is a resource focused on enzyme nomenclature, a system for naming enzymes (protein catalysts), with cross-references to UniProt (Fleischmann et al., 2004). Each EC number has four components that define the reaction catalysed: the first three progressively classify the reaction, and the fourth is a serial number that uniquely identifies the enzyme.

  • DrugBank ATC codes (DB) classify drugs at five levels according to the organ or system on which they act (1st level, anatomical main group) and their therapeutic (2nd level, therapeutic subgroup), pharmacological (3rd level, pharmacological subgroup) and chemical properties (4th level, chemical subgroup; 5th level, chemical substance). Only DrugBank drugs that have an Anatomical Therapeutic Chemical (ATC) classification are considered (Knox et al., 2011).

  • UniProtKB Keywords (KW) form a controlled vocabulary that summarizes the content of an entry and is used to index UniProtKB/Swiss-Prot entries in 10 categories (the category "Technical term" is excluded here). Keywords are assigned manually to UniProtKB/Swiss-Prot entries and automatically, according to specific annotation rules, to UniProtKB/TrEMBL entries (Bairoch et al., 2005).

  • UniProtKB UniPathway (UP) is a fully manually curated resource for representing and annotating metabolic pathways, and is used as a controlled vocabulary for pathway annotation in UniProtKB (Morgat et al., 2012).
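Because EC numbers and ATC codes encode their hierarchy in the code itself, parent terms can be derived without an OBO file. The Python sketch below shows one way to do this; it assumes dot-separated EC numbers with four fields (partial numbers written with '-') and the standard ATC level boundaries at code lengths 1, 3, 4, 5 and 7. The helper names are hypothetical and this is not necessarily how the SUPERFAMILY pipeline builds these hierarchies.

```python
from __future__ import annotations

def ec_ancestors(ec: str) -> list[str]:
    """Ancestor EC classes of an EC number, most general first.

    Assumes the dotted four-field form, e.g. '1.1.1.1'; '-' fields in
    partial numbers such as '3.4.21.-' are ignored.
    """
    fields = [f for f in ec.split(".") if f != "-"]
    # Each ancestor keeps a prefix of the fields, padded with '-' to four fields.
    return [".".join(fields[:k] + ["-"] * (4 - k)) for k in range(1, len(fields))]

# Length of an ATC code at levels 1..5 (e.g. N, N02, N02B, N02BE, N02BE01).
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)

def atc_ancestors(code: str) -> list[str]:
    """Ancestor ATC groups of an ATC code, most general first."""
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n < len(code)]

print(ec_ancestors("1.1.1.1"))   # ['1.-.-.-', '1.1.-.-', '1.1.1.-']
print(atc_ancestors("N02BE01"))  # ['N', 'N02', 'N02B', 'N02BE']
```

Each (ancestor, code) pair can then serve as a parent-child edge, so these fixed-length schemes fit the same DAG-based treatment as the OBO ontologies.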

Combining these annotations with the genome-wide domain assignments for proteins in the SUPERFAMILY database (Gough, 2006), we have statistically inferred which phenotype ontology terms are related to structural domains. The underlying idea is that structural domains may bridge the gap between protein sequences and their phenotypic outcomes. We reason that if mutations in genes encoding proteins that contain the same structural domain lead to a certain phenotype, this is probably because the mutations preferentially occur in that domain. On this basis, domain-centric phenotypic annotations can be inferred from protein/gene-level phenotype annotations. We have also built an initial, trimmed-down version of each phenotype ontology that contains the terms most informative for annotating domains; this is part of an ongoing effort to develop a Structural Domain Phenotype Ontology (SDPO). Domain-centric phenotypic annotations can serve as an alternative starting point for exploring genotype-phenotype relationships. Together with sTOL (sequenced Tree Of Life), this resource can also be used to examine how the sets of domains annotated by any chosen phenotype term are distributed over the course of species evolution.


The pipeline of inferring PO annotations of SCOP domains


The motivation is as follows: if a PO term tends to annotate proteins that contain a given domain, the term should also carry a phenotypic signal for that domain. This signal can be inferred in reverse when the number of domain-containing proteins annotated with the PO term is significantly higher than would be expected by chance. Figure 1 summarizes how domain-centric PO annotations are generated from individual protein/gene-level annotations, taking human as an example.

Figure 1. Flowchart of inferring domain-centric PO annotations using protein/gene-level PO annotations and domain assignments in the SUPERFAMILY database.

    Data Source Protein/gene-level PO annotations are taken from the phenotype ontology of interest. Because these annotations are gene-oriented rather than protein-based, we consider only the longest transcript of each gene, so that the one-gene-one-protein mapping holds. Unlike Domain2GO, domain-phenotype associations are supported only by all proteins (i.e., the Gene2PO mapping matrix), because the human genome contains too few single-domain proteins for statistical testing to succeed.
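Selecting one protein per gene can be sketched as below. The record layout (gene, transcript, length), the identifiers and the tie-breaking rule (larger transcript identifier wins) are assumptions made for this illustration, not details of the actual pipeline.

```python
from typing import NamedTuple

class Transcript(NamedTuple):
    gene: str        # gene identifier (hypothetical)
    transcript: str  # transcript identifier (hypothetical)
    length: int      # length used to decide which transcript is longest

def longest_per_gene(transcripts):
    """Keep the longest transcript of each gene (one gene -> one protein)."""
    best = {}
    for t in transcripts:
        cur = best.get(t.gene)
        # Prefer the longer transcript; break ties on the identifier so the
        # choice does not depend on input order.
        if cur is None or (t.length, t.transcript) > (cur.length, cur.transcript):
            best[t.gene] = t
    return best

records = [
    Transcript("g1", "t1a", 1200),
    Transcript("g1", "t1b", 1850),
    Transcript("g2", "t2a", 900),
]
chosen = longest_per_gene(records)
print({g: t.transcript for g, t in chosen.items()})  # {'g1': 't1b', 'g2': 't2a'}
```

The proteins encoded by the chosen transcripts, with their SUPERFAMILY domain assignments, then form the rows of the Gene2PO matrix.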

    Statistical Analysis For the Gene2PO mapping matrix, two types of enrichment analysis are performed to infer the overall and the relative associations between a domain and a PO term (Figure 2). The PO hierarchy is a directed acyclic graph (DAG) in which each term is a node and its relations to parent terms (a term may have multiple parents) are directed edges. The possible association between a PO term t and a domain d is tested not only against the whole analysable gene space, but also within the set of genes annotated to all direct parents of t. These dual constraints ensure that only the most informative PO terms are retained. Because many hypotheses are tested simultaneously, the statistical significance of domain-PO term associations is assessed by the false discovery rate (FDR) (Benjamini and Hochberg, 1995), and the resulting FDR determines which associations are significant.
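The dual test can be sketched with an exact hypergeometric upper tail written with Python's math.comb. For term t and domain d, the overall test uses all analysable genes as background, the relative test uses only the genes annotated to all direct parents of t, and the larger of the two P-values is kept, as the Figure 2 caption describes. Representing genes as Python sets and the function names are our assumptions; this is a sketch, not the production code.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, successes n, draws N)."""
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / comb(M, N)

def over_representation_p(background, domain_genes, term_genes):
    """Upper-tail P-value that genes annotated to t are enriched for domain d."""
    d = domain_genes & background
    t = term_genes & background
    return hypergeom_sf(len(d & t), len(background), len(d), len(t))

def dual_p(all_genes, domain_genes, term_genes, parent_gene_sets):
    """Larger of the overall and the parent-relative P-values.

    `parent_gene_sets` holds one gene set per direct parent of t; the term is
    assumed to have at least one parent (i.e. it is not a root).
    """
    p_overall = over_representation_p(all_genes, domain_genes, term_genes)
    # Relative test: background = genes annotated to all direct parents of t.
    parent_space = set.intersection(*parent_gene_sets)
    p_relative = over_representation_p(parent_space, domain_genes, term_genes)
    return max(p_overall, p_relative)

# Toy example (made-up gene IDs 0..99): genes 0..9 carry d, genes 0..7 are
# annotated to t, and t's only direct parent annotates genes 0..19.
p = dual_p(set(range(100)), set(range(10)), set(range(8)), [set(range(20))])
print(p)
```

In this toy case the overall P-value is tiny, but within the parent's 20 genes the enrichment is much weaker, so the relative P-value (45/125970) is the one carried forward. That is how the dual constraint penalizes terms that add little beyond their parents.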

    Domain2PO High-quality domain-PO associations are identified using a stringent FDR threshold (FDR < 0.001). Because SCOP classifies evolutionarily related domains at both the superfamily and the family level, we generated domain-centric PO annotations at each of these two levels.
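A plain-Python sketch of the Benjamini-Hochberg step followed by the FDR < 0.001 cut-off is given below. Keying P-values by (SCOP sunid, PO id) pairs is an assumed in-memory layout, not the format of the released files, and the IDs in the usage line are hypothetical.

```python
def bh_fdr(pvalues):
    """Benjamini-Hochberg adjusted P-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    fdr = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that the FDRs stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        fdr[i] = running_min
    return fdr

def significant_pairs(pair_pvalues, cutoff=1e-3):
    """Keep (domain, term) pairs whose FDR is below `cutoff` (0.001 here)."""
    pairs = list(pair_pvalues)
    fdr = bh_fdr([pair_pvalues[pair] for pair in pairs])
    return {pair: q for pair, q in zip(pairs, fdr) if q < cutoff}

# Hypothetical SCOP sunids and PO ids, for illustration only.
print(significant_pairs({(1001, "T:1"): 1e-6, (1002, "T:2"): 0.5}))
```

The same filtering would be run separately for family-level and superfamily-level domains.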

Figure 2. The statistical significance of inference is assessed using the hypergeometric distribution, which yields the overall over-representation with respect to all annotations (left panel) and the relative over-representation with respect to all direct parents (middle panel). Taking the larger (maximal) of the two P-values, the statistical significance of domain-PO term associations is assessed by FDR to account for multiple hypothesis testing (right panel).


Initializing structural domain phenotype ontology


Based on the high-quality Domain2PO annotations, we have also built an initial, trimmed-down version of PO containing the terms most informative for annotating structural domains (Figure 3).

Figure 3. Flowchart of creating the structural domain phenotype ontology (SDPO) based on information-theoretic analysis of Domain2PO annotation profiles.

    First, we define the information content (IC) of a PO term as the negative log10 of the frequency of domains annotated to that term. Because of the dependencies among PO terms (the so-called true-path rule), a domain directly annotated to a specific PO term (a direct annotation) is also annotated to all of that term's ancestors (inherited annotations). The Domain2PO annotations generated above are treated as direct annotations; for each domain, its direct and inherited annotations together form its domain-PO annotation profile in the DAG. These complete annotations are used to calculate the IC of every PO term. Note that PO terms with similar IC can represent a partition of the DAG with respect to Domain2PO.
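The IC calculation, including true-path propagation, can be sketched as follows. The toy DAG and identifiers are made up. The denominator of the frequency is taken to be the number of domains with at least one annotation; the text does not state the denominator, so that choice is an assumption.

```python
import math

def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {term: [parent, ...]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen

def information_content(direct, parents):
    """IC(t) = -log10(fraction of annotated domains annotated to t).

    `direct` maps each domain to its directly annotated terms (assumed
    non-empty); inherited annotations are added via the true-path rule.
    """
    counts = {}
    for terms in direct.values():
        profile = set(terms)                    # direct annotations
        for t in terms:
            profile |= ancestors(t, parents)    # inherited annotations
        for t in profile:
            counts[t] = counts.get(t, 0) + 1
    n = len(direct)
    return {t: -math.log10(c / n) for t, c in counts.items()}

# Toy DAG: A is the root, B and C are children of A, D has parents B and C.
parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
direct = {"d1": ["D"], "d2": ["B"], "d3": ["C"], "d4": ["A"]}
ic = information_content(direct, parents)
print(ic)
```

Here every domain reaches the root A (IC 0), B and C each cover half of the domains (IC about 0.30), and D covers a quarter (IC about 0.60), so deeper terms carry more information.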

    Second, given a predefined seed IC (e.g., 1) and a corresponding range (e.g., [0.75, 1.25]), the algorithm starts with all PO terms unmarked and iteratively identifies the unmarked PO terms whose IC is closest to the seed, until all PO terms are marked (Figure 4). To ensure that one and only one PO term is identified per path in the DAG, two constraints apply: if several PO terms with identical IC are identified on the same path, the parental terms among them are filtered out; and once a PO term is identified, all terms on the paths through it are marked and excluded from further search.

    Last, the output consists of the identified PO terms whose IC falls within the range. We run the algorithm with each of four seed ICs (0.5, 1, 1.5 and 2) to create SDPO, corresponding to four levels of PO terms (least informative, moderately informative, informative and highly informative, respectively).

Figure 4. Illustration of how the algorithm iteratively creates the structural domain phenotype ontology (SDPO). I) Initially, all PO terms in the DAG are unmarked (open circles). II) Identify the unmarked PO terms (filled in pink) with IC closest to a predefined IC (e.g., 1). III) Filter out parental terms among the PO terms identified in Step II. IV) Mark the identified PO terms together with all of their ancestors and descendants. V-VI) Repeat Steps II-IV to iteratively identify unmarked PO terms until all PO terms are marked. VII) Output only the identified PO terms with IC falling in the range (e.g., [0.75, 1.25]) as SDPO.
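The steps in Figure 4 can be sketched in Python as follows. This is our reading of the figure, not the authors' code. It assumes that ties at the minimal distance are all picked in Step II, and it uses a symmetric half-width of 0.25 around the seed, which matches the [0.75, 1.25] example for seed 1 but is assumed for the other seeds.

```python
def select_sdpo_terms(ic, parents, seed, half_width=0.25):
    """Pick at most one PO term per DAG path with IC close to `seed`.

    ic:      {term: information content}
    parents: {term: [direct parent, ...]}
    Returns identified terms whose IC lies within seed +/- half_width.
    """
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)

    def reach(term, edges):
        """All terms reachable from `term` by repeatedly following `edges`."""
        seen, stack = set(), list(edges.get(term, []))
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(edges.get(x, []))
        return seen

    unmarked, identified = set(ic), set()  # Step I: everything unmarked
    while unmarked:
        # Step II: unmarked terms whose IC is closest to the seed (ties kept).
        best = min(abs(ic[t] - seed) for t in unmarked)
        picked = {t for t in unmarked if abs(ic[t] - seed) == best}
        # Step III: drop picked terms that are ancestors of another picked term.
        picked = {t for t in picked
                  if not any(t in reach(u, parents) for u in picked if u != t)}
        identified |= picked
        # Step IV: mark picked terms plus all their ancestors and descendants.
        for t in picked:
            unmarked -= {t} | reach(t, parents) | reach(t, children)
    # Step VII: output only identified terms whose IC falls in the range.
    return {t for t in identified if abs(ic[t] - seed) <= half_width}

parents = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
ic = {"A": 0.0, "B": 0.3, "C": 0.3, "D": 0.6}
print(select_sdpo_terms(ic, parents, seed=0.3))  # B and C: one per path A-B-D, A-C-D
```

Running the function once per seed (0.5, 1, 1.5, 2) would give the four SDPO levels. A seed far from every IC still marks the whole DAG, but Step VII may then return an empty set.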


Data Availability


In addition to the two browsable hierarchies (SCOP Hierarchy and PO Hierarchy), we provide the Domain2PO mapping results in two parsable formats: plain files and MySQL tables. The abbreviations used below are explained in the browsable hierarchies.

Domain2PO mapping plain files

    Domain2DO mapping results
    • High-coverage domain-centric DO annotations are available in the Domain2DO.txt file.

    • Statistics for the Domain2DO annotations are summarized in two forms: 1) the SCOP hierarchy with the number of DO terms (direct and inherited), in the Domain2DO_SCOP.obo file; 2) the DO hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2DO_DO.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • DO terms that make up SDDO (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDDO.txt file. We recommend using these DO terms together with the domains they annotate in Domain2DO.txt. Unlike the full DO hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDDO for each of the two SCOP domain levels (FA and SF).
    Domain2HP mapping results
    • High-coverage domain-centric HP annotations are available in the Domain2HP.txt file.

    • Statistics for the Domain2HP annotations are summarized in two forms: 1) the SCOP hierarchy with the number of HP terms (direct and inherited; three HP sub-ontologies: MI, ON and PA), in the Domain2HP_SCOP.obo file; 2) the HP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2HP_HP.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • HP terms that make up SDHP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDHP.txt file. We recommend using these HP terms together with the domains they annotate in Domain2HP.txt. Unlike the full HP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDHP for each of the three HP sub-ontologies (MI, ON and PA) at each of the two SCOP domain levels (FA and SF).
    Domain2MP mapping results
    • High-coverage domain-centric MP annotations are available in the Domain2MP.txt file.

    • Statistics for the Domain2MP annotations are summarized in two forms: 1) the SCOP hierarchy with the number of MP terms (direct and inherited), in the Domain2MP_SCOP.obo file; 2) the MP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2MP_MP.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • MP terms that make up SDMP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDMP.txt file. We recommend using these MP terms together with the domains they annotate in Domain2MP.txt. Unlike the full MP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDMP for each of the two SCOP domain levels (FA and SF).
    Domain2WP mapping results
    • High-coverage domain-centric WP annotations are available in the Domain2WP.txt file.

    • Statistics for the Domain2WP annotations are summarized in two forms: 1) the SCOP hierarchy with the number of WP terms (direct and inherited), in the Domain2WP_SCOP.obo file; 2) the WP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2WP_WP.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • WP terms that make up SDWP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDWP.txt file. We recommend using these WP terms together with the domains they annotate in Domain2WP.txt. Unlike the full WP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDWP for each of the two SCOP domain levels (FA and SF).
    Domain2YP mapping results
    • High-coverage domain-centric YP annotations are available in the Domain2YP.txt file.

    • Statistics for the Domain2YP annotations are summarized in two forms: 1) the SCOP hierarchy with the number of YP terms (direct and inherited), in the Domain2YP_SCOP.obo file; 2) the YP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2YP_YP.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • YP terms that make up SDYP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDYP.txt file. We recommend using these YP terms together with the domains they annotate in Domain2YP.txt. Unlike the full YP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDYP for each of the two SCOP domain levels (FA and SF).
    Domain2FP mapping results
    • High-coverage domain-centric FP annotations are available in the Domain2FP.txt file.

    • Statistics for the Domain2FP annotations are summarized in two forms: 1) the SCOP hierarchy with the number of FP terms (direct and inherited), in the Domain2FP_SCOP.obo file; 2) the FP hierarchy with the number of domains (direct and inherited; four SCOP levels: FA, SF, CF and CL), in the Domain2FP_FP.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • FP terms that make up SDFP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDFP.txt file. We recommend using these FP terms together with the domains they annotate in Domain2FP.txt. Unlike the full FP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDFP for each of the two SCOP domain levels (FA and SF).
    Domain2FA mapping results
    • High-coverage domain-centric FA annotations are available in the Domain2FA.txt file.

    • Statistics for the Domain2FA annotations are summarized in two forms: 1) the SCOP hierarchy with the number of FA terms (direct and inherited), in the Domain2FA_SCOP.obo file; 2) the FA hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2FA_FA.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • FA terms that make up SDFA (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDFA.txt file. We recommend using these FA terms together with the domains they annotate in Domain2FA.txt. Unlike the full FA hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDFA for each of the two SCOP domain levels (FA and SF).
    Domain2ZA mapping results
    • High-coverage domain-centric ZA annotations are available in the Domain2ZA.txt file.

    • Statistics for the Domain2ZA annotations are summarized in two forms: 1) the SCOP hierarchy with the number of ZA terms (direct and inherited), in the Domain2ZA_SCOP.obo file; 2) the ZA hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2ZA_ZA.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • ZA terms that make up SDZA (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDZA.txt file. We recommend using these ZA terms together with the domains they annotate in Domain2ZA.txt. Unlike the full ZA hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDZA for each of the two SCOP domain levels (FA and SF).
    Domain2XA mapping results
    • High-coverage domain-centric XA annotations are available in the Domain2XA.txt file.

    • XA terms that make up SDXA (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDXA.txt file. We recommend using these XA terms together with the domains they annotate in Domain2XA.txt. Unlike the full XA hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDXA for each of the two XA sub-ontologies (XAN and XDE) at each of the two SCOP domain levels (FA and SF).
    Domain2AP mapping results
    • High-coverage domain-centric AP annotations are available in the Domain2AP.txt file.

    • Statistics for the Domain2AP annotations are summarized in two forms: 1) the SCOP hierarchy with the number of AP terms (direct and inherited; two AP sub-ontologies: PAN and PDE), in the Domain2AP_SCOP.obo file; 2) the AP hierarchy with the number of domains (direct and inherited; two SCOP levels: FA and SF), in the Domain2AP_AP.obo file. Both OBO-format files can be browsed with OBO-Edit.
    • AP terms that make up SDAP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDAP.txt file. We recommend using these AP terms together with the domains they annotate in Domain2AP.txt. Unlike the full AP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDAP for each of the two AP sub-ontologies (PAN and PDE) at each of the two SCOP domain levels (FA and SF).
    Domain2EC mapping results
    • High-coverage domain-centric EC annotations are available in the Domain2EC.txt file.

    • EC terms that make up SDEC (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDEC.txt file. We recommend using these EC terms together with the domains they annotate in Domain2EC.txt. Unlike the full EC hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDEC for each of the two SCOP domain levels (FA and SF).
    Domain2DB mapping results
    • High-coverage domain-centric DB annotations are available in the Domain2DB.txt file.

    • DB terms that make up SDDB (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDDB.txt file. We recommend using these DB terms together with the domains they annotate in Domain2DB.txt. Unlike the full DB hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDDB for each of the two SCOP domain levels (FA and SF).
    Domain2KW mapping results
    • High-coverage domain-centric KW annotations are available in the Domain2KW.txt file.

    • KW terms that make up SDKW (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDKW.txt file. We recommend using these KW terms together with the domains they annotate in Domain2KW.txt. Unlike the full KW hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDKW for each of the two SCOP domain levels (FA and SF).
    Domain2UP mapping results
    • High-coverage domain-centric UP annotations are available in the Domain2UP.txt file.

    • UP terms that make up SDUP (four levels: least informative, moderately informative, informative and highly informative) are listed in the SDUP.txt file. We recommend using these UP terms together with the domains they annotate in Domain2UP.txt. Unlike the full UP hierarchy, these terms, at different levels of granularity, are representative and comprehensive with respect to their relevance to domains (not proteins). Note that there is an SDUP for each of the two SCOP domain levels (FA and SF).
Domain2PO mapping MySQL tables
    The four tables below (provided in Domain2PO.sql.gz) store the Domain2PO mapping results described above:

    PO_info: containing info about PO terms.
        > DESC PO_info;
        +------------+---------------------+------+-----+---------+-------+
        | Field      | Type                | Null | Key | Default | Extra |
        +------------+---------------------+------+-----+---------+-------+
        | obo        | char(2)             | NO   | PRI | NULL    |       |
        | po         | varchar(20)         | NO   | PRI | NULL    |       |
        | namespace  | varchar(50)         | NO   |     | NULL    |       |
        | name       | varchar(255)        | NO   | MUL | NULL    |       |
        | synonym    | text                | YES  |     | NULL    |       |
        | definition | text                | YES  |     | NULL    |       |
        | distance   | tinyint(3) unsigned | NO   |     | NULL    |       |
        +------------+---------------------+------+-----+---------+-------+
        
    • The obo column indicates the type of PO, one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
    • The po column is the corresponding PO id. It is browsable via PO Hierarchy.
    • The namespace column gives the sub-ontology to which the PO term belongs (for example, MI, ON or PA for HP), or root otherwise.
    • The name column shows the full name of PO terms.
    • The synonym column is the synonym of PO terms.
    • The definition column is the definition of PO terms.
    • The distance column shows the distance of the PO term from the root of its sub-ontology.

    PO_hie: containing info about PO hierarchy.
        > DESC PO_hie;
        +----------+---------------------+------+-----+---------+-------+
        | Field    | Type                | Null | Key | Default | Extra |
        +----------+---------------------+------+-----+---------+-------+
        | obo      | char(2)             | NO   | PRI | NULL    |       |
        | parent   | varchar(20)         | NO   | PRI | NULL    |       |
        | child    | varchar(20)         | NO   | PRI | NULL    |       |
        | distance | tinyint(3) unsigned | NO   | PRI | NULL    |       |
        +----------+---------------------+------+-----+---------+-------+
        
    • The obo column indicates the type of PO, one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
    • The parent column is the parental PO id.
    • The child column is the child PO id.
    • The distance column shows the distance from the parental PO id to the child PO id: 1 for a direct parent-child relationship, while larger values indicate that a path exists between them (reachable but indirect).
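Because PO_hie stores indirect (reachable) pairs as well as direct ones, all ancestors of a term can be retrieved with one query instead of a recursive walk. The sketch below uses Python's built-in sqlite3 with a simplified, in-memory copy of the schema and made-up term ids. Since distance is part of the primary key, a pair may presumably appear at several distances when there are multiple paths, so the query keeps the shortest one. We have not run this SELECT against the real MySQL tables.

```python
import sqlite3

# In-memory stand-in for the MySQL PO_hie table (types simplified for SQLite).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PO_hie (obo TEXT, parent TEXT, child TEXT, distance INTEGER,"
    " PRIMARY KEY (obo, parent, child, distance))"
)
# Hypothetical toy hierarchy T:1 -> T:2 -> T:3, stored with its indirect pair.
conn.executemany(
    "INSERT INTO PO_hie VALUES (?, ?, ?, ?)",
    [("MP", "T:1", "T:2", 1), ("MP", "T:2", "T:3", 1), ("MP", "T:1", "T:3", 2)],
)

def ancestors(conn, obo, term, direct_only=False):
    """Ancestors of `term` with their shortest distance, nearest first."""
    sql = ("SELECT parent, MIN(distance) AS d FROM PO_hie"
           " WHERE obo = ? AND child = ?")
    if direct_only:
        sql += " AND distance = 1"
    # GROUP BY collapses a parent stored at several distances into one row.
    sql += " GROUP BY parent ORDER BY d, parent"
    return conn.execute(sql, (obo, term)).fetchall()

print(ancestors(conn, "MP", "T:3"))                    # [('T:2', 1), ('T:1', 2)]
print(ancestors(conn, "MP", "T:3", direct_only=True))  # [('T:2', 1)]
```

Swapping `parent` and `child` in the SELECT and WHERE clauses would give descendants in the same way.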

    PO_mapping: containing info about Domain2PO annotations.
        > DESC PO_mapping;
        +----------------+---------------------------+------+-----+---------+-------+
        | Field          | Type                      | Null | Key | Default | Extra |
        +----------------+---------------------------+------+-----+---------+-------+
        | id             | mediumint(8) unsigned     | NO   | PRI | NULL    |       |
        | level          | enum('cl','cf','sf','fa') | NO   |     | NULL    |       |
        | obo            | char(2)                   | NO   |     | NULL    |       |
        | po             | varchar(20)               | NO   | PRI | NULL    |       |
        | all_score      | double                    | NO   |     | 1       |       |
        | inherited_from | text                      | YES  |     | NULL    |       |
        +----------------+---------------------------+------+-----+---------+-------+
        
    • The id column is the SCOP unique identifier (sunid). It is browsable via SCOP Hierarchy.
    • The level column gives the level in the SCOP hierarchy: 'cl' for class, 'cf' for fold, 'sf' for superfamily, or 'fa' for family.
    • The obo column indicates the type of PO, one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
    • The po column is the corresponding PO id.
    • The all_score column is the FDR computed using all longest-transcript human genes/proteins (including multidomain proteins).
    • The inherited_from column marks the status of a predicted Domain2PO annotation: 1) 'directed' (i.e., all_score < 0.001) means the annotation is significantly supported by all longest-transcript human genes/proteins (including multidomain proteins); 2) a comma-separated list of PO ids (all_score not less than 0.001) means the annotation is inherited, under the true-path rule in the DAG, from the listed descendant PO terms that are significantly associated; 3) it is empty otherwise. Hence, the full list of Domain2PO annotations can be obtained by selecting the rows whose inherited_from column is not empty.
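Selecting the Domain2PO annotations of one domain by their non-empty inherited_from value can be sketched with sqlite3 as below. The schema is simplified from the DESC output, and the sunid 1001 and the term ids are hypothetical. The query treats both NULL and the empty string as "empty", because we do not know which of the two the dump uses.

```python
import sqlite3

# In-memory stand-in for the MySQL PO_mapping table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PO_mapping (id INTEGER, level TEXT, obo TEXT, po TEXT,"
    " all_score REAL DEFAULT 1, inherited_from TEXT, PRIMARY KEY (id, po))"
)
conn.executemany(
    "INSERT INTO PO_mapping VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1001, "sf", "MP", "T:3", 1e-5, "directed"),  # significant itself
        (1001, "sf", "MP", "T:1", 0.2, "T:3"),        # inherited from T:3
        (1001, "sf", "MP", "T:9", 0.6, None),         # not a Domain2PO entry
    ],
)

def domain2po(conn, sunid, obo):
    """Domain2PO annotations (direct and inherited) of one SCOP domain."""
    return conn.execute(
        "SELECT po, inherited_from FROM PO_mapping"
        " WHERE id = ? AND obo = ?"
        " AND inherited_from IS NOT NULL AND inherited_from <> ''"
        " ORDER BY po",
        (sunid, obo),
    ).fetchall()

print(domain2po(conn, 1001, "MP"))  # [('T:1', 'T:3'), ('T:3', 'directed')]
```

Adding `AND inherited_from = 'directed'` would restrict the result to the directly supported annotations.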

    PO_ic: containing info about SDPO.
        > DESC PO_ic;
        +---------+---------------------------+------+-----+---------+-------+
        | Field   | Type                      | Null | Key | Default | Extra |
        +---------+---------------------------+------+-----+---------+-------+
        | level   | enum('cl','cf','sf','fa') | NO   | PRI | NULL    |       |
        | obo     | char(2)                   | NO   |     | NULL    |       |
        | po      | varchar(20)               | NO   | PRI | NULL    |       |
        | ic      | double                    | YES  |     | NULL    |       |
        | include | tinyint(2)                | YES  | MUL | NULL    |       |
        +---------+---------------------------+------+-----+---------+-------+
        
    • The level column gives the level in the SCOP hierarchy: 'cl' for class, 'cf' for fold, 'sf' for superfamily, or 'fa' for family.
    • The obo column indicates the type of PO, one of 'DO' for 'Disease Ontology', 'HP' for 'Human Phenotype', 'MP' for 'Mouse Phenotype', 'WP' for 'Worm Phenotype', 'YP' for 'Yeast Phenotype', 'FP' for 'Fly Phenotype', 'FA' for 'Fly Anatomy', 'ZA' for 'Zebrafish Anatomy', 'AP' for 'Arabidopsis Plant'.
    • The po column is the corresponding PO id.
    • The ic column shows the information content of the PO term.
    • The include column indicates whether or not the PO term belongs to the SDPO. If the column is set to '0' then it is not a member of SDPO. Otherwise, '1' for least informative (i.e., the most general), '2' for moderately informative, '3' for informative, '4' for highly informative (i.e., the most specific).
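Pulling out the SDPO terms at one informativeness level can be sketched with sqlite3 and a simplified copy of PO_ic. The rows are made up. The column name include is quoted defensively in case it clashes with a keyword in some SQL dialect (MySQL would use backticks).

```python
import sqlite3

SDPO_LEVELS = {1: "least informative", 2: "moderately informative",
               3: "informative", 4: "highly informative"}

# In-memory stand-in for the MySQL PO_ic table (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE PO_ic (level TEXT, obo TEXT, po TEXT, ic REAL, "include" INTEGER,'
    " PRIMARY KEY (level, po))"
)
conn.executemany(
    "INSERT INTO PO_ic VALUES (?, ?, ?, ?, ?)",
    [
        ("sf", "MP", "T:1", 0.52, 1),
        ("sf", "MP", "T:2", 1.03, 2),
        ("sf", "MP", "T:5", 1.40, 0),  # include = 0: not part of SDPO
        ("fa", "MP", "T:2", 0.98, 2),
    ],
)

def sdpo_terms(conn, level, obo, include):
    """PO terms at one SDPO level (include = 1..4) for a SCOP level and PO type."""
    return conn.execute(
        'SELECT po, ic FROM PO_ic WHERE level = ? AND obo = ? AND "include" = ?'
        " ORDER BY ic, po",
        (level, obo, include),
    ).fetchall()

for inc, label in SDPO_LEVELS.items():
    print(label, sdpo_terms(conn, "sf", "MP", inc))
```

Joining the result with PO_mapping on po (and with level matching) would list the domains annotated by each selected term.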


References


  • Andreeva, A., Howorth, D., Chandonia, J.M., Brenner, S.E., Hubbard, T.J., Chothia, C. and Murzin, A.G. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res, 36, D419-425.
  • Bairoch, A., Apweiler, R., et al. (2005) The Universal Protein Resource (UniProt). Nucleic Acids Res, 33, D154-159.
  • Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological), 57, 289-300.
  • Bowes, J.B., Snyder, K.A., Segerdell, E., Jarabek, C.J., Azam, K., Zorn, A.M. and Vize, P.D. (2009) Xenbase: gene expression and improved integration. Nucleic Acids Res, 38, D607-612.
  • Bradford, Y., Conlin, T., Dunn, N., et al. (2011) ZFIN: enhancements and updates to the Zebrafish Model Organism Database. Nucleic Acids Res, 39, D822-829.
  • Engel, S.R., Balakrishnan, R., Binkley, G., et al. (2010) Saccharomyces Genome Database provides mutant phenotype data. Nucleic Acids Res, 38, D433-436.
  • Fleischmann, A., Darsow, M., Degtyarenko, K., Fleischmann, W., Boyce, S., Axelsen, K.B., Bairoch, A., Schomburg, D., Tipton, K.F. and Apweiler, R. (2004) IntEnz, the integrated relational enzyme database. Nucleic Acids Res, 32, D434-437.
  • Gough, J. (2006) Genomic scale sub-family assignment of protein domains. Nucleic Acids Res, 34, 3625-3633.
  • Grumbling, G. and Strelets, V. (2006) FlyBase: anatomical data, images and queries. Nucleic Acids Res, 34, D484-488.
  • Ilic, K., Kellogg, E.A., Jaiswal, P., et al. (2006) The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant. Plant Physiol, 143, 587-599.
  • Knox, C., Law, V., et al. (2011) DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res, 39, D1035-1041.
  • Morgat, A., Coissac, E., et al. (2012) UniPathway: a resource for the exploration and annotation of metabolic pathways. Nucleic Acids Res, 40, D761-769.
  • Osborne, J.D., Flatow, J., Holko, M., Lin, S.M., Kibbe, W.A., Zhu, L.J., Danila, M.I., Feng, G. and Chisholm, R.L. (2009) Annotating the human genome with Disease Ontology. BMC Genomics, 10 (Suppl 1), S6.
  • Pujar, A., Jaiswal, P., Kellogg, E.A., et al. (2006) Whole-plant growth stage ontology for angiosperms and its application in plant biology. Plant Physiol, 142, 414-428.
  • Robinson, P.N., Kohler, S., Bauer, S., Seelow, D., Horn, D. and Mundlos, S. (2008) The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet, 83, 610-615.
  • Schindelman, G., Fernandes, J.S., Bastiani, C.A., Yook, K. and Sternberg, P.W. (2011) Worm Phenotype Ontology: integrating phenotype data within and beyond the C. elegans community. BMC Bioinformatics, 12, 32.
  • Schriml, L.M., Arze, C., Nadendla, S., et al. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res, 40, D940-946.
  • Smith, C.L. and Eppig, J.T. (2009) The Mammalian Phenotype Ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip Rev Syst Biol Med, 1, 390-399.