Last Updated: 2007/08/15 02:56:42

About WoLF PSORT

WoLF PSORT predicts the subcellular localization sites of proteins from their amino acid sequences. The method, a major extension of the venerable PSORTII program, bases its predictions on both known sorting signal motifs and correlative sequence features such as amino acid content. Like PSORT and PSORTII, WoLF PSORT displays information about the sorting signals it detects, which helps users judge the reliability of a prediction in specific cases. Our experiments (presented at APBC06) show that the overall prediction accuracy of WoLF PSORT is over 80%. For common localization sites (e.g. cytosol, nucleus, mitochondria), WoLF PSORT predicts better than a majority classifier even for queries that have no strong sequence similarity to any sequence in the dataset. WoLF PSORT is therefore a useful complement to tools such as BLAST.

The current dataset used to train WoLF PSORT contains over 12,000 animal sequences and more than 2,000 sequences each for plants and fungi. It was gathered mainly from UniProt, but several hundred Arabidopsis thaliana sequences from the Gene Ontology database were also included.
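As a rough illustration of two ideas in the description above, the Python sketch below computes amino acid content as a simple feature and the accuracy that a majority classifier (one that always predicts the most common site) would reach on a set of labels. It is not WoLF PSORT's code; the function names and the toy inputs are assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_content(sequence):
    """Return the fraction of each standard amino acid in a sequence.

    Characters outside the 20 standard letters are ignored
    (an assumption of this sketch).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def majority_classifier_accuracy(true_sites):
    """Accuracy obtained by always predicting the most frequent site."""
    if not true_sites:
        raise ValueError("true_sites must not be empty")
    most_common_count = Counter(true_sites).most_common(1)[0][1]
    return most_common_count / len(true_sites)


if __name__ == "__main__":
    # Hypothetical toy inputs, not taken from the WoLF PSORT dataset.
    print(amino_acid_content("MKKA")["K"])
    print(majority_classifier_accuracy(["nucl", "nucl", "cyto", "mito"]))
```

A predictor is only informative for a site if it beats this majority baseline on sequences it has not effectively seen before, which is the comparison the paragraph above refers to.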

Citation

  • Server:

    Paul Horton, Keun-Joon Park, Takeshi Obayashi, Naoya Fujita, Hajime Harada, C.J. Adams-Collier, & Kenta Nakai,
    "WoLF PSORT: Protein Localization Predictor",
    Nucleic Acids Research, doi:10.1093/nar/gkm259, 2007.
  • Prediction Method:

    Paul Horton, Keun-Joon Park, Takeshi Obayashi & Kenta Nakai,
    "Protein Subcellular Localization Prediction with WoLF PSORT",
    Proceedings of the 4th Annual Asia Pacific Bioinformatics Conference APBC06, Taipei, Taiwan. pp. 39-48, 2006.

    Developers

    WoLF PSORT is being developed by

    Dataset

    The dataset is based mainly on annotation from UniProt and Gene Ontology (GO). The table below maps our localization site definitions to GO cellular components. However, many of our entries are based solely on keywords in the UniProt "Subcellular Localization" field, and in some of these cases the site assignment may not be completely consistent with the GO cellular component annotation.

    Localization Sites and corresponding GO cellular components.
    Abbrev  Localization Site      GO Cellular Component
    chlo    chloroplast            0009507, 0009543
    cyto    cytosol                0005829
    cysk    cytoskeleton           0005856(2)
    E.R.    endoplasmic reticulum  0005783
    extr    extracellular          0005576, 0005618
    golg    Golgi apparatus        0005794(1)
    lyso    lysosome               0005764
    mito    mitochondria           0005739
    nucl    nuclear                0005634
    pero    peroxisome             0005777(2)
    plas    plasma membrane        0005886
    vacu    vacuolar membrane      0005774(2)

    The abbreviation, localization site, and corresponding GO cellular component(s) are given for each localization site. A number in parentheses, as in "0005856(2)", indicates that descendant "part_of" cellular components were also included, down to the specified depth (2 in this case). For example, all of the children and grandchildren of "GO:0005856" were included as "cysk".
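The depth notation described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used to build the dataset: the function names, the toy `part_of` graph, and its placeholder child names are all hypothetical, and the sketch assumes that an entry without parentheses means depth 0 (the term itself only).

```python
import re


def parse_entry(entry):
    """Split a table entry such as "0005856(2)" into (GO id, depth).

    An entry without a parenthesized number is assumed to mean depth 0.
    """
    match = re.fullmatch(r"(\d{7})(?:\((\d+)\))?", entry.strip())
    if match is None:
        raise ValueError(f"unrecognized entry: {entry!r}")
    go_number, depth = match.groups()
    return "GO:" + go_number, int(depth) if depth else 0


def descendants_up_to_depth(children, root, max_depth):
    """Return root plus its descendants at most max_depth levels below it.

    `children` maps a term to the terms that are "part_of" it.
    """
    included = {root}
    frontier = [root]
    for _ in range(max_depth):
        next_frontier = []
        for term in frontier:
            for child in children.get(term, []):
                if child not in included:
                    included.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return included


# Hypothetical toy graph; the child names are placeholders, not real GO terms.
TOY_CHILDREN = {
    "GO:0005856": ["child-a", "child-b"],
    "child-a": ["grandchild-a"],
    "grandchild-a": ["great-grandchild-a"],
}

if __name__ == "__main__":
    term, depth = parse_entry("0005856(2)")
    print(sorted(descendants_up_to_depth(TOY_CHILDREN, term, depth)))
```

With depth 2, the children and grandchildren are included and the great-grandchild is not, which matches the "cysk" example in the text.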

    Stand-alone Package

    Version 0.2 of the WoLF PSORT package was released in September 2006. It is free for academic use, and it is also relatively easy for industrial users to use. Please see the package documentation for details.

    Prediction Accuracy by Localization Site

    The accuracy varies greatly between localization sites; the general trend is that sites with few UniProt-annotated proteins are seldom predicted correctly. On a separate "localization accuracy by utility" page, we have compiled some statistics to help quantify this variation.
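One simple way to look at accuracy per localization site is to compute, for each annotated site, the fraction of its proteins that were predicted correctly. The Python sketch below does this; it is only an assumed illustration of such a statistic, not the measure reported on the separate page, and the function name and toy labels are hypothetical.

```python
from collections import defaultdict


def accuracy_by_site(true_sites, predicted_sites):
    """Return, for each true site, the fraction of its proteins predicted correctly."""
    if len(true_sites) != len(predicted_sites):
        raise ValueError("true_sites and predicted_sites must have equal length")
    totals = defaultdict(int)
    correct = defaultdict(int)
    for true_site, predicted_site in zip(true_sites, predicted_sites):
        totals[true_site] += 1
        if true_site == predicted_site:
            correct[true_site] += 1
    return {site: correct[site] / totals[site] for site in totals}


if __name__ == "__main__":
    # Hypothetical toy labels, not WoLF PSORT results.
    true_sites = ["nucl", "nucl", "cyto", "pero"]
    predicted_sites = ["nucl", "cyto", "cyto", "cyto"]
    for site, accuracy in sorted(accuracy_by_site(true_sites, predicted_sites).items()):
        print(site, accuracy)
```

In this toy input the rare site "pero" is never predicted, so its per-site accuracy is 0 even though overall accuracy is 50%, which is the kind of effect the paragraph above describes.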

    What's in a name

    "WoLF" does not necessarily stand for anything. A rather dramatic mnemonic would be "Where Life Functions". Originally it was going to be "Learned Weight Features", but I wanted the acronym to be a pronounceable English word. Women only Love Fools.

    Acknowledgements


    Copyright (C) National Institute of Advanced Industrial Science and Technology (AIST), Computational Biology Research Center (CBRC). All Rights Reserved.