Protein Annotation Tools
Brian Tam
UNDER ACTIVE DEVELOPMENT!!
15 Aug 00 (v. 0.7.4)
Contact rik@cs.ucsd.edu
for its current status.
The goal of "Annotated BLAST" (AnnBlast) search is to exploit
annotation relations connecting the biomedical
literature to (gene and protein) sequence
databases, both to discover new patterns in the data and to gain a
deeper appreciation of what the texts mean. For example, newly
discovered homologies between proteins can mean that biologists
working on entirely different organisms and systems may have
actually been using two different vocabularies to describe
similar phenomena. New knowledge can be gained by recognizing
implicit connections that have previously gone unnoticed. By
establishing interrelationships among bits of information
scattered throughout the literature, we hope to supplement
routine protein alignment/literature searches with
identification of novel associations that might be of interest
to workers in the field.
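The idea of following annotation links across homologous sequences can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the AnnBlast implementation: the protein identifiers, document identifiers, and the function name `novel_literature` are all hypothetical, and the homology pairs stand in for whatever a sequence similarity search would return.

```python
from collections import defaultdict


def novel_literature(query, homologs, annotations):
    """Return documents that annotate homologs of `query` but not `query` itself.

    homologs:    iterable of (protein_a, protein_b) pairs, treated as symmetric.
    annotations: dict mapping a protein id to the set of document ids citing it.
    Both structures are hypothetical stand-ins for real database links.
    """
    # Build a symmetric neighbor map from the homology pairs.
    neighbors = defaultdict(set)
    for a, b in homologs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    known = annotations.get(query, set())
    suggestions = defaultdict(set)  # document id -> homologs that link to it
    for homolog in neighbors.get(query, set()):
        for doc in annotations.get(homolog, set()) - known:
            suggestions[doc].add(homolog)
    return dict(suggestions)


# Hypothetical data: P1 and P2 are homologous, as are P2 and P3.
homologs = [("P1", "P2"), ("P2", "P3")]
annotations = {"P1": {"doc_a"}, "P2": {"doc_a", "doc_b"}, "P3": {"doc_c"}}
result = novel_literature("P1", homologs, annotations)
```

Here only direct homologs are followed; a document already attached to the query protein is not suggested again, so the result contains just the literature that a routine search on the query alone would miss.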
We are actively developing software that exploits known
similarities among sequences and among texts, and then uses
annotations between these two different types of data as input
to adaptive mechanisms that respond to browsing users'
"relevance feedback." (A VERY preliminary version of the AnnBlast
interface is available.) In preparation we have attempted to
survey closely-related resources already existing on the WWW.
We have focused in particular on taxonomic classifications, both
as applied to sequence information and to the related
literatures. We hope the resulting list of bookmarks may also
be of use to others.
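One classical way to respond to relevance feedback is the Rocchio update, which moves a query's term-weight vector toward documents the user marked relevant and away from those marked non-relevant. The sketch below uses it purely as a stand-in: nothing here says which adaptive mechanism AnnBlast actually uses, and the weights `alpha`, `beta`, `gamma` and all term names are assumptions chosen for illustration.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style update on sparse term-weight vectors (dict: term -> weight).

    The default coefficients are illustrative assumptions, not tuned values.
    """
    def centroid(docs):
        # Average the term weights over a list of document vectors.
        total = {}
        for doc in docs:
            for term, weight in doc.items():
                total[term] = total.get(term, 0.0) + weight
        return {t: w / len(docs) for t, w in total.items()} if docs else {}

    pos, neg = centroid(relevant), centroid(nonrelevant)
    updated = {}
    for term in set(query) | set(pos) | set(neg):
        weight = (alpha * query.get(term, 0.0)
                  + beta * pos.get(term, 0.0)
                  - gamma * neg.get(term, 0.0))
        if weight > 0:  # non-positive weights are dropped from the query
            updated[term] = weight
    return updated


# Hypothetical feedback: one document marked relevant, one non-relevant.
query = {"kinase": 1.0}
relevant = [{"kinase": 1.0, "phosphorylation": 1.0}]
nonrelevant = [{"homeobox": 1.0}]
new_query = rocchio(query, relevant, nonrelevant)
```

After the update the query keeps its original term, gains the term that appeared only in the relevant document, and contains nothing from the non-relevant one.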
These references are organized below into two groups, the first
concerning macromolecule (sequence) data, and the second concerning
related lexical (literature) data sources.
Macromolecular Analysis
Sequences
Classification tools
- MIPS:
Protein Classification
-
MIPS = Munich Information Center for Protein Sequences
This tool categorizes proteins in the PIR (Protein Information Resource)-International
Protein Sequence Database by sequence homology.
- GeneFIND
Family Identification System Home
-
GeneFIND = Gene Family Identification Network Design
Georgetown University
Dr. Cathy Wu
- ProClass Database
Home
-
Georgetown University
Dr. Cathy Wu
- ProtoMap
-
Hebrew University
Authors: Golan Yona et al.
An automatic hierarchical classification of all
SWISSPROT proteins.
- ProtoMap @ Stanford
automatic hierarchical classification of proteins
- ProteinInfo
-
Proteometrics, LLC,
New York, NY
A set of databases and tools for analyzing protein mass spectrometry data.
Search tools
Structure
Non-proteins
RNA
- The RNA World at IMB Jena
-
IMB = Institut für Molekulare Biotechnologie
Jürgen Sühnel
List of links.
- Bacterial RNase P RNA sequences
-
NC State University
James W. Brown
Ribonuclease P RNA is a ribozyme, RNA that is catalytically active.
In this case, it cleaves other RNA in a final processing step.
Viruses
- Virology Information
-
SCIENCE.ORG™ Virology Laboratory
- Viral Classification
and Replication
Peptides
Proteins
- EBI:
FSSP database, fold classification based on structure-structure alignment
of proteins
-
EBI = European Bioinformatics Institute, FSSP = Fold classification based
on Structure-Structure alignment of Proteins
European Molecular Biology Laboratory (EMBL), Heidelberg
L. Holm
- 3Dee - Database
of Protein Domain Definitions
-
3Dee = A Database of Protein Domain Definitions
Laboratory of Molecular Biophysics, Oxford, UK;
EMBL - European Bioinformatics Institute, Cambridge, UK
Authors: Asim S. Siddiqui, Uwe Dengler, Geoffrey J. Barton
- SCOP: Structural Classification of
Proteins
-
MRC Laboratory of Molecular Biology and Centre for Protein Engineering,
Cambridge, England
Authors: Alexey G. Murzin et al.
Function
General
- DEAMBULUM
: Protein families
-
INFOBIOGEN - Université René Descartes
List of hyperlinks on proteins organized according to structure, activity,
and biological function. Even amino acid peptides considered too small
to be proteins have pages linked here.
- ExPASy Molecular Biology Server
-
ExPASy = Expert Protein Analysis System
Swiss Institute of Bioinformatics (SIB)
- Additional Protein Resource
Sites
-
proWeb project, a WWW-based approach to protein family documentation
blocks.fhcrc.org
Contains links to pages listing proteins belonging to a specific classification;
e.g., a function like ATPases, or a domain like homeoboxes.
Enzymes
- ExPASy - ENZYME
-
ExPASy = Expert Protein Analysis System
Swiss Institute of Bioinformatics (SIB)
Kinases
- Protein Kinase Resource
-
San Diego Supercomputer Center
Transcription Factors
- TRANSFAC - The Transcription Factor Database
-
Center of Bioinformatics
Peking University
Transport proteins
- Transport
Protein Overview
-
Department of Biology, University of California, San Diego
Authors: Milton Saier, Ian Paulsen
Cf. Organism/General/Transport Protein Overview
Mitochondrial proteins
- MITOP -
Home
-
MITOP = MITOchondria Project
Collaboration of several German institutions, including MIPS (Munich
Information Center for Protein Sequences)
Bacterial proteins
- COG
-
COGs = Clusters of Orthologous Groups of proteins
National Center for Biotechnology Information (NCBI)
Phylogenetic classification of proteins encoded
in complete genomes
Cf. Organism/Bacteria/COG
Homeobox Genes
-
Homeobox genes mainly encode DNA-binding proteins related to one another by
a conserved DNA motif, the "homeobox".
The homeobox page
-
Biozentrum of the University of Basel, Switzerland
Thomas R. Bürglin
An update on a book, plus references to the latest papers on the topic.
There is also a relationship tree, plus links to other homeobox pages.
Human Major Histocompatibility Complex
- IMGT/HLA Database Nomenclature
Guidelines
-
IMGT = the international ImMunoGeneTics database, HLA = Human Leucocyte
Antigens
Centre Informatique National de l'Enseignement Supérieur (CINES),
Montpellier, France
Home page is IMGT/HLA Database.
Mixed Bags
-
This folder has pages about proteins from various classification groups,
not just one, though the groups may be inter-related somehow.
Introduction
-
PROLYSIS, a protease and protease inhibitor Web server
University of Tours, France
Creator: Dr. Thierry Moreau, Laboratory of Enzymology and Protein Chemistry,
University François Rabelais, Tours, France
Proteases, proteinases, and peptidases galore! Introduction page to these
classes of proteins.
Organism
General
- Transport Protein
Analysis
-
Milton Saier
Department of Biology, University of California, San Diego
Ian Paulsen
Cf. Function/Transport
proteins/Transport Protein Overview
- Welcome to MIPS
-
Munich Information Center for Protein Sequences
Bacteria
- COG
-
Clusters of Orthologous Groups of proteins
National Center for Biotechnology Information (NCBI)
Cf. Function/Bacterial proteins/COG
Yeast (S. cerevisiae)
- Sacch3D Home
-
an extension of the Saccharomyces Genome Database™
Stanford University
Steve A. Chervitz
- S. cerevisiae
Protein Kinases
-
Protein Kinase Resource
San Diego Supercomputer Center
Tony Hunter, Gregory D. Plowman
Worm (C. elegans)
- Caenorhabditis elegans WWW Server
-
Fly (Drosophila)
- FlyBase
-
Indiana University
A database of the Drosophila Genome
Human
- OMIM Home Page -- Online Mendelian Inheritance in Man
-
National Center for Biotechnology Information (NCBI)
Dr. Victor A. McKusick et al., Johns Hopkins University
Links genes/proteins to diseases.
- GeneCards: human genes, maps, proteins and diseases (Weizmann)
-
Crown Human Genome Center and Bioinformatics Unit
Weizmann Institute of Science, Israel
Lexical Resources
The following resources have been organized according to the
amount of semantic rigor with which they attempt to define their
terms. Ontologies are most ambitious, defining concepts in
terms of concrete attributes with well-defined logical relations
among them. Systematics refers to pre-genomic classification
systems that have been used to organize biological species. It is
usually divided into two fields: phylogenetics, which deals with the
relationships between organisms, and taxonomy, which names and
classifies organisms. A
nomenclature is a set of rules for naming objects -- e.g.,
proteins here -- according to a certain classification.
Thesauri organize vocabularies using broader/narrower-term,
related-term and preferred-term relationships. Dictionaries
provide natural language definitions for individual terms.
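To make the thesaurus relationships concrete, here is a minimal Python sketch of entries connected by broader-term, narrower-term, related-term, and preferred-term links. The class names, field names, and example terms are illustrative assumptions, not taken from any of the resources listed below.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    name: str
    use: Optional[str] = None  # preferred term, if this entry is a variant
    broader: Set[str] = field(default_factory=set)
    narrower: Set[str] = field(default_factory=set)
    related: Set[str] = field(default_factory=set)


class Thesaurus:
    def __init__(self):
        self.terms: Dict[str, Term] = {}

    def _get(self, name):
        # Create the entry on first use.
        return self.terms.setdefault(name, Term(name))

    def add_broader(self, narrow, broad):
        # Broader/narrower links are two views of the same relation.
        self._get(narrow).broader.add(broad)
        self._get(broad).narrower.add(narrow)

    def add_related(self, a, b):
        self._get(a).related.add(b)
        self._get(b).related.add(a)

    def add_variant(self, variant, preferred):
        self._get(preferred)
        self._get(variant).use = preferred

    def resolve(self, name):
        # Map a variant to its preferred term; preferred terms map to themselves.
        term = self.terms[name]
        return term.use or term.name

    def ancestors(self, name):
        # Collect every term reachable through broader-term links.
        seen, stack = set(), [self.resolve(name)]
        while stack:
            for broad in self.terms[stack.pop()].broader:
                if broad not in seen:
                    seen.add(broad)
                    stack.append(broad)
        return seen


# Hypothetical entries.
t = Thesaurus()
t.add_broader("protein kinase", "enzyme")
t.add_broader("enzyme", "protein")
t.add_related("protein kinase", "phosphorylation")
t.add_variant("kinase, protein", "protein kinase")
```

With these entries, `t.resolve("kinase, protein")` gives the preferred form, and `t.ancestors("kinase, protein")` walks the broader-term chain from that preferred form upward.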
Ontologies
- Gene Ontology
Consortium
-
Collaborative effort to unite terminologies across yeast, fly
and mouse
Michael Ashburner (EBI)
Suzanna Lewis (UCB)
Mike Cherry (Stanford)
Judy Blake (JAX)
- Distributed
Annotation System
-
(Not really an ontology, but...) An emerging effort to
coordinate the annotation activities across large groups
of individuals.
- ARROWSMITH: A MEDICAL DISCOVERY SUPPORT SYSTEM
-
University of Chicago
Swanson's ARROWSMITH for scientific discovery.
- AbXtract server
-
EMBL-European Bioinformatics Institute (EBI)
Cambridge, UK
Keyword extraction for protein annotation
Systematics
Taxonomy
- Taxonomy
on the Web
- IWR:
Taxon Pages
-
IWR = Ichthyology Web Resources
Department of Biological Sciences, University of Alberta,
Canada
Keith L. Jackson
A classification of fishes. Go to Ichthyology
Web Resources for more ichthyology resource links.
- Some
Cephalopod Species
-
Dalhousie University, Halifax, Nova Scotia, Canada
James B. Wood
A classification of cephalopods (octopi,
squids, and other tentacled beasties!). Go to The
Cephalopod Page; Octopuses, Squid, Cuttlefish, and
Nautilus for more resource links.
Phylogenetics
- TreeBASE
-
Harvard University Herbaria
A relational database of phylogenetic information.
Builds phylogenetic trees for a query organism.
Nomenclature
Broad, general vocabularies
- nomlist.txt
-
Expert Protein Analysis System (ExPASy)
Swiss Institute of Bioinformatics (SIB)
- Human
Gene Nomenclature
-
HUGO Gene Nomenclature Committee (HGNC)
University College London
- Biochemical
Nomenclature Committees
-
International Union of
Pure and Applied Chemistry (IUPAC)
International Union
of Biochemistry and Molecular Biology (IUBMB)
IUPAC-IUBMB Joint Commission on Biochemical Nomenclature
(JCBN)
Nomenclature Committee of the IUBMB (NC-IUBMB)
Department of Chemistry, Queen Mary and Westfield
College, London, UK
Specific vocabularies
All pages herein are devoted to nomenclatures for
particular classes of proteins, rather than being
comprehensive.
- MT
page: classification
-
Universität Zürich
Pierre-Alain Binz, J.H.R. Kägi
Metallothioneins.
- Introduction
to Ski and Sno gene family
-
Pearson-White
Laboratory, Health Sciences Center, Charlottesville,
Virginia
A lab page on Ski and Sno nomenclature.
- EC
nomenclature
-
PROLYSIS, a protease and protease inhibitor Web
server
University of Tours, France
Proteases, proteinases, and peptidases
- Nomenclature
conventions / listing of gene family search tag
names
-
Medical Research Council/University of Leicester,
Center for Mechanisms of Human Toxicity, UK
Ion channels
Proposed/Under
review
- Gene
Family Nomenclature
-
HUGO Gene Nomenclature Committee (HGNC)
University College London
Thesauri
- Help: Life
Sciences Thesaurus
-
Cambridge Scientific Abstracts, Bethesda, MD
Terms are hyperlinked alphabetically. A service of
Cambridge Scientific Abstracts.
- The CERES
Thesaurus Effort
-
CERES = California Environmental Resources Evaluation System
California/Federal government (National Biological Information
Infrastructure (NBII)) project to compile a
thesaurus and search tool for environmental science
terminology. Contains links to web pages with such
terms.
- OMNI: Organising Medical Networked
Information
-
Search tool for links to pages on medical
conditions. Each entry has a list of related
keywords.
Cf. Dictionaries/Medicine/OMNI:
...
- MeSH
Browser
-
MeSH = Medical Subject Headings
National Library of Medicine search tool that retrieves
relevant terms and references in hierarchical fashion for
a query.
Dictionaries
(Note: '(*)' means that many
definitions at the given site provide cross-reference
hyperlinks to other terms in the dictionary on the same
site.
'(#)' means that the pages in question
feature search engine-like tools to look up words in a
local database.)
General Biological
- BioTech's
Life Science Dictionary
-
Institute for Cellular and Molecular Biology
University of Texas
Austin, Texas
(#)
- BioABACUS
Search
-
BioABACUS = Biotechnology ABbreviation and ACronym Uncovering Service
Molecular Biology Program
New Mexico State University
Mendell Rimer, Mary O'Connell
Browse vocabulary list grouped by
biological subcategory. Also contains a search engine.
(*) (#)
- Harcourt:
AP Dictionary of Science and Technology: Life
Sciences
-
AP = Academic Press
Harcourt, Inc.
Vocab lists categorized by subfields.
Note: May be privately owned.
(#)
- Kimball's
Biology Pages
-
Dr. John W. Kimball, former Harvard lecturer
May be based on proprietary data.
- Glossaries,
dictionaries, terminology & acronyms
-
BIOSIS, Zoological Society of London
Comprehensive list of links to terminology in many
biological subfields.
- Contents
-
International Union of Pure and Applied Chemistry (IUPAC)
Department of Chemistry
Queen Mary and Westfield College
London, UK
Glossary of terms used in inorganic chemistry
- The Biospace Glossary: Defining the Words that Define Biotechnology
-
Biospace.com, San Francisco, CA
Research fields
Subfields within the biological sciences have
specialized vocabulary. If a web site contains one or
more of these, rather than try to be comprehensive, it
is put here.
- A
Hypermedia Glossary of Genetic Terms
-
Technische Universität München - Weihenstephan
Weihenstephan Information and Documentation Centre IDW
Freising, Germany
Birgid Schlindwein
Look up
terms alphabetically yourself. Definitions contain
related terms.
(*) (#)
- A
Genetics Glossary
-
Biology Teaching Organisation
Edinburgh School of Biology
The University of Edinburgh
Terms grouped and hyperlinked alphabetically.
- Glossary
-
Human Genome Management Information System
Oak Ridge National Laboratory
Denise Casey, Dan Jacobson
All terms listed alphabetically on one page.
- The Genomics
Lexicon
-
Pharmaceutical Research and Manufacturers of
America (PhRMA),
Foundation for Genetic Medicine, Inc. (FGM)
Mostly genetic terms here. A few
cross-references. Grouped and hyperlinked
alphabetically. Also links to other specialized
glossaries.
- Glossary
of Biochemistry and Molecular Biology
-
Portland Press
David M. Glick
You need to mark a letter and press "search" to extract
all terms beginning with that letter.
(#)
- Search
-
The Forsyth Institute, Boston, MA
Dr. Tsute Chen
A microbiology dictionary.
(*) (#)
- The PPS
Hyperglossary
-
A small glossary of protein/genetic
structure terms.
- Definitions
and Abbreviations
-
List of Bacterial Names with Standing in Nomenclature
Ecole Nationale Vétérinaire de Toulouse
Toulouse, France
J.P. Euzéby
Mostly bacteria-related.
Cf. List
of bacterial names with standing in nomenclature
for links to bacteria nomenclature
- Dictionary
of Epidemiology
-
University of Cambridge
Alphabetically listed terms from ecological
epidemiology.
(*)
- Dictionary
of Cell Biology
-
Cell and Molecular Biology degree course
Glasgow University
Julian Dow
May be a proprietary site.
(#)
- BIOTECHNOLOGY
DICTIONARY
-
Department of Crop and Soil Environmental Sciences
College of Agriculture and Life Sciences
Virginia Polytechnic Institute and State University
Blacksburg, Virginia
Susan Allender-Hagedorn and Charles Hagedorn
Agricultural and environmental biotechnology annotated
dictionary
- Visionary
- A Dictionary of terminology in vision research
-
Dr. Lars Lidén
Dept. of Cognitive and Neural Systems
Boston University
Vision research, including machine vision.
- OceanLink: An
Interactive Information Page for the Marine
Sciences
-
Bamfield Marine Station
British Columbia, Canada
A marine science information and interaction web
site. Has a link to a glossary hyperlinked
alphabetically.
- Glossary
of Microscopy Terms
-
Characterization Facility
University of Minnesota
Probably not strictly biological, but biologically
related, for sure. Has a really horrid frames
interface, unfortunately; otherwise, this would have
been quite a useful site.
Plants
- Aquatic,
Wetland and Invasive Plant Glossary Title Page and
Contents
-
University of Florida - IFAS
Fort Lauderdale Research and Education Center
Fort Lauderdale, FL
Dave L. Sutton, Ph.D.
- Centre
for Plant Biodiversity Research
-
Centre for Plant Biodiversity Research and Australian National Herbarium
Canberra, Australia
Links to on-line glossaries of Australian flora.
- CalFlora
-
CalFlora Database Project
Member: University of California, Berkeley, Digital Library Project
California flora, indexable by name.
(#)
- CAS
California Wildflowers
-
California Academy of Sciences
Common and Latin names of these flowers, plus families
Medicine
- On-line
Medical Dictionary
-
The Gray Laboratory Cancer Research Trust
Mount Vernon Hospital
Northwood, Middlesex, UK
(#)
- OnHealth:
Online Medical Dictionary
-
OnHealth Network Company
Terms hyperlinked alphabetically. Seems to be a copy of
On-line Medical Dictionary.
(#)
- Multilingual
Glossary of medical terms
-
Heymans Institute for Pharmacology, Medical School, University of Gent De Pintelaan,
and Mercator College, Department of Applied Linguistics
Gent, Belgium
Vocabularies in different languages.
- Pharmacology
Glossary
-
Department of Pharmacology and Experimental Therapeutics
Boston University
- OMNI:
Organising Medical Networked Information
-
OMNI / BIOME,
Greenfield Medical Library,
Queens Medical Centre,
Nottingham, UK
Search tool for links to pages on medical
conditions. Each entry has a list of related
keywords.
Cf. Thesauri/OMNI: ...
(#)
Generic language
References
Not specifically biology-related.
- ARTFL
Project: ROGET'S Thesaurus Search Form
-
ARTFL Project = Project for American and French Research
on the Treasury of the French Language
Division of the Humanities, University of Chicago
Director: Robert Morrissey
- WordNet
-
Cognitive Science Laboratory, Princeton University
An Electronic Lexical Database
- Eric Brill's tagger
-
Department of Computer Science
Johns Hopkins University
Part-of-speech taggers.
- Link Grammar
-
School of Computer Science, Carnegie Mellon University
Davy Temperley, Daniel Sleator, John Lafferty
The Link Grammar Parser (natural language parser)
- Rainbow
-
Department of Computer Science
Carnegie Mellon University
Andrew McCallum's package for text classification.
Last modified by: rik@cs.ucsd.edu 15 Aug
00