Protein Annotation Tools

Richard K. Belew

Brian Tam

UNDER ACTIVE DEVELOPMENT!!
15 Aug 00 (v. 0.7.4)

Contact rik@cs.ucsd.edu for its current status.


The goal of "Annotated BLAST" (AnnBlast) search is to exploit annotation relations connecting the biomedical literature to (gene and protein) sequence databases, in order to discover new patterns in the data and gain a deeper appreciation of what the texts mean. For example, newly discovered homologies between proteins can mean that biologists working on entirely different organisms and systems may actually have been using two different vocabularies to describe similar phenomena. New knowledge can be gained by recognizing implicit connections that have previously gone unnoticed. By establishing interrelationships among bits of information scattered throughout the literature, we hope to supplement routine protein alignment/literature searches with identification of novel associations that might be of interest to workers in the field.

We are actively developing software that exploits known similarities among sequences and among texts, and then uses annotations between these two different types of data as input to adaptive mechanisms that respond to browsing users' "relevance feedback." (A VERY preliminary version of the AnnBlast interface is available.) In preparation, we have attempted to survey closely related resources already existing on the WWW. We have focused in particular on taxonomic classifications, both as applied to sequence information and to the related literatures. We hope the resulting list of bookmarks may also be of use to others.

These references are organized below into two groups, the first concerning macromolecule (sequence) data, and the second concerning related lexical (literature) data sources.



Document index


Macromolecular Analysis

Sequences

Classification tools

MIPS: Protein Classification
MIPS = Munich Information Center for Protein Sequences
This tool categorizes proteins in the PIR (Protein Information Resource)-International Protein Sequence Database by sequence homology.
GeneFIND Family Identification System Home
GeneFIND = Gene Family Identification Network Design
Georgetown University
Dr. Cathy Wu
ProClass Database Home
Georgetown University
Dr. Cathy Wu
ProtoMap
Hebrew University
Authors: Golan Yona et al.
An automatic hierarchical classification of all SWISSPROT proteins.
ProtoMap @ Stanford automatic hierarchical classification of proteins
ProteinInfo
Proteometrics, LLC,
New York, NY
A set of databases and tools for analyzing protein mass spectrometry data.

Search tools

Structure

Non-proteins

RNA

The RNA World at IMB Jena
IMB = Institut für Molekulare Biotechnologie
Jürgen Sühnel
List of links.
Bacterial RNase P RNA sequences
NC State University
James W. Brown
Ribonuclease P RNA is a ribozyme, RNA that is catalytically active. In this case, it cleaves other RNA in a final processing step.

Viruses

Virology Information
SCIENCE.ORG™ Virology Laboratory
Viral Classification and Replication

Peptides

Proteins

EBI: FSSP database, fold classification based on structure-structure alignment of proteins
EBI = European Bioinformatics Institute, FSSP = Fold classification based on Structure-Structure alignment of Proteins
European Molecular Biology Laboratory (EMBL), Heidelberg
L. Holm
3Dee - Database of Protein Domain Definitions
3Dee = A Database of Protein Domain Definitions
Laboratory of Molecular Biophysics, Oxford, UK;
EMBL - European Bioinformatics Institute, Cambridge, UK
Authors: Asim S. Siddiqui, Uwe Dengler, Geoffrey J. Barton
SCOP: Structural Classification of Proteins
MRC Laboratory of Molecular Biology and Centre for Protein Engineering, Cambridge, England
Authors: Alexey G. Murzin et al.

Function

General

DEAMBULUM : Protein families
INFOBIOGEN - Université René Descartes
List of hyperlinks on proteins organized according to structure, activity,
and biological function. Even amino acid peptides considered too small
to be proteins have pages linked here.
ExPASy Molecular Biology Server
ExPASy = Expert Protein Analysis System
Swiss Institute of Bioinformatics (SIB)
Additional Protein Resource Sites
proWeb project, a WWW-based approach to protein family documentation
blocks.fhcrc.org
Contains links to pages listing proteins belonging to a specific classification; e.g., a function like ATPases, or a domain like homeoboxes.

Enzymes

ExPASy - ENZYME
ExPASy = Expert Protein Analysis System
Swiss Institute of Bioinformatics (SIB)

Kinases

Protein Kinase Resource
San Diego Supercomputer Center

Transcription Factors

TRANSFAC - The Transcription Factor Database
Center of Bioinformatics
Peking University

Transport proteins

Transport Protein Overview
Department of Biology, University of California, San Diego
Authors: Milton Saier, Ian Paulsen
Cf. Organism/General/Transport Protein Overview

Mitochondrial proteins

MITOP - Home
MITOP = MITOchondria Project
Collaboration of several German institutions, including MIPS (Munich Information Center for Protein Sequences)

Bacterial proteins

COG
COGs = Clusters of Orthologous Groups of proteins
National Center for Biotechnology Information (NCBI)
Phylogenetic classification of proteins encoded in complete genomes
Cf. Organism/Bacteria/COG

Homeobox Genes

Homeobox genes mainly encode DNA-binding proteins and are related to one another by a conserved DNA motif, the "homeobox".

The homeobox page

Biozentrum of the University of Basel, Switzerland
Thomas R. Bürglin
An update on a book, plus references to the latest papers on the topic. There is also a relationship tree, plus links to other homeobox pages.

Human Major Histocompatibility Complex

IMGT/HLA Database Nomenclature Guidelines
IMGT = the international ImMunoGeneTics database, HLA = Human Leucocyte Antigens
Centre Informatique National de l'Enseignement Supérieur (CINES), Montpellier, France
Home page is IMGT/HLA Database.

Mixed Bags

This folder has pages about proteins from various classification groups,
not just one, though the groups may be inter-related somehow.

Introduction

PROLYSIS, a protease and protease inhibitor Web server
University of Tours, France
Creator: Dr. Thierry Moreau, Laboratory of Enzymology and Protein Chemistry,
University François Rabelais, Tours, France
Proteases, proteinases, and peptidases galore! Introduction page to these classes of proteins.

Organism

General

Transport Protein Analysis
Milton Saier
Department of Biology, University of California, San Diego
Ian Paulsen
Cf. Function/Transport proteins/Transport Protein Overview
Welcome to MIPS
Munich Information Center for Protein Sequences

Bacteria

COG
Clusters of Orthologous Groups of proteins
National Center for Biotechnology Information (NCBI)
Cf. Function/Bacterial proteins/COG

Yeast (S. cerevisiae)

Sacch3D Home
an extension of the Saccharomyces Genome Database™
Stanford University
Steve A. Chervitz
S. cerevisiae Protein Kinases
Protein Kinase Resource
San Diego Supercomputer Center
Tony Hunter, Gregory D. Plowman

Worm (C. elegans)

Caenorhabditis elegans WWW Server

Fly (Drosophila)

FlyBase
Univ. Indiana
A database of the Drosophila Genome

Human

OMIM Home Page -- Online Mendelian Inheritance in Man
National Center for Biotechnology Information (NCBI)
Dr. Victor A. McKusick et al., Johns Hopkins University
Links genes/proteins to diseases.
GeneCards: human genes, maps, proteins and diseases (Weizmann)
Crown Human Genome Center and Bioinformatics Unit
Weizmann Institute of Science, Israel

Lexical Resources

The following resources have been organized according to the amount of semantic rigor with which they attempt to define their terms. Ontologies are most ambitious, defining concepts in terms of concrete attributes with well-defined logical relations among them. Systematics refers to pre-genomic classification systems that have been used to organize biological species. It is usually divided into two fields: phylogenetics, which deals with the relationships between organisms, and taxonomy, which names and classifies organisms. A nomenclature is a set of rules for naming objects -- e.g., proteins here -- according to a certain classification. Thesauri organize vocabularies using broader/narrower-term, related-term and preferred-term relationships. Dictionaries provide natural language definitions for individual terms.
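To make the thesaurus relationships described above concrete, here is a minimal sketch of how broader/narrower-term, related-term and preferred-term links might be stored and queried. It is not taken from any of the resources listed here: the class and method names (Term, Thesaurus, add_broader, and so on) are hypothetical, and the sample terms are illustrative only, not entries from a real thesaurus.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Term:
    """One vocabulary entry and its thesaurus relationships (hypothetical layout)."""
    label: str
    broader: Set[str] = field(default_factory=set)   # more general terms
    narrower: Set[str] = field(default_factory=set)  # more specific terms
    related: Set[str] = field(default_factory=set)   # associated terms
    use: Optional[str] = None                        # preferred term, if this is a variant


class Thesaurus:
    def __init__(self) -> None:
        self.terms: Dict[str, Term] = {}

    def _get(self, label: str) -> Term:
        # Create the entry on first mention.
        return self.terms.setdefault(label, Term(label))

    def add_broader(self, narrow: str, broad: str) -> None:
        # Broader/narrower links are stored in both directions.
        self._get(narrow).broader.add(broad)
        self._get(broad).narrower.add(narrow)

    def add_related(self, a: str, b: str) -> None:
        # Related-term links are symmetric.
        self._get(a).related.add(b)
        self._get(b).related.add(a)

    def add_preferred(self, variant: str, preferred: str) -> None:
        # A non-preferred variant points at the term to use instead.
        self._get(preferred)
        self._get(variant).use = preferred

    def preferred(self, label: str) -> str:
        term = self.terms.get(label)
        return term.use if term is not None and term.use else label

    def ancestors(self, label: str) -> Set[str]:
        """All broader terms reachable from the preferred form of `label`."""
        seen: Set[str] = set()
        stack = [self.preferred(label)]
        while stack:
            term = self.terms.get(stack.pop())
            if term is None:
                continue
            for broad in term.broader:
                if broad not in seen:
                    seen.add(broad)
                    stack.append(broad)
        return seen


def build_example() -> Thesaurus:
    # Illustrative terms only; not drawn from any real thesaurus.
    t = Thesaurus()
    t.add_broader("tyrosine kinase", "protein kinase")
    t.add_broader("protein kinase", "kinase")
    t.add_broader("kinase", "enzyme")
    t.add_related("protein kinase", "phosphatase")
    t.add_preferred("PTK", "tyrosine kinase")
    return t
```

Under these assumptions, build_example().ancestors("PTK") first resolves the variant "PTK" to its preferred term and then walks the broader-term links upward, which is the kind of query-expansion step a thesaurus makes possible and a plain dictionary does not.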

Ontologies

Gene Ontology Consortium
Collaborative effort to unite terminologies across yeast, fly and mouse
Michael Ashburner (EBI)
Suzanna Lewis (UCB)
Mike Cherry (Stanford)
Judy Blake (JAX)
Distributed Annotation System
(Not really an ontology, but...) An emerging effort to coordinate the annotation activities across large groups of individuals.
ARROWSMITH: A MEDICAL DISCOVERY SUPPORT SYSTEM
University of Chicago
Swanson's ARROWSMITH for scientific discovery.
AbXtract server
EMBL-European Bioinformatics Institute (EBI)
Cambridge, UK
Keyword extraction for protein annotation

Systematics

Taxonomy

Taxonomy on the Web
IWR: Taxon Pages
IWR = Ichthyology Web Resources
Department of Biological Sciences, University of Alberta, Canada
Keith L. Jackson
A classification of fishes. Go to Ichthyology Web Resources for more ichthyology resource links.
Some Cephalopod Species
Dalhousie University, Halifax, Nova Scotia, Canada
James B. Wood
A classification of cephalopods (octopi, squids, and other tentacled beasties!). Go to The Cephalopod Page; Octopuses, Squid, Cuttlefish, and Nautilus for more resource links.

Phylogenetics

TreeBASE
Harvard University Herbaria
A relational database of phylogenetic information. Builds phylogenetic trees for a query organism.

Nomenclature

Broad, general vocabularies

nomlist.txt
Expert Protein Analysis System (ExPASy)
Swiss Institute of Bioinformatics (SIB)
Human Gene Nomenclature
HUGO Gene Nomenclature Committee (HGNC)
University College London
Biochemical Nomenclature Committees
International Union of Pure and Applied Chemistry (IUPAC)
International Union of Biochemistry and Molecular Biology (IUBMB)
IUPAC-IUBMB Joint Commission on Biochemical Nomenclature (JCBN)
Nomenclature Committee of the IUBMB (NC-IUBMB)
Department of Chemistry, Queen Mary and Westfield College, London, UK

Specific vocabularies

All pages herein are devoted to nomenclatures for particular classes of proteins, rather than being comprehensive.

MT page: classification
Universität Zürich
Pierre-Alain Binz, J.H.R. Kägi
Metallothioneins.
Introduction to Ski and Sno gene family
Pearson-White Laboratory, Health Sciences Center, Charlottesville, Virginia
A lab page on Ski and Sno nomenclature.
EC nomenclature
PROLYSIS, a protease and protease inhibitor Web server
University of Tours, France
Proteases, proteinases, and peptidases
Nomenclature conventions / listing of gene family search tag names
Medical Research Council/University of Leicester, Center for Mechanisms of Human Toxicity, UK
Ion channels

Proposed/Under review

Gene Family Nomenclature
HUGO Gene Nomenclature Committee (HGNC)
University College London

Thesauri

Help: Life Sciences Thesaurus
Cambridge Scientific Abstracts, Bethesda, MD
Terms are hyperlinked alphabetically. A service of Cambridge Scientific Abstracts.
The CERES Thesaurus Effort
CERES = California Environmental Resources Evaluation System
California/Federal government (National Biological Information Infrastructure (NBII)) project to compile a thesaurus and search tool for environmental science terminology. Contains links to web pages with such terms.
OMNI: Organising Medical Networked Information
Search tool for links to pages on medical conditions. Each entry has a list of related keywords.
Cf. Dictionaries/Medicine/OMNI: ...
MeSH Browser
MeSH = Medical Subject Headings
National Library of Medicine search tool that retrieves relevant terms and references in hierarchical fashion on a query.

Dictionaries

(Note: '(*)' means that many definitions at the given site provide cross-reference hyperlinks to other terms in the dictionary on the same site.
'(#)' means that the pages in question feature search engine-like tools to look up words in a local database.)

General Biological

BioTech's Life Science Dictionary
Institute for Cellular and Molecular Biology
University of Texas
Austin, Texas
(#)
BioABACUS Search
BioABACUS = Biotechnology ABbreviation and ACronym Uncovering Service
Molecular Biology Program
New Mexico State University
Mendell Rimer, Mary O'Connell
Browse vocabulary list grouped by biological subcategory. Also contains a search engine.
(*) (#)
Harcourt: AP Dictionary of Science and Technology: Life Sciences
AP = Academic Press
Harcourt, Inc.
Vocab lists categorized by subfields.
Note: May be privately owned.
(#)
Kimball's Biology Pages
Dr. John W. Kimball, former Harvard lecturer
May be based on proprietary data.
Glossaries, dictionaries, terminology & acronyms
BIOSIS, Zoological Society of London
Comprehensive list of links to terminology in many biological subfields.
Contents
International Union of Pure and Applied Chemistry (IUPAC)
Department of Chemistry
Queen Mary and Westfield College
London, UK
Glossary of terms used in inorganic chemistry
The Biospace Glossary: Defining the Words that Define Biotechnology
Biospace.com, San Francisco, CA

Research fields

Subfields within the biological sciences have specialized vocabularies. A web site that covers one or more of these subfields, rather than trying to be comprehensive, is listed here.

A Hypermedia Glossary of Genetic Terms
Technische Universität München - Weihenstephan
Weihenstephan Information and Documentation Centre IDW
Freising, Germany
Birgid Schlindwein
Look up terms alphabetically yourself. Definitions contain related terms.
(*) (#)
A Genetics Glossary
Biology Teaching Organisation
Edinburgh School of Biology
The University of Edinburgh
Terms grouped and hyperlinked alphabetically.
Glossary
Human Genome Management Information System
Oak Ridge National Laboratory
Denise Casey, Dan Jacobson
All terms listed alphabetically on one page.
The Genomics Lexicon
Pharmaceutical Research and Manufacturers of America (PhRMA),
Foundation for Genetic Medicine, Inc. (FGM)
Mostly genetic terms here. A few cross-references. Grouped and hyperlinked alphabetically. Also links to other specialized glossaries.
Glossary of Biochemistry and Molecular Biology
Portland Press
David M. Glick
Select a letter and press "search" to retrieve all terms beginning with that letter.
(#)
Search
The Forsyth Institute, Boston, MA
Dr. Tsute Chen
A microbiology dictionary.
(*) (#)
The PPS Hyperglossary
A small glossary of protein/genetic structure terms.
Definitions and Abbreviations
List of Bacterial Names with Standing in Nomenclature
Ecole Nationale Vétérinaire de Toulouse
Toulouse, France
J.P. Euzéby
Mostly bacteria-related.
Cf. List of bacterial names with standing in nomenclature for links to bacteria nomenclature
Dictionary of Epidemiology
University of Cambridge
Alphabetically listed terms from ecological epidemiology.
(*)
Dictionary of Cell Biology
Cell and Molecular Biology degree course
Glasgow University
Julian Dow
May be a proprietary site.
(#)
BIOTECHNOLOGY DICTIONARY
Department of Crop and Soil Environmental Sciences
College of Agriculture and Life Sciences
Virginia Polytechnic Institute and State University
Blacksburg, Virginia
Susan Allender-Hagedorn and Charles Hagedorn
Agricultural and environmental biotechnology annotated dictionary
Visionary - A Dictionary of terminology in vision research
Dr. Lars Lidén
Dept. of Cognitive and Neural Systems
Boston University
Vision research, including machine vision.
OceanLink: An Interactive Information Page for the Marine Sciences
Bamfield Marine Station
British Columbia, Canada
A marine science information and interaction web site. Has a link to a glossary hyperlinked alphabetically.
Glossary of Microscopy Terms
Characterization Facility
University of Minnesota
Probably not strictly biological, but biologically related, for sure. Has a really horrid frames interface, unfortunately; otherwise, this would have been quite a useful site.

Plants

Aquatic, Wetland and Invasive Plant Glossary Title Page and Contents
University of Florida - IFAS
Fort Lauderdale Research and Education Center
Fort Lauderdale, FL
Dave L. Sutton, Ph.D.
Centre for Plant Biodiversity Research
Centre for Plant Biodiversity Research and Australian National Herbarium
Canberra, Australia
Links to on-line glossaries of Australian flora.
CalFlora
CalFlora Database Project
Member: University of California, Berkeley, Digital Library Project
California flora, indexable by name.
(#)
CAS California Wildflowers
California Academy of Sciences
Common and Latin names of these flowers, plus families

Medicine

On-line Medical Dictionary
The Gray Laboratory Cancer Research Trust
Mount Vernon Hospital
Northwood, Middlesex, UK
(#)
OnHealth: Online Medical Dictionary
OnHealth Network Company
Terms hyperlinked alphabetically. Seems to be a copy of On-line Medical Dictionary.
(#)
Multilingual Glossary of medical terms
Heymans Institute for Pharmacology, Medical School, University of Gent De Pintelaan,
and Mercator College, Department of Applied Linguistics
Gent, Belgium
Vocabularies in different languages.
Pharmacology Glossary
Department of Pharmacology and Experimental Therapeutics
Boston University
OMNI: Organising Medical Networked Information
OMNI / BIOME,
Greenfield Medical Library,
Queen's Medical Centre,
Nottingham, UK
Search tool for links to pages on medical conditions. Each entry has a list of related keywords.
Cf. Thesauri/OMNI: ...
(#)

Generic language References

Not specifically biology-related.

ARTFL Project: ROGET'S Thesaurus Search Form
ARTFL Project = Project for American and French Research on the Treasury of the French Language
Division of the Humanities, University of Chicago
Director: Robert Morrissey
WordNet
Cognitive Science Laboratory, Princeton University
An Electronic Lexical Database
Eric Brill's tagger
Department of Computer Science
Johns Hopkins University
Part-of-speech taggers.
Link Grammar
School of Computer Science, Carnegie Mellon University
Davy Temperley, Daniel Sleator, John Lafferty
The Link Grammar Parser (natural language parser)
Rainbow
Department of Computer Science
Carnegie Mellon University
Andrew McCallum's package for text classification.

Last modified by: rik@cs.ucsd.edu 15 Aug 00