Presentation 2012 05_03
Ph.D. Day, 2012/05/16
METAL PDB: A DATABASE OF METALLOPROTEINS
XXVI cycle of the “International Doctorate in Mechanistic and Structural Systems Biology”
Serena Lorenzini, 2nd year Ph.D. student
Tutor: Claudia Andreini
Biological Databases: Why?
1. Biology has increasingly turned into a data-rich science.
2. To make biological data available to scientists. A particular type of information should be available in one single place (book, site, database). Collecting data from the literature is TIME-CONSUMING!
3. To organize data in order to produce knowledge.
4. To make biological data available in computer-readable form. Analysis of biological data almost always involves computers, so having the data in computer-readable form is a necessary first step.
Biological Databases: Milestones
1. "Atlas of Protein Sequence and Structure" by Margaret Dayhoff and colleagues, 1965 (PIR database). 65 sequences.
2. Protein Data Bank (PDB), a joint effort between CCDC and BNL, 1971. 9 structures.
3. GenBank, December 1982. 606 sequences.
... Data require algorithms to be analyzed
4. The FASTA algorithm is published by Pearson and Lipman, 1985.
5. The BLAST program (Altschul et al.) is implemented, 1990.
... The 'omics era
6. The E. coli genome is published, 1997.
Today: 1783 biological databases (NAR database issue 2012, note 1)
Note 1. http://www.oxfordjournals.org/nar/database/cap/
Biological Databases and Metalloproteins: A troubled relationship
Not informative (lack of bio-inorganic background)
Out of date (difficult to update)
The problem:
  exceptional variability
  lack of a formal description for metals in proteins
Few dedicated resources (10 of 1783 databases found using the “metal” keyword in the NAR database issue). This count is at least indicative.
Solution: 3D models of metal sites
Advantages:
  Automatic extraction of 3D models
  Formal description of features
  Systematic organization of data
  Easy update
Metal sites must be thought of as functional units composed of the metal and its LOCAL environment.
Basis for database architecture
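The idea of a site as "the metal plus its local environment" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Atom` record, the `METALS` subset, the 3.0 angstrom cutoff and the toy coordinates are hypothetical and are not taken from the talk or from the Metal PDB code.

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    residue: str   # residue label, e.g. "HIS94"
    element: str   # element symbol, e.g. "N" or "ZN"
    xyz: tuple     # (x, y, z) in angstroms

METALS = {"ZN", "FE", "CU", "MG", "MN", "CA", "NI", "CO"}  # illustrative subset only
CUTOFF = 3.0  # assumed metal-donor cutoff in angstroms (hypothetical value)

def extract_sites(atoms, cutoff=CUTOFF):
    """Return one site per metal atom: the metal plus nearby non-metal atoms."""
    sites = []
    for metal in atoms:
        if metal.element not in METALS:
            continue
        donors = [a for a in atoms
                  if a.element not in METALS
                  and math.dist(metal.xyz, a.xyz) <= cutoff]
        sites.append({
            "metal": metal,
            "donors": donors,
            "residues": sorted({a.residue for a in donors}),
        })
    return sites

# Toy coordinates, invented for illustration only.
toy = [
    Atom("ZN301", "ZN", (0.0, 0.0, 0.0)),
    Atom("HIS94", "N", (2.1, 0.0, 0.0)),
    Atom("HIS96", "N", (0.0, 2.1, 0.0)),
    Atom("HIS119", "N", (0.0, 0.0, 2.1)),
    Atom("ALA10", "C", (9.0, 9.0, 9.0)),
]
sites = extract_sites(toy)
print(sites[0]["residues"])
```

A real extraction would read coordinates from PDB files and would probably widen the environment beyond the direct donor atoms; the sketch only shows the "metal plus neighbours within a cutoff" pattern.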
Metal PDB Architecture
First level: automatically filled
  1. Information on the entire structures, from multiple resources
  2. Information on metal sites
Second level: manually filled
  Functional information on metal sites
[Diagram: PDB, Pfam, SCOP, CATH, EC-PDB, GO, ...]
FIRST LEVEL: information on the entire protein structure
  PDB code
  Resolution
  Protein name
  UniProt code
  Cluster at 50% sequence identity
  CATH domain(s)
  SCOP domain(s)
  Pfam domain(s)
  Enzyme Classification number(s)
  Taxonomy names
  Organism of expression
Metal PDB Architecture
FIRST LEVEL: information on the metal site only
  Metal type(s)
  Nuclearity
  Coordination number
  Bond distances
  Coordination geometry (note 1)
  Ligands
  Proximal residues
  Binding pattern
  Conservation rates of residues
  Secondary structure pattern
  H bonds
  ...
Note 1. Andreini C., Cavallaro G., Lorenzini S., “FindGeo: a tool for determining metal coordination geometry”, Bioinformatics, April 2012
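A few of the site-level features listed above (coordination number, bond distances, and the ligand-metal-ligand angles that a geometry assignment would start from) can be derived from coordinates alone. The following is a rough sketch under stated assumptions: `describe_site` is a hypothetical helper, the coordinates are invented, and this is not the FindGeo method, which is described in the cited paper.

```python
import math
from itertools import combinations

def describe_site(metal_xyz, donors):
    """donors: dict mapping a ligand label to its donor-atom coordinates."""
    # Bond distance from the metal to each donor atom.
    distances = {name: math.dist(metal_xyz, xyz) for name, xyz in donors.items()}
    # Donor-metal-donor angle for every pair of donors, in degrees.
    angles = {}
    for (a, pa), (b, pb) in combinations(donors.items(), 2):
        va = [p - m for p, m in zip(pa, metal_xyz)]
        vb = [p - m for p, m in zip(pb, metal_xyz)]
        cos = sum(x * y for x, y in zip(va, vb)) / (distances[a] * distances[b])
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        angles[(a, b)] = math.degrees(math.acos(cos))
    return {
        "coordination_number": len(donors),
        "bond_distances": distances,
        "angles": angles,
    }

# Invented toy geometry: three donors at 2.1 angstroms along the axes.
desc = describe_site(
    (0.0, 0.0, 0.0),
    {"HIS94": (2.1, 0.0, 0.0), "HIS96": (0.0, 2.1, 0.0), "HIS119": (0.0, 0.0, 2.1)},
)
print(desc["coordination_number"])
```

Here the coordination number is simply the number of donors passed in; deciding which atoms count as donors is a separate step.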
[Figure: metal sites, with ligands His 94, His 96 and His 119 labelled. Data from First Level]
Metal PDB Architecture: from First to Second Level
Problem: there are 151683 metal sites in the PDB. 28193 PDB entries actually bind metals.
[Figure: Superfamily 1: zincins; Superfamily 2: endostatins; Cluster 1, Cluster 2, Cluster 3, Cluster 4]
Solution: create clusters of equivalent sites (same function) which can be annotated together (notes 1, 2)
Note 1. Andreini et al., “Structural analysis of metal sites in proteins: non-heme iron sites as a case study”, J Mol Biol, 2009
Note 2. Andreini et al., “Minimal functional sites allow a classification of zinc sites in proteins”, PLoS ONE, 2011
Characterization of a Database: Metal PDB
Type of data: metal-containing 3D sub-structures
Data entry and quality control: appointed curators add, remove and update data
Primary or derived data: secondary database (results of analysis of primary databases; links to other data items; combination of data)
Technical design: relational database (SQL)
Maintainer status: academic group
Availability: publicly available, no restrictions
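To make the "relational database (SQL)" design and the two-level architecture more concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names and inserted values are all hypothetical placeholders; the actual Metal PDB schema is not given in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- First level, part 1: information on the entire structure (hypothetical columns)
CREATE TABLE structure (
    pdb_code     TEXT PRIMARY KEY,
    resolution   REAL,
    protein_name TEXT,
    uniprot_code TEXT
);
-- First level, part 2: information on each metal site
CREATE TABLE metal_site (
    site_id             INTEGER PRIMARY KEY,
    pdb_code            TEXT NOT NULL REFERENCES structure(pdb_code),
    metal               TEXT NOT NULL,
    nuclearity          INTEGER NOT NULL,
    coordination_number INTEGER,
    cluster_id          INTEGER
);
-- Second level: manual functional annotation, stored once per cluster of equivalent sites
CREATE TABLE cluster_annotation (
    cluster_id    INTEGER PRIMARY KEY,
    function_note TEXT
);
""")

# Placeholder rows, invented for illustration only.
conn.execute("INSERT INTO structure VALUES ('XXXX', 2.0, 'placeholder protein', NULL)")
conn.execute("INSERT INTO metal_site VALUES (1, 'XXXX', 'ZN', 1, 4, 10)")
conn.execute("INSERT INTO cluster_annotation VALUES (10, 'placeholder functional note')")

# Join the automatically filled level with the manually curated level.
rows = conn.execute("""
    SELECT s.pdb_code, m.metal, m.coordination_number, a.function_note
    FROM metal_site m
    JOIN structure s ON s.pdb_code = m.pdb_code
    JOIN cluster_annotation a ON a.cluster_id = m.cluster_id
""").fetchall()
print(rows)
```

Keeping the annotation in a per-cluster table reflects the idea that equivalent sites are annotated together, so a curator writes a note once rather than once per site.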
THANK YOU FOR YOUR ATTENTION
Thanks to Professor Ivano Bertini and Professor Lucia Banci
Thanks to Dr. Claudia Andreini, Dr. Gabriele Cavallaro, Prof. Antonio Rosato, Tech. Enrico Morelli
Metal PDB Architecture: from First to Second Level
What are equivalent sites?
[Figure: 1 cluster of structural similarity, 2 clusters of functional similarity]
Metal PDB Architecture: from First to Second Level
What are equivalent sites?
1. Automatically extract all sites
2. Single linkage clustering to create groups of sites (CATH, SCOP, Pfam, CLUSTER50%)
3. Structural alignment among sites in the same group (reference template is the longest chain)
4. Single linkage clustering to create sub-groups of structural similarity
5. Cluster equivalent (same nuclearity and metal type) metal sites among members of the same structural similarity group
EQUIVALENT SITES ARE DEFINED
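Steps 2 and 5 of the procedure above can be sketched in a few lines. This is only an illustration under stated assumptions: sites are linked when they share at least one domain or family label, so single linkage reduces to finding connected components; the structural alignment of steps 3 and 4 is skipped; and the function names, site identifiers and labels are all invented.

```python
from collections import defaultdict

def single_linkage_groups(site_labels):
    """site_labels: dict site_id -> set of domain/family labels.

    Two sites are linked if they share at least one label; the groups are
    the connected components of that link graph (single linkage).
    """
    parent = {s: s for s in site_labels}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    first_seen = {}  # label -> first site carrying it
    for site, labels in site_labels.items():
        for label in labels:
            if label in first_seen:
                parent[find(site)] = find(first_seen[label])
            else:
                first_seen[label] = site

    groups = defaultdict(set)
    for s in site_labels:
        groups[find(s)].add(s)
    return sorted(sorted(g) for g in groups.values())

def split_by_metal(group, site_info):
    """Split one group by (metal type, nuclearity), as in step 5."""
    sub = defaultdict(list)
    for s in group:
        sub[site_info[s]].append(s)
    return sorted(sub.values())

# Invented toy labels and site descriptions.
labels = {
    "s1": {"CATH:a", "PFAM:x"},
    "s2": {"PFAM:x", "SCOP:k"},
    "s3": {"SCOP:k"},
    "s4": {"CATH:b"},
}
info = {"s1": ("ZN", 1), "s2": ("ZN", 1), "s3": ("FE", 2), "s4": ("ZN", 1)}

groups = single_linkage_groups(labels)
equivalent = split_by_metal(groups[0], info)
print(groups, equivalent)
```

Note that s1 and s3 share no label directly but end up in the same group through s2, which is the defining behaviour of single linkage.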