Available online at www.sciencedirect.com
Identifications of pathogens: a bioinformatic point of view
Richard Christen
Over the past 15 years, microbiology has undergone a
momentous shift toward molecular methods. New sequences
appear daily in the public databases and new computer tools
and web servers are published on a regular basis. Major
advances in molecular identifications of pathogens have been
made because new biotechnology methods have appeared
that often require a thorough in silico analysis of sequences.
However, significant difficulties partly remain in developing
efficient methods because the public databases contain many
poorly annotated or partial sequences (often of environmental
origin) and also because there are few dedicated web servers
and curated databases.
Addresses
University of Nice Sophia-Antipolis and CNRS UMR 6543, Institute of Developmental Biology and Cancer, Parc Valrose, Centre de Biochimie,
F 06108 Nice, France
Corresponding author: Christen, Richard ([email protected])
Current Opinion in Biotechnology 2008, 19:266-273
This review comes from a themed issue on
Environmental Biotechnology
Edited by Carla Pruzzo and Pietro Canepari
Available online 29th May 2008
0958-1669/$ - see front matter
© 2008 Elsevier Ltd. All rights reserved.
DOI 10.1016/j.copbio.2008.04.003
Introduction
In microbiology, nucleic-acid-based diagnostics are gradually
replacing culture-based methods [1,2,3,4]. Procedures that rely on
PCR of a single gene or on multilocus sequence typing [5-8], as well
as arrays [9,10,11-13], require the design of oligomers for
amplification and hybridization. Mass sequencing [14,15,16,17] or 16S
rRNA mass cataloging [18] produces sequences that have to be matched
to a database of known sequences. Finally, entirely new methods are
appearing [19]. The consortium of DDBJ, EMBL and GenBank exchanges
data on a daily basis (URL: http://www.insdc.org) and contains almost
every known sequence. Blast [20] is used to retrieve similar
sequences, whereas ACNUC [21], SRS [22] or Entrez [23] retrieve
sequences according to keywords. There are many free utilities to
align sequences and to compute and display phylogenetic trees
(URL: http://www.bioinformatics.org/). Finally, the design of primers
and probes can be done using many tools
(URL: http://bioinfo.unice.fr/softwares/oligo_softwares.html).
Retrieval of every necessary sequence can, however, be difficult,
while the design of primers and probes is tedious and may yield lower
quality results if the multiple criteria for design are not properly
handled. New sequences are now flowing in, seemingly faster than
programs can deal with. It is, for example, no longer easy to Blast
the 16S rRNA gene sequence of a new isolate to find out which
well-known bacteria it is related to, because most newly submitted
rRNA sequences now originate from uncultivated clones. Housekeeping
genes and pathogenicity gene sequences have been submitted in large
numbers, but full sequences are not easily retrieved by Blast
(because many are quite divergent) or by keywords (because their
annotations are often poor or not standard). Also, and in contrast to
the community devoted to analyses of complete genomes, there are few
centralized services or web servers that gather data, clean them and
post them on the web with good query and analysis tools. Finally,
bioinformaticians continuously publish new tools, but very few
studies compare them and analyze how good these new tools actually
are (see for example BALIBASE for evaluating new alignment programs
[24,25]).
Detailed analyses will be restricted to waterborne bacteria, for
which we will review available sequences and possible solutions for
in silico analyses of diagnostic methods before the real experiments.
Choice of a target gene
Target genes for bacterial identification can be the 16S rRNA gene, a
housekeeping gene or a pathogenicity gene. Some species are always
pathogenic, and targeting 16S rRNA gene sequences is often the
solution because many sequences have been published, PCR primers and
hybridization probes have usually been described and tested, and
dedicated software and web sites are available [26]. Cases of lateral
transfers [27-29] or of too similar 16S rRNA gene sequences (reviewed
in reference [30]) have also highlighted the need to use other or
more rapidly evolving genes [31]. Some of these genes have, however,
been completely sequenced in very few different strains or species,
making it doubtful that truly universal or specific oligomers have
really been designed. Also, the general absence of very conserved
domains renders the design of primers and probes difficult. Finally,
there is always the chance that yet unknown variant sequences exist
that will escape molecular detection because of mutations. The last
case applies to clones that become pathogenic only after acquisition
of pathogenicity genes [32,33,34] or when pathogenicity depends upon
the genetic content, that is,
by differential regulation of some genes or integration of genes (or
domains) that belong to the species or genus gene pool but are not
always present in a particular clone [35,36]. In such cases,
targeting pathogenicity genes is the best choice, with difficulties
similar to those of housekeeping genes. For other approaches such as
multilocus sequence typing (MLST) and analyses of variable number of
tandem repeats (VNTR), see references [37-40] for examples.
For Eukarya (often protists), the approach is very similar, but there
are often many fewer sequences available from different strains or
species. On the other hand, one may expect less divergence (due to
smaller population sizes and slower division rates) to be present in
a population.
Finally, viruses are a very different situation, since there is no
homologous housekeeping gene shared among viruses, and mutation rates
are expected to be much higher.
Retrieval of sequence data for the major waterborne pathogens
A list of pathogens likely to be found in aquatic environments was
built (primarily based on WHO list
Table 1
For each taxon, the number of entries (number of different submissions), of protein-coding sequences (CDS) and of genome projects was
analyzed
Taxon | Entries nbr | CDS | Genomes
Adenoviridae (Atadenovirus, Aviadenovirus, Mastadenovirus, Siadenovirus) | 3644 | 5356 | 44
Atadenovirus (various adenoviruses) | 69 | 198 | 5
Astroviridae (Avastrovirus, Mamastrovirus) | 1072 | 1100 | 6
Caliciviridae (Lagovirus, Nebraska-like virus, Norovirus, Sapovirus, Vesivirus) | 6371 | 7410 | 16
Hepeviridae (Hepatitis E virus) | 2767 | 2611 | 1
Mamastrovirus (Astrovirus of various hosts) | 809 | 834 | 3
Picornaviridae (Enterovirus, Hepatovirus, . . .) | 21296 | 17975 | 40
Reoviridae (Aquareovirus, . . ., Rotavirus) | 8232 | 8015 | 352
Enterovirus | 13057 | 10994 | 16
Hepatovirus | 3605 | 3014 | 2
Human enterovirus A | 3010 | 2723 | 1
Human enterovirus B | 6578 | 5425 | 1
Human enterovirus C | 762 | 639 | 1
Human enterovirus D | 161 | 125 | 1
Human astrovirus | 792 | 813 | 1
Rotavirus | 5488 | 5254 | 33
Sapovirus | 553 | 608 | 4
Burkholderia pseudomallei | 568 | 27058 | 24
Campylobacter coli | 346 | 367 | 1
Campylobacter jejuni | 2153 | 11676 | 11
Escherichia coli | 39004 | 75822 | 35
Legionella pneumophila | 2451 | 15145 | 4
Legionella | 3386 | 15545 | 4
Pseudomonas aeruginosa | 35034 | 22340 | 7
Salmonella typhi | 191 | 474 | 0
Salmonella | 7953 | 41953 | 24
Shigella | 4149 | 31292 | 8
Vibrio cholerae | 2532 | 10833 | 17
Vibrio parahaemolyticus | 882 | 5775 | 3
Vibrio vulnificus | 821 | 10549 | 4
Vibrio | 10358 | 40967 | 34
Yersinia enterocolitica | 445 | 5093 | 1
Acanthamoeba | 33450 | 180 | 1
Cryptosporidium parvum | 11148 | 1262 | 2
Cryptosporidium | 39678 | 1796 | 4
Cyclospora cayetanensis | 164 | 2 | 0
Dracunculus medinensis | 2 | 0 | 0
Entamoeba histolytica | 101006 | 706 | 0
Entamoeba | 205767 | 843 | 6
Giardia intestinalis | 24507 | 1364 | 0
Naegleria fowleri | 67 | 38 | 0
Viruses were also queried according to a higher taxonomic rank because the names used to describe them can be quite different in different entries. A
table in additional materials also provides the list of most sequenced genes for a number of waterborne pathogens. Note: complete lists or synthetic
information on genome projects can be manually obtained from URL: http://www.ncbi.nlm.nih.gov/Genomes/ or URL: http://www.genomesonline.org/.
Genome numbers are for finished to in progress projects.
Figure 1
Heatmap analysis of oligomers used in references [43,44] to identify the presence of the mip gene. Tms were calculated using the nearest
neighbor algorithm and were then transformed into colors (corresponding Tm/color shown in Figure). Each column of the heatmap (on the right)
corresponds to an oligomer as indicated in the box Primers identifiers. A gray square is for a Tm below 40 °C, a white square for a sequence
potentiator) gene that in Legionella encodes a surface protein
required for optimal infection of macrophages. Querying the
literature returned 44 publications that used mip as a target for
identification; for the purpose of this review, we analyzed only two
recent studies [43,44]. We retrieved a total of 278 mip sequences in
Legionella species, only 146 of which were distinct (not contained in
a longer sequence). We evaluated how each oligomer would bind to each
variant of the mip gene sequences (Table 3). It is particularly
striking that primer Mip-R1 shows a mismatch in the first position
for most sequences; a simple Blast search confirmed this problem. For
the other oligomers, this analysis demonstrates that a number of
variant sequences will probably not be well recognized.
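The evaluation described above, in which each oligomer is compared with each variant of the target gene, can be approximated in a few lines of Python. This is only a minimal sketch and not the procedure used to build Table 3: it counts mismatches at the best-matching site and does not compute a Tm. The function names and the toy sequences are hypothetical, and only unambiguous bases (A, C, G, T, N) are assumed.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def best_site(oligo: str, target: str) -> tuple[int, list[int]]:
    """Slide the oligomer along the target without gaps.

    Returns (offset, mismatch_positions) for the first window with the
    fewest mismatches. Positions are 1-based, counted from the 5' end
    of the oligomer.
    """
    oligo, target = oligo.upper(), target.upper()
    if len(target) < len(oligo):
        raise ValueError("target is too short to contain the oligomer")
    best = None
    for offset in range(len(target) - len(oligo) + 1):
        window = target[offset:offset + len(oligo)]
        mismatches = [i + 1 for i, (a, b) in enumerate(zip(oligo, window)) if a != b]
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


def mismatch_profile(oligo: str, variants: list[str], is_reverse_primer: bool = False) -> Counter:
    """Count how many variants show each pattern of mismatch positions.

    A reverse primer is compared with the reverse complement of each
    variant, so that positions are still counted from the primer 5' end.
    """
    profile = Counter()
    for variant in variants:
        strand = reverse_complement(variant) if is_reverse_primer else variant
        _, mismatches = best_site(oligo, strand)
        profile[tuple(mismatches)] += 1
    return profile


# Toy data (hypothetical, not real mip sequences).
toy_variants = ["TTACGTACGG", "TTTCGTACGG", "GGACGTACAA"]
toy_profile = mismatch_profile("ACGTAC", toy_variants)
```

A pattern such as `(1,)` shared by most variants would correspond to the situation reported for primer Mip-R1, where the first position is mismatched in most sequences. An ungapped comparison is a simplification; insertions or deletions in the binding site would need an alignment step.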
We also analyzed whether the mip gene was present in Legionella
species other than L. pneumophila and coupled Tm calculation with a
phylogenetic tree (Figure 1). This analysis demonstrates that some
oligomers indeed specifically target the mip gene in L. pneumophila
and not in other species of Legionella. The fact that the mip gene is
also present in other species of Legionella is not clearly stated in
these publications (but see reference [45]), and since lateral gene
transfers are rather common in bacteria, it is not clear whether
present primers indeed amplify mip genes in every L. pneumophila
strain (see Figure 1).
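The notion of "distinct" sequences used in this section (a sequence is discarded when it is contained in a longer one) can be sketched as follows. This is an assumption about how such a filter could be written, not the code used for the review; the function name is hypothetical, and containment on the opposite strand is deliberately ignored.

```python
def drop_contained(sequences: list[str]) -> list[str]:
    """Keep only sequences that are not an exact substring of a longer kept one.

    Exact duplicates are collapsed first. Sequences are examined from the
    longest to the shortest, so any sequence that could contain the current
    one has already been kept or dropped.
    """
    unique = sorted({s.upper() for s in sequences}, key=lambda s: (-len(s), s))
    kept: list[str] = []
    for seq in unique:
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


# Toy data (hypothetical): one duplicate, two fragments, one unrelated sequence.
toy = ["ACGTACGT", "CGTA", "acgtacgt", "TTTT", "ACGTACGTAA"]
distinct = drop_contained(toy)
```

The pairwise substring test is quadratic in the number of sequences, which is acceptable for a few hundred entries but would need an index for much larger sets.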
Bioinformatic tools
Aside from the multipurpose tools available at NCBI, EBI or elsewhere, a number of web servers or programs may help analyses:
GreenGenes. The greengenes web application provides access to a 16S rRNA gene sequence alignment for browsing, blasting, probing, and downloading: URL: http://greengenes.lbl.gov.
PubMLST. This site hosts publicly accessible MLST databases and software: URL: http://pubmlst.org, see also reference [46].
Legionella mip gene Sequence Database. This database allows the comparison of new mip gene DNA sequences with reference sequences from all described species of Legionella: URL: http://www.hpa.org.uk/cfi/bioinformatics/ewgli/legionellamips.htm.
leBIBI. Blast on databases of SSU-rDNA, gyrB, recA, sodA, rpoB, tmRNA, tuf and groel2-hsp65 gene sequences and tools for bacterial identification: URL: http://umr5558-sud-str1.univ-lyon1.fr/lebibi/lebibi.cgi.
ICB. Identification and classification of bacteria database using gyrB: URL: http://seasquirt.mbio.co.jp/icb/.
GPMS. Pathogenic bacteria strain genotyping, essentially for epidemiological purposes, based on polymorphic tandem repeat typing: URL: http://minisatellites.u-psud.fr.
VNTR. Molecular typing of bacteria using variable number tandem repeats: URL: http://vntr.csie.ntu.edu.tw.
OHM. A tool that produces heatmaps representing in a visual manner the Tm of primers on a set of sequences (can be combined with TreeDyn [47]): URL: http://bioinfo.unice.fr/ohm.
Table 3
Evaluation of primers and probes recently used for the identification of the mip genes in Legionella
For each oligomer: column (1) Tm in °C estimated for each mip sequence variant; column (2) the variant sequence; column (3) the number of such sequences (about 270 mip sequences available, only excerpts shown). F: forward primer, R: reverse primer.
(Figure 1 Legend Continued) too short to contain the oligomer. Upper Figure (A) excerpt of L. pneumophila clade (possible cases of lateral
transfer in red). Lower Figure (B) excerpt of non-L. pneumophila clade. Primer #3 shows the highest predicted Tm, but will fail on some
sequences; primer #1 also shows quite a wide heterogeneity of predicted Tms. The full figure is available as supplementary material.
A Blast server, to Blast 16S rRNA sequences on cultured bacteria only: URL: http://bioinfo.unice.fr/blast.
DDBJ. A Blast server to Blast on 16S rRNA gene sequences only (fast): URL: http://blast.ddbj.nig.ac.jp/top-e.html.
The list of prokaryotic names with standing in nomenclature (now including 16S rRNA accession numbers): URL: http://www.bacterio.cict.fr/.
Norovirus Molecular Epidemiology Database. The norovirus database contains a collection of over 1000 sequences of norovirus strains and associated epidemiological data: URL: http://www.hpa.org.uk/cfi/bioinformatics/norwalk/norovirus.htm.
Conclusions
If none of the above servers can be used (this is not an exhaustive
list), sequence retrieval, alignments, phylogenies, and design of
primers can be quite time consuming and tedious for scientists who
cannot write computer programs. Sequence retrieval using keywords is
often more efficient than a Blast search. SRS (Advanced Search form)
or, even better, ACNUC or specific tools [48] should be preferred to
Entrez, because they are more powerful for sequence retrieval.
Combining keywords for the gene or gene products with the species
name or taxon ID and a filter on sequence length (very short
sequences are useless) is often very efficient. Since annotations are
not standard, building a list of gene products is often necessary
(see additional materials). If there are many sequences, it is
possible to cluster these sequences at a given similarity level
(using blastclust or Cd-hit [49]) and align one representative
sequence per cluster. A visual inspection of alignments reveals
sequences that do not align well; they are often the result of a
wrong annotation or have to be reverse-complemented. The remaining
sequences can then be added to this good alignment (using the Clustal
profile option, for example). For protein-coding genes, a program
such as transAlign [50] may be a good choice.
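Two of the steps above lend themselves to a short illustration: selecting records by a list of gene product names, a taxon and a minimum length, and restoring the orientation of sequences that were deposited on the opposite strand. The sketch below assumes that records have already been downloaded and parsed into simple objects; the `Record` class, the function names, the synonym list and the k-mer heuristic are all hypothetical choices, not features of SRS, ACNUC or Entrez.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


@dataclass
class Record:
    accession: str   # placeholder identifiers in the examples, not real accessions
    organism: str
    product: str     # free-text annotation of the gene product
    sequence: str


def select_records(records, product_synonyms, taxon_keyword, min_length):
    """Keep records whose product matches any synonym, whose organism
    contains the taxon keyword, and whose sequence is long enough."""
    synonyms = [s.lower() for s in product_synonyms]
    selected = []
    for rec in records:
        if len(rec.sequence) < min_length:
            continue  # very short sequences are useless
        if taxon_keyword.lower() not in rec.organism.lower():
            continue
        if any(s in rec.product.lower() for s in synonyms):
            selected.append(rec)
    return selected


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def _kmers(seq: str, k: int) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference: str, seq: str, k: int = 8) -> str:
    """Return seq, or its reverse complement if that strand shares more
    k-mers with the reference (a crude heuristic, not an alignment)."""
    ref_kmers = _kmers(reference.upper(), k)
    forward = seq.upper()
    reverse = reverse_complement(forward)
    if len(_kmers(reverse, k) & ref_kmers) > len(_kmers(forward, k) & ref_kmers):
        return reverse
    return forward
```

Because annotations are not standard, the synonym list usually has to be built by hand and extended as new spellings are found. Substring matching is permissive, so the selected records still deserve the visual inspection of the alignment recommended above.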
When retrieving primers from publications, older papers are often
useless because primers were designed using very few sequences
(primers can be analyzed using the web server cited above, to produce
figures similar to Figure 1).
Finally, there is a large difference between amplification using DNA
extracted from a pure culture and DNA extracted from an environmental
sample. A primer (P) binds to its target DNA (T) according to the
classical equation [P][T]/[PT] = Km. The presence of one or two
differences between the P sequence and the T sequence may strongly
influence the value of Km. With DNA extracted from a pure culture,
[T] may be sufficiently high so that [PT] is large enough for the PCR
to succeed. With environmental DNA, and in the presence of
mismatch(es), the primer may bind to many other domains (at low
affinity but in many places) so that [PT] is not large enough to
allow a successful amplification. This is why, for environmental
studies, any published primers should always be carefully checked by
comparison to newly published sequences.
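To make the role of [T] and Km in the equation above more concrete, the equilibrium can be solved for [PT] once total concentrations are fixed. The sketch below assumes a single primer and a single target site, with total amounts P0 = [P] + [PT] and T0 = [T] + [PT]; substituting into [P][T]/[PT] = Km gives a quadratic in [PT], of which the smaller root is the physical one. Competition from the many low-affinity sites of environmental DNA is not modeled, and the function name and numerical values are illustrative assumptions.

```python
import math


def bound_complex(primer_total: float, target_total: float, km: float) -> float:
    """Solve (P0 - x)(T0 - x) = Km * x for x = [PT].

    All three arguments must use the same concentration unit.
    Expanding gives x^2 - (P0 + T0 + Km) x + P0 T0 = 0; the smaller
    root satisfies x <= min(P0, T0) and is returned.
    """
    if primer_total < 0 or target_total < 0 or km <= 0:
        raise ValueError("concentrations must be non-negative and Km positive")
    b = primer_total + target_total + km
    return (b - math.sqrt(b * b - 4.0 * primer_total * target_total)) / 2.0


# Illustrative values only (arbitrary units): a mismatch is represented
# by a tenfold larger Km, a dilute template by a tenfold smaller T0.
well_matched = bound_complex(1.0, 1.0, 0.5)
mismatched = bound_complex(1.0, 1.0, 5.0)
dilute_target = bound_complex(1.0, 0.1, 5.0)
```

With these assumed values, both a larger Km and a smaller total target concentration lower [PT], which is the qualitative point made above for mismatched primers used on environmental DNA.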
Acknowledgements
This work was supported by funds from the European Commission for the HEALTHY WATER project (FOOD-CT-2006-036306) and a CNRS PICS to R Christen. The authors are solely responsible for the content of this publication. It does not represent the opinion of the European Commission. The European Commission is not responsible for any use that might be made of data appearing therein.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.copbio.2008.04.003.
Conflict of interest
None.
References and recommended reading
Papers of particular interest, published within the annual period of review, have been highlighted as:
of special interest
of outstanding interest
1. Barken KB, Haagensen JA, Tolker-Nielsen T: Advances in nucleic acid-based diagnostics of bacterial infections. Clin Chim Acta 2007, 384:1-11.
This review describes a range of different nucleic-acid-based diagnostic methods and provides examples of the use of these methods for detection of common bacterial infections, with a focus on automated procedures.
2. Abubakar I, Irvine L, Aldus CF, Wyatt GM, Fordham R, Schelenz S, Shepstone L, Howe A, Peck M, Hunter PR: A systematic review of the clinical, public health and cost-effectiveness of rapid diagnostic tests for the detection and identification of bacterial intestinal pathogens in faeces and food. Health Technol Assess 2007, 11:1-216.
This is a 230-page review provided by the Health Technology Assessment (HTA) program, now part of the National Institute for Health Research (NIHR). Studies evaluating the diagnostic accuracy of rapid tests, including cost assessments, were retrieved using electronic databases and by handsearching reference lists and key journals. Every study is critically evaluated.
3. Tenover FC: Rapid detection and identification of bacterial pathogens using novel molecular technologies: infection control and beyond. Clin Infect Dis 2007, 44:418-423.
A short (far from exhaustive) review comparing effectiveness of PNA-FISH, real-time PCR and pyrosequencing and discussing the use of FDA-cleared versus non-FDA-cleared assays (antibiotic resistance).
4. Shneyer VS: On the species-specificity of DNA: fifty years later. Biochemistry (Mosc) 2007, 72:1377-1384.
A short historical review of the molecular methods used to identify prokaryotes and eukaryotes.
5. Angenent LT, Kelley ST, St Amand A, Pace NR, Hernandez MT: Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proc Natl Acad Sci USA 2005, 102:4860-4865.
6. Best EL, Fox AJ, Frost JA, Bolton FJ: Real-time single-nucleotide polymorphism profiling using Taqman technology for rapid recognition of Campylobacter jejuni clonal complexes. J Med Microbiol 2005, 54:919-925.
7. Lehmann LE, Hunfeld KP, Emrich T, Haberhausen G, Wissing H, Hoeft A, Stuber F: A multiplex real-time PCR assay for rapid detection and differentiation of 25 bacterial and fungal pathogens from whole blood samples. Med Microbiol Immunol 2007.
8. Ciammaruconi A, Grassi S, De Santis R, Faggioni G, Pittiglio V, D'Amelio R, Carattoli A, Cassone A, Vergnaud G, Lista F: Fieldable genotyping of Bacillus anthracis and Yersinia pestis based on 25-loci multi locus VNTR analysis. BMC Microbiol 2008, 8:21 doi: 10.1186/1471-2180-8-21.
9. Wang XW, Zhang L, Jin LQ, Jin M, Shen ZQ, An S, Chao FH, Li JW: Development and application of an oligonucleotide microarray for the detection of food-borne bacterial pathogens. Appl Microbiol Biotechnol 2007, 76:225-233.
10. DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM, Andersen GL: High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb Ecol 2007, 53:371-383.
Identification of pathogens in environmental samples often uses parallel, multispecies detection systems, in order to detect any pathogens. In this analysis a DNA array with 297 851 probes was compared with 16S cloning and sequencing to evaluate the biodiversity, with the conclusion that the array was more efficient. However, pyrosequencing technologies are likely to replace both of the approaches compared in this work.
11. Wiesinger-Mayr H, Vierlinger K, Pichler R, Kriegner A, Hirschl AM, Presterl E, Bodrossy L, Noehammer C: Identification of human pathogens isolated from blood using microarray hybridisation and signal pattern recognition. BMC Microbiol 2007, 7:78 doi: 10.1186/1471-2180-7-78.
12. Hansen RR, Sikes HD, Bowman CN: Visual detection of labeled oligonucleotides using visible-light-polymerization-based amplification. Biomacromolecules 2008, 9:355-362.
13. Lin YC, Sheng WH, Chang SC, Wang JT, Chen YC, Wu RJ, Hsia KC, Li SY: Application of a microsphere-based array for rapid identification of Acinetobacter spp. with distinct antimicrobial susceptibilities. J Clin Microbiol 2008, 46:612-617.
14. Yang ZJ, Tu MZ, Liu J, Wang XL, Jin HZ: Comparison of amplicon-sequencing, pyrosequencing and real-time PCR for detection of YMDD mutants in patients with chronic hepatitis B. World J Gastroenterol 2006, 12:7192-7196.
15. Kobayashi N, Bauer TW, Tuohy MJ, Lieberman IH, Krebs V, Togawa D, Fujishiro T, Procop GW: The comparison of pyrosequencing molecular Gram stain, culture, and conventional Gram stain for diagnosing orthopaedic infections. J Orthop Res 2006, 24:1641-1649.
Sequencing more efficient than staining to differentiate Gram-positive from Gram-negative bacteria. Who would have bet on it in 2005?
16. Luna RA, Fasciano LR, Jones SC, Boyanton BL Jr, Ton TT, Versalovic J: DNA pyrosequencing-based bacterial pathogen identification in a pediatric hospital setting. J Clin Microbiol 2007, 45:2985-2992.
17. Dowd SE, Sun Y, Secor PR, Rhoads DD, Wolcott BM, James GA, Wolcott RD: Survey of bacterial diversity in chronic wounds using Pyrosequencing, DGGE, and full ribosome shotgun sequencing. BMC Microbiol 2008, 8:43 doi: 10.1186/1471-2180-8-43.
18. Jackson GW, McNichols RJ, Fox GE, Willson RC: Bacterial genotyping by 16S rRNA mass cataloging. BMC Bioinformatics 2006, 7:321 doi: 10.1186/1471-2105-7-321.
19. Grun J, Manka CK, Nikitin S, Zabetakis D, Comanescu G, Gillis D, Bowles J: Identification of bacteria from two-dimensional resonant-Raman spectra. Anal Chem 2007, 79:5489-5493.
20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.
21. Gouy M, Delmotte S: Remote access to ACNUC nucleotide and protein sequence databases at PBIL. Biochimie 2008, 90:555-562.
22. Etzold T, Ulyanov A, Argos P: SRS: information retrieval system for molecular biology data banks. Methods Enzymol 1996, 266:114-128.
23. Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular biology database and retrieval system. Methods Enzymol 1996, 266:141-162.
24. Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 1999, 15:87-88.
25. Conery JS: Aligning sequences by minimum description length. EURASIP J Bioinform Syst Biol 2007:72936.
26. Kumar Y, Westram R, Kipfer P, Meier H, Ludwig W: Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package. BMC Bioinformatics 2006, 7:240 doi: 10.1186/1471-2105-7-240.
27. van Berkum P, Terefework Z, Paulin L, Suomalainen S, Lindstrom K, Eardly BD: Discordant phylogenies within the rrn loci of Rhizobia. J Bacteriol 2003, 185:2988-2998.
28. Schouls LM, Schot CS, Jacobs JA: Horizontal transfer of segments of the 16S rRNA genes between species of the Streptococcus anginosus group. J Bacteriol 2003, 185:7241-7246.
29. Dewhirst FE, Shen Z, Scimeca MS, Stokes LN, Boumenna T, Chen T, Paster BJ, Fox JG: Discordant 16S and 23S rRNA gene phylogenies for the genus Helicobacter: implications for phylogenetic inference and systematics. J Bacteriol 2005, 187:6106-6118.
30. Janda JM, Abbott SL: 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol 2007, 45:2761-2764.
31. Santos SR, Ochman H: Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 2004, 6:754-759.
32. Smith DL, Wareing BM, Fogg PC, Riley LM, Spencer M, Cox MJ, Saunders JR, McCarthy AJ, Allison HE: Multilocus characterization scheme for shiga toxin-encoding bacteriophages. Appl Environ Microbiol 2007, 73:8032-8040.
33. Ogura Y, Ooka T, Asadulghani, Terajima J, Nougayrede JP, Kurokawa K, Tashiro K, Tobe T, Nakayama K, Kuhara S et al.: Extensive genomic diversity and selective conservation of virulence-determinants in enterohemorrhagic Escherichia coli strains of O157 and non-O157 serotypes. Genome Biol 2007, 8:R138 doi: 10.1186/gb-2007-8-7-r138.
A systematic whole genome comparison between O157 and non-O157 EHEC strains using microarray and whole genome PCR scanning analyses. An example of modern analyses and comparisons of whole genomes to understand phenotypes and their evolutions in time.
34. Zhang Y, Laing C, Steele M, Ziebell K, Johnson R, Benson AK, Taboada E, Gannon VP: Genome evolution in major Escherichia coli O157:H7 lineages. BMC Genomics 2007, 8:121 doi: 10.1186/1471-2164-8-121.
Same as reference [33], but using 6167 50-mer oligonucleotides whole-genome-based microarrays for E. coli.
35. Hsiao A, Liu Z, Joelsson A, Zhu J: Vibrio cholerae virulence regulator-coordinated evasion of host immunity. Proc Natl Acad Sci USA 2006, 103:14542-14547.
36. Pang B, Yan M, Cui Z, Ye X, Diao B, Ren Y, Gao S, Zhang L, Kan B: Genetic diversity of toxigenic and nontoxigenic Vibrio cholerae serogroups O1 and O139 revealed by array-based comparative genomic hybridization. J Bacteriol 2007, 189:4837-4849.
37. Fox AJ, Taha MK, Vogel U: Standardized nonculture techniques recommended for European reference laboratories. FEMS Microbiol Rev 2007, 31:84-88.
38. Turner KM, Feil EJ: The secret life of the multilocus sequence type. Int J Antimicrob Agents 2007, 29:129-135.
39. Chang CH, Chang YC, Underwood A, Chiou CS, Kao CY: VNTRDB: a bacterial variable number tandem repeat locus database. Nucleic Acids Res 2007, 35:D416-421.
40. Martens M, Dawyndt P, Coopman R, Gillis M, De Vos P, Willems A: Advantages of multilocus sequence analysis for taxonomic studies: a case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int J Syst Evol Microbiol 2008, 58:200-214.
41. Stackebrandt E, Brambilla E, Richert K: Gene sequence phylogenies of the family Microbacteriaceae. Curr Microbiol 2007, 55:42-46.
42. Guo Y, Zheng W, Rong X, Huang Y: A multilocus phylogeny of the Streptomyces griseus 16S rRNA gene clade: use of multilocus sequence analysis for streptomycete systematics. Int J Syst Evol Microbiol 2008, 58:149-159.
43. Diederen BM, de Jong CM, Marmouk F, Kluytmans JA, Peeters MF, Van der Zee A: Evaluation of real-time PCR for the early detection of Legionella pneumophila DNA in serum samples. J Med Microbiol 2007, 56:94-101.
44. Vervaeren H, Temmerman R, Devos L, Boon N, Verstraete W: Introduction of a boost of Legionella pneumophila into a stagnant-water model by heat treatment. FEMS Microbiol Ecol 2006, 58:583-592.
45. Ratcliff RM, Lanser JA, Manning PA, Heuzenroeder MW: Sequence-based classification scheme for the genus Legionella targeting the mip gene. J Clin Microbiol 1998, 36:1560-1567.
46. Jolley KA, Chan MS, Maiden MC: mlstdbNet - distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics 2004, 5:86 doi: 10.1186/1471-2105-5-86.
47. Chevenet F, Brun C, Banuls A-L, Jacq B, Christen R: TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 2006, 7:439-448 doi: 10.1186/1471-2105-7-439.
48. Croce O, Lamarre M, Christen R: Querying the public databases for sequences using complex keywords contained in the feature lines. BMC Bioinformatics 2006, 7:45 doi: 10.1186/1471-2105-7-45.
49. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.
50. Bininda-Emonds OR: transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 2005, 6:156 doi: 10.1186/1471-2105-6-156.