Available online at www.sciencedirect.com

Identifications of pathogens: a bioinformatic point of view

Richard Christen

Over the past 15 years, microbiology has undergone a momentous shift toward molecular methods. New sequences appear daily in the public databases and new computer tools and web servers are published on a regular basis. Major advances in molecular identifications of pathogens have been made because new biotechnology methods have appeared that often require a thorough in silico analysis of sequences. However, significant difficulties partly remain in developing efficient methods because the public databases contain many poorly annotated or partial sequences (often of environmental origin) and also because there are few dedicated web servers and curated databases.

Addresses

University of Nice Sophia-Antipolis and CNRS UMR 6543, Institute of Developmental Biology and Cancer, Parc Valrose, Centre de Biochimie, F 06108 Nice, France

Corresponding author: Christen, Richard ([email protected])

Current Opinion in Biotechnology 2008, 19:266-273

This review comes from a themed issue on Environmental Biotechnology

Edited by Carla Pruzzo and Pietro Canepari

Available online 29th May 2008

0958-1669/$ see front matter

© 2008 Elsevier Ltd. All rights reserved.

DOI 10.1016/j.copbio.2008.04.003

Introduction

In microbiology, nucleic-acid-based diagnostics are gradually replacing culture-based methods [1,2,3,4]. Procedures that rely on PCR of a single gene or on multilocus sequence typing [5-8], as well as arrays [9,10,11-13], require the design of oligomers for amplification and hybridization. Mass sequencing [14,15,16,17] or 16S rRNA mass cataloging [18] produces sequences that have to be matched to a database of known sequences. Finally, entirely new methods are appearing [19]. The consortium of DDBJ, EMBL and GenBank exchanges data on a daily basis (URL: http://www.insdc.org) and contains almost every known sequence. Blast [20] is used to retrieve similar sequences, while ACNUC [21], SRS [22] or Entrez [23] retrieve sequences according to keywords. There are many free utilities to align sequences and to compute and display phylogenetic trees (URL: http://www.bioinformatics.org/). Finally, the design of primers and probes can be done using many tools (URL: http://bioinfo.unice.fr/softwares/oligo_softwares.html).

Retrieval of every necessary sequence can, however, be difficult, while the design of primers and probes is tedious and may yield lower-quality results if the multiple design criteria are not properly handled. New sequences are now flowing in, seemingly faster than programs can deal with them. For example, it is no longer easy to Blast the 16S rRNA gene sequence of a new isolate to find out which well-known bacteria it is related to, because most newly submitted rRNA sequences now originate from uncultivated clones. Housekeeping gene and pathogenicity gene sequences have been submitted in large numbers, but full sequences are not easily retrieved by Blast (because many are quite divergent) or by keywords (because their annotations are often poor or not standard). Also, in contrast to the community devoted to analyses of complete genomes, there are few centralized services or web servers that gather data, clean them and post them on the web with good query and analysis tools. Finally, bioinformaticians continuously publish new tools, but very few studies compare them and actually assess how good these new tools are (see for example BAliBASE for evaluating new alignment programs [24,25]).

Detailed analyses will be restricted to waterborne bacteria, for which we will review available sequences and possible solutions for in silico analyses of diagnostic methods before the real experiments.

Choice of a target gene

Target genes for bacterial identification can be the 16S rRNA gene, a housekeeping gene or a pathogenicity gene. Some species are always pathogenic, and targeting 16S rRNA gene sequences is then often the solution because many sequences have been published, PCR primers and hybridization probes have usually been described and tested, and dedicated software and web sites are available [26]. Cases of lateral transfer [27-29] or of too similar 16S rRNA gene sequences (reviewed in reference [30]) have also highlighted the need to use other or more rapidly evolving genes [31]. Some of these genes have, however, been completely sequenced in very few different strains or species, making it doubtful that truly universal or specific oligomers have really been designed. Also, the general absence of very conserved domains renders the design of primers and probes difficult. Finally, there is always the chance that yet unknown variant sequences exist that will escape molecular detection because of mutations. The last case, a pathogenicity gene target, applies to clones that become pathogenic only after acquisition of pathogenicity genes [32,33,34] or when pathogenicity depends upon the genetic content, that is, by differential regulation of some genes or integration of genes (or domains) that belong to the species or genus gene pool but are not always present in a particular clone [35,36]. In such cases, targeting pathogenicity genes is the best choice, with difficulties similar to those of housekeeping genes. For other approaches such as multilocus sequence typing (MLST) and analyses of variable number of tandem repeats (VNTR), see references [37-40] for examples.

For Eukarya (often protists), the approach is very similar, but there are often many fewer sequences available from different strains or species. On the other hand, one may expect less divergence within a population (due to smaller population sizes and slower division rates). Finally, viruses present a very different situation, since there is no homologous housekeeping gene shared among viruses, and mutation rates are expected to be much higher.

Retrieval of sequence data for the major waterborne pathogens

A list of pathogens likely to be found in aquatic environments was built (primarily based on WHO list

Table 1

For each taxon, the number of entries (number of different submissions), of protein coding sequences (CDS) and of genome projects was analyzed

Taxon | Entries | CDS | Genomes
Adenoviridae (Atadenovirus, Aviadenovirus, Mastadenovirus, Siadenovirus) | 3644 | 5356 | 44
Atadenovirus (various adenoviruses) | 69 | 198 | 5
Astroviridae (Avastrovirus, Mamastrovirus) | 1072 | 1100 | 6
Caliciviridae (Lagovirus, Nebraska-like virus, Norovirus, Sapovirus, Vesivirus) | 6371 | 7410 | 16
Hepeviridae (Hepatitis E virus) | 2767 | 2611 | 1
Mamastrovirus (Astrovirus of various hosts) | 809 | 834 | 3
Picornaviridae (Enterovirus, Hepatovirus, ...) | 21296 | 17975 | 40
Reoviridae (Aquareovirus, ..., Rotavirus) | 8232 | 8015 | 352
Enterovirus | 13057 | 10994 | 16
Hepatovirus | 3605 | 3014 | 2
Human enterovirus A | 3010 | 2723 | 1
Human enterovirus B | 6578 | 5425 | 1
Human enterovirus C | 762 | 639 | 1
Human enterovirus D | 161 | 125 | 1
Human astrovirus | 792 | 813 | 1
Rotavirus | 5488 | 5254 | 33
Sapovirus | 553 | 608 | 4
Burkholderia pseudomallei | 568 | 27058 | 24
Campylobacter coli | 346 | 367 | 1
Campylobacter jejuni | 2153 | 11676 | 11
Escherichia coli | 39004 | 75822 | 35
Legionella pneumophila | 2451 | 15145 | 4
Legionella | 3386 | 15545 | 4
Pseudomonas aeruginosa | 35034 | 22340 | 7
Salmonella typhi | 191 | 474 | 0
Salmonella | 7953 | 41953 | 24
Shigella | 4149 | 31292 | 8
Vibrio cholerae | 2532 | 10833 | 17
Vibrio parahaemolyticus | 882 | 5775 | 3
Vibrio vulnificus | 821 | 10549 | 4
Vibrio | 10358 | 40967 | 34
Yersinia enterocolitica | 445 | 5093 | 1
Acanthamoeba | 33450 | 180 | 1
Cryptosporidium parvum | 11148 | 1262 | 2
Cryptosporidium | 39678 | 1796 | 4
Cyclospora cayetanensis | 164 | 2 | 0
Dracunculus medinensis | 2 | 0 | 0
Entamoeba histolytica | 101006 | 706 | 0
Entamoeba | 205767 | 843 | 6
Giardia intestinalis | 24507 | 1364 | 0
Naegleria fowleri | 67 | 38 | 0

Viruses were also queried according to a higher taxonomic rank because the names used to describe them can be quite different in different entries. A table in additional materials also provides the list of most sequenced genes for a number of waterborne pathogens. Note: complete lists or synthetic information on genome projects can be manually obtained from URL: http://www.ncbi.nlm.nih.gov/Genomes/ or URL: http://www.genomesonline.org/. Genome numbers cover projects from finished to in progress.

Figure 1

Heatmap analysis of oligomers used in references [43,44] to identify the presence of the mip gene. Tms were calculated using the nearest neighbor algorithm and were then transformed into colors (corresponding Tm/color shown in Figure). Each column of the heatmap (on the right) corresponds to an oligomer as indicated in the box "Primers identifiers". A gray square is for a Tm below 40 °C, a white square for a sequence



potentiator) gene that in Legionella encodes a surface protein required for optimal infection of macrophages. Querying the literature returned 44 publications that used mip as a target for identification; for the purpose of this review, we analyzed only two recent studies [43,44]. We retrieved a total of 278 mip sequences from Legionella species, only 146 of which were distinct (not contained in a longer sequence). We evaluated how each oligomer would bind to each variant of the mip gene sequences (Table 3). It is particularly striking that primer Mip-R1 shows a mismatch in the first position for most sequences; a simple Blast confirmed this problem. For the other oligomers, this analysis demonstrates that a number of variant sequences will probably not be well recognized.
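The two steps just described (keeping only distinct sequences, then locating the mismatches between an oligomer and each variant) can be sketched in Python. This is a minimal sketch and not the procedure actually used for Table 3: the function names are hypothetical, sequences are assumed to be plain strings written in the same orientation as the oligomer (a reverse primer would first have to be reverse-complemented), and ambiguity codes and gaps are not handled.

```python
def distinct_sequences(seqs):
    """Keep only sequences that are not contained in a longer (or identical) sequence."""
    kept = []
    # Longest first, so a sequence can only be contained in one that is already kept.
    for seq in sorted({s.upper() for s in seqs}, key=lambda s: (-len(s), s)):
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


def mismatch_positions(oligo, site):
    """Return the 1-based positions, counted along the oligomer, where oligo and site differ."""
    if len(oligo) != len(site):
        raise ValueError("oligomer and binding site must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(oligo.upper(), site.upper())) if a != b]


def best_site(oligo, target):
    """Slide the oligomer along the target; return (start, mismatches) for the best window.

    Returns None when the target is too short to contain the oligomer.
    """
    best = None
    for start in range(len(target) - len(oligo) + 1):
        mism = mismatch_positions(oligo, target[start:start + len(oligo)])
        if best is None or len(mism) < len(best[1]):
            best = (start, mism)
    return best


if __name__ == "__main__":
    # Invented toy sequences, for illustration only (not real mip data).
    variants = distinct_sequences(["GGTCGTGG", "TCGT", "GGACGTGG"])
    for variant in variants:
        print(variant, best_site("ACGT", variant))
```

A mismatch reported at position 1 corresponds to the situation described for primer Mip-R1. Counting how many variants share each mismatch pattern gives the kind of summary shown in Table 3, although a Tm estimate per variant needs a thermodynamic model on top of the mismatch count.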

We also analyzed whether the mip gene was present in Legionella species other than L. pneumophila and coupled Tm calculation with a phylogenetic tree (Figure 1). This analysis demonstrates that some oligomers indeed specifically target the mip gene in L. pneumophila and not in other species of Legionella. The fact that the mip gene is also present in other species of Legionella is not clearly stated in these publications (but see reference [45]), and since lateral gene transfers are rather common in bacteria, it is not clear whether present primers indeed amplify mip genes in every L. pneumophila strain (see Figure 1).
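To make the Tm calculation more concrete, here is a minimal Python sketch of a nearest-neighbor estimate for a perfectly matched oligomer. Several points are assumptions rather than statements about the method used for Figure 1: the text only says "nearest neighbor algorithm", so the unified parameter set of SantaLucia (1998) is used here as one common choice; the function name, the default oligomer concentration and the default sodium concentration are hypothetical; the duplex is assumed to be non-self-complementary; and mismatched duplexes, which are the interesting case for variant sequences, would need additional parameters that are not included.

```python
import math

# Nearest-neighbor parameters (assumed set: SantaLucia 1998, unified values).
# Keys are dinucleotides read 5'->3' on one strand; values are
# (enthalpy in kcal/mol, entropy in cal/(mol*K)).
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
# Initiation terms, applied once for each terminal base pair.
INIT = {"G": (0.1, -2.8), "C": (0.1, -2.8), "A": (2.3, 4.1), "T": (2.3, 4.1)}
GAS_CONSTANT = 1.987  # cal/(mol*K)


def nn_tm(oligo, oligo_conc=5e-7, na_conc=0.05):
    """Estimate the Tm (degrees Celsius) of a perfectly matched DNA duplex.

    oligo_conc is the total strand concentration (mol/L) and na_conc the
    sodium concentration (mol/L); both defaults are illustrative only.
    """
    seq = oligo.upper()
    if len(seq) < 2 or set(seq) - set("ACGT"):
        raise ValueError("need at least two bases, A/C/G/T only")
    enthalpy = entropy = 0.0
    for end in (seq[0], seq[-1]):
        dh, ds = INIT[end]
        enthalpy += dh
        entropy += ds
    for i in range(len(seq) - 1):
        dh, ds = NN[seq[i:i + 2]]
        enthalpy += dh
        entropy += ds
    # Salt correction on the entropy term (same assumed parameter source).
    entropy += 0.368 * (len(seq) - 1) * math.log(na_conc)
    tm_kelvin = 1000.0 * enthalpy / (entropy + GAS_CONSTANT * math.log(oligo_conc / 4.0))
    return tm_kelvin - 273.15
```

Mapping such values to colors, one column per oligomer and one row per sequence, would give a heatmap in the spirit of Figure 1. For real work an established implementation should be preferred to this sketch.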

Bioinformatic tools

Aside from the multipurpose tools available at NCBI, EBI or elsewhere, a number of web servers or programs may help analyses:

GreenGenes. The greengenes web application provides access to a 16S rRNA gene sequence alignment for browsing, blasting, probing, and downloading: URL: http://greengenes.lbl.gov.

PubMLST. This site hosts publicly accessible MLST databases and software: URL: http://pubmlst.org, see also reference [46].

Legionella mip gene Sequence Database. This database allows the comparison of new mip gene DNA sequences with reference sequences from all described species of Legionella: URL: http://www.hpa.org.uk/cfi/bioinformatics/ewgli/legionellamips.htm.

leBIBI. Blast on databases of SSU-rDNA, gyrB, recA, sodA, rpoB, tmRNA, tuf and groel2-hsp65 gene sequences and tools for bacterial identification: URL: http://umr5558-sud-str1.univ-lyon1.fr/lebibi/lebibi.cgi.

ICB. Identification and classification of bacteria database using gyrB: URL: http://seasquirt.mbio.co.jp/icb/.

GPMS. Pathogenic bacteria strain genotyping, essentially for epidemiological purposes, based on polymorphic tandem repeat typing: URL: http://minisatellites.u-psud.fr.

VNTR. Molecular typing of bacteria using variable number tandem repeats: URL: http://vntr.csie.ntu.edu.tw.

OHM. A tool that produces heatmaps representing in a visual manner the Tm of primers on a set of sequences (can be combined with TreeDyn [47]): URL: http://bioinfo.unice.fr/ohm.

Table 3

Evaluation of primers and probes recently used for the identification of the mip genes in Legionella

For each oligomer: column (1) Tm in °C estimated for each mip sequence variant; column (2) the variant sequence; column (3) the number of such sequences (about 270 mip sequences available, only excerpts shown). F: forward primer, R: reverse primer.

(Figure 1 Legend Continued) too short to contain the oligomer. Upper Figure (A): excerpt of the L. pneumophila clade (possible cases of lateral transfer in red). Lower Figure (B): excerpt of the non-L. pneumophila clade. Primer #3 shows the highest predicted Tm, but will fail on some sequences; primer #1 also shows quite a wide heterogeneity of predicted Tms. The full figure is available as supplementary material.

A Blast server to Blast 16S rRNA sequences against cultured bacteria only: URL: http://bioinfo.unice.fr/blast.

DDBJ. A Blast server to Blast against 16S rRNA gene sequences only (fast): URL: http://blast.ddbj.nig.ac.jp/top-e.html.

The list of prokaryotic names with standing in nomenclature (now including 16S rRNA accession numbers): URL: http://www.bacterio.cict.fr/.

Norovirus Molecular Epidemiology Database. The norovirus database contains a collection of over 1000 sequences of norovirus strains and associated epidemiological data: URL: http://www.hpa.org.uk/cfi/bioinformatics/norwalk/norovirus.htm.

Conclusions

If none of the above servers can be used (this is not an exhaustive list), sequence retrieval, alignments, phylogenies, and design of primers can be quite time consuming and tedious for scientists who cannot write computer programs. Sequence retrieval using keywords is often more efficient than a Blast. SRS (Advanced Search form) or, even better, ACNUC or specific tools [48] should be preferred to Entrez because they are more powerful for sequence retrieval. Combining keywords for the gene or gene products with the species name or taxon ID and a filter on sequence length (very short sequences are useless) is often very efficient. Since annotations are not standard, building a list of gene products is often necessary (see additional materials). If there are many sequences, it is possible to cluster these sequences at a given similarity level (using blastclust or Cd-hit [49]) and align one representative sequence per cluster. A visual inspection of alignments reveals sequences that do not align well; they are often the result of a wrong annotation or have to be reverse-complemented. The remaining sequences can then be added to this good alignment (using the Clustal profile option, for example). For protein-coding genes, a program such as transAlign [50] may be a good choice. When retrieving primers from publications, older papers are often useless because primers were designed using very few sequences (primers can be analyzed using the web server cited above to produce figures similar to Figure 1).
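Two of the clean-up steps mentioned above, discarding very short sequences and detecting sequences that have to be reverse-complemented, can be partly automated before the visual inspection. The Python sketch below is one possible approach under stated assumptions and is not a tool cited in this review: orientation is guessed by counting words of length k shared with a trusted reference sequence, and the function names, the word length and the length threshold are all hypothetical choices. Sequences that share no word with the reference in either orientation are set aside as possible annotation errors.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Return the set of all words of length k found in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient_like(reference, seq, k=8):
    """Return seq in the orientation that shares more k-mers with reference.

    Returns None when neither orientation shares any k-mer with the reference.
    """
    ref_words = kmers(reference, k)
    flipped = reverse_complement(seq)
    forward_hits = len(ref_words & kmers(seq, k))
    reverse_hits = len(ref_words & kmers(flipped, k))
    if forward_hits == 0 and reverse_hits == 0:
        return None
    return seq if forward_hits >= reverse_hits else flipped


def clean_set(reference, seqs, min_length=300, k=8):
    """Split seqs into (kept, rejected): kept sequences are long enough and re-oriented."""
    kept, rejected = [], []
    for seq in seqs:
        oriented = orient_like(reference, seq, k) if len(seq) >= min_length else None
        if oriented is None:
            rejected.append(seq)
        else:
            kept.append(oriented)
    return kept, rejected
```

The rejected list is only a starting point for manual checking, since a divergent but genuine homolog may share no exact word with the chosen reference; lowering k or using several references reduces that risk.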

Finally, there is a large difference between amplification using DNA extracted from a pure culture and DNA extracted from an environmental sample. A primer (P) binds to its target DNA (T) according to the classical equation [P][T]/[PT] = Km. The presence of one or two differences between the P sequence and the T sequence may strongly influence the value of Km. With DNA extracted from a pure culture, [T] may be sufficiently high that [PT] is large enough for the PCR to succeed. With environmental DNA, and in the presence of mismatch(es), the primer may bind to many other domains (at low affinity but in many places) so that [PT] is not large enough to allow a successful amplification. This is why, for environmental studies, any published primers should always be carefully checked by comparison to newly published sequences.
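The equilibrium argument can be made quantitative with a short worked example. The Python sketch below solves [P][T]/[PT] = Km for [PT], assuming simple mass conservation (total primer = [P] + [PT], total target = [T] + [PT]) and a single class of binding site. Competition from the many low-affinity sites mentioned above is not modelled, and the function name and the concentrations used in the example are invented for illustration.

```python
import math


def bound_complex(p_total, t_total, km):
    """Return the equilibrium concentration [PT] for total primer, total target and Km.

    Substituting [P] = p_total - x and [T] = t_total - x into
    [P][T]/[PT] = Km gives x**2 - (p_total + t_total + km) * x + p_total * t_total = 0.
    The smaller root is the physical one; it is written in a form that avoids
    subtracting two nearly equal numbers when the target is very dilute.
    """
    b = p_total + t_total + km
    return 2.0 * p_total * t_total / (b + math.sqrt(b * b - 4.0 * p_total * t_total))


if __name__ == "__main__":
    primer = 1e-7  # mol/L, illustrative value only
    for label, target, km in [
        ("pure culture, perfect match", 1e-9, 1e-8),
        ("pure culture, mismatched primer", 1e-9, 1e-6),
        ("environmental DNA, mismatched primer", 1e-12, 1e-6),
    ]:
        fraction = bound_complex(primer, target, km) / target
        print(f"{label}: fraction of target bound = {fraction:.3f}")
```

In this simple model a larger Km (weaker binding caused by mismatches) or a lower target concentration both reduce [PT], which is the point made in the text; binding of the primer to non-target domains would lower the free primer concentration and reduce [PT] further.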

Acknowledgements

This work was supported by funds from the European Commission for the HEALTHY WATER project (FOOD-CT-2006-036306) and a CNRS PICS to R Christen. The authors are solely responsible for the content of this publication. It does not represent the opinion of the European Commission. The European Commission is not responsible for any use that might be made of data appearing therein.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.copbio.2008.04.003.

Conflict of interest

None.

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

of special interest

of outstanding interest

1. Barken KB, Haagensen JA, Tolker-Nielsen T: Advances in nucleic acid-based diagnostics of bacterial infections. Clin Chim Acta 2007, 384:1-11.

This review describes a range of different nucleic-acid-based diagnostic methods and provides examples of the use of these methods for detection of common bacterial infections, with a focus on automated procedures.

2. Abubakar I, Irvine L, Aldus CF, Wyatt GM, Fordham R, Schelenz S, Shepstone L, Howe A, Peck M, Hunter PR: A systematic review of the clinical, public health and cost-effectiveness of rapid diagnostic tests for the detection and identification of bacterial intestinal pathogens in faeces and food. Health Technol Assess 2007, 11:1-216.

This 230-page review was produced by the Health Technology Assessment (HTA) program, now part of the National Institute for Health Research (NIHR). Studies evaluating the diagnostic accuracy of rapid tests, including cost assessments, were retrieved using electronic databases and by hand-searching reference lists and key journals. Every study is critically evaluated.

3. Tenover FC: Rapid detection and identification of bacterial pathogens using novel molecular technologies: infection control and beyond. Clin Infect Dis 2007, 44:418-423.

A short (far from exhaustive) review comparing the effectiveness of PNA-FISH, real-time PCR and pyrosequencing and discussing the use of FDA-cleared versus non-FDA-cleared assays (antibiotic resistance).

4. Shneyer VS: On the species-specificity of DNA: fifty years later. Biochemistry (Mosc) 2007, 72:1377-1384.

A short historical review of the molecular methods used to identify prokaryotes and eukaryotes.

5. Angenent LT, Kelley ST, St Amand A, Pace NR, Hernandez MT: Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proc Natl Acad Sci USA 2005, 102:4860-4865.

6. Best EL, Fox AJ, Frost JA, Bolton FJ: Real-time single-nucleotide polymorphism profiling using Taqman technology for rapid recognition of Campylobacter jejuni clonal complexes. J Med Microbiol 2005, 54:919-925.

7. Lehmann LE, Hunfeld KP, Emrich T, Haberhausen G, Wissing H, Hoeft A, Stuber F: A multiplex real-time PCR assay for rapid detection and differentiation of 25 bacterial and fungal pathogens from whole blood samples. Med Microbiol Immunol 2007.


8. Ciammaruconi A, Grassi S, De Santis R, Faggioni G, Pittiglio V, D'Amelio R, Carattoli A, Cassone A, Vergnaud G, Lista F: Fieldable genotyping of Bacillus anthracis and Yersinia pestis based on 25-loci multi locus VNTR analysis. BMC Microbiol 2008, 8:21 doi: 10.1186/1471-2180-8-21.

9. Wang XW, Zhang L, Jin LQ, Jin M, Shen ZQ, An S, Chao FH, Li JW: Development and application of an oligonucleotide microarray for the detection of food-borne bacterial pathogens. Appl Microbiol Biotechnol 2007, 76:225-233.

10. DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM, Andersen GL: High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb Ecol 2007, 53:371-383.

Identification of pathogens in environmental samples often uses parallel, multispecies detection systems in order to detect any pathogens. In this analysis a DNA array with 297 851 probes was compared with 16S cloning and sequencing to evaluate the biodiversity, with the conclusion that the array was more efficient. However, pyrosequencing technologies are likely to replace both of the approaches compared in this work.

11. Wiesinger-Mayr H, Vierlinger K, Pichler R, Kriegner A, Hirschl AM, Presterl E, Bodrossy L, Noehammer C: Identification of human pathogens isolated from blood using microarray hybridisation and signal pattern recognition. BMC Microbiol 2007, 7:78 doi: 10.1186/1471-2180-7-78.

12. Hansen RR, Sikes HD, Bowman CN: Visual detection of labeled oligonucleotides using visible-light-polymerization-based amplification. Biomacromolecules 2008, 9:355-362.

13. Lin YC, Sheng WH, Chang SC, Wang JT, Chen YC, Wu RJ, Hsia KC, Li SY: Application of a microsphere-based array for rapid identification of Acinetobacter spp. with distinct antimicrobial susceptibilities. J Clin Microbiol 2008, 46:612-617.

14. Yang ZJ, Tu MZ, Liu J, Wang XL, Jin HZ: Comparison of amplicon-sequencing, pyrosequencing and real-time PCR for detection of YMDD mutants in patients with chronic hepatitis B. World J Gastroenterol 2006, 12:7192-7196.

15. Kobayashi N, Bauer TW, Tuohy MJ, Lieberman IH, Krebs V, Togawa D, Fujishiro T, Procop GW: The comparison of pyrosequencing molecular Gram stain, culture, and conventional Gram stain for diagnosing orthopaedic infections. J Orthop Res 2006, 24:1641-1649.

Sequencing is more efficient than staining to differentiate Gram-positive from Gram-negative bacteria. Who would have bet on it in 2005?

16. Luna RA, Fasciano LR, Jones SC, Boyanton BL Jr, Ton TT, Versalovic J: DNA pyrosequencing-based bacterial pathogen identification in a pediatric hospital setting. J Clin Microbiol 2007, 45:2985-2992.

17. Dowd SE, Sun Y, Secor PR, Rhoads DD, Wolcott BM, James GA, Wolcott RD: Survey of bacterial diversity in chronic wounds using Pyrosequencing, DGGE, and full ribosome shotgun sequencing. BMC Microbiol 2008, 8:43 doi: 10.1186/1471-2180-8-43.

18. Jackson GW, McNichols RJ, Fox GE, Willson RC: Bacterial genotyping by 16S rRNA mass cataloging. BMC Bioinformatics 2006, 7:321 doi: 10.1186/1471-2105-7-321.

19. Grun J, Manka CK, Nikitin S, Zabetakis D, Comanescu G, Gillis D, Bowles J: Identification of bacteria from two-dimensional resonant-Raman spectra. Anal Chem 2007, 79:5489-5493.

20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.

21. Gouy M, Delmotte S: Remote access to ACNUC nucleotide and protein sequence databases at PBIL. Biochimie 2008, 90:555-562.

22. Etzold T, Ulyanov A, Argos P: SRS: information retrieval system for molecular biology data banks. Methods Enzymol 1996, 266:114-128.

23. Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular biology database and retrieval system. Methods Enzymol 1996, 266:141-162.

24. Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 1999, 15:87-88.

25. Conery JS: Aligning sequences by minimum description length. EURASIP J Bioinform Syst Biol 2007:72936.

26. Kumar Y, Westram R, Kipfer P, Meier H, Ludwig W: Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package. BMC Bioinformatics 2006, 7:240 doi: 10.1186/1471-2105-7-240.

27. van Berkum P, Terefework Z, Paulin L, Suomalainen S, Lindstrom K, Eardly BD: Discordant phylogenies within the rrn loci of Rhizobia. J Bacteriol 2003, 185:2988-2998.

28. Schouls LM, Schot CS, Jacobs JA: Horizontal transfer of segments of the 16S rRNA genes between species of the Streptococcus anginosus group. J Bacteriol 2003, 185:7241-7246.

29. Dewhirst FE, Shen Z, Scimeca MS, Stokes LN, Boumenna T, Chen T, Paster BJ, Fox JG: Discordant 16S and 23S rRNA gene phylogenies for the genus Helicobacter: implications for phylogenetic inference and systematics. J Bacteriol 2005, 187:6106-6118.

30. Janda JM, Abbott SL: 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol 2007, 45:2761-2764.

31. Santos SR, Ochman H: Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 2004, 6:754-759.

32. Smith DL, Wareing BM, Fogg PC, Riley LM, Spencer M, Cox MJ, Saunders JR, McCarthy AJ, Allison HE: Multilocus characterization scheme for shiga toxin-encoding bacteriophages. Appl Environ Microbiol 2007, 73:8032-8040.

33. Ogura Y, Ooka T, Asadulghani, Terajima J, Nougayrede JP, Kurokawa K, Tashiro K, Tobe T, Nakayama K, Kuhara S et al.: Extensive genomic diversity and selective conservation of virulence-determinants in enterohemorrhagic Escherichia coli strains of O157 and non-O157 serotypes. Genome Biol 2007, 8:R138 doi: 10.1186/gb-2007-8-7-r138.

A systematic whole genome comparison between O157 and non-O157 EHEC strains using microarray and whole genome PCR scanning analyses. An example of modern analyses and comparisons of whole genomes to understand phenotypes and their evolution over time.

34. Zhang Y, Laing C, Steele M, Ziebell K, Johnson R, Benson AK, Taboada E, Gannon VP: Genome evolution in major Escherichia coli O157:H7 lineages. BMC Genomics 2007, 8:121 doi: 10.1186/1471-2164-8-121.

Same as reference [33], but using whole-genome-based microarrays of 6167 50-mer oligonucleotides for E. coli.

35. Hsiao A, Liu Z, Joelsson A, Zhu J: Vibrio cholerae virulence regulator-coordinated evasion of host immunity. Proc Natl Acad Sci USA 2006, 103:14542-14547.

36. Pang B, Yan M, Cui Z, Ye X, Diao B, Ren Y, Gao S, Zhang L, Kan B: Genetic diversity of toxigenic and nontoxigenic Vibrio cholerae serogroups O1 and O139 revealed by array-based comparative genomic hybridization. J Bacteriol 2007, 189:4837-4849.

37. Fox AJ, Taha MK, Vogel U: Standardized nonculture techniques recommended for European reference laboratories. FEMS Microbiol Rev 2007, 31:84-88.

38. Turner KM, Feil EJ: The secret life of the multilocus sequence type. Int J Antimicrob Agents 2007, 29:129-135.

39. Chang CH, Chang YC, Underwood A, Chiou CS, Kao CY: VNTRDB: a bacterial variable number tandem repeat locus database. Nucleic Acids Res 2007, 35:D416-421.

40. Martens M, Dawyndt P, Coopman R, Gillis M, De Vos P, Willems A: Advantages of multilocus sequence analysis for taxonomic studies: a case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int J Syst Evol Microbiol 2008, 58:200-214.

41. Stackebrandt E, Brambilla E, Richert K: Gene sequence phylogenies of the family Microbacteriaceae. Curr Microbiol 2007, 55:42-46.

42. Guo Y, Zheng W, Rong X, Huang Y: A multilocus phylogeny of the Streptomyces griseus 16S rRNA gene clade: use of multilocus sequence analysis for streptomycete systematics. Int J Syst Evol Microbiol 2008, 58:149-159.

43. Diederen BM, de Jong CM, Marmouk F, Kluytmans JA, Peeters MF, Van der Zee A: Evaluation of real-time PCR for the early detection of Legionella pneumophila DNA in serum samples. J Med Microbiol 2007, 56:94-101.

44. Vervaeren H, Temmerman R, Devos L, Boon N, Verstraete W: Introduction of a boost of Legionella pneumophila into a stagnant-water model by heat treatment. FEMS Microbiol Ecol 2006, 58:583-592.

45. Ratcliff RM, Lanser JA, Manning PA, Heuzenroeder MW: Sequence-based classification scheme for the genus Legionella targeting the mip gene. J Clin Microbiol 1998, 36:1560-1567.

46. Jolley KA, Chan MS, Maiden MC: mlstdbNet - distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics 2004, 5:86 doi: 10.1186/1471-2105-5-86.

47. Chevenet F, Brun C, Banuls A-L, Jacq B, Christen R: TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 2006, 7:439-448 doi: 10.1186/1471-2105-7-439.

48. Croce O, Lamarre M, Christen R: Querying the public databases for sequences using complex keywords contained in the feature lines. BMC Bioinformatics 2006, 7:45 doi: 10.1186/1471-2105-7-45.

49. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.

50. Bininda-Emonds OR: transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 2005, 6:156 doi: 10.1186/1471-2105-6-156.

