    Available online at www.sciencedirect.com

    Identifications of pathogens: a bioinformatic point of view

    Richard Christen

    Over the past 15 years, microbiology has undergone a

    momentous shift toward molecular methods. New sequences

    appear daily in the public databases and new computer tools

    and web servers are published on a regular basis. Major

    advances in molecular identifications of pathogens have been

    made because new biotechnology methods have appeared

    that often require a thorough in silico analysis of sequences.

    However, significant difficulties partly remain in developing

    efficient methods because the public databases contain many

    poorly annotated or partial sequences (often of environmental

    origin) and also because there are few dedicated web servers

    and curated databases.

    Addresses

    University of Nice Sophia-Antipolis and CNRS UMR 6543, Institute of Developmental Biology and Cancer, Parc Valrose, Centre de Biochimie,

    F 06108 Nice, France

    Corresponding author: Christen, Richard ([email protected])

    Current Opinion in Biotechnology 2008, 19:266-273

    This review comes from a themed issue on

    Environmental Biotechnology

    Edited by Carla Pruzzo and Pietro Canepari

    Available online 29th May 2008

    0958-1669/$ - see front matter

    © 2008 Elsevier Ltd. All rights reserved.

    DOI 10.1016/j.copbio.2008.04.003

    Introduction

    In microbiology, nucleic-acid-based diagnostics are gradually

    replacing culture-based methods [1,2,3,4]. Procedures that rely on PCR of a single gene or multilocus

    sequence typing [5-8] as well as arrays [9,10,11-13]

    require the design of oligomers for amplification and

    hybridization. Mass sequencing [14,15,16,17] or 16S

    rRNA mass cataloging [18] produce sequences that have

    to be matched to a database of known sequences. Finally, entirely new methods are appearing [19]. The consortium

    of DDBJ, EMBL & GenBank exchanges data on a daily

    basis (URL: http://www.insdc.org) and contains almost

    every known sequence. Blast [20] is used to retrieve similar

    sequences, ACNUC [21], SRS [22] or Entrez [23] retrieve

    sequences according to keywords. There are many

    free utilities to align sequences, compute and display

    phylogenetic trees (URL: http://www.bioinformatics.org/

    ). Finally design of primers and probes can be done

    using many tools (URL: http://bioinfo.unice.fr/softwares/oligo_softwares.html ).
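    As an illustration of keyword-based retrieval, the following minimal Python sketch assembles a query string in the Entrez field-tag syntax, combining an organism name, several alternative gene or product names, and a sequence-length range. The function name build_query, the example terms, and the length bounds are assumptions made for illustration only; the sketch builds the string and does not contact any server, and equivalent queries would have to be rewritten for SRS or ACNUC.

```python
def build_query(gene_terms, organism, min_len, max_len):
    """Build an Entrez-style nucleotide query string (no network access).

    gene_terms: alternative names for the gene or its product; several are
    OR-ed together because annotations are not standardized.
    """
    gene_clause = " OR ".join(f'"{term}"[All Fields]' for term in gene_terms)
    return (
        f'"{organism}"[Organism] AND ({gene_clause}) '
        f"AND {min_len}:{max_len}[SLEN]"
    )


# Hypothetical example: mip sequences of Legionella, excluding very short entries.
query = build_query(
    ["mip", "macrophage infectivity potentiator"],
    "Legionella",
    400,
    5000,
)
print(query)
```

    Listing several synonyms for the gene product is a deliberate choice here: a single keyword would miss entries whose annotation uses a different wording.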

    Retrieval of every necessary sequence can, however, be difficult, while the design of primers and probes is tedious

    and may result in lower quality results if the multiple

    criteria for design are not properly handled. New

    sequences are now flowing in, seemingly faster than

    programs can deal with. It is for example no longer easy

    to Blast the 16S rRNA gene sequence of a new isolate to

    find out which well known bacteria it is related to because

    most newly submitted rRNA sequences now originate

    from uncultivated clones. Housekeeping genes and

    pathogenicity gene sequences have been submitted in

    large numbers, but full sequences are not easily retrieved

    by Blast (because many are quite divergent) or by key-

    words because their annotations are often poor or not

    standard. Also, and in contrast to the community devoted

    to analyses of complete genomes, there are few centra-

    lized services or web servers that gather data, clean them

    and post them on the web with good query and analysis tools. Finally, bioinformaticians continuously publish

    new tools, but there are very few studies that compare

    them and actually analyze how good these new tools are (see for example BALIBASE for evaluating new alignment

    programs [24,25]).

    Detailed analyses will be restricted to waterborne bac-

    teria, for which we will review available sequences and

    possible solutions for in silico analyses of diagnostic

    methods before the real experiments.

    Choice of a target gene

    Target genes for bacterial identification can be the 16S

    rRNA gene, a housekeeping gene or finally a pathogen-

    icity gene. Some species are always pathogenic, and

    targeting 16S rRNA gene sequences is often the solution

    because many sequences have been published, PCR primers and hybridization probes have usually been

    described and tested; finally dedicated software and

    web sites are available [26]. Cases of lateral transfer

    ([27-29]) or too similar 16S rRNA gene sequences

    (reviewed in reference [30]) have also highlighted the

    need to use other or more rapidly evolving genes [31]. Some of these genes have, however, been completely

    sequenced in very few different strains or species, making

    it dubious that truly universal or specific oligomers have

    been really designed. Also the general absence of very

    conserved domains renders primers and probes design

    difficult. Finally, there is always the chance that yet

    unknown variant sequences exist that will escape mol-

    ecular detection because of mutations. The last case

    applies to clones that become pathogenic only after

    acquisition of pathogenicity genes [32,33,34] or when pathogenicity depends upon the genetic content, that is,

    by differential regulation of some genes or integration of genes (or domains) that belong to the species or genus

    gene pool but are not always present in a particular clone

    [35,36]. In such cases, targeting pathogenicity genes is the

    best choice, with difficulties similar to housekeeping

    genes. For other approaches such as multilocus sequence typing (MLST) and analyses of variable number of

    tandem repeats (VNTR) see references [37-40] for

    examples.

    For Eukarya (often protists), the approach is very similar,

    but there are often many fewer sequences available from

    different strains or species. On the other hand, one may expect less divergence (due to smaller population sizes

    and slower division rates) to be present in a population.

    Finally, viruses are a very different situation, since there is

    no homologous housekeeping gene shared among viruses, and mutation rates are expected to be much higher.

    Retrieval of sequence data for the major waterborne pathogens

    A list of pathogens likely to be found in aquatic environments

    was built (primarily based on WHO list


    Table 1

    For each taxon, the number of entries (number of different submissions) of protein coding sequences (CDS) and of genomes projects was

    analyzed

    Taxon                                                                           Entries nbr    CDS  Genomes

    Adenoviridae (Atadenovirus, Aviadenovirus, Mastadenovirus, Siadenovirus)               3644   5356       44
    Atadenovirus (various adenoviruses)                                                      69    198        5
    Astroviridae (Avastrovirus, Mamastrovirus)                                             1072   1100        6
    Caliciviridae (Lagovirus, Nebraska-like virus, Norovirus, Sapovirus, Vesivirus)        6371   7410       16
    Hepeviridae (Hepatitis E virus)                                                        2767   2611        1
    Mamastrovirus (Astrovirus of various hosts)                                             809    834        3
    Picornaviridae (Enterovirus, Hepatovirus, ...)                                        21296  17975       40
    Reoviridae (Aquareovirus, ..., Rotavirus)                                              8232   8015      352
    Enterovirus                                                                           13057  10994       16
    Hepatovirus                                                                            3605   3014        2
    Human enterovirus A                                                                    3010   2723        1
    Human enterovirus B                                                                    6578   5425        1
    Human enterovirus C                                                                     762    639        1
    Human enterovirus D                                                                     161    125        1
    Human astrovirus                                                                        792    813        1
    Rotavirus                                                                              5488   5254       33
    Sapovirus                                                                               553    608        4

    Burkholderia pseudomallei                                                               568  27058       24
    Campylobacter coli                                                                      346    367        1
    Campylobacter jejuni                                                                   2153  11676       11
    Escherichia coli                                                                      39004  75822       35
    Legionella pneumophila                                                                 2451  15145        4
    Legionella                                                                             3386  15545        4
    Pseudomonas aeruginosa                                                                35034  22340        7
    Salmonella typhi                                                                        191    474        0
    Salmonella                                                                             7953  41953       24
    Shigella                                                                               4149  31292        8
    Vibrio cholerae                                                                        2532  10833       17
    Vibrio parahaemolyticus                                                                 882   5775        3
    Vibrio vulnificus                                                                       821  10549        4
    Vibrio                                                                                10358  40967       34
    Yersinia enterocolitica                                                                 445   5093        1

    Acanthamoeba                                                                          33450    180        1
    Cryptosporidium parvum                                                                11148   1262        2
    Cryptosporidium                                                                       39678   1796        4
    Cyclospora cayetanensis                                                                 164      2        0
    Dracunculus medinensis                                                                    2      0        0
    Entamoeba histolytica                                                                101006    706        0
    Entamoeba                                                                            205767    843        6
    Giardia intestinalis                                                                  24507   1364        0
    Naegleria fowleri                                                                        67     38        0

    Viruses were also queried according to a higher taxonomic rank because the names used to describe them can be quite different in different entries. A

    table in additional materials also provides the list of most sequenced genes for a number of waterborne pathogens. Note: complete lists or synthetic

    information on genome projects can be manually obtained from URL: http://www.ncbi.nlm.nih.gov/Genomes/ or URL: http://www.genomesonline.org/.

    Genome numbers are for finished to in progress projects.


    Figure 1

    Heatmap analysis of oligomers used in references [43,44] to identify the presence of the mip gene. Tms were calculated using the nearest

    neighbor algorithm and were then transformed into colors (corresponding Tm/color shown in Figure). Each column of the heatmap (on the right)

    corresponds to an oligomer as indicated in the box "Primers identifiers". A gray square is for a Tm below 40 °C, a white square for a sequence


    potentiator) gene that in Legionella encodes a surface protein required for optimal infection of macrophages.

    Querying the literature returned 44 publications that used

    mip as a target for identification, and for the purpose of

    this review, we analyzed only two recent studies [43,44].

    We retrieved a total of 278 mip sequences in Legionella species, only 146 of which were distinct (not contained in

    a longer sequence). We evaluated how each oligomer

    would bind to each variant of the mip gene sequences

    (Table 3). It is particularly striking that primer Mip-R1

    shows a mismatch in the first position for most sequences;

    a simple Blast search confirmed this problem. For the other oligomers,

    this analysis demonstrates that a number of

    variant sequences will probably not be well recognized.
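    The kind of evaluation described above (which variant of the binding site each sequence carries, and how many sequences share it) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure actually used for Table 3: the function names, the toy primer, and the toy sequences are hypothetical, the comparison is ungapped, and only mismatches are counted (no Tm is computed).

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def best_site(oligo, target):
    """Return (mismatch_count, site) for the ungapped window of target that
    is closest to oligo, or None when target is shorter than oligo."""
    n = len(oligo)
    if len(target) < n:
        return None
    best = None
    for i in range(len(target) - n + 1):
        site = target[i:i + n]
        mismatches = sum(a != b for a, b in zip(oligo, site))
        if best is None or mismatches < best[0]:
            best = (mismatches, site)
    return best


def tally_sites(oligo, targets, is_reverse=False):
    """Count how many target sequences carry each variant of the binding site."""
    # A reverse primer anneals to the opposite strand, so its reverse
    # complement is compared with the sequences as deposited.
    probe = reverse_complement(oligo) if is_reverse else oligo
    counts = Counter()
    for target in targets:
        hit = best_site(probe, target)
        if hit is not None:  # sequences too short for the oligomer are skipped
            counts[hit] += 1
    return counts


def mismatch_positions(oligo, site):
    """1-based positions at which oligo and site differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(oligo, site)) if a != b]


# Toy data (hypothetical, not real mip sequences).
primer = "ACGTTGCA"
targets = ["TTACGTTGCAGG", "TTACGTTGCAGG", "CCTCGTTGCATT", "ACG"]
counts = tally_sites(primer, targets)
for (mismatches, site), n in counts.most_common():
    print(site, mismatches, mismatch_positions(primer, site), n)
```

    Reporting the mismatch positions as well as their number matters because a mismatch at one end of a primer does not have the same consequences as one in the middle.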

    We also analyzed whether the mip gene was present in Legionella

    species other than L. pneumophila and coupled Tm

    calculation with a phylogenetic tree (Figure 1). This

    analysis demonstrates that some oligomers indeed

    specifically target the mip gene in L. pneumophila

    and not in other species of Legionella. The fact that the mip gene is also present in other species of Legionella is not

    clearly stated in these publications (but see reference

    [45]), and since lateral gene transfers are rather common

    in bacteria, it is not clear whether present primers indeed

    amplify mip genes in every L. pneumophila strain (see

    Figure 1).
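    To make the idea of a Tm heatmap concrete, the sketch below fills a sequences-by-oligomers matrix with either an estimated Tm, "gray" (estimate below 40 °C) or "white" (sequence too short to contain the oligomer), mirroring the color conventions of Figure 1. It is only a rough stand-in: Figure 1 relies on a nearest-neighbor Tm calculation, whereas this sketch substitutes the much cruder Wallace rule (2 °C per A/T, 4 °C per G/C) and subtracts a fixed penalty per mismatch. The penalty value, the function names, and the toy sequences are hypothetical placeholders, not thermodynamic parameters.

```python
def wallace_tm(oligo):
    """Wallace rule estimate: 2 degrees C per A/T plus 4 degrees C per G/C."""
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def min_mismatches(oligo, target):
    """Fewest mismatches over all ungapped windows, or None if target is too short."""
    n = len(oligo)
    if len(target) < n:
        return None
    return min(
        sum(a != b for a, b in zip(oligo, target[i:i + n]))
        for i in range(len(target) - n + 1)
    )


def heat_cell(oligo, target, penalty=5.0, floor=40.0):
    """Return 'white', 'gray', or an estimated Tm for one heatmap cell.

    penalty is an illustrative placeholder (degrees C lost per mismatch).
    """
    mismatches = min_mismatches(oligo, target)
    if mismatches is None:
        return "white"  # sequence too short to contain the oligomer
    tm = wallace_tm(oligo) - penalty * mismatches
    return "gray" if tm < floor else round(tm, 1)


# Toy data (hypothetical): one oligomer, four sequences.
OLIGO = "GATTACAGGCTTCAGTCCAA"
sequences = {
    "perfect": "GG" + OLIGO + "CC",
    "two_mismatches": "CATTACAGGCTTCAGTCCAT",
    "four_mismatches": "CTTTACAGGCTTCAGTCCTT",
    "too_short": "GATTACA",
}
for name, seq in sequences.items():
    print(name, heat_cell(OLIGO, seq))
```

    Ordering the rows of such a matrix as the leaves of a phylogenetic tree is what allows clade-specific failures of an oligomer to be seen at a glance.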

    Bioinformatic tools

    Aside from the multipurpose tools available at NCBI,

    EBI or elsewhere, a number of web servers or programs

    may help analyses:

    GreenGenes. The greengenes web application provides access to a 16S rRNA gene sequence alignment
    for browsing, blasting, probing, and downloading: URL: http://greengenes.lbl.gov.

    PubMLST. This site hosts publicly accessible MLST databases and software: URL: http://pubmlst.org, see
    also reference [46].

    Legionella mip gene Sequence Database. This database allows the comparison of new mip gene DNA
    sequences with reference sequences from all described species of Legionella:
    URL: http://www.hpa.org.uk/cfi/bioinformatics/ewgli/legionellamips.htm.

    leBIBI. Blast on databases of SSU-rDNA, gyrB, recA, sodA, rpoB, tmRNA, tuf and groel2-hsp65 gene
    sequences and tools for bacterial identification:
    URL: http://umr5558-sud-str1.univ-lyon1.fr/lebibi/lebibi.cgi.

    ICB. Identification and classification of bacteria database using gyrB:
    URL: http://seasquirt.mbio.co.jp/icb/.

    GPMS. Pathogenic bacteria strain genotyping, essentially for epidemiological purposes, based on
    polymorphic tandem repeat typing: URL: http://minisatellites.u-psud.fr.

    VNTR. Molecular typing of bacteria using variable number tandem repeats:
    URL: http://vntr.csie.ntu.edu.tw.

    OHM. A tool that produces heatmaps representing in a visual manner the Tm of primers on a set of
    sequences (can be combined with TreeDyn [47]): URL: http://bioinfo.unice.fr/ohm.


    Table 3

    Evaluation of primers and probes recently used for the identifica-

    tion of the mip genes in Legionella

    For each oligomer: column (1) Tm in °C estimated for each mip

    sequence variant; column (2) the variant sequence; column (3) the

    number of such sequences (about 270 mip sequences available, only excerpts shown). F: forward primer, R: reverse primer.

    (Figure 1 Legend Continued) too short to contain the oligomer. Upper Figure (A) excerpt of L. pneumophila clade (possible cases of lateral

    transfer in red). Lower Figure (B) excerpt of non-L. pneumophila clade. Primer #3 shows the highest predicted Tm, but will fail on some

    sequences; primer #1 also shows quite a wide heterogeneity of predicted Tms. The full figure is available as supplementary material.


    A Blast server, to Blast 16S rRNA sequences on cultured bacteria only:
    URL: http://bioinfo.unice.fr/blast.

    DDBJ. A Blast server to Blast on 16S rRNA gene sequences only (fast):
    URL: http://blast.ddbj.nig.ac.jp/top-e.html.

    The list of prokaryotic names with standing in nomenclature (now including 16S rRNA accession
    numbers): URL: http://www.bacterio.cict.fr/.

    Norovirus Molecular Epidemiology Database. The norovirus database contains a collection of over 1000
    sequences of norovirus strains and associated epidemiological data:
    URL: http://www.hpa.org.uk/cfi/bioinformatics/norwalk/norovirus.htm.

    Conclusions

    If none of the above servers can be used (this is not an

    exhaustive list), sequence retrieval, alignments, phyloge-

    nies, and design of primers can be quite time consuming

    and tedious for scientists that cannot write computer

    programs. Sequence retrieval using keywords is often

    more efficient than a Blast. SRS (Advanced Search form) or even better ACNUC or specific tools [48] should be

    preferred to Entrez, because they are more powerful for

    sequence retrieval. Combining keywords for the gene or

    gene products with species name or taxon ID and a filter

    on sequence length (very short sequences are useless) is

    often very efficient. Since annotations are not standard,

    building a list of gene products is often necessary (see

    additional materials). If there are many sequences, it is

    possible to cluster these sequences at a given similarity

    level (using blastclust or Cd-hit [49]) and align one

    representative sequence per cluster. A visual inspection of alignments reveals sequences that do not align well;

    they are often the result of a wrong annotation or have to

    be inverted-complemented. The remaining sequences

    can then be added to this good alignment (using Clustal

    profile option for example). For protein-coding genes, a

    program such as Transalign [50] may be a good choice.

    When retrieving primers from publications, older papers are often useless because primers were designed using

    very few sequences (primers can be analyzed

    using the web server cited above, to produce figures

    similar to Figure 1).
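    Two of the clean-up steps mentioned above, discarding very short sequences and inverting-complementing entries deposited on the opposite strand, can be automated before alignment. The following Python sketch is one possible way to do it, assuming a trusted reference sequence is available: the orientation of each sequence is guessed by counting the k-mers it shares with the reference in each direction. The function names, the k-mer heuristic, the thresholds, and the toy sequences are assumptions for illustration; a visual check of the final alignment remains necessary.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers (substrings of length k) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def orient(seq, reference, k=8):
    """Return seq, or its reverse complement if that shares more k-mers
    with the reference than the sequence as given."""
    ref_kmers = kmers(reference, k)
    forward = len(kmers(seq, k) & ref_kmers)
    reverse = len(kmers(reverse_complement(seq), k) & ref_kmers)
    return reverse_complement(seq) if reverse > forward else seq


def prepare(seqs, reference, min_len, k=8):
    """Drop sequences shorter than min_len and put the rest in reference orientation."""
    return [orient(s, reference, k) for s in seqs if len(s) >= min_len]


# Toy data (hypothetical): a partial sequence, a reverse-complemented entry,
# and a fragment too short to be useful.
reference = "ATGGCGTACCTGAAGTCTGA"
candidates = [reference[2:18], reverse_complement(reference), "ATGG"]
cleaned = prepare(candidates, reference, min_len=10, k=4)
print(cleaned)
```

    A small k is used here only because the toy sequences are short; on real gene sequences a longer k reduces chance matches.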

    Finally, there is a large difference between amplification using DNA extracted from a pure culture and DNA

    extracted from an environmental sample. Primer (P)

    binds to its target DNA (T) according to the classical

    equation [P][T]/[PT] = Km. The presence of one or two

    differences between the P sequence and the T sequence

    may strongly influence the value of Km. With DNA

    extracted from a pure culture [T] may be sufficiently

    high so that [PT] is large enough for the PCR to succeed.

    With environmental DNA, and in the presence of mismatch(es),

    the primer may bind to many other domains (at low affinity but in many places) so that [PT] is not large

    enough to allow a successful amplification. This is why,

    for environmental studies, any published primers should

    always be carefully checked by comparison to newly

    published sequences.
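    The equilibrium relation above can be turned into a small worked example. Writing P0 and T0 for the total primer and target concentrations and x for [PT], the relation [P][T]/[PT] = Km becomes (P0 - x)(T0 - x) = Km x, a quadratic whose smaller root is the physically meaningful one. The Python sketch below solves it and compares a matched and a mismatched primer at high and low target concentration. All numerical values are arbitrary illustrative units chosen for this sketch, and binding to competing low-affinity sites is deliberately not modeled.

```python
import math


def bound_complex(p_total, t_total, km):
    """Equilibrium concentration [PT] for [P][T]/[PT] = km.

    Solves x**2 - (p_total + t_total + km) * x + p_total * t_total = 0 and
    returns the smaller root, written in a form that avoids cancellation.
    """
    s = p_total + t_total + km
    return 2.0 * p_total * t_total / (s + math.sqrt(s * s - 4.0 * p_total * t_total))


# Arbitrary units: primer in large excess over target in every scenario.
P_TOTAL = 100.0
scenarios = {
    "pure culture, perfect match": (1.0, 1.0),
    "pure culture, mismatched": (1.0, 100.0),
    "environmental, perfect match": (0.001, 1.0),
    "environmental, mismatched": (0.001, 100.0),
}
for label, (t_total, km) in scenarios.items():
    print(f"{label}: [PT] = {bound_complex(P_TOTAL, t_total, km):.6f}")
```

    In this simplified picture a higher Km (mismatch) and a lower [T] (environmental sample) both reduce [PT], and the two effects multiply, which is consistent with the recommendation to re-check published primers against newly published sequences.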

    Acknowledgements

    This work was supported by funds from the European Commission for the HEALTHY WATER project (FOOD-CT-2006-036306) and a CNRS PICS to R Christen. The authors are solely responsible for the content of this publication. It does not represent the opinion of the European Commission. The European Commission is not responsible for any use that might be made of data appearing therein.

    Appendix A. Supplementary data

    Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.copbio.2008.04.003.

    Conflict of interest

    None.

    References and recommended reading

    Papers of particular interest, published within the annual period of review, have been highlighted as:

    of special interest

    of outstanding interest

    1. Barken KB, Haagensen JA, Tolker-Nielsen T: Advances in nucleic acid-based diagnostics of bacterial infections. Clin Chim Acta 2007, 384:1-11.
       This review describes a range of different nucleic-acid-based diagnostic methods and provides examples of the use of these methods for detection of common bacterial infections, with a focus on automated procedures.

    2. Abubakar I, Irvine L, Aldus CF, Wyatt GM, Fordham R, Schelenz S, Shepstone L, Howe A, Peck M, Hunter PR: A systematic review of the clinical, public health and cost-effectiveness of rapid diagnostic tests for the detection and identification of bacterial intestinal pathogens in faeces and food. Health Technol Assess 2007, 11:1-216.
       This is a 230-page review provided by the Health Technology Assessment (HTA) program, now part of the National Institute for Health Research (NIHR). It is based on studies evaluating the diagnostic accuracy of rapid tests, retrieved using electronic databases and by handsearching reference lists and key journals, and it includes cost assessments. Every study is critically evaluated.

    3. Tenover FC: Rapid detection and identification of bacterial pathogens using novel molecular technologies: infection control and beyond. Clin Infect Dis 2007, 44:418-423.
       A short (far from exhaustive) review comparing effectiveness of PNA-FISH, real-time PCR and pyrosequencing and discussing the use of FDA-cleared versus non-FDA-cleared assays (antibiotic resistance).

    4. Shneyer VS: On the species-specificity of DNA: fifty years later. Biochemistry (Mosc) 2007, 72:1377-1384.
       A short historical review of the molecular methods used to identify prokaryotes and eukaryotes.

    5. Angenent LT, Kelley ST, St Amand A, Pace NR, Hernandez MT: Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proc Natl Acad Sci USA 2005, 102:4860-4865.

    6. Best EL, Fox AJ, Frost JA, Bolton FJ: Real-time single-nucleotide polymorphism profiling using Taqman technology for rapid recognition of Campylobacter jejuni clonal complexes. J Med Microbiol 2005, 54:919-925.

    7. Lehmann LE, Hunfeld KP, Emrich T, Haberhausen G, Wissing H, Hoeft A, Stuber F: A multiplex real-time PCR assay for rapid detection and differentiation of 25 bacterial and fungal pathogens from whole blood samples. Med Microbiol Immunol 2007.


    8. Ciammaruconi A, Grassi S, De Santis R, Faggioni G, Pittiglio V, D'Amelio R, Carattoli A, Cassone A, Vergnaud G, Lista F: Fieldable genotyping of Bacillus anthracis and Yersinia pestis based on 25-loci multi locus VNTR analysis. BMC Microbiol 2008, 8:21 doi: 10.1186/1471-2180-8-21.

    9. Wang XW, Zhang L, Jin LQ, Jin M, Shen ZQ, An S, Chao FH, Li JW: Development and application of an oligonucleotide microarray for the detection of food-borne bacterial pathogens. Appl Microbiol Biotechnol 2007, 76:225-233.

    10. DeSantis TZ, Brodie EL, Moberg JP, Zubieta IX, Piceno YM, Andersen GL: High-density universal 16S rRNA microarray analysis reveals broader diversity than typical clone library when sampling the environment. Microb Ecol 2007, 53:371-383.
        Identification of pathogens in environmental samples often uses parallel, multispecies detection systems, in order to detect any pathogens. In this analysis a DNA array with 297 851 probes was compared with 16S cloning and sequencing to evaluate the biodiversity, with the conclusion that the array was more efficient. However, pyrosequencing technologies are likely to replace both of the approaches compared in this work.

    11. Wiesinger-Mayr H, Vierlinger K, Pichler R, Kriegner A, Hirschl AM, Presterl E, Bodrossy L, Noehammer C: Identification of human pathogens isolated from blood using microarray hybridisation and signal pattern recognition. BMC Microbiol 2007, 7:78 doi: 10.1186/1471-2180-7-78.

    12. Hansen RR, Sikes HD, Bowman CN: Visual detection of labeled oligonucleotides using visible-light-polymerization-based amplification. Biomacromolecules 2008, 9:355-362.

    13. Lin YC, Sheng WH, Chang SC, Wang JT, Chen YC, Wu RJ, Hsia KC, Li SY: Application of a microsphere-based array for rapid identification of Acinetobacter spp. with distinct antimicrobial susceptibilities. J Clin Microbiol 2008, 46:612-617.

    14. Yang ZJ, Tu MZ, Liu J, Wang XL, Jin HZ: Comparison of amplicon-sequencing, pyrosequencing and real-time PCR for detection of YMDD mutants in patients with chronic hepatitis B. World J Gastroenterol 2006, 12:7192-7196.

    15. Kobayashi N, Bauer TW, Tuohy MJ, Lieberman IH, Krebs V, Togawa D, Fujishiro T, Procop GW: The comparison of pyrosequencing molecular Gram stain, culture, and conventional Gram stain for diagnosing orthopaedic infections. J Orthop Res 2006, 24:1641-1649.
        Sequencing more efficient than staining to differentiate Gram-positive from Gram-negative bacteria. Who would have bet on it in 2005?

    16. Luna RA, Fasciano LR, Jones SC, Boyanton BL Jr, Ton TT, Versalovic J: DNA pyrosequencing-based bacterial pathogen identification in a pediatric hospital setting. J Clin Microbiol 2007, 45:2985-2992.

    17. Dowd SE, Sun Y, Secor PR, Rhoads DD, Wolcott BM, James GA, Wolcott RD: Survey of bacterial diversity in chronic wounds using Pyrosequencing, DGGE, and full ribosome shotgun sequencing. BMC Microbiol 2008, 8:43 doi: 10.1186/1471-2180-8-43.

    18. Jackson GW, McNichols RJ, Fox GE, Willson RC: Bacterial genotyping by 16S rRNA mass cataloging. BMC Bioinformatics 2006, 7:321 doi: 10.1186/1471-2105-7-321.

    19. Grun J, Manka CK, Nikitin S, Zabetakis D, Comanescu G, Gillis D, Bowles J: Identification of bacteria from two-dimensional resonant-Raman spectra. Anal Chem 2007, 79:5489-5493.

    20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997, 25:3389-3402.

    21. Gouy M, Delmotte S: Remote access to ACNUC nucleotide and protein sequence databases at PBIL. Biochimie 2008, 90:555-562.

    22. Etzold T, Ulyanov A, Argos P: SRS: information retrieval system for molecular biology data banks. Methods Enzymol 1996, 266:114-128.

    23. Schuler GD, Epstein JA, Ohkawa H, Kans JA: Entrez: molecular biology database and retrieval system. Methods Enzymol 1996, 266:141-162.

    24. Thompson JD, Plewniak F, Poch O: BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 1999, 15:87-88.

    25. Conery JS: Aligning sequences by minimum description length. EURASIP J Bioinform Syst Biol 2007:72936.

    26. Kumar Y, Westram R, Kipfer P, Meier H, Ludwig W: Evaluation of sequence alignments and oligonucleotide probes with respect to three-dimensional structure of ribosomal RNA using ARB software package. BMC Bioinformatics 2006, 7:240 doi: 10.1186/1471-2105-7-240.

    27. van Berkum P, Terefework Z, Paulin L, Suomalainen S, Lindstrom K, Eardly BD: Discordant phylogenies within the rrn loci of Rhizobia. J Bacteriol 2003, 185:2988-2998.

    28. Schouls LM, Schot CS, Jacobs JA: Horizontal transfer of segments of the 16S rRNA genes between species of the Streptococcus anginosus group. J Bacteriol 2003, 185:7241-7246.

    29. Dewhirst FE, Shen Z, Scimeca MS, Stokes LN, Boumenna T, Chen T, Paster BJ, Fox JG: Discordant 16S and 23S rRNA gene phylogenies for the genus Helicobacter: implications for phylogenetic inference and systematics. J Bacteriol 2005, 187:6106-6118.

    30. Janda JM, Abbott SL: 16S rRNA gene sequencing for bacterial identification in the diagnostic laboratory: pluses, perils, and pitfalls. J Clin Microbiol 2007, 45:2761-2764.

    31. Santos SR, Ochman H: Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environ Microbiol 2004, 6:754-759.

    32. Smith DL, Wareing BM, Fogg PC, Riley LM, Spencer M, Cox MJ, Saunders JR, McCarthy AJ, Allison HE: Multilocus characterization scheme for shiga toxin-encoding bacteriophages. Appl Environ Microbiol 2007, 73:8032-8040.

    33. Ogura Y, Ooka T, Asadulghani, Terajima J, Nougayrede JP, Kurokawa K, Tashiro K, Tobe T, Nakayama K, Kuhara S et al.: Extensive genomic diversity and selective conservation of virulence-determinants in enterohemorrhagic Escherichia coli strains of O157 and non-O157 serotypes. Genome Biol 2007, 8:R138 doi: 10.1186/gb-2007-8-7-r138.
        A systematic whole genome comparison between O157 and non-O157 EHEC strains using microarray and whole genome PCR scanning analyses. An example of modern analyses and comparisons of whole genomes to understand phenotypes and their evolutions in time.

    34. Zhang Y, Laing C, Steele M, Ziebell K, Johnson R, Benson AK, Taboada E, Gannon VP: Genome evolution in major Escherichia coli O157:H7 lineages. BMC Genomics 2007, 8:121 doi: 10.1186/1471-2164-8-121.
        Same as reference [33], but using 6167 50-mer oligonucleotides whole-genome-based microarrays for E. coli.

    35. Hsiao A, Liu Z, Joelsson A, Zhu J: Vibrio cholerae virulence regulator-coordinated evasion of host immunity. Proc Natl Acad Sci USA 2006, 103:14542-14547.

    36. Pang B, Yan M, Cui Z, Ye X, Diao B, Ren Y, Gao S, Zhang L, Kan B: Genetic diversity of toxigenic and nontoxigenic Vibrio cholerae serogroups O1 and O139 revealed by array-based comparative genomic hybridization. J Bacteriol 2007, 189:4837-4849.

    37. Fox AJ, Taha MK, Vogel U: Standardized nonculture techniques recommended for European reference laboratories. FEMS Microbiol Rev 2007, 31:84-88.

    38. Turner KM, Feil EJ: The secret life of the multilocus sequence type. Int J Antimicrob Agents 2007, 29:129-135.

    39. Chang CH, Chang YC, Underwood A, Chiou CS, Kao CY: VNTRDB: a bacterial variable number tandem repeat locus database. Nucleic Acids Res 2007, 35:D416-421.

    40. Martens M, Dawyndt P, Coopman R, Gillis M, De Vos P, Willems A: Advantages of multilocus sequence analysis for taxonomic studies: a case study using 10 housekeeping genes in the genus Ensifer (including former Sinorhizobium). Int J Syst Evol Microbiol 2008, 58:200-214.


    41. Stackebrandt E, Brambilla E, Richert K: Gene sequence phylogenies of the family microbacteriaceae. Curr Microbiol 2007, 55:42-46.

    42. Guo Y, Zheng W, Rong X, Huang Y: A multilocus phylogeny of the Streptomyces griseus 16S rRNA gene clade: use of multilocus sequence analysis for streptomycete systematics. Int J Syst Evol Microbiol 2008, 58:149-159.

    43. Diederen BM, de Jong CM, Marmouk F, Kluytmans JA, Peeters MF, Van der Zee A: Evaluation of real-time PCR for the early detection of Legionella pneumophila DNA in serum samples. J Med Microbiol 2007, 56:94-101.

    44. Vervaeren H, Temmerman R, Devos L, Boon N, Verstraete W: Introduction of a boost of Legionella pneumophila into a stagnant-water model by heat treatment. FEMS Microbiol Ecol 2006, 58:583-592.

    45. Ratcliff RM, Lanser JA, Manning PA, Heuzenroeder MW: Sequence-based classification scheme for the genus Legionella targeting the mip gene. J Clin Microbiol 1998, 36:1560-1567.

    46. Jolley KA, Chan MS, Maiden MC: mlstdbNet-distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics 2004, 5:86 doi: 10.1186/1471-2105-5-86.

    47. Chevenet F, Brun C, Banuls A-L, Jacq B, Christen R: TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 2006, 7:439-448 doi: 10.1186/1471-2105-7-439.

    48. Croce O, Lamarre M, Christen R: Querying the public databases for sequences using complex keywords contained in the feature lines. BMC Bioinformatics 2006, 7:45 doi: 10.1186/1471-2105-7-45.

    49. Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 2006, 22:1658-1659.

    50. Bininda-Emonds OR: transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinformatics 2005, 6:156 doi: 10.1186/1471-2105-6-156.
