Accelerating Bioinformatics with Grid Computing

Post on 22-Feb-2016

Page 1

Accelerating Bioinformatics with Grid Computing

Angel Merino
Centro Nacional de Biotecnología, Unidad de Biocomputación

Page 2

What to cover

• Electron microscopy (EM)
– What EM is.
– What the workflow is.

• What is being solved with the grid: processes/applications that have been "gridified"
– Maximum Likelihood
– CTF estimation

• Overcoming the potential barrier
– Web portal
– Web/Grid Services & Workflows

• Other applications in the field

Page 3

What is EM (I)

• EM is a structural analysis technique.

• It lets us explore the molecular environment of the particles under study.

Page 4

What is the workflow

Sample preparation.

Image acquisition.

Image processing and computation of 3D volumes.

Page 5

What is EM (II)

Biological material
- High H2O content
- Elevated radiation damage

Negative staining
- Dehydration
- Structural changes / crushing
- Image comes from the metal mold

Cryomicroscopy
- Hydrated / biology-friendly
- Fewer distortions
- Image comes from the biological specimen

Page 6

What is EM (III)

Negative staining

Cryomicroscopy

Page 7

CTF Estimation (I)

Aberrations in the microscope optics blur the experimental images. These effects can be described by the contrast transfer function (CTF).

CTF estimation in Xmipp may take up to half a day per micrograph, and a user processes about 100 micrographs per experiment. Grid computing is therefore necessary.

Estimating the CTF allows the blurred images to be corrected.
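The figures on this slide can be turned into a rough wall-clock estimate. The sketch below assumes that each micrograph is an independent job, that every job takes the worst-case half day, and that queueing and data-transfer time are negligible; the function name and the worker counts are illustrative, not taken from the slides.

```python
import math


def ctf_wall_time_days(n_micrographs: int, days_per_micrograph: float, n_workers: int) -> float:
    """Wall-clock days to estimate the CTF of every micrograph.

    Assumes independent jobs of equal length, run in rounds of n_workers.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    rounds = math.ceil(n_micrographs / n_workers)
    return rounds * days_per_micrograph


# Figures from the slide: about 100 micrographs, up to half a day each.
serial_days = ctf_wall_time_days(100, 0.5, n_workers=1)
grid_days = ctf_wall_time_days(100, 0.5, n_workers=100)  # one job per micrograph
```

Under these assumptions a single CPU needs 50 days per experiment, while 100 simultaneous grid jobs bring that down to about half a day.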

Page 8

CTF Estimation (II)

Page 9

CTF Estimation (III)
Per micrograph

Page 10

1000x

Maximum-Likelihood

Page 11

Maximum-Likelihood (I)
"Slow" execution

1 iteration

Page 12

Maximum-Likelihood (II)
"Fast" execution (MPI)

Page 13

Maximum-Likelihood development: EGEE grid vs. local cluster

Using the EGEE grid
• Last November, 17,160 CPU hours were consumed (almost 2 years!)
• 23 CPUs working full time

Using our local cluster (jumilla.cnb.uam.es) at 50%, for the same activity
• 20 CPUs
• Real usage time = 50% of the total time, because of the development work being carried out
• 46 CPUs!!!

[Bar chart: months (axis from 0 to 2.5) by environment: grid vs. jumilla cluster]
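The numbers on this slide follow from simple arithmetic, sketched below. The 30-day month and the reading of "46 CPUs" as the cluster size needed to match 23 full-time CPUs at 50% usage are assumptions; the slides do not spell either out.

```python
HOURS_PER_DAY = 24


def equivalent_full_time_cpus(cpu_hours: float, days: int) -> float:
    """Number of CPUs that, running non-stop for `days`, deliver `cpu_hours`."""
    return cpu_hours / (days * HOURS_PER_DAY)


def months_on_cluster(cpu_hours: float, n_cpus: int, usage_fraction: float,
                      days_per_month: int = 30) -> float:
    """Months a cluster needs for `cpu_hours` when only part of its time is usable."""
    effective_cpus = n_cpus * usage_fraction
    return cpu_hours / (effective_cpus * days_per_month * HOURS_PER_DAY)


cpu_hours = 17160                                            # consumed on the grid in one month
grid_cpus = equivalent_full_time_cpus(cpu_hours, days=30)    # about 23.8; the slide gives 23
cpu_years = cpu_hours / (365 * HOURS_PER_DAY)                # about 1.96, "almost 2 years"
cluster_months = months_on_cluster(cpu_hours, n_cpus=20, usage_fraction=0.5)  # about 2.4
cpus_to_match = 23 / 0.5                                     # 46 CPUs at 50% usage
```

The result of roughly 2.4 months for the local cluster is consistent with a chart axis that runs to 2.5 months.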

Page 14

Overcoming the potential barrier

4 simple steps to run all the jobs you need for your experiment

1. Select your application
2. Log in to the UI
3. Upload the files you need
4. Submit your experiment, giving a notification e-mail address and your certificate password

Page 15

Overcoming the potential barrier (I)

The portal engine

Input from the Grid portal
C++ object
• JDLs
• Required scripts (3)
• Required input tars

For each JDL
• Submit the job and publish the data (first time)
• Check the status: aborted or not submitted / done (success)
• Get the output and retrieve the output data

Scripts
• First script
• Second script
• Third script

Run the job and publish the output data when the job finishes.

Send an e-mail to the notification e-mail address.
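The per-JDL loop on this slide can be sketched as a small driver. This is not the portal's C++ code: the function `run_experiment`, its callback parameters, and the state strings are hypothetical, and the mapping of submit, status check, and output retrieval onto the three scripts is an assumption based on the order in which the labels appear.

```python
from typing import Callable, Dict, List, Optional


def run_experiment(
    jdls: List[str],
    submit: Callable[[str], Optional[str]],   # returns a job id, or None if not submitted
    status: Callable[[str], str],             # returns the current state of a job
    fetch_output: Callable[[str], None],      # retrieves the output of a finished job
    notify: Callable[[Dict[str, str]], None], # e.g. sends the notification e-mail
    max_polls: int = 100,
) -> Dict[str, str]:
    """Submit every JDL, poll until it ends, fetch output on success, then notify."""
    results: Dict[str, str] = {}
    for jdl in jdls:
        job_id = submit(jdl)
        if job_id is None:
            results[jdl] = "not submitted"
            continue
        state = "Submitted"
        for _ in range(max_polls):            # a real engine would wait between polls
            state = status(job_id)
            if state in ("Done", "Aborted"):
                break
        if state == "Done":
            fetch_output(job_id)
        results[jdl] = state
    notify(results)
    return results
```

Passing the grid operations in as callbacks keeps the control flow testable without a grid: in production they would wrap the three scripts, and in a test they can be simple stand-ins.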

Page 16

Overcoming the potential barrier (II)

Workflows & Grid Services

Page 17

Other applications

Grid Protein Sequence Analysis

Scientific objectives
Bioinformatic analysis of the data produced by complete genome sequencing projects is one of the major challenges of the coming years. Integrating up-to-date databanks and relevant algorithms is a clear requirement of such analysis. Grid computing, such as the infrastructure provided by the EGEE European project, would be a viable solution for distributing data, algorithms, and computing and storage resources for genomics. Providing bioinformaticians with a good interface to the grid infrastructure is a further challenge that must be met. The GPS@ web portal (Grid Protein Sequence Analysis) aims to be such a user-friendly interface to these genomic resources on the EGEE grid.

Method
• A well-known web interface eases access to the algorithms offered.
• Protein databases are stored on grid storage as flat files.
• Most protein sequence analysis tools are reference legacy code that is run unchanged. These tools are wrapped in grid jobs to be executed on grid resources.
• The algorithms' output is analysed and displayed in graphic format through the web interface.
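Wrapping an unchanged legacy tool in a grid job mostly means describing it in a JDL file. The sketch below generates a minimal job description using common JDL attributes (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox). The helper `make_jdl`, the wrapper script `run_tool.sh`, and the file names are hypothetical; the slides do not show how GPS@ builds its jobs.

```python
from typing import List


def jdl_list(items: List[str]) -> str:
    """Format a Python list as a JDL list of quoted strings."""
    return "{" + ", ".join('"' + item + '"' for item in items) + "}"


def make_jdl(executable: str, arguments: str,
             input_files: List[str], output_files: List[str]) -> str:
    """Build a minimal JDL description for a wrapped, unmodified tool."""
    input_sandbox = jdl_list([executable] + input_files)
    output_sandbox = jdl_list(["std.out", "std.err"] + output_files)
    lines = [
        'Executable = "' + executable + '";',
        'Arguments = "' + arguments + '";',
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        "InputSandbox = " + input_sandbox + ";",
        "OutputSandbox = " + output_sandbox + ";",
    ]
    return "\n".join(lines)


# Hypothetical wrapper script and file names, for illustration only.
jdl_text = make_jdl("run_tool.sh", "query.fasta", ["query.fasta"], ["result.txt"])
```

The wrapper script travels in the input sandbox with the query, so the legacy binary itself needs no changes.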

Page 18

Other applications (I)

In silico Drug Discovery

Scientific objectives
Provide docking information to help in the search for new drugs.
• Biological goal: propose new inhibitors (drug candidates) targeting neglected diseases.
• Bioinformatics goal: in silico virtual screening of drug candidate databases.
• Grid goal: demonstrate to the research communities active in drug discovery the relevance of grid infrastructures through the deployment of a compute-intensive application.

Method
Large-scale molecular docking on malaria, computing millions of potential drugs with some software and parameter settings. Docking consists of computing the binding energy of a protein target to a library of potential drugs using a scoring algorithm.
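Virtual screening as described here reduces to scoring every compound against the target and ranking the results. The sketch below shows only that outer loop; `screen_library` and `toy_score` are hypothetical, and the toy scoring function has no physical meaning (real docking software computes an estimated binding energy). It assumes the usual convention that a lower energy means stronger predicted binding.

```python
from typing import Callable, List, Tuple


def screen_library(
    target: str,
    ligands: List[str],
    score: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score every ligand against the target and rank from lowest to highest energy."""
    scored = [(ligand, score(target, ligand)) for ligand in ligands]
    return sorted(scored, key=lambda pair: pair[1])


def toy_score(target: str, ligand: str) -> float:
    """Placeholder score: minus the number of characters shared with the target."""
    return -float(len(set(target) & set(ligand)))


ranking = screen_library("abcd", ["ab", "xyz", "abc"], toy_score)
best_candidate = ranking[0][0]
```

Each call to the scoring function is independent of the others, which is what makes a library of millions of compounds a natural fit for grid jobs.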

Page 19

Other applications (II)

Genome evolution modeling

Scientific objectives
Study human evolutionary genetics and answer questions such as the geographic origin of modern human populations, the genetic signature of expanding populations, the genetic contacts between modern humans and Neanderthals, and the expected null distributions of genetic statistics applied to genome-wide data sets.

Method
Simulate the past demography (growth and migrations) of human populations in a geographically realistic landscape, taking into account the spatial and temporal heterogeneity of the environment. Generate the molecular diversity of several samples of genes drawn at any location within the current human range, and compare it with the observed contemporary molecular diversity. SPLATCHE uses a region sampling Bayesian framework that requires 10^5 independent demographic and genetic simulations.
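Because the simulations are independent, they can be divided into grid jobs by index alone. The sketch below takes the simulation count as 10^5 and picks an arbitrary batch size of 500 per job; `plan_jobs` and the batch size are illustrative assumptions, not details of how SPLATCHE was actually deployed.

```python
from typing import List


def plan_jobs(n_simulations: int, sims_per_job: int) -> List[range]:
    """Split simulation indices 0..n_simulations-1 into consecutive batches."""
    if sims_per_job < 1:
        raise ValueError("sims_per_job must be at least 1")
    jobs = []
    for start in range(0, n_simulations, sims_per_job):
        stop = min(start + sims_per_job, n_simulations)
        jobs.append(range(start, stop))
    return jobs


# Each index could also serve as the random seed of its simulation,
# so no two jobs repeat the same run.
jobs = plan_jobs(10**5, sims_per_job=500)
```

With these numbers the plan yields 200 jobs, each covering a distinct block of 500 simulation indices.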

Page 20

For more information

Xmipp web page: www.cnb.uam.es/~bioinfo

Unit web page: http://biocomp.cnb.uam.es

NA4 EGEE biomed applications home: http://egee-na4.ct.infn.it/biomed/index.php

[email protected]

Page 21

Thank you