ISMB 2011: Workshop on Analysis Pipelines for High Throughput Sequencing


Automated or Supervised Analysis?


The volume of data coming through core facilities from High Throughput Sequencing has been increasing steadily, to the point where many facilities are now limited by the time it takes to analyze it. This session therefore explored the use of automated pipelines for the routine analysis of such data. The aim was to take a critical look at whether automated analysis is reliable enough for its output to be passed on to end users, and to see what might be missed by taking an automated approach rather than a more detailed, supervised one.


Simon's talk presented a number of case studies demonstrating how automated analysis pipelines can give misleading information, and how closer examination of the results can reveal either interesting scientific observations or technical artefacts that confuse the pipeline. Examples ranged from basic data collection and problems with the Illumina sequencing pipeline (the one automated pipeline that most core facilities both run and trust) to problems with mapping and quantitation.