ISMB 2013: BioinfoCoreWorkshop

From BioWiki
Revision as of 02:41, 21 July 2013 by Bgrichter

The ISMB Bioinfo Core Workshop call proposal can be found at … The workshop can be followed on Twitter at #WK02.

Introduction

The 2013 ISMB meeting will be held 19-23 July in Berlin, Germany. The bioinfo-core workshop will again be 2 hours long, split into two one-hour topics. Within each topic we plan 2-3 short talks (~10 mins each) by members of the community, which introduce the topic and present an example or problem area within it. Each set of talks is followed by an interactive panel discussion with all workshop participants, where the topic will be explored further.

Read through tweets from prior workshops by searching for #WK3 or #BICORE at twitter.com. Searching for #WK3 #ISMB gives more specific hits.

As in previous years, we have aimed to have one topic focused mostly on science and analysis, and another focused on the technical aspects of running a core facility.

Introduction to the Workshop by Brent Richter

  • New Organizing Members and Nominations to the bioinfo-core organizing committee


Topic 1 (science): Integrative analysis of large scale data

Moderated by Simon Andrews and David Sexton

This session will look at the increasing scale of the studies required to obtain a complete understanding of a biological system. The session will have two main focuses:

1) The integration of public datasets in the analysis of novel local data

2) The analysis of studies composed of multiple different data types

In the first part we will look at how relating results obtained from novel data to existing public datasets, such as those developed by the ENCODE project or those held in the major public data repositories, can help with data interpretation. We will discuss the limitations of such analyses, the options we have for the types of data to use, and the degree to which we trust existing data processing and analysis when integrating it with our own data. We will also look at the tools available to make such combined public/private projects as simple as possible. We can also discuss how to manage large amounts of public data within an existing IT infrastructure so as to avoid repeated downloads and data duplication.

In the second part we will look at the increasing trend towards using multiple different data types to profile a biological system. Studies are already being produced that incorporate RNA-Seq, ChIP-Seq of several factors, bisulphite-seq and sometimes mass spectrometry and imaging data. Such studies will become more common in future, as will studies that mix in public datasets of different types. In this part of the session we will look at the strategies we can use to identify complex signals across these data types and the tools we can use to both visualise and analyse these kinds of experiment, and we will discuss people's experiences and share recommendations.


Speakers

Speaker 1. Mikhail Spivakov, Babraham Institute, United Kingdom: Integrative analysis of large scale data


Speaker 2. Suraj Menon, Cancer Research UK Cambridge Research Institute, United Kingdom


Topic 2 (business): Tracking and reporting the analysis of biological data

Moderated by Hans-Rudolf Hotz and Matthew Eldridge

A central tenet of science is that all results should be described in sufficient detail to allow them to be reproduced by others in the field. Computational analyses form an ever larger proportion of major papers, and yet it is still often difficult to reproduce exactly the analyses described in many of them. On a more practical level, a core facility frequently has to report the results of an analysis back to a scientist. To do this effectively, it needs systems that keep track of the analysis it has done (which may involve many blind alleys before reaching the final result) and that present this in an understandable yet robust way.

This session will look at how different core facilities keep track of the work they do, covering the tools and systems people use and the level of detail they record for each analysis. We will discuss the potential conflict between rigorously recording every step of an analysis and the overhead this imposes on how quickly different analyses can be tried.

Some groups are now producing completely automated records of their analyses in a format that allows them to be re-run at other sites. We can discuss how useful this might be in the field, both for reproducing existing analyses and for constructing new pipelines based on previous results. We can also discuss the conflict between recording enough detail for a computer to reconstruct an analysis and producing clear reports that explain the process to scientists in a simple way.

On a larger scale, we can also discuss the increasing move towards electronic recording systems in labs, share experiences of laboratory information management systems (LIMS) and electronic lab notebooks (ELNs), and discuss how these might be integrated with our existing workflows in future.

Speakers

Speaker 1. Jim Cavalcoli, University of Michigan, United States "Metadata Management: Ensuring Data Remain Useful". This talk will review some lessons learned in metadata management, and suggest some constructive solutions.

Speaker 2. Kim-Anh Lê Cao, University of Queensland, Australia


Speaker Notes