ISMB 2013: BioinfoCoreWorkshop
Revision as of 06:05, 23 January 2013

The ISMB Bioinfo Core Workshop call proposal can be found at [1]

Introduction

The 2013 ISMB meeting will be held 19-23 July in Berlin, Germany. The bioinfo-core workshop will again be 2 hours long, split into two one-hour topics. Within each topic we plan 2-3 short talks (~10 minutes each) by members of the community to introduce the topic and present an instance or problem area, followed by an interactive panel discussion with all workshop participants in which the topic will be explored further.

Read through tweets from prior workshops by searching for #WK3 or #BICORE on twitter.com. You will get more specific hits by searching for #WK3 #ISMB together.

As in previous years we have aimed to have one topic mostly focused on science and analysis, and another which focuses on the technical aspects of running a core facility.

Introduction to the Workshop by TBD

  • New Organizing Members and Nominations to the bioinfo-core organizing committee


Topic 1 (science): Integrative analysis of large-scale data

This session will look at the increasing scale of the studies required to obtain a complete understanding of a biological system. The session will have two main focuses:

1) The integration of public datasets in the analysis of novel local data

2) The analysis of studies composed of multiple different data types

In the first part we will look at how relating results obtained from novel data to existing public datasets, such as those developed by the ENCODE project or those held in the major public data repositories, can help with data interpretation. We will discuss the limitations of such analyses, the options for which types of data to use, and the degree to which we can trust existing data processing and analysis when integrating it with our own. We will also look at the tools available to make such combined public/private projects as simple as possible. Finally, we can discuss how to manage large amounts of public data within an existing IT infrastructure so as to avoid repeated downloads or data duplication.
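
To make the data-management point concrete, here is a minimal sketch in Python of a shared download cache; the cache directory and the commented example URL are hypothetical assumptions, and a real deployment would add locking and checksum verification:

 import hashlib
 import urllib.request
 from pathlib import Path

 # Hypothetical shared cache location; in practice this would sit on
 # group-readable storage so every analyst reuses the same copy.
 CACHE_DIR = Path("/shared/public_data_cache")

 def fetch_public_dataset(url: str) -> Path:
     """Download a public file once; later calls reuse the cached copy."""
     CACHE_DIR.mkdir(parents=True, exist_ok=True)
     # Key the cache on the URL so identical requests map to one file.
     name = hashlib.sha256(url.encode()).hexdigest()[:16] + "_" + Path(url).name
     target = CACHE_DIR / name
     if not target.exists():
         urllib.request.urlretrieve(url, target)  # one download per URL
     return target

 # Example (hypothetical URL):
 # peaks = fetch_public_dataset("https://example.org/encode/peaks.bed.gz")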

In the second part we will look at the increasing trend of using multiple different data types to profile a biological system. Studies are already being produced which incorporate RNA-Seq, ChIP-Seq of several factors, bisulphite-seq, and perhaps mass spectrometry and imaging data. Such studies will become more common in the future, as will studies that mix in public datasets of different types. In this part of the session we will look at the strategies we can employ to identify complex signals across these types of data and the tools we can use to visualise and analyse such experiments, and we will discuss experiences people have had and share any recommendations.
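
As a toy illustration of cross-data-type analysis, the Python sketch below joins per-gene RNA-Seq and ChIP-Seq summary tables and flags genes where the two data types agree; the file names, column names and thresholds are all illustrative assumptions rather than any standard format:

 import pandas as pd

 # Hypothetical per-gene summary tables from upstream pipelines.
 rnaseq = pd.read_csv("rnaseq_diff_expression.csv")   # gene, log2_fc, padj
 chipseq = pd.read_csv("chipseq_promoter_peaks.csv")  # gene, peak_score

 # Join the two data types on a shared gene identifier.
 merged = rnaseq.merge(chipseq, on="gene", how="inner")

 # Keep genes with a concordant signal: significantly upregulated
 # and strongly bound at the promoter (thresholds are illustrative).
 concordant = merged[(merged["padj"] < 0.05)
                     & (merged["log2_fc"] > 1)
                     & (merged["peak_score"] > 10)]
 print(concordant.sort_values("log2_fc", ascending=False).head())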


Speakers

Speaker 1 (Name TBD): [Suggested topic] A case study showing how public data can be combined with newly derived private data to obtain results which would not have been possible with either alone.

Speaker 2 (Name TBD): [Suggested topic 1] A case study showing how multiple different data types can be analysed to identify signals which span data types, with a discussion of the tools available for this type of study.

or

Speaker 2 (Name TBD): [Suggested topic 2] How to manage the storage and annotation of public data in a core facility.

Topic 2 (business): Tracking and reporting the analysis of biological data

A central tenet of science is that all results should be described in sufficient detail to allow them to be reproduced by others in the field. Computational analyses form an ever larger proportion of major papers, and yet it is still often difficult to reproduce exactly the analyses described in many of them. On a more practical level, a core facility frequently has to report the results of an analysis back to a scientist; to do this effectively it needs systems to keep track of the analyses it has done (which may involve many blind alleys before hitting the final result) and to present them in an understandable yet robust way.

This session will look at how different core facilities keep track of the work they do: the tools or systems people use, and the level of detail in which analyses are recorded. We will discuss the potential conflict between recording every step of an analysis rigorously and the overhead this imposes on the speed at which different analyses can be tried.

Some groups now produce completely automated records of their analyses in a format which allows them to be re-run at other sites. We can discuss how useful this might be in the field, both for reproducing existing analyses and for constructing new pipelines based on previous results. We can also discuss the tension between recording an analysis in enough detail for a computer to reconstruct it and producing clear reports which explain the process to scientists in a simple way.
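
As one possible shape for such a record, the Python sketch below logs each analysis step with its command and input/output checksums; the JSON format and field names are our own invention for illustration, not those of any particular workflow tool:

 import hashlib
 import json
 import subprocess
 from datetime import datetime, timezone
 from pathlib import Path

 LOG = Path("analysis_record.json")

 def checksum(path: str) -> str:
     """SHA-256 of a file, so outputs can be verified on a re-run."""
     return hashlib.sha256(Path(path).read_bytes()).hexdigest()

 def run_step(cmd: list, inputs: list, outputs: list) -> None:
     """Run one analysis step and append a replayable record of it."""
     subprocess.run(cmd, check=True)
     record = {
         "timestamp": datetime.now(timezone.utc).isoformat(),
         "command": cmd,
         "inputs": {p: checksum(p) for p in inputs},
         "outputs": {p: checksum(p) for p in outputs},
     }
     records = json.loads(LOG.read_text()) if LOG.exists() else []
     records.append(record)
     LOG.write_text(json.dumps(records, indent=2))

 # Example (hypothetical command and files):
 # run_step(["sort", "-k1,1", "-o", "peaks.sorted.bed", "peaks.bed"],
 #          inputs=["peaks.bed"], outputs=["peaks.sorted.bed"])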

On a larger scale, we can also discuss the increasing move towards electronic record-keeping in labs, share experiences of LIMS or ELN systems, and discuss how these might be integrated with our existing workflows in the future.

Speakers

Speaker 1. Gordon Brown (Senior Scientific Officer/Bioinformatician, Cancer Research UK Cambridge Institute, University of Cambridge), "Metadata Management: Ensuring Data Remain Useful". This talk will review some lessons learned in metadata management, and suggest some constructive solutions.

Speaker 2. TBC, suggested topics:

- Use of experimental metadata management systems to track progress in processing samples (both in the wet lab and bioinformatically), to configure analysis pipelines and record their results, and to support data submission to public repositories; a minimal sketch of such a record follows this list.

- Practical experiences in developing, managing and documenting analysis workflows in a Core Facility setting; delivering traceable and reproducible analysis results.
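
As a minimal sketch of the kind of sample record such a metadata system might hold (Python; every field name here is illustrative rather than taken from any specific LIMS or ELN):

 from dataclasses import dataclass, field

 @dataclass
 class SampleRecord:
     """One sample's metadata as it moves from wet lab to analysis."""
     sample_id: str
     organism: str
     assay: str                      # e.g. "RNA-Seq", "ChIP-Seq"
     sequencing_run: str = ""
     pipeline_version: str = ""      # which pipeline produced the results
     repository_accession: str = ""  # filled in after public submission
     processing_steps: list = field(default_factory=list)

 # Hypothetical usage:
 s = SampleRecord(sample_id="S001", organism="Homo sapiens", assay="RNA-Seq")
 s.processing_steps.append("library prepared 2013-01-15")
 s.processing_steps.append("aligned with pipeline v1.2")
 print(s)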

Speaker Notes