ISMB 2017: BioinfoCoreWorkshop

Latest revision as of 06:08, 25 July 2017

We are holding a Bioinfo-core workshop at the ISMB/ECCB meeting in Prague. We have been given a half-day slot in the program on Monday, July 24, 2017, 10:00 am–12:30 pm.

Standing on Two Legs: Managing Operations in a Core and Ensuring Scientific Reproducibility

Workshop Structure

The workshop is split into two sessions with a break between. Each session has two 15-minute talks followed by a discussion.

  • The first session has two 15-minute talks on the topic of Managing Core Facilities, followed by a 40-minute audience discussion. After the break we have two 15-minute talks about Ensuring Scientific Reproducibility, followed by a 30-minute panel discussion.

Workshop topics

Managing a Core Facility

Setting up a new bioinformatics core facility: a first year review
Speaker: Russell Hamilton, University of Cambridge
Time: 10:00-10:15

There are many facets to setting up a new bioinformatics core facility. These include: implementing pipelines, data sharing, acquiring domain expertise, teaching, project management and staff recruitment/management. I will introduce the Centre for Trophoblast Research and how I have tailored the set-up of the bioinformatics facility to best support research within the department. I will outline the successful strategies implemented over the first year as well as some of the pitfalls. I'll then detail my future strategies to ensure a thriving core facility for year 2 and beyond.


Managing people in a core facility
Speaker: Annette McGrath, CSIRO / The University of Queensland
Time: 10:15-10:30


One constant of running a core facility: you have to deal with people at all levels: funders, staff, investigators/clients, etc. Differences among your clients affect what kind of staff, in terms of skills and personalities, a core needs to maintain: what is the need for pipeline redundancy and how does it translate to staffing? Do staff need to interact directly with clients? Customer service is important, for example. Managers have to take care of their people: do they have what they need in computers, tools and training? Ensure they can take pride in their work, work out time/priority conflicts, and make sure they see the big picture of the mission of the organization, the core and their clients. Talk to your staff about what motivates them and ways to stay engaged. Assign projects that stretch their abilities and instill a sense of ownership of their projects. Celebrate successes. Annette has always advocated for the need for bioinformatics within her organizations--a priority.

Some questions: What's your biggest horror story? What were some of the traits and abilities you look for in a candidate? What's the size of your team(s) and the reporting structure?

Management Discussion: 10:30-11:10

Balance of operating a core and research: research is secondary, but how are you judged? How do you manage PhD students through the core? They are co-supervised. A fair number of people in the audience maintain both research and service; how are these balanced? The administrative part of running the core comes first.

How many people are hired into academic tracks vs. a professional track? The problem of career progression: often none at all--people are hired into a position and remain at that hiring position. This was highlighted at a recent core facilities conference in the UK. What is interesting is that funders were present and discussed putting together a professional development track to parallel the academic track--Wellcome, etc. What is rewarded?

What is the balance between providing training and owning the expertise in a specific analytical domain? Train collaborators? How much of the analysis does a core do itself, and how much does it empower others to perform? Self-help is important to get started and to answer low-level questions; however, developing this body of knowledge into how-to articles or knowledge bases is challenging, as it takes time and effort.

Training and education: how much do you train, and what? At what level? Does it cut into your business? Training at CSIRO: develop a mission and stay focused; aim to give others enough literacy that they understand some of the basic information. "Data internships", where a customer interns with a bioinformatician for a week or two, were mentioned.

How do you stay organized as you grow in people? Is it manageable: are you all working like mad, or is it the same number of projects, now more shared (greater bandwidth)? The "embedded" bioinformatician: how do you keep them engaged in the core? Research bioinformaticians embedded in the core generally feel the most isolated; reach out and engage them as a community.

Different organizational structures would be a study/survey opportunity: reporting to an academic? Working inside institutional funding?

Ensuring Reproducibility

Developing Reliable QC at the Swedish National Genomics Infrastructure
Speaker: Phil Ewels, SciLifeLab (Sweden)
Time: 11:30-11:45
File:Phil Ewels ISMB BioinfoCore 2017.pdf

Core facility for all of Sweden. They maintain a continuum of QC, from fully automated and rigorous checks using MultiQC to occasional in-depth QC validations. The group uses continuous integration for its analysis pipelines: GitHub, Docker and Travis CI. Pipelines are written in Nextflow (http://nextflow.io): NGI-RNAseq (https://github.com/SciLifeLab/NGI-RNAseq) and others (NGI-ChIPseq, NGI-MethylSeq, NGI-smRNAseq, the Cancer Analysis Workflow). Everything is on GitHub; see http://opensource.scilifelab.se, http://multiqc.info and https://github.com/SciLifeLab.
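To make the MultiQC-style idea concrete, here is a minimal sketch of what "aggregating per-tool QC into one report" means: collect key/value summary files written by different tools and merge them into a single per-sample table. The file-naming scheme (`<sample>.<tool>.tsv`) and format are illustrative assumptions, not MultiQC's actual parsers, which handle the real output formats of dozens of tools.

```python
# Illustrative sketch of QC aggregation (NOT MultiQC's real code):
# merge per-tool summary files named `<sample>.<tool>.tsv`, each holding
# tab-separated key/value lines, into one dict of metrics per sample.
import csv
import glob
import os
from collections import defaultdict


def aggregate_qc(results_dir):
    """Return {sample: {"<tool>_<metric>": value, ...}} for all *.tsv files."""
    samples = defaultdict(dict)
    for path in glob.glob(os.path.join(results_dir, "*.tsv")):
        # Hypothetical convention: filename encodes sample and tool.
        sample, tool, _ext = os.path.basename(path).split(".", 2)
        with open(path, newline="") as fh:
            for key, value in csv.reader(fh, delimiter="\t"):
                samples[sample][f"{tool}_{key}"] = value
    return dict(samples)
```

A report generator would then render this merged table as one HTML page, which is the essence of running QC "fully automated and rigorous" across every sample.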

Reproducible and fully documented data analyses at the Functional Genomics Center Zurich
Speaker: Lennart Opitz, University of Zurich
Time: 11:45-12:00

Has leveraged an Open Technology Platform since 2002.


Reproducibility Discussion: 12-12:30

Opitz: How do you deal with external clients and data backup? Data is backed up by policy. Users sign up for retention of their data for a period of time--industry and external users understand their data is captured and kept for 3-6 months. QC results are kept forever, but original data is purged.

What is everyone's definition of reproducibility? Is it that you are able to run the data again in 10 years and be assured of getting the same results? Ewels has everything on GitHub and can re-run analyses using specific versions of tools tagged on GitHub. Singularity (http://singularity.lbl.gov) is used to run containers on HPC. ISO certification: validation steps, everything audited, a complete documentation system around the IT and informatics systems.
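The "same results in 10 years" question hinges on knowing exactly which tool versions produced a result. One simple step in that direction, sketched below under the assumption that each tool supports a `--version` flag, is to record a provenance manifest alongside every analysis; cores like NGI go further by pinning versions via GitHub tags and containers (Docker/Singularity).

```python
# Hedged sketch: capture the version string of each pipeline tool so an
# analysis can later be re-run with the same software. Tool names passed in
# are whatever the pipeline actually invokes; this is not any core's real
# provenance system, just an illustration of the idea.
import subprocess


def provenance(tools):
    """Map each tool name to the first line of its `--version` output,
    or None if the tool is not installed."""
    record = {}
    for tool in tools:
        try:
            proc = subprocess.run([tool, "--version"],
                                  capture_output=True, text=True, check=False)
        except FileNotFoundError:
            record[tool] = None
            continue
        # Some tools print the version to stderr instead of stdout.
        lines = (proc.stdout or proc.stderr).strip().splitlines()
        record[tool] = lines[0] if lines else ""
    return record
```

Writing this dict to a JSON file next to the results gives a minimal audit trail; container image digests or git tags make the guarantee much stronger.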

Do you have a sense as to whether your users understand the investment you have made to set up such a rigorous system? A lot of people couldn't reproduce their old pipelines and didn't trust them; the QC measures were written to demonstrate quality and value, and to improve reproducibility and trust. They also try to develop interactive tools with Shiny, and offer training and coaching on experimental design, involving the core in experimental design. It's an advantage to work in a center of core labs, working together with the sequencing lab, informatics lab, etc., so the kick-off is all done together; it's also an advantage to have the wet lab right next door. One idea raised: an interactive tool that exports a reproducible script.