ISMB 2019: BioinfoCoreWorkshop

From BioWiki
Revision as of 22:01, 25 July 2019 by Mcm (talk | contribs) (→‎Schedule)

Dinner information

We will leave at 7:30 pm on Wednesday from the reception hall (near the ribbon stand); otherwise, see you at the restaurant.

Details: Veranda Pellicanò, Birskopfweglein 7, 4052 Basel, Switzerland, +41 61 311 55 01

https://maps.app.goo.gl/gwWciAvukf7ARhgo9


Workshop Overview

The bioinfo-core workshop is scheduled for Monday, July 22, 2019, from 10:15 am to 12:40 pm at the Congress Center in Basel.

The bioinformatics core workshop is a workshop by practitioners and managers of core facilities for all members of core facilities, including scientists, engineers, analysts, operations and management staff. In this 16th year of bringing the core community together at ISMB, we will explore topics relevant to bioinformatics core facilities through lightning talks and demos, followed by small-group breakout discussions, with insights brought back to the full audience for further discussion and knowledge sharing.

Organizers:

  • Madelaine Gogol, Stowers Institute, United States
  • Hemant Kelkar, University of North Carolina, United States
  • Alastair Kerr, CRUK-MI, University of Manchester, United Kingdom
  • Brent Richter, Partners HealthCare of Massachusetts General and Brigham and Women’s Hospitals, United States
  • Alberto Riva, University of Florida, United States

Social Events:

  • ISCB Markthalle event, Tuesday, July 23rd, 8pm (look for bioinfo-core signs)
  • Wednesday night dinner, Veranda Pellicanò, 8pm (meet outside the Congress Center at 7:30pm to walk over), email mcm@stowers.org to RSVP

Additional related opportunity:

Part A: Technologies and Analytical Methods

Machine Learning, AI, single cell RNA-seq analysis, and conda/bioconda.

Part B: Communication and Training

Communication and project management tools and training offered by cores.

Part C: Small group discussion

During this hour-long session, audience members will divide into groups based on their own interests. Groups will come up with their main take away points and bring them back to the main audience for knowledge sharing and for further discussion. Topics may include all previous presentation areas as well as other areas of interest to running or working within a bioinformatics core facility.

Part D: Pipeline Demo

Demo of Nextflow

Schedule

Time | Title | Authors
10:20 - 10:30 AM | Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned (slides) | Yang Fann, NIH, United States
10:30 - 10:40 AM | Supporting single cell RNA-seq analysis: A Core's Perspective (slides) | Shannan Ho Sui, Harvard School of Public Health, United States
10:40 - 10:50 AM | Conda and Bioconda, the best thing since sliced bread (slides: Conda and Bioconda The best thing since sliced bread.pdf) | Devon Ryan, Max Planck Institute, Germany
10:50 - 11:00 AM | Improving project management and tracking with Asana and Toggl (slides) | Sara Brin Rosenthal, UCSD, United States
11:00 - 11:10 AM | Bioinformatics training (in the context of a core) (slides: Bioinfo-core training.pdf) | Radhika Khetani, Harvard School of Public Health, United States
11:10 - 11:20 AM | Development of bioinformatics workshop by a core facility (slides: Riva-ISMB19-slides-final.pdf) | Alberto Riva, University of Florida, United States
11:20 - 11:55 AM | Small Group Discussions |
11:55 AM - 12:20 PM | Small Group Reports |
12:20 - 12:35 PM | nf-core - A community effort to collect a curated set of pipelines built using Nextflow (https://nf-co.re/) | Harshil Patel, The Francis Crick Institute, United Kingdom

Workshop Discussion

175 people attended in total over the 2.5 hours (over the room's capacity). 55 people participated for the full 2.5 hours, including the breakout sessions and discussions. 75 people attended the final Nextflow demo.

  • Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned
    • Large, diverse datasets from multiple sources both private and public from around the world.
  • Supporting single cell RNA-seq analysis: A Core's Perspective
    • Demand for single-cell analysis has grown over the last 5 years, and data analysis is becoming the bottleneck. The core takes a community-based approach, collaborating with other HSPH teams and other schools (HMS) to tackle the problem: the sequencing core (de-multiplexing), labs (iterative; requires researcher input, e.g. is cell cycling or mitochondrial content a factor?), training, etc.
    • Built out the bcbio Python toolkit with 62 international contributors.
    • Settled on the Seurat suite of tools, but also uses many others, such as MultiCCA.
  • Conda and Bioconda, the best thing since sliced bread
    • Installing software: the core gets asked for help installing all kinds of software, particularly packages that carry many dependencies.
    • With Conda, root access is never needed, and dependencies are handled for you.
    • Free, and you can add your own packages.
    • For users, module load activates a Conda environment behind the scenes.
    • Bioconductor packages are in Bioconda. For every package, a Singularity and a Docker container are also built (BioContainers).
    • Containers are hosted on Quay (CoreOS).
    • 1700 packages were upgraded over a week, behind the scenes, for a Bioconductor upgrade.
    • Bioconda has 700+ contributors: release your tools using Bioconda.
  • Improving project management and tracking with Asana and Toggl
    • Fee-for-service center with up to 324 projects over the last 4 years.
    • Need to track projects within the team: they transition from one team member to another as the project cycles through the experts.
    • Analysis can be punctuated by long pauses while the investigator writes papers and grants, so the team sometimes needs to pick up the history a year later.
    • Asana: defined a workflow within Asana that includes intake, waiting periods, in progress, close-out, and billing.
    • Archive data to S3.
    • Implemented Toggl to track time on each project and subtask; it integrates with Asana for the project management components.
    • Tracked time allows giving people better estimates. They have found that, in general, they underestimate the work.
  • Bioinformatics training (in the context of a core)
    • Funders provide FTEs dedicated to training (Harvard Catalyst, HMS).
    • Interplay between training and consulting: the surge in single-cell analysis highlights the need for training in this technology.
    • 2/3 of time is spent on training, the remainder on consulting and understanding best practices.
    • Partner with faculty on teaching for credit, e.g. an R component for their course.
    • 10:1 student-to-instructor ratio, 25 per class. Use local resources such as their HPC system. Publish materials on GitHub.
  • Development of bioinformatics workshop by a core facility
    • The core is being asked to provide practical bioinformatics training.
    • Challenges: a large and diverse audience, which makes it hard to develop a suitable curriculum; a limit of 8 × 1-hour courses; the need to find a source of support.
    • Partnered with the cancer center for admin support, the library for a 5-seat lab, faculty for some lectures, and research computing for the HiPerGator cluster with a dedicated allocation of cores.
    • Successful: filled 50 spots in just a few days, with over ½ attending all lectures. Video-recorded and publicly available.
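
The time-tracking point in the Asana and Toggl notes above (tracked hours showing that work is generally underestimated) can be illustrated with a small Python sketch. The project names, the hours, and the helpers `calibration_factor` and `adjusted_estimate` are hypothetical and not taken from the talk; a real setup would presumably read an export from the time tracker instead of a hard-coded list.

```python
from statistics import median

# Hypothetical records: (project, estimated hours, tracked hours).
# These numbers are made up for illustration only.
RECORDS = [
    ("rnaseq-a", 10.0, 15.0),
    ("chipseq-b", 8.0, 10.0),
    ("scrna-c", 20.0, 40.0),
]


def calibration_factor(records):
    """Return the median ratio of tracked to estimated hours."""
    ratios = [
        tracked / estimated
        for _, estimated, tracked in records
        if estimated > 0
    ]
    if not ratios:
        raise ValueError("no records with a positive estimate")
    return median(ratios)


def adjusted_estimate(raw_hours, records):
    """Scale a new raw estimate by the historical calibration factor."""
    return raw_hours * calibration_factor(records)


if __name__ == "__main__":
    factor = calibration_factor(RECORDS)
    print(f"tracked/estimated (median): {factor:.2f}")
    print(f"a 12 h estimate becomes {adjusted_estimate(12.0, RECORDS):.1f} h")
```

The median is used rather than the mean so that a single runaway project does not dominate the factor; that is a design choice of this sketch, not something reported in the session.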

Breakout sessions

  • Training
    • Chunk out training material and repackage it to gain efficiency.
    • Sign-ups: under-subscription vs. over-subscription. Charging a fee puts some skin in the game.
    • Access to compute.
    • Google and AWS are used and are cost-effective; Jupyter notebooks are particularly cheap to use.
  • Single Cell
    • Help people help themselves.
    • Shiny apps
    • What lets you know it worked properly? Primer dimers, Cell Ranger; but the Seurat R package is the main thing that came out of the discussion.
    • Need to talk about the standard set of thresholds.
  • Project Management
    • From Excel to Google Docs
    • Asana, Trello, Jira
    • Time tracking with Toggl and Harvest (app on phone, laptop, etc.)
    • Wants: Confluence to integrate project management with documentation?
    • Fees help manage demand and help finance pipeline development
  • Conda/bioconda reproducibility
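
For the Conda/Bioconda reproducibility topic, one common approach is to pin tool versions in an environment file that anyone can recreate without root access. The Python sketch below only renders such a file as text. The helper name `render_environment`, the environment name, and the pinned versions are hypothetical placeholders, not recommendations from the session. The default channel order (conda-forge before bioconda) is assumed from Bioconda's documented setup.

```python
def render_environment(name, packages, channels=("conda-forge", "bioconda")):
    """Render a minimal conda environment file as YAML text.

    `packages` maps a package name to a pinned version string.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Sort entries so the file is stable and diff-friendly.
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(packages.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder pins for illustration only.
    text = render_environment("rnaseq-demo", {"samtools": "1.9", "fastqc": "0.11.8"})
    print(text, end="")
```

Saving the output as `environment.yml` and running `conda env create -f environment.yml` would then build the environment.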

Demos

  • Nextflow
    • Manages reproducibility; integrates with many schedulers.
    • uses Conda
    • AWS iGenomes
    • git repo at nf-core/configs and test datasets at nf-core/test-datasets