Difference between revisions of "ISMB 2019: BioinfoCoreWorkshop"
Revision as of 07:59, 24 July 2019
Dinner information
We will leave at 7:30 pm on Wednesday from the reception hall (near the ribbon stand). Otherwise, we will see you at the restaurant.
Details: Veranda Pellicanò, Birskopfweglein 7, 4052 Basel, Switzerland, +41 61 311 55 01
https://maps.app.goo.gl/gwWciAvukf7ARhgo9
Workshop Overview
The bioinfo-core workshop is scheduled for Monday, July 22, 2019, from 10:15 am to 12:40 pm at the Congress Center in Basel.
The bioinformatics core workshop is run by practitioners and managers of core facilities for all members of core facilities, including scientists, engineers, analysts, operations and management staff. In this 16th year of bringing the core community together at ISMB, we will explore topics relevant to bioinformatics core facilities through lightning talks and demos, followed by small-group breakout discussions. Insights from the groups are brought back to the full audience for further discussion and knowledge sharing.
Organizers:
- Madelaine Gogol, Stowers Institute, United States
- Hemant Kelkar, University of North Carolina, United States
- Alastair Kerr, CRUK-MI, University of Manchester, United Kingdom
- Brent Richter, Partners HealthCare of Massachusetts General and Brigham and Women’s Hospitals, United States
- Alberto Riva, University of Florida, United States
Social Events:
- ISCB Markthalle event, Tuesday, July 23rd, 8pm (look for bioinfo-core signs)
- Wednesday night dinner, Veranda Pellicanò, 8pm (meet outside the congress center at 7:30pm to walk over). Email mcm@stowers.org to RSVP.
Additional related opportunity:
- AEBC2 Workshop - Friday, July 26th.
Part A: Technologies and Analytical Methods
Machine Learning, AI, single cell RNA-seq analysis, and conda/bioconda.
Part B: Communication and Training
Communication and project management tools and training offered by cores.
Part C: Small group discussion
During this hour-long session, audience members will divide into groups based on their own interests. Groups will come up with their main take away points and bring them back to the main audience for knowledge sharing and for further discussion. Topics may include all previous presentation areas as well as other areas of interest to running or working within a bioinformatics core facility.
Part D: Pipeline Demo
Demo of Nextflow
Schedule
Time | Title | Authors |
10:20 - 10:30 AM | Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned | Yang Fann, NIH, United States |
10:30 - 10:40 AM | Supporting single cell RNA-seq analysis: A Core's Perspective | Shannan Ho Sui, Harvard School of Public Health, United States |
10:40 - 10:50 AM | Conda and Bioconda, the best thing since sliced bread | Devon Ryan, Max Planck Institute, Germany |
10:50 - 11:00 AM | Improving project management and tracking with Asana and Toggl | Sara Brin Rosenthal, UCSD, United States |
11:00 - 11:10 AM | Bioinformatics training (in the context of a core) | Radhika Khetani, Harvard School of Public Health, United States |
11:10 - 11:20 AM | Development of bioinformatics workshop by a core facility | Alberto Riva, University of Florida, United States |
11:20 - 11:55 AM | Small Group Discussions | |
11:55 AM - 12:20 PM | Small Group Reports | |
12:20 PM - 12:35 PM | nf-core - A community effort to collect a curated set of pipelines built using Nextflow (https://nf-co.re/). | Harshil Patel, The Francis Crick Institute, United Kingdom |
Workshop Discussion
A total of 175 people attended over the 2.5 hours (the room was over capacity). 55 people participated for the full 2.5 hours, including the breakout sessions and discussions. 75 people attended the final Nextflow demo.
- Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned
- Large, diverse datasets from multiple sources both private and public from around the world.
- Supporting single cell RNA-seq analysis: A Core's Perspective
- Demand for single cell analysis has grown over the last 5 years, and data analysis is becoming the bottleneck. The core takes a community-based approach, collaborating with other HSPH teams and other schools (HMS) to tackle the problem: the sequencing core (de-multiplexing), labs (iterative; requires researcher input, e.g. whether cell cycle or mitochondrial content is a factor), training, etc.
- Built out the bcbio Python toolkit with 62 international contributors.
- Settled on the Seurat suite of tools but also uses many others, such as MultiCCA.
- Conda and Bioconda, the best thing since sliced bread
- Installing software: the core gets asked for help installing all kinds of software, particularly packages that carry many dependencies.
- With Conda, root access is never needed, and dependencies are handled for you.
- It is free, and you can add your own packages.
- For their users, module load activates a conda environment behind the scenes.
- Bioconductor packages are in Bioconda. For every package they also build a Singularity and a Docker container (BioContainers).
- Containers are hosted on Quay (CoreOS).
- 1700 packages were upgraded over a week, tracking the Bioconductor upgrade behind the scenes.
- Bioconda has 700+ contributors: release your tools using Bioconda.
- Improving project management and tracking with Asana and Toggl
- Fee-for-service center with up to 324 projects over the last 4 years.
- Need to track projects within the team: projects transition from one team member to another as they cycle through the experts.
- Analysis can be punctuated by long pauses while the investigator writes papers and grants. The team sometimes needs to pick up the project history a year later.
- Asana: they have defined a workflow within Asana that includes intake, waiting periods, in-progress stages, close-out and billing.
- Data are archived to S3.
- Implemented Toggl to track time on each project and subtask. It integrates with Asana for the project management components.
- Time tracking allows them to give people better estimates. They have found that, in general, they underestimate work.
- Bioinformatics training (in the context of a core)
- Funders provide FTEs dedicated to training (Harvard Catalyst, HMS).
- Interplay between training and consulting: the surge in single cell analysis highlights the need for training in this technology.
- Two-thirds of time is spent on training, the remainder on consulting and understanding best practices.
- Partner with faculty on teaching for credit, e.g. an R component for their course.
- 10:1 student-to-instructor ratio, 25 per class. Use local resources such as their HPC system. Publish materials on GitHub.
- Development of bioinformatics workshop by a core facility
- The core is being asked to provide practical bioinformatics training.
- Challenges: a large and diverse audience, which makes it hard to develop a suitable curriculum; a limit of eight one-hour sessions; the need to find a source of support.
- Partnered with the cancer center for admin support, the library for a 5-seat lab, faculty for some lectures, and research computing for the HiPerGator cluster with a dedicated allocation of cores.
- Successful: filled 50 spots in just a few days, and over ½ attended all lectures. Lectures were video recorded and are publicly available.
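The Asana and Toggl notes above mention that tracked time helps produce better estimates and that work tends to be underestimated. A minimal sketch of that comparison follows. The project names, hours, and record layout are all hypothetical and do not reflect Toggl's or Asana's actual export formats.

```python
from collections import defaultdict

# Hypothetical time entries: (project, subtask, tracked_hours).
# The layout is an assumption for illustration, not a real export format.
ENTRIES = [
    ("rnaseq-lab-a", "alignment", 3.5),
    ("rnaseq-lab-a", "differential expression", 6.0),
    ("rnaseq-lab-a", "report", 2.5),
    ("scrna-lab-b", "clustering", 9.0),
    ("scrna-lab-b", "report", 3.0),
]

# Hypothetical up-front estimates, in hours, per project.
ESTIMATES = {"rnaseq-lab-a": 8.0, "scrna-lab-b": 12.0}


def tracked_hours_by_project(entries):
    """Sum tracked hours over all subtasks of each project."""
    totals = defaultdict(float)
    for project, _subtask, hours in entries:
        totals[project] += hours
    return dict(totals)


def overrun_ratio(estimates, tracked):
    """Return tracked/estimated hours per project; above 1.0 means underestimated."""
    return {
        project: tracked[project] / estimate
        for project, estimate in estimates.items()
        if project in tracked and estimate > 0
    }


tracked = tracked_hours_by_project(ENTRIES)
ratios = overrun_ratio(ESTIMATES, tracked)
for project in sorted(ratios):
    print(
        f"{project}: estimated {ESTIMATES[project]:.1f} h, "
        f"tracked {tracked[project]:.1f} h, ratio {ratios[project]:.2f}"
    )
```

A ratio consistently above 1.0 across closed projects would suggest scaling future quotes by roughly that factor.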
Breakout sessions
- Training
- chunk out training and repackage and create efficiency
- Signups: under-subscription vs. over-subscription. Charging a fee puts some skin in the game.
- Access to compute.
- Google and AWS are used and are cost effective. Use of Jupyter notebooks is particularly cheap.
- Single Cell
- help people help themselves.
- shiny apps
- What lets you know it worked properly? Primer dimers and Cell Ranger were discussed, but the Seurat R package is the main thing that came out of it.
- need to talk about the standard set of thresholds
- Project Management
- From Excel to Google Docs
- Asana, Trello, Jira
- Time tracking with Toggl and Harvest (app on phone, laptop, etc.)
- Wants: Confluence to integrate project management together with documentation?
- fees help manage demand and help finance pipeline development
- Conda/bioconda reproducibility
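As one way to picture the Conda/Bioconda reproducibility point, the sketch below renders a pinned environment file and builds an image reference in the `quay.io/biocontainers` naming scheme. The helper names, the environment name, the package pins, and the build string are illustrative assumptions; real build strings have to be looked up for each package.

```python
def render_environment(name, packages, channels=("conda-forge", "bioconda")):
    """Render a minimal conda environment file as YAML text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


def biocontainer_image(package, version, build):
    """Build an image reference of the form quay.io/biocontainers/<pkg>:<version>--<build>."""
    return f"quay.io/biocontainers/{package}:{version}--{build}"


# Illustrative pins only; choose versions that match your own analysis.
env_text = render_environment("core-rnaseq", ["samtools=1.9", "salmon=0.14.1"])
print(env_text)

# "0" is a placeholder build string, not a verified tag.
image = biocontainer_image("samtools", "1.9", "0")
print(image)
```

Saving the rendered text as `environment.yml` and running `conda env create -f environment.yml` would recreate the environment without root access. Pinning versions is what makes the result repeatable.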
Demos
- Nextflow
- manages reproducibility. integrates with many other schedulers
- uses Conda
- AWS iGenomes
- git repo at nf-core/configs and test datasets at nf-core/test-datasets
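To make the demo notes concrete, here is a small sketch that assembles a `nextflow run` command line for an nf-core pipeline using its bundled test profile together with Conda. The helper function is hypothetical, and `nf-core/rnaseq` is chosen only as an example pipeline. Newer pipeline releases may require extra parameters, such as an output directory. The command is built but not executed.

```python
import shlex


def nextflow_command(pipeline, profiles, revision=None, resume=False):
    """Assemble (but do not run) a `nextflow run` argument list."""
    cmd = ["nextflow", "run", pipeline]
    if revision:
        # -r pins the pipeline to a specific release, branch or commit.
        cmd += ["-r", revision]
    # Multiple configuration profiles are passed as a comma-separated list.
    cmd += ["-profile", ",".join(profiles)]
    if resume:
        # -resume reuses cached results from a previous run.
        cmd.append("-resume")
    return cmd


cmd = nextflow_command("nf-core/rnaseq", ["test", "conda"])
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the pipeline on a machine with Nextflow installed. Pinning `revision` is what ties a run to a fixed pipeline version.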