Difference between revisions of "ISMB 2019: BioinfoCoreWorkshop"


Revision as of 03:20, 22 July 2019

Workshop Overview

The bioinfo-core workshop is scheduled for Monday, July 22, 2019, from 10:15 am to 12:40 pm at the Congress Center in Basel.

The bioinformatics core workshop is run by practitioners and managers of core facilities for all members of core facilities, including scientists, engineers, analysts, operations and management staff. In this 16th year of bringing the core community together at ISMB, we will explore topics relevant to bioinformatics core facilities through lightning talks and demos, followed by small-group breakout discussions whose insights are brought back to the full audience for further discussion and knowledge sharing.

Organizers:

  • Madelaine Gogol, Stowers Institute, United States
  • Hemant Kelkar, University of North Carolina, United States
  • Alastair Kerr, CRUK-MI, University of Manchester, United Kingdom
  • Brent Richter, Partners HealthCare of Massachusetts General and Brigham and Women’s Hospitals, United States
  • Alberto Riva, University of Florida, United States

Part A: Technologies and Analytical Methods

Machine Learning, AI, single cell RNA-seq analysis, and conda/bioconda.

Part B: Communication and Training

Communication and project management tools and training offered by cores.

Part C: Small group discussion

During this hour-long session, audience members will divide into groups based on their own interests. Groups will come up with their main takeaway points and bring them back to the main audience for knowledge sharing and further discussion. Topics may include all previous presentation areas as well as other areas of interest to running or working within a bioinformatics core facility.

Part D: Pipeline Demo

Demo of Nextflow

Schedule

Time | Title | Authors
10:20 - 10:30 AM | Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned | Yang Fann, NIH, United States
10:30 - 10:40 AM | Supporting single cell RNA-seq analysis: A Core's Perspective | Shannan Ho Sui, Harvard School of Public Health, United States
10:40 - 10:50 AM | Conda and Bioconda, the best thing since sliced bread | Devon Ryan, Max Planck Institute, Germany
10:50 - 11:00 AM | Improving project management and tracking with Asana and Toggl | Sara Brin Rosenthal, UCSD, United States
11:00 - 11:10 AM | Bioinformatics training (in the context of a core) | Radhika Khetani, Harvard School of Public Health, United States
11:10 - 11:20 AM | Development of bioinformatics workshop by a core facility | Alberto Riva, University of Florida, United States
11:20 - 11:55 AM | Small Group Discussions |
11:55 AM - 12:20 PM | Small Group Reports |
12:20 PM - 12:35 PM | nf-core - A community effort to collect a curated set of pipelines built using Nextflow (https://nf-co.re/) | Harshil Patel, The Francis Crick Institute, United Kingdom


Workshop Discussion

  • Transitioning bioinformatics core to support biomedical AI/ML research - lessons learned
    • Large, diverse datasets from multiple sources both private and public from around the world.
  • Supporting single cell RNA-seq analysis: A Core's Perspective
    • Demand for single-cell analysis has grown over the last 5 years, and data analysis is becoming the bottleneck. The core takes a community-based approach, collaborating with other HSPH teams and other schools (HMS) to tackle the problem: the sequencing core (de-multiplexing), labs (analysis is iterative and requires researcher input, e.g. is cell cycle or mitochondrial content a factor?), training, etc.
    • Built out the bcbio Python toolkit with 62 international contributors.
    • Settled on the Seurat suite of tools but also uses many others, such as MultiCCA.
  • Conda and Bioconda, the best thing since sliced bread
    • Installing software: cores are asked for help with installs, particularly for tools that need dependencies.
    • No root access needed, ever. Dependencies are handled for you.
    • Free, and you can add your own packages.
    • "module load" activates a conda environment behind the scenes.
    • Bioconductor packages are in Bioconda. For every package, a Singularity and a Docker container are built as well (BioContainers).
    • Containers are hosted on Quay (CoreOS).
    • 1700 packages via can over a week. behind the scene bioconductor upgrading
    • Bioconda has 700+ contributors; release your tools there.
  • Improving project management and tracking with Asana and Toggl
    • Fee-for-service center with up to 324 projects over the last 4 years.
    • Tracking projects through the transition from one team member to another as the project cycles through the experts.
    • Analysis can be punctuated by long pauses while the investigator writes papers and grants. The team needs to pick up the history, sometimes a year later.
    • Asana: defined a workflow within Asana that includes intake, waiting periods, in progress, close-out and billing.
    • Archive data to S3.
    • Implemented Toggl to track time on each project and subtask. It integrates with Asana for the project management components.
    • Tracked time allows better estimates to be given to people. In general, they have found that work is underestimated.
  • Bioinformatics training (in the context of a core)
    • Funders provide FTEs dedicated to training (Harvard Catalyst, HMS)
    • interplay between training and consulting: surge in single cell analysis highlights need for training in this technology
    • 2/3 time spent on training, the remainder on consulting and understanding best practices
    • partner with faculty on teaching for credit, e.g. an R component for their course
    • 10:1 student to instructor ratios, 25 per class. Use local resources such as their HPC system. Publish materials on GitHub
  • Development of bioinformatics workshop by a core facility
    • The core is being asked to provide practical bioinformatics training.
    • Challenges: a large and diverse audience, which makes it hard to develop a suitable curriculum; a limit of 8 x 1-hour courses; the need to find a source of support.
    • Partnered with the cancer center for admin support, the library for 5-seat lab, faculty for some lectures, and research computing for the HiPerGator cluster with a dedicated allocation of cores.
    • Successful: filled 50 spots in just a few days, with over ½ attending all lectures. Video-recorded and publicly available.
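
The time-tracking point from the Asana and Toggl talk can be illustrated with a small sketch. It assumes time entries have been exported as (project, subtask, hours) records and that each project has an up-front estimate. The record layout, project names, and numbers are all hypothetical, and the code does not use or reflect the Toggl or Asana APIs.

```python
from collections import defaultdict

# Hypothetical exported time entries: (project, subtask, hours tracked).
entries = [
    ("rnaseq-lab-a", "alignment", 6.0),
    ("rnaseq-lab-a", "differential expression", 9.5),
    ("scrna-lab-b", "clustering", 12.0),
    ("scrna-lab-b", "report", 4.0),
]

# Hypothetical up-front estimates in hours, one per project.
estimates = {"rnaseq-lab-a": 10.0, "scrna-lab-b": 20.0}


def tracked_hours(entries):
    """Sum tracked hours per project across all subtasks."""
    totals = defaultdict(float)
    for project, _subtask, hours in entries:
        totals[project] += hours
    return dict(totals)


def overrun_ratio(totals, estimates):
    """Return tracked / estimated hours per project.

    A ratio above 1 means the project was underestimated.
    Projects without a (non-zero) estimate are skipped.
    """
    return {p: totals[p] / estimates[p] for p in totals if estimates.get(p)}


totals = tracked_hours(entries)
ratios = overrun_ratio(totals, estimates)
for project in sorted(ratios):
    print(f"{project}: {totals[project]:.1f} h tracked, ratio {ratios[project]:.2f}")
```

Collecting such ratios over many closed projects is one way a core could calibrate future quotes, though the notes do not say how the presenters did this.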

Breakout sessions

  • Training
    • Break training into chunks that can be repackaged, to create efficiency.
    • Sign-ups: under-subscription vs. over-subscription. Charging a fee gives attendees some skin in the game.
    • Access to compute.
    • Google and AWS are used and are cost-effective. Jupyter notebooks are particularly cheap to run.
  • Single Cell
    • Help people help themselves.
    • Shiny apps.
    • What lets you know it worked properly? Primer dimers and Cell Ranger were mentioned, but the Seurat R package is the main thing that came out of it.
    • Need to talk about a standard set of thresholds.
  • Project Management
    • From Excel to Google Docs
    • Asana, Trello, Jira
    • Time tracking with Toggl and Harvest (app on phone, laptop, etc.)
    • Wants: Confluence to integrate project management together with documentation?
    • fees help manage demand and help finance pipeline development
  • Conda/bioconda reproducibility
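
As one possible illustration of the reproducibility topic, the sketch below renders a conda environment file with exact version pins. The environment name, the tools, and the version numbers are placeholders chosen for illustration, not recommendations from the workshop. Listing conda-forge before bioconda follows the channel order given in the Bioconda documentation.

```python
def render_environment(name, channels, pinned):
    """Render the text of a conda environment file with exact version pins."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    # Sort packages so the file is stable under version control.
    lines += [f"  - {pkg}={version}" for pkg, version in sorted(pinned.items())]
    return "\n".join(lines) + "\n"


# Placeholder environment: name, tools, and versions are illustrative only.
spec = render_environment(
    name="rnaseq-example",
    channels=["conda-forge", "bioconda"],
    pinned={"samtools": "1.9", "fastqc": "0.11.8"},
)
print(spec)
```

The resulting text can be saved as environment.yml and passed to `conda env create -f environment.yml`, so that collaborators rebuild the same tool versions.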