ISMB 2025: BioinfoCoreWorkshop
[https://www.iscb.org/ismbeccb2025/programme-agenda/scientific-programme/bioinfo-core Scientific Programme]
Revision as of 23:49, 23 July 2025
Dinner information
We will have a table at the conference-wide social event, which is an add-on that can be purchased during registration.
Session Overview
The bioinfo-core COSI session is scheduled for Wednesday, July 23rd, 2025, from 11:20am to 6pm at the ACC Liverpool.
The bioinformatics core session is organized by staff and managers of Core Facilities and designed for all members of bioinformatics core facilities or people in related roles.
Organizers:
- Madelaine Gogol, Stowers Institute, United States
- Alberto Riva, Human Technopole, Italy
- Lorena Pantano Rubino, Harvard School of Public Health, United States
- Awtum Brashear Lewis, Shriners Hospitals for Children, United States
- Yuvanesh Vedaraju, Houston Methodist Hospital, United States
Notes from the session
The bioinfo-core COSI brings together managers and staff working in bioinformatics core facilities around the world. In our session we had a well-rounded and interesting mix of presentations, panel discussions, and breakout groups.
Talks:
Damian Dalle Nogare - Bioimage analysis in the age of AI: lessons and a path forward from a core facility perspective. A thought-provoking keynote on the history of the computational imaging field and its journey to becoming a fully AI-driven field. We can learn a lot from what that community has gone through, and we should work together.
Kübra Narcı - Benchmarking Variant-Calling Workflows: The nf-core/variantbenchmarking Pipeline within the GHGA Framework: Variant callers do all sorts of different things and comparing structural variants is non-trivial. This is an nf-core pipeline to handle this type of comparison that was developed for the GHGA and shared with the world.
Thomas Roder - Assembly Curator: rapid and interactive consensus assembly generation for bacterial genomes – A microbial assembly benchmarking paper (Wick and Holt) found that no assembler worked well on all assemblies. Trycycler from Ryan Wick is good, but time-consuming. The goal of this tool was to get 80% of the quality of Trycycler in 5% of the time. On GitHub at https://github.com/MrTomRod/assembly-curator
Adam Giess - Long Read Sequencing at Genomics England: They’ve run 12,000 Nanopore PromethION flow cells and shared some details of their setup and approach.
Anil S. Thanki - Autonomous Single Cell Transcriptomics Analysis in Persist-seq: A multi-institutional effort to study the early tumor environment. The diverse toolset included Kubernetes, Jenkins, Galaxy (workflow executor command line), AWS, and Slack.
Iris Diana Yu - Advancing the Expression Atlas Resources: A Scalable Single-Cell Transcriptomics Pipeline to Facilitate Scientific Discoveries: Expression Atlas and Single Cell Expression Atlas – ingest, annotate, curate, and serve data!
Ayushi Agrawal - Mixed effects models applied to single nucleus RNA-seq data identify cell types associated with animal level pathological trait of Alzheimer’s disease: LODopt model applied to single nucleus RNA-seq data. https://github.com/gladstone-institutes/LODopt
Natalie Gill - Optimizing Clustering Resolution for Multi-subject Single Cell Studies: Method of optimizing clustering by splitting PCs into odd and even. https://github.com/gladstone-institutes/clustOpt
Hubert Rehrauer - GEO Uploader: Simplifying the data deposition in the GEO repository: A tool to make it easier for researchers to upload their data to GEO: https://github.com/fgcz/geo-uploader
Carlos Prieto - Enhancing Bioinformatics Workflows with Analytical Visualization Tools: a series of impressive interactive tools and data visualizations – rjsplot, D3GB, looking4clusters, Rvisdiff, RD3plot. https://github.com/BioinfoUSAL
Patricia Carvajal-López - Competency framework profiles to reflect career progression within bioinformatics core facility scientists: A collaboration and community effort led by EBI to develop more robust career definitions and stages for people in core facilities. Publication coming soon in Bioinformatics Advances.
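As a small illustration of the idea mentioned in Natalie Gill's talk (splitting principal components into odd and even sets), the following is a minimal sketch in Python. It only shows the split itself; how the two halves are then used is not described in these notes, so nothing here should be read as the clustOpt implementation. The function name and the toy embedding are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_pcs_odd_even(
    embedding: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Split a cells x PCs matrix into odd- and even-numbered PCs.

    PCs are numbered from 1, so column 0 is PC1 (odd), column 1 is PC2 (even).
    This is an illustrative helper, not part of clustOpt.
    """
    odd = [list(row[0::2]) for row in embedding]   # PC1, PC3, PC5, ...
    even = [list(row[1::2]) for row in embedding]  # PC2, PC4, PC6, ...
    return odd, even


# Hypothetical toy embedding: 3 cells x 6 principal components.
toy_embedding = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    [9.0, 8.0, 7.0, 6.0, 5.0, 4.0],
]

odd_pcs, even_pcs = split_pcs_odd_even(toy_embedding)
print(odd_pcs[0])   # [1.0, 3.0, 5.0]
print(even_pcs[0])  # [2.0, 4.0, 6.0]
```

One plausible use of such a split is to derive clusters from one half and assess them on the other, but that is an assumption; see the linked repository for the actual method.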
Panels:
The rise of computational imaging
Panelists: Damian Dalle Nogare, Jamie Soul, Syed Murtuza Baker, Emily Johnson
Imaging groups and bioinformatics groups feel themselves coming together, mainly due to spatial transcriptomics. How can these two groups interact and learn from each other to advance science? We saw issues with the software built into some commercial platforms in terms of calling cells correctly, and more custom solutions are probably needed. Perhaps some things can be learned by the bioinformatics core community and some things will be a collaboration between the two groups, but we need to be clear about who does what and who needs which files from whom in order to accomplish things. In some ways imaging has transformed into a fully AI field and we can learn from their growing pains.
The practical use of AI in cores
Panelists: Madelaine Gogol, Ashley Sawle, Mohab Helmy, Ken Brewer
We discussed training users in the use of genAI (because we as bioinformaticians might be more experienced). Do we still need to teach coding? I think the consensus was at least some for some people, but it’s not really clear how much. How do we keep up with new tools? We need to be intentional about setting aside time for reading papers, experimenting, and trying things. Seqera AI was mentioned as a platform for writing Nextflow code (many of the models are not good at it yet). MultiQC now has an argument to generate an AI summary of the quality results in the report. Generative AI could also be used to turn a collection of scripts into a Nextflow pipeline with a little effort. The University of Queensland fast AI course was mentioned as a potential source for upskilling. There were discussions of hiring in the age of AI, with in-person interviews perhaps becoming more necessary, along with deeper questions to check knowledge. Watch out for slopsquatting or bad prompting generating garbage for collaborators.
Breakout groups:
We broke into four breakout groups based on what the people in the room were interested in discussing.
- Spatial / Imaging – We formed a consensus that imaging and bioinformatics should remain separate groups but should make it clear who does what and communicate well and regularly. Bioinformaticians should be able to learn the basics of well-established image analysis tasks so they don’t have to bug the imaging experts until they have something more complex. The imaging community has gone through its own pain and probably has things to teach us about storage, compression, and good data practices. Maybe there can be “Image analysis for computational biologists” short courses or trainings to help bridge the gap? Spatial transcriptomics will likely require at least image segmentation and image registration or alignment tasks.
- AI – When users come to you with a request to apply AI to their data, we first have to know whether what they are asking for is really the best method, and if it’s not, how do you keep them with you rather than turn them away and have them pursue other options? Facilities can serve as gatekeepers and empower users in the use of AI. We spoke of “white box” or explainable AI as being a more reproducible and better way to do science. When attempting to upskill users, free food helps. It may be good to have one person or a few people on the team become the experts on this topic so they can specialize and help the team understand what to do.
- Project Management and tools – Jira was a popular tool. Participants were leaders from groups of different types and sizes but faced similar issues when getting projects done. Some groups had a more ruthless scoping style, putting tight boundaries on what a project consists of, but others found that to be uncollaborative. Since we are downstream of a lot of steps in the experiment process, we need ways to get in at the very beginning so we can help researchers plan the right experiment to gather the best data and avoid data post-mortems. They were all doing some form of Agile – it doesn’t work for research, but does work for planning people’s time. Another topic was teaching others how to say no in a way that can still leave the requester happy, without doing the thing right now. Project management in our role is sometimes also relationship management; having good relationships is really important to ensuring good communication.
- Reporting / communicating results – All the way from data to presentation. Participants are finding Quarto nice for dynamically generating presentations and reports with code; if you need to change something, it’s easy to regenerate. There was a lot of love for Nextflow and containers for returning reproducible results and techniques to people. It was mentioned that change can be difficult in clinical environments.
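To make the "easy to regenerate" point about Quarto concrete, here is a minimal sketch, assuming a parameterized Quarto document. The file name `report.qmd` and the parameter `sample` are hypothetical; the `quarto render` command with `--to` and `-P name:value` is part of the Quarto CLI. The script only builds the command and runs it if Quarto and the file are actually present.

```python
import os
import shutil
import subprocess
from typing import Dict, List


def quarto_render_command(
    qmd_path: str, params: Dict[str, str], output_format: str = "html"
) -> List[str]:
    """Build a `quarto render` command for a parameterized document."""
    cmd = ["quarto", "render", qmd_path, "--to", output_format]
    for name, value in params.items():
        # Each -P flag overrides one parameter declared in the document.
        cmd += ["-P", f"{name}:{value}"]
    return cmd


# Hypothetical document and parameter, used only for illustration.
cmd = quarto_render_command("report.qmd", {"sample": "S1"})
print(" ".join(cmd))

# Re-render only when Quarto is installed and the document exists.
if shutil.which("quarto") and os.path.exists("report.qmd"):
    subprocess.run(cmd, check=True)
```

Changing the parameter value and re-running the script regenerates the report without editing the document itself, which is the workflow the group described.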