11th Discussion - 7 November 2011
Brief Description and Continuing Discussion:
Topic 1: Measuring the output of a core (Led by Simon Andrews and Fran Lewitter)
This is a follow-on to a topic which Matt Eldridge raised at the ISMB workshop. It generated an interesting discussion afterwards which we had to curtail due to time constraints, so we thought it would be a good topic to expand upon in a public call.
The original question was how you can measure how well your core is working. Such a measurement could be used to assess the performance of the core over time, or to allow comparisons between cores. It could also be useful in supporting applications for additional funding or personnel.
Preliminary Information
Questions which would be relevant to this topic might include:
- Is it useful or desirable to objectively measure the performance of a core facility?
- Are groups doing this kind of assessment already, and if so, how?
- Which metrics could be used to allow this measurement?
- Could it be useful to compare results from different cores?
Topic 2: Managing and Tracing Software Updates (Led by Brent Richter)
If analysis results are to be reproducible, then in addition to recording the settings and options used for an analysis we also need to record the exact versions of all software components used and of all datasets searched. Providing such traceability could be an arduous task, but may be required in cores which handle clinical or translational workflows, if not samples, and may be desirable in other situations.
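As a concrete illustration of what such traceability might look like in practice, below is a minimal sketch (in Python) of recording the versions of the tools used in a run into a JSON manifest stored alongside the results. The tool names, version flags and manifest layout are illustrative assumptions, not a description of any particular core's setup.

```python
# Minimal sketch: capture tool versions and dataset labels into a JSON manifest.
# The tools and version commands below are assumptions; substitute whatever your
# own pipeline actually invokes.
import json
import subprocess
from datetime import datetime, timezone

# Hypothetical tool list: map each executable to the command that reports its version.
VERSION_COMMANDS = {
    "samtools": ["samtools", "--version"],
    "bowtie2": ["bowtie2", "--version"],
}

def capture_versions(commands=VERSION_COMMANDS):
    """Run each version command and keep the first line of its output."""
    versions = {}
    for name, cmd in commands.items():
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, check=False)
            output = (result.stdout or result.stderr).strip()
            versions[name] = output.splitlines()[0] if output else "unknown"
        except FileNotFoundError:
            versions[name] = "not installed"
    return versions

def write_manifest(path="analysis_manifest.json", datasets=None):
    """Write a JSON manifest recording tool versions, datasets and a timestamp."""
    manifest = {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "software": capture_versions(),
        "datasets": datasets or {},  # e.g. {"genome": "GRCh37, Ensembl release 65"}
    }
    with open(path, "w") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest

if __name__ == "__main__":
    print(json.dumps(write_manifest(), indent=2))
```

Even a simple manifest like this, generated automatically at the start of each analysis, makes it possible to answer later questions about exactly which versions produced a given set of results.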
This also raises a wider issue of how we manage the transition between releases of the software packages on which we rely. Scientific software tends to have a rapid release cycle, with improvements and changes arriving at regular and relatively short intervals. This prompts a number of questions about how frequently we should update the packages we depend on, and how we manage those transitions.
Preliminary Information
Questions which may be relevant to this discussion:
- Do people try to keep track of the software versions used for analysis? If so, is this because of regulatory or other "business" requirements?
- Just how often do you update software packages?
- Does the frequency of updates you install differ between public and vendor-distributed packages?
- How do people manage software updates?
- Do you have test environments to try out updates or do they just get used immediately?
- Do you update packages during the course of a study?
- Do you have multiple versions of key packages?
- How do you monitor for available updates? Could this be managed better?
One specific example raised in preliminary discussions was the recent Illumina update to Casava 1.8. This update offered many advantages but also necessitated major changes to the analysis pipelines people were using. It may serve as a useful illustration of some of the problems faced during software transitions, and will be of specific interest to those running Illumina pipelines.
Transcript of Minutes
People Present
- Alistair Kerr - Edinburgh
- Monica Wood - Promega
- Fran Lewitter - Whitehead
- George Bell - Whitehead
- Simon Andrews - Babraham
- Matt Eldridge - CRUK
- Brent Richter - Partners
- David Sexton - Baylor
- Pinal Kanabar - Rutgers
- Charlie Whittaker - MIT
- Hemant Kelkar - University of North Carolina
- Madelaine Gogol - Stowers Institute