ISMB 2014: BioinfoCoreWorkshopWriteUp
Introduction
The annual bioinfo-core workshop ran successfully at the 2014 ISMB conference. Attendance was good despite the workshop clashing exactly with the World Cup final, and we're very grateful to everyone who chose to come along.
We changed the format of the workshop slightly from previous years. In the past we had always had two sets of presentations followed by a moderated group discussion. This year we had only one formal session with presentations, with the second half of the workshop being taken up with a larger group discussion covering a number of topics which were collated from suggestions taken from the bioinfo-core mailing list.
The group discussions were very lively and we had a large number of people contributing to them. For those who weren't able to attend, we will try to summarise some of the main points of the discussions below. As these were all active discussion sessions we don't have many notes to work from, so if you remember other points please fill in anything which has been missed.
Introduction to the Workshop by Brent Richter
Earlier in the conference Brent had already presented to a special session which had introduced bioinfo-core as one of the new ISCB COSIs (communities of special interest). He was able to summarise the rationale for having bioinfo-core as a group and talk about the activities the group performs. Hopefully the increased exposure the group receives through becoming a COSI will help bring us to the attention of some people who might not have known about us before.
Topic 1: Core on a Budget vs Enterprise
Moderated by Matt Eldridge and David Sexton
Speakers:
- Alastair Kerr - Edinburgh Uni
- Mike Poidinger - A*Star
The purpose of this session was to look at whether it is possible to run a core facility on a limited budget, and to explore what becomes possible when you have a larger amount of money to spend.
Alastair Kerr led the session, talking about his small core (2 people) at Edinburgh University. His core is almost completely self-reliant and has to cover all of the hardware, storage, software and analysis infrastructure required for the full range of users he supports. Alastair described how his infrastructure is built on a number of key open source components: ZFS-based storage systems which provide 0.5PB of storage for a fraction of the cost of commercial systems, pipelining and workflow systems built within Galaxy, and user-friendly analysis scripts provided to users through the R Shiny system.
Alastair described how he actively avoids the use of commercial software within his group, and described occasions in the past where adopting initially useful commercial packages had ultimately had negative impacts when the software later changed its licensing fees or became unsupported. The only commercial package they still have is Lasergene for basic molecular biology manipulations, and this is mostly for historical reasons and because there is no suitable open alternative.
Mike Poidinger then went on to present the contrary case. His group is very well funded by his supporting institution and, with 9 members, is somewhat larger than the Edinburgh group. Mike's initial contention was that it should be a requirement when setting up a core that sufficient funding be provided, and that it would be reasonable to refuse to head up a core where suitable funding to provide an appropriate infrastructure was not forthcoming. Mike stressed that open source software played a major role in the operation of his core, with much of the analysis of data being provided by these types of packages, which are generally much more agile and able than their commercial equivalents. However, he made a strong case for two particular pieces of commercial support software which now form a key part of his infrastructure - Pipeline Pilot and Spotfire.
Mike's contention was that whilst open source packages are very good at performing individual analyses, they can be hard to work with because of the difficulty in collating and understanding the wide variety of output files and formats they generate. His group uses Pipeline Pilot to abstract away a lot of the 'dirty' parts of an analysis, so that they can leave the commercial system to store and retrieve appropriate data and to handle the format conversions required to pass data through several parts of a larger pipeline. Having this type of collation system in place means that all of the analysis can be done in the form of pipelines, and a complete record of all analyses is preserved and can be reproduced or reused very easily.
The other package heavily used within his group is Spotfire. This is a data presentation and integration package which makes it easy for users to explore the quantitative data coming out of the end of an analysis pipeline. It competes with simple solutions such as Excel, or with more complex analyses and visualisations in R, but provides a friendly and powerful interface to the data. Mike's team have linked these packages to other tools such as the MoinMoin wiki to provide a combined system which keeps a complete record of analyses, presents it back to the original users in a friendly way and provides an interface through which they can themselves manipulate and explore the data further.
Overall it was Mike's contention that the use of these commercial products within his group added around 20% to the efficiency of his staff, and also allowed new members to get started much more quickly. The cost of the licensing for these packages was therefore outweighed by the efficiency improvements which his group gained from their use.
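The cost argument can be made concrete with a rough break-even calculation. The sketch below is only an illustration: the function names and the per-person cost figure are hypothetical and were not given in the talk. Only the team size (9) and the approximate 20% gain come from the presentation, and the model assumes that a 20% efficiency gain is worth 20% of staff cost.

```python
def breakeven_licence_cost(staff_count, cost_per_person, efficiency_gain):
    """Largest annual licence spend that the efficiency gain would still cover.

    Assumes (simplistically) that the value of an efficiency gain is that
    fraction of the total annual staff cost.
    """
    return staff_count * cost_per_person * efficiency_gain


def licences_pay_off(licence_cost, staff_count, cost_per_person, efficiency_gain):
    """True if the annual licence cost is below the break-even value."""
    return licence_cost < breakeven_licence_cost(
        staff_count, cost_per_person, efficiency_gain
    )


if __name__ == "__main__":
    # 9 staff and a 20% gain are from the talk; 100_000 per person per year
    # is a made-up placeholder, not a figure quoted by the speaker.
    limit = breakeven_licence_cost(9, 100_000, 0.20)
    print(f"Licences pay off below roughly {limit:,.0f} per year")
```

Under this simple model the break-even point scales linearly with team size, which suggests why the same licences may be harder to justify for a two-person core.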