ISMB 2015: BioinfoCoreWorkshop

From BioWiki
Revision as of 11:33, 14 July 2015 by Bgrichter


Our proposal for this year is to have a single unifying topic for the entire session: "The evolving relationship between core facilities and researchers". This will be divided into sub-topics, but each of these will look at how the role of core facilities is changing in response to the increased prevalence of bioinformatics knowledge within the wider research community, and the introduction of many more dedicated bioinformaticians into research groups. We hope that this discussion will attract people from both the core facility and the research group side, as this changing relationship will affect both parties.

The more detailed structure will be broken down into four somewhat overlapping areas. Each of these will be introduced by a different speaker with a short presentation and will be followed by a moderated group discussion.

Topic 1: The role of core facilities when everyone is a bioinformatician

  • Speaker  : Davide Cittaro
  • Moderators  : Simon Andrews and Matthew Eldridge

Whilst bioinformatics used to be the preserve of dedicated bioinformaticians, a modern research group will now often have a significant amount of bioinformatics expertise within its staff. This will range from wet lab biologists who would like to be responsible for the analysis of their own data to dedicated embedded bioinformaticians who can find enough work from the output of a single group to fill their time. In this environment the role played by core facilities must necessarily change. Their traditional role as the analysis hub for a set of research groups must give way to a broader view of how they can help to support the more diverse range of informatics activities happening within a research institution. This session will look at the ways in which different core facilities have adapted to these changes and at how their role may change further in future.

Topic 2: Bioinformatics core facilities as service providers

  • Speaker  : Sven Nahnsen
  • Moderators  : Simon Andrews and Matthew Eldridge

One of the growing roles for core facilities is to act as central service providers for routine large scale analyses or data stores. In this session we will look at how much it is possible to automate routine analyses and how much of a standard analysis pipeline can be treated in this way. We will aim to go further though and explore how core facilities can remain relevant and stay on top of the latest developments rather than being constrained as high volume service providers.

Topic 3: Maintaining a publicly used analysis infrastructure

  • Speaker  : Madelaine Gogol
  • Moderators  : David Sexton and Brent Richter

When a large proportion of the research staff in an institution want to be able to undertake bioinformatics analyses on large data sets, it makes sense to have a centralised computing resource on which to run them, and the management of these resources is generally falling into the hands of bioinformatics core facilities. We will look here at the ways in which different sites have chosen to make their compute infrastructure more widely available, and how they have tackled the problems which this has thrown up.

Topic 4: The business of core services

  • Speaker  : Jim Cavalcoli
  • Moderators  : David Sexton and Brent Richter

Many core facilities operate on a cost recovery basis, and the most common method of recharging has been based on the number of hours of analyst time spent working on specific projects. In an era where the core facility is less visible as a front line analysis service and spends more of its time maintaining infrastructure and services, how do cores continue to recoup their costs? We will look at the different funding models which are being used and will discuss the fairest and least burdensome ways of recharging and how to communicate these costs to end users.

Discussion (unordered notes on session)

Cittaro: Core services: Reward bioinformaticians (Nature 520, 151-152): Cores do real science, but the core rents out the bioinformatician. In Cittaro's core, some of the people are dedicated to a specific PI; these folks are usually split 80/20 or 70/30 between PI work and core work. It is beneficial to have people be members of a core, generating a critical mass of knowledge and diversity, rather than working alone. How does authorship work? Cittaro's core participates in scientific design and, even if they charge for the collaboration, they obtain authorship and use this as one metric of performance. As another metric, they also run surveys to get feedback.

In Eldridge's institution they are seeing big changes, with bioinformaticians moving into wet labs and wet lab scientists learning more techniques. They are finding that they are doing more training. Andrews: use the core as a meeting place to share knowledge. How do you deal with the fact that bioinformatics is so broad: proteomics, genetics, etc.? Cittaro's core only deals with NGS, given the focus of the institution. It also tries to find individuals who have complementary experience (metabolomics, for example) and train them in the specifics of NGS.

     Some cores try to understand external groups' expertise and direct new users' questions to those groups: biostats questions to the biostats group, for example.

Students: There are different models: PhD students who spend a part of their time in the core to learn, others who have both a core and a faculty appointment and maintain a K award for training, etc. Sometimes this works, but supervisors are mostly outside of the core. One core has hired a dedicated person who supervises/manages PhD students, funded by the institute. Interesting model: the core facility is seen as a center of bioinformatics expertise that can train students; professors "attach" students to the core and have argued for the institution to fund a dedicated supervisor. Stowers has an exchange program with the University of Oregon Master's program, which sends students for 9-12 months. Core staff need to understand that it is a training situation, not one in which the student drives core personnel. Internships are problematic: ramp-up is needed to make a 3 month intern productive.

There is a range of core sizes; demand, no matter how large the core, is still an issue.

Data integration: internal cluster and virtualization. This field should be differentiated into 2 areas, data processing and data analytics. Processing and data management can be standardized, but analytics is custom. Data integration is a critical area moving forward. In the university environment there is no centralized mechanism for data management, metadata, etc.; all investigators have their own. Industry, by contrast, tries to enforce standards and supplies tools and SOPs to do so. The tranSMART initiative is gaining adoption in much of industry and at the institutional core level.

Nahnsen: launched a discussion of how much research should be done by a core facility vs. pure service. A pure service has a business model of some kind: market, scale out/up, grow the team. But the reality is that a core, which is embedded in an academic environment, has to be flexible and does not operate at a mature or production scale. It needs to keep its understanding of the forefront of the needs and provide democratic support, both to those with funds and to those without. The core can be involved in many aspects of the research: design, processing, data management, analysis and interpretation. Are bioinformatics core facilities only service providers, or should they be? YES: expert services are a product. NO: they do fundamental research and need to be at the forefront. Or BOTH: automate as much as possible and provide a service, and also work on "discovery" of new techniques.

  What can be standardized?  It is difficult. Cittaro's core is ISO certified for standards, but documents changes to the standard processing pipeline.
  Can someone "sell" scientific contributions, and how should authorship be dealt with?  There are differences: for a standard service, expect an acknowledgment.  If a core provides experimental design, analysis development, etc., then expect authorship.  The argument that the PI pays for the person/service and therefore does not need to put a core person on a paper does not hold water, because that same PI pays the salary of their grad student, who is on the paper.
      The important point is that the expectation has to be set up front and can vary over the course of the project.

Is a one person facility sustainable? Yes. It all depends upon scheduling and turning away excess demand.

The more time a core spends pushing and developing standardized services, the less time it has to keep up with leading edge techniques and development, which may then be pushed out to the individual research groups. How does one balance the two?

  The majority of the work of Cittaro's core is not the standardized things; it is more the custom work.
  Where do you put your effort?  Sensing the trends and directions is important.  Sink effort into the major trends.
  RNA-seq is a generalized pipeline.  Most other analyses are custom.
  How does a core participate in institutional strategy regarding future needs?  Or how can it predict where analysis will come from?  It is easier to stay connected in smaller institutions.