15th Discussion - 24 Feb 2014

From BioWiki

Brief Description and Continuing Discussion:

Topic 1: The Biologist is the Analyst

Introduction and Preliminary Information

As large-scale bioinformatics analyses become increasingly common, it is often the case that much of the analysis for a project will not be carried out by a dedicated bioinformatician, but will instead be undertaken by the biologists responsible for generating the data. Improved bioinformatics knowledge and training, along with the development of a number of more biologist-friendly tools, means that this is a more viable solution than before, and is something which core facilities will have to adapt to.

In this session we will explore this topic, focusing on questions such as:

  • Are you seeing a shift in your facility towards analysis being performed by biologists?
  • What tools or facilities have you provided to help this transition? Is this something you welcome or encourage?
  • Has the presence of user-based analysis changed the balance of work within your facility?
  • Have you seen problems in analyses performed by biologists which might have been avoided if these had been done by your facility? Is there any way to avoid these problems?
  • How do you allow and manage access to large computational or data resources by biologists? Has this caused problems?
  • Has the way your service is perceived by biologists changed once they start to do their own analysis? Do you think this might impact recognition on eventual publications?

Notes from the Call

  • Thomas Manke - Freiburg
  • Madeline Gogol
  • Eija Korpelainen - Finland
  • Hans-Rudolf Hotz - Basel
  • George Bell - Whitehead
  • Alastair Kerr - Edinburgh
  • Matt Eldridge - CRUK
  • Charlie Whitacre - MIT
  • Jim Cavalcoli - Michigan
  • Hemant Kelkar - UNC
  • Fredrick ?? - Baltimore
  • Fran Lewitter - Whitehead
  • Simon Andrews - Babraham

Biologists would like to develop their own skills and would like the help of cores to develop and support these skills.

Hans-Rudolf uses Galaxy and trains people on that. Simon’s group develops tools for biologists. Matt’s group mostly does analysis for the users, but is getting more requests to develop skills and provide training courses. He wanted to find out how other groups were approaching this - were they seeing a shift, and what does this mean for us as core facilities?

Matt has been approached for more training around R and Bioconductor. PIs want wet-lab members to do their own analysis and want cores to play a key role in helping.

What are other people doing?

Max Planck - started with R and Linux courses and didn’t find them very helpful. They repeat courses regularly, but have tended to move completely to Galaxy; their R tools are implemented as Galaxy wrappers.

Lax - has had lots of success with individual postdocs learning Linux and R. They are getting publications from computational work, but this works only for individuals.

Jim - gets a diversity of people. Those who are inclined pick it up easily, but others aren’t interested. Not much good for a mass approach. Has been asked for workshops on RNA-Seq or exome analysis, but is not sure how to set them up.

Lax - is going to recruit postdoc for

Hans-Rudolf - in addition to Galaxy, also teaches R courses with 30-40 students. Only about 10% pick it up; it is only good for those few students, but the effort is worth it.

Simon - runs courses, but not for everyone - there is an overlap between people who can’t run scripts themselves and people who

A problem with Galaxy is that the informaticians who add tools to Galaxy aren’t the ones who are using them, which ends up with mismatched versions. Galaxy is not a one-push pipeline but a wider framework for sharing data, and is more important for that aspect.

Alastair - Galaxy is great as a data sharing platform, but they also use it a lot for the development of their own tools; it allows users to use tools interactively and feed back to the developers. They have a large number of command line users, driven by bioinformaticians developing their own tools, which also causes problems and will be part of the session in Boston.

Fran - Whitehead used to support Galaxy - good to get started, but once people understand the tools they move to the command line.

George - has done workshops in R and Perl. Has had better success writing scripts for others, which they can then tweak, rather than making people completely independent.

R course take-up has been low - difficult to know how much people are using it.

R is being driven by PIs more than postdocs. They have an interface for stats which is being withdrawn, along with the stats training course; R Commander is being used to keep stats accessible.

Fran - lots of people are still using Prism - people like it and it doesn’t need much training.

Hemant - lots of interest in Galaxy, but has found it a problem that the version numbers of the wrappers don’t match the back-end tool versions. Doesn’t find Galaxy easy to maintain, and it takes up a lot of resource. Uses command line tools instead - finds people are interested in learning basic Unix, since this is a valuable skill to have on a CV and people recognise this. Once they have the basics they can pick up the rest.

Are Unix skills part of the core facility remit?

Hemant targets courses at biomedical scientists - most have never seen a command line, so he starts pretty simple and they learn a lot.

Jim - the size of the audience dictates what is feasible. Jim has 2000 researchers plus postdocs and can’t dedicate resource to this. Doesn’t teach basic Unix; points people to a book instead.

Alastair - it is important to show how to document analyses along with the environment. How do others do this? Hemant uses the module system, which makes this easy.
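As a minimal sketch of what "documenting the analysis along with the environment" might look like (file name and fields here are illustrative, not from the call; the commented-out line is where the Environment Modules system Hemant mentions would record loaded tool versions):

```shell
#!/bin/sh
# Hypothetical sketch: save a record of the analysis environment next to the results.
log="analysis_env.txt"
{
    echo "date: $(date -u)"        # when the analysis was run
    echo "host: $(uname -sr)"      # OS and kernel version
    echo "user: $(whoami)"         # who ran it
    # With Environment Modules, loaded tool versions could be captured as well:
    # module list 2>&1
} > "$log"
cat "$log"
```

Keeping a log like this next to each analysis makes it much easier to reproduce a result later, or to hand the work over to a core facility in an interpretable state.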

How many people have done Software Carpentry? They’ve been in and have been useful for some people. The courses are “free”, but you have to pay for the instructors’ flights and travel and provide an environment.

As bioinformaticians we’ve developed skills to avoid pitfalls, and there is a worry that people newer to the field might hit problems which a bioinformatician would have avoided. The concern is no longer that there will be nothing left for the informaticians to do, but rather that people might put forward less good work for publication.

People come into cores with half-done analyses. We need to ensure that people misusing systems we maintain doesn’t end up reflecting badly on us. People will always need expert interpretation and guidance.

Once we train people to do things themselves, how do we maintain a collaborative relationship with them?

If we are just providing tools and support, does that impact how we are perceived and credited on papers?

Alastair - not such a problem in academia; more problematic is PIs hiring people to do their own informatics. We need to make sure that we keep links to these people so that the core knowledge is maintained.

Jim has had several groups bring in their own informaticians, but tries to bring best practice to these people and have them interact with other core people.

Alastair has been stung once by a group setting up a silo and causing problems on common resources.

Matt agrees, and finds that embedded informaticians often end up reinventing the wheel. He challenges his group to make pipelines which are so useful that people will naturally be drawn to use them rather than making their own. Sees it as a real success when other bioinformaticians want to engage with the core; if they build parallel pipelines instead, then this is a real shame.

Lax - how do you financially support such a service? There is lots of training at the beginning, but then nothing else until the paper comes out.

Jim has said that if they provide training, then they have to charge for effort after that, since people will not become proficient immediately.

Topic 2: Open Forum (Led by Hans-Rudolf Hotz)

We will again be holding an open forum to quickly cover any other topics which people would like to discuss. We have had a few suggestions in advance which are listed below, but we can add more topics on the day.

  • How are people preparing for the new Human Genome Build GRCh38? Is anyone using it yet?
  • Nanopore sequencers - who is involved in the early access programme? What are you planning to do?
  • Reports from meetings attended. Anything useful from AGBT or elsewhere?
  • Broadening our use of the wiki.


Lax started the discussion with the question of who has got access to the Oxford Nanopore MinION Access Programme.

A few people are involved, or at least know someone who has been selected by Oxford Nanopore.

There is a general worry about the error rate - will new aligners be needed to deal with the errors?

Will it replace PacBio?

There was general disappointment that there is still no data available -> let's wait and see until the first participants from the 'MinION Access Programme' show their data.

The discussion moved on to more general topics (from AGBT)

Oxford Nanopore is well ahead of the other 'pore' sequencing systems.

Working with GRCh38 will require new alignment programs allowing for variations in the genome.

GRCh38 has better centromeres.

There is low usage of the Ion Torrent system among the group.

The upgrade for the HiSeq 2500 should be available in the fall.

At the end, Jim's suggestion to broaden the use of the wiki was discussed:

The pages should not duplicate other resources (e.g. for training material), but should rather contain links to them.

New features will be possible in the future thanks to ISCB.

It would be good to have test data available to share.