4 February 2008
Education for Biologists; Tools utilized by Biologists and Cores.
- 1 Administrative
- 2 Education For Biologists (Led by Michael Rebhan):
- 3 Part II: Tools (David Sexton Introduction)
(x = introduced themselves. All others RSVP’d):
- Hershel Safer
- John Rux, Wistar Institute
- Lee Watkins, JHU
- Charlie Whitaker, MIT Cancer Research
- x Alastair Kerr, Edinburgh
- x David Sexton, Vanderbilt
- x Brent Richter, Partners Healthcare MGH/BWH
- x Fran Lewitter, Whitehead Institute
- x Michael Rebhan
- Mauman Maqbool
- Aaron Noll, Stowers Institute
- x Hemant Kelkar, UNC
- x David Tom, UNC
- x Jianping Zen, UNC
- Monika Wood, Promega
- x Steve Jennings, U of Arkansas, Little Rock
- x George Bell, Whitehead Institute
- x Jerry Xu, Partners Healthcare
- x Lakshmanan Iyer (Lax), Tufts University Medical
- x Andrea Hoerster, BASF Germany
- x Amir Karger, Bauer Center, Harvard University
- x David Lapointe, UMass Medical
- x Dawei Lin, UC Davis
bioinfo-core meeting logistics at ISMB, Toronto.
Fran--Currently we have 2 BoF sessions scheduled, both during lunch. Each BoF session is allotted 1.25 hours. To make the best use of the time, the agenda should be planned in advance. The callers agreed that these times are the best official ISMB meeting opportunities. In addition, a few dinners will be planned for informal gatherings and chat. Open discussion point: hold a meeting after the close of ISMB? There is ½ day left after the close. No consensus.
Time of Conference Call: Voting is available to select the time of day for future conference calls. Please visit the voting site and add your vote for your choice. The site requires a day to be selected, so one has been chosen; please ignore it. We are only voting on the time. Once a time is selected, we can vote on a specific date.
Possible next concall topics (to be held at the end of April or the beginning of May):
1. Discussion of ePHI within a research environment governed by regulation and compliance. Patient Health Information (PHI) is routinely collected and analyzed together with *omic information (e.g. genotype-phenotype interactions) by informatics groups in medical schools and academic medical centers. Online tools and hosting services are also readily accessible and offer utility and collaboration opportunities to the academic researcher. In addition, institutional review boards and institutional compliance groups frequently seek guidance from informatics core facilities on security and best practice in light of the requests they receive. The discussion could cover the convergence of these environments and experiences with ensuring the privacy of ePHI. Using SurveyMonkey to collect patient responses and Google Docs to share spreadsheets of information have been mentioned as possible open-access utilities.
2. Large-scale analysis and meta-analysis: tricks, tips and tools. Similar to this concall discussion, we would focus on practices and techniques, gather feedback from the “school of hard knocks”, and discuss tools for genotyping chip arrays, whole genome studies, combining disparate data sets during meta-analysis, etc.
3. Update on next-generation sequencing technologies. Following on from the initial discussion during the first bioinfo-core concall, participants could sync up on the latest techniques and tools, particularly in mapping, that have emerged within the scientific community.
4. Meetings topic--We were not able to discuss this topic during the call: which seminars and meetings are most effective for introducing bioinformatics core staff and scientists to the field and for keeping up with new developments.
No other topics were brought forth, but there was interest in the ePHI topic (acknowledging that its scope does not extend to a basic research environment using model organisms) and in the analysis discussions.
Education For Biologists (Led by Michael Rebhan):
After introduction of topic by Michael Rebhan:
AH--worked in 2 companies. Found that there were differences in scientist involvement, mainly due to cultural aspects. In the first company the scientists learned the command line, but in the other company this was not the case. Experience shows that scientists do not try to do anything that doesn't resemble an easy Google search. Some do learn simple command line usage, especially if it’s a few repetitive commands.
There are training offerings for in-house tools. VectorNTI and another in-house, custom tool are popular. Some scientists have the motivation to learn, but they tend to learn the tools independently.
AK post meeting comments: Typically most of our PhD students & postdocs are able to learn anything (including command-line) *if* they can see it addressing an immediate need and their training is applied fairly frequently thereafter. Training was of less use where users had infrequent need of the application. Non-technical, broad-picture seminars (e.g. why and when to search in protein and not DNA space) do have general appeal and work best in a hands-on environment (20min talk - 20min practical) or in a discussion-based format.
Similarly, the UNC core trains scientists on specific tools.
DL--Found that it’s beneficial for biologists to have the basic knowledge needed to analyze data. They are able to do this with relatively little help. However, large-scale analysis requires core assistance.
AK--the overarching factor in training is the biologist’s familiarity with repetitive analysis, and then layering on top the laziest ways to reach that end. Knowledge of the command line fits this paradigm in some cases, and there has been acceptance of learning some Unix code. Single-click answers are very nice for simplicity, but there are individuals who take on learning tools and techniques themselves.
MR— Biologists are becoming more comfortable with computers overall, but that doesn’t necessarily mean that they would carefully read what is displayed, carefully consider parameters on forms etc. This can be challenging in a training situation for tools that need careful usage to provide meaningful results. We also found that short courses are most popular, and that personalized training related to urgent problems is most effective.
FL--the core at Whitehead at one time provided several formal course offerings: a 1-semester course of lectures, a 2-day intensive course, and a series of 5 mini-courses (lecture and hands-on). Topics included sequence analysis, protein analysis, introduction to UNIX, Perl programming, BioPerl, RDBMSs using MySQL, and microarray analysis. However, after a few years, a lot of these offerings were dropped due to low attendance and the preparation needed. Courses/seminars that are still provided and popular are microarray analysis and introduction to experimentation techniques and statistics for biologists. All materials are freely accessible to the community.
The format has also changed. While the courses were once formal, the core now provides “Hot Topics” sessions: specific, targeted 1-hour sessions with online notes posted internally. They have found that scientists interested in learning more are busy and find it hard to commit to a longer session. A recent topic was “Unix tips and tricks” (how to manipulate large data outside of Excel), which had 4-5 different labs represented with 12-15 people in attendance.
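For a flavour of what manipulating large data outside of Excel can look like, here is a minimal sketch. It is not material from the Whitehead session, and it uses Python rather than Unix commands; the column names, data, and threshold are hypothetical. The point it illustrates is reading a tab-delimited file one row at a time, so the whole table never has to fit in memory or in a spreadsheet.

```python
import csv
import io


def filter_rows(lines, column, threshold):
    """Yield rows (as dicts) whose numeric `column` value is >= threshold.

    `lines` is any iterable of text lines (e.g. an open file), so rows are
    processed one at a time instead of loading the whole table at once.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if float(row[column]) >= threshold:
            yield row


# Hypothetical data standing in for a large file. With a real file, use
# `with open(path, newline="") as handle:` and pass `handle` instead.
sample = io.StringIO("probe\tsignal\nA\t12.5\nB\t3.0\nC\t8.1\n")
kept = [row["probe"] for row in filter_rows(sample, "signal", 8.0)]
print(kept)
```

Because `filter_rows` is a generator, the same function can feed a writer or a counter without keeping the filtered rows in memory.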
SJ--short and sweet courses are most valuable. LAMP (Linux Apache ModPerl/Python, etc) courses seem to attract a lot of bioinformatics people. They also provide courses on specific tools and usually have hundreds sign up. The R statistics package, for example, had 40 people turn out on a Saturday. Most people who come are grad students who need to get into the domain. Some are biologists who are not afraid of the technical. However, there is an unmet need amongst biologists who need help with tools; the core has not been able to draw them in yet.
UNC group--young people adapt better; the group is amazed by grad student skills. However, there is an MS Office culture in which biologists demand an easy GUI to interact with.
Discussion--How to create students who are much better in bioinformatics skills?
MIT has research focused informatics courses in the biology department, but these are not applied.
AH--Germany offers bioinformatics-for-biologists courses. But even when they are free, there is not a lot of interest, or they are too difficult. More general instruction in grad school doesn't make a lot of sense, as people are most motivated when there is direct applicability to their own needs and work.
MR--we can’t expect too much from mainstream biologists. What about people who are in between those who are very computer savvy and those who need some nurturing?
AK--can't teach everything to everyone. Short vocational training works in many cases.
DS--Vanderbilt is more willing to hire staff than to train. The institution perceives this as giving more bang for the buck. They have had good success creating wikis to share documents and knowledge amongst themselves. Experience has shown that a lot of people would rather explore and investigate on their own first, without having to ask someone. Given this, it’s easier to look something up on a wiki. Vanderbilt has had a lot of success using the wiki. When there is a new piece of software, the users usually type up directions, which then appear on the wiki.
DL--Education from regular classes teaches theory. Researchers need to know details to run certain programs; they need to know the practical.
AH--They found that errors usually send all questions to the core. It’s better to work together first for training. It’s usually easy to get biologists to understand the correct way of doing something, but hard to teach them how to do it right.
Part II: Tools (David Sexton Introduction)
This discussion dovetails with Part I, as we want to discuss tools that enable biologists to perform analysis. Some tools investigated:
Interproscan
Galaxy
DS--played with this a little to glue workflows together. It seemed to do well at going to NCBI, running a BLAST, and bringing the BLAST results back to the workbench for follow-on analysis. But it appears to be more of a tool for bioinformaticians than biologists: it still requires knowledge of each step rather than working de novo.
MR--good experience. Groups have used it for large-scale analysis. Some are even starting to put their own tools into it, adding them for a limited user audience. It's good for large-scale analysis, especially comparisons between datasets (e.g. overlaps between datasets that refer to defined genomic regions with start and stop, such as ChIP-on-chip data with genome annotations).
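The kind of dataset comparison mentioned for Galaxy, overlaps between regions defined by a start and a stop, can be sketched in a few lines. This is only an illustration of the idea, not Galaxy's implementation; the tuple layout, the half-open coordinate convention, and the example peaks and genes are all assumptions made for the example.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def find_overlaps(peaks, annotations):
    """Pair each peak with every annotation it overlaps on the same chromosome.

    Both inputs are lists of (chrom, start, end, name) tuples.
    """
    # Group annotations by chromosome so each peak is only compared
    # against regions on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, name in annotations:
        by_chrom[chrom].append((start, end, name))

    hits = []
    for chrom, start, end, name in peaks:
        for a_start, a_end, a_name in by_chrom[chrom]:
            if overlaps(start, end, a_start, a_end):
                hits.append((name, a_name))
    return hits


# Hypothetical ChIP-on-chip peaks and gene annotations.
peaks = [("chr1", 100, 200, "peak1"), ("chr1", 500, 600, "peak2"),
         ("chr2", 50, 80, "peak3")]
genes = [("chr1", 150, 400, "geneA"), ("chr1", 600, 900, "geneB"),
         ("chr2", 10, 60, "geneC")]
result = find_overlaps(peaks, genes)
print(result)
```

This pairwise scan is fine for small inputs; for genome-scale datasets a sorted sweep or an interval tree would be the usual choice.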
Taverna: reliability is not that high. Recommended to bring web services in house. Requires lots of file passing, and if those files are large, this makes it very cumbersome. There could be some solutions to get around this. Lax was surprised when it worked without much effort.
AK: Most problems occur with web services not running; having a local copy of said web services is the easiest solution. I use a local copy of Soaplab as it is trivial to add EMBOSS applications (or any command line program with an ACD file) to Soaplab. Seahawk/BioMoby should make this easy as well but I have not tried. N.B. the main snag is that all files are moved across the network, which makes working with large file sizes almost a no go. This may be fixed in Taverna 2. Also I have had problems with the Taverna client on 64-bit Linux systems but I have not tried hard enough to debug this.
Good site for general microarray links: 
BASE: DL--trying to create a database using BASE (second version) from Sweden. People can retrieve data from the web and have a plugin to. Some folks on the list have tried it.
TM4: Lax--multiple experiment viewer from TIGR; good for experimentalists.
madmax has gone commercial; no longer open source.
Webarray? A web-based microarray tool that wraps around Bioconductor with a MySQL backend.
UC Davis mostly uses R packages for analysis.
SNP array analysis
chipmonk. A tool to visualize and analyze ChIP-on-chip array data. However, only certain genomes are available in its database. Some have had good experiences talking to the author, who has been very good at incorporating additional genomes. However, the Arabidopsis genome seems to be troublesome. AK--Simple but useful. Its simplicity has enabled a few users to get to grips with analysing their data without being too confused over multiple options. The author has been good at incorporating the genome of interest (yeast).
Plink. A whole genome association analysis toolset. Analyzes large-scale chip arrays. Vanderbilt and Harvard Med use it. Developed at MGH.
WASP from Vanderbilt is publicly available.
Dchip--some good usage, some not. Can do some simple analysis.
AK--One of the tools of choice for our structural groups. Loads of powerful features as well as the capability to generate publication-quality figures.
MAGE and Kinemage:  from Duke. A web based tool to connect structure and display.
AutoDock is very good for docking calculations.