IBEST Bioinformatics Core

Location: University of Idaho, Moscow, ID 83855 (USA)

Name/Title: James A. Foster, Director of IBEST Bioinformatics Core, Professor of Biological Sciences


Group size: 285 users

Environment: The University of Idaho is a major Land Grant university and has the primary responsibility for research and graduate education in Idaho.
The Initiative for Bioinformatics and Evolutionary STudies (IBEST) is an interdisciplinary research group at the University of Idaho. IBEST blends expertise from biologists, biochemists, mathematicians, statisticians, and computer scientists to examine the underpinnings of evolutionary biomedicine and to develop the analytical tools needed to do so.

A wide array of software is available for general sequence analysis, phylogenetic and population genetics analyses, protein structure modeling, expression array analysis, statistics, and mathematical modeling. The software available on these computers includes: General Sequence Analysis Packages (EMBOSS, etc.), Database Access (PDB, SCOP, GenBank, etc.), Phylogenetic Inference (PHYLIP, PAUP*, MrBayes, fastDNAml, GeneTree, MODELTEST, P4, PAML, Seq-Gen, TreeView), Population Genetics (Migrate, Fluctuate, Recombine, Lamarc, GeneConv), Sequence Alignment (HMMER, ClustalW, mafft, muscle, etc.), Sequence Assembly (Phred/Phrap/Consed, RepeatMasker), Protein Structure Visualization (Amber, Charmm, Cn3D, Rasmol, 3D Molecular Viewer), and Statistical/Mathematical Packages (Mathematica, MatLab, R, S3 Stochastic Spatial).
Most of these programs are free for academic use, while others are commercial packages or have been developed by COBRE students and personnel. The latter include new software (EVALYN) for multiple sequence alignments; a fast program (ClearCut) for inferring phylogenetic trees, based on a modified neighbor joining method; a program for high-throughput analysis of ribosomal RNA gene sequences (HiTSA); a companion program (StatGen) that summarizes and graphically displays the results from HiTSA; and the Microbial Community Analysis (MiCA) tool for analyzing TRFLP data about bacterial communities. In addition, tools to facilitate data analysis have been developed, including an “all-against-all” BLAST, a tool for detecting transposable elements in genomes that uses RepeatMasker, and tools for distributed PAUP and bootstrap analysis. Each of these software and data analysis tools is freely available to researchers anywhere.

Pertinent hardware info:
The IBEST Bioinformatics Core currently comprises several compute clusters, stand-alone application servers, data storage systems, software, and personnel.
Our primary production cluster is made up of Dell M1000E enclosures and M605 blades. It has a total of 512 cores (AMD64) and 512GB of total system memory (1GB per core). In addition to our primary cluster, we maintain a 96-processor Intel Xeon based system with 48GB of total system memory (512MB per processor) and a 96-processor PowerPC G5 based system with 192GB of total system memory (2GB per processor). We also maintain a cluster used primarily for testing and development, which is made up of 44 Intel Xeon processors and 22GB of system memory (512MB per processor). In addition to the research clusters, we have a small cluster (datarig) made up of Dell PE2950 and PE1950 servers, which is dedicated to the post-processing of 454 sequencing data. It is a 40-core Intel Xeon cluster with 96GB of total system memory. The clusters are currently networked with 1Gb/s TCP interconnects.
The stand-alone application servers include 3 Dell M905s, each with 16 cores and 32GB of system memory; 2 Dell PE6950s, each with 8 cores and 8GB of system memory; and 2 dual-processor Sun SPARC V440s.
We support over 85TB of total data storage and backup. Our LTO-4 tape backup system is capable of backing up 20TB of data. Our main production cluster has 30TB of dedicated storage, the 454 datarig has 15TB of dedicated storage, and the remaining production and development clusters split the remaining 20TB of data storage. All user data is backed up regularly.
The core systems are located on the University of Idaho campus in a 1,400-square-foot room that has been specifically designed and renovated by UI for this Core. 1Gb/s fiber and copper connect all equipment, and the UI backbone provides 4GB/s transfer rates. This room has a dedicated 80 kVA UPS with three-phase power and four forced-air handlers attached to redundant university chilled water systems. The facility has an emergency backup diesel generator.
The bioinformatics core is connected to the university backbone with 1Gb/s fiber and provides 1Gb/s networking to the faculty offices and laboratories. Also, the University of Idaho, funded in part through the $10M NIH Lariat infrastructure grant, has expanded off campus data transmission capacity to 2.8Gb/s in the short term, and will expand to 10Gb/s within 3 years. This will enable large, high-speed data transfer with the rest of the world, rather than just within the university. This is important for both collaborations and for systems support, since keeping our many huge databases up to date requires constant transmission of vast amounts of data from primary database providers such as NCBI.
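As a rough illustration of what these link speeds mean for database updates, the sketch below estimates idealized transfer times at the three rates mentioned above (1Gb/s, 2.8Gb/s, and 10Gb/s). The 1TB dataset size and the function names are assumptions chosen for illustration, and the calculation ignores protocol overhead, contention, and disk throughput, so real transfers would take longer.

```python
def transfer_hours(size_terabytes: float, rate_gbps: float) -> float:
    """Idealized hours to move size_terabytes (decimal TB) at rate_gbps gigabits/s."""
    if rate_gbps <= 0:
        raise ValueError("rate_gbps must be positive")
    bits = size_terabytes * 1e12 * 8          # 1 TB = 1e12 bytes, 8 bits per byte
    seconds = bits / (rate_gbps * 1e9)        # 1 Gb/s = 1e9 bits per second
    return seconds / 3600


# Link rates taken from the text; the labels are illustrative only.
LINK_RATES_GBPS = {
    "campus fiber (1Gb/s)": 1.0,
    "off campus, short term (2.8Gb/s)": 2.8,
    "off campus, planned (10Gb/s)": 10.0,
}


def transfer_table(size_terabytes: float = 1.0) -> dict:
    """Map each link label to its idealized transfer time in hours."""
    return {
        label: transfer_hours(size_terabytes, rate)
        for label, rate in LINK_RATES_GBPS.items()
    }


if __name__ == "__main__":
    # Hypothetical 1TB database snapshot.
    for label, hours in transfer_table(1.0).items():
        print(f"{label}: {hours:.2f} h")
```

Under these assumptions, moving from 2.8Gb/s to 10Gb/s cuts the idealized time for the same dataset by a factor of about 3.6.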
Staffing includes two dedicated full-time systems administrators and a dedicated Bioinformatics Coordinator (a PhD-level scientist) to help users. All staff participate regularly in meetings of the Initiative for Bioinformatics and Evolutionary STudies (IBEST), a group of faculty and students from departments in the biological, computational, and mathematical sciences that meets at least weekly to present and discuss crosscutting research. We also have regular access to faculty and students in the UI MS/PhD program in Bioinformatics and Computational Biology (BCB).

IBEST provides user training in the form of short courses, weekly updates, and individualized assistance.
We are also closely affiliated with the Bioinformatics and Computational Biology (BCB) graduate program. BCB is a highly interdisciplinary program that requires students and faculty to bridge biological, computational, and mathematical disciplines. BCB faculty members are drawn from nine departments or divisions: Biochemistry (MMBB), Biological Sciences, Computer Science, Fish and Wildlife, Forest Resources, Mathematics, Plant Sciences (PSES), Rangeland Ecology, and Statistics. These academic units span four colleges and one institute: Science, Natural Resources, Agricultural and Life Sciences, Engineering, and the WWAMI medical education program. These faculty members are available to serve on BCB graduate student committees.

Bioinformatics support model:
Grants for expansion and some staffing; institutional support for most staffing; user fees for service contracts and maintenance are in the process of being implemented.