User talk:Bioinfocoreguru
Bioinfo-Core:
We handle more than 20 data analysis projects per year with a staff of four at the Bioinformatics Analysis Core at the University of Pittsburgh (Genomics and Proteomics Core Laboratories). Our model is fee-for-service. The process begins with an orientation meeting with the investigators to discuss study aims and study design; we provide advice on study design for free. We then derive a cost estimate for the scope of work (data analysis and research services). The investigator's project enters our data analysis queue once the data are produced. We maintain separate queues for microarray data analysis, top-down biomarker development (prediction modeling), proteomics (mass spec, 2D-DIGE, multiplexed panels), and next-generation sequencing (GS-FLX and SOLiD). Our product is a Preliminary Research Report communicated to the PI in person (via PPT).
A key component of our core is our focus, for each data stream, on continuous methodological optimization. We call it "Intelligent Objective Methods Optimization". We use criteria such as internal consistency for cases where the true status of genes/proteins is unknown, via Efficiency Analysis (Jordan et al. 2008)[1], a resampling method developed for the comparative evaluation of microarray data normalization and test procedures. We use spike-in sample sets (e.g., protein standards) to explore method space for situations where the true status of a gene/protein is known (e.g., Sultana et al. 2009 [2]). By comparing methods under objective criteria, we gain confidence in the interpretation of results. Key to this is building these comparative evaluation capabilities directly into the analysis software resources we develop (e.g., Patel and Lyons-Weiler, 2004)[3].
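To make the internal-consistency idea concrete, here is a minimal sketch (not the published Efficiency Analysis procedure itself; the function names, toy data, and the Jaccard-overlap criterion are my own illustrative choices): rank genes by a test statistic on repeated subsamples of the samples, then score a method by how stable its top-k gene list is across the subsamples. A method whose selected list changes wildly under resampling is less trustworthy when ground truth is unknown.

```python
import numpy as np

def top_k_genes(data, labels, k):
    # Rank genes by absolute Welch t-statistic; return the indices of the top k.
    a, b = data[:, labels == 0], data[:, labels == 1]
    na, nb = a.shape[1], b.shape[1]
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(
        a.var(axis=1, ddof=1) / na + b.var(axis=1, ddof=1) / nb + 1e-12)
    return set(np.argsort(-np.abs(t))[:k])

def internal_consistency(data, labels, k=50, n_resamples=20, frac=0.8, seed=0):
    # Illustrative stability criterion: average pairwise Jaccard overlap of the
    # top-k gene lists computed on stratified random subsamples of the samples.
    rng = np.random.default_rng(seed)
    lists = []
    for _ in range(n_resamples):
        idx = np.concatenate([
            rng.choice(np.where(labels == g)[0],
                       size=int(frac * np.sum(labels == g)), replace=False)
            for g in (0, 1)])
        lists.append(top_k_genes(data[:, idx], labels[idx], k))
    pairs = [(x, y) for i, x in enumerate(lists) for y in lists[i + 1:]]
    return float(np.mean([len(x & y) / len(x | y) for x, y in pairs]))

# Toy data: 500 genes x 40 samples; the first 25 genes are truly differential.
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 40))
labels = np.repeat([0, 1], 20)
data[:25, labels == 1] += 3.0
score = internal_consistency(data, labels)
print(f"mean Jaccard overlap of top-50 lists: {score:.2f}")
```

The same scoring loop can be run with any competing test statistic in place of the Welch t, which is the sense in which a stability criterion lets you compare methods objectively when no spike-in truth is available.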
I thought I would share our model; I am grateful to Bioinfo-Core for standing up a valuable community resource and hope to look around and learn a bit. Our next-generation sequencing bioinformatics effort is now well underway, and we hope to report to our community of peers what we are learning about optimal analysis settings for QA/QC in the next year or so. If anyone could point me toward empirical evidence for preferred settings for SOLiD sequence data analysis (specifically, optimizing the read length parameter during alignment to a reference) and tips on how many mismatches to allow for various applications, we would be grateful!
-jlw
PS Cancer Informatics is a fine journal for cancer-related informatics developments and research findings. I'm a proud papa of the journal, so I'm biased. No conflicts of interest to report. Enjoy.
References
- [1]: Jordan R, Patel S, Hu H, Lyons-Weiler J. 2008. Efficiency analysis of competing tests for finding differentially expressed genes in lung adenocarcinoma. Cancer Informatics 6:389-421. http://www.la-press.com/journal.php?journal_id=10&issue_id=101
- [2]: Sultana T, Jordan R, Lyons-Weiler J. 2009. Optimization of the use of consensus methods for the detection and putative identification of peptides via mass spectrometry using protein standard mixtures. Journal of Proteomics & Bioinformatics 2:262-273.
- [3]: Patel S, Lyons-Weiler J. 2004. caGEDA: A web application for the integrated analysis of global gene expression patterns in cancer. Applied Bioinformatics 3:49-62.