MCIC November 2012 news.
ISSUE 3
Monday, November 5
|
MCIC Computational Biology Laboratory. By Tea Meulia
Open house is on November 16, 2012, from 10:00 AM till 3:30 PM.
The MCIC is very pleased to announce that we are expanding our bioinformatics core services by launching the Computational Biology Laboratory (MCBL). The creation of the MCBL has been part of a joint MCIC and CAPS effort that aims to strengthen bioinformatics collaborations and computational support available to biologists at OSU. We aspire for the MCBL to become an engaging environment for learning and performing bioinformatics research at the OARDC, by providing a computer lab, servers and workstations that host up-to-date software for genetic data analysis, and experienced personnel.
The MCBL aims to provide an engaging work environment and space for performing research involving biological data analysis. We want it to be the place where ideas are discussed and exchanged, where students and users learn from each other and get help and support from our experienced staff when needed, and where, as a community, we move our bioinformatics knowledge forward. The MCBL is also connected to scientists and activities, such as seminars and journal clubs, at the CAPS Computational Biology Laboratory (CCBL) in Rightmire Hall (https://caps.osu.edu/caps-computational-biology-laboratory) via 24-hour videoconferencing. In the future, the videoconferencing equipment will also allow interactions with other OSU bioinformatics units.
Moreover, the MCBL infrastructure is conducive to research and well suited for teaching and training. It includes a computer lab with eight client workstations connected to servers and workstations. These machines host algorithms and custom scripts for data analysis, for example: the MCIC customized Galaxy server (http://Galaxy.oardc.ohio-state.edu), which hosts a wide array of software and workflows for deep sequence data analysis in a biologist-friendly environment; standalone Blast; Blast2Go; the R statistical package; the IGV browser for visualization; and an automated RNA-Seq workflow that uses a bash script to analyze a large number of samples at once, for cases where uploading into Galaxy is too time consuming. The detailed list of services can be found in the MCBL description document [MCBL infrastructure and membership].
The MCBL has experienced staff. Asela Wijeratne is stepping into a Research Scientist position and will oversee the bioinformatics activities of the core. He joined the MCIC in 2010 as Genomics Manager, has been involved in several projects involving deep sequencing data analysis, and has been instrumental in the development of the bioinformatics tools that are now available at the MCIC. He will continue to provide support for research, train students, collaborate with faculty on projects involving genomics and genotyping data analysis, and develop and introduce new bioinformatics methodologies. Stephen Opiyo also joined the MCIC in 2010 as Research Scientist, based in Kottman Hall in Columbus. Stephen provides bioinformatics support for metabolomics and protein data analysis, and for statistical methods. Finally, Saranga Wijeratne also joined the MCIC in 2010 as System Developer; he has developed and maintains the entire computational infrastructure at the MCIC, including our hardware and software. Highlights of his scripting and computing work include our local customized Galaxy site and the setup of the MCBL computer. He is currently developing a cluster to host our software, which will further expand our computational capabilities and make efficient use of the available hardware.
Anyone interested in bioinformatics can become an MCBL member. Guidelines for the application, rules for using the infrastructure and the fee structure can be found in the MCBL description and guidelines document [MCBL infrastructure and membership].
We will hold an open house for the MCBL on Friday, November 16, 2012, from 10:00 AM till 3:30 PM. Please come to learn more about our new core and how it can help your research.
Become an MCBL member by filling in this online registration form.
This project is supported by an OARDC Seeds grant and the MCIC cost recovery fund.
|
Sequencing on the Illumina GAII is becoming cheaper and is ideal for smaller pilot projects. By Tea Meulia
As everyone is aware, sequencing on the Illumina GAII Genome Analyzer is as expensive as sequencing on the newest Illumina technology, the HiSeq; however, data throughput is about 3-4 times lower on the GAII than on the HiSeq. We were finally able to negotiate a discount for the Illumina GAII sequencing reagents, and consequently we can provide more competitive pricing. Our new rates are:
Single read, up to 36 cycles (or 29 bases, plus 7 index bases): $600.00
Paired-end, up to 76 bases: $1,400.00
Paired-end, up to 100 bases: $1,700.00
Paired-end, up to 150 bases: $2,000.00
We also offer an additional 15% per-lane discount for the submission of 7-8 samples, or multiples thereof. Discounts for larger projects are also available; please inquire about those. Our rates are available at http://www.oardc.ohio-state.edu/mcic/rates/rates.html.
These new prices make sequencing on the Illumina GAII Genome Analyzer more competitive. The lower sequencing costs are ideal for smaller pilot projects and for the many instances where ultra-deep sequencing is not required, such as small RNA or ChIP-Seq analysis, or reduced genome representation sequencing for genotyping purposes. One GAII flow-cell lane provides 35 – 45 million clusters for a single read, and double that for a paired-end read.
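As a rough illustration of what these rates mean per unit of data, the sketch below divides the single-read lane price by the 35 to 45 million clusters quoted above, with and without the 15% discount. Applying the discount directly to the $600 list price is an assumption made for illustration only; please check the rates page or contact us for the actual terms.

```python
# Illustrative arithmetic only: price per million clusters for one GAII lane.
SINGLE_READ_PRICE = 600.00      # USD per lane, single read up to 36 cycles
CLUSTERS_MILLIONS = (35, 45)    # clusters per lane in millions (low, high)
DISCOUNT = 0.15                 # assumed to apply to the full lane price


def cost_per_million(lane_price, clusters_millions):
    """Return the lane price divided by the cluster yield in millions."""
    return lane_price / clusters_millions


for label, price in (("list", SINGLE_READ_PRICE),
                     ("discounted", SINGLE_READ_PRICE * (1 - DISCOUNT))):
    high_yield = cost_per_million(price, CLUSTERS_MILLIONS[1])
    low_yield = cost_per_million(price, CLUSTERS_MILLIONS[0])
    print(f"{label}: ${high_yield:.2f} to ${low_yield:.2f} per million clusters")
```

The same function can be reused with the paired-end prices, keeping in mind that a paired-end run yields two reads per cluster.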
|
MCBL upcoming tutorials, training sessions and user group. By Asela Wijeratne and Stephen Opiyo
In spring and summer 2012 we offered two tutorials that were very well received, and we could not accommodate all those who wanted to participate. Saranga taught a basic hands-on bioinformatics session on how to use the MCIC Galaxy portal for high-throughput sequence data analysis. This session focused on an introduction to “Galaxy”, a framework that aims to make computational biology accessible to researchers who do not have in-depth computational skills. The topics were divided into four sections: (1) Introduction to Galaxy tools; (2) How to manage history in Galaxy; (3) Running basic text manipulation tools; (4) How to upload and download data from and to the UCSC genome browser. In addition, graphical user interfaces like FileZilla were discussed in detail for transferring large data sets via FTP.
Asela taught a tutorial on RNA-Seq data analysis. The workshop was divided into four sections of approximately 3 hours each. Each topic comprised a lecture, which introduced basic concepts with real-world examples, and a practical hands-on session that illustrated the use of software and protocols integrated into our customized version of Galaxy. Covered topics included an introduction to RNA-Seq and experimental planning, short read mapping and data visualization, differential gene expression analyses, de novo and reference-guided transcriptome assembly, and annotation of assembled transcripts.
Upcoming tutorials:
RNA-Seq and transcriptome assemblies: Asela is planning to repeat the RNA-Seq tutorial in December and will be sending out more information regarding the dates shortly. If you are interested in participating, please contact Asela via e-mail (wijeratne.1@osu.edu) so that he can reserve a spot, as last time we had more requests than we could accommodate. A detailed description of the topics covered can be downloaded here [PDF]. You can help us improve the workshop and update the topics by giving us feedback in this short survey: http://www.surveymonkey.com/s/K7RKGSY.
Introduction to R: Stephen Opiyo is preparing a tutorial aimed at familiarizing users with R. This is an introduction to the R software for beginners. The workshop will include three sessions: (1) an introduction to basic R tools, (2) graphics and data visualization in R, and (3) how to perform basic statistical operations in R. This tutorial is planned for January, and you can find details about the topics that will be covered in this document [PDF]. Please contact Stephen via e-mail (opiyo.1@osu.edu) if you would like to reserve a spot or get more information regarding this upcoming workshop.
The MCIC bioinformatics and genomics discussion group ‘Genomics_MCIC’ allows you to post questions and start discussions on bioinformatics and genomics topics. If you would like to subscribe, please send an e-mail to aselaw90@gmail.com.
|
Mass SNP Genotyping by Sequencing on the Illumina GAII Genome Analyzer. By Tea Meulia
We are exploring possibilities to expand the use of the Illumina GAII Genome Analyzer for new applications. One such application could be the Mass SNP Genotyping by Sequencing Technology (MGST) that is being developed by Eureka Genomics. Essentially, the assay enables screening of hundreds of SNPs in thousands of samples in parallel using a single lane of the Illumina GAII Genome Analyzer. The assay is briefly outlined in Figure 1, and a more detailed description can be found in the poster and presentation provided by Eureka Genomics [XGEN-MGST.pdf; EG_LDMA_Presentation.pdf].
This same assay is also compatible with copy number variation (CNV), methylation and gene expression applications; however, we do not have details regarding these applications yet. The multiplexing nature of the MGST assay apparently greatly reduces the cost of analyzing variants in a population, particularly in terms of reagents, and consequently the cost per data point. Table 1 shows an estimated cost per data point for four different scenarios. These costs are estimates, calculated from very preliminary figures for probe and primer design and bioinformatics provided to us by Eureka Genomics, and from our own estimate for setting up the assay and running it on our GAII. Eureka Genomics also offers a full service. If you are interested in this technology, please contact Tea Meulia (Meulia.1@osu.edu).
Figure 1: MGST assay
| SNPs | Samples | Cost per data point |
| --- | --- | --- |
| 24 | 2,000 | $0.72 |
| 5 | 2,000 | $3.33 |
| 5 | 10,000 | $1.39 |
| 200 | 1,000 | $0.30 |
Table 1: Estimated cost per data point for four different scenarios.
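To make the figures in Table 1 easier to compare, the sketch below computes the number of data points and the implied total cost for each scenario. Treating a data point as one SNP genotyped in one sample, and the total as the per-point cost multiplied by the number of points, is an assumption about how the table should be read, not a quote from Eureka Genomics.

```python
# Scenarios from Table 1: (SNPs, samples, estimated cost per data point in USD).
SCENARIOS = [
    (24, 2_000, 0.72),
    (5, 2_000, 3.33),
    (5, 10_000, 1.39),
    (200, 1_000, 0.30),
]


def implied_total(snps, samples, cost_per_point):
    """Assumes one data point = one SNP genotyped in one sample."""
    data_points = snps * samples
    return data_points, data_points * cost_per_point


for snps, samples, cost in SCENARIOS:
    points, total = implied_total(snps, samples, cost)
    print(f"{snps:>3} SNPs x {samples:>6,} samples = {points:>7,} points, "
          f"about ${total:,.0f} in total")
```

Under this reading, the per-point cost falls as more SNPs or more samples share the same lane, while the implied project totals stay within the same order of magnitude.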
|
Automation of RNA-Seq library preparation using the Eppendorf liquid handling robot. By Asela Wijeratne and Maria Elena Hernandez-Gonzalez
RNA-Seq is a powerful tool to detect the sequence and expression level of transcribed RNA. Modern-day Illumina high-throughput sequencers can run up to 24 lanes, generating more than 200 million short reads per lane. Because of this very high throughput, one can easily combine several samples in one lane and investigate a very large number of samples in a single sequencing run. However, the protocols for RNA-Seq library preparation have multiple pipetting steps. On average, it takes up to three days to execute the entire protocol, and the maximum number of samples a single technician can comfortably handle in one sitting is 16. Therefore, for preparing a larger number of libraries at once, automation using a liquid handling robot is advantageous: it saves time and produces more consistent results.
Here we have evaluated the use of the Eppendorf EpMotion liquid handler for Illumina TruSeq RNA-Seq library preparation. Four total RNA samples were each divided into two aliquots, and RNA-Seq libraries were prepared from one aliquot manually and from the other using the Eppendorf EpMotion liquid handler.
Initial quality controls indicated that manual sample preparation yielded a higher library quantity compared to the automated sample preparation. The final concentrations of the eight libraries are shown in Table 1. On average, manually prepared libraries were about twice as concentrated as the libraries prepared with the liquid handler. However, both methods yielded more than enough material for sequencing. Usually 2-3 ul of a 10 nM solution are denatured and processed for sequencing.
The four samples prepared with the same method were pooled together and sequenced in a single lane of the Illumina GAII sequencer using the 76-base paired-end protocol. Each flow-cell lane yielded nearly 64 million reads. Initial sequence data quality assessments were very similar for the two lanes containing samples prepared manually and with the liquid handler, respectively. After de-multiplexing, each dataset was pre-processed to exclude low quality bases and adapter sequences, and then aligned to Histoplasma cDNA sequences. Aligned reads were counted using a Python script and the count data were normalized using the DESeq package in R. The number of reads obtained after each step is shown in Table 2. Samples prepared with the liquid handler yielded a slightly lower percentage of mapped reads: an average of 56.75% for manually prepared samples compared to 54.75% for the samples prepared with the liquid handler.
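For readers unfamiliar with the normalization step, the sketch below illustrates the median-of-ratios idea behind DESeq size factors in plain Python. It is a simplified illustration with made-up counts, not the R code or the Python counting script used for this analysis.

```python
import math
from statistics import median

# Hypothetical read counts: one row per gene, one column per sample.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [0, 5],     # genes with a zero count in any sample are skipped below
]


def size_factors(matrix):
    """Median-of-ratios size factors, one per sample (column)."""
    n_samples = len(matrix[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in matrix:
        if min(row) <= 0:
            continue  # the geometric mean would be zero
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


factors = size_factors(counts)
# Dividing each count by its sample's size factor puts samples on a common scale.
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
print(factors)
print(normalized)
```

In this toy example the second sample was sequenced twice as deeply as the first, so after normalization the two columns agree for every gene with nonzero counts in both samples.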
The Pearson correlation coefficient was calculated for manual versus automated sample preparation. Table 3 shows that the manual and automated preparations of the same sample had a Pearson correlation coefficient of one, indicating that the outcome of the two methods is very similar. Similarly, the Euclidean distance calculation for the eight samples (Figure 1) shows that the manually and automatically prepared libraries from the same sample cluster together, indicating that they are the most closely related.
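The two similarity measures used here can be sketched in a few lines of Python. The count vectors below are hypothetical and only show how the calculations work; the actual analysis was run on the normalized counts for all genes.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical normalized counts for four genes in two libraries.
manual = [10.0, 20.0, 30.0, 40.0]
robot = [12.0, 19.0, 33.0, 41.0]
print(round(pearson(manual, robot), 3), round(euclidean(manual, robot), 3))
```

A correlation close to one and a small distance both indicate that two libraries give nearly the same expression profile.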
| Sample | QuBit (ng/ul) | qPCR (nM) |
| --- | --- | --- |
| hist_man_1 | 33.70 | 988.07 |
| hist_man_2 | 31.40 | 1052.05 |
| hist_man_3 | 31.00 | 1019.70 |
| hist_man_4 | 31.70 | 870.58 |
| hist_robot_1 | 15.90 | 310.61 |
| hist_robot_2 | 25.30 | 605.53 |
| hist_robot_3 | 18.40 | 439.83 |
| hist_robot_4 | 16.30 | 352.40 |
Table 1: Library concentrations
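The roughly two-fold difference mentioned above can be checked directly from the values in Table 1. The short sketch below averages the QuBit and qPCR readings for each preparation method and prints the manual-to-robot ratio.

```python
from statistics import mean

# (QuBit ng/ul, qPCR nM) per library, copied from Table 1.
manual = [(33.70, 988.07), (31.40, 1052.05), (31.00, 1019.70), (31.70, 870.58)]
robot = [(15.90, 310.61), (25.30, 605.53), (18.40, 439.83), (16.30, 352.40)]


def column_means(rows):
    """Return the mean of each column as a tuple."""
    return tuple(mean(col) for col in zip(*rows))


man_qubit, man_qpcr = column_means(manual)
rob_qubit, rob_qpcr = column_means(robot)
print(f"QuBit: manual {man_qubit:.2f}, robot {rob_qubit:.2f}, "
      f"ratio {man_qubit / rob_qubit:.2f}")
print(f"qPCR:  manual {man_qpcr:.2f}, robot {rob_qpcr:.2f}, "
      f"ratio {man_qpcr / rob_qpcr:.2f}")
```

With these values the manual libraries come out roughly 1.7 times as concentrated by QuBit and 2.3 times by qPCR.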
|
Table 2: The number of reads obtained after each step
| | hist_man_1 | hist_man_2 | hist_man_3 | hist_man_4 |
| --- | --- | --- | --- | --- |
| hist_robot_1 | 1 | 0.99 | 0.84 | 0.85 |
| hist_robot_2 | 0.99 | 1 | 0.85 | 0.85 |
| hist_robot_3 | 0.85 | 0.85 | 1 | 0.99 |
| hist_robot_4 | 0.85 | 0.85 | 0.99 | 1 |
Table 3: Pearson correlation coefficient for manually prepared vs automated samples (hist_man: manually prepared; hist_robot: sample prepared using liquid handler).
![Rplot[3].jpg](1211_newsletter_images/Rplot[3].jpg)
Figure 1: The Euclidean distances between the samples