Monthly Archives: December 2012

Plot Genome Coverage Graphs using grid package in R

It is a common task to visualize the genome coverages across different conditions. This usually requires a multiple panel graph with some shared axis labels and a title shared by all the panels. One easy way to do this is using the grid packages in R. The code below shows how to do it.

coverages <- read.csv(‘Ambion_coverage_small.csv’, header=TRUE, sep=’,’)


## 26 Ambion
title = expression(paste(“26”, degree, “C rep 1”))
p23 <- ggplot(coverages, aes(x=position, y=NBA.23)) + geom_line()
p23 <- p23 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p23 <- p23 + ggtitle(title)

title = expression(paste(“26”, degree, “C rep 2”))
p25 <- ggplot(coverages, aes(x=position, y=NBA.25)) + geom_line()
p25 <- p25 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p25 <- p25 + ggtitle(title)

title = expression(paste(“37”, degree, “C rep 1”))
p20 <- ggplot(coverages, aes(x=position, y=NBA.20)) + geom_line()
p20 <- p20 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p20 <- p20 + ggtitle(title)

title = expression(paste(“37”, degree, “C rep 2”))
p21 <- ggplot(coverages, aes(x=position, y=NBA.21)) + geom_line()
p21 <- p21 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p21 <- p21 + ggtitle(title)

title = expression(paste(“37”, degree, “C rep 3”))
p22 <- ggplot(coverages, aes(x=position, y=NBA.22)) + geom_line()
p22 <- p22 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p22 <- p22 + ggtitle(title)


pushViewport(viewport(layout=grid.layout(6,2, heights= unit(c(0.8, 5, 5, 5, 5, 5), ‘null’), widths=unit(c(0.5, 5), ‘null’))))

grid.text(“Genome Coverages of Ambion Libraries”, vp = viewport(layout.pos.row = 1)) # a title for the entire grid

grid.text(“Row read counts”, rot=90, vp = viewport(layout.pos.col = 1)) # a title for the entire grid
print(p23, vp = viewport(layout.pos.row=2, layout.pos.col=2))
print(p25, vp = viewport(layout.pos.row=3, layout.pos.col=2))
print(p20, vp = viewport(layout.pos.row=4, layout.pos.col=2))
print(p21, vp = viewport(layout.pos.row=5, layout.pos.col=2))
print(p22, vp = viewport(layout.pos.row=6, layout.pos.col=2))

The above code uses a grid of 6 rows and 2 columns. The first column and the first row are used for shared y-axis and a title for the entire grid. The rest 5 rows in 1 column are used to plot all the panels. There are also some tricks for hide x- and y-axis labels for every panel, and a way to print the degree sign along with other text as panel titles.

These code generate the following figure.



Using ShortRead for mapping quality control

The ShortRead package in Bioconductor offers quality control for NGS data from base-calling to mapping. For mapping scores, ShortRead can read several formats, including BAM files generated from samtools. However, it cannot recognize the sorted BAM files.

The R code below plots alignment quality density distribution.

aln <- readAligned(getwd(), 'my.bam', type='BAM')
q <- quality(alignQuality(aln))
densityplot(q[q&bt;1], xlab=’Alignment quality’, plot.points=FALSE, log=’y’)

The plot looks like this:

Alignment quality density distribution