Monthly Archives: December 2012

Plot Genome Coverage Graphs using grid package in R

It is a common task to visualize the genome coverages across different conditions. This usually requires a multiple panel graph with some shared axis labels and a title shared by all the panels. One easy way to do this is using the grid packages in R. The code below shows how to do it.

coverages <- read.csv(‘Ambion_coverage_small.csv’, header=TRUE, sep=’,’)


## 26 Ambion
title = expression(paste(“26”, degree, “C rep 1”))
p23 <- ggplot(coverages, aes(x=position, y=NBA.23)) + geom_line()
p23 <- p23 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p23 <- p23 + ggtitle(title)

title = expression(paste(“26”, degree, “C rep 2”))
p25 <- ggplot(coverages, aes(x=position, y=NBA.25)) + geom_line()
p25 <- p25 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p25 <- p25 + ggtitle(title)

title = expression(paste(“37”, degree, “C rep 1”))
p20 <- ggplot(coverages, aes(x=position, y=NBA.20)) + geom_line()
p20 <- p20 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p20 <- p20 + ggtitle(title)

title = expression(paste(“37”, degree, “C rep 2”))
p21 <- ggplot(coverages, aes(x=position, y=NBA.21)) + geom_line()
p21 <- p21 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p21 <- p21 + ggtitle(title)

title = expression(paste(“37”, degree, “C rep 3”))
p22 <- ggplot(coverages, aes(x=position, y=NBA.22)) + geom_line()
p22 <- p22 + theme(axis.title.x = element_blank(),
axis.title.y = element_blank())
p22 <- p22 + ggtitle(title)


pushViewport(viewport(layout=grid.layout(6,2, heights= unit(c(0.8, 5, 5, 5, 5, 5), ‘null’), widths=unit(c(0.5, 5), ‘null’))))

grid.text(“Genome Coverages of Ambion Libraries”, vp = viewport(layout.pos.row = 1)) # a title for the entire grid

grid.text(“Row read counts”, rot=90, vp = viewport(layout.pos.col = 1)) # a title for the entire grid
print(p23, vp = viewport(layout.pos.row=2, layout.pos.col=2))
print(p25, vp = viewport(layout.pos.row=3, layout.pos.col=2))
print(p20, vp = viewport(layout.pos.row=4, layout.pos.col=2))
print(p21, vp = viewport(layout.pos.row=5, layout.pos.col=2))
print(p22, vp = viewport(layout.pos.row=6, layout.pos.col=2))

The above code uses a grid of 6 rows and 2 columns. The first column and the first row are used for shared y-axis and a title for the entire grid. The rest 5 rows in 1 column are used to plot all the panels. There are also some tricks for hide x- and y-axis labels for every panel, and a way to print the degree sign along with other text as panel titles.

These code generate the following figure.



extract lines in a text file using Awk

In bioinformatics, one commonly needs to extract some data from a big text file to check the data. The following code in Awk helps to do this:

awk ‘0 == (NR-1) % 100’ big_file.txt > small_file.txt

It extracts 1 line every other 100 lines in a text file, including the first line.

Using ShortRead for mapping quality control

The ShortRead package in Bioconductor offers quality control for NGS data from base-calling to mapping. For mapping scores, ShortRead can read several formats, including BAM files generated from samtools. However, it cannot recognize the sorted BAM files.

The R code below plots alignment quality density distribution.

aln <- readAligned(getwd(), 'my.bam', type='BAM')
q <- quality(alignQuality(aln))
densityplot(q[q&bt;1], xlab=’Alignment quality’, plot.points=FALSE, log=’y’)

The plot looks like this:

Alignment quality density distribution