Tag Archives: pathway

Search KEGG Pathway Maps by Keyword in R

I was recently asked about how to list KEGG pathways not specific to any organism, which was not addressed in my previous post on accessing KEGG pathway from R.  Here, I post one example that shows you how to search KEGG pathway by keyword from R.

First, let’s define the function getKeggPathwayTable:

getKeggPathwayTable <- function(){
  pathway_link_REST_url <- "http://rest.kegg.jp/list/pathway/"
  kegg_pathways<- data.frame()

  current_row = 1
  for (line in readLines(pathway_link_REST_url)) {
    tmp <- strsplit(line, "\t")[[1]]
    map <- tmp[1]
    map <- strsplit(map, ":")[[1]][2]  
    pathway_name<- tmp[2]
    kegg_pathways[current_row, 1]=map
    kegg_pathways[current_row, 2]=pathway_name
    current_row = current_row + 1
  }
  names(kegg_pathways) <- c("map_id","name")
  kegg_pathways
}

Running the above function will give a data frame that lists all the current KEGG pathway IDs and names.

Next, lets define a function to search the data frame returned by getKeggPathwayTable by keyword.

searchKeggPathwayByKeyword <- function(keyword) {
  kegg_pathways <- getKeggPathwayTable()
  hits <- grep(keyword, kegg_pathways$name, ignore.case=TRUE)
  kegg_pathways[hits,]
}

Note that I use “ignore.case=TRUE” to make sure that the search is not case sensitive. Now let’s try it with the keyword “vitamin” and “sucrose.”

> searchKeggPathwayByKeyword("vitamin")
      map_id                             name
110 map00750            Vitamin B6 metabolism
303 map04977 Vitamin digestion and absorption

> searchKeggPathwayByKeyword("sucrose")
     map_id                          name
60 map00500 Starch and sucrose metabolism

Mapping genes to pathways using biomaRt

One common task in RNA-seq analysis is to map significantly regulated genes into pathways. Pathway information are available from various databases, e.g. reactome, uniprot, etc. The code below shows how to map genes of interest (as reported by cufflinks/cummeRbund) to human pathways in uniprot.


library(cummeRbund)
data <- readCufflinks()
gene_list <- genes(data)
gene_diff_data <- diffData(gene_list)
gene_list <- features(gene_list)
gene_list <- subset(gene_list, select=c('gene_id', 'gene_short_name'))
sig_gene_data <- subset(gene_diff_data, (significant=='yes'))
library(biomaRt)
unimart = useMart('unimart', dataset='uniprot')
pathway.data <- data.frame('gene_name', 'go_name')
for (i in sig_gene_data$gene_id) {
 gene_name = subset(gene_list, gene_id == i, select='gene_short_name')
 tmp <- getBM(attributes=c('gene_name','organism', 'go_name'),
 filters=c('gene_name'),
 values=gene_name,
 mart=unimart)
 tmp <- subset(tmp, tmp$organism=='Homo sapiens',
 select=c('gene_name', 'go_name'))
 tmp <- tmp[grep('^P:', tmp$go_name),]
 colnames(tmp) <- colnames(pathway.data)
 pathway.data <- rbind(pathway.data, tmp)
}
write.table(pathway.data, 'sig_gene_pathway.csv', sep=',', row.names=FALSE)