submit batch GRID jobs using BASH

The GRID engine provides a nice way to manage computing resources and is commonly used in bioinformatics labs. The following example shows how to submit a batch of jobs from a BASH script.

#!/bin/bash
# Job template; the SAMPLE_* placeholders are replaced for each input file.
# Single quotes keep "#!" and "#$" literal, and real newlines replace "\n".
template='#!/bin/bash
#$ -N SAMPLE_NAME
#$ -pe smp 8 -l dev -l h_vmem=8G
#$ -o SAMPLE_OUT
#$ -e SAMPLE_ERR
#$ -cwd
#$ -S /bin/bash

date
/user/bin/sortmerna --I SAMPLE_FASTQ -n 1 --db /db/src/SILVA_version_111.fasta --accept SAMPLE_ACCEPT -a 8 --other SAMPLE_REJECT --log SAMPLE_LOG --paired-in
date'

# Read one FASTQ path per line from standard input.
while IFS= read -r line; do
    starts_with=${line:0:1}
    if [ -n "$line" ] && [ "$starts_with" != "#" ]; then # skip blank and commented-out entries
        # Derive the output names from the input file name; outputs go to the current directory.
        output_base=$PWD/$(basename "$line")
        output_out=${output_base/.trimmed.fastq/.out}
        output_err=${output_base/.trimmed.fastq/.err}
        output_accepted_reads=${output_base/.trimmed.fastq/_accepted_reads}
        output_rejected_reads=${output_base/.trimmed.fastq/_rejected_reads}
        output_log=${output_base/.trimmed.fastq/}

        # Job name: SORT_ plus the first six characters of the file name.
        job_name=$(basename "$line")
        job_name=SORT_${job_name:0:6}

        # Fill in the template.
        job=${template/SAMPLE_NAME/"$job_name"}
        job=${job/SAMPLE_OUT/"$output_out"}
        job=${job/SAMPLE_ERR/"$output_err"}
        job=${job/SAMPLE_FASTQ/"$line"}
        job=${job/SAMPLE_ACCEPT/"$output_accepted_reads"}
        job=${job/SAMPLE_REJECT/"$output_rejected_reads"}
        job=${job/SAMPLE_LOG/"$output_log"}

        # output_log carries no ".log" suffix, so append ".qsub" instead of substituting.
        job_file=$output_log.qsub
        printf '%s\n' "$job" > "$job_file"
        qsub "$job_file"
    fi
done

The above shell script generates a job (.qsub) file for each line read from standard input and submits it with qsub. Lines that start with # are skipped. Each job runs sortmerna, a tool for filtering rRNA reads.
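To check what a generated job file would look like before submitting anything, the substitution step can be tried on its own. The following is a minimal sketch, not the full script: the template is shortened, the input path is hypothetical, and the result is printed instead of being passed to qsub.

```bash
#!/bin/bash
# Sketch only: shortened template, hypothetical input path, no qsub call.
template='#!/bin/bash
#$ -N SAMPLE_NAME
#$ -o SAMPLE_OUT
#$ -e SAMPLE_ERR
echo SAMPLE_FASTQ'

line=/data/run1/S00123_R1.trimmed.fastq   # hypothetical path
name=$(basename "$line")                  # file name without the directory
prefix=$PWD/${name/.trimmed.fastq/}       # output prefix in the current directory

# Job name: SORT_ plus the first six characters of the file name.
job_name=SORT_${name:0:6}

# ${var/pattern/replacement} replaces the first match only,
# which is enough because each placeholder appears once.
job=${template/SAMPLE_NAME/"$job_name"}
job=${job/SAMPLE_OUT/"$prefix.out"}
job=${job/SAMPLE_ERR/"$prefix.err"}
job=${job/SAMPLE_FASTQ/"$line"}

printf '%s\n' "$job"
```

Printing with printf and a quoted variable keeps the line breaks intact, so every #$ directive starts at the beginning of its own line.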

One can run the above code using the following syntax:

[me@bioinfo sortmerna]$ batch_sortmerna_submit.bash < trimmed_fastq_list.txt 
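
The list file is plain text with one FASTQ path per line, and an entry can be disabled by putting # in front of it. A minimal sketch of building such a list follows; the directory layout and sample names are assumptions, created here as empty files in a temporary directory so the sketch runs anywhere.

```bash
#!/bin/bash
# Sketch only: the directory and sample names below are hypothetical.
workdir=$(mktemp -d)
mkdir "$workdir/trimmed"
touch "$workdir/trimmed/S00001.trimmed.fastq" \
      "$workdir/trimmed/S00002.trimmed.fastq"

# One path per line, in a stable order.
list=$workdir/trimmed_fastq_list.txt
find "$workdir/trimmed" -name '*.trimmed.fastq' | sort > "$list"

# Prefix the second entry with "#" so the submit script skips it.
sed '2s/^/#/' "$list" > "$list.tmp" && mv "$list.tmp" "$list"

# Count the entries that would actually be submitted.
active=$(grep -vc '^#' "$list")
echo "active entries: $active"
```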

3 thoughts on “submit batch GRID jobs using BASH”

  1. Lijing Bu

    If you submit with PBS and use GNU Parallel, with 8 nodes and 4 cores per node, it looks like this:
    ## Request number of nodes and cores
    #PBS -lnodes=8:ppn=4
    #PBS -lwalltime=70:00:00
    ## Specify the shell to be bash
    #PBS -S /bin/bash

    cat trimmed_fastq_list.txt | parallel -j4 '/user/bin/sortmerna --I {} -n 1 --db /db/src/SILVA_version_111.fasta --accept {}_ACCEPT -a 8 --other {}_REJECT --log {}_LOG --paired-in'

    -j specifies how many jobs to run in parallel.
