R Tutorial

Find the files in this tutorial on our GitHub!

R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. Some of R’s main features include:

an effective data handling and storage facility,

a suite of operators for calculations on arrays, in particular matrices,

a large, coherent, integrated collection of intermediate tools for data analysis,

graphical facilities for data analysis and display either on-screen or on hardcopy, and

a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.
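Here is a minimal sketch of the array and matrix operators listed above. It uses only base R, and the matrix contents are arbitrary values chosen for illustration.

```r
# Build a 2x3 matrix; base R fills it column by column
m <- matrix(1:6, nrow = 2)

# Element-wise arithmetic applies to every entry
doubled <- m * 2

# %*% is matrix multiplication: (2x3) %*% (3x2) gives a 2x2 result
product <- m %*% t(m)

# apply() runs a function over rows (MARGIN = 1) or columns (MARGIN = 2)
row_sums <- apply(m, 1, sum)

print(m)
print(doubled)
print(product)
print(row_sums)
```

Note that `*` works element by element, while `%*%` performs true matrix multiplication and requires conforming dimensions.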

Documentation for R can be found on its official website. Currently, versions 3.4.1 and 3.5.0 of R are available on the cluster.

Basics

After loading the R module, R can be launched through the command line by simply typing R.

R Virtual Environment

To use R through the command line, you must first create an R virtual environment. Virtual environments are isolated environments for projects: each project can have its own dependencies and packages installed, independently of every other project. To create the virtual environment, run the following commands:


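# load the Anaconda distribution, which provides conda
module load python3/anaconda/5.2.0
# create an environment named r-environment containing R (r-base) and a bundle of common R packages (r-essentials)
conda create -n r-environment r-essentials r-base
# activate the new environment
source activate r-environment
# point R's user library at the environment's package directory
export R_LIBS_USER=/home/$USER/.conda/envs/r-environment/lib/R/library/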

Once the virtual environment has been created, you can activate it at any time. First make sure the python3 module is loaded, using the command

module load python3/anaconda/5.2.0

and then activate the environment with

source activate r-environment

Next, install any packages you need inside this environment; they will only be available within it. Packages are installed from an R session with the install.packages() function. For example, to install the ggplot2 package, an R system for creating graphics, start R inside the activated environment and run

install.packages("ggplot2")
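In a batch job or a fresh session, install.packages() may prompt for, or fail without, a CRAN mirror. The sketch below shows one way to handle this. The helper name ensure_packages is hypothetical, and the mirror URL https://cloud.r-project.org is an assumption that you can replace with any CRAN mirror. The helper installs only the packages that cannot already be loaded, into the first directory returned by .libPaths().

```r
# Hypothetical helper (not part of R): install only packages that are missing.
# The default mirror below is an assumption; any CRAN mirror URL works.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # TRUE for each package whose namespace can already be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (length(missing) > 0) {
    # install.packages() writes to .libPaths()[1] unless lib= is given
    install.packages(missing, repos = repos)
  }
  invisible(missing)
}

# Show the library directories R searches; the first one receives new installs
print(.libPaths())
```

For example, `ensure_packages("ggplot2")` followed by `library(ggplot2)` installs ggplot2 only on the first run and then loads it. Keep in mind that install.packages() only installs a package, while library() is what loads it into the session.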

To exit the R virtual environment, use the command

source deactivate

Running R through a job script

1. Make sure you have created the virtual environment by following the steps described above.

2. Create an R script. This repository provides a simple script, test.r, which demonstrates some of R's basic features.

test.r


print("Starting tests.")

x <- 1:10 # x holds the integers 1 to 10
print(x) # print x

y <- sample(1:100, 10, replace = TRUE) # draw 10 random integers from 1 to 100, with replacement
print(y) # print y

print(mean(x)) # print the mean of x

print(x * y) # print the element-wise product of x and y

a <- c(2, 4, 6, 8, 10, 12) # a is a vector
b <- a[a > 4] # b holds the elements of a that are greater than 4
print(a)
print(b)
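Two details of test.r are worth a closer look. First, sample() returns different numbers on each run unless the random seed is fixed. Second, a[a > 4] returns the matching elements themselves, while which() returns their positions. The following sketch shows both; the seed value 42 is an arbitrary choice.

```r
# Fixing the random seed makes sample() return the same numbers on every run
set.seed(42)
y1 <- sample(1:100, 10, replace = TRUE)
set.seed(42)
y2 <- sample(1:100, 10, replace = TRUE)
print(identical(y1, y2))

a <- c(2, 4, 6, 8, 10, 12)
values  <- a[a > 4]     # the elements greater than 4: 6 8 10 12
indices <- which(a > 4) # their positions in a: 3 4 5 6
print(values)
print(indices)
```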

3. Prepare the submission script: the shell script that you submit to the Slurm scheduler as a job and that, in turn, runs the R script. This repository provides the script job.sh as an example.

job.sh


#!/bin/bash

#SBATCH --job-name=r_test
# write standard output and standard error to separate files; %j expands to the job ID
#SBATCH -o r_out%j.out
#SBATCH -e r_err%j.err
# request one node running a single task
#SBATCH -N 1
#SBATCH --tasks-per-node=1

echo -e '\n submitted R job'
echo 'hostname'
hostname

# loads python module
module load python3/anaconda/5.2.0

# activates R virtual environment
source activate r-environment

# runs the R program; --no-save skips saving the workspace on exit
R < test.r --no-save

# exit the virtual environment
source deactivate

4. Submit the job using: sbatch job.sh

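5. Examine the results. Everything the R script prints is written to r_out<jobid>.out and any errors to r_err<jobid>.err, both in the directory you submitted the job from, where <jobid> is the job ID reported by sbatch.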