Scientific Applications

A list of applications installed on the cluster, with a brief description of what each one does.

Bioinformatics

Abawaca (1.07)– abawaca (A Binning Algorithm Without A Cool Acronym) is a binning program that can take advantage of different types of information such as differential coverage and DNA signature

Abyss (2.1.0)– ABySS is a parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

AFNI (17.2.16)– AFNI is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data

Ants (2.2)– Advanced Normalization Tools (ANTs) extracts information from complex datasets that include imaging.

BBMAP (37.93, 38.16)– BBMap is a splice-aware global aligner for DNA and RNA sequencing reads. It can align reads from all major platforms: Illumina, 454, Sanger, Ion Torrent, PacBio, and Nanopore. BBMap has a large array of options, described in its shell script. It can output many different statistics files, such as an empirical read quality histogram, insert-size distribution, and genome coverage, with or without generating a SAM file.

Bedtools (2.27.1)– The bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
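
To make "set theory on the genome" concrete, here is a minimal Python sketch of two interval operations, merge and intersect. It is not bedtools code and does not reproduce its options or file handling; the function names and the sample coordinates are made up for illustration, and intervals are assumed to lie on a single chromosome.

```python
# Illustrative only: this is not bedtools code. Intervals are (start, end)
# pairs on one chromosome, half-open and 0-based as in the BED format.

def merge(intervals):
    """Combine overlapping or touching intervals into single intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or abuts) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Return the regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


genes = [(100, 200), (150, 300), (500, 600)]  # hypothetical features
peaks = [(180, 520), (590, 700)]              # hypothetical features
print(merge(genes))             # [(100, 300), (500, 600)]
print(intersect(genes, peaks))  # [(180, 300), (500, 520), (590, 600)]
```

For real data, the bedtools commands themselves handle multiple chromosomes, strands, and file formats, which this sketch ignores.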

blast (2.6.0)– The Basic Local Alignment Search Tool (BLAST) is a sequence comparison algorithm, optimized for speed, that is used to search sequence databases for optimal local alignments to a query.

blat (35)– BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 20 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more.

bowtie2 (1.2.1.1, 2.3.3)– Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
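
The Burrows-Wheeler index mentioned above is built on the Burrows-Wheeler transform (BWT). The following Python sketch shows the transform and its inverse in the simplest possible form, by sorting all rotations of the text. It is only an illustration under that assumption: Bowtie's actual index construction and search are far more memory-efficient, and the function names here are hypothetical.

```python
# Naive Burrows-Wheeler transform, for illustration only.
# Real aligners build the index without materializing every rotation.

def bwt(text):
    """Return the BWT of text, using '$' as an end-of-string sentinel."""
    text += "$"  # '$' sorts before the letters A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed):
    """Recover the original text from its BWT by repeated sorting."""
    table = [""] * len(transformed)
    for _ in transformed:
        # Prepend the BWT column and re-sort; after len(transformed)
        # rounds the table holds every rotation in sorted order.
        table = sorted(ch + row for ch, row in zip(transformed, table))
    original = next(row for row in table if row.endswith("$"))
    return original[:-1]  # drop the sentinel


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("GATTACA")))  # GATTACA
```

The transform tends to group identical characters together, which is what makes a compressed, searchable index of a whole genome practical.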

bwa (3.16)– BWA is a program for aligning sequencing reads against a large reference genome (e.g. human genome). It has two major components, one for reads shorter than 150 bp and the other for longer reads.

checkm (1.0.9)– Assess the quality of microbial genomes recovered from isolates, single cells, and metagenomes

concoct (0.4.0)– A program for unsupervised binning of metagenomic contigs by using nucleotide composition, coverage data in multiple samples and linkage data from paired end reads.

cufflinks (2.2.1)– Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.

DAS_Tool (1.1)– DAS Tool is an automated method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly.

epacts (3.2.6)– EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical tests for identifying genome-wide association from sequence data through a user-friendly interface, both to scientific analysts and to method developers.

FastQC (0.11.5)– FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines

freesurfer (2017)– FreeSurfer is a set of tools for analysis and visualization of structural and functional brain imaging data

fsl (5.0.10)– FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data.

GATK (3.80, 4.0.5.1)– The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data

kallisto (0.44.0)– kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.

lammps (mar 2017)– LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

maxbin (2.2.4)– MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm.

megahit (1.1.3)– An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

metabat (2.12.1)– A robust statistical framework for reconstructing genomes from metagenomic data.

miRA (1.2.0)– MIRA is a whole genome shotgun and EST sequence assembler

miRExpress (2.1.4)– A database-supported, efficient and flexible tool for detecting miRNA expression profiles.

MiRge (2018)– A fast, smart small RNA-seq solution to process samples in a highly multiplexed fashion. miRge employs a Bayesian alignment approach, whereby reads are sequentially aligned against customized mature miRNA, hairpin miRNA, noncoding RNA and mRNA sequence libraries.

oases (0.2.09)– Oases is a de novo transcriptome assembler designed to produce transcripts from short read sequencing technologies, such as Illumina, SOLiD, or 454 in the absence of any genomic assembly.

picard (2017)– A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats

Pplacer (1.1)– Pplacer places query sequences on a fixed reference phylogenetic tree to maximize phylogenetic likelihood or posterior probability according to a reference alignment.

prodigal (2.6.3)– Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program

rsem (1.3.0)– Accurate quantification of gene and isoform expression from RNA-Seq data

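samtools (1.5)– SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. SAMtools provides utilities for manipulating alignments in this format, including sorting, merging, and indexing.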

snpEff (12_2017)– Genetic variant annotation and effect prediction toolbox

SNPiR (12_2017)– Identifies single nucleotide polymorphisms (SNPs) in RNA-seq data. SNPiR consists of (1) a modified RNA-seq read-mapping procedure that allows alignment of reads to the reference in a splice-aware manner, (2) variant calling using the Genome Analysis Toolkit (GATK) and (3) vigorous filtering of false-positive calls.

SOAPdenovo (r240)– SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for the human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost effective way.

spades (3.11, 3.12)– SPAdes – St. Petersburg genome assembler – is intended for both standard isolates and single-cell MDA bacteria assemblies.

star (2.5)– RNAseq aligner

stringtie (1.3.3)– StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus.

subread (1.5.3)– The Subread software package is a toolkit for processing next-gen sequencing data. It includes the Subread aligner, the Subjunc exon-exon junction detector, and the featureCounts read summarization program.

tophat (2.1.1)– TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

transabyss (2.0.1)– de novo assembly of RNA-Seq data using ABySS

trimmomatic (0.36)– a flexible read trimming tool for Illumina NGS data.

Trinity (2.6.6)– Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
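
Several assemblers in this list (Trinity, megahit, ABySS) work with de Bruijn graphs. As a rough illustration of the data structure only, the Python sketch below builds one from a set of reads: each k-mer becomes an edge between its (k-1)-base prefix and suffix. The function name, the toy reads, and the choice of k are assumptions for the example; real assemblers add error correction, compaction, and graph traversal on top of this.

```python
from collections import defaultdict

# Illustrative de Bruijn graph construction; not taken from any assembler.

def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # The k-mer is an edge from its prefix to its suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


reads = ["ACGT", "CGTA"]  # toy overlapping reads
graph = de_bruijn(reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
# AC -> ['CG']
# CG -> ['GT']
# GT -> ['TA']
```

Walking the edges from AC spells out ACGTA, the sequence that both reads were drawn from.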

vcf2maf (2017)– Convert a VCF into a MAF, where each variant is annotated to only one of all possible gene isoforms

Chemistry/Material Sciences

Abaqus (V6R2017)– Abaqus is a finite element analysis software used for engineering simulations.

Amber (16)– “Amber” refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programs which includes source code and demos.

DFTB+ (18.2)– DFTB+ is a fast and efficient versatile quantum mechanical simulation software package. Using DFTB+ you can carry out quantum mechanical simulations similar to density functional theory but in an approximate way, typically gaining around two orders of magnitude in speed.

firefly (8.2.0)– Firefly is a freely available ab initio and DFT computational chemistry program developed to offer high performance on Intel-compatible x86, AMD64, and EM64T processors.

gamess (apr2017)– The General Atomic and Molecular Electronic Structure System (GAMESS) is a general ab initio quantum chemistry package.

gwyddion (2.5.0)– Gwyddion is a modular program for SPM (scanning probe microscopy) data visualization and analysis. Primarily it is intended for the analysis of height fields obtained by scanning probe microscopy techniques (AFM, MFM, STM, SNOM/NSOM) and it supports a lot of SPM data formats. However, it can be used for general height field and (greyscale) image processing, for instance for the analysis of profilometry data or thickness maps from imaging spectrophotometry.

lammps (mar 2017)– LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale.

nwchem (6.8)– NWChem aims to provide its users with computational chemistry tools that are scalable both in their ability to treat large scientific computational chemistry problems efficiently, and in their use of available parallel computing resources from high-performance parallel supercomputers to conventional workstation clusters.

qe (6.2.0)– Quantum ESPRESSO is a software suite for ab initio quantum chemistry methods of electronic-structure calculation and materials modeling, distributed for free under the GNU General Public License. It is based on density functional theory, plane wave basis sets, and pseudopotentials.

SAPT (2016.1)– SAPT is a collection of computer codes designed to implement the many-body (body = electron) version of Symmetry-Adapted Perturbation Theory for intermolecular interactions.

vasp (5.4.4, Intel build)– The Vienna Ab initio Simulation Package, better known as VASP, is a package for performing ab initio quantum mechanical molecular dynamics using either Vanderbilt pseudopotentials, or the projector augmented wave method, and a plane wave basis set.

Mathematics

ACML (5.3.1)– ACML provides a free set of thoroughly optimized and threaded math routines for HPC, scientific, engineering and related compute-intensive applications. ACML is ideal for weather modeling, computational fluid dynamics, financial analysis, oil and gas applications and more.

blacs (1.1)– The BLACS (Basic Linear Algebra Communication Subprograms) project is an ongoing investigation whose purpose is to create a linear algebra oriented message passing interface that may be implemented efficiently and uniformly across a large range of distributed memory platforms.

blas (3.6.0)– The BLAS (Basic Linear Algebra Subprograms) are routines that provide standard building blocks for performing basic vector and matrix operations. The Level 1 BLAS perform scalar, vector and vector-vector operations, the Level 2 BLAS perform matrix-vector operations, and the Level 3 BLAS perform matrix-matrix operations. Because the BLAS are efficient, portable, and widely available, they are commonly used in the development of high quality linear algebra software.
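
The three BLAS levels can be illustrated with one simplified operation from each. The Python sketch below borrows the routine names axpy, gemv, and gemm from BLAS, but the signatures are simplified assumptions for this example (the real routines also take scaling factors, strides, and transpose flags), and pure Python is of course nowhere near as fast as an optimized BLAS.

```python
# Simplified stand-ins for one routine per BLAS level; illustration only.

def axpy(alpha, x, y):
    """Level 1 (vector-vector): return alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def gemv(a, x):
    """Level 2 (matrix-vector): return the product A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gemm(a, b):
    """Level 3 (matrix-matrix): return the product A B."""
    columns = list(zip(*b))  # columns of B
    return [[sum(p * q for p, q in zip(row, col)) for col in columns]
            for row in a]


a = [[1, 2], [3, 4]]
print(axpy(2, [1, 2], [3, 4]))   # [5, 8]
print(gemv(a, [1, 1]))           # [3, 7]
print(gemm(a, [[0, 1], [1, 0]]))  # [[2, 1], [4, 3]]
```

The levels differ in how much arithmetic they do per element of data touched, which is why Level 3 routines benefit most from optimized implementations.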

Cmdstan (2.17)– CmdStan is the command line interface to Stan, a state-of-the-art platform for statistical modeling and high-performance statistical computation.

Cmgui (7.3)– Cmgui is an advanced 3D visualisation software package with modelling capabilities. Cmgui is part of CMISS, a mathematical modelling environment initially developed by the University of Auckland Bioengineering Institute. CMISS stands for Continuum Mechanics, Image analysis, Signal processing and System Identification.

fftw2 (2.1.5), fftw3 (3.3.4)– FFTW is a C subroutine library for computing the discrete Fourier transform (DFT) in one or more dimensions, of arbitrary input size, and of both real and complex data (as well as of even/odd data, i.e. the discrete cosine/sine transforms or DCT/DST).
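
For readers unfamiliar with the discrete Fourier transform, the Python sketch below evaluates its definition directly for a one-dimensional input. This is a deliberately naive O(n^2) illustration, not how FFTW works internally (FFTW uses fast O(n log n) algorithms); the function name and test signal are made up for the example.

```python
import cmath

# Direct evaluation of the DFT definition; illustration only.

def dft(samples):
    """Return X[k] = sum over t of x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
            for t in range(n))
        for k in range(n)
    ]


signal = [0, 1, 0, -1]  # one cycle of a sine wave sampled four times
print([round(abs(value), 6) for value in dft(signal)])
# [0.0, 2.0, 0.0, 2.0]
```

The two non-zero magnitudes correspond to the positive and negative frequency of the single sine cycle in the input.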

gsl (2.4)– The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite.

JAGS (4.3.0)– JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation. JAGS was written with three aims in mind: To have a cross-platform engine for the BUGS language, to be extensible, allowing users to write their own functions, distributions and samplers, and to be a platform for experimentation with ideas in Bayesian modelling
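
As background on what a Gibbs sampler does, here is a minimal Python sketch that samples a standard bivariate normal distribution by alternately drawing each variable from its conditional distribution given the other. It is a textbook toy example, not JAGS code or the BUGS language; the function names, the correlation value, and the seed are assumptions chosen for the illustration.

```python
import random

# Toy Gibbs sampler for a standard bivariate normal; illustration only.

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Draw (x, y) pairs with correlation rho by Gibbs sampling."""
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5  # conditional standard deviation
    x = y = 0.0
    samples = []
    for _ in range(n_samples):
        # Each full conditional is normal: x | y ~ N(rho * y, 1 - rho^2).
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples


def correlation(samples):
    """Sample correlation coefficient of the (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
    var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
    return cov / (var_x * var_y) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
print(round(correlation(draws), 2))  # should land close to 0.8
```

In practice a model is described to JAGS in the BUGS language and the program chooses suitable samplers itself; the sketch only shows the underlying idea of sampling one variable at a time.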

lapack (3.6.0)– LAPACK provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems.

matlab (R2017b)– MATLAB® is a high-level language and interactive environment for numerical computation, visualization, and programming. Using MATLAB, you can analyze data, develop algorithms, and create models and applications.

Openblas (0.2.18)– OpenBLAS is an optimized implementation of BLAS, a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication.

R (3.4.1, 3.5.0)– R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.

scalapack (2.0.2)– ScaLAPACK is a library of high-performance linear algebra routines for parallel distributed memory machines. ScaLAPACK solves dense and banded linear systems, least squares problems, eigenvalue problems, and singular value problems.

stan (2.17)– Stan® is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

stata (14)– Stata is a complete, integrated software package that provides all of your data science needs—data manipulation, visualization, statistics, and reproducible reporting.

Developers

Bamtools (2.4.1)– BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.

BCFtools (1.5)– BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF.

Bonnie++ (1.97.1)– Bonnie++ is a free file system benchmarking tool for Unix-like operating systems that is aimed at performing a number of simple tests of hard drive and file system performance.

caffe (1.0)– Caffe is a deep learning framework

Cmake (3.3.1)– CMake is an open-source, cross-platform family of tools designed to build, test and package software. CMake is used to control the software compilation process using simple platform and compiler independent configuration files, and generate native makefiles and workspaces that can be used in the compiler environment of your choice.

cuda (7.5, 8.0, 9.2)– CUDA (aka Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA and implemented by the graphics processing units (GPUs) that they produce

dos2unix (7.4)– dos2unix and unix2dos convert line breaks in a text file between DOS and Unix formats. When invoked as unix2dos, the program converts a Unix text file to DOS format; when invoked as dos2unix, it converts a DOS text file to Unix format.
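
The conversion itself is simple: DOS text files end each line with a carriage return plus line feed (CR LF), while Unix files use a line feed alone. The Python sketch below shows the idea on raw bytes. It is not the dos2unix implementation and skips everything the real tool handles (encodings, binary-file detection, in-place rewriting); the function names are hypothetical.

```python
# Line-ending conversion on raw bytes; illustration only.

def dos_to_unix(data: bytes) -> bytes:
    """Replace CR LF line endings with LF."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """Replace LF line endings with CR LF."""
    # Normalize first so existing CR LF pairs are not turned into CR CR LF.
    return dos_to_unix(data).replace(b"\n", b"\r\n")


print(dos_to_unix(b"first\r\nsecond\r\n"))  # b'first\nsecond\n'
print(unix_to_dos(b"first\nsecond\n"))      # b'first\r\nsecond\r\n'
```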

gcc (4.9.1)– The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Java, Ada, and Go, as well as libraries for these languages (libstdc++, libgcj,…).

Gdb (7.11)– GDB, the GNU Project debugger, allows you to see what is going on `inside' another program while it executes -- or what another program was doing at the moment it crashed. GDB can do four main kinds of things (plus other things in support of these) to help you catch bugs in the act: Start your program, specifying anything that might affect its behavior, make your program stop on specified conditions, examine what has happened when your program has stopped, and change things in your program, so you can experiment with correcting the effects of one bug and go on to learn about another.

globalarrays (5.4)– Global Arrays (GA) is a Partitioned Global Address Space (PGAS) programming model. It provides primitives for one-sided communication (Get, Put, Accumulate) and Atomic Operations (read increment). It supports blocking and non-blocking primitives, and supports location consistency.

hdf5 (1.6.10, 1.8.17)– HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high volume and complex data.

hwloc (1.11.3)– The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs.

Intel Compiler– The Intel® Composer XE suites are available in several configurations that combine industry leading C, C++ and Fortran compilers, programming models including Intel® Cilk™ Plus and OpenMP*, performance libraries including Intel® Math Kernel Library (Intel® MKL), Intel® Integrated Performance Primitives (Intel® IPP) and Intel® Threading Building Blocks (Intel® TBB) for leadership application performance on systems using Intel® Core™ and Xeon® processors, Intel® Xeon Phi™ coprocessors and compatible processors.

Iozone3 (434)– Iozone is a filesystem benchmark tool. The benchmark generates and measures a variety of file operations. Iozone has been ported to many machines and runs under many operating systems. Iozone is useful for performing a broad filesystem analysis of a vendor’s computer platform.

JAVA (1.8.0_151, 1.8.0_162)– For Java Developers. Includes a complete JRE plus tools for developing, debugging, and monitoring Java applications.

mpich (3.2rc2)– MPICH2 is an implementation of the Message-Passing Interface (MPI). The goals of MPICH2 are to provide an MPI implementation for important platforms, including clusters, SMPs, and massively parallel processors. It also provides a vehicle for MPI implementation research and for developing new and better parallel programming environments.

mvapich2 (2.2rc1)– MVAPICH2 (MPI-3 over InfiniBand) is an MPI-3 implementation based on MPICH ADI3 layer.

netcdf (4.4.0)– NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

Netperf (2.7.0)– Netperf is a benchmark that can be used to measure the performance of many different types of networking. It provides tests for both unidirectional throughput, and end-to-end latency.

open64 (4.5.2.1)– Open64 has been well-recognized as an industrial-strength production compiler. It is the final result of research contributions from a number of compiler groups around the world. Formerly known as Pro64, Open64 was initially created by SGI from SGI’s MIPSPro compiler, and licensed under the GNU Public License (GPL v2).

opencv (3.30)– OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library

openmpi (1.10.1)– The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

Pgi (17.3)– PGI® Workstation™ is PGI’s single-user suite of scientific and engineering compilers and tools.

pycharm (2017.2.3)– PyCharm is an integrated development environment used in computer programming, specifically for the Python language.

python (2.7, 3)– Python is a remarkably powerful dynamic programming language that is used in a wide variety of application domains. Python is often compared to Tcl, Perl, Ruby, Scheme or Java.

sbt (1.0.1)– sbt is a build tool for Scala, Java, and more.

sge (2011.11p1)– The Sun Grid Engine queuing system is useful when you have a lot of tasks to execute and want to distribute the tasks over a cluster of machines.

Singularity (2.6.0)– Singularity is a container platform focused on supporting "Mobility of Compute". Mobility of Compute encapsulates the development to compute model where developers can work in an environment of their choosing and creation, and when the developer needs additional compute resources, this environment can easily be copied and executed on other platforms. Additionally, because the primary use case for Singularity is computational portability, many of the barriers to entry of other container solutions do not apply to Singularity, making it an ideal solution for users (both computational and non-computational) and HPC centers.

slurm (16.05.8)– The Slurm Workload Manager, or Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.

tensorflow (1.3)– TensorFlow is an open-source machine learning library for research and production.

Torque (6.0.2)– The Terascale Open-source Resource and QUEue Manager (TORQUE) is a distributed resource manager providing control over batch jobs and distributed compute nodes

TSN (2017)– Temporal Segment Networks (TSN) is a novel framework for video-based action recognition, based on the idea of long-range temporal structure modeling. It combines a sparse temporal sampling strategy and video-level supervision to enable efficient and effective learning using the whole action video.

General Sciences

cloudy (c17)– Cloudy is a spectral synthesis code designed to simulate conditions in interstellar matter under a broad range of conditions.

gflow (1.7)– GFLOW is a highly efficient stepwise groundwater flow modeling system developed by Haitjema Software. It models steady state flow in a single heterogeneous aquifer using the Dupuit-Forchheimer assumption.

SNANA (10_60d)– SNANA contains a light curve fitter and simulation that can be applied to any supernova (SN) model and to any data set.

wrf (3.9.1)– WRF is a state-of-the-art atmospheric modeling system designed for both meteorological research and numerical weather prediction. It offers a host of options for atmospheric processes and can run on a variety of computing platforms.