SWAPHI-LS is the first parallel Smith-Waterman algorithm exploiting Intel Xeon Phi clusters to accelerate the alignment of long DNA sequences. The algorithm is written in C++ (with a set of SIMD intrinsic functions), OpenMP and MPI. In our performance evaluation, it showed very stable performance, reaching up to 30.1 GCUPS on a single Xeon Phi and up to 111.4 GCUPS on four Xeon Phis sharing a host (compiled with Intel C++ compiler version 14.0.1 and OpenMPI version 1.6.5). In addition, we developed its sister program SWAPHI for very large-scale protein sequence database search, with support for multiple shared-host Xeon Phis.
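GCUPS (giga cell updates per second) is the number of dynamic-programming cells computed per second, divided by 10^9; for a pairwise alignment the cell count is the product of the two sequence lengths. The helper below is a minimal sketch of that arithmetic. The function name `gcups` and its parameters are hypothetical and are not part of SWAPHI-LS.

```cpp
#include <cstdint>

// Giga cell updates per second for one pairwise alignment.
// Cells = query_len * subject_len; the result is cells / seconds / 1e9.
// Hypothetical helper for illustration only, not part of SWAPHI-LS.
double gcups(std::uint64_t query_len, std::uint64_t subject_len, double seconds) {
    const double cells =
        static_cast<double>(query_len) * static_cast<double>(subject_len);
    return cells / seconds / 1e9;
}
```

For example, aligning two sequences of 1,000,000 bases each in 100 seconds corresponds to `gcups(1000000, 1000000, 100.0)`, i.e. 10 GCUPS.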
- Latest source code (v1.0.12)
More details about the changes in this version are available in the changelog.
- Genome sequences
The six genome sequences used in our paper "SWAPHI-LS: Smith-Waterman algorithm on Xeon Phi coprocessors for long DNA sequences". Download and unzip the sequences before use. A description of each genome sequence is available at this link.
- Yongchao Liu, Tuan-Tu Tran, Felix Lauenroth, Bertil Schmidt: "SWAPHI-LS: Smith-Waterman algorithm on Xeon Phi coprocessors for long DNA sequences". 2014 IEEE International Conference on Cluster Computing, 2014, pp. 257-265
- Yongchao Liu and Bertil Schmidt: "Pairwise DNA sequence alignment optimization". High Performance Parallelism Pearls Volume Two - Multicore and Many-core Programming Approaches, edited by James Reinders and Jim Jeffers, 2015, pp. 43-54
Other related papers
- Yongchao Liu, Bertil Schmidt, and Douglas L. Maskell: "MSA-CUDA: multiple sequence alignment on graphics processing units with CUDA". 20th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2009), 2009, pp. 121-128
- Yongchao Liu, Douglas L. Maskell, Bertil Schmidt: "CUDASW++: optimizing Smith-Waterman sequence database searches for CUDA-enabled graphics processing units". BMC Research Notes, 2009, 2:73
- Yongchao Liu, Bertil Schmidt, Douglas L. Maskell: "CUDASW++2.0: enhanced Smith-Waterman protein database search on CUDA-enabled GPUs based on SIMT and virtualized SIMD abstractions". BMC Research Notes, 2010, 3:93
- Yongchao Liu, Adrianto Wirawan, Bertil Schmidt: "CUDASW++ 3.0: accelerating Smith-Waterman protein database search by coupling CPU and GPU SIMD instructions". BMC Bioinformatics, 2013, 14:117.
- Yongchao Liu and Bertil Schmidt: "SWAPHI: Smith-Waterman protein database search on Xeon Phi coprocessors". 25th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2014), 2014, pp. 184-185.
- Yongchao Liu and Bertil Schmidt: "GSWABE: faster GPU-accelerated sequence alignment with optimal alignment retrieval for short DNA sequences". Concurrency and Computation: Practice and Experience, 2015, 27: 958-972
- Tuan Tu Tran, Yongchao Liu, Bertil Schmidt: "Bit-parallel approximate pattern matching: Kepler GPU versus Xeon Phi". Parallel Computing, 2016, 54: 128-138
Two executable binaries are generated after compiling and linking: swaphi-ls and mpi-swaphi-ls. The program swaphi-ls does not rely on an MPI library and targets a single Xeon Phi. The program mpi-swaphi-ls must be compiled with an MPI library and is designed for Xeon Phi clusters.
- -i <str> (query DNA sequence file [REQUIRED])
- -j <str> (subject DNA sequence file [REQUIRED])
- -k <int> (place the longer sequence horizontally, default = 1)
- -m <int> (match score, default = 1)
- -M <int> (mismatch penalty, default = 3)
- -g <int> (gap opening penalty, default = 5)
- -e <int> (gap extension penalty, default = 2)
- -a <int> (the vertical tile size within each device, default = 16 [NO need to change])
- -b <int> (the horizontal tile size within each device, default = 0 [0 means auto])
- -c <int> (enable tiling, default = 1)
Only applicable to the non-MPI-based version.
- -C <int> (the vertical block size between devices, default = 131072)
Only applicable to the MPI-based version.
- -t <int> (number of threads per Xeon Phi, default = 0 [0 means auto])
- -x <int> (index of the Xeon Phi used, default = 0)
Only applicable to the non-MPI-based version.
- -n <int> (simulate #int characters, default = 0 [SPEED TEST])
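To show how the scoring options (-m, -M, -g, -e) enter the Smith-Waterman recurrence, here is a minimal scalar sketch that computes only the optimal local alignment score with affine gaps in linear memory. It is not the SIMD, tiled implementation used by SWAPHI-LS. The names `Scoring` and `sw_score` are hypothetical, and the sketch assumes that a gap of length k costs gap_open + k * gap_extend, which may differ from the convention the program uses.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Scoring parameters; the defaults mirror the option list (-m, -M, -g, -e).
struct Scoring {
    int match = 1;       // -m: added for a match
    int mismatch = 3;    // -M: subtracted for a mismatch
    int gap_open = 5;    // -g
    int gap_extend = 2;  // -e
};

// Optimal local alignment score (Smith-Waterman with affine gaps).
// ASSUMPTION: a gap of length k costs gap_open + k * gap_extend.
int sw_score(const std::string& query, const std::string& subject,
             const Scoring& sc = Scoring()) {
    const int kNegInf = -1000000000;     // "minus infinity" with overflow headroom
    const std::size_t n = subject.size();
    std::vector<int> H(n + 1, 0);        // H[j]: cell scores of the previous row
    std::vector<int> F(n + 1, kNegInf);  // F[j]: best score ending in a vertical gap
    const int first_gap = sc.gap_open + sc.gap_extend;  // cost of a length-1 gap
    int best = 0;

    for (std::size_t i = 1; i <= query.size(); ++i) {
        int diag = H[0];  // holds H[i-1][j-1]
        int E = kNegInf;  // best score ending in a horizontal gap in this row
        for (std::size_t j = 1; j <= n; ++j) {
            // H[j] still holds row i-1 here; H[j-1] already holds row i.
            F[j] = std::max(F[j] - sc.gap_extend, H[j] - first_gap);
            E = std::max(E - sc.gap_extend, H[j - 1] - first_gap);
            int h = diag + (query[i - 1] == subject[j - 1] ? sc.match : -sc.mismatch);
            h = std::max({0, h, E, F[j]});  // local alignment: never below zero
            diag = H[j];                    // save H[i-1][j] for the next column
            H[j] = h;
            best = std::max(best, h);
        }
    }
    return best;
}
```

With the default parameters, `sw_score("ACGT", "ACGT")` gives 4 (four matches) and `sw_score("AAAA", "TTTT")` gives 0 (no positive-scoring local alignment).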
- Intel C/C++ compiler or any other C/C++ compiler that supports Xeon Phi coprocessors.
- A C/C++ MPI library (e.g. OpenMPI, MPICH, Intel MPI) that is compiled with the aforementioned C/C++ compiler.
Before compiling, please modify the corresponding Makefile to point to the correct compilers and libraries.
- To compile the non-MPI-based version, run "make -f Makefile.phi".
- To compile the MPI-based version, run "make -f Makefile.mphi".
- To compile both versions, run "make".
Non-MPI-based program swaphi-ls
- export KMP_AFFINITY=balanced; swaphi-ls -i seq1.fa -j seq2.fa
- export KMP_AFFINITY=balanced; swaphi-ls -i seq1.fa.gz -j seq2.fa.gz -x 2
MPI-based program mpi-swaphi-ls
- export KMP_AFFINITY=balanced; mpirun -np 4 mpi-swaphi-ls -i seq1.fa -j seq2.fa
- export KMP_AFFINITY=balanced; mpirun -hostfile host.file -np 4 mpi-swaphi-ls -i seq1.fa.gz -j seq2.fa.gz
Configure hostfile for MPI-based program
When running on a Xeon Phi cluster, make sure that the number of MPI processes running on a node does not exceed the number of Xeon Phis available on that node. This constraint can be enforced with a host file.
- An example of MPICH host file is as follows, where each node contains two Xeon Phis:
- An example of OpenMPI host file is as follows, where each node contains two Xeon Phis:
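As a minimal sketch of both formats, assume a cluster of two nodes with the hypothetical hostnames node01 and node02, each hosting two Xeon Phis. In the MPICH (Hydra) form, each line gives a hostname followed by the number of processes allowed on it:

```
node01:2
node02:2
```

In the Open MPI form, each line gives a hostname followed by its slot count; `max_slots` caps oversubscription:

```
node01 slots=2 max_slots=2
node02 slots=2 max_slots=2
```

Replace the hostnames and counts with those of the actual cluster, so that each node receives at most as many MPI processes as it has Xeon Phis.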
- May 19, 2015 (v1.0.12)
- The dependence on ZLIB is now removed by default. Instead, users can choose whether to support gzipped input by enabling the macro "COMPRESSED_INPUT" in the Makefile.
- We have added a LICENSE file to the source code tarball.
- There is no change in the code.
If you have any questions or suggestions for improvement, please feel free to contact Liu, Yongchao.