Quickstart on a cluster
This document briefly describes how to install and use the PacBio Data Processing software on a cluster. If you need more details, please consult the references.
Starting with a PacBio sequencing file (bam file) and a reference sequence (fasta file), you can generate a dataframe (csv file) whose columns contain properties of each molecule that passed the quality filters.
In addition, a summary report is generated with information about the input and output files of each process.
Open a cluster access account (see Using PacBio Data Processing on a cluster).
Open a terminal and log in to the cluster (see Using PacBio Data Processing on a cluster).
Install Python 3.9 on the cluster (see the Installation document).
Create a virtual environment (see the Installation document).
Install the external dependencies pbindex, blasr and ccs (see the Using PacBio Data Processing on a cluster document).
Install PacBio Data Processing (see the Installation document).
Transfer the input files to the cluster. Assuming you want to process a file called pbsequencing.bam and your reference is stored in a file called reference.fasta (with its companion index reference.fasta.fai), run the following command in a terminal:
scp pbsequencing.bam reference.fasta{,.fai} velazquez@goethe.hhlr-gu.de:/home/fuchs/darmstadt/velazquez
The path will change depending on your account name and the desired destination directory.
Run a job (see Using PacBio Data Processing on a cluster).
Transfer the output files to your personal computer:
scp velazquez@goethe.hhlr-gu.de:/home/fuchs/darmstadt/velazquez/[file to transfer] .
where the trailing . (dot) means the current working directory and can be replaced by any other local path. Alternatively, you can synchronize the remote location with your current working directory:
rsync -av velazquez@goethe.hhlr-gu.de:/home/fuchs/darmstadt/velazquez/ ./
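Before transferring the input files, it can help to confirm that the bam file, the reference and its .fai index are all present locally, since a missing index is easy to overlook. The following is a minimal sketch, not part of PacBio Data Processing: the function names check_inputs and build_scp_command are hypothetical helpers introduced here for illustration, and the remote account and destination directory are whatever applies to your own setup.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not provided by PacBio Data Processing).

# check_inputs BAM FASTA
# Returns 0 only if BAM, FASTA and FASTA.fai all exist as regular files.
check_inputs() {
    local bam=$1 fasta=$2 missing=0
    for f in "$bam" "$fasta" "$fasta.fai"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# build_scp_command BAM FASTA REMOTE DEST
# Prints the scp command that would copy the three input files to
# REMOTE:DEST. It only prints the command; nothing is transferred.
build_scp_command() {
    printf 'scp %s %s %s %s:%s\n' "$1" "$2" "$2.fai" "$3" "$4"
}
```

A possible use is to run check_inputs pbsequencing.bam reference.fasta first and, if it succeeds, inspect the output of build_scp_command pbsequencing.bam reference.fasta followed by your account (user@host) and destination directory before running the printed command yourself. Printing the command instead of executing it keeps the sketch safe to try without cluster access.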