Installation
Pre-requisites
Python
To install PacBio Data Processing, a Python interpreter is needed in your system, since the package is written in Python. The recommended version of Python is 3.9. Strictly speaking, the code should work with less recent versions, but some dependencies require Python 3.9 anyway.
If you are using Linux, it is likely that Python is already present in your system. Check with:
$ python --version
or
$ python3 --version
If Python is in your system, the output will look something like this (the exact version may differ):
Python 3.9.6
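The same check can be done from inside Python. The following is a minimal sketch, not part of PacBio Data Processing; the helper name `meets_recommended` is hypothetical, and the threshold is simply the recommended version quoted above.

```python
import sys

# Recommended minimum version, taken from the text above.
RECOMMENDED = (3, 9)


def meets_recommended(version_info=None, minimum=RECOMMENDED):
    """Return True if (major, minor) of `version_info` is at least `minimum`.

    Hypothetical helper: when `version_info` is omitted, the running
    interpreter's version is used.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


print("Running Python", ".".join(str(part) for part in sys.version_info[:3]))
print("Meets recommended version:", meets_recommended())
```

Run it with the interpreter you intend to use for the installation, since `python` and `python3` may point to different versions.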
Installing Python
If you don’t have Python, or you have an old version, you can either use your system’s package manager to install a recent version of Python, or visit the official Python site, where there is a link to Download Python. Then install Python using the downloaded file.
If you download the sources (typically a file ending in .tgz, .tar.xz or similar), the procedure is relatively simple:
Untar the file. For instance:
$ tar xf Python-3.9.7.tgz
Enter the created directory with the sources:
$ cd Python-3.9.7
Open the README.rst file and follow the instructions in its Build Instructions section. They schematically amount to:
$ ./configure
$ make
$ make install
but the README.rst file gives some useful hints.
In case you need/want to learn more about the installation process, you might be interested in reading this Python installation guide.
Other dependencies
PacBio Data Processing delegates some tasks to external tools. The following is a list of external dependencies:
These dependencies must be present in your system in order to use some of the tools provided by PacBio Data Processing. Install any that are absent.
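One way to check whether external tools are available is to look for their executables on the PATH. The sketch below assumes each dependency provides a command-line executable; the helper `missing_tools` is hypothetical and the name in `required` is a placeholder, to be replaced with the executables of the actual dependencies.

```python
import shutil


def missing_tools(names):
    """Return the entries of `names` for which no executable is found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Placeholder only: replace with the executable names of the real dependencies.
required = ["some-external-tool"]

missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All listed tools were found.")
```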
Virtual environment
It is optional but highly recommended to use a virtual environment (or a variant thereof) to install PacBio Data Processing. In this document we will use the standard library’s venv module.
A virtual environment (or venv for short) allows us to have the required set of packages independently of the packages installed system-wide. This has several advantages. First, if something goes wrong, the mess stays isolated inside the venv. Second, it allows us to decide the version of any package we are interested in, irrespective of what other venvs need, or what the system needs.
A venv can be created as follows:
$ python3.9 -m venv PDP-py39
This line will create a folder called PDP-py39 containing the venv. You can choose another name if you like.
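For completeness, the standard library also lets you create the venv from Python code through `venv.create`. This is a sketch of an alternative to the command above, not something PacBio Data Processing requires; the wrapper name `create_venv` is hypothetical.

```python
import venv
from pathlib import Path


def create_venv(path="PDP-py39", with_pip=True):
    """Create a virtual environment at `path` and return it as a Path.

    The environment is based on the interpreter that runs this code,
    so run it with the Python version you want inside the venv.
    """
    target = Path(path)
    venv.create(target, with_pip=with_pip)
    return target


# Example call (not executed here):
#     create_venv("PDP-py39")
```

The command-line form is usually more convenient; the function form can be handy in setup scripts.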
Once the venv has been created, activate it to start using it with:
$ source PDP-py39/bin/activate
From that point on, the management of and access to Python packages happens within the venv. For example, installing a new package will be done inside the venv.
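If you are unsure whether a venv is active, the interpreter itself can tell you. A minimal sketch, assuming a venv created with the standard library's venv module: inside such an environment `sys.prefix` differs from `sys.base_prefix`. The helper name `in_venv` is hypothetical.

```python
import sys


def in_venv():
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


print("Inside a venv:", in_venv())
print("Interpreter prefix:", sys.prefix)
```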
Afterwards you can proceed with the installation of PacBio Data Processing.
For more information on venvs, consult the documentation of the venv module in the standard library, and references therein.
Note
To stop using a venv, type deactivate in the same terminal where the venv was activated.
Installing the stable release of PacBio Data Processing
The latest stable release of PacBio Data Processing can be installed by executing this command in your terminal:
$ pip install pacbio-data-processing
If you don’t have pip installed, this Python installation guide can walk you through the process of installing pip.
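After installing, you may want to confirm that the package is visible to the interpreter in use. The sketch below relies on the standard library's `importlib.metadata`; the distribution name is assumed from the pip command above and may differ from the name recorded in the installed metadata, and the helper `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the distribution name matches the one used in the pip command.
name = "pacbio-data-processing"
version = installed_version(name)
if version is None:
    print(name, "is not installed in this environment")
else:
    print(name, version, "is installed")
```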
Alternative: Installing PacBio Data Processing from a file
It is also possible to install PacBio Data Processing from a file: a tarball or a wheel.
You simply need to have the file and run pip on it. For instance, with a tarball corresponding to version 1.0.0, it would be:
$ pip install PacbioDataProcessing-1.0.0.tar.gz
From a wheel it would be:
$ pip install PacbioDataProcessing-1.0.0-py3-none-any.whl
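The wheel's file name encodes what it is compatible with. As an illustration, the following sketch splits a wheel file name into its components, following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The function `parse_wheel_filename` is hypothetical and does no validation beyond counting fields.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("PacbioDataProcessing-1.0.0-py3-none-any.whl")
print(info)
```

Here the tags `py3`, `none` and `any` indicate a pure-Python wheel for Python 3 that is not tied to a specific ABI or platform.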
Alternative: Installing PacBio Data Processing from the repository
Warning
The instructions in this section are not necessary for end users. If you simply want to use PacBio Data Processing to analyze a BAM file, or to use some of its functionality from within your own code, you can skip this section. If you want access to the source code, keep reading.
The sources for PacBio Data Processing can be downloaded from the GitLab repo.
You can either clone the public repository:
$ git clone https://gitlab.com/dvelazquez/pacbio-data-processing.git
and install it with:
$ pip install ./pacbio-data-processing
Or download the source archive (a zip file in this example):
$ curl -OJL https://gitlab.com/dvelazquez/pacbio-data-processing/-/archive/master/pacbio_data_processing-master.zip
and install it with:
$ pip install pacbio_data_processing-master.zip