Author: | Oliver Beckstein, Joshua Adelman |
---|---|
Year: | 2010–2013 |
Copyright: | GNU Public License v3 |
The module contains functions to fit a target structure to a reference structure. They use the fast QCP algorithm to calculate the root mean square distance (RMSD) between two coordinate sets [Theobald2005] and the rotation matrix R that minimizes the RMSD [Liu2010]. (Please cite these references when using this module.).
Typically, one selects a group of atoms (such as the C-alphas), calculates the RMSD and transformation matrix, and applys the transformation to the current frame of a trajectory to obtain the rotated structure. The alignto() and rms_fit_trj() functions can be used to do this for individual frames and trajectories respectively.
The RMS-fitting tutorial shows how to do the individual steps manually and explains the intermediate steps.
See also
The example uses files provided as part of the MDAnalysis test suite (in the variables PSF, DCD, and PDB_small). For all further examples execute first
>>> from MDAnalysis import *
>>> from MDAnalysis.analysis.align import *
>>> from MDAnalysis.tests.datafiles import PSF, DCD, PDB_small
In the simplest case, we can simply calculate the C-alpha RMSD between two structures, using rmsd():
>>> ref = Universe(PDB_small)
>>> mobile = Universe(PSF,DCD)
>>> rmsd(mobile.atoms.CA.coordinates(), ref.atoms.CA.coordinates())
18.858259026820352
Note that in this example translations have not been removed. In order to look at the pure rotation one needs to superimpose the centres of mass (or geometry) first:
>>> ref0 = ref.atoms.CA.coordinates() - ref.atoms.CA.centerOfMass()
>>> mobile0 = mobile.atoms.CA.coordinates() - mobile.atoms.CA.centerOfMass()
>>> rmsd(mobile0, ref0)
6.8093965864717951
The rotation matrix that superimposes mobile on ref while minimizing the CA-RMSD is obtained with the rotation_matrix() function
>>> R, rmsd = rotation_matrix(mobile0, ref0)
>>> print rmsd
6.8093965864717951
>>> print R
[[ 0.14514539 -0.27259113 0.95111876]
[ 0.88652593 0.46267112 -0.00268642]
[-0.43932289 0.84358136 0.30881368]]
Putting all this together one can superimpose all of mobile onto ref:
>>> mobile.atoms.translate(-mobile.atoms.CA.centerOfMass())
>>> mobile.atoms.rotate(R)
>>> mobile.atoms.translate(ref.atoms.CA.centerOfMass())
>>> mobile.atoms.write("mobile_on_ref.pdb")
To fit a single structure with alignto():
>>> ref = Universe(PSF, PDB_small)
>>> mobile = Universe(PSF, DCD) # we use the first frame
>>> alignto(mobile, ref, select="protein and name CA", mass_weighted=True)
This will change all coordinates in mobile so that the protein C-alpha atoms are optimally superimposed (translation and rotation).
To fit a whole trajectory to a reference structure with the rms_fit_trj() function:
>>> ref = Universe(PSF, PDB_small) # reference structure 1AKE
>>> trj = Universe(PSF, DCD) # trajectory of change 1AKE->4AKE
>>> rms_fit_trj(trj, ref, filename='rmsfit.dcd')
It is also possible to align two arbitrary structures by providing a mapping between atoms based on a sequence alignment. This allows fitting of structural homologs or wild type and mutant.
If a alignment was provided as “sequences.aln” one would first produce the appropriate MDAnalysis selections with the fasta2select() function and then feed the resulting dictionary to rms_fit_trj():
>>> seldict = fasta2select('sequences.aln')
>>> rms_fit_trj(trj, ref, filename='rmsfit.dcd', select=seldict)
(See the documentation of the functions for this advanced usage.)
Returns RMSD between two coordinate sets a and b.
a and b are arrays of the coordinates of N atoms of shape N*3 as generated by, e.g., MDAnalysis.core.AtomGroup.AtomGroup.coordinates().
An implicit optimal superposition is performed, which minimizes the RMSD between a and b although both a and b must be centered on the origin before performing the RMSD calculation so that translations are removed.
One can use the center = True keyword, which subtracts the center of geometry (for weights = None) before the superposition. With weights, a weighted average is computed as the center.
The weights can be an array of length N, containing e.g. masses for a weighted RMSD calculation.
The function uses Douglas Theobald’s fast QCP algorithm [Theobald2005] to calculate the RMSD.
>>> u = Universe(PSF,DCD)
>>> bb = u.selectAtoms('backbone')
>>> A = bb.coordinates() # coordinates of first frame
>>> u.trajectory[-1] # forward to last frame
>>> B = bb.coordinates() # coordinates of last frame
>>> rmsd(A,B)
6.8342494129169804
Spatially align mobile to reference by doing a RMSD fit on select atoms.
The superposition is done in the following way:
The mobile and reference atom groups can be constructed so that they already match atom by atom. In this case, select should be set to “all” (or None) so that no further selections are applied to mobile and reference, therefore preserving the exact atom ordering (see Ordered selections).
Warning
The atom order for mobile and reference is only preserved when select is either “all” or None. In any other case, a new selection will be made that will sort the resulting AtomGroup by index and therefore destroy the correspondence between the two groups. It is safest not to mix ordered AtomGroups with selection strings.
Arguments : |
|
---|---|
Returns : | RMSD before and after spatial alignment. |
See also
For RMSD-fitting trajectories it is more efficient to use rms_fit_trj().
Changed in version 0.8: Added check that the two groups describe the same atoms including the new tol_mass keyword.
RMS-fit trajectory to a reference structure using a selection.
Both reference ref and trajectory traj must be MDAnalysis.Universe instances. If they contain a trajectory then it is used. The output file format is determined by the file extension of filename. One can also use the same universe if one wants to fit to the current frame.
Arguments : |
|
---|---|
Returns : | filename (either provided or auto-generated) |
Changed in version 0.8: Added kwargs to be passed to the trajectory Writer and filename is returned.
Returns the 3x3 rotation matrix for RMSD fitting coordinate sets a and b.
The rotation matrix R transforms a to overlap with b (i.e. b is the reference structure):
b = R . a
Arguments : |
|
---|---|
Returns : | (R, rmsd) rmsd and rotation matrix R |
R can be used as an argument for MDAnalysis.core.AtomGroup.AtomGroup.rotate() to generate a rotated selection, e.g.
>>> R = rotation_matrix(A.selectAtoms('backbone').coordinates(), B.selectAtoms('backbone').coordinates())
>>> A.atoms.rotate(R)
>>> A.atoms.write("rotated.pdb")
Note that the function does not shift the centers of mass or geometry; this needs to be done by the user.
See also
rmsd() calculates the RMSD between a and b; for fitting a whole trajectory it is more efficient to use rms_fit_trj(). A complete fit of two structures can be done with alignto().
The following functions are used by the other functions in this module. They are probably of more interest to developers than to normal users.
Return selection strings that will select equivalent residues.
The function aligns two sequences provided in a FASTA file and constructs MDAnalysis selection strings of the common atoms. When these two strings are applied to the two different proteins they will generate AtomGroups of the aligned residues.
fastafilename contains the two un-aligned sequences in FASTA format. The reference is assumed to be the first sequence, the target the second. ClustalW produces a pairwise alignment (which is written to a file with suffix .aln). The output contains atom selection strings that select the same atoms in the two structures.
Unless ref_offset and/or target_offset are specified, the resids in the structure are assumed to correspond to the positions in the un-aligned sequence, namely the first residue has resid == 1.
In more complicated cases (e.g. when the resid numbering in the structure/psf has gaps due to missing parts), simply provide the sequence of resids as they appear in the psf in ref_resids or target_resids, e.g.
target_resids = [a.resid for a in trj.selectAtoms('name CA')]
(This translation table is combined with any value for xxx_offset!)
Arguments : |
|
---|---|
Returns : |
|
Test if the two AtomGroup ag1 and ag2 consist of the same atoms.
Two tests are performed:
Arguments : |
|
---|---|
Keywords : |
|
Raises : | SelectionError if any of the tests fails. |
New in version 0.8.