{% extends "sharedTemplates/base.html" %} {% block title %}PCA{% endblock %} {% block body %}
Principal component analysis (PCA) is a technique that reduces high-dimensional data to a few summary dimensions (called principal components) that capture the greatest variance in your measurements. We standardize each feature (so they all have mean 0 and unit variance), compute the covariance matrix, find its eigenvectors (each one a direction in feature space, with its eigenvalue giving the variance along that direction), and project the data onto the two eigenvectors with the largest eigenvalues to get PC1 and PC2.
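The steps described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not necessarily the code that produces the plot on this page: the function name pca_top2 and the small counts matrix are hypothetical, and it assumes rows are samples and columns are features.

```python
import numpy as np

def pca_top2(X):
    """Project the rows of X (samples x features) onto PC1 and PC2."""
    X = np.asarray(X, dtype=float)
    # Standardize each feature to mean 0 and unit (sample) variance.
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # assumption: leave constant features at zero instead of dividing by 0
    Z = (X - X.mean(axis=0)) / std
    # Covariance matrix of the standardized features.
    cov = np.cov(Z, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:2]  # indices of the two largest eigenvalues
    scores = Z @ eigvecs[:, top]         # columns are PC1 and PC2
    explained = eigvals[top] / eigvals.sum()
    return scores, explained

# Hypothetical allele counts: 5 samples (rows) by 4 alleles (columns).
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 2],
    [0, 2, 1, 1],
    [2, 1, 0, 0],
    [0, 0, 2, 1],
])
scores, explained = pca_top2(counts)
```

Here scores holds one (PC1, PC2) pair per sample, and explained gives the fraction of total variance captured by each of the two components. The sign of each component is arbitrary, so a plot may appear mirrored between implementations.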
On this page, PCA is performed on an allele basis, meaning that the count of each individual allele across all targets is treated as a separate feature. Because alleles capture finer differences than broader categories such as subfamilies, this allele-level PCA tends to spread samples farther apart in the plot. If you instead grouped by subfamily (combining several alleles into one feature), the points would generally cluster more tightly, showing less separation between samples.
Hover over any point to see its Sample name and Population (if available). Use the dropdown at the top to filter by Region.