QCatch
QCatch provides functionality for generating quality-control (QC) reports that summarize the output of alevin-fry (He et al., Nature Methods 19, 316–322 (2022)).
Summary
Number of retained cells: The number of valid, high-quality cells that passed the cell-calling step. This includes cells identified during the initial filtering as well as additional cells recovered by the EmptyDrops step, whose expression profiles differ significantly from the ambient background.
Number of all processed cells: The total number of cell barcodes observed in the processed sample. Cells with zero reads have been excluded.
Mean reads per retained cell: The total number of reads divided by the number of retained cells (cells that passed the cell-calling steps).
Median UMI per retained cell: The median number of deduplicated reads (UMIs) per retained cell.
Median genes per retained cell: The median number of detected genes per retained cell.
Total genes detected for retained cells: The total number of unique genes detected across all retained cells.
Mapping rate: Fraction of reads that mapped to the reference, calculated as mapped reads / total processed reads.
Sequencing saturation: Sequencing saturation measures the proportion of reads coming from already-observed UMIs, calculated as 1 - (deduplicated reads / total reads). High saturation suggests limited gain from additional sequencing, while low saturation indicates that further sequencing could reveal more unique molecules (UMIs).
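The Summary metrics are simple arithmetic over sample-wide and per-cell totals. A minimal sketch follows, assuming those totals are already available as plain Python numbers, lists, and sets; the function `summary_metrics` and its argument names are hypothetical and not part of QCatch's API. Saturation follows the formula given above, 1 - (deduplicated reads / total reads).

```python
from statistics import median


def summary_metrics(total_reads, mapped_reads, dedup_reads, retained_umis, retained_genes):
    """Compute the Summary metrics (hypothetical helper, illustration only).

    total_reads:    total number of processed reads in the sample
    mapped_reads:   reads that mapped to the reference
    dedup_reads:    deduplicated reads (UMIs) in the sample
    retained_umis:  per-cell UMI counts, one entry per retained cell
    retained_genes: per-cell sets of detected gene IDs, one set per retained cell
    """
    n_cells = len(retained_umis)
    return {
        "retained_cells": n_cells,
        "mean_reads_per_retained_cell": total_reads / n_cells,
        "median_umi_per_retained_cell": median(retained_umis),
        "median_genes_per_retained_cell": median(len(g) for g in retained_genes),
        # Union of genes detected in at least one retained cell.
        "total_genes_detected": len(set().union(*retained_genes)),
        "mapping_rate": mapped_reads / total_reads,
        # 1 - (deduplicated reads / total reads)
        "sequencing_saturation": 1 - dedup_reads / total_reads,
    }


# Toy inputs, made up for illustration only.
m = summary_metrics(1000, 900, 300, [100, 50, 150], [{"A", "B"}, {"B"}, {"A", "C", "D"}])
print(m["median_genes_per_retained_cell"], m["total_genes_detected"])  # 2 4
```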
🦒 Knee Plots
The left plot shows the number of UMIs against cell rank (cells ordered by UMI count). This knee plot helps identify low-quality cells with too few UMIs.
The right plot shows the number of detected genes against cell rank (ordered by UMI count).
Rank: cells are ranked by number of UMIs.
UMI: unique molecular identifier; here, UMI counts are deduplicated read counts.
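Both knee plots share one x-axis: cells sorted by UMI count in descending order. A minimal sketch of preparing the plotted points, assuming per-cell UMI and detected-gene counts are given as parallel lists; `knee_plot_data` is a hypothetical name. Knee plots are commonly drawn on log-scaled axes, which this sketch does not handle.

```python
def knee_plot_data(umi_counts, gene_counts):
    """Return (rank, umis, genes) per cell, ranked by UMI count (rank 1 = most UMIs).

    umi_counts[i] and gene_counts[i] describe the same cell i.
    """
    order = sorted(range(len(umi_counts)), key=lambda i: umi_counts[i], reverse=True)
    return [(rank, umi_counts[i], gene_counts[i]) for rank, i in enumerate(order, start=1)]


points = knee_plot_data([5, 120, 40], [3, 80, 25])
print(points)  # [(1, 120, 80), (2, 40, 25), (3, 5, 3)]
```

The left plot uses the (rank, umis) pairs and the right plot the (rank, genes) pairs, so both panels place the same cell at the same rank.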
🔢 UMI Counts and Detected Genes Across Cell Barcodes
The barcode frequency is calculated as the number of reads associated with each cell barcode.
The first two plots show cell barcodes ranked by total read count, plotted against two key metrics: the number of UMIs and the number of detected genes per barcode.
The third plot illustrates how the number of detected genes increases with UMI count per cell.
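Unlike the knee plots, these plots rank barcodes by read count (barcode frequency) rather than by UMI count. The sketch below assumes one cell barcode per read and per-barcode UMI and gene counts as dictionaries; `barcode_frequency_table` and its inputs are hypothetical.

```python
from collections import Counter


def barcode_frequency_table(read_barcodes, umis_per_barcode, genes_per_barcode):
    """Rank cell barcodes by barcode frequency (number of reads per barcode).

    read_barcodes:     one cell barcode per read
    umis_per_barcode:  dict of barcode -> UMI count
    genes_per_barcode: dict of barcode -> number of detected genes
    Returns (rank, barcode, reads, umis, genes) rows, highest read count first.
    """
    freq = Counter(read_barcodes)  # reads per barcode
    return [
        (rank, bc, n_reads, umis_per_barcode.get(bc, 0), genes_per_barcode.get(bc, 0))
        for rank, (bc, n_reads) in enumerate(freq.most_common(), start=1)
    ]


rows = barcode_frequency_table(
    ["AAC", "AAC", "GTT", "AAC", "GTT", "CCA"],
    {"AAC": 2, "GTT": 2, "CCA": 1},
    {"AAC": 2, "GTT": 1, "CCA": 1},
)
# Third plot: detected genes versus UMI count, one point per barcode.
genes_vs_umis = [(umis, genes) for _, _, _, umis, genes in rows]
```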
🧽 UMI Deduplication Plot
The scatter plot compares the number of mapped reads and the number of UMIs for each retained cell. Each point represents a cell, with the x-axis showing the mapped read count and the y-axis showing the deduplicated UMI count. The reference line indicates the mean deduplication rate across all cells.
UMI Deduplication: UMI deduplication is the process of identifying and removing duplicate reads that arise from PCR amplification of the same original molecule.
Dedup Rate: The UMI count divided by the number of mapped reads for each cell.
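The dedup rate is a per-cell ratio, and the reference line can be derived from its mean. A minimal sketch, assuming that "mean deduplication rate" means the unweighted mean of the per-cell rates and that the reference line passes through the origin with that mean as its slope; both are assumptions, and `dedup_rates` is a hypothetical name.

```python
from statistics import mean


def dedup_rates(mapped_reads, umis):
    """Per-cell dedup rate = UMIs / mapped reads, plus the mean rate across cells.

    Cells with zero mapped reads are skipped to avoid dividing by zero.
    """
    rates = [u / r for r, u in zip(mapped_reads, umis) if r > 0]
    return rates, mean(rates)


rates, avg = dedup_rates([100, 200, 400], [50, 50, 200])
# rates == [0.5, 0.25, 0.5]; the assumed reference line is y = avg * x.
reference_y = [avg * x for x in (100, 200, 400)]
```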
🧬 Distribution of Detected Gene Count and Mitochondrial Percentage Plot
The left plot depicts the distribution of detected gene counts.
The right plot shows the distribution of mitochondrial gene expression percentages across cells. Note: The “All Cells” plot does not display every processed cell. To improve visualization and reduce clutter from very low-quality cells, we excluded cells with fewer than 20 detected genes—these are typically considered nearly empty. In contrast, the “Retained Cells” plot includes all retained cells, without applying this gene count filter.
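Both distributions come from per-cell values: the number of detected genes and the share of counts from mitochondrial genes. A minimal sketch, assuming counts are given per cell as gene-name dictionaries and that mitochondrial genes are recognized by the "MT-" (human) or "mt-" (mouse) name prefix; the prefix rule and `mito_qc` are assumptions, not necessarily what QCatch uses. The 20-gene cutoff mirrors the "All Cells" filter described above.

```python
def mito_qc(cells, min_genes=20, mito_prefixes=("MT-", "mt-")):
    """Per-cell detected-gene count and mitochondrial percentage.

    cells: dict of barcode -> dict of gene name -> count
    Cells with fewer than `min_genes` detected genes are skipped, mirroring the
    "All Cells" view; pass min_genes=0 for the unfiltered "Retained Cells" view.
    """
    out = {}
    for bc, counts in cells.items():
        detected = sum(1 for c in counts.values() if c > 0)
        if detected < min_genes:
            continue  # nearly empty barcode, excluded from the plot
        total = sum(counts.values())
        mito = sum(c for g, c in counts.items() if g.startswith(mito_prefixes))
        out[bc] = (detected, 100.0 * mito / total if total else 0.0)
    return out


qc = mito_qc({"AAA": {"MT-CO1": 30, "GAPDH": 70}}, min_genes=0)
print(qc)  # {'AAA': (2, 30.0)}
```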
🧩 Bar Plot of S/U/A Counts and (S+A)/(U+S+A) Ratio Plot
When using “USA mode” in alevin-fry, spliced (S), unspliced (U), and ambiguous (A) read counts are generated separately for each gene in each cell.
In the bar plot, we first sum the spliced, unspliced, and ambiguous counts across all genes and all cells. The plot then displays the total number of reads in each splicing category: Spliced (S), Unspliced (U), and Ambiguous (A).
In the histogram, we calculate the splicing ratio for each cell as (S + A) / (S + U + A), where the counts are summed across all genes. The histogram shows the distribution of these per-cell splicing ratios.
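The two aggregations described above differ only in where the sums stop: the bar plot sums over all genes and all cells, while the histogram sums over genes within each cell. A minimal sketch, assuming each cell's counts arrive as a list of (S, U, A) triples, one per gene; this input layout and `sua_summary` are hypothetical and do not reflect alevin-fry's on-disk matrix format.

```python
def sua_summary(cells):
    """Total S/U/A counts and per-cell splicing ratio (S + A) / (S + U + A).

    cells: dict of barcode -> list of (S, U, A) count triples, one per gene
    """
    totals = {"S": 0, "U": 0, "A": 0}
    ratios = {}
    for bc, triples in cells.items():
        s = sum(t[0] for t in triples)
        u = sum(t[1] for t in triples)
        a = sum(t[2] for t in triples)
        totals["S"] += s
        totals["U"] += u
        totals["A"] += a
        if s + u + a > 0:  # skip cells with no counts
            ratios[bc] = (s + a) / (s + u + a)
    return totals, ratios


totals, ratios = sua_summary({"AAA": [(3, 1, 0), (2, 2, 2)], "CCC": [(0, 4, 0)]})
# totals == {"S": 5, "U": 7, "A": 2}; ratios == {"AAA": 0.7, "CCC": 0.0}
```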
🗺️ Clustering: UMAP and t-SNE
These plots are low-dimensional projections of high-dimensional gene expression data. Each point represents a single cell. Cells that appear close together in the plot are inferred to have similar transcriptomic profiles, indicating potential similarity in cell type or state.
Note: Only retained cells are included in these visualizations. All retained cells are shown without further filtering. Standard preprocessing steps were applied using `Scanpy`, including normalization, log transformation, feature selection, and dimensionality reduction.
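The preprocessing steps named above can be illustrated without Scanpy. The sketch below is a toy, stdlib-only stand-in: it scales each cell to a common total, applies log(1 + x), keeps the genes with the highest plain variance, and computes one principal component by power iteration. In Scanpy these steps correspond to functions such as `sc.pp.normalize_total`, `sc.pp.log1p`, `sc.pp.highly_variable_genes`, and `sc.pp.pca`, but which calls and parameters QCatch uses is not stated here; Scanpy's default feature selection is dispersion-based rather than plain variance, and UMAP/t-SNE (run on the reduced space) are not reproduced. The function `preprocess` and its defaults are hypothetical.

```python
import math
import random


def preprocess(matrix, target_sum=1e4, n_top_genes=2, n_iter=100, seed=0):
    """Toy normalize -> log1p -> feature selection -> PCA (one component).

    matrix: list of cells, each a list of raw counts per gene.
    Returns (selected gene indices, first principal-component score per cell).
    """
    # 1. Scale each cell to the same total count, then apply log(1 + x).
    logged = []
    for row in matrix:
        total = sum(row)
        scale = target_sum / total if total else 0.0
        logged.append([math.log1p(x * scale) for x in row])

    # 2. Feature selection: keep the genes with the highest variance across cells.
    n_cells, n_genes = len(logged), len(logged[0])
    means = [sum(r[j] for r in logged) / n_cells for j in range(n_genes)]
    variances = [sum((r[j] - means[j]) ** 2 for r in logged) / n_cells for j in range(n_genes)]
    top = sorted(range(n_genes), key=lambda j: variances[j], reverse=True)[:n_top_genes]

    # 3. Center the selected genes and build their covariance matrix.
    centered = [[r[j] - means[j] for j in top] for r in logged]
    k = len(top)
    cov = [[sum(c[a] * c[b] for c in centered) / n_cells for b in range(k)] for a in range(k)]

    # 4. Power iteration converges to the leading eigenvector (first PC).
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(k)]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]

    # Project each cell onto the component to get its 1-D coordinate.
    scores = [sum(c[i] * v[i] for i in range(k)) for c in centered]
    return top, scores


top, scores = preprocess([[100, 0, 5], [90, 0, 5], [0, 100, 5], [0, 90, 5]])
# The first two cells land on one side of the component, the last two on the other.
```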
📜 Quant Log Information
alt_resolved_cell_numbers: A list of global cell indices where an alternative resolution strategy was applied for large connected components. If this list is empty, no cells used the alternative resolution strategy.
cmd: The command line used for this af_quant process.
dump_eq: Indicates whether equivalence class (EQ class) information was dumped.
empty_resolved_cell_numbers: A list of global cell indices with no gene expression.
num_genes: The total number of genes. When usa_mode is enabled, this count is summed across the three categories unspliced (U), spliced (S), and ambiguous (A), so each gene is counted three times.
num_quantified_cells: The number of cells that were quantified.
resolution_strategy: The resolution strategy used for quantification.
usa_mode: Indicates that data was processed in Unspliced-Spliced-Ambiguous (USA) mode to classify each transcript’s splicing state.
version_str: The tool’s version number.
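Several quant-log fields are easier to read once interpreted. A minimal sketch, assuming the log is available as a JSON string containing the field names listed above; the file it comes from, its exact layout, and the toy values below are assumptions, and `summarize_quant_log` is hypothetical. The divide-by-three step follows the num_genes description above.

```python
import json


def summarize_quant_log(text):
    """Interpret a few quant-log fields (hypothetical helper)."""
    log = json.loads(text)
    usa = bool(log.get("usa_mode", False))
    n = log["num_genes"]
    return {
        "version": log.get("version_str"),
        "usa_mode": usa,
        # In USA mode num_genes counts U, S and A separately.
        "distinct_genes": n // 3 if usa else n,
        "quantified_cells": log["num_quantified_cells"],
        # An empty list means no cell used the alternative resolution strategy.
        "alt_resolution_used": len(log.get("alt_resolved_cell_numbers", [])) > 0,
        "cells_without_expression": len(log.get("empty_resolved_cell_numbers", [])),
    }


# Made-up log content for illustration only.
sample = ('{"version_str": "x.y.z", "usa_mode": true, "num_genes": 300, '
          '"num_quantified_cells": 42, "alt_resolved_cell_numbers": [], '
          '"empty_resolved_cell_numbers": [7, 9]}')
s = summarize_quant_log(sample)
```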
📝 Permit List Log Information
cmd: The command-line input provided by users for generating the permit list.
expected_ori: The expected alignment orientation for the sequencing chemistry being processed.
gpl_options: The actual command line executed for the 'generate permit list' process, including pre-filled settings.
max-ambig-record: The maximum number of reference sequences to which a read can be mapped.
permit-list-type: The type of permit list being used.
velo_mode: A placeholder parameter reserved for future integration with alevin-fry-Forseti; currently always set to false.
version_str: The version number of the tool.