This vignette demonstrates a data analysis application using the
DPComb package. We consider a gene-based genetic
association study, where the primary goal is to assess whether each
gene—represented by a group of genetic markers—is associated with
disease status.
In this example, we use the built-in dataset
case_control, which consists of 2000 subjects (1000 cases
and 1000 controls) measured on 15 genetic markers. The markers are
divided into two groups: markers 1–5 (gene1) and markers 6–15
(gene2).
The analysis proceeds as follows:
Data Loading and Grouping:
Load the built-in dataset and group markers by gene.
Combination Testing:
For each gene, perform a combination test using one of the following
methods (detailed formulas can be found in the documentation for
DPComb_tests):
Individual Marker Analysis:
Each marker’s association with disease status is initially assessed
using Fisher’s exact test (via the fisher.test function).
This produces discrete p-values, which can be two-sided, right-sided, or
left-sided.
Null Distribution:
Under the null hypothesis that none of the markers in the gene group are
associated, mutations are assumed to be randomly distributed between
cases and controls. Consequently, for each marker, the distribution of
mutation count in the case group follows the hypergeometric
distribution, with parameters determined by the number of cases, the
number of controls, and the total mutation count of the marker.
Therefore, we input a list of such distribution descriptions. Here, the
input p-values are assumed to be independent (i.e., markers are
independent).
P-Value Aggregation:
The discrete p-values are then combined using one of the methods listed
above. The output testing p-value indicates the evidence of association
between the gene (i.e., the corresponding group of markers) and disease
status.
An equivalent and often more convenient input for the
DPComb_tests function is:
xs), andfisher.test.Either data input — using discrete p-values or the mutation counts — yields equivalent combination statistics and testing p-values.
library(DPComb)
# Load built-in data
#data(case_control)
data("case_control", package = "DPComb")
# Group markers: gene1 contains marker1 to marker5; gene2 contains marker6 to marker15.
markers_gene1 <- paste0("marker", 1:5)
markers_gene2 <- paste0("marker", 6:15)
#' Function: perform_comb_test
#'
#' This function conducts a combination test on a specified set of genetic markers in
#' a case-control study. It computes the number of mutations in cases (xs) and
#' performs two tests:
#' 1. X-value based test using a hypergeometric model with parameters constructed
#' from the data.
#' 2. Fisher's exact test based combination, where p-values derived from Fisher's exact
#' test are combined.
#'
#' @param markers A character vector of marker names.
#' @param method (Optional) A string specifying the combination method. Default is
#' "fisher_mean". Supported methods include "fisher_mean", "fisher_median",
#' "pearson", "edgington", "stouffer", and "george".
#' @param side (Optional) A string indicating the tail option for converting x values to
#' p-values. Default is "two". Options are "two" (two-tailed), "left", or "right".
#' Ensure consistency with your test specifications.
#'
#' @return A list containing two elements:
#' xs_test: The result from the combination test using mutation counts (xs).
#' ps_test: The result from the combination test using Fisher's p-values.
#'
perform_comb_test <- function(markers, Data, method = "fisher_mean", side = "two") {
##Method 1: Based on mutation counts and null distribution descriptions.
# Extract disease status
disease_status <- Data$disease_status
# Calculate xs: number of mutations (1's) in cases for each marker
xs <- sapply(markers, function(m) sum(Data[disease_status == 1, m]))
# Total cases and controls
total_cases <- sum(disease_status == 1)
total_controls <- sum(disease_status == 0)
# Construct x_distn_params for hyper distribution for each marker:
# m = number of cases, n = number of controls, k = total mutations in the marker.
x_distn_params <- lapply(markers, function(m) {
k <- sum(Data[, m])
list(distn = "hyper", m = total_cases, n = total_controls, k = k)
})
# Combination test using xs and x_distn_params:
res_xs <- DPComb_tests(xs = xs, side = side, x_distn_params = x_distn_params, method = method)
##Method 2: Based on Fisher's exact test p-values and x_distn_params.
# Compute p-values from Fisher's exact test for each marker.
# define alternative as "two.sided", "greater", or "less" correspond to side
# being "two", "right", or "left".
alternative <- switch(side, "two" = "two.sided", "right" = "greater", "left" = "less")
ps <- sapply(markers, function(m) {
tbl <- table(Data$disease_status, Data[, m])
fisher.test(tbl, alternative = alternative)$p.value
})
ps[ps > 1] <- 1 # Ensure p-values are within [0, 1].
#Results from Fisher's test may exceed 1 slightly.
res_ps <- DPComb_tests(ps = ps, side = side, x_distn_params = x_distn_params, method = method)
list(xs_test = res_xs, ps_test = res_ps)
}
# Perform tests for gene1 and gene2.
# Use default method "fisher_mean" and side "two".
result_gene1 <- perform_comb_test(markers_gene1, Data=case_control)
result_gene2 <- perform_comb_test(markers_gene2, Data=case_control)
# Print the results.
cat("Gene 1 Combination Test Results (using xs):\n")
## Gene 1 Combination Test Results (using xs):
print(result_gene1$xs_test)
## $Sn
## [1] 18.99809
##
## $pval
## [1] 0.03697353
cat("\nGene 1 Combination Test Results (using Fisher's p-values):\n")
##
## Gene 1 Combination Test Results (using Fisher's p-values):
print(result_gene1$ps_test)
## $Sn
## [1] 18.99809
##
## $pval
## [1] 0.03697353
cat("\nGene 2 Combination Test Results (using xs):\n")
##
## Gene 2 Combination Test Results (using xs):
print(result_gene2$xs_test)
## $Sn
## [1] 22.26252
##
## $pval
## [1] 0.323226
cat("\nGene 2 Combination Test Results (using Fisher's p-values):\n")
##
## Gene 2 Combination Test Results (using Fisher's p-values):
print(result_gene2$ps_test)
## $Sn
## [1] 22.26252
##
## $pval
## [1] 0.323226
## === Test all methods and sides ===
methods <- c("fisher_mean", "fisher_median", "pearson", "edgington", "stouffer", "george")
sides <- c("two", "right", "left")
# Initialize a list to store results
results <- list()
# Perform tests for each `method` and `side` option, and store the results
for (method in methods) {
for (side in sides) {
result_gene1 <- perform_comb_test(markers_gene1, Data=case_control, method = method, side = side)
result_gene2 <- perform_comb_test(markers_gene2, Data=case_control, method = method, side = side)
# Store results in the list
results[[paste(method, side, sep = "_")]] <- list(
gene1_xs = result_gene1$xs_test,
gene1_ps = result_gene1$ps_test,
gene2_xs = result_gene2$xs_test,
gene2_ps = result_gene2$ps_test
)
}
}
# Convert results to data frames for better visualization
results_gene1_df <- do.call(rbind, lapply(names(results), function(name) {
res <- results[[name]]
data.frame(
Method_Side = name,
XS_Sn = round(res$gene1_xs$Sn, 2),
XS_pval = round(res$gene1_xs$pval, 4),
PS_Sn = round(res$gene1_ps$Sn, 2),
PS_pval = round(res$gene1_ps$pval, 4)
)
}))
results_gene2_df <- do.call(rbind, lapply(names(results), function(name) {
res <- results[[name]]
data.frame(
Method_Side = name,
XS_Sn = round(res$gene2_xs$Sn, 2),
XS_pval = round(res$gene2_xs$pval, 4),
PS_Sn = round(res$gene2_ps$Sn, 2),
PS_pval = round(res$gene2_ps$pval, 4)
)
}))
# Print the results in table format
knitr::kable(results_gene1_df, caption = "Combination Test Results for Gene 1")
| Method_Side | XS_Sn | XS_pval | PS_Sn | PS_pval |
|---|---|---|---|---|
| fisher_mean_two | 19.00 | 0.0370 | 19.00 | 0.0370 |
| fisher_mean_right | 25.93 | 0.0034 | 25.93 | 0.0034 |
| fisher_mean_left | 0.84 | 0.9999 | 0.84 | 0.9999 |
| fisher_median_two | 18.59 | 0.0367 | 18.59 | 0.0367 |
| fisher_median_right | 25.52 | 0.0034 | 25.52 | 0.0034 |
| fisher_median_left | 0.84 | 0.9999 | 0.84 | 0.9999 |
| pearson_two | 1.77 | 0.0003 | 1.77 | 0.0003 |
| pearson_right | 0.84 | 0.0001 | 0.84 | 0.0001 |
| pearson_left | 25.93 | 0.9966 | 25.93 | 0.9966 |
| edgington_two | 0.80 | 0.0030 | 0.80 | 0.0030 |
| edgington_right | 0.40 | 0.0005 | 0.40 | 0.0005 |
| edgington_left | 4.60 | 0.9995 | 4.60 | 0.9995 |
| stouffer_two | -5.11 | 0.0075 | -5.11 | 0.0075 |
| stouffer_right | -7.16 | 0.0006 | -7.16 | 0.0006 |
| stouffer_left | 7.16 | 0.9994 | 7.16 | 0.9994 |
| george_two | -8.61 | 0.0111 | -8.61 | 0.0111 |
| george_right | -12.54 | 0.0009 | -12.54 | 0.0009 |
| george_left | 12.54 | 0.9991 | 12.54 | 0.9991 |
knitr::kable(results_gene2_df, caption = "Combination Test Results for Gene 2")
| Method_Side | XS_Sn | XS_pval | PS_Sn | PS_pval |
|---|---|---|---|---|
| fisher_mean_two | 22.26 | 0.3232 | 22.26 | 0.3232 |
| fisher_mean_right | 31.20 | 0.0496 | 31.20 | 0.0496 |
| fisher_mean_left | 9.72 | 0.9756 | 9.72 | 0.9756 |
| fisher_median_two | 21.58 | 0.3216 | 21.58 | 0.3216 |
| fisher_median_right | 30.63 | 0.0487 | 30.63 | 0.0487 |
| fisher_median_left | 9.57 | 0.9759 | 9.57 | 0.9759 |
| pearson_two | 13.96 | 0.1079 | 13.96 | 0.1079 |
| pearson_right | 9.72 | 0.0244 | 9.72 | 0.0244 |
| pearson_left | 31.20 | 0.9504 | 31.20 | 0.9504 |
| edgington_two | 4.05 | 0.1347 | 4.05 | 0.1347 |
| edgington_right | 3.08 | 0.0160 | 3.08 | 0.0160 |
| edgington_left | 6.92 | 0.9840 | 6.92 | 0.9840 |
| stouffer_two | -2.57 | 0.1899 | -2.57 | 0.1899 |
| stouffer_right | -6.23 | 0.0227 | -6.23 | 0.0227 |
| stouffer_left | 6.23 | 0.9773 | 6.23 | 0.9773 |
| george_two | -4.15 | 0.2145 | -4.15 | 0.2145 |
| george_right | -10.74 | 0.0284 | -10.74 | 0.0284 |
| george_left | 10.74 | 0.9716 | 10.74 | 0.9716 |
In this section, we demonstrate the analysis of gene1 (markers 1–5)
using the function test_case_control_fisher from the DPComb
package. This function performs Fisher’s exact test on each marker and
combines the p-values. It is an implementation of the above analysis to
conveniently analyze such case-control data.
library(DPComb)
data("case_control", package = "DPComb")
# Analyze gene1 using test_case_control_fisher
test_case_control_fisher(Data = case_control, response = "disease_status",
covariates = paste0("marker", 1:5),
method = "fisher_mean", alternative = "two.sided")
## $Sn
## [1] 18.99809
##
## $pval
## [1] 0.03697353