---
title: "climatestatsr: A Comprehensive Guide to Statistical Tools for Climate Change Analysis"
author: "Sadikul Islam"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 4
    number_sections: true
    fig_width: 7
    fig_height: 4.5
vignette: >
  %\VignetteIndexEntry{climatestatsr: A Comprehensive Guide}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.align = "center",
  warning   = FALSE,
  message   = FALSE
)
set.seed(2024)
library(climatestatsr)
```

---

# Introduction

The **climatestatsr** package provides a self-contained, dependency-light
collection of statistical functions for climate change research. All methods
operate on standard R vectors, matrices, and data frames and require only
base-R `stats`, `graphics`, `grDevices`, and `utils`.

## Package structure

| Family | Functions |
|---|---|
| Temporal analysis | `mk_test`, `sens_slope`, `change_point_detection`, `seasonal_decompose_climate`, `rolling_trend`, `temporal_homogeneity`, `trend_significance`, `autocorrelation_climate` |
| Spatial analysis | `morans_i`, `hot_cold_spots`, `spatial_interpolate`, `spatial_trend_field`, `cluster_climate_zones`, `spatial_anomaly`, `elevation_lapse_rate` |
| Extreme events | `fit_gev`, `rgev_sim`, `return_period`, `peaks_over_threshold`, `heat_wave_detection`, `cold_spell_detection`, `drought_spell`, `extreme_value_index` |
| Climate indices | `spi`, `spei`, `pdsi_simple`, `heat_index`, `wind_chill`, `frost_days`, `growing_degree_days`, `diurnal_temp_range` |
| Detection & attribution | `detection_attribution`, `fingerprint_analysis`, `optimal_fingerprint` |
| Utilities | `fill_gaps_climate`, `homogenize_series`, `aggregate_climate`, `anomaly_baseline`, `standardize_climate`, `climate_summary` |

## Installation

```r
install.packages("climatestatsr")
```

---

# Temporal Analysis

Temporal analysis functions test for and quantify monotonic trends, abrupt
shifts, and seasonal structure in climate time series.

## Mann-Kendall Trend Test — `mk_test()`

The **Mann-Kendall test** (Mann 1945; Kendall 1975) is the standard
non-parametric method for detecting monotonic trends in hydro-climatic
records. It is rank-based and therefore robust to non-normality and
outliers. The optional pre-whitening step (Yue and Wang 2002) removes
AR(1) autocorrelation before computing the test statistic.

```{r mk_basic}
# Simulate 50 years of warming at 0.025 °C/year
years <- 1971:2020
temp  <- 14.0 + 0.025 * seq_len(50) + stats::rnorm(50, 0, 0.4)

result <- mk_test(temp)
print(result)
```

```{r mk_plot, fig.cap="Mann-Kendall result: time series with Sen slope (left) and value distribution (right)."}
plot(result)
```

**Interpreting the output:**

- `Z` — standardised statistic; |Z| > 1.96 indicates significance at α = 0.05
- `tau` — Kendall's rank correlation; ranges from −1 (monotone decrease) to +1 (monotone increase)  
- `p.value` — two-tailed probability under H₀ of no trend
- `trend` — plain-language conclusion

```{r mk_prewhiten}
# Apply AR(1) pre-whitening for autocorrelated series
ar_temp <- as.numeric(stats::arima.sim(list(ar = 0.65), n = 60)) +
           seq(0, 3, length.out = 60) + 14
mk_pw <- mk_test(ar_temp, prewhiten = TRUE)
cat("Pre-whitened MK: Z =", round(mk_pw$statistic, 3),
    "  p =", round(mk_pw$p.value, 4), "\n")
```

## Sen's Slope Estimator — `sens_slope()`

**Sen's slope** (Sen 1968) is the median of all pairwise slopes between
observations. It provides a robust estimate of trend magnitude unaffected
by outliers.

```{r sens}
ss <- sens_slope(years, temp)
cat(sprintf(
  "Sen's slope : %+.4f °C/year\n",  ss$slope))
cat(sprintf(
  "Rate/decade : %+.3f °C\n",       ss$slope_decade))
cat(sprintf(
  "95%% CI      : [%.4f, %.4f]\n",  ss$slope_ci["lower"],
                                    ss$slope_ci["upper"]))
```

```{r sens_plot, fig.cap="Observed temperature with Sen slope (red dashed) and 95% CI band."}
plot(years, temp, pch = 16, cex = 0.7, col = "steelblue",
     xlab = "Year", ylab = "Temperature (°C)",
     main = "Annual Mean Temperature with Sen's Slope")
abline(a = ss$intercept, b = ss$slope,
       col = "firebrick", lwd = 2, lty = 2)
legend("topleft", legend = c("Observed", "Sen slope"),
       col = c("steelblue", "firebrick"),
       pch = c(16, NA), lty = c(NA, 2), lwd = c(NA, 2), bty = "n")
```

## Change-Point Detection — `change_point_detection()`

Two methods are available:

- **Pettitt's test** (Pettitt 1979) — rank-based, detects a single mean shift.
- **CUSUM** — cumulative-sum method with bootstrap p-value.

```{r cp}
# Simulate a step change at observation 30
x <- c(stats::rnorm(30, mean = 14.0, sd = 0.5),
        stats::rnorm(30, mean = 15.5, sd = 0.5))

cp <- change_point_detection(x, method = "pettitt")
cat(sprintf("Change point at index : %d\n", cp$change_point))
cat(sprintf("Mean before shift     : %.2f °C\n", cp$mean_before))
cat(sprintf("Mean after shift      : %.2f °C\n", cp$mean_after))
cat(sprintf("Significant (α=0.05)  : %s\n", cp$significant))
```

```{r cp_plot, fig.cap="CUSUM series — the peak identifies the most probable break point."}
cp_cusum <- change_point_detection(x, method = "cusum")
plot(cp_cusum$cusum, type = "l", col = "steelblue", lwd = 2,
     xlab = "Index", ylab = "CUSUM",
     main = "CUSUM Change-Point Detection")
abline(v   = cp_cusum$change_point, col = "firebrick",
       lty = 2, lwd = 2)
abline(h   = 0, lty = 3, col = "gray50")
legend("topleft",
       legend = c("CUSUM", sprintf("Break at %d", cp_cusum$change_point)),
       col    = c("steelblue", "firebrick"),
       lty    = c(1, 2), lwd = 2, bty = "n")
```

## Seasonal Decomposition — `seasonal_decompose_climate()`

Decomposes monthly or quarterly series into **trend**, **seasonal**, and
**remainder** components using STL (Loess-based) or classical additive
decomposition.

```{r decomp_data}
n     <- 360   # 30 years of monthly data
t_idx <- seq_len(n)
temp_m <- 15 + 0.003 * t_idx +
          8   * sin(2 * pi * t_idx / 12) +
          stats::rnorm(n, 0, 0.5)
```

```{r decomp_plot, fig.height=6, fig.cap="STL decomposition: original, trend, seasonal component, and remainder."}
dc <- seasonal_decompose_climate(temp_m, frequency = 12, method = "stl")
plot(dc)
```

The extracted trend can itself be tested for significance:

```{r decomp_trend}
trend_mk <- mk_test(dc$trend[!is.na(dc$trend)])
cat("Trend component MK test: tau =", round(trend_mk$tau, 3),
    " p =", round(trend_mk$p.value, 4), "\n")
```

## Rolling Trend Analysis — `rolling_trend()`

Applies Sen's slope over a moving window to reveal **periods of
accelerating or decelerating warming**.

```{r rolling, fig.cap="20-year rolling Sen slope: warming accelerated after index ~40."}
temp2  <- 13.5 + c(0.01 * seq_len(50), 0.04 * seq_len(41)) +
          stats::rnorm(91, 0, 0.4)
rt <- rolling_trend(temp2, window = 20, step = 2)

plot(rt$mid_index, rt$slope_decade, type = "l",
     col = "steelblue", lwd = 2,
     xlab = "Mid-window index",
     ylab = "Trend (°C per decade)",
     main = "Rolling 20-Year Sen Slope")
abline(h = 0, lty = 2, col = "gray50")
polygon(c(rt$mid_index, rev(rt$mid_index)),
        c(rt$slope_decade * 1.15, rev(rt$slope_decade * 0.85)),
        col = adjustcolor("steelblue", 0.15), border = NA)
```

## SNHT Homogeneity Test — `temporal_homogeneity()`

The **Standard Normal Homogeneity Test** (Alexandersson 1986) detects
inhomogeneities caused by station moves, instrument changes, or observer
changes.

```{r snht}
# Inhomogeneous series: +0.8 °C offset after observation 40
x_inh <- c(stats::rnorm(40, 0, 1), stats::rnorm(40, 0.8, 1))
snht  <- temporal_homogeneity(x_inh)
cat(sprintf("T0 statistic  : %.2f  (critical ≈ %.2f)\n",
            snht$T0, snht$critical))
cat(sprintf("Break at index: %d\n", snht$break_index))
cat(sprintf("Significant   : %s\n", snht$significant))
```

```{r snht_plot, fig.cap="SNHT statistic series — peak marks the inhomogeneity break."}
plot(snht$T_series, type = "l", col = "steelblue", lwd = 1.5,
     xlab = "Index", ylab = "T statistic",
     main = "SNHT — Homogeneity Test")
abline(v = snht$break_index, col = "firebrick", lty = 2, lwd = 2)
abline(h = snht$critical,    col = "orange",    lty = 3, lwd = 1.5)
legend("topright",
       legend = c("T statistic", "Break point", "Critical value"),
       col    = c("steelblue", "firebrick", "orange"),
       lty    = c(1, 2, 3), lwd = 2, bty = "n")
```

## Multiple-Station Trend Significance — `trend_significance()`

Runs Mann-Kendall on every column of a matrix and applies **FDR** or
**Bonferroni** correction for simultaneous testing.

```{r trendsig}
mat <- matrix(stats::rnorm(50 * 30), nrow = 50)
# impose trend in 10 stations
mat[, 1:10] <- mat[, 1:10] + outer(seq_len(50), rep(0.05, 10))

ts_res <- trend_significance(mat, correction = "fdr")
cat("Stations with significant trend:\n")
print(table(ts_res$trend))
```

## Autocorrelation Analysis — `autocorrelation_climate()`

Reports ACF, PACF and the Ljung-Box test for residual autocorrelation —
important before applying parametric models.

```{r acf, fig.cap="ACF and PACF of an AR(1) climate series (ρ ≈ 0.7)."}
ar_series <- as.numeric(stats::arima.sim(list(ar = 0.7), n = 100)) + 15
ac <- autocorrelation_climate(ar_series, max.lag = 20, plot = TRUE)
cat("AR(1) sample coefficient:", round(ac$ar1, 3), "\n")
cat("Ljung-Box p-value       :", round(ac$ljung_box$p.value, 4), "\n")
```

---

# Spatial Analysis

## Moran's I — `morans_i()`

**Global Moran's I** (Moran 1950) measures whether similar values cluster
spatially (I > 0) or disperse (I < 0).

```{r moran}
set.seed(42)
n      <- 50
coords <- data.frame(lon = stats::runif(n, -10, 10),
                     lat = stats::runif(n, 40, 60))
# Temperature decreases with latitude → positive autocorrelation
x_sp <- 26 - 0.35 * coords$lat + stats::rnorm(n, 0, 0.8)

mi <- morans_i(x_sp, coords, n_perm = 499)
cat(sprintf("Moran's I = %.4f\n", mi$I))
cat(sprintf("Z-score   = %.3f  (p = %.4f)\n", mi$Z, mi$p.value))
cat(mi$interpretation, "\n")
```

```{r moran_plot, fig.cap="Scatter plot of spatial values coloured by latitude gradient."}
col_ramp <- colorRampPalette(c("steelblue", "white", "firebrick"))(50)
val_rank <- rank(x_sp)
plot(coords$lon, coords$lat,
     pch = 21, cex = 1.5,
     bg  = col_ramp[val_rank],
     xlab = "Longitude", ylab = "Latitude",
     main = sprintf("Spatial Field  (Moran's I = %.3f)", mi$I))
```

## Hot-Spot and Cold-Spot Detection — `hot_cold_spots()`

The **Getis-Ord Gi*** statistic (Getis and Ord 1992) identifies locations
where high (hot spot) or low (cold spot) values cluster locally.

```{r hotspot}
set.seed(5)
vals <- ifelse(coords$lon > 4,
               stats::rnorm(n, 30, 2),
               stats::rnorm(n, 18, 2))
hs <- hot_cold_spots(vals, coords, dist_threshold = 5)
print(table(hs$classification))
```

```{r hotspot_plot, fig.cap="Getis-Ord Gi* classification: hot spots (east), cold spots (west)."}
cols <- c("hot spot"       = "firebrick",
          "cold spot"      = "steelblue",
          "not significant"= "gray80")
plot(hs$lon, hs$lat,
     pch = 21, cex = 1.6,
     bg  = cols[hs$classification],
     xlab = "Longitude", ylab = "Latitude",
     main = "Getis-Ord Gi* Hot/Cold Spots")
legend("topright", legend = names(cols), pt.bg = cols,
       pch = 21, pt.cex = 1.4, bty = "n")
```

## Spatial Interpolation — `spatial_interpolate()`

Interpolates station values onto a regular grid using **IDW** or **loess spline**.

```{r interp, fig.cap="IDW interpolation from 25 stations to a 20×20 grid."}
set.seed(7)
obs  <- matrix(stats::runif(50), ncol = 2,
               dimnames = list(NULL, c("lon","lat")))
vals_obs <- sin(obs[,"lon"] * 3) + cos(obs[,"lat"] * 3)

grd  <- as.matrix(expand.grid(
  lon = seq(0.05, 0.95, length.out = 20),
  lat = seq(0.05, 0.95, length.out = 20)))
pred <- spatial_interpolate(obs, vals_obs, grd, method = "idw")

image(matrix(pred, 20, 20),
      col  = colorRampPalette(c("steelblue","white","firebrick"))(64),
      main = "IDW Interpolated Climate Field",
      xlab = "Longitude", ylab = "Latitude")
points((obs[,"lon"] - 0.05) / 0.9,
       (obs[,"lat"] - 0.05) / 0.9, pch = 3, cex = 0.8)
```

## Spatial Trend Field — `spatial_trend_field()`

Computes **per-cell Mann-Kendall + Sen slope** across a space-time matrix —
essential for mapping where warming is occurring.

```{r stf}
set.seed(9)
mat_st <- matrix(stats::rnorm(50 * 200), nrow = 50)
mat_st[, 101:200] <- mat_st[, 101:200] +
                     outer(seq_len(50), rep(0.04, 100))

stf <- spatial_trend_field(mat_st)
cat(sprintf("Cells with significant trend: %d / %d (%.0f%%)\n",
            sum(stf$p.value < 0.05, na.rm = TRUE),
            nrow(stf),
            100 * mean(stf$p.value < 0.05, na.rm = TRUE)))
```

```{r stf_plot, fig.cap="Fraction of significant trend cells by group."}
stf$group <- ifelse(seq_len(nrow(stf)) <= 100, "No trend", "Trend imposed")
sig_frac  <- tapply(stf$p.value < 0.05, stf$group, mean, na.rm = TRUE)
barplot(sig_frac * 100,
        col = c("gray70", "firebrick"),
        ylab = "% Significant (α=0.05)",
        main = "Spatial Trend Field Results",
        border = NA)
```

## Climate Zone Classification — `cluster_climate_zones()`

K-means clustering on multi-variable climate normals delineates coherent
climate regions.

```{r cluster}
set.seed(1)
clim_mat <- matrix(
  c(stats::rnorm(200, rep(c(5, 15, 25, 10, 20), each = 40), 2),
    stats::rnorm(200, rep(c(1200, 600, 200, 900, 400), each = 40), 80)),
  ncol = 2,
  dimnames = list(NULL, c("temp", "precip")))

cz <- cluster_climate_zones(clim_mat, k = 5)
cat("Cluster sizes:\n")
print(table(cz$cluster))
```

```{r cluster_plot, fig.cap="Five K-means climate zones in temperature–precipitation space."}
pal <- c("#e41a1c","#377eb8","#4daf4a","#984ea3","#ff7f00")
plot(clim_mat[, "temp"], clim_mat[, "precip"],
     pch = 21, cex = 0.9,
     bg  = pal[cz$cluster],
     xlab = "Mean Temperature (°C)",
     ylab = "Annual Precipitation (mm)",
     main = "K-Means Climate Zone Classification")
legend("topright", legend = paste("Zone", 1:5),
       pt.bg = pal, pch = 21, pt.cex = 1.2, bty = "n")
```

## Elevation Lapse Rate — `elevation_lapse_rate()`

Estimates the temperature lapse rate from station data — essential for
statistical downscaling.

```{r lapse, fig.cap="Temperature vs elevation with fitted lapse rate."}
elev <- seq(100, 3000, by = 100)
temp_lapse <- 25 - 0.0065 * elev + stats::rnorm(30, 0, 0.4)
lr <- elevation_lapse_rate(temp_lapse, elev)

plot(elev, temp_lapse, pch = 16, cex = 0.8, col = "steelblue",
     xlab = "Elevation (m a.s.l.)",
     ylab = "Temperature (°C)",
     main = sprintf("Environmental Lapse Rate: %.2f °C / 1000 m",
                    lr$lapse_rate_per_1000m))
lines(lr$data$elevation, lr$data$fitted, col = "firebrick", lwd = 2)
legend("topright",
       legend = c("Station data",
                  sprintf("Lapse rate = %.3f °C/1000 m  (R² = %.3f)",
                          lr$lapse_rate_per_1000m, lr$r_squared)),
       col    = c("steelblue", "firebrick"),
       pch    = c(16, NA), lty = c(NA, 1), lwd = c(NA, 2), bty = "n")
```

---

# Extreme Event Analysis

## GEV Distribution — `fit_gev()` and `rgev_sim()`

The **Generalised Extreme Value** distribution (Coles 2001) is the
limiting distribution for block-maxima. Three shape families are unified:
Gumbel (ξ = 0), Fréchet (ξ > 0, heavy tail), Weibull (ξ < 0, bounded).

```{r gev}
set.seed(10)
ann_max <- rgev_sim(60, mu = 34, sigma = 4.5, xi = 0.12)

gev <- fit_gev(ann_max)
print(gev)
```

## Return Levels — `return_period()`

```{r rl}
rp <- return_period(gev, c(2, 5, 10, 20, 50, 100, 200))
print(rp)
```

```{r rl_plot, fig.cap="GEV return level curve with 95% delta-method confidence band."}
with(rp, {
  plot(T, level, type = "b", log = "x", pch = 16,
       col  = "steelblue",
       xlab = "Return period (years)",
       ylab = "Temperature (°C)",
       main = "GEV Return Level Curve",
       ylim = range(c(lower, upper), na.rm = TRUE))
  polygon(c(T, rev(T)), c(upper, rev(lower)),
          col = adjustcolor("steelblue", 0.18), border = NA)
  lines(T, lower, lty = 2, col = "steelblue")
  lines(T, upper, lty = 2, col = "steelblue")
  abline(v = c(10, 100), lty = 3, col = "gray60")
})
```

## Peaks-Over-Threshold — `peaks_over_threshold()`

The **POT / GPD** approach (Davison and Smith 1990) uses all exceedances
above a threshold rather than only one value per year, yielding more
efficient estimates for long return periods.

```{r pot}
set.seed(11)
daily_p <- stats::rexp(365 * 30, rate = 1 / 6)
pot <- peaks_over_threshold(daily_p, threshold = 22,
                             n_years = 30,
                             return_periods = c(10, 50, 100, 200))
cat(sprintf("Threshold     : %.0f mm\n",  pot$threshold))
cat(sprintf("Peaks retained: %d\n",        pot$n_excess))
cat(sprintf("GPD shape (ξ) : %.4f\n",      pot$xi))
cat(sprintf("GPD scale (σ) : %.4f\n",      pot$sigma))
cat("\nReturn levels:\n")
print(pot$return_levels)
```

## Heat Wave Detection — `heat_wave_detection()`

```{r heatwave}
dates <- seq(as.Date("2000-01-01"), by = "day",
             length.out = 365 * 15)
doy   <- as.integer(format(dates, "%j"))
tmax  <- 28 + 10 * sin(2 * pi * doy / 365) +
         stats::rnorm(length(dates), 0, 2.5)

hw <- heat_wave_detection(tmax, dates,
                           threshold = "p95", min_days = 3)
cat(sprintf("Heat wave events over 15 years : %d\n", nrow(hw)))
cat(sprintf("Mean duration                  : %.1f days\n",
            mean(hw$duration)))
cat(sprintf("Maximum peak temperature       : %.1f °C\n",
            max(hw$peak_temp)))
```

```{r heatwave_plot, fig.cap="Annual heat wave count and mean duration over 15 years."}
hw$year <- as.integer(format(hw$start_date, "%Y"))
ann_hw  <- tapply(hw$duration, hw$year, length)

barplot(ann_hw,
        col  = "firebrick",
        border = NA,
        xlab = "Year", ylab = "Number of events",
        main = "Annual Heat Wave Frequency (p95 threshold, ≥3 days)")
```

## Cold Spell Detection — `cold_spell_detection()`

```{r coldspell}
cs <- cold_spell_detection(
  tmin  = tmax - stats::rnorm(length(tmax), 12, 1),
  dates = dates,
  threshold = "p05",
  min_days  = 3)
cat(sprintf("Cold spell events: %d\n", nrow(cs)))
if (nrow(cs) > 0) {
  cat(sprintf("Mean duration    : %.1f days\n", mean(cs$duration)))
}
```

## Drought Spell Detection — `drought_spell()`

```{r drought}
set.seed(12)
spi_vals <- stats::rnorm(360)
dates_m  <- seq(as.Date("1990-01-01"), by = "month",
                length.out = 360)
droughts <- drought_spell(spi_vals, dates_m, threshold = -1.0,
                          min_duration = 2)
cat(sprintf("Drought spells detected : %d\n", nrow(droughts)))
cat(sprintf("Mean duration           : %.1f months\n",
            mean(droughts$duration)))
cat(sprintf("Maximum severity        : %.2f\n",
            max(droughts$severity)))
```

## Hill Tail-Index Estimator — `extreme_value_index()`

The **Hill estimator** (Hill 1975) characterises how heavy-tailed a
distribution is — important for wind speed, flood peaks, and other
power-law climate variables.

```{r hill, fig.cap="Hill stability plot — plateau indicates a good choice of k."}
set.seed(13)
ws <- abs(stats::rnorm(500, 10, 4))^1.5
ev <- extreme_value_index(ws, k = 30)
cat(sprintf("Hill tail index (k=%d): %.4f\n", ev$k_used, ev$hill_index))

plot(ev$hill_plot$k, ev$hill_plot$hill, type = "l",
     col = "steelblue", lwd = 1.5,
     xlab = "k (number of order statistics)",
     ylab = "Hill estimate",
     main = "Hill Stability Plot")
abline(v = ev$k_used, col = "firebrick", lty = 2)
```

---

# Climate Indices

## Standardised Precipitation Index — `spi()`

The **SPI** (McKee et al. 1993) standardises accumulated precipitation at
any time scale (1 to 48+ months) against a gamma reference distribution.
Values below −1.0 indicate moderate drought; below −2.0 indicate extreme
drought.

```{r spi_data}
precip <- stats::rgamma(360, shape = 2, scale = 35)  # 30 years
spi3  <- spi(precip, scale = 3)
spi12 <- spi(precip, scale = 12)

cat(sprintf("SPI-3  — mean: %.3f  SD: %.3f\n",
            mean(spi3,  na.rm = TRUE), stats::sd(spi3,  na.rm = TRUE)))
cat(sprintf("SPI-12 — mean: %.3f  SD: %.3f\n",
            mean(spi12, na.rm = TRUE), stats::sd(spi12, na.rm = TRUE)))
```

```{r spi_plot, fig.height=5, fig.cap="SPI-3 (top) and SPI-12 (bottom) drought indices over 30 years."}
old_par <- graphics::par(mfrow = c(2, 1), mar = c(3, 4, 2, 1))

col3 <- ifelse(!is.na(spi3) & spi3 >= 0, "steelblue", "firebrick")
graphics::plot(spi3, type = "h", col = col3,
               xlab = "", ylab = "SPI-3",
               main = "Standardised Precipitation Index")
graphics::abline(h = c(-1, 1), lty = 2, col = "gray40")

col12 <- ifelse(!is.na(spi12) & spi12 >= 0, "steelblue", "firebrick")
graphics::plot(spi12, type = "h", col = col12,
               xlab = "Month", ylab = "SPI-12")
graphics::abline(h = c(-1, 1), lty = 2, col = "gray40")

graphics::par(old_par)
```

## SPEI — `spei()`

The **SPEI** (Vicente-Serrano et al. 2010) fits a log-logistic distribution
to the climatic water balance (P − PET), making it sensitive to temperature-
driven evapotranspiration increases — a key advantage under climate change.

```{r spei_calc}
set.seed(14)
tmin_m <- abs(stats::rnorm(360, 8, 3)) + 2
tmax_m <- tmin_m + stats::runif(360, 7, 14)
pr_m   <- stats::rgamma(360, 5, 0.06)

sp6 <- spei(precip = pr_m, tmin = tmin_m, tmax = tmax_m,
            lat = 45, scale = 6)
cat(sprintf("SPEI-6 — mean: %.3f  SD: %.3f\n",
            mean(sp6, na.rm = TRUE), stats::sd(sp6, na.rm = TRUE)))
```

```{r spei_plot, fig.cap="SPEI-6 — the index accounts for evapotranspiration demand."}
col_sp <- ifelse(!is.na(sp6) & sp6 >= 0, "steelblue", "firebrick")
graphics::plot(sp6, type = "h", col = col_sp,
               xlab = "Month", ylab = "SPEI-6",
               main = "SPEI-6  (Hargreaves PET, lat = 45°)")
graphics::abline(h = c(-1, 1), lty = 2, col = "gray40")
```

## Simplified PDSI — `pdsi_simple()`

The **Palmer Drought Severity Index** integrates precipitation deficit and
evapotranspiration over multiple months, making it sensitive to slowly
evolving droughts.

```{r pdsi, fig.cap="Simplified PDSI — values below −2 indicate severe drought."}
doy_m  <- rep(1:12, 30)
temp_p <- 10 + 12 * sin(pi * doy_m / 6) + stats::rnorm(360, 0, 1)
pr_p   <- pmax(0, 50 + 20 * cos(pi * doy_m / 6) +
               stats::rnorm(360, 0, 15))

pdsi_vals <- pdsi_simple(temp_p, pr_p, lat = 40)
graphics::plot(pdsi_vals, type = "l", col = "steelblue",
               xlab = "Month", ylab = "PDSI",
               main = "Simplified Palmer Drought Severity Index")
graphics::abline(h = c(-2, 2), lty = 2, col = c("firebrick","steelblue"))
graphics::abline(h = 0, lty = 3, col = "gray50")
```

## Heat Index — `heat_index()`

The **Heat Index** (Rothfusz 1990; Steadman 1979) converts temperature and
relative humidity to an apparent "feels-like" temperature.

```{r hi, fig.cap="Heat index surface: apparent temperature rises steeply with humidity."}
temp_grid <- seq(25, 45, by = 2)
rh_grid   <- seq(30, 100, by = 5)
hi_mat    <- outer(temp_grid, rh_grid,
                    FUN = function(t, r) heat_index(t, r))

image(temp_grid, rh_grid, hi_mat,
      col  = colorRampPalette(c("lightyellow","orange","firebrick"))(64),
      xlab = "Air temperature (°C)",
      ylab = "Relative humidity (%)",
      main = "Heat Index (°C)")
contour(temp_grid, rh_grid, hi_mat,
        levels = c(27, 32, 41, 54), add = TRUE, col = "white")
```

## Wind Chill — `wind_chill()`

```{r wc, fig.cap="Wind chill surface: perceived temperature can be far below air temperature."}
temp_wc <- seq(-30, 5, by = 2)
wind_wc <- seq(5, 80, by = 5)
wc_mat  <- outer(temp_wc, wind_wc, FUN = wind_chill)

image(temp_wc, wind_wc, wc_mat,
      col  = colorRampPalette(c("navy","steelblue","white"))(64),
      xlab = "Air temperature (°C)",
      ylab = "Wind speed (km/h)",
      main = "Wind Chill Temperature (°C)")
contour(temp_wc, wind_wc, wc_mat,
        levels = c(-40, -30, -20, -10), add = TRUE, col = "gray20")
```

## Frost Days — `frost_days()`

```{r frost, fig.cap="Annual frost day count simulated over 10 years."}
dates_d <- seq(as.Date("2010-01-01"), by = "day",
               length.out = 365 * 10)
doy_d   <- as.integer(format(dates_d, "%j"))
tmin_d  <- 5 - 15 * sin(2 * pi * doy_d / 365) +
            stats::rnorm(length(dates_d), 0, 2)

fd <- frost_days(tmin_d, dates_d, by = "year")
barplot(fd, col = "steelblue", border = NA,
        xlab = "Year", ylab = "Frost days",
        main = "Annual Frost Day Count (Tmin < 0 °C)")
```

## Growing Degree Days — `growing_degree_days()`

```{r gdd, fig.cap="Cumulative GDD accumulation through the growing season."}
dates_g <- seq(as.Date("2020-01-01"),
               as.Date("2020-12-31"), by = "day")
doy_g   <- as.integer(format(dates_g, "%j"))
tmax_g  <- 22 + 12 * sin(2 * pi * doy_g / 365)
tmin_g  <- 10 +  8 * sin(2 * pi * doy_g / 365)

gdd_cum <- growing_degree_days(tmax_g, tmin_g,
                                base_temp  = 10,
                                cumulative = TRUE)
plot(gdd_cum, type = "l", col = "darkgreen", lwd = 2,
     xlab = "Day of year",
     ylab = "Cumulative GDD (base 10 °C)",
     main = "Growing Degree Days 2020")
abline(v = 91,  lty = 2, col = "gray60")   # ~April 1
abline(v = 274, lty = 2, col = "gray60")   # ~Oct 1
text(91,  max(gdd_cum) * 0.05, "Apr", col = "gray40", adj = 0)
text(274, max(gdd_cum) * 0.05, "Oct", col = "gray40", adj = 1)
```

## Diurnal Temperature Range — `diurnal_temp_range()`

```{r dtr, fig.cap="Monthly mean DTR — a proxy for cloudiness and land-surface change."}
dtr <- diurnal_temp_range(tmax_g, tmin_g, dates_g, by = "month")
barplot(dtr, col = "steelblue", border = NA,
        names.arg = month.abb,
        xlab = "Month", ylab = "DTR (°C)",
        main = "Mean Diurnal Temperature Range by Month")
```

---

# Detection and Attribution

## Signal Detection — `detection_attribution()`

A projection-based approach that tests whether the observed climate change
is distinguishable from natural internal variability (Hasselmann 1979;
Allen and Tett 1999).

```{r detect}
set.seed(20)
obs_anom  <- cumsum(stats::rnorm(50, 0.03, 0.12))
nat_ens   <- matrix(stats::rnorm(50 * 30, 0, 0.45), ncol = 30)
forc_sig  <- seq(0, 1.5, length.out = 50)

da <- detection_attribution(obs_anom, nat_ens, forc_sig)
cat(sprintf("Signal detected     : %s\n", da$detected))
cat(sprintf("Z-score             : %.3f\n", da$z_score))
cat(sprintf("p-value             : %.4f\n", da$p.value))
cat(sprintf("Attribution fraction: %.1f%%\n",
            da$attribution_fraction * 100))
```

```{r detect_plot, fig.cap="Observed projection vs natural ensemble distribution — signal clearly separated."}
hist(da$projection_natural,
     col    = "steelblue",
     border = "white",
     breaks = 15,
     xlab   = "Projection onto forced signal (Pearson r)",
     main   = "Detection Test: Observed vs Natural Variability")
abline(v   = da$projection_observed,
       col = "firebrick", lwd = 3)
legend("topright",
       legend = c("Natural ensemble",
                  sprintf("Observed (r = %.3f)", da$projection_observed)),
       fill   = c("steelblue", NA),
       border = c("white", NA),
       lty    = c(NA, 1),
       col    = c(NA, "firebrick"),
       lwd    = c(NA, 3), bty = "n")
```

## EOF Fingerprint Analysis — `fingerprint_analysis()`

Extracts leading **Empirical Orthogonal Functions** (Lorenz 1956;
von Storch and Zwiers 1999) to identify dominant spatial modes of
climate variability or change.

```{r eof}
set.seed(21)
mat_eof <- matrix(stats::rnorm(60 * 200), nrow = 60)
# Inject a dominant warming pattern in first 80 locations
mat_eof[, 1:80] <- mat_eof[, 1:80] +
                   outer(seq(0, 2, length.out = 60), rep(1, 80))

fp <- fingerprint_analysis(mat_eof, n_eof = 3)
cat(sprintf("EOF1 explains: %.1f%% of variance\n",
            fp$var_explained[1] * 100))
cat(sprintf("EOF2 explains: %.1f%% of variance\n",
            fp$var_explained[2] * 100))
cat(sprintf("Cumulative (3 EOFs): %.1f%%\n",
            fp$cumvar[3] * 100))
```

```{r eof_plot, fig.cap="Leading PC time series — EOF1 captures the forced warming trend."}
matplot(fp$pc[, 1:3], type = "l",
        lty = 1, lwd = 1.5,
        col = c("firebrick", "steelblue", "darkgreen"),
        xlab = "Time step", ylab = "PC score",
        main = "Leading EOF Principal Components")
legend("topleft",
       legend = sprintf("PC%d (%.1f%%)", 1:3,
                        fp$var_explained * 100),
       col    = c("firebrick", "steelblue", "darkgreen"),
       lty    = 1, lwd = 1.5, bty = "n")
```

## Optimal Fingerprint Regression — `optimal_fingerprint()`

Estimates **scaling factors** for anthropogenic (ANT) and natural (NAT)
signals using GLS regression (Allen and Tett 1999; Hegerl and Zwiers 2011).

```{r ofp}
set.seed(22)
obs_c <- cumsum(stats::rnorm(50, 0.03, 0.10))
all_c <- cumsum(stats::rnorm(50, 0.028, 0.05)) +
         cumsum(stats::rnorm(50, 0.005, 0.03))
nat_c <- cumsum(stats::rnorm(50, 0, 0.12))

ofp <- optimal_fingerprint(obs_c, all_c, nat_c)
cat(sprintf("ANT scaling factor: %.3f\n", ofp$beta_all))
cat(sprintf("NAT scaling factor: %.3f\n", ofp$beta_nat))
cat(sprintf("Residual variance : %.4f\n", ofp$residual_variance))
```

---

# Data Quality and Utilities

## Gap Filling — `fill_gaps_climate()`

Three methods: **linear interpolation**, **monthly climatology**, and
**reference-station regression**.

```{r gaps}
x_gaps <- c(10.2, NA, NA, NA, 14.0, 15.1, NA, 17.3, 18.0)
cat("Original  :", x_gaps, "\n")
cat("Filled    :", round(fill_gaps_climate(x_gaps), 2), "\n")
```

## Homogenisation — `homogenize_series()`

Corrects a detected SNHT break by shifting the earlier segment to match
the mean of the later segment.

```{r homog}
set.seed(25)
x_inh <- c(stats::rnorm(40, 14.0, 0.5),
             stats::rnorm(40, 15.8, 0.5))
x_hom <- homogenize_series(x_inh)
cat(sprintf("Before adjustment — mean of segment 1: %.2f\n",
            mean(x_inh[1:40])))
cat(sprintf("After  adjustment — mean of segment 1: %.2f\n",
            mean(x_hom[1:40])))
cat(sprintf("Mean of segment 2 (reference)       : %.2f\n",
            mean(x_hom[41:80])))
```

## Temporal Aggregation — `aggregate_climate()`

```{r agg}
dates_agg <- seq(as.Date("2000-01-01"), by = "day",
                 length.out = 365 * 5)
temp_agg  <- stats::rnorm(length(dates_agg), 15, 6)

ann  <- aggregate_climate(temp_agg, dates_agg, to = "annual")
seas <- aggregate_climate(temp_agg, dates_agg, to = "seasonal")
cat("Annual means:\n"); print(ann)
```

## Anomaly Calculation — `anomaly_baseline()`

```{r anom, fig.cap="Temperature anomalies relative to 1961–1990 baseline."}
yr   <- 1950:2020
temp_b <- 13.5 + 0.022 * seq_len(71) + stats::rnorm(71, 0, 0.45)
anom <- anomaly_baseline(temp_b, yr, 1961, 1990)

plot(yr, anom, type = "l", col = "steelblue", lwd = 1.5,
     xlab = "Year", ylab = "Anomaly (°C)",
     main = "Temperature Anomalies  (1961–1990 baseline)")
abline(h = 0, lty = 2, col = "gray50")
lines(stats::lowess(yr, anom, f = 0.3),
      col = "firebrick", lwd = 2.5)
legend("topleft",
       legend = c("Annual anomaly", "LOWESS smoother"),
       col    = c("steelblue", "firebrick"),
       lty    = 1, lwd = c(1.5, 2.5), bty = "n")
```

## Standardisation — `standardize_climate()`

```{r std}
x_raw <- stats::rnorm(120, mean = 18, sd = 5)
z     <- standardize_climate(x_raw)
cat(sprintf("Raw  — mean: %.2f  SD: %.2f\n",
            mean(x_raw), stats::sd(x_raw)))
cat(sprintf("Std  — mean: %.6f  SD: %.6f\n",
            mean(z), stats::sd(z)))
```

## Comprehensive Summary — `climate_summary()`

```{r csummary}
temp_long <- 13.5 + 0.022 * seq_len(71) + stats::rnorm(71, 0, 0.45)
res <- climate_summary(temp_long,
                        variable_name = "Annual Mean Temperature (°C)")
```

---

# References

Allen, M. R. and Tett, S. F. B. (1999). Checking for model consistency in
optimal fingerprinting. *Climate Dynamics* **15**(6), 419–434.
[doi:10.1007/s003820050291](https://doi.org/10.1007/s003820050291)

Alexandersson, H. (1986). A homogeneity test applied to precipitation data.
*International Journal of Climatology* **6**(6), 661–675.
[doi:10.1002/joc.3370060607](https://doi.org/10.1002/joc.3370060607)

Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate.
*Journal of the Royal Statistical Society: Series B* **57**(1), 289–300.
[doi:10.1111/j.2517-6161.1995.tb02031.x](https://doi.org/10.1111/j.2517-6161.1995.tb02031.x)

Coles, S. (2001). *An Introduction to Statistical Modeling of Extreme
Values*. Springer-Verlag, London.
[doi:10.1007/978-1-4471-3675-0](https://doi.org/10.1007/978-1-4471-3675-0)

Davison, A. C. and Smith, R. L. (1990). Models for exceedances over high
thresholds. *Journal of the Royal Statistical Society: Series B* **52**(3),
393–442. [doi:10.1111/j.2517-6161.1990.tb01796.x](https://doi.org/10.1111/j.2517-6161.1990.tb01796.x)

Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use
of distance statistics. *Geographical Analysis* **24**(3), 189–206.
[doi:10.1111/j.1538-4632.1992.tb00261.x](https://doi.org/10.1111/j.1538-4632.1992.tb00261.x)

Hegerl, G. C. and Zwiers, F. (2011). Use of models in detection and
attribution of climate change. *Wiley Interdisciplinary Reviews: Climate
Change* **2**(4), 570–591.
[doi:10.1002/wcc.121](https://doi.org/10.1002/wcc.121)

Hill, B. M. (1975). A simple general approach to inference about the tail of
a distribution. *Annals of Statistics* **3**(5), 1163–1174.
[doi:10.1214/aos/1176343247](https://doi.org/10.1214/aos/1176343247)

Mann, H. B. (1945). Nonparametric tests against trend. *Econometrica*
**13**(3), 245–259. [doi:10.2307/1907187](https://doi.org/10.2307/1907187)

McKee, T. B., Doesken, N. J. and Kleist, J. (1993). The relationship of
drought frequency and duration to time scales. *Proceedings of the 8th
Conference on Applied Climatology*, 179–184. American Meteorological Society.

Moran, P. A. P. (1950). Notes on continuous stochastic phenomena.
*Biometrika* **37**(1), 17–23.
[doi:10.2307/2332142](https://doi.org/10.2307/2332142)

Pettitt, A. N. (1979). A nonparametric approach to the change-point problem.
*Applied Statistics* **28**(2), 126–135.
[doi:10.2307/2346729](https://doi.org/10.2307/2346729)

Sen, P. K. (1968). Estimates of the regression coefficient based on
Kendall's tau. *Journal of the American Statistical Association* **63**(324),
1379–1389. [doi:10.2307/2285891](https://doi.org/10.2307/2285891)

Steadman, R. G. (1979). The assessment of sultriness. Part I. *Journal of
Applied Meteorology* **18**(7), 861–873.
[doi:10.1175/1520-0450(1979)018%3C0861:TAOSPI%3E2.0.CO;2](https://doi.org/10.1175/1520-0450(1979)018%3C0861:TAOSPI%3E2.0.CO;2)

Thornthwaite, C. W. (1948). An approach toward a rational classification of
climate. *Geographical Review* **38**(1), 55–94.
[doi:10.2307/210739](https://doi.org/10.2307/210739)

Vicente-Serrano, S. M., Begueria, S. and Lopez-Moreno, J. I. (2010).
A Multiscalar Drought Index Sensitive to Global Warming.
*Journal of Climate* **23**(7), 1696–1718.
[doi:10.1175/2009JCLI2909.1](https://doi.org/10.1175/2009JCLI2909.1)

von Storch, H. and Zwiers, F. W. (1999). *Statistical Analysis in Climate
Research*. Cambridge University Press, Cambridge.
[doi:10.1017/CBO9780511612336](https://doi.org/10.1017/CBO9780511612336)

Yue, S. and Wang, C. Y. (2002). Applicability of prewhitening to eliminate
the influence of serial correlation on the Mann-Kendall test. *Water
Resources Research* **38**(6), 4-1–4-7.
[doi:10.1029/2001WR000861](https://doi.org/10.1029/2001WR000861)

---

```{r session}
sessionInfo()
```