Getting Started with nowcastr

Nowcasting is the process of estimating the current state of a phenomenon when the data are incomplete due to reporting delays. The nowcastr package implements the chain-ladder method for nowcasting, supporting both non-cumulative delay-based estimation and model-based completeness fitting (e.g., logistic or Gompertz curves). This vignette provides a quick start guide to using the package with demo data.

Setup

The package is available on GitHub. Install it with:

pak::pak("whocov/nowcastr")
library(nowcastr)

Data Structure

Your dataset must contain at least three columns:

The package includes a demo dataset nowcast_demo that follows this structure

print(nowcast_demo)
#> # A tibble: 1,624 × 4
#>     value date_occurrence date_report group        
#>     <dbl> <date>          <date>      <chr>        
#>  1 251563 2024-12-16      2025-05-26  Syndromic ARI
#>  2 219818 2024-12-23      2025-05-26  Syndromic ARI
#>  3 219815 2024-12-23      2025-06-02  Syndromic ARI
#>  4 253451 2024-12-30      2025-05-26  Syndromic ARI
#>  5 253454 2024-12-30      2025-06-09  Syndromic ARI
#>  6 311660 2025-01-06      2025-05-26  Syndromic ARI
#>  7 311666 2025-01-06      2025-06-02  Syndromic ARI
#>  8 311654 2025-01-06      2025-06-09  Syndromic ARI
#>  9 311657 2025-01-06      2025-06-16  Syndromic ARI
#> 10 313798 2025-01-13      2025-05-26  Syndromic ARI
#> # ℹ 1,614 more rows

The demo data also includes a group column for demonstrating grouped processing, though you can have multiple grouping columns.

Workflow

A typical nowcasting workflow with nowcastr involves the following steps.

1. Visualize Input Data

Before nowcasting, inspect the reporting pattern of your data:

nowcast_demo %>%
  plot_nc_input(
    option = "triangle",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )

The “millipede” plot provides an alternative view of delays:

nowcast_demo %>%
  plot_nc_input(
    option = "millipede",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )

2. Prepare Data (Optional)

You may want to fill missing values with the last known reporting values to ensure consistent time units:

data_filled <- nowcast_demo %>%
  fill_future_reported_values(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group",
    max_delay = "auto"
  )
data_filled %>%
  plot_nc_input(
    option = "triangle",
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group"
  )

This step is optional; nowcast_cl can handle unfilled data.

3. Run Nowcast

Perform the nowcasting using the chain-ladder method:

nc_obj <-
  data_filled %>%
  nowcast_cl(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = "group",
    time_units = "weeks",
    do_model_fitting = TRUE
  )

The nowcast_cl() function returns a nowcast_results object containing predictions, delay distributions, completeness estimates, and parameters.

S7::prop_names(nc_obj)
#>  [1] "name"         "params"       "time_start"   "time_end"     "n_groups"    
#>  [6] "max_delay"    "data"         "completeness" "delays"       "models"      
#> [11] "results"

4. Explore Results

Access the components of the result object:

nc_obj@results # Final nowcasted values
#> # A tibble: 95 × 7
#>    group    date_occurrence last_r_date delay value value_predicted completeness
#>    <chr>    <date>          <date>      <dbl> <dbl>           <dbl>        <dbl>
#>  1 SARS-Co… 2025-11-17      2025-11-17      0     0             0        0.00391
#>  2 SARS-Co… 2025-11-10      2025-11-17      1     0             0        0.0316 
#>  3 SARS-Co… 2025-11-03      2025-11-17      2     1            11.4      0.0878 
#>  4 SARS-Co… 2025-10-27      2025-11-17      3     6            36.5      0.164  
#>  5 SARS-Co… 2025-10-20      2025-11-17      4     7            27.8      0.252  
#>  6 SARS-Co… 2025-10-13      2025-11-17      5    21            61.5      0.341  
#>  7 SARS-Co… 2025-10-06      2025-11-17      6    52           122.       0.427  
#>  8 SARS-Co… 2025-09-29      2025-11-17      7    55           109.       0.507  
#>  9 SARS-Co… 2025-09-22      2025-11-17      8    70           121.       0.577  
#> 10 SARS-Co… 2025-09-15      2025-11-17      9    60            93.9      0.639  
#> # ℹ 85 more rows
nc_obj@delays # Delay distribution
#> # A tibble: 96 × 5
#>    group                      delay     n completeness_obs completeness_modelled
#>    <chr>                      <dbl> <int>            <dbl>                 <dbl>
#>  1 SARS-CoV-2 Hospital Admis…     0    10          0.0169                0.00391
#>  2 SARS-CoV-2 Hospital Admis…     1    10          0.00670               0.0316 
#>  3 SARS-CoV-2 Hospital Admis…     2    10          0.0646                0.0878 
#>  4 SARS-CoV-2 Hospital Admis…     3    10          0.163                 0.164  
#>  5 SARS-CoV-2 Hospital Admis…     4    10          0.250                 0.252  
#>  6 SARS-CoV-2 Hospital Admis…     5    10          0.321                 0.341  
#>  7 SARS-CoV-2 Hospital Admis…     6    10          0.442                 0.427  
#>  8 SARS-CoV-2 Hospital Admis…     7    10          0.537                 0.507  
#>  9 SARS-CoV-2 Hospital Admis…     8    10          0.611                 0.577  
#> 10 SARS-CoV-2 Hospital Admis…     9    10          0.668                 0.639  
#> # ℹ 86 more rows
nc_obj@completeness # Data with completeness estimates
#> # A tibble: 2,478 × 8
#>    group         date_occurrence date_report value delay last_value completeness
#>    <chr>         <date>          <date>      <dbl> <dbl>      <dbl>        <dbl>
#>  1 SARS-CoV-2 H… 2025-11-17      2025-11-17      0     0          0        1    
#>  2 SARS-CoV-2 H… 2025-11-10      2025-11-17      0     1          0        1    
#>  3 SARS-CoV-2 H… 2025-11-10      2025-11-10      0     0          0        1    
#>  4 SARS-CoV-2 H… 2025-11-03      2025-11-17      1     2          1        1    
#>  5 SARS-CoV-2 H… 2025-11-03      2025-11-10      0     1          1        0    
#>  6 SARS-CoV-2 H… 2025-11-03      2025-11-03      0     0          1        0    
#>  7 SARS-CoV-2 H… 2025-10-27      2025-11-17      6     3          6        1    
#>  8 SARS-CoV-2 H… 2025-10-27      2025-11-10      2     2          6        0.333
#>  9 SARS-CoV-2 H… 2025-10-27      2025-11-03      0     1          6        0    
#> 10 SARS-CoV-2 H… 2025-10-20      2025-11-17      7     4          7        1    
#> # ℹ 2,468 more rows
#> # ℹ 1 more variable: reportweight <dbl>
str(nc_obj@params) # Parameters used
#> List of 15
#>  $ col_date_occurrence         : chr "date_occurrence"
#>  $ col_date_reporting          : chr "date_report"
#>  $ col_value                   : chr "value"
#>  $ group_cols                  : chr "group"
#>  $ time_units                  : chr "weeks"
#>  $ max_delay                   : NULL
#>  $ max_reportunits             : num 10
#>  $ max_completeness            : num 5
#>  $ min_completeness_samples    : num 1
#>  $ use_weighted_method         : logi TRUE
#>  $ do_propagate_missing_delays : logi FALSE
#>  $ do_model_fitting            : logi TRUE
#>  $ model_names                 : chr [1:6] "monomolecular" "vonbertalanffy" "logistic" "gompertz" ...
#>  $ do_use_modelled_completeness: logi TRUE
#>  $ rss_threshold               : num 0.01

Plot the results:

plot(nc_obj, which = "delays") # Delay distribution

plot(nc_obj, which = "results") # Nowcast time series

Open a Shiny app to explore results group by group:

explore_nowcast(nc_obj)

How It Works

The chain-ladder method estimates “completeness” for each delay bucket:

Recent occurrence dates have shorter delays and lower completeness. The method upweights these observations to estimate the true count.

Grouped Processing

You can nowcast multiple groups (e.g., regions, diseases) in a single call by specifying multiple grouping columns:

nowcast_cl(
  # ...
  group_cols = c("region", "disease")
)

Other Utility Functions

Calculate Retro Scores of input data

retro_score = number of actual value changes / max possible value changes [0-1]

# Calculate retro-scores (= number of actual value changes / max possible value changes)
nowcast_demo %>%
  calculate_retro_score(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = c("group")
  )
#> # A tibble: 4 × 4
#>   group                          n_changes max_retro_adj retro_score
#>   <chr>                              <dbl>         <dbl>       <dbl>
#> 1 SARS-CoV-2 non-STL Positivity        385           575       0.670
#> 2 SARS-CoV-2 Hospital Admissions       374           575       0.650
#> 3 Syndromic ILI                        371           575       0.645
#> 4 Syndromic ARI                        299           575       0.52

Remove duplicated data

This is the opposite of fill_future_reported_values(). This can be useful to reduce data size without losing information.

# Remove duplicate reported values (same value and higher reporting date)
nowcast_demo %>%
  rm_repeated_values(
    col_date_occurrence = date_occurrence,
    col_date_reporting = date_report,
    col_value = value,
    group_cols = c("group")
  )
#> # A tibble: 1,624 × 4
#>    value date_occurrence date_report group                         
#>    <dbl> <date>          <date>      <chr>                         
#>  1    12 2024-12-16      2025-05-26  SARS-CoV-2 Hospital Admissions
#>  2    31 2024-12-23      2025-05-26  SARS-CoV-2 Hospital Admissions
#>  3    22 2024-12-30      2025-05-26  SARS-CoV-2 Hospital Admissions
#>  4    21 2024-12-30      2025-06-02  SARS-CoV-2 Hospital Admissions
#>  5    18 2025-01-06      2025-05-26  SARS-CoV-2 Hospital Admissions
#>  6    19 2025-01-06      2025-06-16  SARS-CoV-2 Hospital Admissions
#>  7    11 2025-01-13      2025-05-26  SARS-CoV-2 Hospital Admissions
#>  8     7 2025-01-20      2025-05-26  SARS-CoV-2 Hospital Admissions
#>  9     8 2025-01-20      2025-06-16  SARS-CoV-2 Hospital Admissions
#> 10    17 2025-01-27      2025-05-26  SARS-CoV-2 Hospital Admissions
#> # ℹ 1,614 more rows