---
title: "Health Surveillance Data with ANVISA"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Health Surveillance Data with ANVISA}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Overview

**ANVISA (Agência Nacional de Vigilância Sanitária)** is Brazil's health surveillance agency, responsible for regulating medicines, health products, food, cosmetics, sanitizers, tobacco, and pesticides. Its open data portal publishes CSV files covering product registrations, post-market surveillance (pharmacovigilance, hemovigilance, technovigilance), and controlled substance sales (SNGPC).

| Feature | Details |
|---------|---------|
| Source | HTTPS CSV (dados.anvisa.gov.br) |
| Data types | 14 (12 snapshot + 2 time-series) |
| Snapshot types | Current registry/database state |
| SNGPC types | Monthly controlled substance sales (Jan 2014--Oct 2021, then Jan 2026 onward) |

## Data types

Use `anvisa_types()` to see all available types:

```{r setup}
library(healthbR)
library(dplyr)

anvisa_types()
```

### Snapshot types (12)

These download the current snapshot of the registry or database. No `year` or `month` parameter is needed.

| Code | Name | Category |
|------|------|----------|
| **medicines** | Medicamentos | Product Registration |
| **medical_devices** | Produtos para Saude | Product Registration |
| **food** | Alimentos | Product Registration |
| **cosmetics** | Cosmeticos | Product Registration |
| **sanitizers** | Saneantes | Product Registration |
| **tobacco** | Produtos Fumigenos | Product Registration |
| **pesticides** | Agrotoxicos | Reference |
| **hemovigilance** | Hemovigilancia | Surveillance |
| **technovigilance** | Tecnovigilancia | Surveillance |
| **vigimed_notifications** | VigiMed Notificacoes | Surveillance |
| **vigimed_medicines** | VigiMed Medicamentos | Surveillance |
| **vigimed_reactions** | VigiMed Reacoes | Surveillance |

### Time-series types (2)

These require a `year` parameter and optionally a `month` parameter.

| Code | Name | Availability |
|------|------|-------------|
| **sngpc** | SNGPC Industrializados | Jan 2014 -- Oct 2021, Jan 2026+ |
| **sngpc_compounded** | SNGPC Manipulados | Jan 2014 -- Oct 2021, Jan 2026+ |
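
The availability windows in the table above leave a gap between November 2021 and December 2025. A small guard can check a requested period before calling `anvisa_data()`. The sketch below is only an illustration: `sngpc_is_available()` is a hypothetical helper, not part of healthbR, and its boundaries are copied from the table, with "Jan 2026+" treated as having no upper bound.

```{r}
# Hypothetical helper (not part of healthbR): TRUE when a year/month falls
# inside the SNGPC availability windows listed in the table above.
sngpc_is_available <- function(year, month) {
  # encode each period as one sortable number, e.g. 2020-03 -> 202003
  ym <- year * 100L + month
  first_window <- ym >= 201401L & ym <= 202110L  # Jan 2014 -- Oct 2021
  second_window <- ym >= 202601L                 # Jan 2026 onward
  first_window | second_window
}

sngpc_is_available(2020, 1:3)    # TRUE TRUE TRUE
sngpc_is_available(2021, 10:12)  # TRUE FALSE FALSE
```

The helper is vectorised over `month`, so it can filter a vector such as `1:12` down to the months worth requesting. It does not validate that `month` lies between 1 and 12.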

## Product registrations

### Medicines

Download the complete registry of medicines approved by ANVISA:

```{r}
medicines <- anvisa_data(type = "medicines")

# explore the data
nrow(medicines)
names(medicines)

# filter active medicines
active <- medicines |>
  filter(SITUACAO_REGISTRO == "ATIVO")

# count by therapeutic class
active |>
  count(CLASSE_TERAPEUTICA, sort = TRUE)
```

### Medical devices

```{r}
devices <- anvisa_data(type = "medical_devices")

# count by risk class
devices |>
  count(CLASSE_RISCO, sort = TRUE)
```

### Food and cosmetics

```{r}
food <- anvisa_data(type = "food")
cosmetics <- anvisa_data(type = "cosmetics")
```

### Select specific variables

```{r}
# keep only product name, active ingredient, and registration status
med_slim <- anvisa_data(
  type = "medicines",
  vars = c("NOME_PRODUTO", "PRINCIPIO_ATIVO", "SITUACAO_REGISTRO")
)
```

## Pesticides

Pesticide monographs list authorized active ingredients and their maximum residue limits (LMR):

```{r}
pesticides <- anvisa_data(type = "pesticides")

# find variables whose description mentions "substancia"
anvisa_variables(type = "pesticides", search = "substancia")

# substances authorized for coffee
coffee <- pesticides |>
  filter(NO_CULTURA == "Cafe")
```

## Post-market surveillance

### Hemovigilance

Adverse events related to blood transfusions:

```{r}
hemo <- anvisa_data(type = "hemovigilance")

# count by reaction type
hemo |>
  count(TIPO_REACAO_TRANSFUSIONAL, sort = TRUE)

# count by state
hemo |>
  count(UF_NOTIFICACAO, sort = TRUE)
```

### Technovigilance

Adverse events related to medical devices:

```{r}
techno <- anvisa_data(type = "technovigilance")

# count by notification type
techno |>
  count(TIPO_NOTIFICACAO, sort = TRUE)
```

### Pharmacovigilance (VigiMed)

Drug/vaccine adverse event reports are split into three linked datasets sharing the `IDENTIFICACAO_NOTIFICACAO` key:

```{r}
# notifications (patient info + event summary)
notif <- anvisa_data(type = "vigimed_notifications")

# medicines involved
meds <- anvisa_data(type = "vigimed_medicines")

# adverse reactions (MedDRA coded)
reactions <- anvisa_data(type = "vigimed_reactions")

# link notifications to reactions
linked <- notif |>
  select(IDENTIFICACAO_NOTIFICACAO, SEXO, IDADE_MOMENTO_REACAO, GRAVE) |>
  inner_join(reactions, by = "IDENTIFICACAO_NOTIFICACAO")

# most common reactions
linked |>
  count(PT, sort = TRUE) |>
  head(20)
```
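
One notification can list several medicines and several reactions, so joining the medicines and reactions tables directly multiplies rows: a report with two medicines and three reactions yields six rows. The sketch below shows the effect on made-up toy tibbles, not ANVISA data. The `MEDICAMENTO` column is a placeholder name chosen for illustration, and the `relationship` argument assumes dplyr 1.1.0 or later.

```{r}
library(dplyr)

# toy data for illustration only; values and the MEDICAMENTO column are made up
toy_meds <- tibble(
  IDENTIFICACAO_NOTIFICACAO = c("A", "A", "B"),
  MEDICAMENTO = c("drug_1", "drug_2", "drug_1")
)
toy_reactions <- tibble(
  IDENTIFICACAO_NOTIFICACAO = c("A", "A", "A", "B"),
  PT = c("Nausea", "Rash", "Headache", "Nausea")
)

# many-to-many join: notification A expands to 2 medicines x 3 reactions = 6 rows
pairs <- toy_meds |>
  inner_join(
    toy_reactions,
    by = "IDENTIFICACAO_NOTIFICACAO",
    relationship = "many-to-many"
  )
nrow(pairs)  # 7 (6 rows for A + 1 row for B)

# drop duplicate rows first so each notification counts once per pair
pairs |>
  distinct(IDENTIFICACAO_NOTIFICACAO, MEDICAMENTO, PT) |>
  count(MEDICAMENTO, PT, sort = TRUE)
```

Counts taken from such a join describe medicine-reaction co-occurrence within a report. They do not show which medicine caused which reaction.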

## SNGPC -- Controlled substance sales

The SNGPC (Sistema Nacional de Gerenciamento de Produtos Controlados) tracks sales of controlled substances (psychotropics, narcotics) by pharmacies across Brazil.

### Industrialized medicines

```{r}
# download January 2020 data
sngpc_jan <- anvisa_data(type = "sngpc", year = 2020, month = 1)

# active ingredients with the most sales records
sngpc_jan |>
  count(DS_PRINCIPIO_ATIVO, sort = TRUE) |>
  head(10)

# number of sales records by state
sngpc_jan |>
  count(SG_UF_VENDA, sort = TRUE)
```

### Multiple months

```{r}
# download Q1 2020 (Jan-Mar)
sngpc_q1 <- anvisa_data(type = "sngpc", year = 2020, month = 1:3)

# monthly trend
sngpc_q1 |>
  count(month, name = "sales")
```

### Compounded medicines

```{r}
# compounded ("manipulados") controlled substances
manip <- anvisa_data(type = "sngpc_compounded", year = 2020, month = 1)

# most common active ingredients
manip |>
  count(NO_PRINCIPIO_ATIVO, sort = TRUE) |>
  head(10)
```

### Lazy evaluation for large queries

For large SNGPC queries spanning many months, set `lazy = TRUE` to defer computation until `collect()` is called:

```{r}
# lazy query (requires arrow package)
lazy_sngpc <- anvisa_data(
  type = "sngpc", year = 2020, month = 1:12,
  lazy = TRUE
)

# filter and collect only what you need
result <- lazy_sngpc |>
  filter(SG_UF_VENDA == "SP") |>
  collect()
```
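
When the goal is a summary, not the rows themselves, aggregating before `collect()` keeps the collected result small. The sketch below assumes the lazy object behaves like an Arrow table under dplyr verbs, which this vignette shows only for `filter()` and `collect()`. To stay self-contained it runs on a made-up in-memory Arrow table that reuses two SNGPC column names, not on real data.

```{r}
library(dplyr)
library(arrow)

# toy stand-in for a lazy SNGPC object; all values are made up
toy_lazy <- arrow_table(
  SG_UF_VENDA = c("SP", "SP", "RJ", "SP"),
  DS_PRINCIPIO_ATIVO = c("x", "y", "x", "x")
)

# the filter and aggregation are evaluated by Arrow; only the summary
# is pulled into R by collect()
summary_sp <- toy_lazy |>
  filter(SG_UF_VENDA == "SP") |>
  group_by(DS_PRINCIPIO_ATIVO) |>
  summarise(records = n()) |>
  collect() |>
  arrange(desc(records))
```

Sorting after `collect()` is a deliberate choice here: the summary is already small, so ordering it in R costs little.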

## Exploring variables

Use `anvisa_variables()` to see available variables for any type:

```{r}
# all medicines variables
anvisa_variables(type = "medicines")

# search across descriptions
anvisa_variables(type = "hemovigilance", search = "paciente")

# SNGPC variables
anvisa_variables(type = "sngpc")
```

## Caching

Downloaded data is cached locally for faster subsequent access:

```{r}
# check cache status
anvisa_cache_status()

# clear all cached ANVISA data
anvisa_clear_cache()
```

## Module information

```{r}
# full module overview
anvisa_info()
```
