The gendertext package provides simple, transparent tools for identifying gendered language in text and suggesting gender-neutral alternatives. It is designed for researchers, policy analysts, editors, and practitioners who want to assess and improve inclusive language in documents.
The package follows a dictionary-based approach. All results come from a built-in corpus of gendered terms paired with suggested neutral replacements, so every match can be traced back to a specific dictionary entry.
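To show what "traced back to a specific dictionary entry" means in practice, the base R sketch below looks up single words in a small data frame. The data frame `dict` is a hypothetical four-row stand-in with the same `gendered` and `neutral` columns as `gender_dictionary`, and using `match()` for the lookup is an assumption for illustration, not a description of how gendertext matches internally.

```r
# Hypothetical stand-in with the same two columns as gender_dictionary.
# These four entries are illustrative only.
dict <- data.frame(
  gendered = c("chairman", "fireman", "policeman", "actress"),
  neutral  = c("chair", "firefighter", "police officer", "actor"),
  stringsAsFactors = FALSE
)

# Lower-case each token and look it up: match() returns the row number of
# the dictionary entry, or NA when the token is not in the dictionary.
tokens <- c("The", "chairman", "thanked", "the", "actress")
entry  <- match(tolower(tokens), dict$gendered)
hit    <- !is.na(entry)

# Each hit carries the row it came from, so it can be traced to its entry.
trace <- data.frame(
  token   = tokens[hit],
  row     = entry[hit],
  neutral = dict$neutral[entry[hit]],
  stringsAsFactors = FALSE
)
trace
# chairman -> row 1 (chair), actress -> row 4 (actor)
```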
The package ships with gender_dictionary, a curated
dictionary of 208 gendered words and phrases. It covers occupational
titles, pronouns, forms of address, family terms, and common idioms,
informed by the United Nations guidelines for gender-inclusive language
and the European Parliament guidance on gender-neutral language.
library(gendertext)
data(gender_dictionary)
head(gender_dictionary, 10)
#> gendered neutral
#> 1 actress actor
#> 2 airman aviator
#> 3 airmen aviators
#> 4 alderman council member
#> 5 alumna graduate
#> 6 alumni graduates
#> 7 alumnus graduate
#> 8 anchorman news anchor
#> 9 assemblyman assembly member
#> 10 authoress author
nrow(gender_dictionary)
#> [1] 208
The simplest way to use gendertext is to score a character string with gender_score(). The result reports how many tokens the text contains (total_units), how many of them are gendered according to the dictionary (gendered_units), how many are not (neutral_units), and the corresponding percentages.
gender_score(
text = "Ladies and gentlemen, the chairman said he will call the policeman."
)
#> total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1 11 6 5 54.54545 45.45455
The reported neutral percentage is a proxy: it is the share of tokens not matched by any dictionary entry. Multi-word phrases are matched before single words, and each piece of text is counted at most once, so the phrase “ladies and gentlemen” counts as one match spanning three tokens, never as “ladies” plus “gentlemen” on top of the phrase.
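The counting rule can be sketched in base R as greedy longest-match-first matching over tokens. This is a hypothetical illustration, not the gendertext implementation: the tokenizer (lower-casing, then splitting on non-letters), the six-entry `dict` and its neutral values, and the helper names `tokenize()` and `match_phrases()` are all assumptions. On the example sentence above, the sketch finds 11 tokens, of which 6 are gendered, spread over 4 matches.

```r
# Hypothetical dictionary; the neutral values are illustrative only.
dict <- data.frame(
  gendered = c("ladies and gentlemen", "ladies", "gentlemen",
               "chairman", "he", "policeman"),
  neutral  = c("everyone", "people", "people",
               "chair", "they", "police officer"),
  stringsAsFactors = FALSE
)

# Assumed tokenizer: lower-case, then split on any run of non-letters.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z]+")[[1]]
  tokens[tokens != ""]
}

match_phrases <- function(text, dict) {
  tokens  <- tokenize(text)
  entries <- lapply(dict$gendered, tokenize)
  # Try longer entries first so a phrase wins over its component words.
  entries <- entries[order(lengths(entries), decreasing = TRUE)]
  matched <- integer(0)  # number of tokens spanned by each accepted match
  i <- 1
  while (i <= length(tokens)) {
    hit <- 0
    for (e in entries) {
      n <- length(e)
      if (i + n - 1 <= length(tokens) &&
          identical(tokens[i:(i + n - 1)], e)) {
        hit <- n
        break
      }
    }
    if (hit > 0) {
      matched <- c(matched, hit)
      i <- i + hit  # consumed tokens are never examined again
    } else {
      i <- i + 1
    }
  }
  total    <- length(tokens)
  gendered <- sum(matched)
  list(
    total_units      = total,
    gendered_units   = gendered,
    neutral_units    = total - gendered,
    matches          = length(matched),
    gendered_percent = 100 * gendered / total,
    neutral_percent  = 100 * (total - gendered) / total
  )
}

res <- match_phrases(
  "Ladies and gentlemen, the chairman said he will call the policeman.",
  dict
)
res$gendered_percent  # 600 / 11, about 54.54545
```

Because the cursor jumps past every accepted match, “ladies” and “gentlemen” never get a second chance once the phrase has claimed them.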
If you only need the number of dictionary matches rather than the number of matched tokens, set
unit = "matches" in gender_score().
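The two units differ only in how the same set of matches is tallied: by the tokens each match spans, or by the number of matches. The helper below is hypothetical and not part of gendertext. Only the value `"matches"` is documented; the name `"tokens"` for the token-based count is an assumption made for this sketch.

```r
# Hypothetical helper: summarise the same matches in two ways.
# match_lengths holds the number of tokens each dictionary match spans.
count_units <- function(match_lengths, unit = c("tokens", "matches")) {
  unit <- match.arg(unit)  # "tokens" is an assumed name for the default
  if (unit == "tokens") sum(match_lengths) else length(match_lengths)
}

# Matches in the example sentence: "ladies and gentlemen" (3 tokens),
# then "chairman", "he" and "policeman" (1 token each).
spans <- c(3L, 1L, 1L, 1L)
count_units(spans, unit = "tokens")   # 6
count_units(spans, unit = "matches")  # 4
```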
gender_suggestions() returns the gendered terms found in
a text together with the suggested neutral replacement for each one and the number of times it occurs.
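A suggestion table of this kind is essentially a frequency count joined back to the dictionary. The base R sketch below builds one with the same column names that gender_suggestions() prints (`gendered`, `suggested_neutral`, `count`). The function `suggest()`, the small `dict`, and the vector of already-matched terms are hypothetical, and sorting by count and then alphabetically is an assumption for this sketch.

```r
# Hypothetical dictionary and a vector of terms that were already matched.
dict <- data.frame(
  gendered = c("his", "her", "actress", "chairman"),
  neutral  = c("their", "them", "actor", "chair"),
  stringsAsFactors = FALSE
)
matched_terms <- c("his", "her", "his", "chairman", "actress", "his", "her")

suggest <- function(matched_terms, dict) {
  counts <- table(matched_terms)  # one cell per distinct matched term
  out <- data.frame(
    gendered          = names(counts),
    suggested_neutral = dict$neutral[match(names(counts), dict$gendered)],
    count             = as.integer(counts),
    stringsAsFactors  = FALSE
  )
  # Most frequent terms first, ties broken alphabetically.
  out <- out[order(-out$count, out$gendered), ]
  rownames(out) <- NULL
  out
}

suggest(matched_terms, dict)
# his/their 3, her/them 2, actress/actor 1, chairman/chair 1
```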
gender_replace() applies the dictionary to the original
text and returns a rewritten version. Capitalisation of each replacement follows the matched
text: a capitalised word stays capitalised and an all-caps word stays all caps.
gender_replace(
text = "The Chairman called the policeman and the FIREMAN."
)
#> [1] "The Chair called the police officer and the FIREFIGHTER."
Replacement is plain substitution: the function does not adjust the surrounding grammar, so replacing “he” with “they” can leave verb agreement to fix by hand (“he says” becomes “they says”). Treat the output as a draft.
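The case handling can be sketched in base R by copying the capitalisation pattern of each match onto its replacement. The helpers `match_case()` and `replace_term()` and the three-entry `dict` are hypothetical, and the three rules (all caps, capitalised first letter, otherwise unchanged) are an assumption inferred from the example above rather than taken from the gendertext source. Like gender_replace(), the sketch performs plain substitution and ignores grammar.

```r
# Hypothetical helper: give the replacement the capitalisation of the match.
match_case <- function(matched, replacement) {
  first <- substr(matched, 1, 1)
  if (matched == toupper(matched) && matched != tolower(matched)) {
    toupper(replacement)  # "FIREMAN" -> "FIREFIGHTER"
  } else if (first == toupper(first) && first != tolower(first)) {
    # "Chairman" -> "Chair": capitalise only the first letter
    paste0(toupper(substr(replacement, 1, 1)),
           substr(replacement, 2, nchar(replacement)))
  } else {
    replacement           # "policeman" -> "police officer"
  }
}

# Hypothetical helper: replace whole-word, case-insensitive occurrences
# of one dictionary term in a single string.
replace_term <- function(text, gendered, neutral) {
  m <- gregexpr(paste0("\\b", gendered, "\\b"), text,
                ignore.case = TRUE, perl = TRUE)
  found <- regmatches(text, m)[[1]]
  regmatches(text, m) <- list(vapply(found, match_case, character(1),
                                     replacement = neutral,
                                     USE.NAMES = FALSE))
  text
}

dict <- data.frame(
  gendered = c("chairman", "policeman", "fireman"),
  neutral  = c("chair", "police officer", "firefighter"),
  stringsAsFactors = FALSE
)
out <- "The Chairman called the policeman and the FIREMAN."
for (i in seq_len(nrow(dict))) {
  out <- replace_term(out, dict$gendered[i], dict$neutral[i])
}
out
# "The Chair called the police officer and the FIREFIGHTER."
```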
Every function accepts a custom dictionary through the
dictionary argument: a data frame with character columns
gendered and neutral. This makes it easy to
extend, restrict, or fully replace the built-in corpus.
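The three options can be sketched with ordinary data frame operations. To keep the example self-contained, `base_dict` is a hypothetical stand-in for `gender_dictionary`. The entries "foreman"/"supervisor" and "manpower"/"workforce" are illustrative only, as is the validity check `is_valid_dictionary()`, which simply restates the documented requirement of character columns `gendered` and `neutral`.

```r
# Hypothetical stand-in for the built-in corpus (gender_dictionary).
base_dict <- data.frame(
  gendered = c("chairman", "policeman", "he"),
  neutral  = c("chair", "police officer", "they"),
  stringsAsFactors = FALSE
)

# Hypothetical check for the documented shape of the dictionary argument.
is_valid_dictionary <- function(d) {
  is.data.frame(d) &&
    all(c("gendered", "neutral") %in% names(d)) &&
    is.character(d$gendered) && is.character(d$neutral)
}

# Extend: append entries of your own.
extra <- data.frame(gendered = "foreman", neutral = "supervisor",
                    stringsAsFactors = FALSE)
extended <- rbind(base_dict, extra)

# Restrict: drop entries you do not want flagged, here the pronoun "he".
restricted <- base_dict[base_dict$gendered != "he", ]

# Replace: a completely custom dictionary.
custom <- data.frame(gendered = "manpower", neutral = "workforce",
                     stringsAsFactors = FALSE)

sapply(list(extended, restricted, custom), is_valid_dictionary)
# TRUE for all three
```

Any of these data frames could then be passed through the dictionary argument, for example gender_score(text = "The foreman met the chairman.", dictionary = extended). `stringsAsFactors = FALSE` keeps the columns as character on R versions before 4.0.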
Instead of text, the functions also accept a path argument pointing to a file. Plain text
files are read with base R, so no additional packages are required.
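Reading a plain text file with base R takes only a few lines. The sketch below is hypothetical: the vignette does not say which base function gendertext uses, so `readLines()` and the helper name `read_plain_text()` are assumptions. Lines are joined with a space so that a word at the end of one line is not glued to the first word of the next.

```r
# Hypothetical helper: read a plain text file into one string with base R.
read_plain_text <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  paste(lines, collapse = " ")
}

# Write a small temporary file and read it back.
tmp <- tempfile(fileext = ".txt")
writeLines(c("The chairman opened the meeting.",
             "He thanked the actress."), tmp)
content <- read_plain_text(tmp)
content
# "The chairman opened the meeting. He thanked the actress."
unlink(tmp)
```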
txt <- system.file("extdata", "test.txt", package = "gendertext")
gender_score(path = txt)
#> total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1 113 20 93 17.69912 82.30088
head(gender_suggestions(path = txt))
#> gendered suggested_neutral count
#> 1 his their 5
#> 2 her them 3
#> 3 actress actor 1
#> 4 brotherhood community 1
#> 5 businessmen businesspeople 1
#> 6 chairman chair 1
Other document formats, such as PDF and Word, are supported through
the optional readtext package. Install it with
install.packages("readtext").
pdf <- system.file("extdata", "test.pdf", package = "gendertext")
gender_score(path = pdf)
#> total_units gendered_units neutral_units gendered_percent neutral_percent
#> 1 114 20 94 17.54386 82.45614
Please note: PDF analysis works only when the file contains extractable text. Scanned or image-only documents usually have no text layer, so they may yield little or no readable content.
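A reader that uses base R for plain text, loads readtext only when it is needed, and warns when a file yields no text could look like the sketch below. This is a hypothetical illustration, not the gendertext implementation: the helper `read_document()`, dispatching on the file extension, and the warning for empty text are all assumptions. `readtext::readtext()` returns a data frame with one row per document and a `text` column.

```r
# Hypothetical reader: base R for .txt files, the optional readtext package
# for other formats, plus a check for documents with no extractable text.
read_document <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "txt") {
    text <- paste(readLines(path, warn = FALSE), collapse = " ")
  } else {
    # readtext is optional, so fail with an install hint if it is missing.
    if (!requireNamespace("readtext", quietly = TRUE)) {
      stop("Reading .", ext, " files needs the readtext package: ",
           "install.packages(\"readtext\")", call. = FALSE)
    }
    text <- paste(readtext::readtext(path)$text, collapse = " ")
  }
  # Scanned or image-only files typically come back empty.
  if (!nzchar(trimws(text))) {
    warning("No extractable text in ", basename(path),
            "; it may be a scanned or image-only document.", call. = FALSE)
  }
  text
}
```

Loading readtext with `requireNamespace()` inside the branch keeps it an optional dependency: plain text files work without it, and other formats fail with an install hint instead of an obscure error.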
gendertext offers a lightweight and reproducible way to examine gendered language in text. Its transparent, dictionary-based design makes it suitable for research, policy review, editorial work, and exploratory analysis.