| abbreviations | Common Abbreviations for Linguistic Processing |
| dict_generations | Demo dictionary of generation-name variants for NER |
| dict_political | Demo dictionary of political / partisan term variants for NER |
| fetch_urls | Fetch URLs from a search engine |
| fetch_wiki_refs | Fetch external citation URLs from Wikipedia |
| fetch_wiki_urls | Fetch Wikipedia page URLs by search query |
| get_search_urls | Get the search URL(s) used by `fetch_urls` (for debugging or browser use) |
| nlp_cast_tokens | Convert Token List to Data Frame |
| nlp_index_tokens | Create a BM25 Search Index |
| nlp_roll_chunks | Roll units into fixed-size chunks with optional context |
| nlp_split_paragraphs | Split Text into Paragraphs |
| nlp_split_sentences | Split Text into Sentences |
| nlp_tokenize_text | Tokenize Text Data (mostly) Non-Destructively |
| read_urls | Read content from URLs |
| search_dict | Exact n-gram matcher (vector of terms) |
| search_index | Search the BM25 Index |
| search_regex | Search corpus via regex |
| search_vector | Vector search by cosine similarity |
| util_fetch_embeddings | Fetch embeddings (Hugging Face utility) |
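
The index pairs `nlp_index_tokens` (build a BM25 index) with `search_index` (query it). For readers unfamiliar with BM25, the following is a minimal Python sketch of the general technique, not the package's implementation. The function names `build_bm25_index` and `search_bm25`, the dictionary layout, and the defaults `k1=1.2` and `b=0.75` are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_bm25_index(docs):
    """Build a tiny BM25 index (hypothetical helper).

    docs: dict mapping doc_id -> list of tokens.
    """
    term_freqs = {doc_id: Counter(tokens) for doc_id, tokens in docs.items()}
    doc_lens = {doc_id: len(tokens) for doc_id, tokens in docs.items()}
    avg_len = sum(doc_lens.values()) / len(doc_lens)

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tf in term_freqs.values():
        doc_freq.update(tf.keys())

    return {
        "tf": term_freqs,
        "len": doc_lens,
        "avg_len": avg_len,
        "df": doc_freq,
        "n": len(docs),
    }


def search_bm25(index, query_tokens, k1=1.2, b=0.75):
    """Score documents against the query; return (doc_id, score) pairs, best first."""
    scores = {}
    for doc_id, tf in index["tf"].items():
        score = 0.0
        for term in query_tokens:
            f = tf.get(term, 0)
            if f == 0:
                continue
            df = index["df"][term]
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (index["n"] - df + 0.5) / (df + 0.5))
            # Length normalisation: longer-than-average documents are penalised.
            norm = f + k1 * (1 - b + b * index["len"][doc_id] / index["avg_len"])
            score += idf * f * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus, already tokenized (made-up data for illustration).
docs = {
    "d1": ["bm25", "ranks", "documents"],
    "d2": ["cosine", "similarity", "ranks", "vectors"],
    "d3": ["bm25", "bm25", "index"],
}
index = build_bm25_index(docs)
results = search_bm25(index, ["bm25"])
```

Here `d3` ranks above `d1` because it contains the query term twice at the same document length, and `d2` is omitted because it never mentions the term.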