| abbreviations | Common abbreviations for NLP |
| dict_generations | Demo dictionary of generation-name variants for NER |
| dict_political | Demo dictionary of political / partisan term variants for NER |
| fetch_urls | Fetch URLs from a search engine |
| fetch_wiki_refs | Fetch external citation URLs from Wikipedia article(s) |
| fetch_wiki_urls | Fetch Wikipedia page URLs by search query |
| nlp_cast_tokens | Convert a token list to a data frame |
| nlp_index_tokens | Build a BM25 index for ranked keyword search |
| nlp_roll_chunks | Roll units into fixed-size chunks with optional context |
| nlp_split_paragraphs | Split text into paragraphs |
| nlp_split_sentences | Split text into sentences |
| nlp_tokenize_text | Tokenize text into a clean token stream |
| read_urls | Read content from URLs |
| search_dict | Exact phrase / multiword expression (MWE) matcher |
| search_index | Search the BM25 index |
| search_regex | Search corpus by regex |
| search_vector | Semantic search by cosine similarity |
| util_fetch_embeddings | Fetch embeddings from a Hugging Face inference endpoint |
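
As a rough illustration of what "semantic search by cosine similarity" (the `search_vector` row) involves, the Python sketch below ranks a few toy embedding vectors against a query vector. It is not the package's implementation or API: the helper names `cosine_similarity` and `rank_by_cosine`, the `top_n` parameter, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat its similarity as 0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_cosine(query, corpus, top_n=3):
    """Return the top_n (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy 3-dimensional "embeddings"; real embeddings would come from a model.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

results = rank_by_cosine(query, corpus, top_n=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

In a real workflow the vectors would presumably come from an embedding step such as `util_fetch_embeddings`, with query and documents embedded by the same model so that their similarities are comparable.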