A Lightweight and Versatile NLP Toolkit


Documentation for package ‘textpress’ version 1.1.1

Help Pages

abbreviations           Common abbreviations for NLP
dict_generations        Demo dictionary of generation-name variants for NER
dict_political          Demo dictionary of political / partisan term variants for NER
fetch_urls              Fetch URLs from a search engine
fetch_wiki_refs         Fetch external citation URLs from Wikipedia article(s)
fetch_wiki_urls         Fetch Wikipedia page URLs by search query
nlp_cast_tokens         Convert token list to data frame
nlp_index_tokens        Build a BM25 index for ranked keyword search
nlp_roll_chunks         Roll units into fixed-size chunks with optional context
nlp_split_paragraphs    Split text into paragraphs
nlp_split_sentences     Split text into sentences
nlp_tokenize_text       Tokenize text into a clean token stream
read_urls               Read content from URLs
search_dict             Exact phrase / MWE matcher
search_index            Search the BM25 index
search_regex            Search corpus by regex
search_vector           Semantic search by cosine similarity
util_fetch_embeddings   Fetch embeddings from a Hugging Face inference endpoint