lexnlp.nlp.en.segments.sentences: Segmenting text into sentences

The lexnlp.nlp.en.segments.sentences module contains methods for segmenting text into zero or more sentences.

Attention

The sections below are a work in progress. Thank you for your patience while we continue to expand and improve our documentation coverage.

If you have any questions in the meantime, please log an issue on GitHub or contact us by email.

lexnlp.nlp.en.segments.sentences Module

Sentence segmentation for English.

This module implements sentence segmentation in English using simple machine learning classifiers.

Todo:
  • Standardize model (re-)generation

Functions

build_sentence_model(text[, extra_abbrevs])
  Build a sentence model from text with optional extra abbreviations to include.
get_sentence_list(text)
  Get sentences from text.
get_sentence_span_list(…)
  Given a text, returns a list of the (start, end) spans of sentences in the text.
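
The relationship between a span list and a sentence list can be illustrated with a minimal, standard-library sketch. This is not the LexNLP implementation: the names `sketch_sentence_spans` and `sketch_sentence_list` are hypothetical, and a simple regular expression stands in for the trained model. The sketch only shows how (start, end) offsets index back into the original text.

```python
import re
from typing import List, Tuple

# Assumption: a naive boundary rule (., ! or ? followed by whitespace and an
# uppercase letter) stands in for the trained segmenter.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def sketch_sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets, one pair per sentence."""
    spans = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # The sentence ends where the separating whitespace begins.
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def sketch_sentence_list(text: str) -> List[str]:
    """Return the sentences themselves by slicing the text with each span."""
    return [text[s:e] for s, e in sketch_sentence_spans(text)]


sample = "This Agreement is effective today. The parties agree."
for (begin, end), sentence in zip(sketch_sentence_spans(sample),
                                  sketch_sentence_list(sample)):
    print(begin, end, repr(sentence))
```

Spans are useful when later processing needs to map a sentence back to its position in the source document, for example to highlight it or to align it with other annotations.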

Variables

MODULE_PATH
  str(object='') -> str
SENTENCE_SEGMENTER_MODEL
  A sentence tokenizer which uses an unsupervised algorithm to build a model for abbreviation words, collocations, and words that start sentences, and then uses that model to find sentence boundaries.
extra_abbreviations
  list() -> new empty list
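
To see why a sentence model needs to know about abbreviations, consider the toy splitter below. It is a hedged illustration only: `sketch_split` and its `abbreviations` parameter are hypothetical, and a fixed lookup set replaces the unsupervised model described above. A period that follows a known abbreviation is not treated as a sentence boundary.

```python
import re
from typing import Iterable, List


def sketch_split(text: str, abbreviations: Iterable[str] = ()) -> List[str]:
    """Split on periods, skipping periods that end a known abbreviation."""
    # Normalize to lowercase without the trailing period.
    known = {a.lower().rstrip(".") for a in abbreviations}
    sentences = []
    start = 0
    # Candidate boundaries: a period followed by whitespace.
    for match in re.finditer(r"\.\s+", text):
        tokens = text[start:match.start()].split()
        last_token = tokens[-1].lower() if tokens else ""
        if last_token in known:
            continue  # the period belongs to an abbreviation
        sentences.append(text[start:match.start() + 1])
        start = match.end()
    remainder = text[start:].strip()
    if remainder:
        sentences.append(remainder)
    return sentences


sample = "See Sec. 4 of the lease. Payment is due monthly."
print(sketch_split(sample))           # splits after "Sec."
print(sketch_split(sample, ["sec"]))  # keeps "Sec. 4" together
```

Legal text is dense with abbreviations of this kind, which is presumably why the module lets callers supply extra ones when building a model.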