lexnlp.nlp.en.segments.paragraphs: Segmenting paragraphs in text

The lexnlp.nlp.en.segments.paragraphs module contains methods for segmenting text into zero or more paragraphs.

Attention

The sections below are a work in progress. Thank you for your patience while we continue to expand and improve our documentation coverage.

If you have any questions in the meantime, please feel free to log issues on GitHub or contact us by email.

lexnlp.nlp.en.segments.paragraphs Module

Paragraph segmentation for English.

This module implements paragraph segmentation for English text using simple machine learning classifiers.

Todo:
  • Standardize model (re-)generation

Functions

build_document_line_distribution(text[, …]) Build document and line character distributions for section segmenting based on a fixed character set, optionally normalizing the vector.
build_paragraph_break_features(lines, …[, …]) Build a feature vector for a given line ID with given parameters.
get_paragraphs(text[, window_pre, …]) Get paragraphs.
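
As a rough illustration of what a character distribution over a fixed character set looks like, the sketch below counts how often each character of string.printable occurs in a piece of text and optionally normalizes the counts so they sum to 1. The helper name char_distribution and its parameters are hypothetical; this is a minimal sketch of the idea, not the LexNLP implementation.

```python
import string
from collections import Counter


def char_distribution(text, characters=string.printable, norm=True):
    """Hypothetical helper (not part of LexNLP): return one value per
    character in `characters`, giving its count or relative frequency."""
    counts = Counter(text)
    vector = [counts.get(c, 0) for c in characters]
    if norm:
        total = sum(vector)
        # Leave an all-zero vector untouched to avoid dividing by zero.
        if total > 0:
            vector = [v / total for v in vector]
    return vector


line = "Section 1. Definitions"
vec = char_distribution(line)
```

The resulting vector has one entry per character in the fixed set, so lines of different lengths map to vectors of the same size, which is what a classifier needs as input.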

Variables

MODULE_PATH str(object='') -> str
PARAGRAPH_SEGMENTER_MODEL A decision tree classifier.
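
The function and variable summaries above suggest an approach in which each line is described by features drawn from a window of surrounding lines and a classifier then decides whether a paragraph break occurs there. The sketch below illustrates that general idea under stated assumptions: window_features, is_paragraph_start, and split_paragraphs are hypothetical names, the features (line length and a blank-line flag) are chosen only for illustration, and a hand-written rule stands in for the trained decision tree. It does not reproduce the behavior of PARAGRAPH_SEGMENTER_MODEL.

```python
def window_features(lines, line_id, window_pre=3, window_post=3):
    """Hypothetical features: (stripped length, is-blank flag) for every
    line in the window around line_id. Positions outside the document
    are treated as blank lines."""
    features = []
    for i in range(line_id - window_pre, line_id + window_post + 1):
        if 0 <= i < len(lines):
            stripped = lines[i].strip()
            features.extend([len(stripped), int(not stripped)])
        else:
            features.extend([0, 1])
    return features


def is_paragraph_start(lines, line_id):
    """Stand-in rule used here instead of a trained classifier: a
    non-blank line whose previous line is blank or missing."""
    feats = window_features(lines, line_id, window_pre=1, window_post=0)
    prev_blank, cur_blank = feats[1], feats[3]
    return bool(prev_blank) and not cur_blank


def split_paragraphs(text):
    """Group non-blank lines into paragraphs using the rule above."""
    lines = text.splitlines()
    paragraphs, current = [], []
    for i, line in enumerate(lines):
        if is_paragraph_start(lines, i) and current:
            paragraphs.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sample = "First line\nstill first.\n\nSecond paragraph.\n\n\nThird."
paragraphs = split_paragraphs(sample)
```

A trained classifier would replace is_paragraph_start and could use richer features, such as character distributions of the lines in the window, rather than a single blank-line check.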