lexnlp.nlp.en.segments.pages: Segmenting pages in text
The lexnlp.nlp.en.segments.pages module contains methods for segmenting text into zero or more pages.
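The segmentation step can be illustrated with a minimal sketch, assuming a page-break classifier has already flagged the line indices where new pages begin. The helper `split_into_pages` is hypothetical and is not part of LexNLP; it only shows how flagged lines turn into zero or more page strings.

```python
from typing import Iterable, Iterator, List


def split_into_pages(lines: List[str], break_ids: Iterable[int]) -> Iterator[str]:
    """Yield pages, starting a new page at each line index in break_ids.

    Hypothetical helper: in practice the indices would come from a
    page-break classifier; here they are supplied directly.
    """
    # Ignore a break at index 0 (it would create an empty first page)
    # and any index past the last line.
    breaks = sorted({i for i in break_ids if 0 < i < len(lines)})
    start = 0
    for b in breaks:
        yield "\n".join(lines[start:b])
        start = b
    # Emit the trailing page only if lines remain, so empty input yields zero pages.
    if start < len(lines):
        yield "\n".join(lines[start:])
```

With `lines = ["a", "b", "c", "d"]` and a break at index 2, the sketch yields `"a\nb"` and `"c\nd"`; an empty list of lines yields no pages at all.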
Attention
The sections below are a work in progress. Thank you for your patience while we continue to expand and improve our documentation coverage.
If you have any questions in the meantime, please log an issue on GitHub or contact us by email:
- GitHub issues: https://github.com/LexPredict/lexpredict-lexnlp
- Email: support@contraxsuite.com
lexnlp.nlp.en.segments.pages Module
Page segmentation for English.
This module implements page segmentation in English using simple machine learning classifiers.
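Classifiers like these typically consume one fixed-length feature vector per line. The sketch below, loosely modeled on the idea behind `build_page_break_features`, concatenates simple features from a window of lines around `line_id`. The feature choices, the helper names `line_features` and `window_features`, and the zero padding are assumptions for illustration, not LexNLP's actual features.

```python
from typing import List


def line_features(line: str) -> List[float]:
    """Hypothetical per-line features: length, blank flag, digit share, page-number-like flag."""
    stripped = line.strip()
    n = len(stripped)
    digits = sum(ch.isdigit() for ch in stripped)
    return [
        float(n),
        1.0 if n == 0 else 0.0,
        digits / n if n else 0.0,
        1.0 if stripped.isdigit() else 0.0,
    ]


def window_features(lines: List[str], line_id: int,
                    window_pre: int = 1, window_post: int = 1) -> List[float]:
    """Concatenate features for lines line_id - window_pre .. line_id + window_post.

    Positions outside the document are padded with zeros so that every
    vector has the same length.
    """
    width = len(line_features(""))
    vector: List[float] = []
    for i in range(line_id - window_pre, line_id + window_post + 1):
        if 0 <= i < len(lines):
            vector.extend(line_features(lines[i]))
        else:
            vector.extend([0.0] * width)
    return vector
```

Stacking one such vector per line yields a matrix that a classifier, such as a decision tree, could be trained on to predict page breaks.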
- Todo:
  - Standardize model (re-)generation
Functions
build_document_distribution (text[, …]) | Build a document character distribution over a fixed character set, optionally normalizing it.
build_page_break_features (lines, line_id, …) | Build a feature vector for a given line ID with the given parameters.
get_pages (text[, window_pre, window_post, …]) | Get pages from text.
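A document character distribution can be sketched as per-character counts over a fixed character set, optionally normalized so the values sum to 1. The function name `char_distribution`, the default use of `string.printable` as the character set, and the dictionary return type are assumptions; the real `build_document_distribution` may use a different character set and output format.

```python
import string
from typing import Dict


def char_distribution(text: str, charset: str = string.printable,
                      norm: bool = True) -> Dict[str, float]:
    """Count each character of a fixed charset in text, optionally normalizing.

    Characters outside the charset are ignored. Hypothetical helper for
    illustration only.
    """
    counts = {ch: 0.0 for ch in charset}
    for ch in text:
        if ch in counts:
            counts[ch] += 1.0
    if norm:
        total = sum(counts.values())
        # Skip normalization for empty (or fully out-of-charset) text to avoid dividing by zero.
        if total > 0:
            counts = {ch: c / total for ch, c in counts.items()}
    return counts
```

For example, `char_distribution("aab", norm=False)` maps `"a"` to 2.0 and `"b"` to 1.0; with `norm=True` those become 2/3 and 1/3. Normalizing makes distributions from documents of different lengths comparable.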
Variables
MODULE_PATH | A string (str).
PAGE_SEGMENTER_MODEL | A decision tree classifier.