lexnlp.extract.common.copyrights package

Submodules

lexnlp.extract.common.copyrights.copyright_en_style_parser module

Copyright extraction for English text using NLTK and its pre-trained maximum entropy classifier.

This module implements basic copyright extraction for English text, relying on NLTK's pre-trained components, including its POS tagger and (fuzzy) named-entity chunkers.

class lexnlp.extract.common.copyrights.copyright_en_style_parser.CopyrightEnStyleParser

Bases: object

copyright_dates_re = regex.Regex('\\d{2,}', flags=regex.V0)

copyright_ptn = '((Copyright\\W\\s*|\\(\\s*[Cc]\\s*\\)\\s*|©)+\\s*(\\d{4}(?:\\s*[-,–]\\s*\\d{4})?)?\\s*(.+))'

copyright_ptn_re = regex.Regex('((Copyright\\W\\s*|\\(\\s*[Cc]\\s*\\)\\s*|©)+\\s*(\\d{4}(?:\\s*[-,–]\\s*\\d{4})?)?\\s*(.+))', flags=regex.V0)
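As a rough illustration of how the groups in copyright_ptn divide a notice, the sketch below applies a transcription of the pattern with Python's standard re module rather than the regex package the class uses. It assumes the second alternative is meant to match a parenthesized "(c)" or "(C)"; the split_notice helper and the sample phrases are hypothetical and not part of LexNLP.

```python
import re

# Transcription of copyright_ptn for the standard "re" module.
# Assumption: the second alternative matches "(c)" / "(C)" with
# escaped parentheses.
COPYRIGHT_PTN = re.compile(
    r'((Copyright\W\s*|\(\s*[Cc]\s*\)\s*|©)+\s*'
    r'(\d{4}(?:\s*[-,–]\s*\d{4})?)?\s*(.+))'
)


def split_notice(sentence):
    """Hypothetical helper: return (marker, years, remainder) or None."""
    match = COPYRIGHT_PTN.search(sentence)
    if match is None:
        return None
    # group(2): last copyright marker matched by the repeated group
    # group(3): optional year or year range directly after the marker
    # group(4): everything that follows on the same line
    return match.group(2), match.group(3), match.group(4)


if __name__ == "__main__":
    for sample in ("Copyright 1996 – 2019, Siemens",
                   "© Siemens 1996 – 2019",
                   "(c) 2020 Acme Corp"):
        print(split_notice(sample))
```

Note that the year group is only filled when the years directly follow the marker. In a phrase such as "© Siemens 1996 – 2019" the years end up in the remainder, which is presumably why the class also keeps a separate, end-anchored year pattern.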

classmethod derive_company_name(ant: lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation, phrase: str) → None

classmethod extract_phrases_with_coords(sentence: str) → List[Tuple[str, int, int]]

static get_copyright(text: str, return_sources=False) → Generator[[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation, None], None]

classmethod get_copyright_annotations(text: str, return_sources=False) → Generator[[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation, None], None]

Find copyright notices in text.

reg_company_name = regex.Regex('[\\p{Lu}]+[\\p{L}\\s]*', flags=regex.V0)

reg_valid_company_name = regex.Regex('\\p{L}[\\p{L}\\s,]+', flags=regex.V0)

classmethod split_copyright_date(ant: lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation) → None

classmethod take_best_company_name(names: List[str]) → str

year_ptn = '(\\d{4}(?:\\s*[-,–]\\s*\\d{4})?)'

year_ptn_re = regex.Regex('(\\d{4}(?:\\s*[-,–]\\s*\\d{4})?)$', flags=regex.V0)
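The difference between year_ptn and year_ptn_re is the trailing "$", which restricts the compiled pattern to a year or year range at the end of the string. The following minimal sketch shows the effect using Python's standard re module instead of the regex package; the helper names all_years and trailing_years are hypothetical and do not exist in LexNLP.

```python
import re

# Same pattern text as year_ptn: a four-digit year, optionally followed
# by a hyphen, comma or en dash and a second four-digit year.
YEAR_PTN = r'(\d{4}(?:\s*[-,–]\s*\d{4})?)'

# Like year_ptn_re: the added "$" only accepts years that end the string.
YEAR_PTN_RE = re.compile(YEAR_PTN + '$')


def all_years(phrase):
    """Hypothetical helper: every year or year range found in the phrase."""
    return re.findall(YEAR_PTN, phrase)


def trailing_years(phrase):
    """Hypothetical helper: the year or range ending the phrase, else None."""
    match = YEAR_PTN_RE.search(phrase)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(all_years("1996 – 2019, Siemens 2021"))
    print(trailing_years("Siemens 1996 – 2019"))
    print(trailing_years("1996 – 2019, Siemens"))
```

The unanchored form is useful for locating years anywhere in a phrase, while the anchored form suits notices where the years come last.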

lexnlp.extract.common.copyrights.copyright_parser module

class lexnlp.extract.common.copyrights.copyright_parser.CopyrightParser(parsing_functions: List[Callable[str, List[lexnlp.extract.common.pattern_found.PatternFound]]], split_params: lexnlp.utils.lines_processing.line_processor.LineSplitParams)

Bases: lexnlp.extract.common.text_pattern_collector.TextPatternCollector

get_annotations_as_dictionaries() → List[dict]

make_annotation_from_pattrn(locale: str, ptrn: lexnlp.extract.common.pattern_found.PatternFound, phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → lexnlp.extract.common.annotations.text_annotation.TextAnnotation

lexnlp.extract.common.copyrights.copyright_parsing_methods module

class lexnlp.extract.common.copyrights.copyright_parsing_methods.CopyrightParsingMethods

Bases: object

get_company_name_from_match(text: str, company_search_options: str, years: List[Tuple[int, int, int]]) → str

init_regexes()

init_trigger_words()

match_c_years_word(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Parameters:	phrase – input phrase, for example “Copyright 1996 – 2019, Siemens”
Returns:	matches of the form {name: ‘1996 – 2019, Siemens’, probability: 100, …}

match_word_c_years(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Parameters:	phrase – input phrase, for example “© Siemens 1996 – 2019”
Returns:	matches of the form {name: ‘© Siemens 1996 – 2019’, probability: 100, …}

pre_process_found_matches(matches: List[lexnlp.extract.common.pattern_found.PatternFound], company_search_options: str) → List[lexnlp.extract.common.copyrights.copyright_pattern_found.CopyrightPatternFound]

lexnlp.extract.common.copyrights.copyright_pattern_found module

class lexnlp.extract.common.copyrights.copyright_pattern_found.CopyrightPatternFound(ptrn: lexnlp.extract.common.pattern_found.PatternFound = None)

Bases: lexnlp.extract.common.pattern_found.PatternFound

get_detalization_level(text: str) → int

get_length() → int

pattern_worse_than_target(p, text: str) → bool

Check which pattern is better when two patterns are considered duplicates. “text” may be used in derived classes.

reg_uppercase = regex.Regex('[\\p{Lu}]+', flags=regex.V0)

Module contents