lexnlp.extract.common package

Subpackages

Submodules

lexnlp.extract.common.annotation_locator_type module

class lexnlp.extract.common.annotation_locator_type.AnnotationLocatorType

Bases: enum.Enum

An enumeration.

MlWordVectorBased = 2
RegexpBased = 1
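
A minimal usage sketch for choosing a locator strategy (the branch below is illustrative, not taken from this section):

from lexnlp.extract.common.annotation_locator_type import AnnotationLocatorType

locator = AnnotationLocatorType.RegexpBased
if locator == AnnotationLocatorType.RegexpBased:
    print('using the regular-expression based locator')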

lexnlp.extract.common.annotation_type module

class lexnlp.extract.common.annotation_type.AnnotationType

Bases: enum.Enum

An enumeration.

act = 1
amount = 2
citation = 3
condition = 4
constraint = 5
copyright = 6
court = 7
court_citation = 8
cusip = 9
date = 10
definition = 11
distance = 12
duration = 13
geoentity = 14
laws = 24
money = 15
percent = 16
phone = 18
pii = 17
ratio = 20
regulation = 21
ssn = 19
trademark = 22
url = 23

lexnlp.extract.common.base_path module

lexnlp.extract.common.dates module

lexnlp.extract.common.fact_extracting module

lexnlp.extract.common.language_dictionary_reader module

class lexnlp.extract.common.language_dictionary_reader.LanguageDictionaryReader

Bases: object

This class reads text files in which values are separated by line breaks, strips the values if needed, and returns them as a List or Dict.

We use this class, e.g., when reading the German-locale (“De”) common abbreviations.

static read_str_set(file_path: str, encoding='utf8', strip_symbols=' ') → Set[str]
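
A short usage sketch: write a small dictionary file, then read it back as a set of stripped values (the file name here is made up for illustration):

from pathlib import Path
from lexnlp.extract.common.language_dictionary_reader import LanguageDictionaryReader

Path('abbreviations.txt').write_text(' z. B.\nd. h. \nusw.\n', encoding='utf8')
values = LanguageDictionaryReader.read_str_set('abbreviations.txt', strip_symbols=' ')
# values == {'z. B.', 'd. h.', 'usw.'} (each line stripped of the given symbols)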

lexnlp.extract.common.pattern_found module

class lexnlp.extract.common.pattern_found.PatternFound

Bases: object

Used inside EsDefinitionsParser and SpanishParsingMethods to store intermediate parsing results.

pattern_worse_than_target(p, text: str) → bool

Checks which of the two patterns is better when they are considered duplicates; “text” may be used in derived classes.
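
A hedged sketch of how a derived class might override the check; the start/end coordinate attributes used below are assumptions for illustration, not documented above:

from lexnlp.extract.common.pattern_found import PatternFound

class LongestPatternWins(PatternFound):
    def pattern_worse_than_target(self, p, text: str) -> bool:
        # prefer the longer of two duplicated matches
        # (self.start / self.end are assumed here)
        return (self.end - self.start) < (p.end - p.start)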

lexnlp.extract.common.special_characters module

class lexnlp.extract.common.special_characters.SpecialCharacters

Bases: object

punctuation = {'!', '"', '$', '%', '&', "'", '(', ')', '*', ',', '-', '.', '/', ':', ';', '?', '@', '[', '\\', ']', '^', '{', '}'}
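
A small sketch that uses the class-level set to drop the listed punctuation from a token:

from lexnlp.extract.common.special_characters import SpecialCharacters

token = '(hello!)'
cleaned = ''.join(c for c in token if c not in SpecialCharacters.punctuation)
# cleaned == 'hello'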

lexnlp.extract.common.text_beautifier module

class lexnlp.extract.common.text_beautifier.TextBeautifier

Bases: object

APOS_SEPARATORS = {'\t', ' ', '(', ')', ',', '.', ';', '[', ']', '{', '}'}
BRACES_C = {')', ']', '}'}
BRACES_O = {'(', '[', '{'}
BRACE_CL_BY_OP = {'(': ')', '[': ']', '{': '}'}
PAIR_BRACES = {'""', "''", '()', '[]', '``', '{}', '“”'}
PROPER_CLOSE_QUOTE = {'"': '"', '“': '”'}
QUOTES = {'"', '“', '”'}
TRANSFORMED_WORDS = {"''": ['"', '``', '“', '”'], '(': ['(', '[', '{'], ')': [')', ']', '}'], ':': [':', ';', '|'], '``': ['"', '``', '“', '”']}
static find_pair_among_apostrophe(text: str, apos_coords: List[int], quote: Tuple[str, int]) → int
static find_transformed_word(txt: str, word: str, offset: int) → Optional[Tuple[str, int]]

Searches for a transformed word in the text; returns the transformed word with its start position.

static lstrip_string_coords(text: str, start: int, end: int, trim_symbols: Optional[str] = None) → Tuple[str, int, int]
static normalize_smb_preserve_len(text: str) → str

Normalize some of the string characters, preserving the original length.
Parameters:
  • text – string to normalize
Returns:

normalized string
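
A sketch that relies only on the documented length-preserving contract (the sample string is arbitrary):

from lexnlp.extract.common.text_beautifier import TextBeautifier

src = '“Software” license'
out = TextBeautifier.normalize_smb_preserve_len(src)
assert len(out) == len(src)  # the length is preserved by contract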

static rstrip_string_coords(text: str, start: int, end: int, trim_symbols: Optional[str] = None) → Tuple[str, int, int]
static strip_pair_symbols(term_coords: Union[str, Tuple[str, int, int]]) → Union[str, Tuple[str, int, int]]
static strip_string_coords(text: str, start: int, end: int, trim_symbols: Optional[str] = None) → Tuple[str, int, int]
static unify_quotes_braces(text: str, empty_replacement: str = '') → str
static unify_quotes_braces_coords(text: str, start: int, end: int, empty_replacement: str = '') → Tuple[str, int, int]
static unify_quotes_braces_unsafe(text: str, start: int, end: int, empty_replacement: str = '') → Tuple[str, int, int]
Parameters:
  • text – source text to “beautify”
  • start – start coordinate of the text
  • end – end coordinate of the text
  • empty_replacement – replace unbalanced braces / quotes with this substring
Returns:

str with all quotes and braces replaced with their “normal” forms
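
A usage sketch for the coordinate-tracking variant; the sample string is arbitrary, and the expected behavior is paraphrased from the parameter descriptions above:

from lexnlp.extract.common.text_beautifier import TextBeautifier

text, start, end = TextBeautifier.unify_quotes_braces_coords(
    '“Agreement” (the {Contract]', 0, 27, empty_replacement='')
# quotes and braces come back in their "normal" forms; unbalanced
# braces / quotes are replaced with empty_replacement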

lexnlp.extract.common.text_pattern_collector module

class lexnlp.extract.common.text_pattern_collector.TextPatternCollector(parsing_functions: List[Callable[str, List[lexnlp.extract.common.pattern_found.PatternFound]]], split_params: lexnlp.utils.lines_processing.line_processor.LineSplitParams)

Bases: object

Collects pattern matches from text; e.g., the derived EsDefinitionsParser searches for definitions in text according to the rules of Spanish. See the “parse” method.

basic_line_processor = <lexnlp.utils.lines_processing.line_processor.LineProcessor object>

choose_best_matches(matches: List[lexnlp.extract.common.pattern_found.PatternFound]) → List[lexnlp.extract.common.pattern_found.PatternFound]
choose_more_precise_matches(matches: List[lexnlp.extract.common.pattern_found.PatternFound], text: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Looks for matches “consumed” by other matches and keeps only the consuming matches.

static estimate_match_quality(match: lexnlp.extract.common.pattern_found.PatternFound) → int
make_annotation_from_pattrn(locale: str, ptrn: lexnlp.extract.common.pattern_found.PatternFound, phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → lexnlp.extract.common.annotations.text_annotation.TextAnnotation
parse(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.text_annotation.TextAnnotation]
Parameters:
  • locale – ‘En’, ‘De’, ‘Es’, …
  • text – source text, e.g., “En este acuerdo, el término “Software” se refiere a: (i) el programa informático”
Returns:

{
  "attrs": {"start": 28, "end": 82},
  "tags": {
    "Extracted Entity Type": "definition",
    "Extracted Entity Definition Name": "Software",
    "Extracted Entity Text": "“Software” se refiere a: (i) el programa informático"
  }
}
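
A hedged calling sketch, assuming “collector” is an already-constructed, locale-specific TextPatternCollector (e.g., a Spanish definitions parser):

annotations = collector.parse(
    'En este acuerdo, el término “Software” se refiere a: (i) el programa informático',
    locale='Es')
for ann in annotations:
    print(ann)  # each item is a TextAnnotation as described above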

remove_prohibited_words(matches: List[lexnlp.extract.common.pattern_found.PatternFound]) → List[lexnlp.extract.common.pattern_found.PatternFound]

lexnlp.extract.common.universal_court_parser module

class lexnlp.extract.common.universal_court_parser.MatchFound(subset, entry_start: int, entry_end: int, text: str)

Bases: object

make_sort_key()
class lexnlp.extract.common.universal_court_parser.ParserInitParams

Bases: object

UniversalCourtsParser initialization parameters

class lexnlp.extract.common.universal_court_parser.UniversalCourtsParser(ptrs: lexnlp.extract.common.universal_court_parser.ParserInitParams)

Bases: object

The class describes a “constructor” for building locale- (and region-) specific parsers that find references to courts within the text.

Use the parse() method to find all references to courts in the provided text. Each reference is a dictionary with two keys:
  • “attrs” – the “coordinates” (starting and ending characters) of the occurrence within the provided text
  • “tags” – another dictionary, which contains the court’s official name, the court’s jurisdiction, …

To parse the text, create your locale- (or region-) specific instance of UniversalCourtsParser; see the constructor signature above and the sketch below:
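
A hedged construction sketch; ParserInitParams is assumed to be filled with locale-specific settings not documented in this section, and the CSV path is hypothetical:

from lexnlp.extract.common.universal_court_parser import ParserInitParams, UniversalCourtsParser

ptrs = ParserInitParams()
# ... set locale-specific parameters on ptrs here ...
processor = UniversalCourtsParser(ptrs)
processor.load_courts(['data/de_courts.csv'])  # hypothetical dataframe path
annotations = processor.parse('...')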

add_annotation(match: lexnlp.extract.common.universal_court_parser.MatchFound)
find_court_by_any_key(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase)
find_court_by_key_column(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase, phrase_finder: lexnlp.utils.lines_processing.phrase_finder.PhraseFinder, column: str) → Tuple[lexnlp.extract.common.universal_court_parser.MatchFound, List[Tuple[str, int, int]]]
find_court_by_name(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → List[lexnlp.extract.common.universal_court_parser.MatchFound]
find_court_by_type_and_jurisdiction(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → List[lexnlp.extract.common.universal_court_parser.MatchFound]
find_courts_by_alias_in_whole_text(text: str) → None
static get_unique_col_values(col_values)
load_courts(dataframe_paths: List[str])
parse(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.court_annotation.CourtAnnotation]
Parameters:
  • text – the text being processed
  • locale – ‘En’, ‘Es’, …
Returns:

annotations - List[dict]

Here is an example of the method’s call:

ret = processor.parse("Bei dir läuft, deine Verfassungsgerichtshof des Freistaates Sachsen rauchen Joints vor der Kamera")

ret[0]['attrs'] = {'start': 14, 'end': 97}
ret[0]['tags'] = {'Extracted Entity Type': 'court',
                  'Extracted Entity Court Name': 'Verfassungsgerichtshof des Freistaates Sachsen',
                  'Extracted Entity Court Type': 'Verfassungsgericht',
                  'Extracted Entity Court Jurisdiction': 'Sachsen'}

lexnlp.extract.common.year_parser module

class lexnlp.extract.common.year_parser.YearParser

Bases: object

Finds years in the provided string.

check_year_ok(year: int, min_year: int = 1800, max_year=0)
get_years_with_coords_from_string(text: str, min_year: int = 1800, max_year=0) → List[Tuple[int, int, int]]
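
A short usage sketch; reading each tuple as (year, start, end) is an assumption based on the List[Tuple[int, int, int]] return annotation:

from lexnlp.extract.common.year_parser import YearParser

parser = YearParser()
found = parser.get_years_with_coords_from_string('signed in 1999, renewed in 2005')
for year, start, end in found:  # assumed tuple layout
    print(year, start, end)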

Module contents