lexnlp.extract.common package
Subpackages
- lexnlp.extract.common.annotations package
  - Submodules
    - lexnlp.extract.common.annotations.act_annotation module
    - lexnlp.extract.common.annotations.amount_annotation module
    - lexnlp.extract.common.annotations.citation_annotation module
    - lexnlp.extract.common.annotations.company_annotation module
    - lexnlp.extract.common.annotations.condition_annotation module
    - lexnlp.extract.common.annotations.constraint_annotation module
    - lexnlp.extract.common.annotations.copyright_annotation module
    - lexnlp.extract.common.annotations.court_annotation module
    - lexnlp.extract.common.annotations.court_citation_annotation module
    - lexnlp.extract.common.annotations.cusip_annotation module
    - lexnlp.extract.common.annotations.date_annotation module
    - lexnlp.extract.common.annotations.definition_annotation module
    - lexnlp.extract.common.annotations.distance_annotation module
    - lexnlp.extract.common.annotations.duration_annotation module
    - lexnlp.extract.common.annotations.geo_annotation module
    - lexnlp.extract.common.annotations.law_annotation module
    - lexnlp.extract.common.annotations.money_annotation module
    - lexnlp.extract.common.annotations.percent_annotation module
    - lexnlp.extract.common.annotations.phone_annotation module
    - lexnlp.extract.common.annotations.phrase_position_finder module
    - lexnlp.extract.common.annotations.ratio_annotation module
    - lexnlp.extract.common.annotations.regulation_annotation module
    - lexnlp.extract.common.annotations.ssn_annotation module
    - lexnlp.extract.common.annotations.text_annotation module
    - lexnlp.extract.common.annotations.trademark_annotation module
    - lexnlp.extract.common.annotations.url_annotation module
  - Module contents
- lexnlp.extract.common.copyrights package
- lexnlp.extract.common.date_parsing package
- lexnlp.extract.common.definitions package
- lexnlp.extract.common.durations package
- lexnlp.extract.common.tests package
  - Submodules
    - lexnlp.extract.common.tests.definitions_text_annotator module
    - lexnlp.extract.common.tests.test_annotation module
    - lexnlp.extract.common.tests.test_datefinder module
    - lexnlp.extract.common.tests.test_datefinder_tokenizer module
    - lexnlp.extract.common.tests.test_dates module
    - lexnlp.extract.common.tests.test_fact_extractor module
    - lexnlp.extract.common.tests.test_phrase_position_finder module
    - lexnlp.extract.common.tests.test_text_beautifier module
    - lexnlp.extract.common.tests.test_universal_courts_parser module
  - Module contents
Submodules
lexnlp.extract.common.annotation_locator_type module
lexnlp.extract.common.annotation_type module
class lexnlp.extract.common.annotation_type.AnnotationType
Bases: enum.Enum
An enumeration of the annotation types, i.e. the kinds of entities that can be extracted.
- act = 1
- amount = 2
- citation = 3
- condition = 4
- constraint = 5
- copyright = 6
- court = 7
- court_citation = 8
- cusip = 9
- date = 10
- definition = 11
- distance = 12
- duration = 13
- geoentity = 14
- laws = 24
- money = 15
- percent = 16
- phone = 18
- pii = 17
- ratio = 20
- regulation = 21
- ssn = 19
- trademark = 22
- url = 23
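The members above are listed alphabetically, so their values are out of order (laws = 24 sits between geoentity = 14 and money = 15). The sketch below shows lookups by value and by name. It defines a local mirror of the enum, with the values copied from this listing, so that it runs without LexNLP installed; with LexNLP available you would import AnnotationType from lexnlp.extract.common.annotation_type instead.

```python
from enum import Enum


class AnnotationType(Enum):
    # Local mirror of lexnlp.extract.common.annotation_type.AnnotationType;
    # values copied from the documented listing, not imported from LexNLP.
    act = 1
    amount = 2
    citation = 3
    condition = 4
    constraint = 5
    copyright = 6
    court = 7
    court_citation = 8
    cusip = 9
    date = 10
    definition = 11
    distance = 12
    duration = 13
    geoentity = 14
    money = 15
    percent = 16
    pii = 17
    phone = 18
    ssn = 19
    ratio = 20
    regulation = 21
    trademark = 22
    url = 23
    laws = 24


# Look up a member by its integer value or by its name.
by_value = AnnotationType(24)    # AnnotationType.laws
by_name = AnnotationType["pii"]  # AnnotationType.pii, value 17

# Iterate in value order rather than the alphabetical order of the docs.
in_value_order = sorted(AnnotationType, key=lambda m: m.value)
print([m.name for m in in_value_order][:3])  # ['act', 'amount', 'citation']
```

Values 1 through 24 are each used exactly once, so value lookups are unambiguous.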
lexnlp.extract.common.base_path module
lexnlp.extract.common.dates module
lexnlp.extract.common.fact_extracting module
lexnlp.extract.common.language_dictionary_reader module
class lexnlp.extract.common.language_dictionary_reader.LanguageDictionaryReader
Bases: object
This class reads text files in which values are separated by line breaks, strips the values if needed, and returns them as a List or Dict.
We use this class, e.g., to read common abbreviations for the German (De) locale.
- static read_str_set(file_path: str, encoding='utf8', strip_symbols=' ') → Set[str]
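The reading behaviour described above (one value per line, optionally stripped, collected into a set) can be sketched as follows. read_str_set_sketch is a hypothetical reimplementation, not LexNLP's code; in particular, skipping empty lines is an assumption the documentation does not state.

```python
from typing import Set


def read_str_set_sketch(file_path: str, encoding: str = "utf8",
                        strip_symbols: str = " ") -> Set[str]:
    """Hypothetical sketch of LanguageDictionaryReader.read_str_set."""
    values: Set[str] = set()
    with open(file_path, encoding=encoding) as f:
        for line in f.read().splitlines():
            # Strip the configured symbols (a space by default) from both ends.
            value = line.strip(strip_symbols) if strip_symbols else line
            if value:  # assumption: blank lines are ignored
                values.add(value)
    return values
```

For a file containing the lines "Abs.", "  bzw. " and "z.B.", the sketch returns the set {"Abs.", "bzw.", "z.B."}; duplicate lines collapse into one entry because the result is a set.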
lexnlp.extract.common.pattern_found module
class lexnlp.extract.common.pattern_found.PatternFound
Bases: object
Used inside EsDefinitionsParser and SpanishParsingMethods to store intermediate parsing results.
- pattern_worse_than_target(p, text: str) → bool
Checks which of two patterns is better when the two are considered duplicates. The text argument may be used in derived classes.
lexnlp.extract.common.special_characters module
lexnlp.extract.common.text_beautifier module
class lexnlp.extract.common.text_beautifier.TextBeautifier
Bases: object
- APOS_SEPARATORS = {'\t', ' ', '(', ')', ',', '.', ';', '[', ']', '{', '}'}
- BRACES_C = {')', ']', '}'}
- BRACES_O = {'(', '[', '{'}
- BRACE_CL_BY_OP = {'(': ')', '[': ']', '{': '}'}
- PAIR_BRACES = {'""', "''", '()', '[]', '``', '{}', '“”'}
- PROPER_CLOSE_QUOTE = {'"': '"', '“': '”'}
- QUOTES = {'"', '“', '”'}
- TRANSFORMED_WORDS = {"''": ['"', '``', '“', '”'], '(': ['(', '[', '{'], ')': [')', ']', '}'], ':': [':', ';', '|'], '``': ['"', '``', '“', '”']}
- static find_pair_among_apostrophe(text: str, apos_coords: List[int], quote: Tuple[str, int]) → int
- static find_transformed_word(txt: str, word: str, offset: int) → Optional[Tuple[str, int]]
  Searches txt for a transformed form of word and returns the transformed word together with its start position, or None if no such form is found.
- static lstrip_string_coords(text: str, start: int, end: int, trim_symbols: Optional[str] = None) → Tuple[str, int, int]
- static normalize_smb_preserve_len(text: str) → str
  Normalizes some of the string's characters while preserving the original length.
  Parameters: text – string to normalize
  Returns: normalized string
- static rstrip_string_coords(text: str, start: int, end: int, trim_symbols: Optional[str] = None) → Tuple[str, int, int]
- static strip_pair_symbols(term_coords: Union[str, Tuple[str, int, int]]) → Union[str, Tuple[str, int, int]]
- static strip_string_coords(text: str, start: int, end: int, trim_symbols: Optional[str] = None) → Tuple[str, int, int]
- static unify_quotes_braces(text: str, empty_replacement: str = '') → str
- static unify_quotes_braces_coords(text: str, start: int, end: int, empty_replacement: str = '') → Tuple[str, int, int]
- static unify_quotes_braces_unsafe(text: str, start: int, end: int, empty_replacement: str = '') → Tuple[str, int, int]
  Parameters:
  - text – source text to “beautify”
  - start – start coordinate of the text
  - end – end coordinate of the text
  - empty_replacement – substring that replaces unbalanced braces / quotes
  Returns: a (text, start, end) tuple in which text has all quotes and braces replaced with their “normal” forms
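Two of the ideas above can be illustrated in a few lines: stripping a string while keeping its (start, end) coordinates in sync, and replacing quotes and braces with a single normal form while blanking out unbalanced ones. Both functions below are hypothetical sketches, not TextBeautifier's implementation. The choice of "(" / ")" and a straight double quote as the normal forms is an assumption loosely based on the TRANSFORMED_WORDS and QUOTES constants.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed normal forms (hypothetical): every brace becomes a parenthesis,
# every curly double quote becomes a straight one.
NORMAL_FORM: Dict[str, str] = {"[": "(", "{": "(", "]": ")", "}": ")",
                               "“": '"', "”": '"'}


def strip_string_coords_sketch(text: str, start: int, end: int,
                               trim_symbols: Optional[str] = None
                               ) -> Tuple[str, int, int]:
    """Strip text and shift start/end by the number of removed characters."""
    left = text.lstrip(trim_symbols)  # None means "strip whitespace"
    start += len(text) - len(left)
    stripped = left.rstrip(trim_symbols)
    end -= len(left) - len(stripped)
    return stripped, start, end


def unify_quotes_braces_sketch(text: str, empty_replacement: str = "") -> str:
    """Normalize quotes/braces, then replace unbalanced ones."""
    chars: List[str] = [NORMAL_FORM.get(c, c) for c in text]
    unbalanced: Set[int] = set()
    open_braces: List[int] = []  # indices of "(" not yet closed
    for i, c in enumerate(chars):
        if c == "(":
            open_braces.append(i)
        elif c == ")":
            if open_braces:
                open_braces.pop()
            else:
                unbalanced.add(i)  # ")" without a matching "("
    unbalanced.update(open_braces)  # "(" that was never closed
    quotes = [i for i, c in enumerate(chars) if c == '"']
    if len(quotes) % 2:
        unbalanced.add(quotes[-1])  # odd count: treat the last quote as unbalanced
    return "".join(empty_replacement if i in unbalanced else c
                   for i, c in enumerate(chars))


print(strip_string_coords_sketch("  abc ", 10, 16))         # ('abc', 12, 15)
print(unify_quotes_braces_sketch("“Software” [v2] (beta"))  # "Software" (v2) beta
```

Because braces are normalized before matching, a mixed pair such as "[a)" counts as balanced in this sketch; the real method may treat it differently.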
lexnlp.extract.common.text_pattern_collector module
class lexnlp.extract.common.text_pattern_collector.TextPatternCollector(parsing_functions: List[Callable[[str], List[lexnlp.extract.common.pattern_found.PatternFound]]], split_params: lexnlp.utils.lines_processing.line_processor.LineSplitParams)
Bases: object
- basic_line_processor = <lexnlp.utils.lines_processing.line_processor.LineProcessor object>
  EsDefinitionsParser searches for definitions in text according to the rules of Spanish. See the “parse” method.
- choose_best_matches(matches: List[lexnlp.extract.common.pattern_found.PatternFound]) → List[lexnlp.extract.common.pattern_found.PatternFound]
- choose_more_precise_matches(matches: List[lexnlp.extract.common.pattern_found.PatternFound], text: str) → List[lexnlp.extract.common.pattern_found.PatternFound]
  Looks for matches “consumed” by other matches and spares the consuming matches.
- static estimate_match_quality(match: lexnlp.extract.common.pattern_found.PatternFound) → int
- make_annotation_from_pattrn(locale: str, ptrn: lexnlp.extract.common.pattern_found.PatternFound, phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → lexnlp.extract.common.annotations.text_annotation.TextAnnotation
- parse(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.text_annotation.TextAnnotation]
  Parameters:
  - text – the text to parse, e.g.: En este acuerdo, el término “Software” se refiere a: (i) el programa informático (“In this agreement, the term ‘Software’ refers to: (i) the computer program”)
  - locale – ‘En’, ‘De’, ‘Es’, …
  Returns: a list of annotations; for the example text, the annotation found is:
  {"attrs": {"start": 28, "end": 82}, "tags": {"Extracted Entity Type": "definition", "Extracted Entity Definition Name": "Software", "Extracted Entity Text": "“Software” se refiere a: (i) el programa informático"}}
- remove_prohibited_words(matches: List[lexnlp.extract.common.pattern_found.PatternFound]) → List[lexnlp.extract.common.pattern_found.PatternFound]
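The flow behind parse() (run each parsing function over the text, then turn each found pattern into an annotation) can be sketched as follows. FoundPattern, find_se_refiere_a and collect are hypothetical names. The real class works with PatternFound objects, also takes split_params, filters matches (e.g. choose_best_matches, remove_prohibited_words) and returns TextAnnotation objects; none of that is reproduced here. The plain dicts only mirror the documented "attrs"/"tags" layout.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FoundPattern:
    """Hypothetical stand-in for PatternFound: one definition candidate."""
    name: str
    start: int
    end: int


def find_se_refiere_a(text: str) -> List[FoundPattern]:
    """Hypothetical parsing function for the Spanish pattern “X” se refiere a ..."""
    return [FoundPattern(m.group(1), m.start(), m.end())
            for m in re.finditer(r"“([^”]+)”\s+se refiere a:?.*", text)]


def collect(text: str,
            parsing_functions: List[Callable[[str], List[FoundPattern]]]
            ) -> List[Dict]:
    """Run every parsing function and convert its matches into annotation dicts."""
    annotations = []
    for parse_fn in parsing_functions:
        for p in parse_fn(text):
            annotations.append({
                "attrs": {"start": p.start, "end": p.end},
                "tags": {
                    "Extracted Entity Type": "definition",
                    "Extracted Entity Definition Name": p.name,
                    "Extracted Entity Text": text[p.start:p.end],
                },
            })
    return annotations
```

For the Spanish example above, collect(text, [find_se_refiere_a]) yields one annotation named "Software" whose text starts at the opening quote and runs to the end of the sentence.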
lexnlp.extract.common.universal_court_parser module
class lexnlp.extract.common.universal_court_parser.MatchFound(subset, entry_start: int, entry_end: int, text: str)
Bases: object
- make_sort_key()
class lexnlp.extract.common.universal_court_parser.ParserInitParams
Bases: object
UniversalCourtsParser initialization parameters.
class lexnlp.extract.common.universal_court_parser.UniversalCourtsParser(ptrs: lexnlp.extract.common.universal_court_parser.ParserInitParams)
Bases: object
This class is a “constructor” for building locale- (and region-) specific parsers that find references to courts in text.
Use the parse() method to find all references to courts in the text provided. Each reference is a dictionary with two keys:
- “attrs” leads to the “coordinates” (start and end characters) of the occurrence within the provided text;
- “tags” leads to another dictionary, which contains:
  - the court's official name
  - the court's jurisdiction
  - …
To parse text, create your own locale- (or region-) specific instance of UniversalCourtsParser by passing a ParserInitParams object to the constructor.
- add_annotation(match: lexnlp.extract.common.universal_court_parser.MatchFound)
- find_court_by_any_key(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase)
- find_court_by_key_column(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase, phrase_finder: lexnlp.utils.lines_processing.phrase_finder.PhraseFinder, column: str) → Tuple[lexnlp.extract.common.universal_court_parser.MatchFound, List[Tuple[str, int, int]]]
- find_court_by_name(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → List[lexnlp.extract.common.universal_court_parser.MatchFound]
- find_court_by_type_and_jurisdiction(phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → List[lexnlp.extract.common.universal_court_parser.MatchFound]
- find_courts_by_alias_in_whole_text(text: str) → None
- static get_unique_col_values(col_values)
- load_courts(dataframe_paths: List[str])
- parse(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.court_annotation.CourtAnnotation]
  Parameters:
  - text – the text being processed
  - locale – ‘En’, ‘Es’, …
  Returns: annotations – List[dict]
  Here is an example of the method’s call:
  ret = processor.parse("Bei dir läuft, deine Verfassungsgerichtshof des Freistaates Sachsen rauchen Joints vor der Kamera")
  ret[0]['attrs'] == {'start': 14, 'end': 97}
  ret[0]['tags'] == {'Extracted Entity Type': 'court', 'Extracted Entity Court Name': 'Verfassungsgerichtshof des Freistaates Sachsen', 'Extracted Entity Court Type': 'Verfassungsgericht', 'Extracted Entity Court Jurisdiction': 'Sachsen'}
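A much-simplified sketch of the lookup: split the text into comma-separated phrases, look for known court names in each phrase, and report the phrase coordinates together with the court's tags. The one-row COURTS table and find_courts are hypothetical; the real parser loads its court tables with load_courts() and can also match by alias, type and jurisdiction. Splitting at commas is an assumption, but for the German example it reproduces the documented coordinates: start 14 is the position right after the comma, and 97 is the length of the sentence. In other words, the documented span covers the whole phrase containing the court name, not just the name.

```python
import re
from typing import Dict, List

# Hypothetical one-row court table; the values come from the documented example.
COURTS = [
    {"name": "Verfassungsgerichtshof des Freistaates Sachsen",
     "type": "Verfassungsgericht",
     "jurisdiction": "Sachsen"},
]


def find_courts(text: str) -> List[Dict]:
    """Report every comma-delimited phrase that mentions a known court."""
    annotations = []
    # Simplifying assumption: phrases are the chunks between commas.
    for phrase in re.finditer(r"[^,]+", text):
        for court in COURTS:
            if court["name"] in phrase.group():
                annotations.append({
                    "attrs": {"start": phrase.start(), "end": phrase.end()},
                    "tags": {
                        "Extracted Entity Type": "court",
                        "Extracted Entity Court Name": court["name"],
                        "Extracted Entity Court Type": court["type"],
                        "Extracted Entity Court Jurisdiction": court["jurisdiction"],
                    },
                })
    return annotations


TEXT = ("Bei dir läuft, deine Verfassungsgerichtshof des Freistaates Sachsen "
        "rauchen Joints vor der Kamera")
print(find_courts(TEXT)[0]["attrs"])  # {'start': 14, 'end': 97}
```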