lexnlp.extract.es package¶
Subpackages¶
Submodules¶
lexnlp.extract.es.copyrights module¶
-
class
lexnlp.extract.es.copyrights.
CopyrightEsParser
¶ Bases:
lexnlp.extract.common.copyrights.copyright_en_style_parser.CopyrightEnStyleParser
-
classmethod
extract_phrases_with_coords
(sentence: str) → List[Tuple[str, int, int]]¶
-
static
init_parser
()¶
-
line_processor
= <lexnlp.utils.lines_processing.line_processor.LineProcessor object>¶
-
lexnlp.extract.es.copyrights.
get_copyright_annotations
(text: str, return_sources=False) → Generator[[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation, None], None]¶
-
lexnlp.extract.es.copyrights.
get_copyright_list
(text: str, return_sources=False) → List[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation]¶
-
lexnlp.extract.es.copyrights.
get_copyrights
(text: str, return_sources=False) → Generator[[dict, None], None]¶
lexnlp.extract.es.courts module¶
Court extraction for Spanish.
This module implements extraction functionality for courts in Spain, including formal names, abbreviations, and aliases.
-
lexnlp.extract.es.courts.
get_court_annotations
(text: str, language: str = None) → Generator[[dict, None], None]¶
-
lexnlp.extract.es.courts.
get_courts
(text: str, court_config_list: List[lexnlp.extract.en.dict_entities.DictionaryEntry], priority: bool = False, text_languages: List[str] = None) → Generator[[Tuple[lexnlp.extract.en.dict_entities.DictionaryEntry, lexnlp.extract.en.dict_entities.DictionaryEntryAlias], Any], Any]¶
See lexnlp/extract/en/tests/test_courts.py.
-
lexnlp.extract.es.courts.
setup_es_parser
()¶
lexnlp.extract.es.dates module¶
Date extraction for Spanish. The date parser is based on the dateparser package.
-
class
lexnlp.extract.es.dates.
ESDateParser
(text=None, language='en', dateparser_settings=None, enable_classifier_check=None, classifier_model=None, classifier_threshold=None)¶ Bases:
lexnlp.extract.common.dates.DateParser
-
DATEPARSER_SETTINGS
= {'DATE_ORDER': 'DMY', 'PREFER_DAY_OF_MONTH': 'first', 'STRICT_PARSING': False}¶
-
ENABLE_CLASSIFIER_CHECK
= False¶
-
SEQUENTIAL_DATES_RE
= regex.Regex('(?P<text>(?P<day>\\d{1,2}) de (?P<month>septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|nov|oct|sep|set)(?:, | y | de (?P<year>\\d{4})))', flags=regex.I | regex.M | regex.V0)¶
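To illustrate what SEQUENTIAL_DATES_RE appears to target, the sketch below recompiles the same pattern with the standard-library `re` module and runs it over an enumeration of dates that share a single trailing year. The original is compiled with the third-party `regex` package, so identical behaviour under `re` is an assumption. The sample sentence and the helper name `find_sequential_dates` are hypothetical.

```python
import re

# Month alternation copied from the documented pattern (longer forms first).
MONTHS = (
    "septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|"
    "abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|"
    "jun|mar|may|nov|oct|sep|set"
)

# Same pattern as SEQUENTIAL_DATES_RE, compiled with stdlib `re` (assumption:
# the pattern behaves the same as under the `regex` package).
SEQUENTIAL_DATES_RE = re.compile(
    r"(?P<text>(?P<day>\d{1,2}) de (?P<month>" + MONTHS + r")"
    r"(?:, | y | de (?P<year>\d{4})))",
    flags=re.I | re.M,
)


def find_sequential_dates(text):
    """Hypothetical helper: return (day, month, year) for each match.

    The year group is None for items that end in ', ' or ' y '.
    """
    return [
        (m.group("day"), m.group("month"), m.group("year"))
        for m in SEQUENTIAL_DATES_RE.finditer(text)
    ]


sample = "Las reuniones serán el 3 de mayo, 5 de junio y 7 de julio de 2019."
print(find_sequential_dates(sample))
```

Only the last item in the enumeration carries a year. How ESDateParser propagates that year to the earlier items is not shown in this reference.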
-
WEIRD_DATES_NORM
= [(regex.Regex('(\\d+º\\s?de (?:septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|nov|oct|sep|set)(?: de \\d{4})?)', flags=regex.I | regex.M | regex.V0), <function ESDateParser.<lambda>>)]¶
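The pattern in WEIRD_DATES_NORM matches dates written with an ordinal sign, such as "1º de enero de 2020". The sketch below recompiles it with the standard-library `re` module (an assumption, since the original uses the `regex` package) and shows which substrings it picks up. The paired lambda is not documented here, so `drop_ordinal_sign` is only a hypothetical guess at the kind of normalization it might perform, not the library's implementation.

```python
import re

# Month alternation copied from the documented pattern.
MONTHS = (
    "septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|"
    "abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|"
    "jun|mar|may|nov|oct|sep|set"
)

# Same pattern as the first element of the WEIRD_DATES_NORM tuple.
ORDINAL_DATE_RE = re.compile(
    r"(\d+º\s?de (?:" + MONTHS + r")(?: de \d{4})?)",
    flags=re.I | re.M,
)


def find_ordinal_dates(text):
    """Hypothetical helper: return every ordinal-style date substring."""
    return ORDINAL_DATE_RE.findall(text)


def drop_ordinal_sign(date_text):
    """Hypothetical normalization (an assumption): remove the first 'º'."""
    return date_text.replace("º", "", 1)


sample = "El contrato entra en vigor el 1º de enero de 2020 y vence el 1º de marzo."
for found in find_ordinal_dates(sample):
    print(found, "->", drop_ordinal_sign(found))
```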
-
get_extra_dates
()¶ Add custom search logic: use self.TEXT, self.LANGUAGE, and self.DATES, and update self.DATES.
Returns: None
lexnlp.extract.es.definitions module¶
-
class
lexnlp.extract.es.definitions.
SpanishParsingMethods
¶ Bases:
object
The class contains methods that all share the same signature:
def method_name(phrase: str) -> List[DefinitionMatch]
These methods are used to find definition “candidates”.
-
static
match_es_def_by_hereafter
(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]¶
Parameters: phrase – las instrucciones de uso o instalación del software o todas las descripciones de uso del mismo (de aquí en adelante, la “Documentación”);
Returns: {name: ‘Documentación’, probability: 100, …}
-
static
match_es_def_by_reffered
(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]¶
Parameters: phrase – En este acuerdo, el término “Software” se refiere a: (i) el programa informático que acompaña a este Acuerdo y todos sus componentes;
Returns: definitions (objects)
-
static
match_first_word_is
(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]¶
Parameters: phrase – El tabaquismo es la adicción al tabaco, provocada principalmente.
Returns: definitions (objects)
-
reg_first_word_is
= re.compile('^.+?(?=es\\s+\\w+\\W+\\w+|está\\s+\\w+\\W+\\w+)')¶
-
reg_hereafter
= re.compile('(?<=(en adelante[,\\s]))[\\w\\s*\\"*]+')¶
-
reg_reffered
= re.compile('^.+(?=se refiere)')¶
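The three compiled patterns above can be tried directly against the example phrases from the method docstrings. The sketch below only shows what each raw regex matches. How the `match_*` methods turn those matches into PatternFound objects is not documented here, and the helper `candidate` is hypothetical.

```python
import re

# Patterns copied verbatim from the class attributes.
reg_first_word_is = re.compile('^.+?(?=es\\s+\\w+\\W+\\w+|está\\s+\\w+\\W+\\w+)')
reg_hereafter = re.compile('(?<=(en adelante[,\\s]))[\\w\\s*\\"*]+')
reg_reffered = re.compile('^.+(?=se refiere)')


def candidate(pattern, phrase):
    """Hypothetical helper: return the stripped matched text, or None."""
    match = pattern.search(phrase)
    return match.group(0).strip() if match else None


hereafter_phrase = (
    'todas las descripciones de uso del mismo '
    '(de aquí en adelante, la "Documentación");'
)
reffered_phrase = (
    'En este acuerdo, el término "Software" se refiere a: '
    '(i) el programa informático'
)
first_word_phrase = (
    'El tabaquismo es la adicción al tabaco, provocada principalmente.'
)

print(candidate(reg_hereafter, hereafter_phrase))
print(candidate(reg_reffered, reffered_phrase))
print(candidate(reg_first_word_is, first_word_phrase))
```

The sample phrases use straight double quotes because the character class in reg_hereafter lists only the straight quote. With typographic quotes the match would stop at the opening quote mark.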
-
lexnlp.extract.es.definitions.
get_definition_annotations
(text: str, language=None) → Generator[[lexnlp.extract.common.annotations.definition_annotation.DefinitionAnnotation, None], None]¶
-
lexnlp.extract.es.definitions.
get_definition_list
(text: str, language=None) → List[lexnlp.extract.common.annotations.definition_annotation.DefinitionAnnotation]¶
-
lexnlp.extract.es.definitions.
get_definitions
(text: str, language=None) → Generator[[dict, None], None]¶
-
lexnlp.extract.es.definitions.
make_es_definitions_parser
()¶
lexnlp.extract.es.language_tokens module¶
lexnlp.extract.es.regulations module¶
-
class
lexnlp.extract.es.regulations.
RegulationsParser
(regulations_dataframe: pandas.core.frame.DataFrame = None)¶ Bases:
object
Parses Spanish regulations (acts, institutions, and so on). For example, “la emisión de instrumentos inscritos en el Registro Nacional de Valores, colocados” boils down to ‘Registro Nacional de Valores’.
The parser expects trigger words such as ‘registro’, ‘comisión’, ‘comision’, or ‘ley del’ to open the phrase that follows.
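The trigger-word idea described above can be sketched with a plain regular expression. This is not the RegulationsParser implementation, which loads its trigger words and builds its regexes internally. The trigger list below is taken from the description, while the continuation rule (capitalized words joined by short connectives) and the names `TRIGGER_RE` and `find_regulation_names` are assumptions made for illustration.

```python
import re

# Trigger words named in the class description (hypothetical, partial list).
TRIGGERS = r"(?:[Rr]egistro|[Cc]omisi[oó]n|[Ll]ey del)"

# Assumed continuation rule: after the trigger, keep consuming capitalized
# words, allowing short connectives only when a capitalized word follows.
CONTINUATION = (
    r"(?:\s+(?:del|de|la|los|las|y)(?=\s+[A-ZÁÉÍÓÚÑ])"
    r"|\s+[A-ZÁÉÍÓÚÑ]\w*)*"
)

TRIGGER_RE = re.compile(r"\b" + TRIGGERS + CONTINUATION)


def find_regulation_names(phrase):
    """Hypothetical helper: return each trigger word plus its continuation."""
    return [m.group(0) for m in TRIGGER_RE.finditer(phrase)]


phrase = (
    "mediante la emisión de instrumentos inscritos en el "
    "Registro Nacional de Valores, colocados"
)
print(find_regulation_names(phrase))
```

Stopping at the first token that is neither capitalized nor a connective is what cuts the match before ", colocados" in the sample phrase.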
-
get_annotations_as_dictionaries
() → List¶
-
load_trigger_words
() → None¶
-
match_start_trigger
(phrase: str) → None¶
Parameters: phrase – mediante la emisión de instrumentos inscritos en el Registro Nacional de Valores, colocados
Returns: {name: ‘Registro Nacional de Valores’, probability: 100, …}
-
parse
(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation]¶
-
setup_regexes
() → None¶
-
trim_annotations
() → None¶
-
lexnlp.extract.es.regulations.
get_regulation_annotations
(text: str, language: str = None) → Generator[[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation, None], None]¶
-
lexnlp.extract.es.regulations.
get_regulation_list
(text: str, language: str = None) → List[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation]¶
-
lexnlp.extract.es.regulations.
get_regulations
(text: str, language: str = None) → Generator[[dict, None], None]¶
-
lexnlp.extract.es.regulations.
make_de_regulations_parser
()¶