lexnlp.extract.es package

Subpackages

lexnlp.extract.es.tests package

Submodules

lexnlp.extract.es.copyrights module

class lexnlp.extract.es.copyrights.CopyrightEsParser

Bases: lexnlp.extract.common.copyrights.copyright_en_style_parser.CopyrightEnStyleParser

classmethod extract_phrases_with_coords(sentence: str) → List[Tuple[str, int, int]]

static init_parser()

line_processor = <lexnlp.utils.lines_processing.line_processor.LineProcessor object>

lexnlp.extract.es.copyrights.get_copyright_annotations(text: str, return_sources=False) → Generator[[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation, None], None]

lexnlp.extract.es.copyrights.get_copyright_list(text: str, return_sources=False) → List[lexnlp.extract.common.annotations.copyright_annotation.CopyrightAnnotation]

lexnlp.extract.es.copyrights.get_copyrights(text: str, return_sources=False) → Generator[[dict, None], None]

lexnlp.extract.es.courts module

Court extraction for Spanish.

This module implements extraction functionality for courts in Spain, including formal names, abbreviations, and aliases.

lexnlp.extract.es.courts.get_court_annotations(text: str, language: str = None) → Generator[[dict, None], None]

lexnlp.extract.es.courts.get_courts(text: str, court_config_list: List[lexnlp.extract.en.dict_entities.DictionaryEntry], priority: bool = False, text_languages: List[str] = None) → Generator[[Tuple[lexnlp.extract.en.dict_entities.DictionaryEntry, lexnlp.extract.en.dict_entities.DictionaryEntryAlias], Any], Any]

See lexnlp/extract/en/tests/test_courts.py

lexnlp.extract.es.courts.setup_es_parser()

lexnlp.extract.es.dates module

Date extraction for Spanish. The date parser is based on the dateparser package.

class lexnlp.extract.es.dates.ESDateParser(text=None, language='en', dateparser_settings=None, enable_classifier_check=None, classifier_model=None, classifier_threshold=None)

Bases: lexnlp.extract.common.dates.DateParser

DATEPARSER_SETTINGS = {'DATE_ORDER': 'DMY', 'PREFER_DAY_OF_MONTH': 'first', 'STRICT_PARSING': False}

ENABLE_CLASSIFIER_CHECK = False

SEQUENTIAL_DATES_RE = regex.Regex('(?P<text>(?P<day>\\d{1,2}) de (?P<month>septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|nov|oct|sep|set)(?:, | y | de (?P<year>\\d{4})))', flags=regex.I | regex.M | regex.V0)
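To illustrate what SEQUENTIAL_DATES_RE captures, the minimal sketch below recompiles the pattern shown above with the standard-library re module (the library itself uses the third-party regex package) and applies it to an enumerated list of dates. The helper name find_sequential_dates and the sample sentence are illustrative assumptions and are not part of lexnlp.

```python
import re

# Month alternation copied from SEQUENTIAL_DATES_RE above.
MONTHS = (
    "septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|"
    "enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|"
    "nov|oct|sep|set"
)

# Same pattern as documented, compiled with stdlib `re` instead of `regex`.
SEQUENTIAL_DATES_RE = re.compile(
    r"(?P<text>(?P<day>\d{1,2}) de (?P<month>" + MONTHS + r")"
    r"(?:, | y | de (?P<year>\d{4})))",
    flags=re.I | re.M,
)


def find_sequential_dates(text):
    """Hypothetical helper: return (day, month, year) for every match.

    `year` is None when the matched item is followed by ", " or " y "
    rather than by " de <year>".
    """
    return [
        (m.group("day"), m.group("month"), m.group("year"))
        for m in SEQUENTIAL_DATES_RE.finditer(text)
    ]


sample = "Vencimientos: 1 de enero, 15 de marzo y 3 de abril de 2020"
dates = find_sequential_dates(sample)
```

In this sample only the last item carries a year; the first two matches leave the year group empty.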

WEIRD_DATES_NORM = [(regex.Regex('(\\d+º\\s?de (?:septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|nov|oct|sep|set)(?: de \\d{4})?)', flags=regex.I | regex.M | regex.V0), <function ESDateParser.<lambda>>)]
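WEIRD_DATES_NORM pairs a pattern for ordinal-style dates such as "1º de mayo de 2019" with a function whose body is not shown on this page. The sketch below is one plausible normalization, assuming the function simply drops the ordinal sign so the date can be parsed normally; that behaviour, the stdlib re substitution, and the name normalize_ordinal_dates are all assumptions.

```python
import re

# Month alternation copied from the documented pattern.
MONTHS = (
    "septiembre|diciembre|noviembre|setiembre|febrero|octubre|agosto|abril|"
    "enero|julio|junio|marzo|mayo|sept|abr|ago|dic|ene|feb|jul|jun|mar|may|"
    "nov|oct|sep|set"
)

# Same pattern as the first element of the WEIRD_DATES_NORM tuple.
WEIRD_DATE_RE = re.compile(
    r"(\d+º\s?de (?:" + MONTHS + r")(?: de \d{4})?)",
    flags=re.I | re.M,
)


def normalize_ordinal_dates(text):
    """Hypothetical normalizer: rewrite '1º de mayo' as '1 de mayo'.

    ASSUMPTION: the real lambda in ESDateParser may do something else.
    """
    return WEIRD_DATE_RE.sub(
        # Replace the ordinal sign (and an optional space) with one space.
        lambda m: re.sub(r"º\s?", " ", m.group(1)),
        text,
    )


normalized = normalize_ordinal_dates("Firmado el 1º de mayo de 2019.")
```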

get_extra_dates()

Add custom search logic. Uses self.TEXT, self.LANGUAGE and self.DATES, and updates self.DATES.

Returns:	None

lexnlp.extract.es.definitions module

class lexnlp.extract.es.definitions.SpanishParsingMethods

Bases: object

The class contains methods that all share the same signature:

    def method_name(phrase: str) -> List[DefinitionMatch]:

The methods are used for finding definition “candidates”.

static match_es_def_by_hereafter(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Parameters:	phrase – las instrucciones de uso o instalación del software o todas las descripciones de uso del mismo (de aquí en adelante, la “Documentación”);
Returns:	{name: ‘Documentación’, probability: 100, …}

static match_es_def_by_reffered(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Parameters:	phrase – En este acuerdo, el término “Software” se refiere a: (i) el programa informático que acompaña a este Acuerdo y todos sus componentes;
Returns:	definitions (objects)

static match_first_word_is(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Parameters:	phrase – El tabaquismo es la adicción al tabaco, provocada principalmente.
Returns:	definitions (objects)

reg_first_word_is = re.compile('^.+?(?=es\\s+\\w+\\W+\\w+|está\\s+\\w+\\W+\\w+)')

reg_hereafter = re.compile('(?<=(en adelante[,\\s]))[\\w\\s*\\"*]+')

reg_reffered = re.compile('^.+(?=se refiere)')
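The three class-level patterns can be tried directly against the example phrases from the method descriptions. The sketch below only shows which span each raw pattern matches; it does not reproduce whatever post-processing the match_* methods apply before returning PatternFound objects. The helper first_match is hypothetical, and the sample phrases use straight double quotes, which is an assumption about the input (the character class in reg_hereafter does not include typographic quotes).

```python
import re

# Patterns copied from SpanishParsingMethods above.
reg_first_word_is = re.compile(r"^.+?(?=es\s+\w+\W+\w+|está\s+\w+\W+\w+)")
reg_hereafter = re.compile(r'(?<=(en adelante[,\s]))[\w\s*\"*]+')
reg_reffered = re.compile(r"^.+(?=se refiere)")


def first_match(pattern, phrase):
    """Hypothetical helper: stripped text of the first match, or None."""
    m = pattern.search(phrase)
    return m.group(0).strip() if m else None


# Text following "en adelante," up to the first character outside the class.
hereafter = first_match(
    reg_hereafter,
    'todas las descripciones de uso del mismo '
    '(de aquí en adelante, la "Documentación");',
)

# Everything that precedes "se refiere".
referred = first_match(
    reg_reffered,
    'En este acuerdo, el término "Software" se refiere a: '
    "(i) el programa informático;",
)

# Shortest prefix that is followed by "es <word> <word>".
first_word = first_match(
    reg_first_word_is,
    "El tabaquismo es la adicción al tabaco, provocada principalmente.",
)
```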

lexnlp.extract.es.definitions.get_definition_annotations(text: str, language=None) → Generator[[lexnlp.extract.common.annotations.definition_annotation.DefinitionAnnotation, None], None]

lexnlp.extract.es.definitions.get_definition_list(text: str, language=None) → List[lexnlp.extract.common.annotations.definition_annotation.DefinitionAnnotation]

lexnlp.extract.es.definitions.get_definitions(text: str, language=None) → Generator[[dict, None], None]

lexnlp.extract.es.definitions.make_es_definitions_parser()

lexnlp.extract.es.language_tokens module

class lexnlp.extract.es.language_tokens.EsLanguageTokens

Bases: object

Spanish parts of speech, used in a number of parsing methods.

abbreviations = {'abs.', 'act.', 'inc.', 'no.', 'nr.', 'p.'}

articles = ['el', 'la', 'los', 'las']

conjunctions = ['und', 'oder']

lexnlp.extract.es.regulations module

class lexnlp.extract.es.regulations.RegulationsParser(regulations_dataframe: pandas.core.frame.DataFrame = None)

Bases: object

Parses Spanish regulations (acts, institutions and so on). For example, the phrase “la emisión de instrumentos inscritos en el Registro Nacional de Valores, colocados” boils down to ‘Registro Nacional de Valores’.

The parser expects trigger words such as ‘registro’, ‘comisión’, ‘comision’ or ‘ley del’ that open the phrase to be extracted.

get_annotations_as_dictionaries() → List

load_trigger_words() → None

match_start_trigger(phrase: str) → None

Parameters:	phrase – mediante la emisión de instrumentos inscritos en el Registro Nacional de Valores, colocados
Returns:	{name: ‘Registro Nacional de Valores’, probability: 100, …}
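The trigger-word idea described for RegulationsParser can be sketched with a single regular expression: find a trigger word, then keep consuming capitalized words (optionally joined by short connectors) that follow it. This is not the library's implementation. The trigger list is taken from the class description, while the connector list, the pattern, and the function extract_regulation_names are assumptions made for illustration.

```python
import re

# Trigger words named in the class description.
TRIGGERS = ("registro", "comisión", "comision", "ley del")
# ASSUMPTION: lowercase connectors allowed inside a regulation name.
CONNECTORS = ("de", "del", "la", "las", "los", "el", "y")

REGULATION_RE = re.compile(
    # Case-insensitive trigger word at a word boundary.
    r"\b(?i:" + "|".join(TRIGGERS) + r")\b"
    # One or more capitalized words, each optionally preceded by connectors.
    r"(?:\s+(?:(?:" + "|".join(CONNECTORS) + r")\s+)*[A-ZÁÉÍÓÚÑ]\w*)+"
)


def extract_regulation_names(phrase):
    """Hypothetical helper: return candidate regulation names in `phrase`."""
    return [m.group(0) for m in REGULATION_RE.finditer(phrase)]


names = extract_regulation_names(
    "mediante la emisión de instrumentos inscritos en el "
    "Registro Nacional de Valores, colocados"
)
```

The match stops at the comma because every further word must be preceded by whitespace and start with a capital letter, which is how the surrounding sentence is trimmed away in this sketch.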

parse(text: str, locale: str = None) → List[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation]

setup_regexes() → None

trim_annotations() → None

lexnlp.extract.es.regulations.get_regulation_annotations(text: str, language: str = None) → Generator[[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation, None], None]

lexnlp.extract.es.regulations.get_regulation_list(text: str, language: str = None) → List[lexnlp.extract.common.annotations.regulation_annotation.RegulationAnnotation]

lexnlp.extract.es.regulations.get_regulations(text: str, language: str = None) → Generator[[dict, None], None]

lexnlp.extract.es.regulations.make_de_regulations_parser()
