lexnlp.extract.common.definitions package

Submodules

lexnlp.extract.common.definitions.common_definition_patterns module

class lexnlp.extract.common.definitions.common_definition_patterns.CommonDefinitionPatterns

Bases: object

static collect_regex_matches(phrase: str, reg: regex, prob: int, def_start: Callable[[str, Match[~AnyStr]], int], def_end: Callable[[str, Match[~AnyStr]], int]) → List[lexnlp.extract.common.pattern_found.PatternFound]

Finds all matches of the ‘reg’ pattern in the phrase and builds a definition from each match using def_start and def_end.

Parameters:
  • phrase – the text to search
  • reg – the compiled regular expression to match
  • prob – the probability assigned to each definition found
  • def_start – (phrase, match) -> definition’s start
  • def_end – (phrase, match) -> definition’s end
Returns: a list of the definitions found
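The loop below is a minimal sketch of the behaviour this docstring describes, not LexNLP's implementation. Assumptions: it uses the standard-library re module instead of the third-party regex module, Found is a hypothetical stand-in for PatternFound, and a negative or empty span returned by the callbacks is treated as a rejected match.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Found:
    """Hypothetical stand-in for PatternFound: definition text, span, probability."""
    name: str
    start: int
    end: int
    probability: int


def collect_matches(phrase: str, reg: "re.Pattern", prob: int,
                    def_start: Callable[[str, "re.Match"], int],
                    def_end: Callable[[str, "re.Match"], int]) -> List[Found]:
    """Turn every match of `reg` into a definition span chosen by the callbacks."""
    found = []
    for match in reg.finditer(phrase):
        start = def_start(phrase, match)
        end = def_end(phrase, match)
        if start < 0 or end <= start:
            continue  # assumption: a negative or empty span rejects the match
        found.append(Found(phrase[start:end], start, end, prob))
    return found


# Usage: the definition is the quoted word that precedes "means".
phrase = 'The term "Agreement" means this contract.'
reg = re.compile(r'"([^"]+)"\s+means')
print(collect_matches(phrase, reg, 90,
                      def_start=lambda p, m: m.start(1),
                      def_end=lambda p, m: m.end(1)))
# [Found(name='Agreement', start=10, end=19, probability=90)]
```

Passing the span logic in as callbacks keeps one loop reusable for many patterns: each pattern only has to say which part of its match is the defined term.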

static collect_regex_matches_with_quoted_chunks(phrase: str, reg: regex, prob: int, quoted_def_start: Callable[[str, Match[~AnyStr], Match[~AnyStr]], int], quoted_def_end: Callable[[str, Match[~AnyStr], Match[~AnyStr]], int], def_start: Callable[[str, Match[~AnyStr]], int], def_end: Callable[[str, Match[~AnyStr]], int]) → List[lexnlp.extract.common.pattern_found.PatternFound]

First, finds all matches of the ‘reg’ pattern. Then, for each match, tries to find a set of quoted words: if any are found, they are used as the definitions; otherwise the whole match is used.

Parameters:
  • phrase – the text to search
  • reg – the compiled regular expression to match
  • prob – the probability assigned to each definition found
  • quoted_def_start – (phrase, match, quoted_match) -> definition’s start
  • quoted_def_end – (phrase, match, quoted_match) -> definition’s end
  • def_start – (phrase, match) -> definition’s start
  • def_end – (phrase, match) -> definition’s end
Returns: a list of the definitions found
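A simplified sketch of the two-step procedure above, assuming the standard-library re module and a hypothetical Found class in place of PatternFound. Where LexNLP delegates span selection to quoted_def_start/quoted_def_end and def_start/def_end, this sketch hardcodes it: a quoted chunk becomes a definition without its quote characters, and an unquoted match is used whole.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Found:
    """Hypothetical stand-in for PatternFound."""
    name: str
    start: int
    end: int
    probability: int


# Same pattern as CommonDefinitionPatterns.reg_quoted; it also compiles with re.
REG_QUOTED = re.compile(r'(["\'“„])(?:(?=(\\?))\2.)*?\1', re.I)


def collect_with_quoted_chunks(phrase: str, reg: "re.Pattern", prob: int) -> List[Found]:
    found = []
    for match in reg.finditer(phrase):
        chunk = match.group(0)
        quoted = list(REG_QUOTED.finditer(chunk))
        if quoted:
            # Use every quoted chunk, dropping its opening and closing quote.
            for q in quoted:
                start = match.start() + q.start() + 1
                end = match.start() + q.end() - 1
                found.append(Found(phrase[start:end], start, end, prob))
        else:
            # No quotes inside the match: fall back to the whole match.
            found.append(Found(chunk, match.start(), match.end(), prob))
    return found


phrase = 'The terms "Buyer" and "Seller" refer to the parties.'
print([f.name for f in collect_with_quoted_chunks(phrase, re.compile(r'terms\s.*?\srefer'), 90)])
# ['Buyer', 'Seller']
```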

static get_acronym_words_start(phrase: str, match: Match[~AnyStr]) → int

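Each acronym match should be preceded by capitalized words that start with the same letters as the acronym.

Parameters:
  • phrase – “rompió el silencio tras ser despedido del Canal del Fútbol (CDF). ” (“broke the silence after being dismissed from Canal del Fútbol (CDF).”)
  • match – the Match object for “(CDF)” in this example
Returns: the index of the first letter of the acronym’s words (42 for this example, the “C” of “Canal”), or -1 if no such words are found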
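One way to implement the check described above, sketched with the standard-library re module. The ACRONYM pattern is an ASCII approximation of reg_acronyms (re does not support \p{Lu}), and comparing initials case-insensitively, so that the lowercase “del” in the example counts, is an assumption about LexNLP's rule rather than a description of it.

```python
import re

# ASCII approximation of reg_acronyms: the standard re module has no \p{Lu}.
ACRONYM = re.compile(r"\([A-Z][A-Za-z]*[A-Z]\)")


def acronym_words_start(phrase: str, match: "re.Match") -> int:
    """Index where the words spelling the acronym begin, or -1 if they don't."""
    letters = match.group(0)[1:-1]  # "(CDF)" -> "CDF"
    words = list(re.finditer(r"\w+", phrase[:match.start()]))
    if len(words) < len(letters):
        return -1
    candidates = words[-len(letters):]
    # Assumption: initials are compared case-insensitively, so "del" counts for "D".
    for word, letter in zip(candidates, letters):
        if word.group(0)[0].lower() != letter.lower():
            return -1
    return candidates[0].start()


phrase = "rompió el silencio tras ser despedido del Canal del Fútbol (CDF). "
print(acronym_words_start(phrase, ACRONYM.search(phrase)))  # 42
```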

static match_acronyms(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Parameters: phrase – “rompió el silencio tras ser despedido del Canal del Fútbol (CDF).”
Returns: {name: ‘CDF’, probability: 100, …}
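static match_es_def_by_semicolon(phrase: str) → List[lexnlp.extract.common.pattern_found.PatternFound]

Finds definitions written as a quoted term followed by a colon (despite the method name, the separator matched by reg_semicolon is “:”, not “;”).

Parameters: phrase – “Modern anatomy human”: a human of modern anatomy.
Returns: {name: ‘Modern anatomy human’, probability: 100, …}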
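The reg_semicolon pattern compiles unchanged with the standard-library re module, so its behaviour on the example can be checked directly. quoted_terms_before_colon is a hypothetical helper, not part of LexNLP; it only strips the quote characters and ignores probabilities and spans.

```python
import re

# Same pattern as CommonDefinitionPatterns.reg_semicolon: a quoted term
# (escaped quotes allowed) that must be immediately followed by a colon.
REG_SEMICOLON = re.compile(r'(["\'“„])(?:(?=(\\?))\2.)*?\1(?=:)', re.I)


def quoted_terms_before_colon(phrase: str) -> list:
    # Drop the opening and closing quote characters of each match.
    return [m.group(0)[1:-1] for m in REG_SEMICOLON.finditer(phrase)]


print(quoted_terms_before_colon('"Modern anatomy human": a human of modern anatomy.'))
# ['Modern anatomy human']
print(quoted_terms_before_colon('He said "hello" twice.'))  # []
```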
static peek_quoted_part(phrase: str, match: Match[~AnyStr], start_func: Callable[[str, Match[~AnyStr], Match[~AnyStr]], int], end_func: Callable[[str, Match[~AnyStr], Match[~AnyStr]], int], match_prob: int) → List[lexnlp.extract.common.pattern_found.PatternFound]
Parameters:
  • phrase – the whole text, may be used for getting the definition’s text length
  • match – the matched part of the phrase that may contain several quote-packed definitions
  • start_func – (phrase, match, quoted_match) -> definition’s start
  • end_func – (phrase, match, quoted_match) -> definition’s end
  • match_prob – definition’s probability
Returns: a list of the definitions found, or an empty list if there are none

reg_acronyms = regex.Regex('\\(\\p{Lu}\\p{L}*\\p{Lu}\\)', flags=regex.V0)
reg_quoted = regex.Regex('(["\'“„])(?:(?=(\\\\?))\\2.)*?\\1', flags=regex.I | regex.V0)
reg_semicolon = regex.Regex('(["\'“„])(?:(?=(\\\\?))\\2.)*?\\1(?=:)', flags=regex.I | regex.V0)
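reg_quoted and reg_semicolon rely on a common trick for skipping escaped quotes: the lookahead (?=(\\?)) captures an optional backslash into group 2, \2 then consumes it, and . consumes the following character, so a \" pair is swallowed as a unit and cannot close the quoted string. The sketch below compares reg_quoted with a naive pattern using the standard-library re module; reg_acronyms cannot be checked this way, because re has no \p{...} property classes.

```python
import re

# reg_quoted rewritten for the standard re module (it uses no \p{...} classes).
REG_QUOTED = re.compile(r'(["\'“„])(?:(?=(\\?))\2.)*?\1', re.I)
# Naive alternative that does not know about backslash escapes.
NAIVE = re.compile(r'"[^"]*"')

text = 'say "a \\"quoted\\" word" now'  # the text is: say "a \"quoted\" word" now
print(REG_QUOTED.search(text).group(0))  # "a \"quoted\" word"
print(NAIVE.search(text).group(0))       # "a \"
```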

lexnlp.extract.common.definitions.definition_match module

class lexnlp.extract.common.definitions.definition_match.DefinitionMatch

Bases: object

Used inside EsDefinitionsParser and SpanishParsingMethods to store intermediate parsing results.

lexnlp.extract.common.definitions.universal_definition_parser module

class lexnlp.extract.common.definitions.universal_definition_parser.UniversalDefinitionsParser(parsing_functions: List[Callable[[str], List[lexnlp.extract.common.pattern_found.PatternFound]]], split_params: lexnlp.utils.lines_processing.line_processor.LineSplitParams)

Bases: lexnlp.extract.common.text_pattern_collector.TextPatternCollector

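Searches text for definitions by applying the supplied parsing_functions, with the text split according to split_params; EsDefinitionsParser uses this approach to search according to the rules of Spanish. See the “parse” method.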
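The constructor signature suggests a design in which language-specific rules are plain functions from a phrase to a list of PatternFound objects. The sketch below illustrates that design under stated assumptions: parse_definitions, colon_definitions, Found and the naive phrase splitter are all hypothetical and are not UniversalDefinitionsParser's API.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Found:
    """Hypothetical stand-in for PatternFound."""
    name: str
    start: int
    end: int
    probability: int


ParsingFunction = Callable[[str], List[Found]]


def parse_definitions(text: str, parsing_functions: List[ParsingFunction]) -> List[Found]:
    results = []
    # Naive phrase split: runs of text up to and including ".", "!" or "?".
    # This is only a stand-in for LexNLP's LineSplitParams-driven splitting.
    for phrase_match in re.finditer(r"[^.!?]+[.!?]?", text):
        phrase, offset = phrase_match.group(0), phrase_match.start()
        for func in parsing_functions:
            for f in func(phrase):
                # Convert phrase-relative offsets into text-relative ones.
                results.append(Found(f.name, f.start + offset, f.end + offset, f.probability))
    return results


def colon_definitions(phrase: str) -> List[Found]:
    """Hypothetical rule: a double-quoted term immediately followed by a colon."""
    pattern = re.compile(r'"([^"]+)"(?=:)')
    return [Found(m.group(1), m.start(1), m.end(1), 100) for m in pattern.finditer(phrase)]


text = 'Intro text. "Buyer": the purchasing party.'
print(parse_definitions(text, [colon_definitions]))
# [Found(name='Buyer', start=13, end=18, probability=100)]
```

In this sketch, adding or dropping a rule only means changing the list passed in, with no subclassing required.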

get_definition_dictionaries()
make_annotation_from_pattrn(locale: str, ptrn: lexnlp.extract.common.pattern_found.PatternFound, phrase: lexnlp.utils.lines_processing.line_processor.LineOrPhrase) → lexnlp.extract.common.annotations.text_annotation.TextAnnotation

Module contents