lexnlp.utils package

Submodules

lexnlp.utils.decorators module

lexnlp.utils.decorators.safe_failure(func)

Return None on failure, or skip the result if the wrapped function is a generator.
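The behaviour described above can be sketched roughly as follows. This is not the lexnlp implementation: safe_failure_sketch, parse_int and parse_ints are hypothetical names, and ending a generator quietly on the first error is an assumption about what "skip the result" means.

```python
import functools
import inspect


def safe_failure_sketch(func):
    """Hypothetical stand-in for a safe_failure-style decorator."""
    if inspect.isgeneratorfunction(func):
        @functools.wraps(func)
        def generator_wrapper(*args, **kwargs):
            try:
                yield from func(*args, **kwargs)
            except Exception:
                # Assumption: stop quietly, keeping the items already yielded.
                return
        return generator_wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # A failing plain function yields None instead of raising.
            return None
    return wrapper


@safe_failure_sketch
def parse_int(value):
    return int(value)


@safe_failure_sketch
def parse_ints(values):
    for value in values:
        yield int(value)
```

With this sketch, parse_int("x") gives None rather than raising ValueError, and list(parse_ints(["1", "2", "x", "4"])) stops after the first two items.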

lexnlp.utils.iterating_helpers module

lexnlp.utils.iterating_helpers.collapse_sequence(sequence: collections.abc.Iterable, predicate: Callable[[Any, Any], Any], accumulator: Any = 0.0) → Any
lexnlp.utils.iterating_helpers.count_sequence_matches(sequence: collections.abc.Iterable, predicate: Callable[[Any], bool]) → int
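Judging by the signatures alone, collapse_sequence looks like a fold and count_sequence_matches like a filtered count. A minimal sketch under that reading is shown here. The *_sketch names are hypothetical, and the (item, accumulator) argument order of the predicate is an assumption that this page does not confirm.

```python
from typing import Any, Callable, Iterable


def collapse_sequence_sketch(sequence: Iterable,
                             predicate: Callable[[Any, Any], Any],
                             accumulator: Any = 0.0) -> Any:
    # Fold: feed every item and the running value back into the predicate.
    # Assumption: the predicate is called as predicate(item, accumulator).
    for item in sequence:
        accumulator = predicate(item, accumulator)
    return accumulator


def count_sequence_matches_sketch(sequence: Iterable,
                                  predicate: Callable[[Any], bool]) -> int:
    # Count the items for which the predicate is truthy.
    return sum(1 for item in sequence if predicate(item))


total_length = collapse_sequence_sketch(
    ["ab", "cde"], lambda item, acc: acc + len(item), 0)
digit_count = count_sequence_matches_sketch("a1b22", str.isdigit)
```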

lexnlp.utils.map module

class lexnlp.utils.map.Map(*args, **kwargs)

Bases: dict

Example: m = Map(some_dict)

objectify(a_dict)
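This page does not describe what Map does beyond subclassing dict. A common pattern for such classes is attribute-style access to keys, with nested dicts converted recursively. The sketch below illustrates that pattern only: AttrMap is a hypothetical class, and both the attribute access and the recursive role given to objectify are assumptions, not documented lexnlp behaviour.

```python
class AttrMap(dict):
    """Hypothetical dict subclass with attribute-style access."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Convert nested plain dicts so that chained access works.
        for key, value in list(self.items()):
            self[key] = self.objectify(value)

    @classmethod
    def objectify(cls, value):
        # Assumption: wrap plain dicts, leave every other value untouched.
        if isinstance(value, dict) and not isinstance(value, cls):
            return cls(value)
        return value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        self[name] = self.objectify(value)


m = AttrMap({"court": {"name": "BGH", "level": 1}})
m.source = "sample"
```

Here m.court.name and m["court"]["name"] refer to the same value, and m.source is stored as an ordinary dict key.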

lexnlp.utils.parse_df module

class lexnlp.utils.parse_df.DataframeEntityParser(dataframe, parse_columns, result_columns=None, preformed_entity=None, priority_sort_column=None, priority_sort_ascending=True, cell_values_separator=';', unique_column_values=True, line_processor: lexnlp.utils.lines_processing.line_processor.LineProcessor = None)

Bases: object

Extracts entities from a text using a collection of entities stored in a dataframe. By default, the columns used for the search are assumed to contain UNIQUE values. Each result is a dict holding the start/end positions of the found item in the text plus other user-defined key-value pairs.

Params:
  • dataframe: pandas.DataFrame with entities collection
  • parse_columns: list or tuple - columns whose values are searched for in the text
  • result_columns: dict - mapping of the form {'dataframe column to read the value from': 'key name in the extracted entity'}
  • preformed_entity: dict - initial, static key-value pairs added to each extracted entity
  • priority_sort_column: str - column to sort by when several rows match, so that the first row after sorting is used; if not set, the first matched row is used
  • priority_sort_ascending: bool - sort order for priority_sort_column
  • cell_values_separator: str or None - separator between multiple values stored in a single dataframe cell
  • unique_column_values: bool - whether the searched dataframe columns contain unique values
E.g.:
>>> parse_columns = ('Kurztitel', 'Titel', 'Abkürzung')
>>> result_columns = {'Titel': 'name'}
>>> preformed_entity = {'entity_type': 'Laws and Rules',
...                     'source': 'BaFin',
...                     'country': 'Germany'}
>>> sort_column = 'Titel'
>>> items = DataframeEntityParser(
...     df, parse_columns, result_columns, preformed_entity, sort_column).parse(text)
SEARCH_PTN = '(?:^|\\W)({})(?:\\W|$)'
get_collection_ptn(collection)

Convert a list of values to a regex pattern.

Params:
  • collection: list of entities to search for

Returns: compiled regex pattern
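One way such a pattern could be built from SEARCH_PTN is sketched here. build_collection_pattern is a hypothetical helper, and escaping the values and sorting them longest-first are assumptions, not confirmed details of get_collection_ptn.

```python
import re

# Same template as SEARCH_PTN above, written as a raw string.
SEARCH_PTN = r'(?:^|\W)({})(?:\W|$)'


def build_collection_pattern(collection):
    # Assumption: put longer values first so they win over shorter prefixes.
    alternatives = sorted({str(value) for value in collection},
                          key=len, reverse=True)
    # Assumption: escape each value so it is matched literally.
    joined = '|'.join(re.escape(value) for value in alternatives)
    return re.compile(SEARCH_PTN.format(joined))


text = 'Siehe § 1 KWG, Absatz 2.'
pattern = build_collection_pattern(['BGB', 'KWG', 'WpHG'])
match = pattern.search(text)
```

The non-word guards on both sides mean that 'KWG' is found as a standalone token but not inside a longer word such as 'KWGX'. Group 1 carries the matched value and its position.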

get_entities(text: str)
get_entities_from_text(text: str) → Generator[dict, None, None]
get_entity_list(text)
get_formed_entity(match, col_name)

Get the formed entity from the matched row in the dataframe.

Params:
  • match: re.match object
  • col_name: df column name

Returns: dict
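As an illustration of what a formed entity might contain, the sketch below combines a regex match, a matched row, result_columns and preformed_entity into one dict. form_entity_sketch is hypothetical, the 'start' and 'end' key names are assumptions, and the row is made-up sample data given as a plain dict instead of a dataframe row.

```python
import re


def form_entity_sketch(match, row, result_columns, preformed_entity):
    # Start from the static key-value pairs shared by every entity.
    entity = dict(preformed_entity)
    # Hypothetical key names for the position of the found item.
    entity['start'] = match.start(1)
    entity['end'] = match.end(1)
    # Copy the requested row values under their new key names.
    for source_column, target_key in result_columns.items():
        entity[target_key] = row[source_column]
    return entity


row = {'Kurztitel': 'KWG', 'Titel': 'Kreditwesengesetz'}  # sample data
text = 'Siehe KWG.'
match = re.search(r'(?:^|\W)(KWG)(?:\W|$)', text)
entity = form_entity_sketch(
    match, row, {'Titel': 'name'}, {'country': 'Germany'})
```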

get_single_result(rows)

By default, all dataframe values used for filtering are assumed to be UNIQUE, so the first matched row is taken. Override this method to implement your own logic for choosing among multiple matched dataframe rows.
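The selection logic described here, together with priority_sort_column and priority_sort_ascending, can be illustrated without pandas. In this sketch pick_single_row is a hypothetical function, and the rows are made-up sample data given as plain dicts, not dataframe rows.

```python
def pick_single_row(rows, priority_sort_column=None,
                    priority_sort_ascending=True):
    """Choose one row among several matches (illustrative only)."""
    if not rows:
        return None
    if priority_sort_column is None:
        # Default: values are assumed unique, so take the first match.
        return rows[0]
    ordered = sorted(rows,
                     key=lambda row: row[priority_sort_column],
                     reverse=not priority_sort_ascending)
    return ordered[0]


rows = [
    {'Titel': 'Zweites Gesetz', 'Abkürzung': 'G'},  # sample data
    {'Titel': 'Erstes Gesetz', 'Abkürzung': 'G'},
]
first_match = pick_single_row(rows)
by_title = pick_single_row(rows, 'Titel')
by_title_desc = pick_single_row(rows, 'Titel', priority_sort_ascending=False)
```

Without a sort column the first matched row is returned. With one, the row that comes first in the requested order is chosen.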

lexnlp.utils.parse_df.get_entities(text: str, config: pandas.core.frame.DataFrame, parse_columns: Union[List[str], Tuple[str]], result_columns: Optional[dict] = None, preformed_entity: Optional[dict] = None, priority_sort_column: Optional[str] = None, priority_sort_ascending: bool = True, cell_values_separator: Optional[str] = ';', unique_column_values: bool = True) → Generator

Simple wrapper around DataframeEntityParser; returns a generator of entities.

lexnlp.utils.parse_df.get_entity_list(text: str, config: pandas.core.frame.DataFrame, parse_columns: Union[List[str], Tuple[str]], result_columns: Optional[dict] = None, preformed_entity: Optional[dict] = None, priority_sort_column: Optional[str] = None, priority_sort_ascending: bool = True, cell_values_separator: Optional[str] = ';', unique_column_values: bool = True) → List

Simple wrapper around DataframeEntityParser; returns a list of entities.

Module contents