lexnlp.utils package


lexnlp.utils.decorators module


Returns None on failure, or skips the result if the decorated function is a generator.
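The behaviour described above can be illustrated with a minimal sketch. The decorator name `none_on_failure` is hypothetical and is not taken from lexnlp; the reading of "skip result" as "the generator ends quietly when it fails" is likewise an assumption.

```python
import functools
import inspect


def none_on_failure(func):
    """Hypothetical stand-in, not the lexnlp decorator itself."""
    if inspect.isgeneratorfunction(func):
        @functools.wraps(func)
        def generator_wrapper(*args, **kwargs):
            try:
                yield from func(*args, **kwargs)
            except Exception:
                # ASSUMPTION: a failing generator simply stops yielding
                return
        return generator_wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Regular functions report failure as None
            return None
    return wrapper


@none_on_failure
def parse_int(value):
    return int(value)


@none_on_failure
def numbers():
    yield 1
    raise ValueError('broken source')


ok = parse_int('42')
failed = parse_int('not a number')
collected = list(numbers())
```

Here `failed` is None, and `collected` holds only the items yielded before the error.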

lexnlp.utils.iterating_helpers module

lexnlp.utils.iterating_helpers.collapse_sequence(sequence: collections.abc.Iterable, predicate: Callable[[Any, Any], Any], accumulator: Any = 0.0) → Any
lexnlp.utils.iterating_helpers.count_sequence_matches(sequence: collections.abc.Iterable, predicate: Callable[[Any], bool]) → int
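The signatures above suggest a fold over a sequence and a counter built on top of it. The sketch below uses stand-in functions (`fold_sequence`, `count_matches`) rather than the library itself, and it assumes that the two-argument predicate receives `(item, accumulator)` and returns the new accumulator; the actual argument order in lexnlp may differ.

```python
from typing import Any, Callable, Iterable


def fold_sequence(sequence: Iterable,
                  predicate: Callable[[Any, Any], Any],
                  accumulator: Any = 0.0) -> Any:
    # ASSUMPTION: predicate(item, accumulator) returns the updated accumulator
    for item in sequence:
        accumulator = predicate(item, accumulator)
    return accumulator


def count_matches(sequence: Iterable,
                  predicate: Callable[[Any], bool]) -> int:
    # Count the items for which the one-argument predicate is true
    return fold_sequence(
        sequence,
        lambda item, acc: acc + 1 if predicate(item) else acc,
        0)


total = fold_sequence([1.5, 2.5, 4.0], lambda item, acc: acc + item)
evens = count_matches(range(10), lambda n: n % 2 == 0)
```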

lexnlp.utils.map module

class lexnlp.utils.map.Map(*args, **kwargs)

Bases: dict

Example: m = Map(some_dict)


lexnlp.utils.parse_df module

class lexnlp.utils.parse_df.DataframeEntityParser(dataframe, parse_columns, result_columns=None, preformed_entity=None, priority_sort_column=None, priority_sort_ascending=True, cell_values_separator=';', unique_column_values=True, line_processor: lexnlp.utils.lines_processing.line_processor.LineProcessor = None)

Bases: object

Extracts entities from a text, given a collection of entities stored as a dataframe. By default, the dataframe is assumed to have UNIQUE values in the columns used for the search. Returns a dict with the start/end positions of each found item in the text, plus other user-defined key-value pairs.

  • dataframe: pandas.DataFrame with entities collection
  • parse_columns: list or tuple - these columns will be used to search their values in a text
  • result_columns: dict - mapping like {'dataframe column whose value is taken for the extracted entity': 'new_column_name'}
  • preformed_entity: dict - initial, static key-value pairs to use for each extracted entity
  • priority_sort_column: str - column name to sort by when multiple rows match, so that the first row after sorting is used; if not set, the first matched row is used
  • priority_sort_ascending: bool - sort order for priority_sort_column
  • cell_values_separator: str or None - separator between multiple values in a single dataframe cell
  • unique_column_values: bool - whether the dataframe columns have unique values
>>> parse_columns = ('Kurztitel', 'Titel', 'Abkürzung')
>>> result_columns = {'Titel': 'name'}
>>> preformed_entity = {'entity_type': 'Laws and Rules',
...                     'source': 'BaFin',
...                     'country': 'Germany'}
>>> sort_column = 'Titel'
>>> items = DataframeEntityParser(
...     df, parse_columns, result_columns, preformed_entity, sort_column).parse(text)
SEARCH_PTN = '(?:^|\\W)({})(?:\\W|$)'

Convert a list of values to a regex pattern.

  • collection: list of entity values to search for
  • return: compiled regex pattern
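As a rough illustration of how a list of values might be turned into a pattern using SEARCH_PTN, consider the sketch below. The helper name `build_search_re` is hypothetical, and the escaping, the longest-first ordering, and the case-insensitive flag are assumptions rather than documented behaviour of the class.

```python
import re
from typing import Iterable, Pattern

# Same template as the SEARCH_PTN class attribute
SEARCH_PTN = r'(?:^|\W)({})(?:\W|$)'


def build_search_re(collection: Iterable[str]) -> Pattern[str]:
    """Hypothetical helper, not the lexnlp method itself."""
    # ASSUMPTION: try longer values first so they win over their prefixes
    values = sorted({v for v in collection if v},
                    key=lambda v: (-len(v), v))
    # Escape each value so regex metacharacters are matched literally
    alternatives = '|'.join(re.escape(v) for v in values)
    # ASSUMPTION: matching is case-insensitive
    return re.compile(SEARCH_PTN.format(alternatives), re.IGNORECASE)


text = 'Siehe das Kreditwesengesetz (KWG).'
pattern = build_search_re(['KWG', 'Kreditwesengesetz'])
match = pattern.search(text)
found = match.group(1)
start, end = match.span(1)
```

Group 1 holds the matched value only, so `start` and `end` exclude the surrounding non-word characters that the template uses as boundaries.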

get_entities(text: str)
get_entities_from_text(text: str) → Generator[[dict, None], None]
get_formed_entity(match, col_name)

Get the formed entity from the matched row in the dataframe.

  • match: re.match object
  • col_name: dataframe column name
  • return: dict


By default, all values used for filtering in the dataframe are assumed to be UNIQUE, so the first matched row is taken. Implement your own logic to choose among multiple matched dataframe rows.
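One possible selection rule, combining the "take the first row" default with the priority_sort_column and priority_sort_ascending options, is sketched here. To stay free of dependencies it uses plain dicts as stand-ins for dataframe rows, and the function name `choose_row` is hypothetical.

```python
from typing import Dict, List, Optional


def choose_row(rows: List[Dict[str, str]],
               priority_sort_column: Optional[str] = None,
               priority_sort_ascending: bool = True) -> Optional[Dict[str, str]]:
    """Hypothetical stand-in for choosing among multiple matched rows."""
    if not rows:
        return None
    if priority_sort_column is None:
        # Values are assumed unique, so the first match is good enough
        return rows[0]
    # Otherwise sort by the priority column and take the top row
    ordered = sorted(rows,
                     key=lambda row: row[priority_sort_column],
                     reverse=not priority_sort_ascending)
    return ordered[0]


matched = [
    {'Titel': 'Wertpapierhandelsgesetz', 'Abkürzung': 'WpHG'},
    {'Titel': 'Kreditwesengesetz', 'Abkürzung': 'KWG'},
]
first = choose_row(matched)
by_title = choose_row(matched, priority_sort_column='Titel')
by_title_desc = choose_row(matched, 'Titel', priority_sort_ascending=False)
```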

lexnlp.utils.parse_df.get_entities(text: str, config: pandas.core.frame.DataFrame, parse_columns: Union[List[str], Tuple[str]], result_columns: Optional[dict] = None, preformed_entity: Optional[dict] = None, priority_sort_column: Optional[str] = None, priority_sort_ascending: bool = True, cell_values_separator: Optional[str] = ';', unique_column_values: bool = True) → Generator

Simple wrapper around DataframeEntityParser that yields the extracted entities.

lexnlp.utils.parse_df.get_entity_list(text: str, config: pandas.core.frame.DataFrame, parse_columns: Union[List[str], Tuple[str]], result_columns: Optional[dict] = None, preformed_entity: Optional[dict] = None, priority_sort_column: Optional[str] = None, priority_sort_ascending: bool = True, cell_values_separator: Optional[str] = ';', unique_column_values: bool = True) → List

Simple wrapper around DataframeEntityParser that returns the extracted entities as a list.

Module contents