lexnlp.utils package¶

Subpackages¶

Submodules¶

lexnlp.utils.decorators module¶

lexnlp.utils.decorators.safe_failure(func)¶: Return None on failure, or skip the result if the decorated function is a generator.
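As a rough illustration of the behaviour described for safe_failure, a decorator of this kind could be sketched as follows. This is not the LexNLP implementation: the name safe_failure_sketch is hypothetical, and ending a failing generator early (keeping the items already yielded) is an assumption about what "skip" means.

```python
import functools
import inspect


def safe_failure_sketch(func):
    """Hypothetical stand-in: swallow exceptions raised by `func`."""
    if inspect.isgeneratorfunction(func):
        @functools.wraps(func)
        def generator_wrapper(*args, **kwargs):
            try:
                yield from func(*args, **kwargs)
            except Exception:
                # Assumption: a failing generator simply stops producing items.
                return
        return generator_wrapper

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # A plain function reports failure by returning None.
            return None
    return wrapper


@safe_failure_sketch
def to_int(value):
    return int(value)


@safe_failure_sketch
def numbers():
    yield 1
    raise ValueError("broken source")


ok = to_int("7")              # conversion succeeds
failed = to_int("seven")      # int() raises, wrapper returns None
partial = list(numbers())     # items yielded before the error are kept
```

A caller of such a decorator has to treat None as "no result", so it suits extraction code where a missing value is acceptable.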

lexnlp.utils.iterating_helpers module¶

lexnlp.utils.iterating_helpers.collapse_sequence(sequence: collections.abc.Iterable, predicate: Callable[[Any, Any], Any], accumulator: Any = 0.0) → Any¶

lexnlp.utils.iterating_helpers.count_sequence_matches(sequence: collections.abc.Iterable, predicate: Callable[[Any], bool]) → int¶
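The two signatures above suggest a fold over a sequence and a count of matching items. The sketch below shows helpers with the same shapes. It is an assumption-based stand-in, not the library code: the names carry a _sketch suffix, and the (item, accumulator) argument order of the folding callable is a guess that the listing does not confirm.

```python
from typing import Any, Callable, Iterable


def collapse_sequence_sketch(sequence: Iterable,
                             predicate: Callable[[Any, Any], Any],
                             accumulator: Any = 0.0) -> Any:
    """Fold `sequence` into one value by repeatedly applying `predicate`."""
    for item in sequence:
        # Assumed argument order: current item first, running value second.
        accumulator = predicate(item, accumulator)
    return accumulator


def count_sequence_matches_sketch(sequence: Iterable,
                                  predicate: Callable[[Any], bool]) -> int:
    """Count the items for which `predicate` returns a truthy value."""
    return sum(1 for item in sequence if predicate(item))


total = collapse_sequence_sketch([1, 2, 3], lambda item, acc: acc + item)
longest = collapse_sequence_sketch(["a", "abc", "ab"],
                                   lambda item, acc: max(acc, len(item)), 0)
evens = count_sequence_matches_sketch([1, 2, 3, 4], lambda n: n % 2 == 0)
```

With the default accumulator of 0.0, a summing fold returns a float; pass an explicit starting value when another type is wanted.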

lexnlp.utils.map module¶

class lexnlp.utils.map.Map(*args, **kwargs)¶

Bases: dict

Example: m = Map(some_dict)

objectify(a_dict)¶

lexnlp.utils.parse_df module¶

class lexnlp.utils.parse_df.DataframeEntityParser(dataframe, parse_columns, result_columns=None, preformed_entity=None, priority_sort_column=None, priority_sort_ascending=True, cell_values_separator=';', unique_column_values=True, line_processor: lexnlp.utils.lines_processing.line_processor.LineProcessor = None)¶

Bases: object

Extracts entities from a text, given a collection of entities stored in a dataframe. By default, the columns used for searching are assumed to hold UNIQUE values. Each result is a dict with the start/end positions of the found item in the text, plus other user-defined key-value pairs.

Params:

dataframe: pandas.DataFrame with entities collection
parse_columns: list or tuple - these columns will be used to search their values in a text
result_columns: dict - mapping of the form {'dataframe column whose value is copied into the extracted entity': 'new_column_name'}
preformed_entity: dict - initial, static key-value pairs to use for each extracted entity
priority_sort_column: str - column to sort by when several rows match, so that the first row after sorting is used; if omitted, the first matched row is used
priority_sort_ascending: bool - sort order for priority_sort_column
cell_values_separator: str or None - separator between multiple values stored in a single dataframe cell
unique_column_values: bool - whether the dataframe columns hold unique values

E.g.:

>>> parse_columns = ('Kurztitel', 'Titel', 'Abkürzung')
>>> result_columns = {'Titel': 'name'}
>>> preformed_entity = {'entity_type': 'Laws and Rules',
...                     'source': 'BaFin',
...                     'country': 'Germany'}
>>> sort_column = 'Titel'
>>> items = DataframeEntityParser(
...     df, parse_columns, result_columns, preformed_entity, sort_column).get_entity_list(text)
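To make the roles of parse_columns, result_columns, preformed_entity and cell_values_separator more concrete, here is a dependency-free sketch of the matching idea. It is not the library's code: a list of dicts stands in for the pandas.DataFrame, the function name find_entities is hypothetical, and the output keys start, end and matched are assumptions about the result layout. Only the regular expression is taken from the SEARCH_PTN attribute listed for this class.

```python
import re

# Same pattern text as the SEARCH_PTN attribute documented for the class.
SEARCH_PTN = r'(?:^|\W)({})(?:\W|$)'


def find_entities(text, rows, parse_columns, result_columns=None,
                  preformed_entity=None, cell_values_separator=';'):
    """Yield one dict per occurrence of a row value found in `text`."""
    result_columns = result_columns or {}
    for row in rows:
        for column in parse_columns:
            cell = row.get(column) or ''
            # A single cell may hold several values joined by the separator.
            values = cell.split(cell_values_separator) if cell_values_separator else [cell]
            for value in (v.strip() for v in values):
                if not value:
                    continue
                pattern = re.compile(SEARCH_PTN.format(re.escape(value)))
                for match in pattern.finditer(text):
                    entity = dict(preformed_entity or {})      # static pairs first
                    for source_column, new_name in result_columns.items():
                        entity[new_name] = row[source_column]  # renamed row values
                    # Hypothetical key names for the position of the hit.
                    entity.update(start=match.start(1), end=match.end(1),
                                  matched=match.group(1))
                    yield entity


rows = [{'Kurztitel': 'KWG', 'Titel': 'Kreditwesengesetz', 'Abkürzung': 'KredWG;KWG-alt'}]
entities = list(find_entities('Das KWG gilt hier.', rows, ('Kurztitel',),
                              {'Titel': 'name'}, {'country': 'Germany'}))
```

Each yielded dict combines the static preformed_entity pairs, the renamed row values, and the position of the match in the text.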

SEARCH_PTN = '(?:^|\\W)({})(?:\\W|$)'¶

get_collection_ptn(collection)¶: Convert a list of values to a regex pattern. Takes collection, the list of entities to search for, and returns the compiled regex pattern.
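One plausible way to turn a list of values into a single compiled pattern with the SEARCH_PTN template is to escape each value and join them into an alternation. The sketch below assumes that approach. The function name build_collection_pattern and the longest-first ordering are illustrative choices, not details confirmed by the listing.

```python
import re

# Pattern text of the SEARCH_PTN attribute, written as a raw string.
SEARCH_PTN = r'(?:^|\W)({})(?:\W|$)'


def build_collection_pattern(collection):
    """Compile one regex that matches any value of `collection` as a whole token."""
    # Longest values first, so a longer name wins over a shorter prefix of it.
    ordered = sorted(collection, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '(' in a value literal.
    alternation = '|'.join(re.escape(value) for value in ordered)
    return re.compile(SEARCH_PTN.format(alternation))


pattern = build_collection_pattern(['KWG', 'Kreditwesengesetz'])
hit = pattern.search('Siehe KWG, Absatz 1')
miss = pattern.search('XKWGY')
```

Group 1 holds the matched value, so hit.start(1) and hit.end(1) give its position without the surrounding non-word characters.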

get_entities(text: str)¶

get_entities_from_text(text: str) → Generator[dict, None, None]¶

get_entity_list(text)¶

get_formed_entity(match, col_name)¶: Build the formed entity from a matched row in the dataframe. Takes match, a regex match object, and col_name, the dataframe column name; returns a dict.

get_single_result(rows)¶: By default, all values used for filtering in the dataframe are assumed to be UNIQUE, so the first row is taken. Override this method to implement your own logic for choosing among multiple matched dataframe rows.

lexnlp.utils.parse_df.get_entities(text: str, config: pandas.core.frame.DataFrame, parse_columns: Union[List[str], Tuple[str]], result_columns: Optional[dict] = None, preformed_entity: Optional[dict] = None, priority_sort_column: Optional[str] = None, priority_sort_ascending: bool = True, cell_values_separator: Optional[str] = ';', unique_column_values: bool = True) → Generator¶: Simple wrapper around DataframeEntityParser

lexnlp.utils.parse_df.get_entity_list(text: str, config: pandas.core.frame.DataFrame, parse_columns: Union[List[str], Tuple[str]], result_columns: Optional[dict] = None, preformed_entity: Optional[dict] = None, priority_sort_column: Optional[str] = None, priority_sort_ascending: bool = True, cell_values_separator: Optional[str] = ';', unique_column_values: bool = True) → List¶: Simple wrapper around DataframeEntityParser
