get_geoentities

lexnlp.extract.en.geoentities.get_geoentities(text: str, geo_config_list: typing.List[typing.Tuple[int, str, typing.List[typing.Tuple[str, str, bool, int]]]], priority: bool = False, priority_by_id: bool = False, text_languages: typing.List[str] = None, min_alias_len: int = 2, prepared_alias_black_list: typing.Union[NoneType, typing.Dict[str, typing.Tuple[typing.List[str], typing.List[str]]]] = None) → typing.Generator[[typing.Tuple[typing.Tuple, typing.Tuple], typing.Any], typing.Any]

Searches the text for geo entities from the provided config list and yields (entity, alias) pairs. An entity is a tuple (entity_id, name, [list of aliases]); an alias is a tuple (alias_text, lang, is_abbrev, alias_id).
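To make the tuple layout concrete, the sketch below builds one entity and one (entity, alias) pair by hand in the shape described above and unpacks it. The Iceland data, the ids, and the `describe()` helper are illustrative only and are not part of LexNLP.

```python
from typing import List, Tuple

# Alias: (alias_text, lang, is_abbrev, alias_id)
Alias = Tuple[str, str, bool, int]
# Entity: (entity_id, name, [list of aliases])
Entity = Tuple[int, str, List[Alias]]

# Hypothetical example entity, not taken from LexNLP's bundled data.
iceland: Entity = (
    1, "Iceland",
    [("Iceland", "en", False, 1),
     ("Island", "de", False, 2),
     ("IS", "en", True, 3)],
)

# A pair shaped like those yielded by get_geoentities(): the entity plus the alias that matched.
pair: Tuple[Entity, Alias] = (iceland, iceland[2][2])


def describe(pair: Tuple[Entity, Alias]) -> str:
    """Unpack an (entity, alias) pair into a readable string."""
    (entity_id, name, _aliases), (alias_text, lang, is_abbrev, _alias_id) = pair
    kind = "abbreviation" if is_abbrev else "alias"
    return f"{name} (id={entity_id}) matched {kind} {alias_text!r} [{lang}]"


print(describe(pair))  # Iceland (id=1) matched abbreviation 'IS' [en]
```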

This method uses the general dictionary-entity search routines from the dict_entities.py module. The helpers in that module, entity_config(), entity_alias() and add_aliases_to_entity(), can be used to build the config conveniently.

Parameters:
  • text – Text to search for geo entities.
  • geo_config_list – List of all known geo entities, as tuples (id, name, [(alias, lang, is_abbrev, alias_id), …]).
  • priority – If two entities are found with exactly the same matching alias, use the one with the greatest priority field.
  • priority_by_id – If two entities are found with exactly the same matching alias, use the one with the lowest id.
  • text_languages – Language(s) of the source text. If specified, only aliases in these languages are searched for. For example, this allows ignoring “Island”, the German alias of Iceland, in English texts.
  • min_alias_len – Minimal length of geo entity aliases to search for.
  • prepared_alias_black_list – Aliases to exclude from the search, as a dict of lang -> (list of normalized non-abbreviation aliases, list of normalized abbreviation aliases). Use dict_entities.prepare_alias_blacklist_dict() to prepare this dict.
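The alias-filtering parameters can be illustrated with a small pure-Python sketch. This is not LexNLP's implementation: `searchable_aliases()` and `normalize_alias()` are hypothetical helpers, and the normalization rule (abbreviations compared as-is, other aliases lowercased) is an assumption. The sketch only shows which aliases `text_languages`, `min_alias_len` and a black list in the documented dict shape would leave in play, according to the parameter descriptions above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (alias_text, lang, is_abbrev, alias_id), as documented above.
Alias = Tuple[str, str, bool, int]
# lang -> (normalized non-abbreviation aliases, normalized abbreviation aliases)
BlackList = Dict[str, Tuple[List[str], List[str]]]


def normalize_alias(text: str, is_abbrev: bool) -> str:
    # ASSUMPTION: abbreviations are compared as-is, other aliases lowercased.
    # LexNLP's own normalization may differ.
    return text if is_abbrev else text.lower()


def searchable_aliases(aliases: Sequence[Alias],
                       text_languages: Optional[List[str]] = None,
                       min_alias_len: int = 2,
                       black_list: Optional[BlackList] = None) -> List[Alias]:
    """Hypothetical helper: return the aliases the documented rules leave in play."""
    kept: List[Alias] = []
    for alias in aliases:
        text, lang, is_abbrev, _alias_id = alias
        # text_languages: only aliases in one of the listed languages are searched.
        if text_languages and lang not in text_languages:
            continue
        # min_alias_len: aliases shorter than the minimum are skipped.
        if len(text) < min_alias_len:
            continue
        # Black list: consult the abbreviation or non-abbreviation list for this language.
        if black_list and lang in black_list:
            non_abbrevs, abbrevs = black_list[lang]
            excluded = abbrevs if is_abbrev else non_abbrevs
            if normalize_alias(text, is_abbrev) in excluded:
                continue
        kept.append(alias)
    return kept
```

For example, with the aliases ("Iceland", "en"), ("Island", "de") and the abbreviation ("IS", "en"), passing text_languages=["en"] keeps "Iceland" and "IS", and additionally raising min_alias_len to 3 leaves only "Iceland".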
Returns:

Generates tuples: (entity, alias)
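Because the function returns a generator, results are produced lazily and can be consumed only once; wrap the call in list() if you need to iterate over them more than once. The sketch below counts mentions per entity name from any iterable of (entity, alias) pairs. `count_mentions()` is a hypothetical helper, and counting this way assumes one pair is yielded per match in the text.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Alias = Tuple[str, str, bool, int]    # (alias_text, lang, is_abbrev, alias_id)
Entity = Tuple[int, str, List[Alias]]  # (entity_id, name, [list of aliases])


def count_mentions(pairs: Iterable[Tuple[Entity, Alias]]) -> Counter:
    """Count how many times each entity name appears among the yielded pairs."""
    counts: Counter = Counter()
    for (_entity_id, name, _aliases), _alias in pairs:
        counts[name] += 1
    return counts
```

With LexNLP installed, it could be called as count_mentions(get_geoentities(text, geo_config_list, text_languages=["en"])), where text and geo_config_list are supplied by the caller.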