get_geoentities
---------------
lexnlp.extract.en.geoentities.get_geoentities()

Searches for geo entities from the provided config list and yields pairs of (entity, alias), where:
- entity is (entity_id, name, [list of aliases])
- alias is (alias_text, lang, is_abbrev, alias_id)
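As a minimal sketch of the tuple shapes described above, the following builds a one-entity config in plain Python. The helper make_entity and the type aliases Alias and Entity are hypothetical illustrations, not part of LexNLP, and the ids and aliases are made-up sample values.

```python
from typing import List, Tuple

# Illustrative type aliases for the documented tuple shapes (not LexNLP names)
Alias = Tuple[str, str, bool, int]       # (alias_text, lang, is_abbrev, alias_id)
Entity = Tuple[int, str, List[Alias]]    # (entity_id, name, [aliases])


def make_entity(entity_id: int, name: str, aliases: List[Alias]) -> Entity:
    """Hypothetical helper: bundle an id, a name and its aliases into an entity tuple."""
    return (entity_id, name, list(aliases))


iceland = make_entity(1, "Iceland", [
    ("Iceland", "en", False, 101),
    ("Island", "de", False, 102),   # German alias; collides with the English word "island"
    ("IS", "en", True, 103),        # abbreviation
])

geo_config_list: List[Entity] = [iceland]

# Unpack an entity and one of its aliases
entity_id, name, aliases = iceland
alias_text, lang, is_abbrev, alias_id = aliases[1]
print(name, alias_text, lang, is_abbrev)  # Iceland Island de False
```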
This function uses the general dictionary-entity search routines from the dict_entities.py module. The helpers in that module, entity_config(), entity_alias() and add_aliases_to_entity(), are convenient for building the config.

Parameters:
- text – The source text to search.
- geo_config_list – List of all known geo entities, as tuples (id, name, [(alias, lang, is_abbrev, alias_id), …]).
- priority – If two entities are found with exactly the same matching alias, use the one with the greatest priority field.
- priority_by_id – If two entities are found with exactly the same matching alias, use the one with the lowest id.
- text_languages – Language(s) of the source text. If a language is specified, only aliases in that language are searched for. For example, this makes it possible to ignore “Island”, the German alias of Iceland, in English texts.
- min_alias_len – Minimum length of geo entity aliases to search for.
- prepared_alias_black_list – Aliases to exclude from the search, as a dict of lang -> (list of normalized non-abbreviation aliases, list of normalized abbreviation aliases). Use dict_entities.prepare_alias_blacklist_dict() to prepare this dict.
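The filtering and tie-breaking rules above can be illustrated with a toy matcher. This is not LexNLP's algorithm: find_geoentities_sketch is a hypothetical function that uses plain regular-expression word matching. It assumes that "exactly the same matching alias" means the same normalized alias text, and that non-abbreviations are normalized by lowercasing while abbreviations stay case-sensitive. Because the documented entity tuple has no priority slot, the sketch takes a separate hypothetical priorities dict (entity_id -> priority) instead of a boolean priority flag.

```python
import re
from typing import Dict, Iterator, List, Optional, Tuple

Alias = Tuple[str, str, bool, int]                   # (alias_text, lang, is_abbrev, alias_id)
Entity = Tuple[int, str, List[Alias]]                # (entity_id, name, [aliases])
BlackList = Dict[str, Tuple[List[str], List[str]]]   # lang -> (non-abbrevs, abbrevs)


def find_geoentities_sketch(
    text: str,
    geo_config_list: List[Entity],
    priorities: Optional[Dict[int, int]] = None,   # hypothetical: entity_id -> priority
    priority_by_id: bool = False,
    text_languages: Optional[List[str]] = None,
    min_alias_len: int = 2,
    prepared_alias_black_list: Optional[BlackList] = None,
) -> Iterator[Tuple[Entity, Alias]]:
    """Toy illustration of the documented filters; not LexNLP's implementation."""
    black = prepared_alias_black_list or {}
    # Candidate matches grouped by the normalized alias text that matched
    hits: Dict[str, List[Tuple[Entity, Alias]]] = {}
    for entity in geo_config_list:
        for alias in entity[2]:
            alias_text, lang, is_abbrev, _ = alias
            if text_languages and lang not in text_languages:
                continue  # only aliases in the text's language(s) are searched
            if len(alias_text) < min_alias_len:
                continue
            # Assumption: non-abbreviations are lowercased and matched case-insensitively
            norm = alias_text if is_abbrev else alias_text.lower()
            non_abbrevs, abbrevs = black.get(lang, ([], []))
            if norm in (abbrevs if is_abbrev else non_abbrevs):
                continue  # blacklisted alias
            flags = 0 if is_abbrev else re.IGNORECASE
            if re.search(r"\b" + re.escape(alias_text) + r"\b", text, flags):
                hits.setdefault(norm, []).append((entity, alias))
    for candidates in hits.values():
        if priorities is not None:
            # Keep the candidate whose entity has the greatest priority
            yield max(candidates, key=lambda c: priorities.get(c[0][0], 0))
        elif priority_by_id:
            # Keep the candidate with the lowest entity id
            yield min(candidates, key=lambda c: c[0][0])
        else:
            yield from candidates


config = [
    (1, "Iceland", [("Iceland", "en", False, 101), ("Island", "de", False, 102)]),
    (2, "Georgia (country)", [("Georgia", "en", False, 201)]),
    (3, "Georgia (US state)", [("Georgia", "en", False, 301), ("GA", "en", True, 302)]),
]
text = "He sailed to an island near Iceland, then flew to Georgia."

matches = list(find_geoentities_sketch(text, config, priority_by_id=True, text_languages=["en"]))
print([entity[1] for entity, _ in matches])  # ['Iceland', 'Georgia (country)']
```

Restricting text_languages to English is what keeps the German alias “Island” from matching the ordinary English word "island" in the sample sentence.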
Returns: A generator yielding (entity, alias) tuples.
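Because the return value is a generator, it can be consumed only once; wrap it in list() if the results are needed more than once. The sketch below consumes such a stream using hand-written stand-in data rather than a real call. count_mentions is a hypothetical helper, and it counts yielded pairs, which is not necessarily the same as counting occurrences in the text.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Alias = Tuple[str, str, bool, int]       # (alias_text, lang, is_abbrev, alias_id)
Entity = Tuple[int, str, List[Alias]]    # (entity_id, name, [aliases])


def count_mentions(pairs: Iterable[Tuple[Entity, Alias]]) -> Counter:
    """Count how many (entity, alias) pairs were yielded for each entity name."""
    return Counter(entity[1] for entity, _alias in pairs)


iceland = (1, "Iceland", [("Iceland", "en", False, 101), ("IS", "en", True, 103)])
# Stand-in for the generator returned by get_geoentities(); hand-written data
sample = iter([(iceland, iceland[2][0]), (iceland, iceland[2][1])])
print(count_mentions(sample))  # Counter({'Iceland': 2})
```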