lexnlp.extract.en.distances: Extracting distances

The lexnlp.extract.en.distances module provides functions for extracting distance references from text. By default, the module recognizes the following units:

  • km
  • kilometer
  • mile
  • miles
  • mi

The full list of current unit test cases can be found here: https://github.com/LexPredict/lexpredict-lexnlp/tree/master/test_data/lexnlp/extract/en/tests/test_distances

Extracting distances

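lexnlp.extract.en.distances.get_distances(text: str, return_sources=False, float_digits=4) → Generator

Returns a generator of (value, unit) tuples, one per distance found in text, where unit is the standard distance type:
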
>>> import lexnlp.extract.en.distances
>>> text = "Within 50 miles of office."
>>> print(list(lexnlp.extract.en.distances.get_distances(text)))
[(50.0, 'mile')]
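
Because each result is a plain (value, unit) tuple, downstream code can normalize the results to a single unit. The following is a minimal sketch, not part of LexNLP: the helper name to_kilometers and the KM_PER_UNIT table are hypothetical, and the unit name 'kilometer' is an assumption about the standard type returned for km.

```python
# Hypothetical conversion table keyed by standard distance type.
# 'mile' appears in the example output above; 'kilometer' is assumed.
KM_PER_UNIT = {
    "kilometer": 1.0,
    "mile": 1.609344,  # international mile, exact by definition
}


def to_kilometers(distances):
    """Convert an iterable of (value, unit) tuples to kilometers.

    Raises KeyError for a unit that is not in KM_PER_UNIT.
    """
    return [float(value) * KM_PER_UNIT[unit] for value, unit in distances]


# Tuples shaped like the get_distances output shown above.
converted = to_kilometers([(50.0, "mile"), (3.0, "kilometer")])
```

The helper accepts the generator returned by get_distances directly, since it only iterates over its argument once.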

Customizing distance extraction

Distance extraction can be customized. Three key module variables store the default configuration; the compiled pattern used for matching, DISTANCE_PTN_RE, is created from DISTANCE_PTN with re.compile:

  • DISTANCE_TOKEN_MAP: This dictionary maps tokens (unit words such as "miles") to standard distance types. See the customization example below.
  • DISTANCE_SYMBOL_MAP: This dictionary maps abbreviations (such as "mi") to standard distance types. See the customization example below.
  • DISTANCE_PTN: This string defines the regular expression pattern used to match distances.
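
How the two maps and the pattern relate can be sketched with the standard library alone. This is an illustration under stated assumptions, not LexNLP's implementation: TOKEN_MAP, SYMBOL_MAP, build_pattern and find_distances are hypothetical names, the map contents are stand-ins, and the number pattern only handles digits, whereas the module builds on lexnlp.extract.en.amounts.NUM_PTN.

```python
import re

# Hypothetical stand-ins for the module's maps; the real defaults may differ.
TOKEN_MAP = {"kilometer": "kilometer", "mile": "mile", "miles": "mile"}
SYMBOL_MAP = {"km": "kilometer", "mi": "mile"}


def build_pattern(token_map, symbol_map):
    """Compile a 'number + unit' pattern from the keys of both maps."""
    # Longest alternatives first so "miles" is tried before "mile" and "mi".
    units = sorted(list(token_map) + list(symbol_map), key=len, reverse=True)
    unit_alt = "|".join(re.escape(u) for u in units)
    return re.compile(
        rf"(?P<number>\d+(?:\.\d+)?)\s*(?P<units>{unit_alt})\b",
        re.IGNORECASE,
    )


def find_distances(text, token_map=TOKEN_MAP, symbol_map=SYMBOL_MAP):
    """Yield (value, standard_unit) for each digit-based distance in text."""
    pattern = build_pattern(token_map, symbol_map)
    for match in pattern.finditer(text):
        unit = match.group("units").lower()
        # Look the matched unit up in the token map, then the symbol map.
        standard = token_map.get(unit) or symbol_map.get(unit)
        yield float(match.group("number")), standard


default_result = list(find_distances("Within 50 miles of office."))

# Adding a unit means extending a map and rebuilding the pattern.
tokens = dict(TOKEN_MAP, yard="yard", yards="yard")
symbols = dict(SYMBOL_MAP, yd="yard")
yard_result = list(find_distances("No more than 15 yards.", tokens, symbols))
```

The sketch shows why a new unit needs both steps: the map entry supplies the standard type, and the rebuilt pattern is what makes the new unit word match at all.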

The default behavior of this module can be customized by overriding the value of DISTANCE_PTN_RE with a new regular expression. The example below demonstrates a simple addition of a new distance unit:

>>> # Out of the box behavior
>>> import lexnlp.extract.en.distances
>>> text = "This improvement shall extend for no more than fifteen yards."
>>> print(list(lexnlp.extract.en.distances.get_distances(text)))
[]

>>> # Customize the regular expression pattern
>>> import regex as re
>>> import lexnlp.extract.en.amounts
>>> lexnlp.extract.en.distances.DISTANCE_TOKEN_MAP["yard"] = "yard"
>>> lexnlp.extract.en.distances.DISTANCE_TOKEN_MAP["yards"] = "yard"
>>> lexnlp.extract.en.distances.DISTANCE_SYMBOL_MAP["yd"] = "yard"
>>> lexnlp.extract.en.distances.DISTANCE_PTN = r"""
    num_ptn=lexnlp.extract.en.amounts.NUM_PTN.replace('(?:\\W|$)', '').replace('(?<=\\W|^)', ''),
>>> lexnlp.extract.en.distances.DISTANCE_PTN_RE = re.compile(lexnlp.extract.en.distances.DISTANCE_PTN,

>>> # Run the method again to test
>>> print(list(lexnlp.extract.en.distances.get_distances(text)))
[(15, 'yard')]
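
Assigning to module variables changes behavior for every caller in the same process. Where that is undesirable, the overrides can be scoped and then restored. The following is a minimal sketch, assuming nothing about LexNLP beyond attribute assignment: temporary_overrides is a hypothetical helper, and fake_module is a stand-in object used so the example runs without the library installed.

```python
import contextlib
import types


@contextlib.contextmanager
def temporary_overrides(namespace, **overrides):
    """Set attributes on `namespace`, then restore the originals on exit."""
    saved = {name: getattr(namespace, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(namespace, name, value)
        yield namespace
    finally:
        # Restore even if the body of the with-block raised.
        for name, value in saved.items():
            setattr(namespace, name, value)


# Stand-in for a module with a configurable map (hypothetical contents).
fake_module = types.SimpleNamespace(DISTANCE_TOKEN_MAP={"mile": "mile"})

# Build an extended copy instead of mutating the original dictionary.
extended = dict(fake_module.DISTANCE_TOKEN_MAP, yard="yard", yards="yard")
with temporary_overrides(fake_module, DISTANCE_TOKEN_MAP=extended):
    inside = sorted(fake_module.DISTANCE_TOKEN_MAP)
after = sorted(fake_module.DISTANCE_TOKEN_MAP)
```

The helper only restores attributes it replaced, so it works with a copied dictionary as shown; in-place edits such as map["yard"] = "yard" modify the original object and are not undone.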