lexnlp.extract.en.durations: Extracting durations

The lexnlp.extract.en.durations module contains methods that allow for the extraction of durations from text. Statements that are covered by default in this module include:

  • after
  • smallest among

The full list of current unit test cases can be found here: https://github.com/LexPredict/lexpredict-lexnlp/tree/master/test_data/lexnlp/extract/en/tests/test_durations

Extracting constraints

lexnlp.extract.en.durations.get_durations(text, return_sources=False, float_digits=4) → typing.Generator

Example

>>> import lexnlp.extract.en.durations
>>> text = "This Agreement shall terminate in nine (9) months."
>>> print(list(lexnlp.extract.en.durations.get_durations(text)))
[('month', 9.0, 270.0)]

>>> import lexnlp.extract.en.durations
>>> text = "The period shall not exceed a dozen seconds."
>>> print(list(lexnlp.extract.en.durations.get_durations(text)))
[('second', 12, 0.0001388888888888889)]

Customizing duration statement extraction

Duration statement extraction can be customized. There are three key module variables that store the default configuration:

  • DURATION_MAP: This List stores the “trigger” phrases that are used to identify constraint statements. They are typically adverbial or prepositional phrases.
  • DURATION_PTN: This String stores the regular expression pattern that drives matching in this module.

The default behavior of this module can be customized by overriding the value of DURATION_PTN_RE with a new regular expression created as the code below demonstrates. The example below demonstrates a simple addition of a new phrase:

>>> # Out of the box behavior
>>> import lexnlp.extract.en.durations
>>> text = "The Agreement shall terminate in two fortnights."
>>> print(list(lexnlp.extract.en.durations.get_durations(text)))
[]

>>> # Customize the durations
>>> import regex as re
>>> import lexnlp.extract.en.amounts
>>> lexnlp.extract.en.durations.DURATION_MAP["fortnight"] = 14
>>> lexnlp.extract.en.durations.DURATION_PTN = r"""
(({num_ptn})
(?:\s*(?:calendar|business|actual))?[\s-]*
({duration_list})s?)(?:\W|$)
""".format(
    num_ptn=lexnlp.extract.en.amounts.NUM_PTN,
    duration_list='|'.join(lexnlp.extract.en.durations.DURATION_MAP)
)
>>> lexnlp.extract.en.durations.DURATION_PTN_RE = re.compile(lexnlp.extract.en.durations.DURATION_PTN,
re.IGNORECASE | re.MULTILINE | re.DOTALL | re.VERBOSE)

>>> # Run the `get_durations` method again to test
>>> print(list(lexnlp.extract.en.durations.get_durations(text)))
[('fortnight', 2, 28)]