lexnlp.extract.en.pii: Extracting personally-identifiable information (PII)

The lexnlp.extract.en.pii module contains functions for extracting personally identifiable information (PII) from text. Examples include:

  • phone numbers
  • US social security numbers
  • names

The full list of current unit test cases can be found here: https://github.com/LexPredict/lexpredict-lexnlp/tree/master/test_data/lexnlp/extract/en/tests/test_pii

Extracting PII

lexnlp.extract.en.pii.get_pii(text: str, return_sources=False) → Generator

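Find possible PII references in the text.

Parameters: text (str), the text to search; return_sources (bool, default False).

Returns: a generator. With the default settings it yields (type, value) tuples, as the example below shows.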

Example

>>> import lexnlp.extract.en.pii
>>> text = "John Doe (999-12-3456)"
>>> print(list(lexnlp.extract.en.pii.get_pii(text)))
[('ssn', '999-12-3456')]
>>> text = "Mary Doe (212-123-4567)"
>>> print(list(lexnlp.extract.en.pii.get_pii(text)))
[('us_phone', '(212) 123-4567')]
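
Because the results arrive as (type, value) tuples, it can be convenient to group them by type. The following is a minimal sketch, not part of lexnlp: group_pii is a hypothetical helper, and the input tuples are copied from the documented output above rather than produced by calling the library.

```python
from collections import defaultdict


def group_pii(pairs):
    """Group (pii_type, value) tuples into a dict of {pii_type: [values]}.

    Hypothetical helper for illustration; it is not provided by lexnlp.
    """
    grouped = defaultdict(list)
    for pii_type, value in pairs:
        grouped[pii_type].append(value)
    return dict(grouped)


# Tuples copied from the documented get_pii output shown above.
pairs = [("ssn", "999-12-3456"), ("us_phone", "(212) 123-4567")]
grouped = group_pii(pairs)
```

In practice the same helper could presumably be fed the generator returned by get_pii(text) directly, since it only iterates over its argument once.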

lexnlp.extract.en.pii.get_ssns(text, return_sources=False) → Generator

Find possible SSN references in the text.
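
To illustrate the general idea of pattern-based SSN detection, here is a standalone sketch using only the standard library. It is not lexnlp's implementation: the pattern SSN_SKETCH and the function find_ssn_candidates are assumptions made for this example, and the real module may accept more formats or apply additional checks.

```python
import re

# Assumed pattern (not lexnlp's): three digits, two digits, four digits,
# separated by dashes, and not embedded in a longer run of digits.
SSN_SKETCH = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def find_ssn_candidates(text):
    """Yield substrings of text that look like dash-separated SSNs."""
    for match in SSN_SKETCH.finditer(text):
        yield match.group(0)


ssn_hits = list(find_ssn_candidates("John Doe (999-12-3456)"))
phone_text_hits = list(find_ssn_candidates("Mary Doe (212-123-4567)"))
```

Like the library's generators, the sketch yields lazily; wrapping the call in list() collects all candidates at once.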

lexnlp.extract.en.pii.get_us_phones(text: str, return_sources=False) → Generator

Find possible telephone numbers in the text.
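
The get_pii example above shows the phone number "212-123-4567" reported as "(212) 123-4567", which suggests the output is normalized. The sketch below shows one way such matching and normalization could work with a regular expression. It is an assumption for illustration only: PHONE_SKETCH and find_phone_candidates are hypothetical names, and lexnlp's actual pattern and formatting rules may differ.

```python
import re

# Assumed pattern (not lexnlp's): optional parentheses around a 3-digit area
# code, then 3 and 4 digits, with an optional space, dot, or dash between parts.
PHONE_SKETCH = re.compile(
    r"(?<!\d)\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})(?!\d)"
)


def find_phone_candidates(text):
    """Yield US-style phone numbers found in text, formatted as (AAA) EEE-LLLL."""
    for match in PHONE_SKETCH.finditer(text):
        area, exchange, line = match.groups()
        yield f"({area}) {exchange}-{line}"


phone_hits = list(find_phone_candidates("Mary Doe (212-123-4567)"))
ssn_text_hits = list(find_phone_candidates("John Doe (999-12-3456)"))
```

Normalizing every match to a single format makes downstream comparison and deduplication simpler, at the cost of discarding the original spelling of the number.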