lexnlp.extract.en.addresses package

Submodules

lexnlp.extract.en.addresses.address_features module

Feature extraction for the address-detection classifier.

lexnlp.extract.en.addresses.address_features.build_country_words()
lexnlp.extract.en.addresses.address_features.build_provinces_words()
lexnlp.extract.en.addresses.address_features.get_word_features(word: str, part_of_speech: str) → List[int]
lexnlp.extract.en.addresses.address_features.is_datetime(word: str) → bool
lexnlp.extract.en.addresses.address_features.is_email(word: str) → bool
lexnlp.extract.en.addresses.address_features.is_lowercase_char(word: str) → bool
lexnlp.extract.en.addresses.address_features.is_single_initial(word: str) → bool
lexnlp.extract.en.addresses.address_features.is_uppercase_char(word: str) → bool
lexnlp.extract.en.addresses.address_features.is_url(word: str) → bool
lexnlp.extract.en.addresses.address_features.is_zip_code(s: str) → bool
lexnlp.extract.en.addresses.address_features.prepare_pos_tagset_index_file()
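
The predicates listed above each test a single word, and get_word_features returns a list of integers. As a rough illustration of that pattern, the sketch below builds a small 0/1 feature vector from a few word-level checks. Every name and regular expression here is hypothetical and is not taken from lexnlp; the library's own rules may differ, and the part_of_speech argument of get_word_features is left out.

```python
import re
from typing import List

# Hypothetical patterns for illustration only; not lexnlp's actual rules.
_ZIP_RE = re.compile(r"\d{5}(-\d{4})?")          # US ZIP or ZIP+4
_EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
_INITIAL_RE = re.compile(r"[A-Z]\.")             # e.g. "J."


def sketch_is_zip_code(s: str) -> bool:
    """True for strings shaped like 10001 or 10001-1234."""
    return _ZIP_RE.fullmatch(s) is not None


def sketch_is_email(word: str) -> bool:
    """True for strings shaped like name@host.tld."""
    return _EMAIL_RE.fullmatch(word) is not None


def sketch_is_single_initial(word: str) -> bool:
    """True for one uppercase letter followed by a period."""
    return _INITIAL_RE.fullmatch(word) is not None


def sketch_word_features(word: str) -> List[int]:
    """Encode a word as a fixed-length list of 0/1 flags."""
    return [
        int(sketch_is_zip_code(word)),
        int(sketch_is_email(word)),
        int(sketch_is_single_initial(word)),
        int(word[:1].isupper()),   # starts with an uppercase letter
        int(word.isdigit()),       # consists only of digits
    ]
```

A fixed-length integer vector per word is a common input shape for a token classifier, which is why the sketch returns a list of flags and not a dictionary.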

lexnlp.extract.en.addresses.addresses module

Address extraction for the English language.

class lexnlp.extract.en.addresses.addresses.Address(zip_code: str, country: str, state: str, city: str, addr1: str, addr2: str)

Bases: object

members()

class lexnlp.extract.en.addresses.addresses.NGramType

Bases: object

ADDR_END = 3
ADDR_MIDDLE = 2
ADDR_START = 1
OTHER = 0
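
The four NGramType values suggest a start/middle/end labelling scheme. This page does not say how lexnlp turns such labels into addresses, so the following is only one plausible decoding, written as a self-contained sketch. The function group_labeled_tokens and the assumption of one label per token are hypothetical.

```python
from typing import Iterator, List

# Constants mirror the values listed for NGramType above.
OTHER, ADDR_START, ADDR_MIDDLE, ADDR_END = 0, 1, 2, 3


def group_labeled_tokens(tokens: List[str], labels: List[int]) -> Iterator[str]:
    """Join runs of tokens that open with ADDR_START and close with ADDR_END."""
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == ADDR_START:
            current = [token]              # begin a new candidate address
        elif label == ADDR_MIDDLE and current:
            current.append(token)          # extend the open candidate
        elif label == ADDR_END and current:
            current.append(token)
            yield " ".join(current)        # emit the completed run
            current = []
        else:
            current = []                   # OTHER or an out-of-order label resets the run


tokens = ["Send", "to", "123", "Main", "St", "today"]
labels = [OTHER, OTHER, ADDR_START, ADDR_MIDDLE, ADDR_END, OTHER]
found = list(group_labeled_tokens(tokens, labels))
```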
lexnlp.extract.en.addresses.addresses.align_tokens(tokens, sentence)

Copy of the NLTK function of the same name, with a fix for the handling of double quotes.

lexnlp.extract.en.addresses.addresses.cleanup(address: str) → str
lexnlp.extract.en.addresses.addresses.get_address_spans(text: str) → Generator[Tuple[str, int, int], None, None]
lexnlp.extract.en.addresses.addresses.get_addresses(text: str) → Generator[str, None, None]
lexnlp.extract.en.addresses.addresses.load_classifier()
lexnlp.extract.en.addresses.addresses.prepare_ngrams_in_text(text: str, window_half_width: int, window_step: int) → Generator[Tuple[List[int], List[str], int, int], None, None]
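
The window_half_width and window_step parameters of prepare_ngrams_in_text point to a sliding window over the text. The sketch below shows one way such a window could behave over a list of tokens. It is an assumption about what the parameters mean, not lexnlp's implementation: the real function takes raw text and also yields a feature list, and sliding_windows is a hypothetical name.

```python
from typing import Generator, List, Tuple


def sliding_windows(
    tokens: List[str], window_half_width: int, window_step: int
) -> Generator[Tuple[List[str], int, int], None, None]:
    """Yield (window_tokens, start_index, end_index) around every window_step-th token."""
    for center in range(0, len(tokens), window_step):
        # Clamp the window to the bounds of the token list.
        start = max(0, center - window_half_width)
        end = min(len(tokens), center + window_half_width + 1)
        yield tokens[start:end], start, end


windows = list(sliding_windows(["a", "b", "c", "d", "e"], 1, 2))
```

With a half width of 1 and a step of 2, consecutive windows overlap by one token, so a sequence that straddles a window boundary still appears whole in at least one window.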

Module contents