lexnlp.extract.en.entities package
Subpackages
Submodules
lexnlp.extract.en.entities.nltk_maxent module
Entity extraction for English using NLTK and NLTK pre-trained maximum entropy classifier.
This module implements basic English entity extraction using NLTK's pre-trained components, including the POS tagger and the NE (fuzzy) chunkers.
- Todo:
- Better define interface for sentences vs. raw text
- Standardize generator vs list
class lexnlp.extract.en.entities.nltk_maxent.CompanyNPExtractor(grammar=None)
Bases: lexnlp.extract.en.utils.NPExtractor
cleanup_leaves(leaves)
get_tokenizer()
static strip_np(np)
lexnlp.extract.en.entities.nltk_maxent.contains_companies(person: str, companies) → bool
lexnlp.extract.en.entities.nltk_maxent.get_companies(text: str, strict: bool = False, use_gnp: bool = False, detail_type: bool = False, count_unique: bool = False, name_upper: bool = False, parse_name_abbr: bool = False, return_source: bool = False)
Find company names in text, optionally using the stricter article/prefix expression.
Parameters:
- text
- strict
- use_gnp – if True, use get_noun_phrases; otherwise use NPExtractor
- detail_type – return the detailed type (type, unified type, label) instead of the type only
- count_unique – return only unique companies, compared case-insensitively
- name_upper – return the company name in upper case
- parse_name_abbr – return the company's abbreviated name if one exists
- return_source
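The count_unique flag is described as returning only unique companies, compared case-insensitively. A minimal standard-library sketch of that kind of de-duplication could look like the following. The helper name unique_case_insensitive and the sample names are illustrative assumptions, not part of LexNLP, and the library's own implementation may differ.

```python
from typing import Iterable, List


def unique_case_insensitive(names: Iterable[str]) -> List[str]:
    """Keep the first spelling of each name, comparing names case-insensitively."""
    seen = set()
    unique = []
    for name in names:
        key = name.casefold()  # casefold() normalizes case for comparison
        if key not in seen:
            seen.add(key)
            unique.append(name)
    return unique


# Illustrative input only; these are not results produced by LexNLP.
sample = ["Acme Corp", "ACME CORP", "Widget LLC", "acme corp"]
print(unique_case_insensitive(sample))  # ['Acme Corp', 'Widget LLC']
```

Keeping the first spelling seen is one possible design choice; combining this with name_upper would make every surviving name upper case anyway.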
lexnlp.extract.en.entities.nltk_maxent.get_company_annotations(text: str, strict: bool = False, use_gnp: bool = False, count_unique: bool = False, name_upper: bool = False) → Generator[lexnlp.extract.common.annotations.company_annotation.CompanyAnnotation, None, None]
Find company names in text, optionally using the stricter article/prefix expression.
Parameters:
- text
- strict
- use_gnp – if True, use get_noun_phrases; otherwise use NPExtractor
- count_unique – return only unique companies, compared case-insensitively
- name_upper – return the company name in upper case
lexnlp.extract.en.entities.nltk_maxent.get_geopolitical(text, strict=False, return_source=False, window=2) → Generator
Get GPEs (geopolitical entities) from text.
Parameters:
- text
- strict
- return_source
- window
lexnlp.extract.en.entities.nltk_maxent.get_noun_phrases(text, strict=False, return_source=False, window=3, valid_punctuation=None) → Generator
Get NNP phrases from text.
Parameters:
- text
- strict
- return_source
- window
lexnlp.extract.en.entities.nltk_maxent.get_persons(text, strict=False, return_source=False, window=2) → Generator
Get names from text.
Parameters:
- text
- strict
- return_source
- window
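Several functions in this module are documented as returning a Generator, which can be iterated only once. The sketch below illustrates that behavior with a hypothetical stand-in, fake_extractor, which simply yields title-cased words. It does not call LexNLP and says nothing about how get_persons actually works.

```python
from typing import Generator, List


def fake_extractor(text: str) -> Generator[str, None, None]:
    """Hypothetical stand-in for a generator-returning extractor."""
    for word in text.split():
        if word.istitle():
            yield word


results = fake_extractor("signed by Alice and Bob")
first_pass: List[str] = list(results)   # consumes the generator
second_pass: List[str] = list(results)  # already exhausted, so this is empty
print(first_pass, second_pass)  # ['Alice', 'Bob'] []
```

Wrapping the call in list(...) once and reusing the list is a simple way to avoid surprises when results are needed more than once.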
lexnlp.extract.en.entities.nltk_re module
Entity extraction for English using NLTK and basic regular expressions with master data.
This module implements basic entity extraction functionality in English, but does NOT rely on the pre-trained NLTK maximum entropy classifier. Instead, it uses the NLTK English grammar in combination with regular expressions and tested master data re: company types and abbreviations (e.g., LLC).
- Todo:
- Better define interface for sentences vs. raw text
- Standardize generator vs list
lexnlp.extract.en.entities.nltk_re.check_backtrack_catastrophy(text: str) → bool
lexnlp.extract.en.entities.nltk_re.create_company_pattern(company_pattern_template=None, company_name_pattern=None, company_type_list=None, company_description_list=None, article_pattern='')
Create a company pattern for regular expression.
Parameters:
- company_pattern_template
- company_name_pattern
- company_type_list
- company_description_list
- article_pattern
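To illustrate the general idea of assembling a company regular expression from a name pattern and a list of company types, here is a simplified sketch that uses only the standard re module. The helpers build_company_regex and find_companies, the name pattern, and the sample type list are all assumptions made for this example. They are not LexNLP's actual template, master data, or API.

```python
import re
from typing import List, Tuple


def build_company_regex(company_types: List[str]) -> re.Pattern:
    """Compile a simplified 'Name + Type' pattern (hypothetical helper)."""
    # Longest alternatives first so a longer type is preferred over its prefix.
    type_pipe = "|".join(
        re.escape(t) for t in sorted(company_types, key=len, reverse=True)
    )
    # One to four capitalized words, each followed by whitespace.
    name_pattern = r"(?:[A-Z][\w&'-]*\s+){1,4}"
    return re.compile(rf"(?P<name>{name_pattern})(?P<type>{type_pipe})(?!\w)")


def find_companies(text: str, company_types: List[str]) -> List[Tuple[str, str]]:
    """Return (name, type) pairs found by the simplified pattern."""
    pattern = build_company_regex(company_types)
    return [(m.group("name").strip(), m.group("type")) for m in pattern.finditer(text)]


text = "This agreement is between Acme Widgets LLC and Beta Holdings Inc."
print(find_companies(text, ["LLC", "Inc", "Corp"]))
# [('Acme Widgets', 'LLC'), ('Beta Holdings', 'Inc')]
```

The bounded repetition {1,4} keeps backtracking limited, which matters for patterns of this shape on long inputs.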
lexnlp.extract.en.entities.nltk_re.get_companies(text: str, use_article: bool = False, use_sentence_splitter: bool = True) → Generator[lexnlp.extract.common.annotations.company_annotation.CompanyAnnotation, None, None]
Find company names in text, optionally using the stricter article/prefix expression.
lexnlp.extract.en.entities.nltk_re.get_company_description_pipe(company_description_list=None)
lexnlp.extract.en.entities.nltk_re.get_company_type_pipe(company_type_list=None)
lexnlp.extract.en.entities.nltk_re.get_parties_as(text: str, detail_type=False) → Generator[Tuple[str, str, str, str], None, None]
Parameters:
- text – source text to search for companies
- detail_type – obsolete
Returns: parties: [(name, company type, company description, party type), …]
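Because each party is returned as a plain four-element tuple, giving the positions names can make downstream code easier to read. The sketch below wraps such tuples in a NamedTuple. The Party class, the as_parties helper, and the sample row are hypothetical and are not part of LexNLP; the field order follows the Returns description above.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Party(NamedTuple):
    """Named view over (name, company type, company description, party type)."""
    name: str
    company_type: str
    company_description: str
    party_type: str


def as_parties(rows: Iterable[Tuple[str, str, str, str]]) -> Iterator[Party]:
    """Wrap each four-element tuple in a Party, preserving order."""
    for row in rows:
        yield Party(*row)


# Illustrative row only; this is not real output from get_parties_as.
rows = [("Acme Widgets", "LLC", "", "Seller")]
for party in as_parties(rows):
    print(party.name, party.party_type)  # Acme Widgets Seller
```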
lexnlp.extract.en.entities.nltk_tokenizer module
class lexnlp.extract.en.entities.nltk_tokenizer.NltkTokenizer(punctuation: Optional[List[Any]] = None, starting_quotes: Optional[Any] = None)
Bases: nltk.tokenize.treebank.TreebankWordTokenizer
Nearly a copy of TreebankWordTokenizer, except that NltkTokenizer allows the punctuation and starting_quotes settings to be changed.
tokenize(text, convert_parentheses=False, return_str=False)
Return a tokenized copy of text.
Return type: list of str
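Treebank-style tokenizers work by applying ordered regular-expression substitutions that pad certain characters with spaces, then splitting on whitespace. The following standard-library sketch shows why configurable punctuation rules can matter for company names such as "Acme, Inc.". RuleTokenizer and its two default rule lists are hypothetical and far smaller than NLTK's real rules; this is not the NltkTokenizer implementation.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[re.Pattern, str]

# Illustrative defaults only; NLTK's real rule lists are longer and differ in detail.
DEFAULT_STARTING_QUOTES: List[Rule] = [(re.compile(r'^"'), "`` ")]
DEFAULT_PUNCTUATION: List[Rule] = [(re.compile(r"([,;:?!])"), r" \1 ")]


class RuleTokenizer:
    """Hypothetical tokenizer: pad matches with spaces, then split on whitespace."""

    def __init__(self, punctuation: Optional[List[Rule]] = None,
                 starting_quotes: Optional[List[Rule]] = None) -> None:
        self.punctuation = DEFAULT_PUNCTUATION if punctuation is None else punctuation
        self.starting_quotes = (
            DEFAULT_STARTING_QUOTES if starting_quotes is None else starting_quotes
        )

    def tokenize(self, text: str) -> List[str]:
        # Apply quote rules first, then punctuation rules, in list order.
        for pattern, replacement in self.starting_quotes + self.punctuation:
            text = pattern.sub(replacement, text)
        return text.split()


print(RuleTokenizer().tokenize("Acme, Inc. agrees"))
# ['Acme', ',', 'Inc.', 'agrees']
print(RuleTokenizer(punctuation=[]).tokenize("Acme, Inc. agrees"))
# ['Acme,', 'Inc.', 'agrees']
```

Passing an empty punctuation list keeps the comma attached to the preceding word, which shows how swapping rule lists changes the tokens a downstream extractor sees.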
lexnlp.extract.en.entities.stanford_ner module
Entity extraction for English using Stanford Named Entity Recognition (NER).
This module implements basic entity extraction functionality in English relying on the pre-trained Stanford NLP NER classifiers.
- Todo:
- Better define interface for sentences vs. raw text
- Standardize generator vs list
lexnlp.extract.en.entities.stanford_ner.get_locations(text, strict=False, return_source=False, window=2) → Generator
Get locations from text using Stanford libraries.
Parameters:
- text
- strict
- return_source
- window
lexnlp.extract.en.entities.stanford_ner.get_model_file(language)
Return the appropriate model file for the given language.
Parameters:
- language
lexnlp.extract.en.entities.stanford_ner.get_organizations(text, strict=False, return_source=False, window=2) → Generator
Get organizations from text using Stanford libraries.
Parameters:
- text
- strict
- return_source
- window
lexnlp.extract.en.entities.stanford_ner.get_persons(text, strict=False, return_source=False, window=2) → Generator
Get persons from text using Stanford libraries.
Parameters:
- text
- strict
- return_source
- window