lexnlp.extract.ml.detector package

Subpackages

Submodules

lexnlp.extract.ml.detector.artifact_detector module
class lexnlp.extract.ml.detector.artifact_detector.ArtifactDetector

Bases: object
build_amount_tokens() → List[str]

load(file_path: str)

load_compressed(file_path: str)

load_from_stream(stream: Any)

predict(sample_df: pandas.core.frame.DataFrame, size_limit: int = 0) → Tuple[numpy.ndarray, numpy.ndarray]
predict_text(text: str, join_settings: lexnlp.extract.ml.detector.phrase_constructor.PhraseConstructorSettings = None, feature_mask: List[int] = None) → Generator[Tuple[int, int], None, None]
process_sample(sample_df: pandas.core.frame.DataFrame, build_target_data: bool = False) → Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

read_sample_df(train_file: str, train_size: int) → pandas.core.frame.DataFrame

save_compressed_model(save_path)

save_model(save_path: str) → None
train_and_save(settings: lexnlp.extract.ml.detector.detecting_settings.DetectingSettings, train_file: str, train_size: int = -1, save_path: str = '', compress: bool = False) → None

Create a percent identification model using tokens.

:param settings: Model settings
:param train_file: File to load training samples from
:param train_size: Number of records to use
:param save_path: Output (pickle model) file path
:param compress: Save compressed file
train_and_save_on_tokens(tokens: List[str], save_path: str, settings: lexnlp.extract.ml.detector.detecting_settings.DetectingSettings, train_sample_df: pandas.core.frame.DataFrame, punc_set: str = '.,/-', symbol_set: Optional[str] = None, string_checks: bool = False, compress: bool = False)
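Taken together, these methods form a tokenize → classify → join pipeline: predict_text tokenizes the input, the trained model assigns each token one of the outer/start/inner/end classes, and the classified tokens are joined into character spans. The sketch below illustrates that flow in plain Python; the tokenizer and the toy_token_classes "model" are illustrative stand-ins, not lexnlp internals.

```python
import re
from typing import List, Tuple

# Token classes mirroring the PhraseTokenClasses defaults documented in
# this module: 0 = outer, 1 = start, 2 = inner, 3 = end.
OUTER, START, INNER, END = 0, 1, 2, 3

def tokenize_with_spans(text: str) -> List[Tuple[str, int, int]]:
    """Split text into (token, char_start, char_end) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def toy_token_classes(tokens: List[Tuple[str, int, int]]) -> List[int]:
    """Stand-in for the trained model: label runs of numeric tokens."""
    numeric = [tok.replace(",", "").isdigit() for tok, _, _ in tokens]
    classes = []
    for i, is_num in enumerate(numeric):
        if not is_num:
            classes.append(OUTER)
        elif i == 0 or not numeric[i - 1]:
            classes.append(START)
        elif i + 1 < len(numeric) and numeric[i + 1]:
            classes.append(INNER)
        else:
            classes.append(END)
    return classes

def join_spans(tokens: List[Tuple[str, int, int]],
               classes: List[int]) -> List[Tuple[int, int]]:
    """Merge maximal runs of non-outer tokens into character spans."""
    spans, open_start, last_end = [], None, None
    for (_, s, e), cls in zip(tokens, classes):
        if cls != OUTER:
            if open_start is None:
                open_start = s
            last_end = e
            if cls == END:
                spans.append((open_start, last_end))
                open_start = None
        elif open_start is not None:
            spans.append((open_start, last_end))
            open_start = None
    if open_start is not None:
        spans.append((open_start, last_end))
    return spans

text = "paid 1 250 USD on 3 March"
tokens = tokenize_with_spans(text)
spans = join_spans(tokens, toy_token_classes(tokens))
# Each span covers one detected artifact:
# text[5:10] == "1 250" and text[18:19] == "3"
```

The real predict_text delegates the joining step to PhraseConstructor (below), where the join strategy is configurable via PhraseConstructorSettings.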
lexnlp.extract.ml.detector.detecting_settings module

class lexnlp.extract.ml.detector.detecting_settings.DetectingSettings(use_spacy: bool = False, pre_window: int = 0, post_window: int = 0, model_type: str = 'random_forest')

Bases: object
lexnlp.extract.ml.detector.phrase_constructor module

class lexnlp.extract.ml.detector.phrase_constructor.PhraseConstructor

Bases: object

Join “empty”, “start”, “middle” and “end” tokens into phrases.

DEFAULT_CONSTRUCTOR_SETTINGS = by class, strict=False

DEFAULT_TOKEN_CLASSES = <lexnlp.extract.ml.detector.phrase_constructor.PhraseTokenClasses object>
static join_tokens(tokens, predicted_class, feature_mask: List[int] = None, settings: lexnlp.extract.ml.detector.phrase_constructor.PhraseConstructorSettings = None, token_classes: lexnlp.extract.ml.detector.phrase_constructor.PhraseTokenClasses = None) → Generator[Tuple[int, int], None, None]
static join_tokens_by_class(tokens, predicted_class, strict: bool = False, token_classes: lexnlp.extract.ml.detector.phrase_constructor.PhraseTokenClasses = None) → Generator[Tuple[int, int], None, None]

Join tokens into phrases by their predicted token class.
static join_tokens_by_score(tokens, predicted_class, feature_mask: List[int] = None, max_zeros: int = 2, min_token_score: int = 2, token_classes: lexnlp.extract.ml.detector.phrase_constructor.PhraseTokenClasses = None) → Generator[Tuple[int, int], None, None]

Join tokens into phrases by per-token score.
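The two joining strategies differ in how tolerant they are of gaps: by_class keeps only contiguous runs of start/inner/end tokens, while by_score, judging by its max_zeros and min_token_score parameters, can bridge a few outer (zero-score) tokens and drops weak runs. The function below is a simplified sketch of a score-based joiner under that reading of the parameter names; it is not the library's actual implementation.

```python
from typing import Generator, List, Tuple

def join_by_score(token_spans: List[Tuple[int, int]],
                  scores: List[int],
                  max_zeros: int = 2,
                  min_token_score: int = 2) -> Generator[Tuple[int, int], None, None]:
    """Yield (char_start, char_end) spans of scored-token runs, bridging up
    to max_zeros zero-score tokens and keeping a run only if its best token
    score reaches min_token_score."""
    open_start = last_scored_end = None
    zeros = best = 0
    for (s, e), score in zip(token_spans, scores):
        if score > 0:
            if open_start is None:
                open_start = s
            last_scored_end = e
            zeros = 0
            best = max(best, score)
        elif open_start is not None:
            zeros += 1
            if zeros > max_zeros:  # gap too long: close the current run
                if best >= min_token_score:
                    yield (open_start, last_scored_end)
                open_start, zeros, best = None, 0, 0
    if open_start is not None and best >= min_token_score:
        yield (open_start, last_scored_end)

# Token char spans and scores for six tokens; tokens 2 and 3 score zero.
spans = [(0, 1), (2, 4), (5, 6), (7, 8), (9, 11), (12, 13)]
scores = [0, 2, 0, 0, 3, 0]
# max_zeros=2 bridges the two-token gap into one span: [(2, 11)]
# max_zeros=1 splits it into two spans: [(2, 4), (9, 11)]
```

With max_zeros=0 and min_token_score=0 this degenerates to the same contiguous-run behaviour that by_class joining produces.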
class lexnlp.extract.ml.detector.phrase_constructor.PhraseConstructorMethod

Bases: enum.Enum

An enumeration.

by_class = 1

by_score = 2
class lexnlp.extract.ml.detector.phrase_constructor.PhraseConstructorSettings(method: lexnlp.extract.ml.detector.phrase_constructor.PhraseConstructorMethod = <PhraseConstructorMethod.by_class: 1>, strict: bool = False, max_zeros: int = 2, min_token_score: int = 2)

Bases: object
class lexnlp.extract.ml.detector.phrase_constructor.PhraseTokenClasses(outer_class: int = 0, start_class: int = 1, inner_class: int = 2, end_class: int = 3)

Bases: object
lexnlp.extract.ml.detector.sample_processor module

lexnlp.extract.ml.detector.sample_processor.get_target_start_end_from_corgetes(_: str, column_name_formatted: str, row) → List[Tuple[int, int]]

lexnlp.extract.ml.detector.sample_processor.get_target_start_end_from_text(text: str, column_name_formatted: str, row) → List[Tuple[int, int]]
lexnlp.extract.ml.detector.sample_processor.process_sample(sample_df: pandas.core.frame.DataFrame, s: lexnlp.extract.ml.classifier.base_token_sequence_classifier_model.BaseTokenSequenceClassifierModel, build_target_data: bool = True, pre_alloc_multiple: int = 30, column_name_formatted: str = 'quantity_formatted', outer_class: int = 0, start_class: int = 1, inner_class: int = 2, end_class: int = 3, get_target_start_end: Callable[[str, str, Any], List[Tuple[int, int]]] = <function get_target_start_end_from_text>, feature_mask_column: Optional[str] = None) → Union[numpy.ndarray, Tuple[numpy.ndarray, numpy.ndarray]]

Process a sample file to create feature and target data.

:param sample_df: dataframe with at least a ‘sentence’ column
:param s: TokenSequenceClassifierModel or SpacyTokenSequenceClassifierModel
:param build_target_data: build target data vector (if true)
:param pre_alloc_multiple:
:param column_name_formatted: “quantity_formatted” or “noun_phrase_formatted” …
:param outer_class:
:param start_class:
:param inner_class:
:param end_class:
:return: (feature_data, target_data) if build_target_data is True, otherwise just feature_data
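For target data, process_sample turns character-level target spans (produced by the get_target_start_end callback) into per-token class labels, the inverse of phrase joining. Below is a minimal sketch of that labelling, assuming tokens strictly inside a target span receive inner_class and the run's first and last tokens are then overwritten with start_class and end_class; how the real code labels a one-token target is not documented here (in this sketch it ends up as end_class).

```python
from typing import List, Tuple

def target_classes(token_spans: List[Tuple[int, int]],
                   targets: List[Tuple[int, int]],
                   outer_class: int = 0, start_class: int = 1,
                   inner_class: int = 2, end_class: int = 3) -> List[int]:
    """Label each token by where it falls inside a target character span."""
    labels = [outer_class] * len(token_spans)
    for t_start, t_end in targets:
        # Tokens fully contained in the target span
        covered = [i for i, (s, e) in enumerate(token_spans)
                   if s >= t_start and e <= t_end]
        if not covered:
            continue
        for i in covered:
            labels[i] = inner_class
        labels[covered[0]] = start_class
        labels[covered[-1]] = end_class  # a one-token target becomes end_class
    return labels

# Tokens of "paid 1 250 USD" with the target span (5, 10) covering "1 250":
labels = target_classes([(0, 4), (5, 6), (7, 10), (11, 14)], [(5, 10)])
# → [0, 1, 3, 0]: outer, start, end, outer
```

These labels are what the target_data array returned alongside feature_data contains, one class per token.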