preprocessing.tabular
-
ayniy.preprocessing.tabular.aggregation(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, groupby_dict: dict, nunique_dict: dict) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Aggregation
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
groupby_dict (dict) – settings for the groupby aggregations
nunique_dict (dict) – settings for the nunique (distinct count) aggregations
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
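The layout of groupby_dict and nunique_dict is not described on this page, so the following is only a minimal sketch of the kind of feature such settings usually produce, written in plain pandas. The helper name groupby_aggregate and the key/var/agg arguments are hypothetical and are not part of ayniy.

```python
import pandas as pd


def groupby_aggregate(df: pd.DataFrame, key: str, var: str, agg: str) -> pd.Series:
    """Return a per-row feature: `agg` of `var` within each `key` group.

    Hypothetical helper for illustration; not ayniy's implementation.
    """
    return df.groupby(key)[var].transform(agg)


train = pd.DataFrame({"shop": ["a", "a", "b"], "price": [10.0, 30.0, 5.0]})

# Mean price per shop, broadcast back to every row of that shop.
train["price_mean_by_shop"] = groupby_aggregate(train, "shop", "price", "mean")
# Number of distinct prices per shop.
train["price_nunique_by_shop"] = groupby_aggregate(train, "shop", "price", "nunique")
```

Using transform keeps the result aligned with the original rows, so the new feature can be assigned directly as a column.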
-
ayniy.preprocessing.tabular.circle_encoding(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Circle encoding
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – columns to encode
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
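Circle (cyclical) encoding is commonly understood as mapping a periodic value onto the unit circle with a cosine and a sine component. The sketch below illustrates that idea under stated assumptions: the helper circle_encode, the explicit period argument, and the _cos/_sin column suffixes are hypothetical, and ayniy may choose the period and column names differently.

```python
import numpy as np
import pandas as pd


def circle_encode(df: pd.DataFrame, col: str, period: float) -> pd.DataFrame:
    """Add cos/sin components of a periodic column (hypothetical helper)."""
    out = df.copy()
    angle = 2 * np.pi * out[col] / period
    out[f"{col}_cos"] = np.cos(angle)
    out[f"{col}_sin"] = np.sin(angle)
    return out


hours = pd.DataFrame({"hour": [0, 6, 12, 23]})
encoded = circle_encode(hours, "hour", period=24)
```

With this representation, hour 23 ends up close to hour 0, which a raw integer column would not express.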
-
ayniy.preprocessing.tabular.count_null(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Count NaN
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – columns in which to count NaN
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
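A plausible reading of "Count NaN" is a per-row count of missing values over the selected columns. The following sketch shows that in plain pandas; the helper add_null_count and the output column name count_null are assumptions for illustration and may not match what ayniy produces.

```python
import numpy as np
import pandas as pd


def add_null_count(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Add a column with the number of NaN values per row (hypothetical helper)."""
    out = df.copy()
    out["count_null"] = out[cols].isnull().sum(axis=1)
    return out


train = pd.DataFrame({"a": [1.0, np.nan, np.nan], "b": [np.nan, 2.0, np.nan]})
counted = add_null_count(train, ["a", "b"])
```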
-
ayniy.preprocessing.tabular.datetime_parser(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Datetime columns parser
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – datetime columns to parse
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
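This page does not list which components the parser extracts. As a rough illustration of datetime parsing for tabular features, the sketch below splits a timestamp column into a few calendar parts with the pandas .dt accessor. The helper parse_datetime, the chosen components, and the column suffixes are all hypothetical.

```python
import pandas as pd


def parse_datetime(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Split a datetime-like column into calendar components (hypothetical helper)."""
    out = df.copy()
    ts = pd.to_datetime(out[col])
    out[f"{col}_year"] = ts.dt.year
    out[f"{col}_month"] = ts.dt.month
    out[f"{col}_day"] = ts.dt.day
    out[f"{col}_dow"] = ts.dt.dayofweek  # Monday = 0
    out[f"{col}_hour"] = ts.dt.hour
    return out


train = pd.DataFrame({"ts": ["2020-01-15 08:30:00", "2021-12-31 23:59:59"]})
parsed = parse_datetime(train, "ts")
```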
-
ayniy.preprocessing.tabular.detect_delete_cols(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, escape_col: List[str], threshold: float) → Tuple[List, List, List]
Detect unnecessary columns to delete
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
escape_col (List[str]) – columns to leave out (not encoded)
threshold (float) – correlation threshold above which a column is flagged for deletion
- Returns
unique_cols, duplicated_cols, high_corr_cols
- Return type
Tuple[List, List, List]
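The three returned lists suggest three checks: columns with a single value, columns that duplicate another column, and columns highly correlated with another column. The sketch below shows one way to run those checks on a single DataFrame. The helper detect_columns is hypothetical, and details such as which column of a correlated pair gets flagged are assumptions rather than ayniy's documented behavior.

```python
import pandas as pd


def detect_columns(df: pd.DataFrame, threshold: float):
    """Return (unique_cols, duplicated_cols, high_corr_cols) for a numeric frame.

    Hypothetical helper for illustration; not ayniy's implementation.
    """
    # Columns holding at most one distinct value carry no information.
    unique_cols = [c for c in df.columns if df[c].nunique() <= 1]
    # Columns whose values exactly repeat an earlier column.
    duplicated_cols = df.columns[df.T.duplicated().to_numpy()].tolist()
    # Among the rest, flag a column if it correlates strongly with an earlier one.
    corr = df.drop(columns=unique_cols + duplicated_cols).corr().abs()
    high_corr_cols = []
    for i, col in enumerate(corr.columns):
        if (corr[col].iloc[:i] > threshold).any():
            high_corr_cols.append(col)
    return unique_cols, duplicated_cols, high_corr_cols


df = pd.DataFrame({
    "const": [1, 1, 1, 1],
    "x": [1, 2, 3, 4],
    "x_copy": [1, 2, 3, 4],
    "x_noisy": [1.0, 2.1, 2.9, 4.0],
    "y": [4, 1, 3, 2],
})
unique_cols, duplicated_cols, high_corr_cols = detect_columns(df, threshold=0.95)
```

The function only reports candidates; dropping them is left to the caller, which matches the "detect" wording of the entry above.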
-
ayniy.preprocessing.tabular.fillna(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str], how: str) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Replace NaN
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – columns in which to replace NaN
how (str) – how to fill NaN: either ‘median’ or ‘mean’
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
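To make the how option concrete, here is a minimal sketch of median or mean imputation in plain pandas. The helper fill_missing is hypothetical, and the choice to compute the fill value from the training data only is an assumption; the page does not say whether ayniy uses train alone or train and test together.

```python
import numpy as np
import pandas as pd


def fill_missing(train: pd.DataFrame, test: pd.DataFrame, cols: list, how: str):
    """Fill NaN in `cols` with the train median or mean (hypothetical helper)."""
    if how not in ("median", "mean"):
        raise ValueError("how must be 'median' or 'mean'")
    train, test = train.copy(), test.copy()
    for col in cols:
        value = train[col].agg(how)  # statistic from train only (assumption)
        train[col] = train[col].fillna(value)
        test[col] = test[col].fillna(value)
    return train, test


train = pd.DataFrame({"age": [20.0, 30.0, np.nan, 100.0]})
test = pd.DataFrame({"age": [np.nan, 40.0]})
train_filled, test_filled = fill_missing(train, test, ["age"], how="median")
```

The median is less sensitive to outliers such as the value 100.0 here, which is the usual reason to prefer it over the mean.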
-
ayniy.preprocessing.tabular.frequency_encoding(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Frequency encoding
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – columns to frequency-encode
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
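Frequency encoding generally replaces each category with how often it occurs. The sketch below computes relative frequencies over train and test combined. The helper frequency_encode, the fe_ column prefix, and the decision to use relative frequencies over the combined data are assumptions made for illustration; ayniy may use raw counts or a different naming scheme.

```python
import pandas as pd


def frequency_encode(train: pd.DataFrame, test: pd.DataFrame, col: str):
    """Add a column with each category's relative frequency (hypothetical helper)."""
    train, test = train.copy(), test.copy()
    # Share of each category across train and test together (assumption).
    freq = pd.concat([train[col], test[col]]).value_counts(normalize=True)
    train[f"fe_{col}"] = train[col].map(freq)
    test[f"fe_{col}"] = test[col].map(freq)
    return train, test


train = pd.DataFrame({"city": ["tokyo", "tokyo", "osaka"]})
test = pd.DataFrame({"city": ["tokyo"]})
train_fe, test_fe = frequency_encode(train, test, "city")
```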
-
ayniy.preprocessing.tabular.matrix_factorization(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str], n_components_lda: int = 5, n_components_svd: int = 3) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Matrix factorization
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – columns to encode
n_components_lda (int, optional) – number of output dimensions for LDA. Defaults to 5.
n_components_svd (int, optional) – number of output dimensions for SVD. Defaults to 3.
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
-
ayniy.preprocessing.tabular.standerize(train: pandas.core.frame.DataFrame, test: pandas.core.frame.DataFrame, encode_col: List[str]) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Standardization
- Parameters
train (pd.DataFrame) – training data
test (pd.DataFrame) – test data
encode_col (List[str]) – columns to standardize
- Returns
train, test
- Return type
Tuple[pd.DataFrame, pd.DataFrame]
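Standardization rescales a column to zero mean and unit variance. The sketch below does this in plain pandas, applying the training statistics to both frames. The helper standardize is hypothetical, and fitting on train only with the population standard deviation (ddof=0) is an assumption; ayniy's standerize may fit on different data or rely on a scaler class instead.

```python
import pandas as pd


def standardize(train: pd.DataFrame, test: pd.DataFrame, cols: list):
    """Scale `cols` to zero mean and unit variance using train statistics.

    Hypothetical helper for illustration; not ayniy's implementation.
    """
    train, test = train.copy(), test.copy()
    for col in cols:
        mean = train[col].mean()
        std = train[col].std(ddof=0)  # population standard deviation
        train[col] = (train[col] - mean) / std
        test[col] = (test[col] - mean) / std
    return train, test


train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"x": [2.0, 5.0]})
train_std, test_std = standardize(train, test, ["x"])
```

Reusing the training mean and standard deviation for the test frame keeps both on the same scale.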