Pandas Utils

sklego.pandas_utils.add_lags(X, cols, lags, drop_na=True)

Appends lag column(s).

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • cols – column name(s) or index (indices).
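  • lags – int or list of ints, the lag(s) to apply to each col. As the examples show, a lag of k appends the value found k rows further down; a negative k takes it from |k| rows up.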
  • drop_na – remove rows that contain NA values.
Returns:

pd.DataFrame | np.ndarray (matching the input type) with the lag column(s) appended to the original columns.

Example:
>>> import pandas as pd
>>> from sklego.pandas_utils import add_lags
>>> df = pd.DataFrame([[1, 2, 3],
...                    [4, 5, 6],
...                    [7, 8, 9]],
...                    columns=['a', 'b', 'c'],
...                    index=[1, 2, 3])
>>> add_lags(df, 'a', [1]) # doctest: +NORMALIZE_WHITESPACE
   a  b  c  a1
1  1  2  3  4.0
2  4  5  6  7.0
>>> add_lags(df, ['a', 'b'], 2) # doctest: +NORMALIZE_WHITESPACE
   a  b  c  a2   b2
1  1  2  3  7.0  8.0
>>> import numpy as np
>>> X = np.array([[1, 2, 3],
...               [4, 5, 6],
...               [7, 8, 9]])
>>> add_lags(X, 0, [1])
array([[1, 2, 3, 4],
       [4, 5, 6, 7]])
>>> add_lags(X, 1, [-1, 1])
array([[4, 5, 6, 2, 8]])
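
The DataFrame outputs above can be reproduced with plain pandas. The following is a minimal sketch, not sklego's implementation: the name add_lags_sketch is hypothetical, the column naming (column name followed by the lag) and column order are inferred from the examples, and only DataFrame input with string column names is handled.

```python
import pandas as pd


def add_lags_sketch(df, cols, lags, drop_na=True):
    """Hypothetical re-creation of the DataFrame behaviour shown in the examples."""
    cols = [cols] if isinstance(cols, str) else list(cols)
    lags = [lags] if isinstance(lags, int) else list(lags)
    out = df.copy()
    for lag in lags:
        for col in cols:
            # shift(-lag) pulls up the value found `lag` rows further down,
            # which matches the 'a1' column in the first example.
            out[f"{col}{lag}"] = df[col].shift(-lag)
    # Rows whose lagged value falls outside the frame contain NaN.
    return out.dropna() if drop_na else out


df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  columns=["a", "b", "c"], index=[1, 2, 3])
result = add_lags_sketch(df, "a", [1])
result_two = add_lags_sketch(df, ["a", "b"], 2)
```

The lag columns come out as floats because the shifted column holds NaN before the incomplete rows are dropped, which is also why the documented output shows 4.0 and 7.0 rather than integers.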

sklego.pandas_utils.log_step(func=None, *, level=20)

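Decorates a function that transforms a pandas DataFrame so that logging statements are emitted automatically. The level argument sets the logging level; the default of 20 equals logging.INFO.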

Example:
>>> import logging
>>> from sklego.pandas_utils import log_step
>>> @log_step
... def remove_outliers(df, min_obs=5):
...     pass
>>> @log_step(level=logging.INFO)
... def remove_outliers(df, min_obs=5):
...     pass
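
The signature log_step(func=None, *, level=20) is what lets the decorator be used both bare and with arguments. The sketch below illustrates that general pattern only. The name log_step_sketch and the content of the log message are assumptions, and sklego's actual decorator may log different information.

```python
import functools
import logging

import pandas as pd


def log_step_sketch(func=None, *, level=logging.INFO):
    """Hypothetical decorator usable as @log_step_sketch or @log_step_sketch(level=...)."""
    if func is None:
        # Called with arguments only: return a decorator that remembers `level`.
        return functools.partial(log_step_sketch, level=level)

    logger = logging.getLogger(func.__module__)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Assumed message format: function name and the shape of the result.
        logger.log(level, "%s -> shape=%s", func.__name__,
                   getattr(result, "shape", None))
        return result

    return wrapper


@log_step_sketch
def drop_missing(df):
    return df.dropna()


@log_step_sketch(level=logging.DEBUG)
def keep_positive(df, col="a"):
    return df[df[col] > 0]


frame = pd.DataFrame({"a": [1.0, None, -3.0]})
cleaned = keep_positive(drop_missing(frame))
```

Keeping level keyword-only means a bare positional argument can only ever be the decorated function, so the two call styles cannot be confused.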