Decomposition

class sklego.decomposition.PCAOutlierDetection(n_components=None, threshold=None, variant='relative', whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.OutlierMixin

Detects outliers based on the PCA reconstruction error: rows that the fitted PCA reconstructs poorly are flagged as outliers.
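
The idea can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: it assumes PCA is fitted through an SVD of the centered training data and that the reconstruction error is the sum of absolute differences per row. The helpers fit_pca and reconstruction_error are hypothetical names.

```python
import numpy as np


def fit_pca(X_train, n_components):
    """Learn the training mean and the top principal directions (sketch)."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(X, mean, components):
    """Project X onto the components, map it back and sum what was lost per row.

    Assumed error measure: sum of absolute differences per row.
    """
    reduced = (X - mean) @ components.T          # (n_samples, n_components)
    reconstructed = reduced @ components + mean  # (n_samples, n_columns)
    return np.abs(X - reconstructed).sum(axis=1)


# Training data lies close to the line y = 2x, so one component explains it well.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
X_train = np.column_stack([x, 2 * x + rng.normal(scale=0.01, size=50)])
mean, components = fit_pca(X_train, n_components=1)

X_new = np.array([[0.5, 1.0],    # on the line: small error
                  [0.5, -1.0]])  # far from the line: large error
errors = reconstruction_error(X_new, mean, components)
```

The point on the line is reconstructed almost perfectly, while the point off the line loses its whole off-line component, so its error is far larger.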

decision_function(X)
difference(X)

Shows the calculated difference between the original and the reconstructed data, row by row.

Parameters: X – array-like, shape=(n_samples, n_columns), the data to score.
Returns: array, shape=(n_samples,), the difference for each row.
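
The variant parameter controls how this difference is scaled. The sketch below is hypothetical: it assumes the allowed values are 'absolute' and 'relative', that the raw error is the sum of absolute differences per row, and that 'relative' divides by the sum of absolute values of the original row. The library's exact normalization may differ.

```python
import numpy as np


def difference(X, X_reconstructed, variant="relative"):
    """Per-row reconstruction error (sketch; the normalization is an assumption)."""
    absolute = np.abs(X - X_reconstructed).sum(axis=1)
    if variant == "absolute":
        return absolute
    if variant == "relative":
        # Assumed: scale by the size of the original row, so large rows are
        # not flagged merely for being large.
        return absolute / np.abs(X).sum(axis=1)
    raise ValueError(f"unknown variant: {variant!r}")


X = np.array([[1.0, 1.0], [100.0, 100.0]])
X_hat = np.array([[1.5, 1.0], [100.5, 100.0]])
abs_err = difference(X, X_hat, "absolute")  # 0.5 for both rows
rel_err = difference(X, X_hat, "relative")  # 0.25 for the small row, 0.0025 for the large one
```

Both rows are off by 0.5 in absolute terms, but relative to its size the second row is reconstructed much more accurately, so only the relative variant tells them apart.
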
fit(X, y=None)

Fit the model using X as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – ignored; kept for pipeline compatibility.
Returns:

Returns an instance of self.

predict(X)

Predicts whether each point is an outlier.

Parameters: X – array-like, shape=(n_samples, n_columns), the data to predict on.
Returns: array, shape=(n_samples,), the predicted labels: 1 for inliers, -1 for outliers.
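
Turning per-row differences into labels can be sketched as follows. It assumes, as a hypothetical reading of threshold, that a row counts as an outlier when its difference is strictly greater than threshold; the 1/-1 labels follow the scikit-learn convention for outlier detectors. predict_from_difference is a hypothetical helper.

```python
import numpy as np


def predict_from_difference(diff, threshold):
    """Map per-row errors to labels: 1 for inliers, -1 for outliers (sketch)."""
    diff = np.asarray(diff, dtype=float)
    # Assumed rule: strictly above the threshold means outlier.
    return np.where(diff > threshold, -1, 1)


labels = predict_from_difference([0.01, 0.2, 0.05], threshold=0.1)
print(labels.tolist())  # [1, -1, 1]
```
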
score_samples(X)
transform(X)

Transforms the data with the underlying fitted PCA, projecting it onto the principal components.
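
For a plain PCA, the transform amounts to centering with the training mean and projecting onto the leading principal directions; with whiten=True each component is additionally scaled to unit variance. A NumPy sketch, assuming the class forwards whiten to an underlying scikit-learn-style PCA as its signature suggests (pca_transform is a hypothetical helper):

```python
import numpy as np


def pca_transform(X_train, X, n_components, whiten=False):
    """Center with the training mean, project onto the top principal
    directions and, optionally, whiten (sketch)."""
    n_samples = X_train.shape[0]
    mean = X_train.mean(axis=0)
    _, s, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    Z = (X - mean) @ vt[:n_components].T
    if whiten:
        # Variance of each component on the training data (ddof=1).
        explained_variance = s[:n_components] ** 2 / (n_samples - 1)
        Z = Z / np.sqrt(explained_variance)
    return Z


rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 3)) * [5.0, 1.0, 0.1]
Z = pca_transform(X_train, X_train, n_components=2, whiten=True)
print(Z.shape)  # (200, 2)
```

On the training data itself, each whitened component has a sample variance of 1, which is what whitening is meant to achieve.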

class sklego.decomposition.UMAPOutlierDetection(n_components=2, threshold=None, variant='relative', n_neighbors=15, min_dist=0.1, metric='euclidean', random_state=None)

Bases: sklearn.base.BaseEstimator, sklearn.base.OutlierMixin

Detects outliers based on the UMAP reconstruction error: rows that the fitted UMAP model reconstructs poorly are flagged as outliers.
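
The UMAP variant follows the same recipe with a different reducer, which needs an inverse_transform to reconstruct the data (umap-learn's UMAP class provides one). The sketch below makes the reducer pluggable. ReconstructionOutlierDetector and KeepFirstColumns are hypothetical; the toy reducer only stands in for UMAP so the example runs without umap-learn installed.

```python
import numpy as np


class ReconstructionOutlierDetector:
    """Hypothetical reducer-agnostic detector: any object with fit, transform
    and inverse_transform can be plugged in."""

    def __init__(self, reducer, threshold):
        self.reducer = reducer
        self.threshold = threshold

    def fit(self, X, y=None):
        self.reducer.fit(X)
        return self

    def difference(self, X):
        # Absolute reconstruction error per row.
        reconstructed = self.reducer.inverse_transform(self.reducer.transform(X))
        return np.abs(X - reconstructed).sum(axis=1)

    def predict(self, X):
        return np.where(self.difference(X) > self.threshold, -1, 1)


class KeepFirstColumns:
    """Toy stand-in reducer (hypothetical): keeps the first k columns and
    reconstructs the dropped ones with their training means."""

    def __init__(self, k):
        self.k = k

    def fit(self, X):
        self.fill_ = X[:, self.k:].mean(axis=0)
        return self

    def transform(self, X):
        return X[:, : self.k]

    def inverse_transform(self, Z):
        filler = np.tile(self.fill_, (Z.shape[0], 1))
        return np.hstack([Z, filler])


X_train = np.array([[1.0, 5.0], [2.0, 5.1], [3.0, 4.9], [4.0, 5.0]])
detector = ReconstructionOutlierDetector(KeepFirstColumns(k=1), threshold=1.0).fit(X_train)
labels = detector.predict(np.array([[2.5, 5.0], [2.5, 9.0]]))
print(labels.tolist())  # [1, -1]
```

Passing a umap.UMAP instance instead of the toy reducer would give a rough analogue of this class; note that UMAP's inverse_transform only approximates the original points.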

difference(X)

Shows the calculated difference between the original and the reconstructed data, row by row.

Parameters: X – array-like, shape=(n_samples, n_columns), the data to score.
Returns: array, shape=(n_samples,), the difference for each row.
fit(X, y=None)

Fit the model using X as training data.

Parameters:
  • X – array-like, shape=(n_samples, n_columns), training data.
  • y – ignored; kept for pipeline compatibility.
Returns:

Returns an instance of self.

predict(X)

Predicts whether each point is an outlier.

Parameters: X – array-like, shape=(n_samples, n_columns), the data to predict on.
Returns: array, shape=(n_samples,), the predicted labels: 1 for inliers, -1 for outliers.
transform(X)

Transforms the data with the underlying fitted UMAP model, embedding it into n_components dimensions.