DataCleaning
- class persalys.DataCleaning(*args)
DataModel sample cleaning.
Allows one to remove/replace irrelevant/erroneous values on-the-fly.
- Parameters:
- sample : openturns.Sample
Sample instance that requires cleaning. The original sample is copied and is not modified. Use the sample accessor getSample() to retrieve the modified version.
Methods
analyseSample(): Column by column sample analysis.
computeGeometricMAD(): Computes sample geometric median absolute deviation.
computeMAD(): Computes sample median absolute deviation.
getClassName(): Accessor to the object's name.
getGeometricMAD(): Geometric MAD accessor.
getMAD(): MAD accessor.
getMean(): Mean accessor.
getMedian(): Median accessor.
getNanNumbers(): Returns number of NaNs/Infs in each sample column.
getSample(): Sample accessor.
removeAllNans(): Removes NaNs/Infs in sample.
removeNansByColumn(col): Removes NaNs/Infs in sample column.
replaceAllNans(point): Replaces NaNs/Infs in sample by the values given in point.
replaceNansByColumn(col, val): Replaces NaNs/Infs in sample column by value.
Examples
>>> import math
>>> import openturns as ot
>>> import persalys
>>> sample = ot.Sample(0,3)
>>> sample.add([4,2,4])
>>> sample.add([2,math.nan,4])
>>> sample.add([2,3,7])
>>> cleaner = persalys.DataCleaning(sample)
>>> cleaner.removeAllNans()
>>> cleaned_sample = cleaner.getSample()
- __init__(*args)
- analyseSample()
Column by column sample analysis. Computes the mean and median of each marginal while ignoring NaNs/Infs, and counts the NaNs/Infs in each marginal.
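The column by column analysis can be illustrated with a minimal plain-Python sketch. It is not the Persalys implementation: the helper name `analyse_columns` and the list-of-rows data layout are assumptions made for illustration. It only shows the idea of skipping non-finite entries when computing the mean and median of each marginal, and of counting those entries.

```python
import math
import statistics


def analyse_columns(rows):
    """Return (means, medians, non-finite counts), one entry per column.

    Hypothetical helper for illustration only, not part of persalys.
    """
    means, medians, counts = [], [], []
    for column in zip(*rows):
        # Keep only the finite entries (drops NaN, +Inf and -Inf).
        finite = [x for x in column if math.isfinite(x)]
        counts.append(len(column) - len(finite))
        # A column without any finite value has no meaningful statistics.
        means.append(statistics.fmean(finite) if finite else math.nan)
        medians.append(statistics.median(finite) if finite else math.nan)
    return means, medians, counts


# Same data as the class example above, stored as plain lists.
rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, 7.0]]
means, medians, counts = analyse_columns(rows)
```

Here the second column contains one NaN, so its mean and median are computed from the two remaining values only.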
- computeGeometricMAD()
Computes the geometric median absolute deviation of the sample.
- computeMAD()
Computes the median absolute deviation of the sample.
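For reference, the median absolute deviation of a column is commonly defined as the median of the absolute deviations from the column median. The sketch below assumes this unscaled definition and skips non-finite entries. Whether Persalys applies a scaling factor or another convention is not stated here, and `column_mad` is a hypothetical name, not a Persalys function.

```python
import math
import statistics


def column_mad(values):
    """Unscaled median absolute deviation of the finite entries.

    Hypothetical helper for illustration only, not part of persalys.
    """
    finite = [x for x in values if math.isfinite(x)]
    if not finite:
        return math.nan
    center = statistics.median(finite)
    # Median of the absolute distances to the median.
    return statistics.median([abs(x - center) for x in finite])


mad_plain = column_mad([1.0, 2.0, 3.0, 4.0, 100.0])
mad_with_nan = column_mad([4.0, math.nan, 4.0, 7.0])
```

The outlier 100.0 in the first call barely affects the result, which is the usual reason for preferring the MAD over the standard deviation when the data may contain erroneous values.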
- getClassName()
Accessor to the object's name.
- Returns:
- class_name : str
The object class name (object.__class__.__name__).
- getGeometricMAD()
Geometric MAD accessor.
- Returns:
- geomMad : float
- getMAD()
MAD accessor.
- Returns:
- getMean()
Mean accessor.
- Returns:
- mean : openturns.Point
- getMedian()
Median accessor.
- Returns:
- median : openturns.Point
- getNanNumbers()
Returns the number of NaNs/Infs in each sample column.
- Returns:
- nNans : openturns.Point
- getSample()
Sample accessor.
- Returns:
- sample : openturns.Sample
Sample associated with the DataCleaning instance. It is built from a copy of the original sample and edited on the fly.
- removeAllNans()
Removes NaNs/Infs in sample.
- removeNansByColumn(col)
Removes NaNs/Infs in sample column.
- Parameters:
- col : int
Column index to clean
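The two removal methods can be pictured with the plain-Python sketch below. It rests on an assumption that is not confirmed by this page: that removing a NaN/Inf means dropping the whole point (row) that contains it. The helper `drop_rows_with_non_finite` is hypothetical and is not part of Persalys.

```python
import math


def drop_rows_with_non_finite(rows, col=None):
    """Keep the rows whose checked entries are all finite.

    col=None checks every column (assumed analogue of removeAllNans);
    an integer checks that column only (assumed analogue of
    removeNansByColumn). Hypothetical helper, not part of persalys.
    """
    def is_clean(row):
        checked = row if col is None else [row[col]]
        return all(math.isfinite(x) for x in checked)

    # Build a new list so the input rows are left unmodified.
    return [row for row in rows if is_clean(row)]


rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, math.inf]]
all_clean = drop_rows_with_non_finite(rows)
col1_clean = drop_rows_with_non_finite(rows, col=1)
```

Under this assumption, cleaning a single column keeps rows that still hold non-finite values in other columns, which is why `col1_clean` retains the row ending in `inf`.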
- replaceAllNans(point)
Replaces NaNs/Infs in the sample by the values given in point.
- Parameters:
- point : openturns.Point
Replacement values
- replaceNansByColumn(col, val)
Replaces NaNs/Infs in sample column by value.
- Parameters:
- col : int
Column index to clean
- val : float
Replacement value
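Replacement can be sketched the same way in plain Python. The helper `replace_non_finite` is hypothetical, and the mapping from a replacement point to columns (component j replaces the non-finite entries of column j) is an assumption about replaceAllNans rather than documented behaviour.

```python
import math


def replace_non_finite(rows, replacements):
    """Return a copy of rows with non-finite entries replaced per column.

    replacements maps a column index to its replacement value; columns
    that are not listed are left untouched.
    Hypothetical helper, not part of persalys.
    """
    return [
        [
            replacements[j] if j in replacements and not math.isfinite(x) else x
            for j, x in enumerate(row)
        ]
        for row in rows
    ]


rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [math.inf, 3.0, 7.0]]
# Single column, single value (assumed analogue of replaceNansByColumn).
one_column = replace_non_finite(rows, {1: 0.0})
# One value per column (assumed analogue of replaceAllNans).
every_column = replace_non_finite(rows, dict(enumerate([-1.0, 0.0, 1.0])))
```

Unlike removal, replacement keeps the number of points unchanged, which matters when the rows must stay aligned with another data set.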