DataCleaning

class persalys.DataCleaning(*args)

DataModel sample manipulation: allows one to remove or replace irrelevant or erroneous values on the fly.

Parameters:
sample : openturns.Sample

Sample instance that requires cleaning. The original sample is copied and is never modified; use the getSample() accessor to retrieve the cleaned version.

Examples

>>> import math
>>> import openturns as ot
>>> import persalys
>>> sample = ot.Sample(0,3)
>>> sample.add([4,2,4])
>>> sample.add([2,math.nan,4])
>>> sample.add([2,3,7])
>>> cleaner = persalys.DataCleaning(sample)
>>> cleaner.removeAllNans()
>>> cleaned_sample = cleaner.getSample()

Methods

analyseSample()

Column-by-column sample analysis.

computeGeometricMAD()

Computes the geometric median absolute deviation of the sample.

computeMAD()

Computes the median absolute deviation of the sample.

getClassName()

Accessor to the object's name.

getGeometricMAD()

Geometric MAD accessor.

getMAD()

MAD accessor.

getMean()

Mean accessor

getMedian()

Median accessor

getNanNumbers()

Returns the number of NaN/Inf values in each sample column.

getSample()

Sample accessor

removeAllNans()

Removes NaN/Inf values from the sample.

removeNansByColumn(col)

Removes NaN/Inf values from the given sample column.

replaceAllNans(point)

Replaces NaN/Inf values in the sample with the corresponding components of point.

replaceNansByColumn(col, val)

Replaces NaN/Inf values in the given sample column with val.

__init__(*args)

analyseSample()

Column-by-column sample analysis. Computes the mean and median of each marginal while ignoring NaN/Inf values, and counts the NaN/Inf values in each marginal.
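The analysis described above can be illustrated in plain Python. This sketch does not call persalys and is not its implementation: it assumes the sample is stored as a list of rows (the data from the Examples section), the name analyse_columns is hypothetical, and returning NaN for a column without any finite value is an assumption.

```python
import math
import statistics


def analyse_columns(rows):
    """Per-column mean, median and NaN/Inf count (illustration, not persalys)."""
    means, medians, n_nans = [], [], []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        # Keep only finite entries: NaN and +/-Inf are ignored in the statistics
        finite = [x for x in column if math.isfinite(x)]
        n_nans.append(len(column) - len(finite))
        # Assumption: a column without any finite value yields NaN statistics
        means.append(statistics.fmean(finite) if finite else math.nan)
        medians.append(statistics.median(finite) if finite else math.nan)
    return means, medians, n_nans


# Same data as in the Examples section, stored as a list of rows
rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, 7.0]]
means, medians, n_nans = analyse_columns(rows)
print(means, medians, n_nans)
```

Here n_nans is [0, 1, 0] and medians is [2.0, 2.5, 4.0]: the NaN in the second column is counted but does not contaminate that column's statistics.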

computeGeometricMAD()

Computes the geometric median absolute deviation of the sample.
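The exact definition persalys uses for the geometric MAD is not stated here. The following plain-Python sketch assumes one common definition: the median of the Euclidean distances from each sample point to the geometric median, with the geometric median approximated by Weiszfeld's iteration. The helper names geometric_median and geometric_mad are hypothetical.

```python
import math
import statistics


def geometric_median(points, tol=1e-10, max_iter=1000):
    """Weiszfeld iteration for the point minimising the sum of Euclidean distances."""
    dim = len(points[0])
    # Start from the coordinate-wise mean
    y = [statistics.fmean(p[k] for p in points) for k in range(dim)]
    for _ in range(max_iter):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = math.dist(p, y)
            if d < tol:
                # Simplification: stop if the iterate lands on a data point
                return y
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        new_y = [c / den for c in num]
        if math.dist(new_y, y) < tol:
            return new_y
        y = new_y
    return y


def geometric_mad(points):
    """Assumed definition: median distance from each point to the geometric median."""
    center = geometric_median(points)
    return statistics.median(math.dist(p, center) for p in points)


points = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(geometric_mad(points))
```

For the four corners of a square, the geometric median is the center and every point lies at distance √2 from it, so the sketch returns approximately 1.414.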

computeMAD()

Computes the median absolute deviation of the sample.
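As a plain-Python illustration (not the persalys code), the median absolute deviation of a column x is the median of |x − median(x)|. The sketch assumes NaN/Inf entries are ignored, as in analyseSample(), and applies no consistency scale factor such as 1.4826; both are assumptions, and column_mad is a hypothetical name.

```python
import math
import statistics


def column_mad(rows):
    """Median absolute deviation of each column (illustration, not persalys)."""
    mads = []
    for j in range(len(rows[0])):
        # Assumption: NaN/Inf entries are ignored, as in analyseSample()
        finite = [row[j] for row in rows if math.isfinite(row[j])]
        med = statistics.median(finite)  # raises StatisticsError if finite is empty
        # Raw MAD: no consistency factor (e.g. 1.4826) is applied
        mads.append(statistics.median(abs(x - med) for x in finite))
    return mads


rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [100.0, math.nan]]
print(column_mad(rows))
```

The result is [1.0, 10.0]: the outlier 100.0 barely affects the first column's MAD, which is why the MAD is a robust measure of spread.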

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getGeometricMAD()

Geometric MAD accessor.

Returns:
geomMad : openturns.Scalar

getMAD()

MAD accessor.

Returns:
MAD : openturns.Point

getMean()

Mean accessor

Returns:
mean : openturns.Point

getMedian()

Median accessor

Returns:
median : openturns.Point

getNanNumbers()

Returns the number of NaN/Inf values in each sample column.

Returns:
nNans : openturns.Point

getSample()

Sample accessor

Returns:
sample : openturns.Sample

The sample associated with this DataCleaning object. It is built from a copy of the original sample and edited on the fly.

removeAllNans()

Removes NaN/Inf values from the sample.
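A sample is rectangular, so a single entry cannot be dropped on its own. The plain-Python sketch below assumes removal works point by point, dropping every row that contains a NaN/Inf; remove_all_nans is a hypothetical helper that, like DataCleaning, leaves its input unchanged.

```python
import math


def remove_all_nans(rows):
    """Return a new list without the rows that contain a NaN/Inf (input untouched)."""
    # Assumption: removal is done point by point (whole rows are dropped)
    return [list(row) for row in rows if all(math.isfinite(x) for x in row)]


rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, 7.0]]
cleaned = remove_all_nans(rows)
print(cleaned)  # the original rows list is not modified
```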

removeNansByColumn(col)

Removes NaN/Inf values from the given sample column.

Parameters:
col : int

Column index to clean
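A plain-Python sketch of the column variant, under the same row-removal assumption: only the given column is checked, so NaN/Inf values in other columns are kept. remove_nans_by_column is a hypothetical helper, not the persalys API.

```python
import math


def remove_nans_by_column(rows, col):
    """Return a new list without the rows whose entry in column col is NaN/Inf."""
    # Assumption: whole rows are dropped; other columns are not inspected
    return [list(row) for row in rows if math.isfinite(row[col])]


rows = [[1.0, math.nan], [math.inf, 2.0], [3.0, 4.0]]
print(remove_nans_by_column(rows, 0))
```

Cleaning column 0 drops the row with Inf but keeps the NaN sitting in column 1.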

replaceAllNans(point)

Replaces NaN/Inf values in the sample with the corresponding components of point.

Parameters:
point : openturns.Point

Replacement values, one component per sample column.
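The replacement can be sketched in plain Python as follows, assuming component j of point replaces the NaN/Inf entries of column j and that point must have one component per column. replace_all_nans is a hypothetical helper operating on a list of rows; it returns a copy.

```python
import math


def replace_all_nans(rows, point):
    """Return a copy of rows where a NaN/Inf in column j is replaced by point[j]."""
    # Assumption: point must have exactly one component per column
    if len(point) != len(rows[0]):
        raise ValueError("point must have one component per column")
    return [
        [x if math.isfinite(x) else point[j] for j, x in enumerate(row)]
        for row in rows
    ]


rows = [[1.0, math.nan], [math.inf, 2.0]]
print(replace_all_nans(rows, [10.0, 20.0]))
```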

replaceNansByColumn(col, val)

Replaces NaN/Inf values in the given sample column with val.

Parameters:
col : int

Column index to clean

val : openturns.Scalar

Replacement value
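Similarly, a plain-Python sketch of the single-column replacement: replace_nans_by_column is hypothetical, works on a copy, and leaves NaN/Inf values in other columns untouched.

```python
import math


def replace_nans_by_column(rows, col, val):
    """Return a copy of rows where NaN/Inf entries of column col are replaced by val."""
    result = [list(row) for row in rows]
    for row in result:
        # Only column col is inspected; other columns keep their values
        if not math.isfinite(row[col]):
            row[col] = val
    return result


rows = [[1.0, math.nan], [math.inf, 2.0]]
print(replace_nans_by_column(rows, 1, 0.0))
```

The Inf in column 0 survives because only column 1 was cleaned.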