DataCleaning

class persalys.DataCleaning(*args)

DataModel sample cleaning.

Removes or replaces irrelevant or erroneous values (NaNs/Infs) in a sample on the fly.

Parameters:
sample : openturns.Sample

Sample to clean. The original sample is copied and is not modified. Use getSample() to retrieve the cleaned copy.

Methods

analyseSample()

Column by column sample analysis.

computeGeometricMAD()

Computes the sample geometric median absolute deviation.

computeMAD()

Computes the sample median absolute deviation.

getClassName()

Accessor to the object's name.

getGeometricMAD()

Geometric MAD accessor.

getMAD()

MAD accessor.

getMean()

Mean accessor.

getMedian()

Median accessor.

getNanNumbers()

Returns the number of NaNs/Infs in each sample column.

getSample()

Sample accessor.

removeAllNans()

Removes NaNs/Infs from the sample.

removeNansByColumn(col)

Removes NaNs/Infs from a sample column.

replaceAllNans(point)

Replaces NaNs/Infs in the sample with the values of a point.

replaceNansByColumn(col, val)

Replaces NaNs/Infs in a sample column with a value.

Examples

>>> import math
>>> import openturns as ot
>>> import persalys
>>> sample = ot.Sample(0,3)
>>> sample.add([4,2,4])
>>> sample.add([2,math.nan,4])
>>> sample.add([2,3,7])
>>> cleaner = persalys.DataCleaning(sample)
>>> cleaner.removeAllNans()
>>> cleaned_sample = cleaner.getSample()

__init__(*args)

analyseSample()

Column-by-column sample analysis. Computes the mean and median of each marginal while ignoring NaNs/Infs, and counts the NaNs/Infs in each marginal.
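As a rough illustration of what such a column-by-column analysis yields, here is a minimal pure-Python sketch. It is not the persalys implementation: the helper name `analyse_columns` and the list-of-rows layout are assumptions made for this example only.

```python
import math
import statistics

def analyse_columns(rows):
    """Hypothetical helper: per-column mean, median and NaN/Inf count."""
    means, medians, nan_counts = [], [], []
    for column in zip(*rows):
        # Keep only finite values, so NaN and +/-Inf are both ignored.
        finite = [v for v in column if math.isfinite(v)]
        nan_counts.append(len(column) - len(finite))
        means.append(statistics.fmean(finite) if finite else math.nan)
        medians.append(statistics.median(finite) if finite else math.nan)
    return means, medians, nan_counts

# Same values as the sample built in the Examples section.
rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, 7.0]]
means, medians, nan_counts = analyse_columns(rows)
```

Here the second column contains one NaN, so its mean and median are computed from the two remaining values only.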

computeGeometricMAD()

Computes the sample geometric median absolute deviation.

computeMAD()

Computes the sample median absolute deviation.
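For reference, the median absolute deviation of a column is commonly defined as the median of the absolute deviations from the column median. The sketch below applies that textbook definition in plain Python. It assumes that non-finite values are skipped and that no scaling constant is applied. Whether computeMAD() does either is not stated on this page, and `column_mad` is a hypothetical helper, not a persalys function.

```python
import math
import statistics

def column_mad(column):
    """Hypothetical helper: textbook MAD of one column, skipping NaN/Inf."""
    finite = [v for v in column if math.isfinite(v)]
    center = statistics.median(finite)
    # Median of the absolute deviations from the median (no scaling factor).
    return statistics.median([abs(v - center) for v in finite])

column = [1.0, 2.0, 4.0, 7.0, math.inf]
mad = column_mad(column)
```

With the finite values 1, 2, 4 and 7, the median is 3 and the absolute deviations are 2, 1, 1 and 4, so the resulting MAD is 1.5.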

getClassName()

Accessor to the object’s name.

Returns:
class_name : str

The object class name (object.__class__.__name__).

getGeometricMAD()

Geometric MAD accessor.

Returns:
geomMad : float

getMAD()

MAD accessor.

Returns:
MAD : openturns.Point

getMean()

Mean accessor.

Returns:
mean : openturns.Point

getMedian()

Median accessor.

Returns:
median : openturns.Point

getNanNumbers()

Returns the number of NaNs/Infs in each sample column.

Returns:
nNans : openturns.Point

getSample()

Sample accessor.

Returns:
sample : openturns.Sample

Sample associated with the DataCleaning instance. It is constructed from a copy of the original sample and edited on the fly.

removeAllNans()

Removes NaNs/Infs from the sample.

removeNansByColumn(col)

Removes NaNs/Infs from a sample column.

Parameters:
col : int

Column index to clean
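The sketch below illustrates one plausible reading of column-wise removal in plain Python: every row whose value in the chosen column is NaN or Inf is dropped as a whole. This row-dropping behaviour is an assumption, not something this page confirms, and `drop_rows_nonfinite_in_column` is a hypothetical helper rather than a persalys function.

```python
import math

def drop_rows_nonfinite_in_column(rows, col):
    """Hypothetical helper: keep only rows whose value in `col` is finite."""
    # A new list is returned, so the input rows are left untouched.
    return [list(row) for row in rows if math.isfinite(row[col])]

rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, 7.0]]
cleaned = drop_rows_nonfinite_in_column(rows, 1)
```

Returning a new list mirrors the copy semantics described for the class: the input data is not modified.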

replaceAllNans(point)

Replaces NaNs/Infs in the sample with the values of a point.

Parameters:
point : openturns.Point

Replacement values

replaceNansByColumn(col, val)

Replaces NaNs/Infs in a sample column with a value.

Parameters:
col : int

Column index to clean

val : float

Replacement value
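To make the two replacement variants concrete, here is a small plain-Python sketch of their likely semantics. It assumes that the point-based variant uses one replacement value per column, matched by index. Both helper names are hypothetical and this is not the persalys implementation.

```python
import math

def replace_in_column(rows, col, val):
    """Hypothetical helper: replace NaN/Inf in one column with `val`."""
    return [
        [val if j == col and not math.isfinite(v) else v for j, v in enumerate(row)]
        for row in rows
    ]

def replace_all(rows, point):
    """Hypothetical helper: replace NaN/Inf in column j with point[j]."""
    return [
        [v if math.isfinite(v) else point[j] for j, v in enumerate(row)]
        for row in rows
    ]

rows = [[4.0, 2.0, 4.0], [2.0, math.nan, 4.0], [2.0, 3.0, 7.0]]
by_column = replace_in_column(rows, 1, 0.0)
by_point = replace_all(rows, [-1.0, -2.0, -3.0])
```

In both cases only the non-finite entry in the second row changes. The other values, and the original `rows`, are left as they were.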