preprocess

_preprocess

This file defines the base class for preprocessing modules.

class scalr.data.preprocess._preprocess.PreprocessorBase(**kwargs)

Bases: object

Base class for preprocessor

fit(data: AnnData | AnnCollection, sample_chunksize: int) → None

A function to calculate attributes for transformation.

It is required only when you need to see the entire train data to calculate attributes, as in StdScaler. This method should not return anything; it should store the attributes that the transform method will use.

Parameters:
  • data (Union[AnnData, AnnCollection]) – Train data in backed mode.

  • sample_chunksize (int) – Maximum number of samples that can be loaded in memory at once.

process_data(full_data: AnnData | AnnCollection, sample_chunksize: int, dirpath: str)

A function to process the entire data chunkwise and write the processed data to disk.

Parameters:
  • full_data (Union[AnnData, AnnCollection]) – Full data for transformation.

  • sample_chunksize (int) – Number of samples in one chunk.

  • dirpath (str) – Path to write the data to.
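The chunkwise processing described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it uses a plain NumPy array in place of AnnData, and the helper name process_in_chunks, the chunk_{i}.npy file naming, and the use of np.save are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def process_in_chunks(full_data, sample_chunksize, dirpath, transform):
    """Transform `full_data` chunk by chunk and write each chunk to `dirpath`.

    Hypothetical helper: the file naming and on-disk format are assumptions.
    """
    os.makedirs(dirpath, exist_ok=True)
    written = []
    starts = range(0, full_data.shape[0], sample_chunksize)
    for i, start in enumerate(starts):
        # Only one chunk of rows is transformed and written at a time.
        chunk = np.asarray(full_data[start:start + sample_chunksize])
        path = os.path.join(dirpath, f"chunk_{i}.npy")
        np.save(path, transform(chunk))
        written.append(path)
    return written


data = np.arange(10, dtype=float).reshape(5, 2)
with tempfile.TemporaryDirectory() as tmp:
    paths = process_in_chunks(data, sample_chunksize=2, dirpath=tmp,
                              transform=lambda x: x * 2)
    # Read the chunks back to confirm nothing was lost or reordered.
    restored = np.concatenate([np.load(p) for p in paths])
```

With five samples and a chunk size of two, the last chunk holds a single sample; the slicing handles that case without special treatment.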

transform(data: ndarray) → ndarray

A required function to transform a numpy array.

Parameters:

data (np.ndarray) – Input raw data.

Returns:

Processed data.

Return type:

np.ndarray
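The fit/transform contract of the base class can be illustrated with a small stand-alone sketch. The class MaxAbsScalerSketch below is hypothetical and does not inherit from PreprocessorBase; it only mirrors the documented method signatures, and it takes a NumPy array where the real fit() receives AnnData or AnnCollection.

```python
import numpy as np


class MaxAbsScalerSketch:
    """Hypothetical preprocessor that mirrors the fit/transform contract."""

    def __init__(self):
        self.max_abs = None

    def fit(self, data, sample_chunksize):
        # Scan the train data chunk by chunk, keeping a running per-feature max.
        max_abs = np.zeros(data.shape[1])
        for start in range(0, data.shape[0], sample_chunksize):
            chunk = np.abs(data[start:start + sample_chunksize])
            max_abs = np.maximum(max_abs, chunk.max(axis=0))
        # Store the attribute for transform(); fit() returns nothing.
        self.max_abs = max_abs

    def transform(self, data):
        # Leave all-zero features unchanged instead of dividing by zero.
        scale = np.where(self.max_abs == 0, 1.0, self.max_abs)
        return data / scale


train = np.array([[1.0, -4.0], [2.0, 2.0], [-3.0, 0.0]])
scaler = MaxAbsScalerSketch()
result = scaler.fit(train, sample_chunksize=2)
scaled = scaler.transform(train)
```

The point of the split is that fit() sees the whole train set once, while transform() can then be applied to any array, one chunk at a time.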

scalr.data.preprocess._preprocess.build_preprocessor(preprocessing_config: dict) → tuple[PreprocessorBase, dict]

Builder function that returns a preprocessor and the updated preprocessing_config.

sample_norm

This file performs Sample-wise normalization on the data.

class scalr.data.preprocess.sample_norm.SampleNorm(scaling_factor: float = 1.0)

Bases: PreprocessorBase

Class for Samplewise Normalization

classmethod get_default_params() → dict

Class method to get default params for preprocess_config.

transform(data: ndarray) → ndarray

A function to transform provided input data.

Parameters:

data (np.ndarray) – Input raw data.

Returns:

Processed data.

Return type:

np.ndarray
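The exact formula is not stated here. A common form of sample-wise normalization, assumed for this sketch, divides each sample (row) by its total and multiplies by scaling_factor. The function sample_norm below is a hypothetical stand-in, not the SampleNorm class itself.

```python
import numpy as np


def sample_norm(data, scaling_factor=1.0):
    """Scale each row so that it sums to `scaling_factor` (assumed behaviour)."""
    totals = data.sum(axis=1, keepdims=True)
    # Keep empty samples as zeros instead of dividing by zero.
    totals = np.where(totals == 0, 1.0, totals)
    return data / totals * scaling_factor


counts = np.array([[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]])
normed = sample_norm(counts, scaling_factor=10.0)
```

Because each row is normalized independently, no statistics from the train data are needed, which is consistent with SampleNorm defining only transform().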

standard_scale

This file performs standard scaler normalization on the data.

class scalr.data.preprocess.standard_scale.StandardScaler(with_mean: bool = True, with_std: bool = True)

Bases: PreprocessorBase

Class for Standard Normalization

calculate_mean(data: AnnData | AnnCollection, sample_chunksize: int) → None

A function to calculate the mean of each feature in the train data.

Parameters:
  • data – Data to calculate the mean of.

  • sample_chunksize – Number of samples that can be loaded into memory at once.

Returns:

Nothing, stores mean per feature of the train data.

calculate_std(data: AnnData | AnnCollection, sample_chunksize: int) → None

A function to calculate standard deviation for each feature in the train data.

Parameters:
  • data – Data to calculate the standard deviation of.

  • sample_chunksize – Number of samples that can be loaded into memory at once.

Returns:

Nothing, stores standard deviation per feature of the train data.

fit(data: AnnData | AnnCollection, sample_chunksize: int) → None

This function calculates the parameters of the standard scaler object from the train data.

Parameters:
  • data – Data to calculate the required parameters of.

  • sample_chunksize – Number of samples that can be loaded into memory at once.
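One way to compute per-feature mean and standard deviation without loading all samples at once is a two-pass scan over chunks, sketched below. The function chunked_mean_std is hypothetical and operates on a NumPy array; whether the library uses two passes or a single-pass formula is not stated here. The population standard deviation (division by n) is assumed, which is the convention sklearn's StandardScaler uses.

```python
import numpy as np


def chunked_mean_std(data, sample_chunksize):
    """Per-feature mean and population std, reading `data` chunk by chunk."""
    n_samples, n_features = data.shape

    # Pass 1: accumulate per-feature sums to get the mean.
    total = np.zeros(n_features)
    for start in range(0, n_samples, sample_chunksize):
        total += data[start:start + sample_chunksize].sum(axis=0)
    mean = total / n_samples

    # Pass 2: accumulate squared deviations from the mean.
    sq_dev = np.zeros(n_features)
    for start in range(0, n_samples, sample_chunksize):
        chunk = data[start:start + sample_chunksize]
        sq_dev += ((chunk - mean) ** 2).sum(axis=0)
    std = np.sqrt(sq_dev / n_samples)
    return mean, std


rng = np.random.default_rng(0)
train = rng.normal(size=(7, 3))
mean, std = chunked_mean_std(train, sample_chunksize=3)
```

The second pass costs an extra read of the data but avoids the cancellation error that the single-pass sum-of-squares formula can suffer from.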

classmethod get_default_params() → dict

Class method to get default params for preprocess_config.

transform(data: ndarray) → ndarray

A function to transform provided input data.

Parameters:

data (np.ndarray) – Input raw data.

Returns:

Processed data.

Return type:

np.ndarray
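A standard-scale transform with the with_mean and with_std switches can be sketched as below. The function standard_scale is a hypothetical stand-in that takes the fitted statistics as arguments, and the handling of zero-variance features (left undivided) is an assumption for the example.

```python
import numpy as np


def standard_scale(data, mean, std, with_mean=True, with_std=True):
    """Center and/or scale each feature using precomputed statistics."""
    out = data.astype(float)
    if with_mean:
        out = out - mean
    if with_std:
        # Constant features have std 0; divide them by 1 instead.
        safe_std = np.where(std == 0, 1.0, std)
        out = out / safe_std
    return out


train = np.array([[1.0, 5.0], [3.0, 5.0]])
mean, std = train.mean(axis=0), train.std(axis=0)
scaled = standard_scale(train, mean, std)
centered_only = standard_scale(train, mean, std, with_std=False)
```

Here the second feature is constant, so after centering it becomes all zeros and the scaling step leaves it untouched.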

test_sample_norm

This is a test file for Sample-wise normalization.

scalr.data.preprocess.test_sample_norm.test_transform()

This function tests the transform function of Sample-wise normalization.

There is no fit() involved in Sample-wise normalization.

test_standard_scale

This is a test file for standard-scaler normalization.

scalr.data.preprocess.test_standard_scale.test_fit()

This function tests the fit() function of standard-scaler normalization.

Testing fit() is sufficient, as the computed mean and std can be compared with the parameters of an sklearn StandardScaler object.