preprocess
_preprocess
This file defines the base class for the preprocessing modules.
- class scalr.data.preprocess._preprocess.PreprocessorBase(**kwargs)
Bases:
object
Base class for preprocessor
- fit(data: AnnData | AnnCollection, sample_chunksize: int) → None
A function to calculate attributes for transformation.
It is required only when the preprocessor needs to see the entire train data to calculate its attributes, as in StdScaler, etc. This method should not return anything; it should store the attributes that the transform method will use.
- Parameters:
data (Union[AnnData, AnnCollection]) – train_data in backed mode.
sample_chunksize (int) – Maximum number of samples that can be loaded into memory at once.
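The fit/transform split described above can be illustrated with a minimal sketch. It assumes a plain NumPy array in place of AnnData, and the class name MaxAbsScalerSketch and its max_abs attribute are hypothetical, chosen only for illustration; this is not the library's implementation.

```python
import numpy as np


class MaxAbsScalerSketch:
    """Hypothetical preprocessor that follows the fit/transform split."""

    def __init__(self):
        self.max_abs = None  # attribute filled in by fit()

    def fit(self, data: np.ndarray, sample_chunksize: int) -> None:
        """Scan the train data chunk by chunk and store the per-feature max |x|."""
        max_abs = np.zeros(data.shape[1])
        for start in range(0, data.shape[0], sample_chunksize):
            chunk = data[start:start + sample_chunksize]
            max_abs = np.maximum(max_abs, np.abs(chunk).max(axis=0))
        # Avoid division by zero for features that are zero everywhere.
        max_abs[max_abs == 0] = 1.0
        self.max_abs = max_abs

    def transform(self, data: np.ndarray) -> np.ndarray:
        """Use the attribute stored by fit() to scale the data."""
        return data / self.max_abs


train = np.array([[1.0, -4.0, 0.0], [2.0, 2.0, 0.0], [-3.0, 1.0, 0.0]])
scaler = MaxAbsScalerSketch()
result = scaler.fit(train, sample_chunksize=2)  # returns nothing, only stores state
scaled = scaler.transform(train)
```

Note that fit only stores state; all changes to the data happen in transform.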
- process_data(full_data: AnnData | AnnCollection, sample_chunksize: int, dirpath: str, num_workers: int = 1)
A function to process the entire data chunkwise and write the processed data to disk.
- Parameters:
full_data (Union[AnnData, AnnCollection]) – Full data for transformation.
sample_chunksize (int) – Number of samples in one chunk.
dirpath (str) – Path to write the data to.
num_workers (int) – number of jobs to run in parallel for data writing.
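As a rough sketch of chunkwise processing, the example below transforms a NumPy array chunk by chunk and writes each chunk to its own file. The function name process_data_sketch, the transform argument, and the "<index>.npy" file naming are assumptions for illustration; the actual method works on AnnData objects, and the parallel writing controlled by num_workers is left out here.

```python
import os
import tempfile

import numpy as np


def process_data_sketch(full_data, sample_chunksize, dirpath, transform):
    """Apply `transform` to consecutive chunks and write each chunk to disk."""
    os.makedirs(dirpath, exist_ok=True)
    paths = []
    for i, start in enumerate(range(0, full_data.shape[0], sample_chunksize)):
        # Only one chunk of at most `sample_chunksize` samples is processed at a time.
        chunk = full_data[start:start + sample_chunksize]
        path = os.path.join(dirpath, f"{i}.npy")  # hypothetical file naming
        np.save(path, transform(chunk))
        paths.append(path)
    return paths


data = np.arange(10, dtype=float).reshape(5, 2)
with tempfile.TemporaryDirectory() as tmp:
    written = process_data_sketch(data, 2, tmp, lambda x: x * 2)
    # Read the chunks back to check that nothing was lost.
    restored = np.concatenate([np.load(p) for p in written])
```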
- scalr.data.preprocess._preprocess.build_preprocessor(preprocessing_config: dict) → tuple[PreprocessorBase, dict]
Builder function that returns a preprocessor and the updated preprocessing_config.
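A builder of this kind can be sketched as a lookup from a config dict to a class. The config keys "name" and "params", the REGISTRY mapping, and the stand-in classes below are assumptions made for this example; the source does not specify the config layout.

```python
from copy import deepcopy


class SampleNormSketch:
    """Stand-in class; only the constructor signature mirrors the docs."""

    def __init__(self, scaling_factor: float = 1.0):
        self.scaling_factor = scaling_factor


class StandardScalerSketch:
    """Stand-in class; only the constructor signature mirrors the docs."""

    def __init__(self, with_mean: bool = True, with_std: bool = True):
        self.with_mean = with_mean
        self.with_std = with_std


# Hypothetical registry mapping config names to classes.
REGISTRY = {
    "SampleNorm": SampleNormSketch,
    "StandardScaler": StandardScalerSketch,
}


def build_preprocessor_sketch(preprocessing_config: dict):
    """Return a preprocessor and a config copy with defaults filled in."""
    config = deepcopy(preprocessing_config)  # leave the caller's dict untouched
    params = config.get("params", {})
    preprocessor = REGISTRY[config["name"]](**params)
    # Record the effective parameters, including defaults the caller omitted.
    config["params"] = dict(vars(preprocessor))
    return preprocessor, config


preprocessor, updated = build_preprocessor_sketch(
    {"name": "StandardScaler", "params": {"with_std": False}}
)
```

Returning the updated config alongside the object keeps a record of the defaults that were actually applied.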
sample_norm
This file performs Sample-wise normalization on the data.
- class scalr.data.preprocess.sample_norm.SampleNorm(scaling_factor: float = 1.0)
Bases:
PreprocessorBase
Class for Samplewise Normalization
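The source does not spell out the formula, so the sketch below assumes one common form of sample-wise normalization: each sample (row) is rescaled so that its values sum to scaling_factor. The function name sample_norm_sketch and the handling of all-zero rows are illustrative choices.

```python
import numpy as np


def sample_norm_sketch(data: np.ndarray, scaling_factor: float = 1.0) -> np.ndarray:
    """Rescale every row so that its values sum to `scaling_factor`."""
    row_sums = data.sum(axis=1, keepdims=True)
    # Assumption: leave all-zero samples unchanged instead of dividing by zero.
    row_sums[row_sums == 0] = 1.0
    return data * (scaling_factor / row_sums)


counts = np.array([[1.0, 3.0], [2.0, 2.0], [0.0, 0.0]])
normalized = sample_norm_sketch(counts, scaling_factor=100.0)
```

Because each row depends only on its own values, this kind of normalization needs no fit step over the train data.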
standard_scale
This file performs standard scaler normalization on the data.
- class scalr.data.preprocess.standard_scale.StandardScaler(with_mean: bool = True, with_std: bool = True)
Bases:
PreprocessorBase
Class for Standard Normalization
- calculate_mean(data: AnnData | AnnCollection, sample_chunksize: int) → None
A function to calculate the mean of each feature in the train data.
- Parameters:
data – Data to calculate the mean of.
sample_chunksize – Number of samples per chunk that can be loaded into memory at once.
- Returns:
Nothing, stores mean per feature of the train data.
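A per-feature mean can be computed without loading all the data by accumulating column sums over chunks. The sketch below assumes a NumPy array in place of backed AnnData and returns the mean for easy checking, whereas the documented method stores it instead; the name chunked_mean is hypothetical.

```python
import numpy as np


def chunked_mean(data: np.ndarray, sample_chunksize: int) -> np.ndarray:
    """Per-feature mean, accumulated one chunk at a time."""
    total = np.zeros(data.shape[1])
    for start in range(0, data.shape[0], sample_chunksize):
        # Only this slice needs to be in memory for the current step.
        total += data[start:start + sample_chunksize].sum(axis=0)
    return total / data.shape[0]


train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean = chunked_mean(train, sample_chunksize=2)
```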
- calculate_std(data: AnnData | AnnCollection, sample_chunksize: int) → None
A function to calculate standard deviation for each feature in the train data.
- Parameters:
data – Data to calculate the standard deviation of.
sample_chunksize – Number of samples per chunk that can be loaded into memory at once.
- Returns:
Nothing, stores standard deviation per feature of the train data.
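One way to compute a per-feature standard deviation chunkwise is a two-pass approach: compute the mean first, then accumulate squared deviations from it. The sketch below assumes a NumPy array and the population standard deviation (divide by N); the source does not state which variant or algorithm the class uses, and the name chunked_std is hypothetical.

```python
import numpy as np


def chunked_std(data: np.ndarray, sample_chunksize: int) -> np.ndarray:
    """Per-feature population standard deviation, computed in two chunked passes."""
    n_samples, n_features = data.shape

    # Pass 1: per-feature mean from accumulated column sums.
    total = np.zeros(n_features)
    for start in range(0, n_samples, sample_chunksize):
        total += data[start:start + sample_chunksize].sum(axis=0)
    mean = total / n_samples

    # Pass 2: accumulate squared deviations from the mean.
    sq_dev = np.zeros(n_features)
    for start in range(0, n_samples, sample_chunksize):
        chunk = data[start:start + sample_chunksize]
        sq_dev += ((chunk - mean) ** 2).sum(axis=0)
    return np.sqrt(sq_dev / n_samples)


train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
std = chunked_std(train, sample_chunksize=2)
```

The two-pass form avoids the cancellation errors that the single-pass E[x^2] - E[x]^2 formula can suffer from when the mean is large relative to the spread.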
- fit(data: AnnData | AnnCollection, sample_chunksize: int) → None
This function calculates the parameters of the standard scaler object from the train data.
- Parameters:
data – Data to calculate the required parameters of.
sample_chunksize – Number of samples per chunk that can be loaded into memory at once.
test_sample_norm
This is a test file for Sample-wise normalization.
test_standard_scale
This is a test file for standard-scaler normalization.