Data ingestion
This file contains a class for ingesting data into the pipeline.
Eval and analysis
This file contains an implementation of model evaluation and of downstream analysis tasks.
- class scalr.eval_and_analysis_pipeline.EvalAndAnalysisPipeline(analysis_config, dirpath, device)
Bases: object
Class for evaluation and analysis of the trained model.
- _perform_downstream_analysis(samples: str)
A function to perform all downstream analysis tasks on the model and data.
- Parameters:
samples (['test' | 'full']) – Indicates the samples on which to perform downstream analysis.
- evaluation_and_classification_report()
A function to evaluate the trained model and generate a classification report on the test data.
- full_samples_downstream_anlaysis()
A function to perform downstream analysis tasks on the data of all samples.
Note: The model and DataLoader are not passed to this method, because the model is assumed to have been trained on the training data; model-based analysis should therefore not be performed on the full samples data.
- gene_analysis()
A function to analyze the trained model to obtain the top genes and biomarkers.
- load_data_and_targets_from_config(data_config: dict)
A function to load data and targets from data config.
- Parameters:
data_config – Data config.
- set_data_and_targets(train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, test_data: AnnData | AnnCollection, target: str | list[str], mappings: dict)
A function to set the data and targets when they are not loaded directly from the config but come from other sources, such as feature subsetting.
- Parameters:
train_data (Union[AnnData, AnnCollection]) – Training data.
val_data (Union[AnnData, AnnCollection]) – Validation data.
test_data (Union[AnnData, AnnCollection]) – Test data.
target (Union[str, list[str]]) – Target column name(s).
mappings (dict) – Mapping of column values to IDs, e.g. mappings[column_name][label2id] = {A: 1, B: 2, …}.
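To illustrate the kind of per-class metrics a classification report such as the one from evaluation_and_classification_report() typically contains, here is a minimal plain-Python sketch that computes precision, recall, F1, and support from true and predicted labels. It is not scaLR's implementation: the helper name classification_report_dict is hypothetical, and the library may compute its report differently (for example, by delegating to scikit-learn).

```python
from collections import Counter


def classification_report_dict(y_true, y_pred):
    """Hypothetical helper: per-class precision, recall, F1, and support."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = Counter()  # true positives per class
    fp = Counter()  # false positives per class (predicted as this class, wrongly)
    fn = Counter()  # false negatives per class (this class, predicted as another)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pred_pos = tp[label] + fp[label]
        actual_pos = tp[label] + fn[label]
        precision = tp[label] / pred_pos if pred_pos else 0.0
        recall = tp[label] / actual_pos if actual_pos else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": actual_pos}
    return report


report = classification_report_dict(["A", "A", "B", "B"], ["A", "B", "B", "B"])
print(report["A"])
```

Returning a nested dict keeps the sketch dependency-free; a table view can be built from it afterwards if needed.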
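gene_analysis() is described as extracting the top genes and biomarkers from the trained model. Assuming the model yields an importance score for each gene in each class, the final selection step could look like the sketch below, which ranks genes per class and keeps the k highest. The helper top_k_genes, the score layout, and the toy values are all hypothetical illustrations, not scaLR's data or API.

```python
def top_k_genes(scores, k):
    """Hypothetical helper: return the k highest-scoring genes for each class.

    `scores` maps class label -> {gene name: score}; a higher score is
    assumed to mean a more important gene.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    top = {}
    for label, gene_scores in scores.items():
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(gene_scores.items(), key=lambda item: (-item[1], item[0]))
        top[label] = [gene for gene, _ in ranked[:k]]
    return top


# Toy scores for illustration only.
scores = {
    "T cell": {"GENE_A": 0.9, "GENE_B": 0.1, "GENE_C": 0.7},
    "B cell": {"GENE_A": 0.05, "GENE_B": 0.95, "GENE_C": 0.02},
}
print(top_k_genes(scores, k=2))
```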
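The mappings parameter nests a label-to-ID dictionary under each column name. The sketch below builds such a structure from raw column values. The helper build_mappings and the extra id2label entry are hypothetical, and so is the ID scheme: IDs are assigned in sorted label order, starting at 0 by default (the docstring example above starts at 1, so the start index is exposed as a parameter).

```python
def build_mappings(columns, start=0):
    """Hypothetical helper: map each column's distinct values to integer IDs.

    `columns` maps column name -> list of values (e.g. one value per cell).
    Returns mappings[column_name]["label2id"] and mappings[column_name]["id2label"].
    """
    mappings = {}
    for name, values in columns.items():
        labels = sorted(set(values))  # sorted for a deterministic assignment
        label2id = {label: i for i, label in enumerate(labels, start=start)}
        id2label = {i: label for label, i in label2id.items()}  # inverse lookup
        mappings[name] = {"label2id": label2id, "id2label": id2label}
    return mappings


mappings = build_mappings({"cell_type": ["B", "A", "B", "C"]})
print(mappings["cell_type"]["label2id"])  # {'A': 0, 'B': 1, 'C': 2}
```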
Feature extraction
This file contains an implementation of feature subsetting and model training, followed by extraction of the top features.
- class scalr.feature_extraction_pipeline.FeatureExtractionPipeline(feature_selection_config, dirpath, device)
Bases: object
- feature_scoring() → DataFrame
A function to generate a score for each feature and each class, using a scorer and the chunked models.
- feature_subsetted_model_training() → list[Module]
This function trains models on subsets of the data, each containing feature_subsetsize genes.
- load_data_and_targets_from_config(data_config: dict)
A function to load data and targets from data config.
- Parameters:
data_config – Data config.
- set_data_and_targets(train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str | list[str], mappings: dict, sample_chunksize: int = None)
A function to set the data and targets when they are not loaded directly from the config but come from other sources, such as feature subsetting.
- Parameters:
train_data (Union[AnnData, AnnCollection]) – Training data.
val_data (Union[AnnData, AnnCollection]) – Validation data.
target (Union[str, list[str]]) – Target column name(s).
mappings (dict) – Mapping of column values to IDs, e.g. mappings[column_name][label2id] = {A: 1, B: 2, …}.
sample_chunksize (int) – Number of samples to load into memory at once.
- set_model(models: list[Module])
A function to set the trained models for downstream feature extraction tasks.
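feature_scoring() scores each feature for each class using the chunked models. Because each chunked model presumably sees only its own feature subset, the per-model scores have to be merged into a single table. The sketch below shows one such merge with plain dictionaries, assuming the subsets are disjoint. The helper combine_chunk_scores and its input layout are hypothetical; the documented method returns a pandas DataFrame rather than nested dicts.

```python
def combine_chunk_scores(chunk_scores):
    """Hypothetical helper: merge per-chunk score tables into one table.

    `chunk_scores` holds one entry per chunked model; each entry maps
    class label -> {feature name: score} for the features that model saw.
    Returns class label -> {feature name: score} covering all features.
    """
    combined = {}
    for scores in chunk_scores:
        for label, feature_scores in scores.items():
            target = combined.setdefault(label, {})
            for feature, score in feature_scores.items():
                # Disjoint subsets are assumed, so a repeated feature is an error.
                if feature in target:
                    raise ValueError(f"feature {feature!r} scored by more than one chunk")
                target[feature] = score
    return combined


chunk_scores = [
    {"A": {"g1": 0.2, "g2": 0.5}, "B": {"g1": 0.7, "g2": 0.1}},  # model for chunk 1
    {"A": {"g3": 0.9}, "B": {"g3": 0.3}},                        # model for chunk 2
]
combined = combine_chunk_scores(chunk_scores)
print(combined)
```

If pandas is available, pandas.DataFrame(combined) turns the result into a table with one column per class and one row per feature.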
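feature_subsetted_model_training() trains models on subsets of feature_subsetsize genes. A minimal sketch of the splitting step follows, assuming the subsets are consecutive, non-overlapping chunks of the feature list; scaLR may order or assign features differently. The helper feature_subsets is hypothetical.

```python
def feature_subsets(features, feature_subsetsize):
    """Hypothetical helper: split features into consecutive chunks of a fixed size.

    The last chunk is smaller when len(features) is not a multiple of
    feature_subsetsize.
    """
    if feature_subsetsize <= 0:
        raise ValueError("feature_subsetsize must be positive")
    return [features[i:i + feature_subsetsize]
            for i in range(0, len(features), feature_subsetsize)]


genes = [f"gene_{i}" for i in range(7)]
for subset in feature_subsets(genes, feature_subsetsize=3):
    # In the pipeline, one model would be trained per subset.
    print(subset)
```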
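sample_chunksize bounds how many samples are held in memory at once. The sketch below yields the (start, stop) index ranges that such chunked loading would iterate over. The helper sample_chunks is hypothetical, and treating sample_chunksize=None as "load everything as one chunk" is an assumption about the default, not something these docs confirm.

```python
def sample_chunks(n_samples, sample_chunksize=None):
    """Hypothetical helper: yield (start, stop) index ranges for chunked loading.

    With sample_chunksize=None, all samples form a single chunk (an assumption).
    """
    if sample_chunksize is None:
        sample_chunksize = max(n_samples, 1)  # avoid a zero step when n_samples == 0
    if sample_chunksize <= 0:
        raise ValueError("sample_chunksize must be positive")
    for start in range(0, n_samples, sample_chunksize):
        yield start, min(start + sample_chunksize, n_samples)


print(list(sample_chunks(10, sample_chunksize=4)))  # [(0, 4), (4, 8), (8, 10)]
```

With an AnnData object, each range could be used to slice the observations, e.g. adata[start:stop], so that only one chunk is processed at a time.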
Model training
This file contains an implementation of the model training pipeline.
- class scalr.model_training_pipeline.ModelTrainingPipeline(model_config: dict, train_config: dict, dirpath: str = None, device: str = 'cpu')
Bases: object
Class for the model training pipeline.
- build_model_training_artifacts()
This function configures the model, optimizer, and loss function required for model training.
- build_optimizer(opt_config: dict = None)
A function to build the optimizer.
- Parameters:
opt_config (dict) – Optimizer config.
- load_data_and_targets_from_config(data_config: dict)
A function to load data and targets from data config.
- Parameters:
data_config – Data config.
- set_data_and_targets(train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str | list[str], mappings: dict)
A function to set the data and targets when they are not loaded directly from the config but come from other sources, such as feature subsetting.
- Parameters:
train_data (Union[AnnData, AnnCollection]) – Training data.
val_data (Union[AnnData, AnnCollection]) – Validation data.
target (Union[str, list[str]]) – Target column name(s).
mappings (dict) – Mapping of column values to IDs, e.g. mappings[column_name][label2id] = {A: 1, B: 2, …}.