scalr
data_ingestion_pipeline
This file contains a class for data ingestion into the pipeline.
eval_and_analysis_pipeline
This file contains the implementation of model evaluation and downstream analysis tasks.
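As a rough illustration of what a classification report on test data usually contains, the sketch below computes per-class precision, recall, F1-score and support, plus overall accuracy, from true and predicted labels. The helper name `simple_classification_report` and the dictionary layout are assumptions for illustration only. The documentation does not say how scalr builds its report; scikit-learn's `sklearn.metrics.classification_report` is one common ready-made option.

```python
from collections import Counter


# Hypothetical helper; not part of the scalr API.
def simple_classification_report(y_true: list, y_pred: list) -> dict:
    """Per-class precision, recall, F1-score and support, plus overall accuracy."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    true_counts = Counter(y_true)   # support of each class
    pred_counts = Counter(y_pred)   # how often each class was predicted
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives

    report = {}
    for label in labels:
        tp = hits[label]
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1-score": f1,
            "support": true_counts[label],
        }
    report["accuracy"] = sum(hits.values()) / len(y_true) if y_true else 0.0
    return report


report = simple_classification_report(["B", "T", "T", "B"], ["B", "T", "B", "B"])
```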
- class scalr.eval_and_analysis_pipeline.EvalAndAnalysisPipeline(analysis_config, dirpath, device)
Bases: object
Class for evaluation and analysis of the trained model.
- evaluation_and_classification_report()
A function to evaluate the trained model and generate a classification report on the test data.
- gene_analysis()
A function to analyze the trained model to identify the top genes and biomarkers.
- load_data_and_targets_from_config(data_config: dict)
A function to load data and targets from the data config.
- Parameters:
data_config (dict) – Data config.
- perform_downstream_anlaysis()
A function to perform all downstream analysis tasks on model and data.
- set_data_and_targets(train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, test_data: AnnData | AnnCollection, target: str | list[str], mappings: dict)
A function to set data when it is not loaded directly from the config but comes from other sources, such as feature subsetting.
- Parameters:
train_data (Union[AnnData, AnnCollection]) – Training data.
val_data (Union[AnnData, AnnCollection]) – Validation data.
test_data (Union[AnnData, AnnCollection]) – Test data.
target (Union[str, list[str]]) – Target column name(s).
mappings (dict) – Mapping of column values to IDs, e.g., mappings[column_name][label2id] = {A: 1, B: 2, …}.
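The `mappings` argument nests a per-column `label2id` dictionary. A minimal sketch of building such a structure from per-cell metadata rows might look like the following. The helper `build_mappings`, the string key `"label2id"`, the extra `"id2label"` inverse, and numbering labels from 0 in sorted order are all assumptions; the `{A: 1, B: 2, …}` example above only illustrates the shape, not scalr's actual numbering.

```python
# Hypothetical helper illustrating the shape of `mappings`; not part of scalr.
def build_mappings(obs_rows: list[dict], target_columns: list[str]) -> dict:
    """Build {column: {"label2id": {...}, "id2label": {...}}} from metadata rows."""
    mappings = {}
    for column in target_columns:
        # Sort the unique labels so the assigned IDs are deterministic.
        labels = sorted({row[column] for row in obs_rows})
        # IDs start at 0 here (assumption).
        label2id = {label: idx for idx, label in enumerate(labels)}
        id2label = {idx: label for label, idx in label2id.items()}
        mappings[column] = {"label2id": label2id, "id2label": id2label}
    return mappings


rows = [{"celltype": "T"}, {"celltype": "B"}, {"celltype": "T"}]
mappings = build_mappings(rows, ["celltype"])
# Encode each row's target label as an integer ID.
encoded = [mappings["celltype"]["label2id"][row["celltype"]] for row in rows]
```

In practice the rows would come from the `obs` metadata of the AnnData object rather than a list of dicts.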
feature_extraction_pipeline
This file contains the implementation of feature subsetting and model training, followed by top-feature extraction.
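Feature subsetting splits the full gene list into smaller subsets, and one model is trained per subset (the "chunked models" referred to under feature_scoring). A minimal sketch, assuming the subsets are consecutive chunks of `feature_subsetsize` genes in their original order; scalr may form its subsets differently, and `chunk_features` is a hypothetical name.

```python
# Hypothetical helper; assumes consecutive, non-overlapping chunks.
def chunk_features(features: list[str], feature_subsetsize: int) -> list[list[str]]:
    """Split `features` into consecutive chunks of at most `feature_subsetsize` names."""
    if feature_subsetsize <= 0:
        raise ValueError("feature_subsetsize must be a positive integer")
    return [
        features[start:start + feature_subsetsize]
        for start in range(0, len(features), feature_subsetsize)
    ]


genes = [f"gene_{i}" for i in range(7)]
chunks = chunk_features(genes, feature_subsetsize=3)
# One model would then be trained on the columns of each chunk.
```

The last chunk is simply shorter when the number of genes is not a multiple of `feature_subsetsize`.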
- class scalr.feature_extraction_pipeline.FeatureExtractionPipeline(feature_selection_config, dirpath, device)
Bases: object
- feature_scoring() → DataFrame
A function to generate scores of each feature for each class using a scorer and chunked models.
- feature_subsetted_model_training() → list[Module]
This function trains models on subsetted data containing feature_subsetsize genes.
- load_data_and_targets_from_config(data_config: dict)
A function to load data and targets from the data config.
- Parameters:
data_config (dict) – Data config.
- set_data_and_targets(train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str | list[str], mappings: dict)
A function to set data when it is not loaded directly from the config but comes from other sources, such as feature subsetting.
- Parameters:
train_data (Union[AnnData, AnnCollection]) – Training data.
val_data (Union[AnnData, AnnCollection]) – Validation data.
target (Union[str, list[str]]) – Target column name(s).
mappings (dict) – Mapping of column values to IDs, e.g., mappings[column_name][label2id] = {A: 1, B: 2, …}.
- set_model(models: list[Module])
A function to set the trained models for downstream feature tasks.
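Because each chunked model only sees its own subset of genes, the per-chunk scores have to be combined before top features can be read off per class. The sketch below shows one way to do that, assuming each chunk yields a `{class: {feature: score}}` table over disjoint features. The helpers are hypothetical; feature_scoring itself returns a DataFrame, and nested dicts are used here only to keep the example dependency-free. How the scorer computes each score is not specified, so scores are taken as given.

```python
# Hypothetical helpers; not part of the scalr API.
Scores = dict[str, dict[str, float]]  # {class_name: {feature_name: score}}


def merge_chunk_scores(chunk_scores: list[Scores]) -> Scores:
    """Combine per-chunk score tables into one table covering all features."""
    merged: Scores = {}
    for scores in chunk_scores:
        for class_name, feature_scores in scores.items():
            # Assuming chunks hold disjoint features, update() never
            # overwrites an existing score.
            merged.setdefault(class_name, {}).update(feature_scores)
    return merged


def top_k_features(scores: Scores, k: int) -> dict[str, list[str]]:
    """Return the k highest-scoring features for each class."""
    return {
        class_name: sorted(feature_scores, key=feature_scores.get, reverse=True)[:k]
        for class_name, feature_scores in scores.items()
    }


chunk_scores = [
    {"B": {"g0": 0.9, "g1": 0.1}, "T": {"g0": 0.2, "g1": 0.8}},
    {"B": {"g2": 0.5}, "T": {"g2": 0.3}},
]
merged = merge_chunk_scores(chunk_scores)
top = top_k_features(merged, k=2)
```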
model_training_pipeline
This file contains the implementation of the model training pipeline.
- class scalr.model_training_pipeline.ModelTrainingPipeline(model_config: dict, train_config: dict, dirpath: str | None = None, device: str = 'cpu')
Bases: object
Class for the model training pipeline.
- build_model_training_artifacts()
This function configures the model, optimizer, and loss function required for model training.
- build_optimizer(opt_config: dict | None = None)
A function to build the optimizer.
- Parameters:
opt_config (dict) – Optimizer config.
- load_data_and_targets_from_config(data_config: dict)
A function to load data and targets from the data config.
- Parameters:
data_config (dict) – Data config.
- set_data_and_targets(train_data: AnnData | AnnCollection, val_data: AnnData | AnnCollection, target: str | list[str], mappings: dict)
A function to set data when it is not loaded directly from the config but comes from other sources, such as feature subsetting.
- Parameters:
train_data (Union[AnnData, AnnCollection]) – Training data.
val_data (Union[AnnData, AnnCollection]) – Validation data.
target (Union[str, list[str]]) – Target column name(s).
mappings (dict) – Mapping of column values to IDs, e.g., mappings[column_name][label2id] = {A: 1, B: 2, …}.
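Since build_optimizer accepts an optional `opt_config`, a config-driven builder typically falls back to defaults and then looks up the optimizer class by name. A minimal sketch of that pattern follows. The config keys `"name"` and `"params"`, the default `Adam` with `lr=1e-3`, and the helpers `resolve_opt_config` and `make_optimizer` are all hypothetical and not scalr's documented format. With PyTorch the lookup namespace would typically be `torch.optim`; a stand-in namespace is used here so the sketch runs without PyTorch.

```python
import copy
from types import SimpleNamespace
from typing import Any, Iterable, Optional

# Hypothetical default config; scalr's real defaults and key names may differ.
DEFAULT_OPT_CONFIG = {"name": "Adam", "params": {"lr": 1e-3}}


def resolve_opt_config(opt_config: Optional[dict] = None) -> dict:
    """Fill in any keys missing from `opt_config` using DEFAULT_OPT_CONFIG."""
    resolved = copy.deepcopy(DEFAULT_OPT_CONFIG)  # never mutate the shared default
    if opt_config:
        resolved["name"] = opt_config.get("name", resolved["name"])
        resolved["params"].update(opt_config.get("params", {}))
    return resolved


def make_optimizer(model_params: Iterable, namespace: Any,
                   opt_config: Optional[dict] = None):
    """Instantiate the optimizer class named in the (resolved) config."""
    config = resolve_opt_config(opt_config)
    optimizer_cls = getattr(namespace, config["name"])
    return optimizer_cls(model_params, **config["params"])


# Stand-in optimizer that just records what it was constructed with.
class RecordingOptimizer:
    def __init__(self, params, **kwargs):
        self.params = list(params)
        self.kwargs = kwargs


optim = SimpleNamespace(Adam=RecordingOptimizer, SGD=RecordingOptimizer)
default_opt = make_optimizer([0.5, 1.5], optim)
sgd_opt = make_optimizer([0.5], optim, {"name": "SGD", "params": {"lr": 0.1, "momentum": 0.9}})
```

Only `lr` is kept in the default so that switching the optimizer name does not leak optimizer-specific arguments into a different class.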