Tracking

The molflux tracking API lets you log parameters, metrics, and other output files when running your machine learning code.

Tracking API

The molflux.core.tracking API lets you log a variety of artefacts associated with a given model training experiment.

At the moment, the following generic utility functions are available:

log_params()

Logs an arbitrary collection of parameters to disk as JSON.


from molflux.core.tracking import log_params

# params = <arbitrary key-value pairs>

log_params(params, path="out/my_params.json")
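Conceptually, log_params amounts to serialising a mapping of key-value pairs to a JSON file. A minimal stdlib sketch of that behaviour, assuming plain json.dump semantics (the real molflux implementation may add validation or formatting; log_params_sketch is a hypothetical stand-in, not the actual function):

```python
import json
from pathlib import Path

# Hypothetical stand-in for log_params: serialise a mapping to JSON on disk.
def log_params_sketch(params: dict, path: str) -> None:
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # make sure "out/" exists
    with out.open("w") as f:
        json.dump(params, f, indent=2)

# Illustrative parameters for a hypothetical training run
params = {"learning_rate": 1e-3, "n_estimators": 500}
log_params_sketch(params, path="out/my_params.json")
```

Because the output is plain JSON, the logged parameters can be reloaded later with json.load for inspection or experiment comparison.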

log_dataset()

Logs a dataset to disk.

from molflux.core.tracking import log_dataset

# dataset = <arbitrary dataset>

log_dataset(dataset, path="out/my_dataset.parquet")

The following functions log the corresponding objects according to standardised formats and conventions:

log_pipeline_config()

Logs your pipeline config dictionary.

from molflux.core.tracking import log_pipeline_config

# config = <your-pipeline-or-training-script-config>

log_pipeline_config(config, directory="out")
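In essence, log_pipeline_config persists your config dictionary inside the given directory. A minimal sketch of that idea, assuming JSON serialisation (the filename, the example config keys, and log_pipeline_config_sketch itself are all illustrative assumptions, not the actual molflux conventions):

```python
import json
from pathlib import Path

# Hypothetical stand-in for log_pipeline_config: write the config dict as
# JSON inside the target directory. The filename is an assumption here.
def log_pipeline_config_sketch(config: dict, directory: str) -> None:
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    with (out_dir / "pipeline_config.json").open("w") as f:
        json.dump(config, f, indent=2)

# Illustrative pipeline config for a hypothetical training script
config = {
    "dataset": {"name": "my_dataset"},
    "model": {"name": "my_regressor"},
}
log_pipeline_config_sketch(config, directory="out")
```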

log_featurised_dataset()

Logs a featurised dataset to disk.

from molflux.core.tracking import log_featurised_dataset

# dataset = <arbitrary dataset>

log_featurised_dataset(dataset, directory="out")

log_splitting_strategy()

Logs splitting strategy metadata to disk.

from molflux.core.tracking import log_splitting_strategy

# splitting_strategy = <arbitrary splitting strategy>

log_splitting_strategy(splitting_strategy, directory="out")

log_fold()

Logs a fold (DatasetDict) to disk.

from molflux.core.tracking import log_fold

# fold = <arbitrary fold>

log_fold(fold, directory="out")

log_model_params()

Logs your model metadata.

from molflux.core.tracking import log_model_params

# model = <your-trained-model>

log_model_params(model, directory="out")

log_scores()

Logs a nested dictionary of key-value metrics for each predictive task and for each fold.

from molflux.core.scoring import score_model
from molflux.core.tracking import log_scores

# model = <your-model>
# fold = <your-dataset-dict>
# metrics = <your-metrics>

scores = score_model(model, fold=fold, metrics=metrics)
log_scores(scores, directory="out")
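To make the "nested dictionary of key-value metrics for each predictive task and for each fold" concrete, here is an illustrative shape such a scores dictionary might take, and how it could be flattened for reporting. The fold names, task name, metric names, and values below are all assumptions for illustration:

```python
# Assumed shape: fold -> task -> metric -> value
scores = {
    "train": {"my_task": {"r2": 0.91, "mean_squared_error": 0.12}},
    "test": {"my_task": {"r2": 0.78, "mean_squared_error": 0.35}},
}

# Flatten fold/task/metric keys into a single-level mapping for reporting
flat = {
    f"{fold}/{task}/{metric}": value
    for fold, tasks in scores.items()
    for task, metrics in tasks.items()
    for metric, value in metrics.items()
}
```

Flattening like this is handy for pushing scores into tabular reports or experiment-tracking dashboards that expect flat metric names.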