Models#

molflux models can be saved in a standard format for packaging machine learning models. A variety of our downstream tools can consume this format, for example real-time serving through a REST API or local batch inference.

Storage Format#

On disk, a saved molflux model is a directory containing arbitrary files. Most importantly, it includes the serialised molflux.modelzoo model that you trained, and metadata about the molflux.features featurisation config that was applied at training time to generate the model’s input features.

# Directory written by molflux.core.save_model(model, path="my_model", featurisation_metadata=...)
my_model/
├── model_config.json
├── model_artefacts/
│   └── model.pkl
├── featurisation_metadata.json
└── requirements.txt
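
As a quick sanity check before shipping a saved model, the layout above can be verified with the standard library alone. The sketch below is a hypothetical helper, not part of molflux. It assumes the four top-level entries shown in the tree are the ones you expect, and the dummy directory it builds exists only to make the example runnable.

```python
import tempfile
from pathlib import Path

# Top-level entries taken from the directory tree above.
EXPECTED_ENTRIES = (
    "model_config.json",
    "model_artefacts",
    "featurisation_metadata.json",
    "requirements.txt",
)


def missing_entries(model_dir):
    """Return the expected top-level entries that are absent from model_dir.

    Hypothetical helper for illustration; molflux does not provide it.
    """
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Build a dummy directory with the same layout so the sketch runs anywhere.
# The file contents are placeholders, not what molflux actually writes.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "my_model"
    (demo / "model_artefacts").mkdir(parents=True)
    (demo / "model_config.json").write_text("{}")
    (demo / "featurisation_metadata.json").write_text("{}")

    missing = missing_entries(demo)
    print(missing)  # ['requirements.txt']
```

An empty list means every expected entry is present. The check looks only at names, so it says nothing about whether the files themselves are valid.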

Additional Logged Files#

To support environment recreation, a requirements.txt file is written automatically whenever a model is saved. You can use this file to reinstall the model's dependencies with pip inside a conda or virtualenv environment.
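
One way to reinstall the dependencies is to point pip at the requirements.txt file from inside a fresh conda or virtualenv environment. The sketch below only builds the command: the helper name is hypothetical, and my_model is the example path used on this page.

```python
import sys
from pathlib import Path


def pip_install_command(model_dir):
    """Build a pip command that installs the requirements saved with a model.

    Hypothetical helper for illustration; it is not part of molflux.
    """
    requirements = Path(model_dir) / "requirements.txt"
    # Invoking pip through the current interpreter targets the active environment.
    return [sys.executable, "-m", "pip", "install", "-r", str(requirements)]


command = pip_install_command("my_model")
print(" ".join(command))

# To actually install, run the command in the environment you want to populate:
# import subprocess; subprocess.run(command, check=True)
```

Building the command separately from running it makes it easy to review what will be installed before anything changes in the environment.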

Models API#

The productionising API in molflux.core.models lets you save and load molflux models ready for deployment. To save a model:

from molflux.core.models import save_model

# model = <your-trained-model>
# featurisation_metadata = <your-featurisation-metadata>

save_model(model, path="my_model", featurisation_metadata=featurisation_metadata)

To load a model locally:

from molflux.core.models import load_model

model = load_model(path="my_model")

Scoring API#

The productionising API in molflux.core.scoring provides a utility function, score_model, for scoring a model on any given fold against an arbitrary suite of metrics:

from molflux.core.scoring import score_model

# model = <your-trained-model>
# fold = <the-fold-on-which-to-evaluate-the-model>
# metrics = <the-metrics-to-use-to-evaluate-the-model>

scores = score_model(model, fold=fold, metrics=metrics)
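
To make the idea concrete, the sketch below mimics scoring in plain Python: every metric is evaluated on every split of a fold. It is a conceptual stand-in, not molflux's implementation. The fold layout (a mapping from split name to features and targets), the metric callables, and the shape of the returned dictionary are all assumptions made for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def max_error(y_true, y_pred):
    """Largest absolute difference between targets and predictions."""
    return max(abs(t - p) for t, p in zip(y_true, y_pred))


def score_on_fold(predict, fold, metrics):
    """Evaluate each metric on each split of a fold (hypothetical helper)."""
    scores = {}
    for split_name, (features, targets) in fold.items():
        predictions = predict(features)
        scores[split_name] = {
            name: metric(targets, predictions) for name, metric in metrics.items()
        }
    return scores


def double(xs):
    """Toy stand-in for a trained model: predicts twice the input."""
    return [2 * x for x in xs]


# Toy fold with two splits, each holding (features, targets).
toy_fold = {
    "train": ([1, 2, 3], [2, 4, 6]),
    "test": ([4, 5], [9, 10]),
}
toy_metrics = {"mae": mean_absolute_error, "max_error": max_error}

toy_scores = score_on_fold(double, toy_fold, toy_metrics)
print(toy_scores)
# {'train': {'mae': 0.0, 'max_error': 0}, 'test': {'mae': 0.5, 'max_error': 1}}
```

Keying the results first by split and then by metric keeps train and test performance side by side, which makes overfitting easy to spot.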