Basic usage#
In this section, we will illustrate how to use molflux.metrics. These examples will provide you with a starting point.
Browsing#
First, we'll review which metrics are available for use. These are conveniently categorised (for example, into regression, classification, etc.). To view what's available, you can do
from molflux.metrics import list_metrics
catalogue = list_metrics()
print(catalogue)
{'classification': ['accuracy', 'average_precision', 'balanced_accuracy', 'diversity_roc', 'f1_score', 'matthews_corrcoef', 'precision', 'recall', 'roc_auc', 'top_k_accuracy', 'top_k_accuracy_roc', 'validity_roc'], 'regression': ['explained_variance', 'max_error', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'out_of_sample_r2', 'pearson', 'proportion_within_fold', 'r2', 'root_mean_squared_error', 'spearman'], 'uncertainty': ['calibration_gap', 'coefficient_of_variation', 'expected_calibration_error', 'gaussian_nll', 'prediction_interval_coverage', 'prediction_interval_width', 'uncertainty_based_rejection']}
This returns a dictionary of available metrics, organised by category and name. The output above lists 30 metrics across three categories.
To see how you can add your own metrics, see How to add your own metrics.
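Since the catalogue is a plain dictionary mapping each category to a list of metric names, ordinary dictionary operations are enough to search it. The sketch below uses a hand-copied excerpt of the output above so that it runs without molflux installed. `find_category` is a hypothetical helper, not part of molflux.metrics.

```python
# Excerpt of the catalogue printed above, copied by hand for illustration.
# In practice this dictionary would come from list_metrics().
catalogue = {
    "classification": ["accuracy", "f1_score", "roc_auc"],
    "regression": ["mean_squared_error", "r2", "spearman"],
    "uncertainty": ["calibration_gap", "gaussian_nll"],
}


def find_category(catalogue, metric_name):
    """Return the category that lists metric_name, or None if it is absent."""
    for category, names in catalogue.items():
        if metric_name in names:
            return category
    return None


# Total number of metric names across all categories of the excerpt.
n_metrics = sum(len(names) for names in catalogue.values())

print(find_category(catalogue, "r2"))
print(find_category(catalogue, "not_a_metric"))
print(n_metrics)
```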
When computing metrics, it is often useful to compute all of the possible regression or classification metrics to get a better idea of the model performance. To this end, molflux.metrics provides you with the option to load an entire metric suite. To view what suites are available, you can do
from molflux.metrics import list_suites
catalogue = list_suites()
print(catalogue)
['classification', 'regression', 'uncertainty']
Loading metrics#
Loading a molflux.metrics metric is straightforward. Simply do
from molflux.metrics import load_metric
metric = load_metric(name="r2")
print(metric)
Metric(
name: "r2",
tag: "r2",
features: {'predictions': Value(dtype='float32', id=None), 'references': Value(dtype='float32', id=None)},
signature: self.compute(*, predictions: Any, references: Any, sample_weight: list[float] | None = None, multioutput: str = 'uniform_average', **kwargs: Any) -> dict[str, typing.Any],
description: """
R2 (coefficient of determination) regression score function.
Best possible score is 1.0 and it can be negative (because the model can be
arbitrarily worse). A constant model that always predicts the expected value of
y, disregarding the input features, would get a R2 score of 0.0.""",
usage: """
Args:
predictions: Estimated target values.
references: Ground truth (correct) target values.
sample_weight (optional): Weighting of each sample.
multioutput (optional): Defines aggregating of multiple output scores.
Array-like value defines weights used to average errors. Alternatively,
one of {'raw_values', 'uniform_average', 'variance_weighted'}. Defaults
to 'uniform_average'.
'raw_values' :
Returns a full set of errors in case of multioutput input.
'uniform_average' :
Errors of all outputs are averaged with uniform weight.
'variance_weighted' :
Scores of all outputs are averaged, weighted by the variances
of each individual output.
Returns:
r2: The R2 score or ndarray of scores if 'multioutput' is 'raw_values'.
Examples:
>>> from molflux.metrics import load_metric
>>> metric = load_metric("r2")
>>> predictions = [2.5, 0.0, 2, 8]
>>> references = [3, -0.5, 2, 7]
>>> metric.compute(predictions=predictions, references=references)
{'r2': 0.948...}
>>> references = [1, 2, 3]
>>> predictions = [1, 2, 3]
>>> metric.compute(predictions=predictions, references=references)
{'r2': 1.0}
>>> references = [1, 2, 3]
>>> predictions = [2, 2, 2]
>>> metric.compute(predictions=predictions, references=references)
{'r2': 0.0}
>>> references = [1, 2, 3]
>>> predictions = [3, 2, 1]
>>> metric.compute(predictions=predictions, references=references)
{'r2': -3.0}
>>> metric = load_metric("r2", config_name="multioutput")
>>> predictions = [[0, 2], [-1, 2], [8, -5]]
>>> references = [[0.5, 1], [-1, 1], [7, -6]]
>>> metric.compute(predictions=predictions, references=references, multioutput='variance_weighted')
{'r2': 0.938...}
"""
state: {}
)
Printing the loaded metric gives you more information about it. Each metric has a name and a tag (which uniquely identifies it if you load multiple copies of the same metric with different configurations). The signature shows the optional arguments to compute and their default values. There is also a short description of the metric.
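The description in the printed metric can be made concrete with the usual definition of the coefficient of determination, R2 = 1 - SS_res / SS_tot. The function below is a minimal pure-Python sketch of that textbook formula, not the molflux implementation. `r2_by_hand` is a hypothetical name, and the inputs are taken from the docstring examples above.

```python
def r2_by_hand(references, predictions):
    """Textbook R2: 1 - (residual sum of squares) / (total sum of squares).

    Note: this sketch does not handle constant references (SS_tot == 0).
    """
    mean_ref = sum(references) / len(references)
    ss_res = sum((r - p) ** 2 for r, p in zip(references, predictions))
    ss_tot = sum((r - mean_ref) ** 2 for r in references)
    return 1 - ss_res / ss_tot


perfect = r2_by_hand([1, 2, 3], [1, 2, 3])    # predictions equal references
constant = r2_by_hand([1, 2, 3], [2, 2, 2])   # always predicts the mean
reversed_ = r2_by_hand([1, 2, 3], [3, 2, 1])  # worse than predicting the mean
first = r2_by_hand([3, -0.5, 2, 7], [2.5, 0.0, 2, 8])

print(perfect, constant, reversed_, first)
```

The first three values match the 1.0, 0.0 and -3.0 in the docstring, which illustrates why the score is bounded above by 1 but can be arbitrarily negative.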
To load a metric suite, you can do
from molflux.metrics import load_suite
suite = load_suite("regression")
print(suite)
Metrics(['explained_variance', 'max_error', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'pearson', 'r2', 'root_mean_squared_error', 'spearman'])
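In the outputs shown on this page, the regression suite contains fewer names than the regression category of the catalogue in the Browsing section. The comparison below copies both lists from those outputs by hand and takes a set difference. Whether the same difference holds in your installed version is an assumption worth checking against list_metrics() and load_suite().

```python
# Regression entries of the catalogue, copied from the list_metrics() output.
catalogue_regression = [
    "explained_variance", "max_error", "mean_absolute_error",
    "mean_squared_error", "median_absolute_error", "out_of_sample_r2",
    "pearson", "proportion_within_fold", "r2",
    "root_mean_squared_error", "spearman",
]

# Names in the regression suite, copied from the load_suite() output.
suite_regression = [
    "explained_variance", "max_error", "mean_absolute_error",
    "mean_squared_error", "median_absolute_error", "pearson", "r2",
    "root_mean_squared_error", "spearman",
]

# Catalogue metrics that the suite does not include.
not_in_suite = sorted(set(catalogue_regression) - set(suite_regression))
print(not_in_suite)  # ['out_of_sample_r2', 'proportion_within_fold']
```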
You can also load a metric from a config dictionary. A metrics config dictionary is a dictionary specifying the metric to be loaded. A config dictionary must have the following format
metrics_dict = {
    'name': '<name of the metric>',
    'config': '<kwargs for instantiating metric>',
    'presets': '<kwarg presets for computing metric>'
}
The name key specifies the name of the metric to load from the catalogue. The config key specifies the arguments that are needed for instantiating the metric, and the presets key specifies preset kwargs to apply when computing the metric. If neither config nor presets is specified, the metric will use default values.
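The format described above can be checked before a config is handed to the loader. The helper below is a hypothetical sketch, not part of molflux.metrics. It assumes that name is required, that config and presets are optional dictionaries of keyword arguments, and that any other key is a mistake. The library's own validation may differ.

```python
ALLOWED_KEYS = {"name", "config", "presets"}


def check_metric_config(metric_config):
    """Raise ValueError if metric_config does not follow the assumed format."""
    if "name" not in metric_config:
        raise ValueError("a metric config needs a 'name' key")
    unknown = set(metric_config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    for key in ("config", "presets"):
        # Assumption: both optional entries are dictionaries of kwargs.
        if not isinstance(metric_config.get(key, {}), dict):
            raise ValueError(f"'{key}' should be a dictionary")
    return metric_config


# A config with only a name passes, as does one with presets.
check_metric_config({"name": "r2"})
check_metric_config({"name": "r2", "presets": {"multioutput": "raw_values"}})
```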
To load a metric from a config dictionary, you can do
from molflux.metrics import load_from_dict
config = {
    'name': 'r2',
    'presets': {
        'sample_weight': [0.2, 0.4, 0.4],
    },
}
metric = load_from_dict(config)
print(metric.state)
{'sample_weight': [0.2, 0.4, 0.4]}
You can also load multiple metrics all at once using a list of config dictionaries. This is done as follows
from molflux.metrics import load_from_dicts
list_of_configs = [
    {
        'name': 'r2',
        'presets': {
            'sample_weight': [0.2, 0.4, 0.4],
        },
    },
    {
        'name': 'mean_squared_error',
    },
]
metrics = load_from_dicts(list_of_configs)
print(metrics)
Metrics(['mean_squared_error', 'r2'])
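Because each config is an ordinary dictionary, a list of configs can also be built programmatically, for example to apply the same presets to several metrics. The sketch below only constructs the list, so it runs without molflux installed. `build_configs` is a hypothetical helper, and it assumes every named metric accepts the given presets, which should be checked against each metric's signature.

```python
import copy


def build_configs(names, presets=None):
    """Build one config dictionary per metric name, optionally with shared presets."""
    configs = []
    for name in names:
        metric_config = {"name": name}
        if presets:
            # Deep copy so that the configs do not share mutable state.
            metric_config["presets"] = copy.deepcopy(presets)
        configs.append(metric_config)
    return configs


list_of_configs = build_configs(
    ["r2", "mean_squared_error"],
    presets={"sample_weight": [0.2, 0.4, 0.4]},
)
print(list_of_configs)
```

The resulting list has the same shape as the hand-written list_of_configs above and could be passed to load_from_dicts in the same way.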
Finally, you can load metrics from a yaml file. You can use a single yaml file which includes configs for all the molflux tools, and molflux.metrics will know how to extract the relevant part it needs. To do so, you need to define a yaml file with the following example format
---
version: v1
kind: metrics
specs:
  - name: r2
    presets:
      sample_weight: [0.2, 0.4, 0.4]
  - name: mean_squared_error
...
It consists of a version (the version of the config format, for now just v1), the kind of config (in this case metrics), and specs. specs is the list of metric configs, each with a name and, optionally, the presets (keyword arguments) to apply when computing that metric. The yaml file can include configs for other molflux modules as well. To load this yaml file, you can simply do
from molflux.metrics import load_from_yaml
metrics = load_from_yaml(path_to_yaml_file)
print(metrics)
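The path_to_yaml_file variable in the snippet above is not defined there. One way to obtain it, sketched below with the standard library only, is to write the example config to a temporary file. The file name metrics_config.yaml is a placeholder chosen for illustration.

```python
import tempfile
from pathlib import Path

# The example config from above, with YAML indentation made explicit.
yaml_text = """\
---
version: v1
kind: metrics
specs:
  - name: r2
    presets:
      sample_weight: [0.2, 0.4, 0.4]
  - name: mean_squared_error
...
"""

# Write the config to a fresh temporary directory.
path_to_yaml_file = Path(tempfile.mkdtemp()) / "metrics_config.yaml"
path_to_yaml_file.write_text(yaml_text)

print(path_to_yaml_file.read_text())
```

The resulting path can then be passed to load_from_yaml as in the call above. Whether a Path object is accepted in addition to a string is an assumption, so str(path_to_yaml_file) is the conservative choice.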
Computing metrics#
After loading a metric (or group of metrics), you can apply them to predictions to compute the metric values. You must pass the references (ground truths) and the predictions (from your model).
from molflux.metrics import load_metric
metric = load_metric("r2")
ground_truth = [0, 0.3, 0.5, 0.8, 1]
preds = [0.1, 0.35, 0.45, 0.68, 1.2]
results = metric.compute(predictions=preds, references=ground_truth)
print(results)
{'r2': 0.8894904143920545}
This will return a dictionary with the metric tag as the key and the computed metric as the value. For a group of metrics (or a metric suite), you can follow the same procedure
from molflux.metrics import load_from_dicts
list_of_configs = [
    {
        'name': 'r2',
        'presets': {
            'sample_weight': [0.2, 0.2, 0.3, 0.1, 0.2],
        },
    },
    {
        'name': 'mean_squared_error',
    },
]
metrics = load_from_dicts(list_of_configs)
ground_truth = [0, 0.3, 0.5, 0.8, 1]
preds = [0.1, 0.35, 0.45, 0.68, 1.2]
results = metrics.compute(predictions=preds, references=ground_truth)
print(results)
{'r2': 0.8914456457924692, 'mean_squared_error': 0.01388000398397501}
This will return a dictionary with all the computed metrics, with the tags as the keys and the computed metric values as the values.
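To see what the sample_weight preset does, the two values above can be reproduced by hand. The functions below are pure-Python sketches of the standard weighted R2 and unweighted mean squared error formulas, not the molflux implementations. `weighted_r2` and `mse_by_hand` are hypothetical names. Since the preset is attached only to the r2 config, the mean squared error is computed without weights.

```python
def weighted_r2(references, predictions, sample_weight):
    """R2 where each squared deviation is scaled by its sample weight."""
    total_weight = sum(sample_weight)
    mean_ref = sum(w * r for w, r in zip(sample_weight, references)) / total_weight
    ss_res = sum(
        w * (r - p) ** 2
        for w, r, p in zip(sample_weight, references, predictions)
    )
    ss_tot = sum(w * (r - mean_ref) ** 2 for w, r in zip(sample_weight, references))
    return 1 - ss_res / ss_tot


def mse_by_hand(references, predictions):
    """Unweighted mean of the squared errors."""
    return sum((r - p) ** 2 for r, p in zip(references, predictions)) / len(references)


ground_truth = [0, 0.3, 0.5, 0.8, 1]
preds = [0.1, 0.35, 0.45, 0.68, 1.2]
weights = [0.2, 0.2, 0.3, 0.1, 0.2]

r2_value = weighted_r2(ground_truth, preds, weights)
mse_value = mse_by_hand(ground_truth, preds)
print(r2_value, mse_value)
```

Both results agree with the output above to about six decimal places. The remaining difference is plausibly due to the float32 features shown in the printed metric, though that is an inference rather than something the output confirms.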