Basic usage#

In this section, we will illustrate how to use molflux.metrics. These examples will provide you with a starting point.

Browsing#

First, we’ll review which metrics are available for use. These are conveniently categorised (into regression, classification, and so on). To view what’s available, you can do

from molflux.metrics import list_metrics

catalogue = list_metrics()

print(catalogue)
{'classification': ['accuracy', 'average_precision', 'balanced_accuracy', 'diversity_roc', 'f1_score', 'matthews_corrcoef', 'precision', 'recall', 'roc_auc', 'top_k_accuracy', 'top_k_accuracy_roc', 'validity_roc'], 'regression': ['explained_variance', 'max_error', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'out_of_sample_r2', 'pearson', 'proportion_within_fold', 'r2', 'root_mean_squared_error', 'spearman'], 'uncertainty': ['calibration_gap', 'coefficient_of_variation', 'expected_calibration_error', 'gaussian_nll', 'prediction_interval_coverage', 'prediction_interval_width', 'uncertainty_based_rejection']}

This returns a dictionary of available metrics (organised by category and name). There are a few to choose from. To see how you can add your own metrics, see How to add your own metrics.
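Since the catalogue is a plain dictionary keyed by category, standard dictionary operations apply: you can pick out one category, or search all of them by name. The sketch below uses a trimmed copy of the `list_metrics()` output above so it is self-contained; with the library installed you would operate on the real catalogue directly.

```python
# A trimmed copy of the list_metrics() output shown above; with molflux
# installed you would call list_metrics() instead.
catalogue = {
    "classification": ["accuracy", "f1_score", "roc_auc"],
    "regression": ["mean_squared_error", "r2", "spearman"],
    "uncertainty": ["gaussian_nll", "prediction_interval_coverage"],
}

# Drill into a single category...
print(catalogue["regression"])
# ['mean_squared_error', 'r2', 'spearman']

# ...or search the whole catalogue by (partial) metric name.
matches = [
    (category, name)
    for category, names in catalogue.items()
    for name in names
    if "error" in name
]
print(matches)
# [('regression', 'mean_squared_error')]
```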

When computing metrics, it is often useful to compute all of the possible regression or classification metrics to get a better idea of the model performance. To this end, molflux.metrics provides you with the option to load an entire metric suite. To view what suites are available, you can do

from molflux.metrics import list_suites

catalogue = list_suites()

print(catalogue)
['classification', 'regression', 'uncertainty']

Loading metrics#

Loading a molflux.metrics metric is straightforward; simply do

from molflux.metrics import load_metric

metric = load_metric(name="r2")

print(metric)
Metric(
	name: "r2",
	tag: "r2",
	features: {'predictions': Value(dtype='float32', id=None), 'references': Value(dtype='float32', id=None)},
	signature: self.compute(*, predictions: Any, references: Any, sample_weight: list[float] | None = None, multioutput: str = 'uniform_average', **kwargs: Any) -> dict[str, typing.Any],
	description: """
R2 (coefficient of determination) regression score function.

Best possible score is 1.0 and it can be negative (because the model can be
arbitrarily worse). A constant model that always predicts the expected value of
y, disregarding the input features, would get a R2 score of 0.0.""",
	usage: """
Args:
    predictions: Estimated target values.
    references: Ground truth (correct) target values.
    sample_weight (optional): Weighting of each sample.
    multioutput (optional): Defines aggregating of multiple output scores.
        Array-like value defines weights used to average errors. Alternatively,
        one of {'raw_values', 'uniform_average', 'variance_weighted'}. Defaults
        to 'uniform_average'.
        'raw_values' :
            Returns a full set of errors in case of multioutput input.
        'uniform_average' :
            Errors of all outputs are averaged with uniform weight.
        'variance_weighted' :
            Scores of all outputs are averaged, weighted by the variances
            of each individual output.

Returns:
    r2: The R2 score or ndarray of scores if 'multioutput' is 'raw_values'.

Examples:
    >>> from molflux.metrics import load_metric
    >>> metric = load_metric("r2")
    >>> predictions = [2.5, 0.0, 2, 8]
    >>> references = [3, -0.5, 2, 7]
    >>> metric.compute(predictions=predictions, references=references)
    {'r2': 0.948...}
    >>> references = [1, 2, 3]
    >>> predictions = [1, 2, 3]
    >>> metric.compute(predictions=predictions, references=references)
    {'r2': 1.0}
    >>> references = [1, 2, 3]
    >>> predictions = [2, 2, 2]
    >>> metric.compute(predictions=predictions, references=references)
    {'r2': 0.0}
    >>> references = [1, 2, 3]
    >>> predictions = [3, 2, 1]
    >>> metric.compute(predictions=predictions, references=references)
    {'r2': -3.0}
    >>> metric = load_metric("r2", config_name="multioutput")
    >>> predictions = [[0, 2], [-1, 2], [8, -5]]
    >>> references = [[0.5, 1], [-1, 1], [7, -6]]
    >>> metric.compute(predictions=predictions, references=references, multioutput='variance_weighted')
    {'r2': 0.938...}
"""
	state: {}
)

Printing the loaded metric displays more information about it. Each metric has a name and a tag (which uniquely identifies it, in case you would like to create multiple copies of the same metric with different configurations). The signature shows the optional arguments to compute (and their default values), and the description and usage fields document what the metric measures and how to call it.

To load a metric suite, you can do

from molflux.metrics import load_suite

suite = load_suite("regression")

print(suite)
Metrics(['explained_variance', 'max_error', 'mean_absolute_error', 'mean_squared_error', 'median_absolute_error', 'pearson', 'r2', 'root_mean_squared_error', 'spearman'])

You can also load a metric from a config dictionary, which specifies the metric to be loaded. It must have the following format

metrics_dict = {
  'name': '<name of the metric>',
  'config': '<kwargs for instantiating metric>',
  'presets': '<kwarg presets for computing metric>',
}

The name key specifies the name of the metric to load from the catalogue. The config key specifies the arguments needed to instantiate the metric, and the presets key specifies preset kwargs to apply when computing the metric. Both config and presets are optional; if omitted, the metric uses default values.

To load a metric from a config

from molflux.metrics import load_from_dict

config = {
    'name': 'r2',
    'presets': {
        'sample_weight': [0.2, 0.4, 0.4],
    },
}

metric = load_from_dict(config)

print(metric.state)
{'sample_weight': [0.2, 0.4, 0.4]}

You can also load multiple metrics all at once using a list of config dictionaries. This is done as follows

from molflux.metrics import load_from_dicts

list_of_configs = [
    {
        'name': 'r2',
        'presets':
        {
            'sample_weight': [0.2, 0.4, 0.4],
        },
    },
    {
        'name': 'mean_squared_error',
    }
]

metrics = load_from_dicts(list_of_configs)

print(metrics)
Metrics(['mean_squared_error', 'r2'])

Finally, you can load metrics from a yaml file. You can use a single yaml file which includes configs for all the molflux tools, and molflux.metrics will know how to extract the part relevant to it. To do so, define a yaml file with the following example format

---
version: v1
kind: metrics
specs:
    - name: r2
      presets:
        sample_weight: [0.2, 0.4, 0.4]
    - name: mean_squared_error
...

It consists of a version (the version of the config format, for now just v1), a kind of config (in this case metrics), and specs. Each entry in specs has the same format as the config dictionaries above: a metric name, and optionally config and presets. The yaml file can include configs for other molflux modules as well. To load this yaml file, you can simply do

from molflux.metrics import load_from_yaml

metrics = load_from_yaml(path_to_yaml_file)

print(metrics)

Computing metrics#

After loading a metric (or a group of metrics), you can compute the metric values by passing the references (ground truth values) and the predictions (from your model).

from molflux.metrics import load_metric

metric = load_metric("r2")

ground_truth = [0, 0.3, 0.5, 0.8, 1]
preds = [0.1, 0.35, 0.45, 0.68, 1.2]

results = metric.compute(predictions=preds, references=ground_truth)

print(results)
{'r2': 0.8894904143920545}

This will return a dictionary with the metric tag as the key and the computed metric as the value. For a group of metrics (or a metric suite), you can follow the same procedure

from molflux.metrics import load_from_dicts

list_of_configs = [
    {
        'name': 'r2',
        'presets':
        {
            'sample_weight': [0.2, 0.2, 0.3, 0.1, 0.2],
        },
    },
    {
        'name': 'mean_squared_error',
    }
]

metrics = load_from_dicts(list_of_configs)

ground_truth = [0, 0.3, 0.5, 0.8, 1]
preds = [0.1, 0.35, 0.45, 0.68, 1.2]

results = metrics.compute(predictions=preds, references=ground_truth)

print(results)
{'r2': 0.8914456457924692, 'mean_squared_error': 0.01388000398397501}

This will return a dictionary with all the computed metrics (with the tags as the keys and the computed values as the values).
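Since compute returns a plain dict keyed by metric tag, standard dictionary post-processing applies. A minimal sketch, using values copied from the example outputs above, formatting the results for display:

```python
# Results dict in the tag -> value shape returned by compute (values
# copied from the example outputs in this section).
results = {"r2": 0.8914456457924692, "mean_squared_error": 0.01388000398397501}

# Sort by tag and format each value to four decimal places.
for tag, value in sorted(results.items()):
    print(f"{tag:>20}: {value:.4f}")
#   mean_squared_error: 0.0139
#                   r2: 0.8914
```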