Splits in a nutshell

Splits in a nutshell#

117 words | 1 min read

One of the core steps of machine learning is splitting your data for training, validation, and testing. Splitting strategies for drug discovery in particular are numerous and rapidly evolving. New tools and techniques are constantly being developed but using multiple tools from different sources quickly becomes tedious, inconvenient, and prone to incompatibilities.

The splits submodule aims to address these issues. It is a collection of many different types of splitters which can split datasets using a variety of criteria. Whether you are looking for random splits or more complicated scaffold or time splits, splits provides a standard and modular interface for using these splitters and also allows you to add your own!