Synopsis
The Challenge
Building AI/ML systems requires immense resources: training runs on computationally demanding hardware (expensive GPUs), input datasets range from terabytes to petabytes for large models, and the resulting models weigh in anywhere from a few hundred MiB to tens of GiB.
The process involves:
- Data collection & preparation: assemble and prepare massive datasets for the training process.
- Pre-training & fine-tuning: feed the prepared data to the model and let it learn.
- Inference: let end users interact with the model, where an input leads to an output; this stage also requires storage & compute, though far less than training (see the sketch after this list).
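For orientation, the three stages map onto roughly this shape of code. The model, synthetic data, and training loop below are generic PyTorch placeholders for illustration, not tied to any particular system:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# 1. Data collection & preparation: assemble features and labels into a dataset.
features = torch.randn(1024, 16)            # stand-in for collected data
labels = (features.sum(dim=1) > 0).float()  # stand-in for prepared labels
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)

# 2. Pre-training & fine-tuning: let the model learn from the prepared data.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()
for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x).squeeze(-1), y)
        loss.backward()
        optimizer.step()

# 3. Inference: an input leads to an output for the end user.
with torch.no_grad():
    prediction = torch.sigmoid(model(torch.randn(1, 16)))
print(prediction.item())
```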
Our Solution
Basin enables object storage with verifiable data pipelines, a decentralized architecture, and built-in access control and ownership, providing tools that solve common challenges in ML/AI, including:
- Pool and collaborate on data: many actors write to a single repo, aggregating fragmented data into a unified and valuable asset.
- Provision access & monetize data with programmable read & write access control, configurable pricing & licensing, and flexible governance options.
- Add verifiability & provenance to data: sign it at its source so consumers can verify where it came from (see the sketch after this list).
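To make the provenance idea concrete, here is a minimal sketch of source-side signing, assuming an Ed25519 producer key and a local `dataset.parquet` file (both hypothetical placeholders); Basin's actual signing scheme may differ:

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical producer key; in practice this is the data source's identity key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Hash the dataset file in chunks so arbitrarily large files fit in memory.
digest = hashlib.sha256()
with open("dataset.parquet", "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)
dataset_hash = digest.digest()

# Sign the hash at the source; the signature travels with the data
# as provenance metadata.
signature = private_key.sign(dataset_hash)

# Any consumer holding the producer's public key can verify origin before use.
public_key.verify(signature, dataset_hash)  # raises InvalidSignature on mismatch
print("dataset provenance verified")
```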
Collaboration Over Large Datasets
How it Works
Basin makes data available by replicating datasets & models to decentralized storage for open access.
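As an illustration only, a collaborative write might look like the following sketch; the `push_object` helper, the repo path, and the `ObjectRef` type are hypothetical stand-ins, not Basin's actual API:

```python
import hashlib
from dataclasses import dataclass

@dataclass
class ObjectRef:
    """Content-addressed reference to a stored object (hypothetical)."""
    repo: str
    key: str
    sha256: str

def push_object(repo: str, key: str, payload: bytes) -> ObjectRef:
    """Sketch of writing an object to a shared repo.

    A real client would upload `payload` to the storage network and
    replicate it across providers; here we only model the bookkeeping.
    """
    digest = hashlib.sha256(payload).hexdigest()
    # ... network upload + replication would happen here ...
    return ObjectRef(repo=repo, key=key, sha256=digest)

# Many actors can write into the same repo under distinct keys,
# aggregating fragmented data into one addressable dataset.
ref = push_object("ml-team/shared-dataset", "images/batch-0001.tar", b"...")
print(ref)
```

Content addressing (hashing the payload) is what lets independent writers share one repo safely: any consumer can check that the bytes they retrieve match the reference they were given.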
Benefits
Provides redundancy, fault tolerance, and retrieval options that reduce hosted storage costs, guarantee data liveness, and enable open data access, driving a better experience for data consumers.
Data Provenance & Transparency
How it Works