First steps with Rubrix

Welcome to Rubrix’s documentation.

What’s Rubrix?

Rubrix is a free and open-source tool for tracking and iterating on data for AI projects.

With Rubrix, you can:

  • Monitor the predictions of deployed models.

  • Collect ground-truth data for starting up a project or evolving an existing one.

  • Iterate on ground-truth data and predictions to debug, track and improve your models over time.

  • Build custom applications and dashboards on top of your model predictions and ground-truth data.

Rubrix is designed to enable novel, human-in-the-loop workflows involving data scientists, subject matter experts and data engineers for curating, understanding and evolving data for AI and data science projects.

We’ve tried to make Rubrix easy, fun and seamless to use with your favourite libraries while keeping it scalable and flexible. Rubrix’s main components are:

  • a Python library, installable with pip, that lets data scientists, data engineers and DevOps roles build bridges between data, models and users.

  • a web application for exploring, curating and labelling data, which you can launch using Docker or with a local installation.

  • a REST API, included in every Rubrix installation, for storing, retrieving and searching human annotations and model predictions.


Rubrix currently supports several natural language processing and knowledge graph use cases, but we will be adding support for speech recognition and computer vision soon.

Quickstart

Getting started with Rubrix is easy; let’s see a quick example using the 🤗 transformers and datasets libraries:

Make sure you have Docker installed (check the Setup and Installation section for a more detailed installation process), then create a working directory:

mkdir rubrix && cd rubrix

Then download the Docker Compose file and start Rubrix:

wget -O docker-compose.yml https://git.io/rb-docker && docker-compose up

Install the Rubrix Python library (plus the transformers, PyTorch and datasets libraries used in this example):

pip install rubrix transformers datasets torch

Use your favourite editor or a Jupyter notebook to run the following:

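from transformers import pipeline
from datasets import load_dataset
import rubrix as rb

# Zero-shot classifier: scores arbitrary candidate labels without fine-tuning
model = pipeline('zero-shot-classification', model="typeform/squeezebert-mnli")

# First 100 examples of the AG News test split
dataset = load_dataset("ag_news", split='test[0:100]')

# Our labels are: ['World', 'Sports', 'Business', 'Sci/Tech']
labels = dataset.features["label"].names

records = []
for record in dataset:
    prediction = model(record['text'], labels)

    records.append(
        rb.TextClassificationRecord(
            inputs={"text": record["text"]},
            # Pair each candidate label with its score: [(label, score), ...]
            prediction=list(zip(prediction['labels'], prediction['scores'])),
            # AG News stores the gold label as an integer id
            annotation=labels[record["label"]],
        )
    )

# Log all records to the "ag_news_zeroshot" dataset in a single call
rb.log(records, name="ag_news_zeroshot")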
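When monitoring a deployed model, records arrive one at a time and there is no point at which the whole dataset is available up front. One option, sketched below under our own assumptions, is to buffer records and send them in fixed-size batches. BufferedLogger is a hypothetical helper, not part of Rubrix; log_fn is whatever function does the actual logging, for example one that calls rb.log(batch, name="ag_news_zeroshot") (we assume here that rb.log accepts a list of records as well as a single record). The usage lines swap in a stand-in function so the sketch runs without a Rubrix server.

```python
from typing import Any, Callable, List


class BufferedLogger:
    """Hypothetical helper: collect records and hand them to `log_fn`
    in batches of `batch_size` (e.g. a function wrapping rb.log)."""

    def __init__(self, log_fn: Callable[[List[Any]], None], batch_size: int = 50):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.log_fn = log_fn
        self.batch_size = batch_size
        self._buffer: List[Any] = []

    def add(self, record: Any) -> None:
        # Buffer the record and send a batch once the buffer is full
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered; the buffer is only cleared if log_fn succeeds
        if self._buffer:
            self.log_fn(self._buffer)
            self._buffer = []


# Stand-in log function that only remembers the size of each batch
sent_batches = []
buffered = BufferedLogger(lambda batch: sent_batches.append(len(batch)), batch_size=40)
for i in range(100):
    buffered.add({"id": i})
buffered.flush()  # send the final, partial batch

print(sent_batches)  # → [40, 40, 20]
```

Calling flush() when the service shuts down, or on a timer, keeps the last partial batch from being lost.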

Use cases

  • Model monitoring and observability: log and observe predictions of live models.

  • Ground-truth data collection: collect labels to start a project from scratch or from existing live models.

  • Evaluation: easily compute “live” metrics from models in production, and slice evaluation datasets to test your system under specific conditions.

  • Model debugging: log predictions during the development process to visually spot issues.

  • Explainability: log explanations such as token attributions to understand your model's predictions.

  • App development: get a powerful search-based API on top of your model predictions and ground-truth data.
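The evaluation use case comes down to comparing each record's top-scoring predicted label with its annotation, both overall and within slices of the data. The sketch below does this in plain Python on records shaped like the quickstart's (a prediction list of (label, score) tuples plus an annotation label); it does not use the Rubrix client at all. The helper names, the four sample records and the word-count slicing rule are made up for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape mirroring the quickstart fields
Record = Dict[str, object]


def top_label(prediction: List[Tuple[str, float]]) -> str:
    # The predicted label is the one with the highest score
    return max(prediction, key=lambda pair: pair[1])[0]


def accuracy(records: List[Record]) -> float:
    # Fraction of records whose top predicted label matches the annotation
    if not records:
        return float("nan")
    hits = sum(top_label(r["prediction"]) == r["annotation"] for r in records)
    return hits / len(records)


def accuracy_by_slice(records: List[Record], slice_fn: Callable[[Record], str]) -> Dict[str, float]:
    # Group records by slice key, then compute accuracy within each group
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[slice_fn(r)].append(r)
    return {key: accuracy(group) for key, group in groups.items()}


def by_length(r: Record) -> str:
    # Made-up slicing rule: "short" texts have at most 6 words
    return "short" if len(str(r["text"]).split()) <= 6 else "long"


records = [
    {"text": "Stocks rallied after earnings.",
     "prediction": [("Business", 0.7), ("World", 0.3)], "annotation": "Business"},
    {"text": "The home team won the final in extra time.",
     "prediction": [("World", 0.6), ("Sports", 0.4)], "annotation": "Sports"},
    {"text": "New chip doubles battery life.",
     "prediction": [("Sci/Tech", 0.8), ("Business", 0.2)], "annotation": "Sci/Tech"},
    {"text": "Leaders met in Geneva to discuss a ceasefire proposal.",
     "prediction": [("World", 0.9), ("Business", 0.1)], "annotation": "World"},
]

print(accuracy(records))                      # → 0.75
print(accuracy_by_slice(records, by_length))  # → {'short': 1.0, 'long': 0.5}
```

The overall score hides that every error here falls in the "long" slice; slicing is what makes that visible.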

Design Principles

Rubrix’s design is:

  • Agnostic: you can use Rubrix with any library or framework, without implementing any interface or modifying your existing toolbox and workflows.

  • Flexible: Rubrix does not make strong assumptions about your input data, so you can log and structure your data in whatever way best fits your use case.

  • Minimalistic: Rubrix is built around a small set of concepts and methods.
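As an illustration of the Agnostic and Flexible principles, the sketch below prepares record fields from a multi-column row and from a bare probability vector, such as the per-class output of a classifier from any framework. Both helpers (build_inputs, probs_to_prediction), the sample row and the probabilities are hypothetical. The sketch assumes, as in the quickstart, that inputs is a dictionary of named text fields (and that it may hold more than one field) and that prediction is a list of (label, score) tuples, so the results could be passed to rb.TextClassificationRecord(inputs=..., prediction=...).

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def build_inputs(row: Mapping[str, object], fields: Sequence[str]) -> Dict[str, str]:
    # Keep only the chosen columns (skipping missing ones) as named text fields
    return {f: str(row[f]) for f in fields if row.get(f) is not None}


def probs_to_prediction(probs: Sequence[float], class_names: Sequence[str]) -> List[Tuple[str, float]]:
    # Pair each class name with its probability, highest score first
    if len(probs) != len(class_names):
        raise ValueError("probs and class_names must have the same length")
    pairs = [(name, float(p)) for name, p in zip(class_names, probs)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


row = {"id": 42, "title": "Chipmaker beats forecasts",
       "body": "Shares rose after strong quarterly sales."}
inputs = build_inputs(row, ["title", "body"])
prediction = probs_to_prediction([0.1, 0.05, 0.6, 0.25],
                                 ["World", "Sports", "Business", "Sci/Tech"])

print(inputs)      # → {'title': 'Chipmaker beats forecasts', 'body': 'Shares rose after strong quarterly sales.'}
print(prediction)  # → [('Business', 0.6), ('Sci/Tech', 0.25), ('World', 0.1), ('Sports', 0.05)]
```

Because the adapters only deal with plain dictionaries and lists, the model itself can come from any library.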

Next steps

The documentation is divided into different sections, which explore different aspects of Rubrix:

Community

You can join the conversation on our GitHub page and GitHub forum.