First steps with Rubrix
Welcome to Rubrix’s documentation.
Rubrix is a free and open-source tool for tracking and iterating on data for AI projects.
With Rubrix, you can:
Monitor the predictions of deployed models.
Collect ground-truth data for starting up a project or evolving an existing one.
Iterate on ground-truth data and predictions to debug, track and improve your models over time.
Build custom applications and dashboards on top of your model predictions and ground-truth data.
Rubrix is designed to enable novel, human-in-the-loop workflows involving data scientists, subject matter experts and data engineers for curating, understanding and evolving data for AI and data science projects.
We’ve tried to make Rubrix easy, fun and seamless to use with your favourite libraries while keeping it scalable and flexible. Rubrix’s main components are:
a Python library to enable data scientists, data engineers and DevOps roles to build bridges between data, models and users, which you can install with pip install rubrix.
a web application for exploring, curating and labelling data, which you can launch using Docker or with a local installation.
a REST API for storing, retrieving and searching human annotations and model predictions, which is part of Rubrix’s installation.
Rubrix currently supports several natural language processing and knowledge graph use cases, but we will be adding support for speech recognition and computer vision soon.
Getting started with Rubrix is easy; let's see a quick example using the 🤗 Transformers and Datasets libraries.
Make sure you have Docker installed, then run the following (check the setup and installation section for a more detailed installation process):
mkdir rubrix && cd rubrix
And then run:
wget -O docker-compose.yml https://git.io/rb-docker && docker-compose up
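docker-compose up keeps running in the foreground, and the containers may take a little while before the server answers. If you script the quick start, a small readiness check can help. The sketch below is an assumption of ours, not part of Rubrix: it uses only the Python standard library to poll the UI address used later in this guide (http://localhost:6900/) until any HTTP response arrives or a timeout expires. The function name, timeout and polling interval are illustrative.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:6900/",
                    timeout: float = 60.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it answers with an HTTP response or `timeout` seconds pass.

    Returns True as soon as any HTTP response arrives (an error status still
    means the server is running), False once the deadline is reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with a non-2xx status: it is up.
            return True
        except OSError:
            # Connection refused, reset or timed out while containers start
            # (URLError is a subclass of OSError, so it is caught here too).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_server() from a second terminal (or before running the Python script of this example) returns True once something is listening at http://localhost:6900/, and False if nothing answers within a minute.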
Install the Rubrix Python library, along with the Transformers, Datasets and PyTorch libraries used in this example:
pip install rubrix==0.3.0 transformers datasets torch
Use your favourite editor or a Jupyter notebook to run the following:
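from transformers import pipeline
from datasets import load_dataset
import rubrix as rb

# Zero-shot text classifier from the Hugging Face Hub
model = pipeline('zero-shot-classification', model="typeform/squeezebert-mnli")

# First 100 examples of the ag_news test split
dataset = load_dataset("ag_news", split='test[0:100]')

# Our labels are: ['World', 'Sports', 'Business', 'Sci/Tech']
labels = dataset.features["label"].names

for record in dataset:
    # Score the text against every candidate label
    prediction = model(record['text'], labels)

    item = rb.TextClassificationRecord(
        inputs=record["text"],
        # (label, score) pairs from the zero-shot pipeline
        prediction=list(zip(prediction['labels'], prediction['scores'])),
        # Gold label name, mapped from the dataset's integer class id
        annotation=labels[record["label"]]
    )

    # Send the record to the "ag_news_zeroshot" dataset in Rubrix
    rb.log(item, name="ag_news_zeroshot")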
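The script above relies on two small data transformations. The zero-shot pipeline returns a dictionary whose 'labels' and 'scores' lists are aligned (sorted by descending score), and zipping them yields the (label, score) pairs passed as prediction. The integer 'label' column of ag_news is mapped back to its name for annotation. The dependency-free sketch below isolates both steps; the helper names are hypothetical, and the sample pipeline output is a hand-written value, not real model output.

```python
from typing import Dict, List, Sequence, Tuple

# ag_news label names in index order, as listed in the script's comment.
AG_NEWS_LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def to_prediction(output: Dict[str, list]) -> List[Tuple[str, float]]:
    """Turn a zero-shot pipeline output into (label, score) pairs.

    Mirrors list(zip(prediction['labels'], prediction['scores'])):
    the two lists are aligned, so zipping pairs each label with its score.
    """
    return list(zip(output["labels"], output["scores"]))


def to_annotation(label_id: int, names: Sequence[str] = AG_NEWS_LABELS) -> str:
    """Map the dataset's integer class id to its label name."""
    return names[label_id]


if __name__ == "__main__":
    # Hypothetical pipeline output, written by hand for illustration.
    fake_output = {
        "sequence": "Stocks rallied after the earnings report.",
        "labels": ["Business", "World", "Sci/Tech", "Sports"],
        "scores": [0.71, 0.15, 0.09, 0.05],
    }
    print(to_prediction(fake_output))
    # [('Business', 0.71), ('World', 0.15), ('Sci/Tech', 0.09), ('Sports', 0.05)]
    print(to_annotation(2))  # Business
```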
Now you can explore the records in the Rubrix UI at http://localhost:6900/.
The default username and password are
Rubrix can be used for:
Model monitoring and observability: log and observe predictions of live models.
Ground-truth data collection: collect labels to start a project from scratch or from existing live models.
Evaluation: easily compute “live” metrics from models in production, and slice evaluation datasets to test your system under specific conditions.
Model debugging: log predictions during the development process to visually spot issues.
Explainability: log things like token attributions to understand your model predictions.
App development: get a powerful search-based API on top of your model predictions and ground truth data.
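To make the evaluation use case concrete for the quick example: once records carry both a prediction and an annotation, a metric such as accuracy is the share of records whose top-scoring label equals the annotation, and slicing means computing the same metric on a subset. The sketch below illustrates the idea in plain Python over hypothetical in-memory records shaped like the ones built in the example. It is not Rubrix's own metrics implementation, and the records are made up.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory record with the fields the quick example sets:
# text, prediction as (label, score) pairs, annotation as a label name.
Record = Dict[str, Any]


def top_label(prediction: List[Tuple[str, float]]) -> str:
    """Return the label with the highest score."""
    return max(prediction, key=lambda pair: pair[1])[0]


def accuracy(records: List[Record]) -> Optional[float]:
    """Share of annotated records whose top prediction equals the annotation."""
    annotated = [r for r in records if r.get("annotation") is not None]
    if not annotated:
        return None  # nothing to evaluate against
    hits = sum(top_label(r["prediction"]) == r["annotation"] for r in annotated)
    return hits / len(annotated)


def accuracy_by_annotation(records: List[Record]) -> Dict[str, Optional[float]]:
    """Slice annotated records by gold label and compute accuracy per slice."""
    slices: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        if r.get("annotation") is not None:
            slices[r["annotation"]].append(r)
    return {label: accuracy(group) for label, group in slices.items()}


if __name__ == "__main__":
    # Made-up records for illustration only, not real model output.
    records = [
        {"text": "sample 1", "prediction": [("Business", 0.7), ("World", 0.3)], "annotation": "Business"},
        {"text": "sample 2", "prediction": [("World", 0.6), ("Sports", 0.4)], "annotation": "Sports"},
        {"text": "sample 3", "prediction": [("Sports", 0.8), ("World", 0.2)], "annotation": "Sports"},
    ]
    print(accuracy(records))                # 2 of 3 top predictions match
    print(accuracy_by_annotation(records))  # {'Business': 1.0, 'Sports': 0.5}
```

Slicing by gold label is just one choice; the same accuracy function applies to any subset, for example records whose text matches a keyword.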
Rubrix’s design is:
Agnostic: you can use Rubrix with any library or framework; there is no need to implement any interface or modify your existing toolbox and workflows.
Flexible: Rubrix does not make strong assumptions about your input data, so you can log and structure your data in whatever way fits your use case.
Minimalistic: Rubrix is built around a small set of concepts and methods.
The documentation is divided into different sections, which explore different aspects of Rubrix: