Metrics
This guide gives you a brief introduction to Rubrix Metrics. Rubrix Metrics let you perform fine-grained analyses of your models and training datasets. They are inspired by a number of seminal works, such as ExplainaBoard.
The main goal is to make it easier to build more robust models and training data, going beyond single-number metrics (e.g., F1).
Below is a brief overview of the currently supported metrics. For the full API documentation, see the Python API reference.
This feature is experimental, so expect some changes in the Python API. Please report any issues you encounter on GitHub.
Install dependencies
Make sure Jupyter Widgets is installed so that the plots render properly. See https://ipywidgets.readthedocs.io/en/latest/user_install.html
To run this guide, install the following dependencies:
%pip install datasets spacy plotly transformers -qqq
and the spaCy model:
!python -m spacy download en_core_web_sm
1. Rubrix Metrics for NER pipeline predictions
Load dataset and spaCy model
We’ll be using spaCy for this guide, but all the metrics we’ll see can be computed for any other framework (Flair, Stanza, Hugging Face, etc.). As an example, we’ll use the WNUT17 NER dataset.
import rubrix as rb
import spacy
from datasets import load_dataset
nlp = spacy.load("en_core_web_sm")
dataset = load_dataset("wnut_17", split="train")
Log records into a Rubrix dataset
Let’s log spaCy predictions using the built-in rb.monitor method:
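# Wrap the pipeline so that every prediction is logged to the "spacy_sm_wnut17" dataset
nlp = rb.monitor(nlp, dataset="spacy_sm_wnut17")

def predict(record):
    # Calling the monitored pipeline logs its prediction to Rubrix
    nlp(" ".join(record["tokens"]))
    return {"predicted": True}

# Run the pipeline over every example of the WNUT17 training split
dataset.map(predict)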
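Conceptually, rb.monitor wraps the pipeline so that a single call both returns the prediction and logs it. The class below is a hypothetical, in-memory sketch of that wrapper pattern. MonitoredPipeline and toy_ner are illustrative names and not Rubrix APIs. The real rb.monitor logs to a Rubrix dataset instead of appending to a Python list.

```python
from typing import Any, Callable, Dict, List


class MonitoredPipeline:
    """Hypothetical wrapper: forwards each call to the wrapped pipeline and
    keeps a copy of the input text together with its prediction."""

    def __init__(self, pipeline: Callable[[str], Any]) -> None:
        self.pipeline = pipeline
        self.logged: List[Dict[str, Any]] = []

    def __call__(self, text: str) -> Any:
        prediction = self.pipeline(text)
        # A real monitor would send this record to a Rubrix dataset;
        # this sketch only stores it in memory.
        self.logged.append({"text": text, "prediction": prediction})
        return prediction


def toy_ner(text: str) -> List[str]:
    """Toy stand-in for an NER pipeline: treats capitalized tokens as entities."""
    return [token for token in text.split() if token[:1].isupper()]


monitored = MonitoredPipeline(toy_ner)
print(monitored("Peter lives in Paris"))  # ['Peter', 'Paris']
print(monitored.logged)
```

Because the wrapper is callable like the original pipeline, code such as predict() above does not need to change when monitoring is switched on.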
Explore some metrics for this pipeline
from rubrix.metrics.token_classification import token_length
token_length(name="spacy_sm_wnut17").visualize()
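token_length plots how token lengths are distributed across the dataset. The sketch below shows the underlying counting in plain Python, assuming that length is measured in characters per token. token_length_distribution is a hypothetical helper and not part of Rubrix.

```python
from collections import Counter
from typing import Dict, List


def token_length_distribution(records: List[List[str]]) -> Dict[int, int]:
    """Count tokens by their length in characters, across all records."""
    counts = Counter(len(token) for tokens in records for token in tokens)
    return dict(sorted(counts.items()))


# Toy tokenized records in the style of WNUT17 (tweets)
toy_records = [["@paul", "is", "in", "London"], ["lol", "!"]]
print(token_length_distribution(toy_records))  # {1: 1, 2: 2, 3: 1, 5: 1, 6: 1}
```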
from rubrix.metrics.token_classification import token_capitalness
token_capitalness(name="spacy_sm_wnut17").visualize()
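token_capitalness groups tokens by their casing pattern. The exact categories Rubrix uses are an assumption here. The sketch below uses the following categories:

- UPPER: all cased characters are uppercase.
- LOWER: all cased characters are lowercase.
- FIRST: the token starts with an uppercase letter.
- MIDDLE: uppercase letters appear only after the first position.
- None: the token has no cased characters.

The helper capitalness is hypothetical.

```python
from collections import Counter
from typing import Optional


def capitalness(token: str) -> Optional[str]:
    """Classify a token's casing pattern (assumed categories)."""
    if token.isupper():
        return "UPPER"   # e.g. "NASA"
    if token.islower():
        return "LOWER"   # e.g. "river"
    if token[:1].isupper():
        return "FIRST"   # e.g. "London"
    if any(ch.isupper() for ch in token[1:]):
        return "MIDDLE"  # e.g. "iPhone"
    return None          # no cased characters, e.g. "123" or "!"


tokens = ["NASA", "river", "London", "iPhone", "123"]
print(Counter(capitalness(t) for t in tokens))
```

The checks run in order, so a mixed-case token such as "McDonald" is counted as FIRST rather than MIDDLE.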
from rubrix.metrics.token_classification import token_frequency
token_frequency(name="spacy_sm_wnut17", tokens=50).visualize()
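token_frequency plots the most frequent tokens in the dataset. The counting step looks roughly like the sketch below, assuming that tokens=50 limits the plot to the 50 most common tokens. top_tokens is a hypothetical helper, and it counts case-sensitively.

```python
from collections import Counter
from typing import List, Tuple


def top_tokens(records: List[List[str]], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent tokens with their counts (case-sensitive)."""
    counts = Counter(token for tokens in records for token in tokens)
    return counts.most_common(n)


toy_records = [["the", "movie", "was", "great"], ["the", "plot", "was", "thin"]]
print(top_tokens(toy_records, n=2))  # [('the', 2), ('was', 2)]
```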
from rubrix.metrics.token_classification import entity_consistency
entity_consistency(name="spacy_sm_wnut17", mentions=5000, threshold=2).visualize()
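entity_consistency highlights mentions that the pipeline labels inconsistently across the dataset. The sketch below is a rough version of that idea. It assumes that mentions is the number of most frequent mention strings considered, and that threshold is the minimum number of distinct labels a mention must receive to be reported. inconsistent_mentions is a hypothetical helper, not Rubrix's implementation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def inconsistent_mentions(
    mentions: List[Tuple[str, str]], top_n: int, threshold: int
) -> Dict[str, Dict[str, int]]:
    """Among the top_n most frequent mention strings, return the label counts of
    those that received at least `threshold` distinct labels."""
    labels_by_mention = defaultdict(Counter)  # mention text -> Counter of labels
    for text, label in mentions:
        labels_by_mention[text][label] += 1
    frequency = Counter(text for text, _ in mentions)
    return {
        text: dict(labels_by_mention[text])
        for text, _ in frequency.most_common(top_n)
        if len(labels_by_mention[text]) >= threshold
    }


# Toy (mention text, predicted label) pairs
mentions = [
    ("Apple", "corporation"),
    ("Apple", "product"),
    ("Apple", "corporation"),
    ("Paris", "location"),
]
print(inconsistent_mentions(mentions, top_n=10, threshold=2))
# {'Apple': {'corporation': 2, 'product': 1}}
```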
from rubrix.metrics.token_classification import entity_labels
entity_labels(name="spacy_sm_wnut17").visualize()
from rubrix.metrics.token_classification import entity_density
entity_density(name="spacy_sm_wnut17").visualize()
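entity_density summarizes how densely records are packed with entities. This guide does not spell out Rubrix's exact definition, so the sketch below assumes one: density is the share of a record's tokens that fall inside an entity mention. Spans are given as hypothetical (start, end, label) token offsets, with end exclusive.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def entity_density(num_tokens: int, spans: List[Span]) -> float:
    """Fraction of a record's tokens covered by entity mentions.

    Assumes spans do not overlap, which holds for flat NER annotations.
    """
    if num_tokens == 0:
        return 0.0
    covered = sum(end - start for start, end, _ in spans)
    return covered / num_tokens


print(entity_density(10, [(0, 2, "person"), (5, 6, "location")]))  # 0.3
```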
from rubrix.metrics.token_classification import entity_capitalness
entity_capitalness(name="spacy_sm_wnut17").visualize()
from rubrix.metrics.token_classification import mention_length
mention_length(name="spacy_sm_wnut17").visualize()
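mention_length plots how long entity mentions are. The sketch below assumes that length is counted in tokens and that mentions are given as hypothetical (start, end, label) token spans per record. mention_length_distribution is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def mention_length_distribution(records: List[List[Span]]) -> Dict[int, int]:
    """Count entity mentions by their length in tokens."""
    counts = Counter(end - start for spans in records for start, end, _ in spans)
    return dict(sorted(counts.items()))


toy_spans = [
    [(0, 1, "person"), (3, 5, "location")],
    [(2, 3, "product")],
]
print(mention_length_distribution(toy_spans))  # {1: 2, 2: 1}
```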
2. Rubrix Metrics for NER training sets
3. Rubrix Metrics for text classification
from datasets import load_dataset
from transformers import pipeline
import rubrix as rb
sst2 = load_dataset("glue", "sst2", split="validation")
labels = sst2.features["label"].names
nlp = pipeline("sentiment-analysis")
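# Build one record per SST-2 validation example: the gold label is the annotation,
# the pipeline's (label, score) output is the prediction
records = [
    rb.TextClassificationRecord(
        text=record["sentence"],
        annotation=labels[record["label"]],
        # Lowercase "POSITIVE"/"NEGATIVE" to match the SST-2 label names
        prediction=[(pred["label"].lower(), pred["score"]) for pred in nlp(record["sentence"])],
    )
    for record in sst2
]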
rb.log(records, name="sst2")
from rubrix.metrics.text_classification import f1
f1(name="sst2").visualize()
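f1 reports precision, recall and F1 for each label. The sketch below illustrates the per-label computation in plain Python, not Rubrix's implementation. It assumes each record has one gold label and is scored by a single predicted label. per_label_scores is a hypothetical helper.

```python
from typing import Dict, List


def per_label_scores(gold: List[str], pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F1 for each label (one gold and one predicted label per record)."""
    scores: Dict[str, Dict[str, float]] = {}
    pairs = list(zip(gold, pred))
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in pairs)  # correctly predicted as label
        fp = sum(g != label and p == label for g, p in pairs)  # wrongly predicted as label
        fn = sum(g == label and p != label for g, p in pairs)  # label missed by the model
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


gold = ["positive", "negative", "negative", "positive"]
pred = ["positive", "positive", "negative", "positive"]
for label, s in per_label_scores(gold, pred).items():
    print(label, {k: round(v, 3) for k, v in s.items()})
```

A single misclassified negative example lowers the precision of "positive" and the recall of "negative" at the same time, which is why per-label scores are more informative than one aggregate number.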
# Compute F1 only on records that contain a negation; on this subset,
# precision for the negative label and recall for the positive label go down
f1(name="sst2", query="n't OR not").visualize()
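The query argument restricts the metric to the records that match a search query, so you can compare a slice of the data against the whole dataset. The sketch below mimics that slice analysis locally. It uses toy data and a regular expression as a stand-in for "n't OR not", and it reports accuracy instead of F1 to stay short. The regex only approximates how the Rubrix server matches the query. accuracy and negation_subset are hypothetical helpers.

```python
import re
from typing import List, Tuple

# Stand-in for the "n't OR not" query: matches "n't" (also as a separate
# token, as in SST-2's "does n't") or the whole word "not".
NEGATION = re.compile(r"n't\b|\bnot\b", re.IGNORECASE)

Record = Tuple[str, str, str]  # (text, gold_label, predicted_label)


def accuracy(records: List[Record]) -> float:
    """Share of records whose predicted label equals the gold label."""
    if not records:
        return 0.0
    return sum(gold == pred for _, gold, pred in records) / len(records)


def negation_subset(records: List[Record]) -> List[Record]:
    """Keep only the records whose text contains a negation."""
    return [r for r in records if NEGATION.search(r[0])]


toy_records = [
    ("it 's not good", "negative", "positive"),
    ("a great film", "positive", "positive"),
    ("i did n't hate it", "positive", "negative"),
    ("dull and slow", "negative", "negative"),
]
print(accuracy(toy_records), accuracy(negation_subset(toy_records)))  # 0.5 0.0
```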