Metrics (Experimental)¶
Here we describe the available metrics in Rubrix:
Text classification: Metrics for text classification
Token classification: Metrics for token classification
Text classification¶
- rubrix.metrics.text_classification.metrics.f1(name)¶
Computes the single label f1 metric for a dataset
- Parameters
name (str) – The dataset name.
- Returns
The f1 metric summary
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.text_classification import f1
>>> summary = f1(name="example-dataset")
>>> summary.visualize() # will plot a bar chart with results
>>> summary.data # returns the raw result data
- rubrix.metrics.text_classification.metrics.f1_multilabel(name)¶
Computes the multi-label f1 metric for a dataset
- Parameters
name (str) – The dataset name.
- Returns
The f1 metric summary
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.text_classification import f1_multilabel
>>> summary = f1_multilabel(name="example-dataset")
>>> summary.visualize() # will plot a bar chart with results
>>> summary.data # returns the raw result data
Token classification¶
- rubrix.metrics.token_classification.metrics.entity_capitalness(name)¶
Computes the entity capitalness. The entity capitalness splits the entity mention shape into 4 groups:
UPPER
: All characters in the entity mention are upper case
LOWER
: All characters in the entity mention are lower case
FIRST
: The mention is capitalized
MIDDLE
: Some character in the mention between the first and last is capitalized
- Parameters
name (str) – The dataset name.
- Returns
The summary entity capitalness distribution
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.token_classification import entity_capitalness
>>> summary = entity_capitalness(name="example-dataset")
>>> summary.visualize()
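The four shape groups above can be sketched with plain string checks. This is a hypothetical helper for illustration only (not part of the Rubrix API), and it assumes a non-empty mention string:

```python
def capitalness(mention: str) -> str:
    # Classify a non-empty entity mention into one of the four shape groups
    if mention.isupper():
        return "UPPER"   # e.g. "NASA"
    if mention.islower():
        return "LOWER"   # e.g. "euro"
    if mention[0].isupper() and mention[1:].islower():
        return "FIRST"   # e.g. "Peter"
    # Some character between the first and last is capitalized
    return "MIDDLE"      # e.g. "iPhone"
```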
- rubrix.metrics.token_classification.metrics.entity_consistency(name, mentions=10, threshold=2)¶
Computes the consistency for top entity mentions in the dataset.
Entity consistency defines the label variability for a given mention. For example, the mention "first", identified across the dataset as Cardinal, Person, and Time, is less consistent than the mention "Peter", identified only as Person.
- Parameters
name (str) – The dataset name.
mentions (int) – The number of top mentions to retrieve
threshold (int) – The entity variability threshold (must be greater than or equal to 2)
- Returns
The summary entity consistency distribution
Examples
>>> from rubrix.metrics.token_classification import entity_consistency
>>> summary = entity_consistency(name="example-dataset")
>>> summary.visualize()
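The idea of label variability per mention can be illustrated by counting the distinct labels each mention receives across a dataset. This is a simplified sketch with made-up annotation pairs, not the library's implementation:

```python
from collections import defaultdict

# Hypothetical (mention, label) pairs as they might appear across a dataset
annotations = [
    ("first", "CARDINAL"), ("first", "PERSON"), ("first", "TIME"),
    ("Peter", "PERSON"), ("Peter", "PERSON"),
]

# Collect the distinct labels assigned to each mention
labels_per_mention = defaultdict(set)
for mention, label in annotations:
    labels_per_mention[mention].add(label)

# "first" receives 3 distinct labels (less consistent); "Peter" only 1
variability = {mention: len(labels) for mention, labels in labels_per_mention.items()}
```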
- rubrix.metrics.token_classification.metrics.entity_density(name, interval=0.005)¶
Computes the entity density distribution. The entity density is calculated at the record level for each mention as
mention_length/tokens_length
- Parameters
name (str) – The dataset name.
interval (float) – The interval for histogram. The entity density is defined in the range 0-1
- Returns
The summary entity density distribution
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.token_classification import entity_density
>>> summary = entity_density(name="example-dataset")
>>> summary.visualize()
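As a worked illustration of the formula, a 2-token mention in a 9-token record has a density of 2/9 ≈ 0.22. A minimal sketch with a made-up record (the token lengths are assumptions for the example):

```python
# A 9-token record and its entity mentions as (text, length in tokens)
tokens = ["Apple", "was", "founded", "by", "Steve", "Jobs", "in", "1976", "."]
mentions = [("Apple", 1), ("Steve Jobs", 2), ("1976", 1)]

# mention_length / tokens_length for each mention in the record
densities = [length / len(tokens) for _, length in mentions]
```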
- rubrix.metrics.token_classification.metrics.entity_labels(name, labels=50)¶
Computes the entity labels distribution
- Parameters
name (str) – The dataset name.
labels (int) – The number of top entities to retrieve. Lower numbers perform better.
- Returns
The summary for entity tags distribution
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.token_classification import entity_labels
>>> summary = entity_labels(name="example-dataset", labels=10)
>>> summary.visualize() # will plot a bar chart with results
>>> summary.data # the top-10 entity tags
- rubrix.metrics.token_classification.metrics.mention_length(name, interval=1)¶
Computes the mention length distribution (in number of tokens)
- Parameters
name (str) – The dataset name.
interval (int) – The bin size for the result histogram
- Returns
The summary for mention token distribution
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.token_classification import mention_length
>>> summary = mention_length(name="example-dataset", interval=2)
>>> summary.visualize() # will plot a histogram chart with results
>>> summary.data # the raw histogram data with bins of size 2
- rubrix.metrics.token_classification.metrics.tokens_length(name, interval=1)¶
Computes the tokens length distribution
- Parameters
name (str) – The dataset name.
interval (int) – The bin size for the result histogram
- Returns
The summary for token distribution
- Return type
rubrix.metrics.models.MetricSummary
Examples
>>> from rubrix.metrics.token_classification import tokens_length
>>> summary = tokens_length(name="example-dataset", interval=5)
>>> summary.visualize() # will plot a histogram with results
>>> summary.data # the raw histogram data with bins of size 5