Python client API¶

This section describes the Python client API of Rubrix, which is divided into two basic modules:

Methods: These methods make up the interface to interact with Rubrix’s REST API.
Models: You need to wrap your data in these data models for Rubrix to understand it.

Methods¶

This module contains the interface to access Rubrix’s REST API.

rubrix.delete(name)¶

Delete a dataset.

Parameters: name (str) – The dataset name.
Return type: None

Examples

>>> rb.delete(name="example-dataset")

rubrix.init(api_url=None, api_key=None, timeout=60)¶

Initialize the Python client.

Passing an api_url disables reading the environment variables, which would otherwise provide the default values.

Parameters

api_url (Optional[str]) – Address of the REST API. If None (default) and the env variable RUBRIX_API_URL is not set, it will default to http://localhost:6900.
api_key (Optional[str]) – Authentication key for the REST API. If None (default) and the env variable RUBRIX_API_KEY is not set, it will default to an unauthenticated connection.
timeout (int) – Number of seconds to wait before the connection times out. Default: 60.

Return type

None

Examples

>>> rb.init(api_url="http://localhost:9090", api_key="4AkeAPIk3Y")
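The fallback order described above can be sketched with the standard library alone. This is a minimal illustration, not the client's actual implementation: the helper resolve_connection and its environ argument are hypothetical, and it assumes a literal reading of the note above, namely that an explicit api_url means no environment variable is consulted at all.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_API_URL = "http://localhost:6900"  # default stated in the parameter list


def resolve_connection(
    api_url: Optional[str] = None,
    api_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,  # hypothetical hook for testing
) -> Tuple[str, Optional[str]]:
    """Return the (api_url, api_key) pair a client could end up using."""
    env = os.environ if environ is None else environ
    if api_url is not None:
        # Assumption: an explicit URL disables environment variable reading.
        return api_url, api_key
    url = env.get("RUBRIX_API_URL", DEFAULT_API_URL)
    key = api_key if api_key is not None else env.get("RUBRIX_API_KEY")
    return url, key


# No arguments and no environment variables: local default, unauthenticated.
print(resolve_connection(environ={}))
```

Under these assumptions, a key of None stands for an unauthenticated connection.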

rubrix.load(name, snapshot=None, ids=None, limit=None)¶

Load dataset/snapshot data into a pandas DataFrame.

Parameters

name (str) – The dataset name.
snapshot (Optional[str]) – The dataset snapshot id.
ids (Optional[List[Union[str, int]]]) – If provided, load dataset records with given ids. Ignored for snapshots.
limit (Optional[int]) – The number of records to retrieve.

Returns

The dataset as a pandas DataFrame.

Return type

pandas.core.frame.DataFrame

Examples

>>> dataframe = rb.load(name="example-dataset")

rubrix.log(records, name, tags=None, metadata=None, chunk_size=500)¶

Log records to Rubrix.

Parameters

records (Union[rubrix.client.models.TextClassificationRecord, rubrix.client.models.TokenClassificationRecord, Iterable[Union[rubrix.client.models.TextClassificationRecord, rubrix.client.models.TokenClassificationRecord]]]) – The record or an iterable of records.
name (str) – The dataset name.
tags (Optional[Dict[str, str]]) – A dictionary of tags related to the dataset.
metadata (Optional[Dict[str, Any]]) – A dictionary of extra info for the dataset.
chunk_size (int) – The number of records in each bulk chunk sent to the REST API.

Returns

Summary of the response from the REST API.

Return type

rubrix.client.models.BulkResponse

Examples

>>> record = rb.TextClassificationRecord(
...     inputs={"text": "my first rubrix example"},
...     prediction=[('spam', 0.8), ('ham', 0.2)]
... )
>>> response = rb.log(record, name="example-dataset")
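The chunk_size parameter suggests that records are sent in batches rather than all at once. The sketch below shows one way such batching can work on any iterable; iter_chunks is a hypothetical helper written for this illustration, not part of the Rubrix API, and the actual client may split records differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(records: Iterable[T], chunk_size: int = 500) -> Iterator[List[T]]:
    """Yield consecutive lists of at most chunk_size items from records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # the iterable is exhausted
        yield chunk


# 1200 stand-in records split with the documented default chunk size.
sizes = [len(chunk) for chunk in iter_chunks(range(1200), chunk_size=500)]
print(sizes)  # [500, 500, 200]
```

Because the helper consumes its input lazily, it also works with generators that are too large to hold in memory.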

rubrix.snapshots(name)¶

Retrieve dataset snapshots.

Parameters: name (str) – The dataset name whose snapshots will be retrieved.
Returns: A list of snapshots.
Return type: List[rubrix.client.models.DatasetSnapshot]

Examples

>>> snapshot_list = rb.snapshots(name="example-dataset")

Models¶

This module contains the data models for the interface.

class rubrix.client.models.BulkResponse(*, dataset, processed, failed=0)¶

Data info for bulk results.

Parameters

dataset (str) – The dataset name.
processed (int) – Number of records in bulk.
failed (Optional[int]) – Number of failed records.

Return type

None

class rubrix.client.models.DatasetSnapshot(*, id, task, creation_date)¶

The dataset snapshot info.

Parameters

id (str) – Id of the snapshot.
task (str) – Task of the snapshot.
creation_date (datetime.datetime) – Creation date of the snapshot.

Return type

None
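Since each snapshot carries a creation_date, a common need is to pick the most recent one. The following sketch uses a stand-in type that mirrors the three documented fields, so it runs without a Rubrix server; SnapshotInfo, latest_snapshot, and the sample ids and task names are all hypothetical.

```python
from datetime import datetime
from typing import List, NamedTuple


class SnapshotInfo(NamedTuple):
    """Stand-in with the documented DatasetSnapshot fields (hypothetical)."""

    id: str
    task: str
    creation_date: datetime


def latest_snapshot(snapshots: List[SnapshotInfo]) -> SnapshotInfo:
    """Return the snapshot with the most recent creation_date."""
    if not snapshots:
        raise ValueError("the dataset has no snapshots")
    return max(snapshots, key=lambda snapshot: snapshot.creation_date)


# Sample data invented for this illustration.
snapshots = [
    SnapshotInfo("snap-1", "TextClassification", datetime(2021, 3, 1, 9, 0)),
    SnapshotInfo("snap-2", "TextClassification", datetime(2021, 4, 15, 17, 30)),
]
newest = latest_snapshot(snapshots)
print(newest.id)  # snap-2
```

The id of the selected snapshot could then be passed as the snapshot argument of rubrix.load.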

class rubrix.client.models.TextClassificationRecord(*args, inputs, prediction=None, annotation=None, prediction_agent=None, annotation_agent=None, multi_label=False, explanation=None, id=None, metadata=None, status=None, event_timestamp=None)¶

Record for text classification.

Parameters

inputs (Union[str, List[str], Dict[str, Union[str, List[str]]]]) – The inputs of the record.
prediction (Optional[List[Tuple[str, float]]]) – A list of tuples containing the predictions for the record. The first entry of the tuple is the predicted label, the second entry is its corresponding score.
annotation (Optional[Union[str, List[str]]]) – A string or a list of strings (multilabel) corresponding to the annotation (gold label) for the record.
prediction_agent (Optional[str]) – Name of the prediction agent.
annotation_agent (Optional[str]) – Name of the annotation agent.
multi_label (bool) – Whether the prediction/annotation belongs to a multi-label classification task. Defaults to False.
explanation (Optional[Dict[str, List[rubrix.client.models.TokenAttributions]]]) – A dictionary containing the attributions of each token to the prediction. The keys map the input of the record (see inputs) to the TokenAttributions.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Dict[str, Any]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp of the record.

Return type

None

classmethod input_as_dict(inputs)¶

Preprocesses the record inputs and wraps them in a dictionary if needed.
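Two behaviors documented for this record are easy to picture in code: wrapping bare inputs in a dictionary, and deriving the default status from the annotation. The sketch below is a stand-alone approximation. The helpers wrap_inputs and default_status are hypothetical, and the dictionary key "text" used for bare inputs is an assumption, not something this reference states.

```python
from typing import Dict, List, Optional, Union

InputValue = Union[str, List[str]]


def wrap_inputs(
    inputs: Union[InputValue, Dict[str, InputValue]],
    default_key: str = "text",  # assumption: key used for bare inputs
) -> Dict[str, InputValue]:
    """Return dictionaries unchanged; wrap anything else under default_key."""
    if isinstance(inputs, dict):
        return inputs
    return {default_key: inputs}


def default_status(
    annotation: Optional[Union[str, List[str]]],
    status: Optional[str] = None,
) -> str:
    """Apply the documented rule: 'Validated' with an annotation, else 'Default'."""
    if status is not None:
        return status  # an explicit status always wins
    return "Validated" if annotation is not None else "Default"


print(wrap_inputs("my first rubrix example"))
print(default_status(annotation="spam"))
```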

class rubrix.client.models.TokenAttributions(*, token, attributions=None)¶

Attribution of the token to the predicted label.

In the Rubrix app this is only supported for TextClassificationRecord and the multi_label=False case.

Parameters

token (str) – The input token.
attributions (Dict[str, float]) – A dictionary containing label-attribution pairs.

Return type

None

class rubrix.client.models.TokenClassificationRecord(*args, text, tokens, prediction=None, annotation=None, prediction_agent=None, annotation_agent=None, id=None, metadata=None, status=None, event_timestamp=None)¶

Record for a token classification task.

Parameters

text (str) – The input of the record.
tokens (List[str]) – The tokenized input of the record. We use this to guide the annotation process and to cross-check the spans of your prediction/annotation.
prediction (Optional[List[Tuple[str, int, int]]]) – A list of tuples containing the predictions for the record. The first entry of the tuple is the name of the predicted entity; the second and third entries are the start and stop character indices of the entity.
annotation (Optional[List[Tuple[str, int, int]]]) – A list of tuples containing the annotations (gold labels) for the record. The first entry of the tuple is the name of the entity; the second and third entries are the start and stop character indices of the entity.
prediction_agent (Optional[str]) – Name of the prediction agent.
annotation_agent (Optional[str]) – Name of the annotation agent.
id (Optional[Union[int, str]]) – The id of the record. By default (None), we will generate a unique ID for you.
metadata (Dict[str, Any]) – Metadata for the record. Defaults to {}.
status (Optional[str]) – The status of the record. Options: ‘Default’, ‘Edited’, ‘Discarded’, ‘Validated’. If an annotation is provided, this defaults to ‘Validated’, otherwise ‘Default’.
event_timestamp (Optional[datetime.datetime]) – The timestamp of the record.

Return type

None
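To make the relation between text, tokens, and character spans concrete, here is a minimal sketch of a span cross-check. It is not the validation Rubrix performs: token_offsets and misaligned_spans are hypothetical helpers, and the sketch assumes that the stop index is exclusive (as in Python slicing) and that tokens appear in the text in order.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (entity name, start char index, stop char index)


def token_offsets(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Locate each token in text, scanning left to right."""
    offsets = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


def misaligned_spans(text: str, tokens: List[str], spans: List[Span]) -> List[Span]:
    """Return the spans that do not start and stop on token boundaries."""
    offsets = token_offsets(text, tokens)
    starts = {start for start, _ in offsets}
    ends = {end for _, end in offsets}
    return [
        span
        for span in spans
        if span[1] >= span[2] or span[1] not in starts or span[2] not in ends
    ]


text = "Rubrix was built in Madrid"
tokens = text.split()
spans = [("ORG", 0, 6), ("LOC", 20, 26), ("LOC", 21, 26)]
print(misaligned_spans(text, tokens, spans))  # [('LOC', 21, 26)]
```

The third span starts in the middle of the token "Madrid", so it is the only one reported.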

rubrix.client.models.limit_metadata_values(metadata)¶

Checks the length of the metadata values and truncates values that are too large.

Parameters: metadata (Dict[str, Any]) – The metadata dictionary to check.
Return type: Dict[str, Any]
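The kind of truncation described here can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: truncate_metadata_values is a hypothetical helper, the limit of 128 characters is an invented placeholder, and only string values are shortened.

```python
from typing import Any, Dict

MAX_VALUE_LENGTH = 128  # hypothetical limit; the real threshold is not documented here


def truncate_metadata_values(
    metadata: Dict[str, Any], max_length: int = MAX_VALUE_LENGTH
) -> Dict[str, Any]:
    """Return a copy of metadata with overlong string values cut to max_length."""
    truncated = {}
    for key, value in metadata.items():
        if isinstance(value, str) and len(value) > max_length:
            value = value[:max_length]
        truncated[key] = value  # non-string values pass through unchanged
    return truncated


metadata = {"source": "x" * 200, "page": 3}
result = truncate_metadata_values(metadata)
print(len(result["source"]), result["page"])  # 128 3
```

Returning a new dictionary leaves the caller's metadata untouched, which keeps the helper free of side effects.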