Dataset¶
The Dataset page is the main page of the Rubrix web app. From here you can access most of Rubrix’s features, like exploring and annotating the records of your dataset.
The page is composed of four major components: the search bar, the filters, the record cards, and the sidebar.
Search bar¶
Rubrix’s search bar is a powerful tool that allows you to thoroughly explore your dataset, and quickly navigate through the records. You can either fuzzy search the contents of your records, or use the more advanced query string syntax of Elasticsearch to take full advantage of Rubrix’s data models. You can find more information about how to use the search bar in our detailed search guide.
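As a rough illustration of the query string syntax, the sketch below assembles two Elasticsearch-style queries as plain Python strings. The helper `build_query` is hypothetical and not part of Rubrix, and the field names (`predicted_as`, `status`, `score`) are assumptions made for this example; the search guide lists the fields your dataset actually exposes.

```python
def build_query(clauses: dict) -> str:
    """Join field:value clauses with AND (hypothetical helper, not part of Rubrix)."""
    return " AND ".join(f"{field}:{value}" for field, value in clauses.items())


# Field names are assumptions for illustration only.
validated_spam = build_query({"predicted_as": "spam", "status": "Validated"})
uncertain = build_query({"score": "[0.4 TO 0.6]"})  # range syntax of Elasticsearch

print(validated_spam)
print(uncertain)
```

A string produced this way is meant to be typed or pasted into the search bar; a plain word or phrase without field names is treated as a search over the record contents.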
Filters¶
The filters provide a quick and intuitive way to filter and sort your records by various parameters. You can find more information about how to use the filters in our detailed filter guide.
Note
Not all filters are available for all tasks.
Predictions filter¶
This filter allows you to filter records with respect to their predictions:
Predicted as: filter records by their predicted labels
Predicted ok: filter records whose predictions do, or do not, match the annotations
Score: filter records with respect to the score of their prediction
Predicted by: filter records by the prediction agent
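To make the semantics of these options concrete, here is a minimal sketch that applies the same kinds of conditions to a small in-memory list. The record layout (plain dictionaries with `predicted_as`, `score`, and `annotated_as` keys) is hypothetical and is not Rubrix's data model. The sketch also assumes that records without an annotation are neither "ok" nor "not ok", which the page does not specify.

```python
# Hypothetical records; the keys are assumptions for illustration only.
records = [
    {"text": "great movie", "predicted_as": "positive", "score": 0.95, "annotated_as": "positive"},
    {"text": "not bad", "predicted_as": "negative", "score": 0.55, "annotated_as": "positive"},
    {"text": "awful", "predicted_as": "negative", "score": 0.90, "annotated_as": None},
]


def predicted_ok(record):
    """Return True/False when an annotation exists, None when it does not."""
    if record["annotated_as"] is None:
        return None
    return record["predicted_as"] == record["annotated_as"]


# "Predicted as": keep records with a given predicted label.
negatives = [r["text"] for r in records if r["predicted_as"] == "negative"]
# "Predicted ok": keep records whose prediction does not match the annotation.
mismatches = [r["text"] for r in records if predicted_ok(r) is False]
# "Score": keep records whose prediction score falls below a threshold.
low_score = [r["text"] for r in records if r["score"] < 0.6]
```

Combining the mismatch and low-score conditions is a common way to surface records that are worth reviewing first.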
Annotations filter¶
This filter allows you to filter records with respect to their annotations:
Annotated as: filter records with respect to their annotated labels
Annotated by: filter records by the annotation agent
Status filter¶
This filter allows you to filter records with respect to their status:
Default: records without any annotation or edit
Validated: records with validated annotations
Edited: records with annotations that have not yet been validated
Metadata filter¶
This filter allows you to filter records with respect to their metadata.
Hint
Nested metadata will be flattened and the keys will be joined by a dot.
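The flattening described in this hint can be sketched as follows. The function `flatten_metadata` is a hypothetical helper written for this example, not Rubrix's own implementation, and it assumes that only nested dictionaries are flattened.

```python
def flatten_metadata(metadata: dict, prefix: str = "") -> dict:
    """Flatten nested dictionaries, joining the keys of each level with a dot."""
    flat = {}
    for key, value in metadata.items():
        full_key = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the dotted prefix along.
            flat.update(flatten_metadata(value, full_key))
        else:
            flat[full_key] = value
    return flat


nested = {"source": {"name": "news", "year": 2021}, "split": "train"}
flat = flatten_metadata(nested)
print(flat)  # {'source.name': 'news', 'source.year': 2021, 'split': 'train'}
```

With metadata like this, the filter would offer keys such as `source.name` and `split` rather than the nested `source` object.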
Sort records¶
With this component you can sort the records by various parameters, such as the predictions, annotations or their metadata.
Record cards¶
The record cards are at the heart of the Dataset page and contain your data. There are three different flavors of record cards depending on the task of your dataset. All of them share the same basic structure showing the input text and a vertical ellipsis (or “kebab menu”) on the top right that lets you access the record’s metadata. Predictions and annotations are shown depending on the current mode and task of the dataset.
Check out our exploration and annotation guides to see how the record cards work in the different modes.
Text classification¶
In this task, the predictions are given as tags below the input text. Each tag contains the label and a percentage score. In Explore mode, annotations are shown as tags on the right, together with a symbol indicating whether the predictions match the annotations. In Annotate mode, predictions and annotations share the same labels (annotation labels are darker).
A text classification dataset can support either single-label or multi-label classification. In other words, each record is annotated with either a single label or several.
Token classification¶
In this task, predictions and annotations are given as highlights in the input text. Work in progress …
Text2Text¶
In this task, the predictions and the annotation are given in a text field below the input text. You can switch between them via the “View annotation”/“View predictions” buttons. For each prediction, you can find the associated score in the lower left corner. If you have multiple predictions, you can toggle between them using the arrows at the bottom of the record card.
Sidebar¶
The sidebar is divided into three sections.
Modes¶
This section of the sidebar lets you switch between the different Rubrix modes that are covered extensively in their respective guides:
Explore: this mode is for exploring your dataset and gaining valuable insights
Annotate: this mode lets you conveniently annotate your data
Define rules: this mode helps you to define rules to automatically label your data
Note
Not all modes are available for all tasks.
Metrics¶
In this section you can find several “metrics” that provide valuable insights into your dataset or support you while annotating your records. They are grouped into two submenus:
Progress: see metrics of your annotation process, like its progress and the label distribution
Stats: check the keywords of your dataset and the error distribution of the predictions
You can find more information about each metric in our dedicated metrics guide.
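As an informal illustration of what the Progress submenu summarizes, the sketch below computes an annotation progress ratio and a label distribution over a small in-memory list. The record layout is hypothetical, and defining progress as the share of validated records is an assumption of this example; the metrics guide describes how Rubrix actually computes these values.

```python
from collections import Counter

# Hypothetical records; the keys are assumptions for illustration only.
records = [
    {"status": "Validated", "annotated_as": "positive"},
    {"status": "Validated", "annotated_as": "negative"},
    {"status": "Edited", "annotated_as": "positive"},
    {"status": "Default", "annotated_as": None},
]

# Assumed definition: progress is the fraction of records that are validated.
validated = [r for r in records if r["status"] == "Validated"]
progress = len(validated) / len(records)

# Label distribution over the validated annotations.
label_distribution = Counter(r["annotated_as"] for r in validated)

print(f"Progress: {progress:.0%}")
print(dict(label_distribution))
```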
Refresh¶
This button refreshes the list of record cards according to the active filters. For example, if you are annotating and use the Status filter to filter out annotated records, you can press the Refresh button to hide the most recently annotated records.