This section gives you ideas about the kinds of tasks you can use Rubrix for. It also describes some of the tasks on our roadmap. If you want a task that you don't see here, or you want to contribute one, file an issue or use the Discussion forum on Rubrix's GitHub page.
According to the amazing NLP Progress resource by Seb Ruder:
Text classification is the task of assigning a sentence or document an appropriate category. The categories depend on the chosen dataset and can range from topics.
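As a minimal sketch of the output side of this task, assume a model that produces one score per candidate category (the category names and scores below are made up for illustration). Single-label classification then assigns the document the highest-scoring category:

```python
from typing import Dict


def predict_category(scores: Dict[str, float]) -> str:
    """Return the category with the highest score (single-label classification)."""
    if not scores:
        raise ValueError("scores must contain at least one category")
    # max() over the dict keys, ranked by each key's score
    return max(scores, key=scores.get)


# Hypothetical scores a classifier might assign to one document
scores = {"sports": 0.12, "politics": 0.81, "technology": 0.07}
print(predict_category(scores))  # politics
```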
Rubrix is flexible with input and output shapes, which means you can model many related text classification tasks as well.
The best-known token classification task is probably Named Entity Recognition:
Named entity recognition (NER) is the task of tagging entities in text with their corresponding type. Approaches typically use BIO notation, which differentiates the beginning (B) and the inside (I) of entities. O is used for non-entity tokens.
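To illustrate BIO notation, here is a minimal sketch that turns entity spans into one tag per token. It assumes entities are given as hypothetical `(start_token, end_token_exclusive, label)` tuples that do not overlap; the sentence and labels are invented for the example:

```python
from typing import List, Tuple

# Assumed span format for illustration: (start_token, end_token_exclusive, label)
Entity = Tuple[int, int, str]


def to_bio(tokens: List[str], entities: List[Entity]) -> List[str]:
    """Convert non-overlapping token-level entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)  # non-entity tokens default to O
    for start, end, label in entities:
        if not (0 <= start < end <= len(tokens)):
            raise ValueError(f"invalid span ({start}, {end})")
        tags[start] = f"B-{label}"  # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"  # remaining tokens inside the entity
    return tags


tokens = ["Ada", "Lovelace", "visited", "New", "York", "."]
entities = [(0, 2, "PER"), (3, 5, "LOC")]
print(list(zip(tokens, to_bio(tokens, entities))))
```

Each token gets exactly one tag, so the output has the same length as the input.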
Rubrix is flexible with input and output shapes, which means you can also model related tasks such as:
Named entity recognition
Part of speech tagging
The oldest and most typical text-to-text task is probably Machine Translation:
Machine translation is the task of translating a sentence in a source language to a different target language.
What text-to-text tasks have in common is that the model receives a sequence of tokens and outputs another sequence of tokens. This covers a variety of tasks, such as:
natural language generation
paraphrase generation, etc.
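To make the shape of text-to-text tasks concrete, here is a deliberately toy sketch built on a hypothetical phrase table applied with greedy longest-match lookup. It is not how machine translation models work; it only shows that both input and output are token sequences and that, unlike token classification, their lengths need not match:

```python
from typing import Dict, List, Tuple

# Hypothetical toy phrase table; real MT models learn this mapping from data.
PHRASES: Dict[Tuple[str, ...], List[str]] = {
    ("good", "morning"): ["buenos", "días"],
    ("thank", "you"): ["gracias"],
    ("friend",): ["amigo"],
}


def toy_translate(tokens: List[str]) -> List[str]:
    """Greedy longest-match lookup: a token sequence in, a token sequence out."""
    max_len = max(len(key) for key in PHRASES)
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASES:
                out.extend(PHRASES[key])
                i += n
                break
        else:
            out.append(tokens[i])  # unknown token: copy it through unchanged
            i += 1
    return out


print(toy_translate(["thank", "you", "friend"]))  # ['gracias', 'amigo']
```

Here three input tokens become two output tokens.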