SeqIO

Task-based datasets, preprocessing, and evaluation for sequence models


SeqIO is a library for processing sequential data to be fed into downstream sequence models. It uses tf.data.Dataset to create scalable data pipelines but requires minimal use of TensorFlow. In particular, the returned dataset can be converted to a NumPy iterator with one line of code, which makes it fully compatible with other frameworks such as JAX or PyTorch.
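As a minimal sketch of that conversion, the snippet below uses the `as_numpy_iterator()` method of `tf.data.Dataset`. The helper name `take_numpy_examples`, the `demo` function, and the toy data are hypothetical and only stand in for the dataset a SeqIO pipeline would return; they are not part of SeqIO.

```python
import itertools


def take_numpy_examples(dataset, n):
    """Return the first `n` examples of `dataset` as NumPy-valued Python objects.

    `dataset` is expected to be a tf.data.Dataset (or anything exposing the same
    `as_numpy_iterator()` method).
    """
    # The "one line": as_numpy_iterator() yields NumPy arrays instead of tf.Tensors.
    numpy_iter = dataset.as_numpy_iterator()
    return list(itertools.islice(numpy_iter, n))


def demo():
    """Build a toy dataset and print its examples; requires TensorFlow."""
    import tensorflow as tf  # imported here so the helper above stays framework-free

    # Hypothetical stand-in for the dataset a SeqIO pipeline would return.
    ds = tf.data.Dataset.from_tensor_slices(
        {"inputs": [[1, 2, 3], [4, 5, 6]], "targets": [[7, 8], [9, 10]]}
    )
    for example in take_numpy_examples(ds, 2):
        # Each value is a numpy.ndarray, which can be handed to
        # jnp.asarray or torch.from_numpy.
        print(example["inputs"], example["targets"])
```

With TensorFlow installed, calling `demo()` iterates over the toy examples as dictionaries of NumPy arrays. The helper itself never touches TensorFlow tensors, which is the property that lets a JAX or PyTorch training loop consume the data.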

Installation

pip install seqio

Quick Start

Read the SeqIO Guide for a quick introduction to the Task and Mixture APIs, key underlying components such as the data source, preprocessors, feature converters and metrics, and the evaluation API.


Tutorials

Browse through a series of self-contained Colab notebooks that illustrate various aspects of defining and running data pipelines using SeqIO Tasks and Mixtures, and evaluating models using the SeqIO evaluation library.


API Reference

Consult the API reference to understand the codebase, write custom components, and contribute to the project.


Citing SeqIO

Please use the following BibTeX entry to cite SeqIO.

@article{roberts2022t5x,
url = {https://arxiv.org/abs/2203.17189},
author = {Roberts, Adam and Chung, Hyung Won and Levskaya, Anselm and Mishra, Gaurav and Bradbury, James and Andor, Daniel and Narang, Sharan and Lester, Brian and Gaffney, Colin and Mohiuddin, Afroz and Hawthorne, Curtis and Lewkowycz, Aitor and Salcianu, Alex and van Zee, Marc and Austin, Jacob and Goodman, Sebastian and Soares, Livio Baldini and Hu, Haitang and Tsvyashchenko, Sasha and Chowdhery, Aakanksha and Bastings, Jasmijn and Bulian, Jannis and Garcia, Xavier and Ni, Jianmo and Chen, Andrew and Kenealy, Kathleen and Clark, Jonathan H. and Lee, Stephan and Garrette, Dan and Lee-Thorp, James and Raffel, Colin and Shazeer, Noam and Ritter, Marvin and Bosma, Maarten and Passos, Alexandre and Maitin-Shepard, Jeremy and Fiedel, Noah and Omernick, Mark and Saeta, Brennan and Sepassi, Ryan and Spiridonov, Alexander and Newlan, Joshua and Gesmundo, Andrea},
title = {Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$},
journal = {arXiv preprint arXiv:2203.17189},
year = {2022},
}