Introduction to ragged tensors

We review TensorFlow’s concept of ragged tensors, which were introduced at the end of 2018. We explain their basic structure and why they are useful.

Problem statement


With the standardization of traditional machine learning problems, many models are very easy to implement: read a table of features from a database, use pandas and numpy for preprocessing, build a model with one of the well-known libraries, and start the training with a single call. In many cases, that’s it! Done!

But wait: what if the data does not come in tabular form and is irregular by nature? What if there are instances with varying dimensions? Consider a scenario for time series classification and suppose we have a dataset consisting of four different short time series:

As you can see, the series differ in both the number and time of the measurements. Since machine learning models typically require a fixed input size, it’s a bit more complicated to fit such data into our models.

There are several ways to handle this type of input. For example, we could interpolate the series and take virtual measurements at the same timestamps for each series:

Here we take the values from timestamps 0, 2, 4, 6, 8, and 10 such that every series consists of 6 values. However, at this stage we already have to choose hyperparameters such as the type of interpolation and the number of values. Moreover, we cannot rely on the accuracy of the interpolation, especially for extrapolated values and values within large gaps between successive measurements (see the orange and green series at time 10).
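As a rough sketch of the interpolation step described above, the following resamples one short series, measured at times 0, 5 and 8, onto the common grid 0, 2, ..., 10. Linear interpolation via NumPy's np.interp and all variable names are assumptions made for illustration; the post does not state which interpolation method was used.

```python
import numpy as np

# One irregular series: measurement times and the measured values.
times = np.array([0.0, 5.0, 8.0])
values = np.array([15.0, 11.0, 7.0])

# Common grid of "virtual" measurement times shared by all series.
grid = np.arange(0, 11, 2, dtype=float)  # 0, 2, 4, 6, 8, 10

# Linear interpolation between neighbouring measurements. Outside the
# measured range [0, 8], np.interp repeats the edge value instead of
# extrapolating, which is itself a modelling choice.
resampled = np.interp(grid, times, values)

for t, v in zip(grid, resampled):
    print(f"t={t:4.1f}  value={v:6.3f}")
```

The entry at time 10 is simply the last measurement carried forward, which shows why values outside the measured range deserve little trust.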

From the technical side, when we feed the data into a TensorFlow Keras model and do not want to use interpolation techniques, a common practice is to pad the series, e.g. with zeros at the end. This is necessary because TensorFlow groups data into batches, and all elements of a batch must have the same shape in every dimension. A batch of the 4 interpolated series above would have the shape (4, 6), with 4 being the number of series (= batch dimension) and 6 being the number of measurements per series.

However, this second dimension arises from artificial data, either interpolated measurements or padding values. To overcome the uncertainty and the overhead of both these techniques, we can use ragged tensors to work with the original data.

Concept of ragged tensors


The concept of ragged tensors is surprisingly simple once the intention behind them is clear. Let’s stick with our above example with 4 time series. As you can see, the minimum number of measurements per series is 3, while the maximum is 5. With padding we would have to fill every series with zeros at the end (or sometimes at the beginning) to achieve a common length of 5.
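For comparison, zero padding at the end can be sketched in a few lines of NumPy. The measurement values are those of the four series used in this post; the mask is an addition for illustration and is not part of the original text.

```python
import numpy as np

# Measurement values of the four series (lengths 5, 3, 4, 4).
series = [
    [3, 1, 8, 0, 9],
    [15, 11, 7],
    [12, 7, 8, 2],
    [9, 0, 13, 4],
]

lengths = np.array([len(s) for s in series])
max_len = int(lengths.max())  # common length after padding

# Allocate a zero-filled (4, max_len) array and copy each series to the
# front, so that the zeros end up at the end of every row.
padded = np.zeros((len(series), max_len), dtype=np.int64)
for i, s in enumerate(series):
    padded[i, :len(s)] = s

# Boolean mask marking the real entries. Without it, a padded 0 cannot be
# told apart from a genuinely measured 0.
mask = np.arange(max_len) < lengths[:, None]

print(padded)
print(mask.sum(axis=1))  # number of real measurements per row
```

The mask hints at the overhead of padding: every downstream operation has to know which entries are real.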

In contrast, a ragged tensor consists of the concatenation of all values from all series together with metadata specifying where to split the concatenation into the individual series. Let’s define our dataframe df and then our ragged tensor rt:

    time  value
0      0      3
1      3      1
2      6      8
3      8      0
4     10      9
5      0     15
6      5     11
7      8      7
8      0     12
9      2      7
10     4      8
11     9      2
12     0      9
13     4      0
14     6     13
15    10      4
import tensorflow as tf
# Boundaries of the four series within the 16 concatenated rows of df
row_splits = [0, 5, 8, 12, 16]
rt = tf.RaggedTensor.from_row_splits(values=df.values, row_splits=row_splits)

<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>

As we can see, the row_splits array defines the individual series by specifying their start row (inclusive) and end row (exclusive).
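The following self-contained sketch rebuilds the example and checks the start/end interpretation of row_splits by slicing the flat values by hand. The way df is constructed here is an assumption, since the post only shows its contents.

```python
import pandas as pd
import tensorflow as tf

# The 16 measurements of all four series, concatenated row by row.
df = pd.DataFrame({
    "time":  [0, 3, 6, 8, 10, 0, 5, 8, 0, 2, 4, 9, 0, 4, 6, 10],
    "value": [3, 1, 8, 0, 9, 15, 11, 7, 12, 7, 8, 2, 9, 0, 13, 4],
})

row_splits = [0, 5, 8, 12, 16]
rt = tf.RaggedTensor.from_row_splits(values=df.values, row_splits=row_splits)

# Consecutive entries of row_splits form [start, end) slices of the rows.
for i, (start, end) in enumerate(zip(row_splits[:-1], row_splits[1:])):
    manual = df.values[start:end]
    # rt[i] is the i-th series as a regular tensor of shape (end - start, 2).
    assert (rt[i].numpy() == manual).all()
    print(f"series {i}: rows {start}..{end - 1}, shape {manual.shape}")

print(rt.shape)  # the second dimension has no fixed size
```

Note that n series need n + 1 split points, because the last entry marks the end of the final series.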

That’s it. This is the really simple structure of ragged tensors. As an alternative to specifying the row_splits, we can also create the same ragged tensor with one of the following methods:

  • value_rowids: for every row in the concatenated series we specify an id number which indexes the individual series:
value_rowids = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
rt_1 = tf.RaggedTensor.from_value_rowids(values=df.values, value_rowids=value_rowids)

<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>
  • row_lengths: we state the length of every individual series:
row_lengths = [5, 3, 4, 4]
rt_2 = tf.RaggedTensor.from_row_lengths(values=df.values, row_lengths=row_lengths)

<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>
  • constant: we can define the ragged tensor as a “constant” by directly specifying a list of arrays. Passing ragged_rank=1 keeps the inner [time, value] dimension uniform, as in the variants above:
rt_3 = tf.ragged.constant([df.loc[0:4, :].values, df.loc[5:7, :].values, df.loc[8:11, :].values, df.loc[12:15, :].values], ragged_rank=1)

<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>

Internally, it does not matter which method we choose to create a ragged tensor; the results are all equivalent. Next we’ll see how to perform mathematical operations on ragged tensors.
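The equivalence can be checked directly. The sketch below builds the tensor from a plain NumPy array (standing in for df.values, an assumption made to keep the example self-contained) and compares the results. It also highlights one caveat of tf.ragged.constant: by default every nesting level is treated as ragged, so ragged_rank=1 is needed to obtain the same layout as the other factory methods.

```python
import numpy as np
import tensorflow as tf

# Flat (16, 2) array of [time, value] rows, standing in for df.values.
values = np.array([
    [0, 3], [3, 1], [6, 8], [8, 0], [10, 9],
    [0, 15], [5, 11], [8, 7],
    [0, 12], [2, 7], [4, 8], [9, 2],
    [0, 9], [4, 0], [6, 13], [10, 4],
])

rt = tf.RaggedTensor.from_row_splits(values, row_splits=[0, 5, 8, 12, 16])
rt_1 = tf.RaggedTensor.from_value_rowids(
    values, value_rowids=[0] * 5 + [1] * 3 + [2] * 4 + [3] * 4)
rt_2 = tf.RaggedTensor.from_row_lengths(values, row_lengths=[5, 3, 4, 4])

# Whichever encoding was used for construction, all three views are available.
for t in (rt, rt_1, rt_2):
    print(t.row_splits.numpy(), t.row_lengths().numpy(), t.value_rowids().numpy())

# Nested Python lists compare element by element.
same = rt.to_list() == rt_1.to_list() == rt_2.to_list()
print(same)

# tf.ragged.constant: compare the default layout with ragged_rank=1.
nested = rt.to_list()
rt_default = tf.ragged.constant(nested)             # inner axis also ragged
rt_3 = tf.ragged.constant(nested, ragged_rank=1)    # inner axis of fixed size 2
print(rt_default.shape, rt_3.shape)
```

The printed values of all variants look identical, but only a tensor with a uniform inner dimension can be passed to operations such as a matrix product on the flat values.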

Working with ragged tensors


TensorFlow provides a very handy function to perform operations on ragged tensors: tf.ragged.map_flat_values(op, *args, **kwargs). It does what the function name says: every ragged tensor in args is substituted by its concatenated (= flat) version, omitting the batch dimension. In our example, this is the same as operating on df.values directly. The only difference is that the output of the operation is again a ragged tensor with the same metadata about where to split. Let’s consider an example where we compute the matrix product of the ragged tensor with a matrix m of shape (2, 5). Each individual series in our ragged tensor has shape (k, 2), where k corresponds to the number of measurements in the given series. We take care to cast to floats first:

m = tf.random.uniform(shape=[2, 5])
print(m.shape)

(2, 5)

rt = tf.cast(rt, tf.float32)
result = tf.ragged.map_flat_values(tf.matmul, rt, m)
print(*(t.shape for t in result), sep='\n')

(5, 5)
(3, 5)
(4, 5)
(4, 5)
Perfect! The resulting ragged tensor has the same row splits as the input, but the inner dimension changed from 2 to 5 because of the matrix multiplication. We could do some more complicated operations, for example if m is not a 2-dimensional matrix but a 3-dimensional tensor:
m = tf.random.uniform(shape=[2, 5, 4])
print(m.shape)

(2, 5, 4)

rt = tf.cast(rt, tf.float32)
result = tf.ragged.map_flat_values(tf.einsum, "bi, ijk -> bjk", rt, m)
print(*(t.shape for t in result), sep='\n')

(5, 5, 4)
(3, 5, 4)
(4, 5, 4)
(4, 5, 4)

As expected, the batch dimension b corresponds to the length of the individual series, while the other dimensions originate from m. By the way, tf.einsum refers to the Einstein summation convention, which is extremely handy when working with higher-dimensional tensors. Read more about it here.
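To make the behaviour of tf.ragged.map_flat_values more concrete, the sketch below reproduces both operations by hand: the op is applied to the flat values and the original row splits are reattached with with_flat_values. The names m2, m3 and by_hand are introduced here for illustration only, and the NumPy array stands in for df.values.

```python
import numpy as np
import tensorflow as tf

# Flat [time, value] rows as floats, standing in for df.values.
values = np.array([
    [0, 3], [3, 1], [6, 8], [8, 0], [10, 9],
    [0, 15], [5, 11], [8, 7],
    [0, 12], [2, 7], [4, 8], [9, 2],
    [0, 9], [4, 0], [6, 13], [10, 4],
], dtype=np.float32)
rt = tf.RaggedTensor.from_row_splits(values, row_splits=[0, 5, 8, 12, 16])

m2 = tf.random.uniform(shape=[2, 5])
m3 = tf.random.uniform(shape=[2, 5, 4])

# Matrix product: operate on the flat (16, 2) values, keep the row splits.
mapped = tf.ragged.map_flat_values(tf.matmul, rt, m2)
by_hand = rt.with_flat_values(tf.matmul(rt.flat_values, m2))
print(np.allclose(mapped.flat_values.numpy(), by_hand.flat_values.numpy()))
print(mapped.row_splits.numpy())

# Einsum "bi,ijk->bjk": for every flat row b, sum over the shared index i.
mapped3 = tf.ragged.map_flat_values(tf.einsum, "bi,ijk->bjk", rt, m3)
reference = np.einsum("bi,ijk->bjk", values, m3.numpy())
print(mapped3.shape)
print(np.allclose(mapped3.flat_values.numpy(), reference, atol=1e-4))
```

In both cases only the inner dimensions change; the partition of the 16 rows into four series is carried over untouched.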

One last thing: it is also very easy to perform aggregations over ragged tensors. For example, if we want to know the columnwise sum of every series, we can use a reduction function:

tf.reduce_sum(rt, axis=1)

<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
array([[27., 21.],
       [13., 33.],
       [15., 29.],
       [20., 26.]], dtype=float32)>
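Other reductions follow the same pattern. The sketch below recomputes the sums from above and adds the mean and maximum per series; it assumes that tf.reduce_mean and tf.reduce_max accept ragged inputs in the installed TensorFlow version, and the NumPy array again stands in for df.values.

```python
import numpy as np
import tensorflow as tf

values = np.array([
    [0, 3], [3, 1], [6, 8], [8, 0], [10, 9],
    [0, 15], [5, 11], [8, 7],
    [0, 12], [2, 7], [4, 8], [9, 2],
    [0, 9], [4, 0], [6, 13], [10, 4],
], dtype=np.float32)
rt = tf.RaggedTensor.from_row_splits(values, row_splits=[0, 5, 8, 12, 16])

# Reducing over axis 1 collapses the ragged dimension, so each result is a
# regular tensor of shape (4, 2): one [time, value] pair per series.
sums = tf.reduce_sum(rt, axis=1)
means = tf.reduce_mean(rt, axis=1)   # divides by the true length of each series
maxima = tf.reduce_max(rt, axis=1)

print(sums.numpy())
print(means.numpy())
print(maxima.numpy())
```

Because the mean divides by the actual number of measurements, no padding zeros distort the result.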

There are many more operations for ragged tensors, which are listed here.


We learned about the structure of TensorFlow ragged tensors and how to perform basic mathematical operations on them. They make it unnecessary to apply unnatural preprocessing techniques like interpolation or padding. This is especially useful for irregular time series datasets, although there are many other applications. Imagine a dataset with images of various sizes: ragged tensors can even handle multiple ragged dimensions, which makes them a good fit for such data.
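As a small illustration of multiple ragged dimensions, the sketch below stores two toy single-channel "images" of different height and width in one ragged tensor. The data is made up for this example.

```python
import tensorflow as tf

# Two toy images: one with 2 rows of 3 pixels, one with 3 rows of 2 pixels.
images = tf.ragged.constant([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8], [9, 10], [11, 12]],
])

print(images.shape)                     # height and width are both ragged
print(images.ragged_rank)               # number of ragged dimensions
print(images.bounding_shape().numpy())  # smallest dense shape holding all images

# Sum over height and width yields one value per image.
pixel_sums = tf.reduce_sum(images, axis=[1, 2])
print(pixel_sums.numpy())
```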

In a subsequent post I will dive a bit deeper into how to work with ragged tensors as input types for a Keras model by treating the individual time series as sets and performing attention directly on the ragged tensors. Stay tuned!

