# Introduction to ragged tensors

- By Torben Windler


### Introduction

In this post, we take a look at TensorFlow **ragged tensors**, which were introduced at the end of 2018. We explain their basic structure and why they are useful.

### Problem definition

Due to the standardization of traditional machine learning problems, many models are very easy to implement: read a table of features from a database, use pandas and numpy for preprocessing, create a model with one of the well-known libraries, and call **model.fit()**. In many cases – that’s it! Done!

But wait – what if the data is not in tabular form and is irregular by nature? What if there are instances with different dimensions? Let’s consider a time-series classification scenario and assume we have a data set consisting of four short time series.

The series differ both in the number and in the timing of their measurements. Since machine learning models usually require a fixed input size, it is somewhat more complicated to fit such data into our models.
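To make this concrete, here is a minimal sketch of such a data set as plain Python lists (the time stamps and values match the table used later in this post):

```python
# Four time series of (time, value) pairs with different lengths
# and different measurement times.
series = [
    [(0, 3), (3, 1), (6, 8), (8, 0), (10, 9)],
    [(0, 15), (5, 11), (8, 7)],
    [(0, 12), (2, 7), (4, 8), (9, 2)],
    [(0, 9), (4, 0), (6, 13), (10, 4)],
]

print([len(s) for s in series])  # the lengths differ: [5, 3, 4, 4]
```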

There are a number of ways to deal with this type of input. For example, we could interpolate the series and take *virtual* measurements at the same points in time for each series.

Here we take the values at the time stamps 0, 2, 4, 6, 8 and 10, so that each series consists of 6 values. At this stage, however, we already have to select hyperparameters such as the type of interpolation and the number of sampling points. Moreover, we cannot rely on the accuracy of the interpolation, especially for extrapolated values and for values within large gaps between consecutive measurements (see the two series that have no measurement near time 10).
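As a sketch of this approach using NumPy’s `np.interp` (which interpolates linearly and clamps to the edge values when extrapolating – one source of the inaccuracy mentioned above; the series data follows the example in this post):

```python
import numpy as np

# One of the example series: measurements at times 0, 5 and 8
times = np.array([0.0, 5.0, 8.0])
values = np.array([15.0, 11.0, 7.0])

# Resample the series onto the common grid 0, 2, 4, 6, 8, 10
grid = np.arange(0, 12, 2)
resampled = np.interp(grid, times, values)
print(resampled)  # note the clamped value at t = 10, far from the last measurement
```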

If we want to feed the data into a TensorFlow Keras model and do not want to use interpolation techniques, a common practice is to pad the rows, e.g. with zeros at the end. This is necessary because TensorFlow batches data into stacks that must have the same shape in every dimension. A batch with the above 4 series would then have the shape (4, 6), where 4 is the number of series (= batch size) and 6 is the number of measurements per series.
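One common helper for this kind of padding is `tf.keras.preprocessing.sequence.pad_sequences` – a minimal sketch with the value columns of the four example series:

```python
import tensorflow as tf

# The value measurements of the four series, with different lengths
series = [[3, 1, 8, 0, 9], [15, 11, 7], [12, 7, 8, 2], [9, 0, 13, 4]]

# Pad with zeros at the end so that every row has the maximum length
padded = tf.keras.preprocessing.sequence.pad_sequences(series, padding="post")
print(padded.shape)  # (4, 5)
```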

In both cases the fixed length results from artificial data – either interpolated measurements or fill values. To avoid the uncertainty and overhead of these two techniques, we can process the original data with ragged tensors.

### Concept of ragged tensors

The concept of ragged tensors is surprisingly simple once you understand the intention behind it. Let’s stick to our example above with 4 time series. As you can see, the minimum number of measurements per series is 3, while the maximum number is 5. With padding, we would have to fill each row with zeros at the end (or sometimes at the beginning) to achieve a common length of 5.

In contrast, a ragged tensor consists of the concatenation of all values from all series together with metadata indicating where the concatenation should be split into the individual series. Let’s define our data frame **df** and then our ragged tensor **rt**:

Time | Value |
---|---|
0 | 3 |
3 | 1 |
6 | 8 |
8 | 0 |
10 | 9 |
0 | 15 |
5 | 11 |
8 | 7 |
0 | 12 |
2 | 7 |
4 | 8 |
9 | 2 |
0 | 9 |
4 | 0 |
6 | 13 |
10 | 4 |
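For reference, the data frame above can be built with pandas like this (a minimal sketch; the column names are taken from the table):

```python
import pandas as pd

# All 16 measurements of the four series, concatenated row by row
df = pd.DataFrame(
    {
        "Time": [0, 3, 6, 8, 10, 0, 5, 8, 0, 2, 4, 9, 0, 4, 6, 10],
        "Value": [3, 1, 8, 0, 9, 15, 11, 7, 12, 7, 8, 2, 9, 0, 13, 4],
    }
)
print(df.values.shape)  # (16, 2)
```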

```
import tensorflow as tf

# Start/end indices of the four series within the concatenated values
row_splits = [0, 5, 8, 12, 16]
rt = tf.RaggedTensor.from_row_splits(values=df.values, row_splits=row_splits)
rt
<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>
```

As we can see, the **row_splits** array defines the individual series by specifying, for each of them, the start index (inclusive) and end index (exclusive) within the concatenated values.
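In other words, series `i` consists of the concatenated values from index `row_splits[i]` up to (but not including) `row_splits[i + 1]` – a sketch in plain Python:

```python
# Recover the individual series from the concatenation and the split points
values = list(range(16))  # stand-in for the 16 concatenated measurements
row_splits = [0, 5, 8, 12, 16]

series = [values[row_splits[i]:row_splits[i + 1]] for i in range(len(row_splits) - 1)]
print([len(s) for s in series])  # [5, 3, 4, 4]
```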

That’s it. This is the really simple structure of the ragged tensors. As an alternative to specifying **row_splits**, we can also create the same ragged tensor using one of the following methods:

**value_rowids**: for each entry in the concatenated values, we specify the ID of the series it belongs to:

```
value_rowids = [0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
rt_1 = tf.RaggedTensor.from_value_rowids(values=df.values, value_rowids=value_rowids)
rt_1
<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>
```

**row_lengths**: we specify the length of each individual row:

```
row_lengths = [5, 3, 4, 4]
rt_2 = tf.RaggedTensor.from_row_lengths(values=df.values, row_lengths=row_lengths)
rt_2
<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>
```

**Constant**: We can define the ragged tensor as a “constant” by directly specifying a list of arrays:

```
rt_3 = tf.ragged.constant([df.loc[0:4, :].values, df.loc[5:7, :].values, df.loc[8:11, :].values, df.loc[12:15, :].values])
rt_3
<tf.RaggedTensor [[[0, 3], [3, 1], [6, 8], [8, 0], [10, 9]], [[0, 15], [5, 11], [8, 7]], [[0, 12], [2, 7], [4, 8], [9, 2]], [[0, 9], [4, 0], [6, 13], [10, 4]]]>
```

It does not matter which method we choose to create a ragged tensor; internally, the results are all equivalent. Next, let’s look at how to perform mathematical operations with ragged tensors.
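Whichever constructor we use, the ragged tensor exposes all three representations again, and `to_tensor()` converts back to a zero-padded dense tensor – a short sketch, with `tf.range` standing in for our measurement data:

```python
import tensorflow as tf

rt = tf.RaggedTensor.from_row_lengths(values=tf.range(16), row_lengths=[5, 3, 4, 4])

print(rt.row_splits.numpy())      # [ 0  5  8 12 16]
print(rt.value_rowids().numpy())  # [0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3]
print(rt.row_lengths().numpy())   # [5 3 4 4]
print(rt.to_tensor().shape)       # (4, 5) – zero-padded dense tensor
```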

### Working with ragged tensors

TensorFlow offers a very handy function to perform operations on ragged tensors: **tf.ragged.map_flat_values(op, *args, **kwargs)**. It does what the name says – each ragged tensor in **args** is replaced by its concatenated (= flat) values, omitting the ragged batch dimension. In our example, this is the same as working directly with **df.values**. The only difference is that the output of the operation is again a ragged tensor with the same metadata about where to split. Let’s consider an example in which we calculate the matrix product of the ragged tensor with a matrix **m** of shape (2, 5). Each individual series in our ragged tensor has the shape (k, 2), where k is the number of measurements in the respective series. Make sure to cast the values to floats first:

```
m = tf.random.uniform(shape=[2, 5])
print(m.shape)
(2, 5)
rt = tf.cast(rt, tf.float32)
result = tf.ragged.map_flat_values(tf.matmul, rt, m)
print(*(t.shape for t in result), sep='\n')
(5, 5)
(3, 5)
(4, 5)
(4, 5)
```

Perfect! The resulting ragged tensor has the same row splits as the input, but the inner dimension has changed from 2 to 5 due to the matrix multiplication. We could perform some more complicated operations, for example if **m** is not a 2-dimensional matrix but a 3-dimensional tensor:

```
m = tf.random.uniform(shape=[2, 5, 4])
print(m.shape)
(2, 5, 4)
rt = tf.cast(rt, tf.float32)
result = tf.ragged.map_flat_values(tf.einsum, "bi, ijk -> bjk", rt, m)
print(*(t.shape for t in result), sep='\n')
(5, 5, 4)
(3, 5, 4)
(4, 5, 4)
(4, 5, 4)
```

As expected, the batch dimension **b** corresponds to the length of the individual series, while the other dimensions come from `m`. Incidentally, **tf.einsum** implements the Einstein summation convention, which is extremely practical when working with higher-dimensional tensors; see the TensorFlow documentation for more details.

One last point: it is also very easy to perform aggregations over ragged tensors. For example, if we want the column-wise sums within each series, we can use reduction functions along the ragged axis:

```
tf.reduce_sum(rt, axis=1)
<tf.Tensor: shape=(4, 2), dtype=float32, numpy=
array([[27., 21.],
[13., 33.],
[15., 29.],
[20., 26.]], dtype=float32)>
```

There are many more operations for ragged tensors, which are listed in the TensorFlow documentation.

### Conclusion

We have learned about the structure of TensorFlow ragged tensors and how to perform basic mathematical operations with them. They make unnatural pre-processing techniques such as interpolation or padding unnecessary. This is particularly useful for irregular time series data sets, although there are many other applications: imagine a data set with images of different sizes – ragged tensors can even handle multiple ragged dimensions, which makes them a perfect fit for this.
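For illustration, a tensor with two ragged dimensions (e.g. images whose heights and row widths both vary) can be built directly with `tf.ragged.constant` – a minimal sketch with made-up pixel values:

```python
import tensorflow as tf

# Two "images": the rows vary in number and in length
images = tf.ragged.constant([
    [[1, 2, 3], [4, 5]],     # image 1: two rows of different widths
    [[6], [7, 8], [9, 10]],  # image 2: three rows
])
print(images.ragged_rank)  # 2 – both the row and the column dimension are ragged
```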

In a later post, I’ll dive a little deeper into working with ragged tensors as input to a Keras model, treating the individual time series as sets and applying attention directly on the ragged tensors. Stay tuned!