Rust Puzzle: Flatten a Nested Iterator of Results

03 Nov 2018
Paul Kernfeld dot com

When I was rewriting the ndarray-csv crate for the third time, I encountered the problem described below. Although it didn’t sound that difficult originally, it took me numerous hours to figure it out.

The Puzzle

Flatten an iterator of results of iterators of results into an iterator of results. Although it would be fun to implement a custom iterator to solve this problem, my goal was to use existing code from std or the itertools crate.

pub fn flatten_nested_results<T, E, II, IO>(iter_outer: IO) -> impl Iterator<Item = Result<T, E>>
where
    II: Iterator<Item = Result<T, E>>,
    IO: Iterator<Item = Result<II, E>>,
{
    /// Fill me in!
}

Here are the tests that I used to check whether my solution worked. I wanted to use this with Result::from_iter, so that’s how I’ve written the tests. Notably, this means that once an error is encountered, it’s not necessary to process any subsequent items in the iterator.

/// This is a testing convenience method to convert a nested Vec into a nested Iterator
fn vec_to_nested_iter(
    vec_outer: Vec<Result<Vec<Result<i8, f32>>, f32>>,
) -> impl Iterator<Item = Result<impl Iterator<Item = Result<i8, f32>>, f32>> {
    vec_outer
        .into_iter()
        .map(|vec_inner| vec_inner.map(Vec::into_iter))
}

/// Without any Errs, we should return the whole sequence
#[test]
fn test_all_ok() {
    let iter_outer = vec_to_nested_iter(vec![Ok(vec![Ok(1), Ok(2)]), Ok(vec![Ok(3)])]);
    let expected: Result<Vec<i8>, f32> = Ok(vec![1, 2, 3]);
    assert_eq!(expected, flatten_nested_results(iter_outer).collect())
}

/// We should return the first inner Err
#[test]
fn test_inner_err() {
    let iter_outer = vec_to_nested_iter(vec![Ok(vec![Err(1.0), Ok(2), Err(3.0)]), Err(4.0)]);
    let expected: Result<Vec<i8>, f32> = Err(1.0);
    assert_eq!(expected, flatten_nested_results(iter_outer).collect())
}

/// We should return the first outer Err
#[test]
fn test_outer_err() {
    let iter_outer = vec_to_nested_iter(vec![Err(1.0), Ok(vec![Ok(2), Err(3.0)])]);
    let expected: Result<Vec<i8>, f32> = Err(1.0);
    assert_eq!(expected, flatten_nested_results(iter_outer).collect())
}

If you want to treat this like a puzzle, try to fill in the implementation of flatten_nested_results. Otherwise, you can read through my thought process and solution below.

Failed Attempts

Failed Attempt 1: Iterator.flatten

Iterator already has a flatten method, so what if we use that?

pub fn flatten_nested_results<T, E, II, IO>(iter_outer: IO) -> impl Iterator<Item = Result<T, E>>
where
    II: Iterator<Item = Result<T, E>>,
    IO: Iterator<Item = Result<II, E>>,
{
    iter_outer.flatten()
}

This results in a type mismatch compilation error. flatten can be used to turn an Iterator<Item = Iterator<Item = Result<T, E>>> into an Iterator<Item = Result<T, E>>. However, that’s not exactly the right type signature because we want to be able to handle two levels of Result, not one. Oh well, it was worth a shot!

Failed Attempt 2: flat_map

Here is a simple solution that compiles. It handles Ok and inner errors correctly because flat_map naturally translates each inner error into an error in the returned iterator. However, this function panics when it hits an outer error.

pub fn flatten_nested_results<T, E>(
    iter_outer: impl Iterator<Item = Result<impl Iterator<Item = Result<T, E>>, E>>,
) -> impl Iterator<Item = Result<T, E>> {
    iter_outer.flat_map(|iter_inner_result| match iter_inner_result {
        Ok(iter_inner) => iter_inner,
        Err(err) => panic!("oh jeez, an outer error happened"),
    })
}

Okay, so now we need to replace the panic with something that returns an Err. Should be straightforward, right?

Failed Attempt 3: once

Now let’s try turning each outer error into a single-item iterator (std::iter::once):

use std::iter::once;

pub fn flatten_nested_results<T, E, II, IO>(iter_outer: IO) -> impl Iterator<Item = Result<T, E>>
where
    II: Iterator<Item = Result<T, E>>,
    IO: Iterator<Item = Result<II, E>>,
{
    iter_outer.flat_map(|iter_inner_result| match iter_inner_result {
        Ok(iter_inner) => iter_inner,
        Err(err) => once(Err(err)),
    })
}

This produces the compilation error: match arms have incompatible type. Although iter_inner and once(Err(err)) both implement the right return type, they are different concrete types. iter_inner is of type II, whereas once(Err(err)) is of type std::iter::Once<Result<T, E>>. This makes the Rust compiler very angry.

This error caught me off guard, because I am used to working in languages where a solution like this works just fine. For example, an equivalent solution in Scala is:

def flattenNestedResults[T, E](iter: Iterator[Either[E, Iterator[Either[E, T]]]]): Iterator[Either[E, T]] = {
    iter.flatMap { _ match {
        case Right(iterInner) => iterInner
        case Left(err) => Some(Left(err))
    }}
}

Even though iterInner and Some(Left(err)) aren’t the same type, Scala knows that they both implement Iterator so it will happily compile and run this code. It is (I think) able to do this because it uses a technique called dynamic dispatch, which involves looking up functions at runtime. Rust, however, uses static dispatch in most cases, which separates it from many of the high-level languages that I’m familiar with. You can read more about static and dynamic dispatch in the Trait Objects section of the first edition of The Rust Programming Language.

In order to solve this, we’ll need to either use dynamic dispatch or make both match arms return the same concrete type.

Solutions

Solution: Trait Objects

I was able to solve this by casting each of the match arms to return Box<Iterator<Item = Result<T, E>>>. This makes use of a Rust feature called trait objects.

pub fn flatten_nested_results<T, E, II, IO>(iter_outer: IO) -> impl Iterator<Item = Result<T, E>>
where
    T: 'static,
    E: 'static,
    II: 'static + Iterator<Item = Result<T, E>>,
    IO: 'static + Iterator<Item = Result<II, E>>,
{
    iter_outer.flat_map(|iter_inner_result| match iter_inner_result {
        Ok(iter_inner) => Box::new(iter_inner) as Box<Iterator<Item = Result<T, E>>>,
        Err(err) => Box::new(once(Err(err))) as Box<Iterator<Item = Result<T, E>>>,
    })
}

There are a few things that I don’t like about this solution:

  1. The casting syntax is kind of ugly.
  2. Trait objects add runtime cost and prevent some optimizations.
  3. Technically it doesn’t solve the original problem because I needed to add a 'static lifetime bound to all of the types.

Solution: Either

The solution that I ended up with was to use the either crate, which is a dependency of itertools. Actually, currently it’s the only dependency of itertools. This solution works because Either<L, R> is an iterator as long as both L and R are iterators with the same Item. In this case, the Item is Result<T, E>. I think this is a pretty neat solution.

extern crate either;

use either::Either;
use std::iter::once;

pub fn flatten_nested_results<T, E, II, IO>(iter_outer: IO) -> impl Iterator<Item = Result<T, E>>
where
    II: Iterator<Item = Result<T, E>>,
    IO: Iterator<Item = Result<II, E>>,
{
    iter_outer.flat_map(|iter_inner_result| match iter_inner_result {
        Ok(iter_inner) => Either::Right(iter_inner),
        Err(err) => Either::Left(once(Err(err))),
    })
}

Your Solution?

Please let me know if you have any alternative solutions to this problem!

Thanks to Michael Hoy and Brandon Maister for comments.