Improving ndarray-csv: Goodbye failure, Hello Extension Traits

Two weeks ago, I wrote a blog post explaining some design decisions that I made for the ndarray-csv crate. Based on some excellent Reddit comments and GitHub issues from dtolnay, I have amended some of these decisions.

Removing failure

Although I thought that I was future-proofing the crate by using failure, I may have been doing the opposite. u/WellMakeItSomehow shared some convincing arguments highlighting drawbacks of depending on failure.

I don’t have much to add, so I’ll just quote the comment in full:

Besides the concerns about exposing an unstable crate in your API, people have also complained that the backtraces stored by failure slow down their apps. I don’t think that’s such a big concern, but I’ve been bothered about the increased build times; while a larger app might already be using e.g. syn (the same version, hopefully), for a small CLI app or crate, it might be too much.

There’s also some vague notion that failure’s API is not ideal, so it might change in the future. But it looks like there can’t be any breaking changes to it, because at this point failure is too big to fail (pardon the pun).

Also, I’ve seen people (including myself) getting confused by the failure documentation wrt. the ErrorKind pattern and the Context API. If you go for an Error enum, you still need to implement the conversion functions by hand.

failure gives you backtraces, the context thing, and the macro for Displaying the error messages. Not everyone wants backtraces, and IMHO that macro should be another crate, with proper maintenance (it has some limitations right now).

So, while it’s not unreasonable to use it, failure today is probably not how Rust error handling will end up looking in a year or two. I’m sure that the compromises will be the same (wrt. backtraces, memory allocation and so on), but I think the details will be different. And if you use it today, you’re writing more code that depends on a “too big to fail (or change)” crate.

In general, I like to err on the side of simplicity, so I decided to remove failure. I noticed one big improvement: compilation time in release mode dropped from 35 seconds to 14 seconds!

Using extension traits

Previously, my read function looked like:

use csv::Reader;
use ndarray::Array2;
use serde::de::DeserializeOwned;
use std::io::Read;

pub fn read<A: DeserializeOwned>(
    shape: (usize, usize),
    reader: &mut Reader<impl Read>
) -> Result<Array2<A>, Error> {
    // Implementation goes here
}

Rust’s impl Trait functionality is pretty cool, but I wasn’t using it effectively. Specifically, mixing impl Trait and type parameters is bad because it prevents callers from using the turbofish operator (::<...>). Actually, this limitation was already showing itself in my tests, where I needed to add ugly type annotations like this:

#[test]
fn test_read_csv_error() {
    let readed: Result<Array2<i8>, _> = read((2, 3), &mut in_memory_reader("1,2,3\n4,x,6\n"));
    readed.unwrap_err();
}

dtolnay suggested solving this problem by using a trait object or an extension trait method. I didn’t want to do a trait object because seeing &mut Reader<Box<Read>> kind of creeped me out, possibly for no good reason. Instead, I went with an extension trait. This is where I define a new trait in order to add methods to a struct from another crate. In this case, I defined a trait Array2Reader to add a deserialize_array2 method to the Reader struct from the csv crate:

use csv::Reader;
use ndarray::Array2;
use serde::de::DeserializeOwned;
use std::io::Read;

pub trait Array2Reader {
    fn deserialize_array2<A: DeserializeOwned>(
        self,
        shape: (usize, usize),
    ) -> Result<Array2<A>, ReadError>;
}

impl<'a, R: Read> Array2Reader for &'a mut Reader<R> {
    // Implementation goes here
}

Although the code above is a bit more complicated, being able to use the turbofish really cleaned up my test code:

#[test]
fn test_read_csv_error() {
    in_memory_reader("1,2,3\n4,x,6\n")
        .deserialize_array2::<i8>((2, 3))
        .unwrap_err();
}

Small improvements

I also made a couple small improvements:

Somehow I had accidentally added an unnecessary Copy bound on A in my read method, so I removed it.
DeserializeOwned provides a nicer way of writing for<'de> A: serde::Deserialize<'de>.

Thanks for the advice!

I’m grateful that the Rust community has provided such excellent constructive criticism to help me improve ndarray-csv and understand Rust better. I’m sure that it is still possible to make many improvements, so please send me any design feedback that you have!