Two weeks ago, I wrote a blog post explaining some design decisions that I made for the ndarray-csv crate. Based on some excellent Reddit comments and GitHub issues from dtolnay, I have amended some of these decisions.
Removing failure
Although I thought that I was future-proofing the crate by using failure, I may have been doing the opposite. u/WellMakeItSomehow shared some convincing arguments highlighting drawbacks of depending on failure.
I don’t have much to add, so I’ll just quote the comment in full:
Besides the concerns about exposing an unstable crate in your API, people have also complained that the backtraces stored by
failure
slow down their apps. I don’t think that’s such a big concern, but I’ve been bothered about the increased build times; while a larger app might already be using e.g.syn
(the same version, hopefully), for a small CLI app or crate, it might be too much.There’s also some vague notion that
failure
’s API is not ideal, so it might change in the future. But it looks like there can’t be any breaking changes to it, because at this pointfailure
is too big to fail (pardon the pun).Also, I’ve seen people (including myself) getting confused by the
failure
documentation wrt. theErrorKind
pattern and theContext
API. If you go for anError
enum, you still need to implement the conversion functions by hand.
failure
gives you backtraces, the context thing, and the macro forDisplay
ing the error messages. Not everyone wants backtraces, and IMHO that macro should be another crate, with proper maintenance (it has some limitations right now).So, while it’s not unreasonable to use it,
failure
today is probably not how Rust error handling will end up looking in a year or two. I’m sure that the compromises will be the same (wrt. backtraces, memory allocation and so on), but I think the details will be different. And if you use it today, you’re writing more code that depends on a “too big to fail (or change)” crate.
In general, I like to err on the side of simplicity, so I decided to remove failure
. I noticed one big improvement: compilation time in release mode dropped from 35 seconds to 14 seconds!
Using extension traits
Previously, my read
function looked like:
use csv::Reader;
use ndarray::Array2;
use serde::de::DeserializeOwned;
use std::io::Read;
pub fn read<A: DeserializeOwned>(
shape: (usize, usize),
reader: &mut Reader<impl Read>
) -> Result<Array2<A>, Error> {
// Implementation goes here
}
Rust’s impl Trait
functionality is pretty cool, but I wasn’t using it effectively. Specifically, mixing impl Trait
and type parameters is bad because it prevents callers from using the turbofish operator (::<...>
). Actually, this limitation was already showing itself in my tests, where I needed to add ugly type annotations like this:
#[test]
fn test_read_csv_error() {
let readed: Result<Array2<i8>, _> = read((2, 3), &mut in_memory_reader("1,2,3\n4,x,6\n"));
readed.unwrap_err();
}
dtolnay suggested solving this problem by using a trait object or an extension trait method. I didn’t want to do a trait object because seeing &mut Reader<Box<Read>>
kind of creeped me out, possibly for no good reason. Instead, I went with an extension trait. This is where I define a new trait in order to add methods to a struct from another crate. In this case, I defined a trait Array2Reader
to add a deserialize_array2
method to the Reader
struct from the csv crate:
use csv::Reader;
use ndarray::Array2;
use serde::de::DeserializeOwned;
use std::io::Read;
pub trait Array2Reader {
fn deserialize_array2<A: DeserializeOwned>(
self,
shape: (usize, usize),
) -> Result<Array2<A>, ReadError>;
}
impl<'a, R: Read> Array2Reader for &'a mut Reader<R> {
// Implementation goes here
}
Although the code above is a bit more complicated, being able to use the turbofish really cleaned up my test code:
#[test]
fn test_read_csv_error() {
in_memory_reader("1,2,3\n4,x,6\n")
.deserialize_array2::<i8>((2, 3))
.unwrap_err();
}
Small improvements
I also made a couple small improvements:
- Somehow I had accidentally added an unnecessary
Copy
bound onA
in myread
method, so I removed it. DeserializeOwned
provides a nicer way of writingfor<'de> A: serde::Deserialize<'de>
.
Thanks for the advice!
I’m grateful that the Rust community has provided such excellent constructive criticism to help me improve ndarray-csv and understand Rust better. I’m sure that it is still possible to make many improvements, so please send me any design feedback that you have!