Skip to content

Make API more idiomatic, simplify code, improve style #612

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

H2CO3
Copy link
Contributor

@H2CO3 H2CO3 commented Jun 3, 2025

What does this PR do?

There are a number of issues with the current code style, for example:

  • The API sometimes forces sub-optimal choices (e.g. the use of &Option<T> forces the caller to have an owned T, whereas Option<&T> doesn't, and can be obtained from an Option<T> via Option::<T>::as_ref())
  • The implementation was sometimes overly verbose (e.g. heavily nested if let "triangles of hell", which can be replaced by let … else and early returns)
  • There have been some notable omissions (e.g. the source() method on impl Error for SafeTensorsError)
  • Some dependencies (notably, Criterion) were out-of-date and the code used deprecated functions
  • Other, minor things such as unnecessary allocation of Strings in value formatting, extensively commented-out code (that could just be deleted since it's already in version control history), or manual, verbose, inline impls of Option combinators such as map_or.

This PR fixes the aforementioned stylistic and API issues. Note that some of these are technically breaking change, but since the library is still relatively young, and hasn't reached 1.0 yet, this should not be a problem.

@H2CO3
Copy link
Contributor Author

H2CO3 commented Jun 4, 2025

@Narsil I believe this PR is now complete.

@H2CO3
Copy link
Contributor Author

H2CO3 commented Jun 10, 2025

@Narsil Hey, can I possibly get some feedback on this? Thanks!

@H2CO3
Copy link
Contributor Author

H2CO3 commented Jun 12, 2025

This clearly isn't going anywhere; closing.

@H2CO3 H2CO3 closed this Jun 12, 2025
@Narsil
Copy link
Collaborator

Narsil commented Jun 12, 2025

Hey,

First of all, sorry to not have seen that PR before. I just don't read PRs on every repo every day.

Now, thanks a lot for the propositions. I'll break down my comments here:

  • There are a lot of stylistic choices. I tend to defer anything stylistic to linters or similar tool, that avoids the bikeshedding. If they don't complain, then the code is fine.

  • You claim there are some over allocation, those would be really interesting to fix, but since there are so many changes, I'm not sure which are actually allocation related, could you split PRs into small unitary PRs?

  • For the error source + general display, do you have any specific examples where the displays where annoying to work with in particular ? Especially since we have to deal mostly with a python surface, it's important to have good error messages in Python too.

In order to merge PRs, I think having very small ones would be an easier start. This crate and python package is relied upon by many, so I try to make changes as few as possible (so breaking is much less likely).

@H2CO3
Copy link
Contributor Author

H2CO3 commented Jun 12, 2025

Hi there!

There are a lot of stylistic choices. I tend to defer anything stylistic to linters or similar tool, that avoids the bikeshedding. If they don't complain, then the code is fine.

Most of the improvements in this PR are not, in fact, purely stylistic, perhaps except for some newlines that make trait impls with many consecutive fns easier on the eye, and the removal of superfluous parentheses in a few places (2, IIRC). The rest of the changes make the code objectively better, in the ways described in my very first comment.

You claim there are some over allocation, those would be really interesting to fix, but since there are so many changes, I'm not sure which are actually allocation related

Again, please see the thread starter comment. It's mostly the forcing of &Option<T> instead of Option<&T>, which forces allocation as explained there in more detail. Unfortunately, this is a breaking change on the Rust side, although not on the Python side. (And it's a lot less annoying in Rust, since Cargo's strict adherence to semver ensures that nothing that already depends on the package will break anyway.)

There's also this unnecessary .into_iter().collect(), which doesn't do anything except for allocating a new vector and invalidating the old one.

For the error source + general display, do you have any specific examples where the displays where annoying to work with in particular ? Especially since we have to deal mostly with a python surface, it's important to have good error messages in Python too.

The error messages (and in some cases, Display impls) were mostly just deferred to the automatically-derived Debug impls, which is a practice that the official Rust API guidelines explicitly recommend against, because:

  1. Debug formatting was never meant to be shown to users.
  2. Debug formatting is not stable (it can change between Rust versions, toolchains, etc.), whereas user-facing Display and Error impls are part of the API and should be stable, therefore they can't ever be implemented correctly by just delegating to Debug (unless you know it's stable because you implemented it manually, but then it's better to implement Display by hand and delegate Debug to Display instead).

In order to merge PRs, I think having very small ones would be an easier start. This crate and python package is relied upon by many, so I try to make changes as few as possible (so breaking is much less likely).

I don't think the API of the Python package is affected in any way. Please see the individual commits for different aspects of the changes. I'm sorry but you also have to understand that it would be really hard for me to basically un-do everything but some very specific changes. The changes in this PR are coherent, and you don't have to take them at face value – feel free to ask any other experienced Rust programmer and they'll confirm that they improve the code.

@Narsil
Copy link
Collaborator

Narsil commented Jun 12, 2025

Do you mind opening small PRs instead of this giant one.

I see a few things slip which are not exactly obvious and making sure none slips review is hard.

Keep

write!(f, "{}...") 

everywhere. It's just easier to read when code is uniform.
index.fmt(..)-> Same comment.
Those are the first ones I saw.

Anything that touches comments is very easy to review since it doesn't touch any logic.
Upgrading the deps is easy (if it's only number bumping) If it modifies any of the underlying code, it's easier to showcase it. For instance I don't see why the extra black_box is required. (To be fair some benchmarks are not really used anyway, the python ones are the relevant ones since they are actually doing comparisons).

There's also this unnecessary .into_iter().collect(), which doesn't do anything except for allocating a new vector and invalidating the old one.

Definitely a leftover, thanks for flagging.

Ah and another one, remove the Into<Vec<usize>>. I tried to avoid accepting traits anywhere not strictly necessary, I think it is cleaner to avoid the generic bloats everywhere.

The commented out code for the TensorIndexer should be left alone (yes it is in version control), but it's also purposefully not enabled since most tensors never exceed certain dimensions, however, it should be trivial to add for users that want to play around. There is no downside to it.

@H2CO3
Copy link
Contributor Author

H2CO3 commented Jun 12, 2025

I see a few things slip which are not exactly obvious and making sure none slips review is hard.
Keep

write!(f, "{}...")

everywhere. It's just easier to read when code is uniform.
§index.fmt(..)§-> Same comment.
Those are the first ones I saw.

Oh, that's definitely intentional, because the formatting mechanism is slow, so it's unnecessarily wasteful to write!() when there's nothing to format. A great example is all-unit variant enums such as Framework, which always write a constant string, so there's exactly one single call to Formatter::write_str() needed.

Ah and another one, remove the Into<Vec>. I tried to avoid accepting traits anywhere not strictly necessary, I think it is cleaner to avoid the generic bloats everywhere.

Okay, I can do that.

@H2CO3
Copy link
Contributor Author

H2CO3 commented Jun 13, 2025

@Narsil I have split this into 4 separate PRs, split by the "theme" of the PR:

  1. Update dependencies #614
  2. Simplify code and make it more robust #615
  3. Better error handling through improved Display and Error impls #616
  4. Do not force &Option<T> in public API; use Option<&T> instead #617

I hope this is now acceptable, and you can review and hopefully merge them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants