@Anton-Tarazi
Contributor

Resolves #2673

Rationale for this change

_SnapshotProducer._summary() copies the metadata for every added / deleted DataFile, which is pretty expensive. Instead, we copy it once at the beginning of the function and reuse the same value for each DataFile.
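To illustrate the idea, here is a minimal, self-contained sketch of hoisting a copy out of a per-file loop. It is not the actual pyiceberg code: `FakeMetadata`, `FakeTable`, `FakeDataFile`, `summarize_copy_per_file`, and `summarize_copy_once` are hypothetical stand-ins, and the assumption is only that reading the metadata returns a fresh copy on each access.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for table metadata (not a pyiceberg class)."""
    specs: dict = field(default_factory=dict)


@dataclass
class FakeDataFile:
    """Hypothetical stand-in for a data file entry."""
    spec_id: int
    record_count: int


class FakeTable:
    """Assumption: every read of `metadata` returns a defensive deep copy."""

    def __init__(self, metadata: FakeMetadata) -> None:
        self._metadata = metadata
        self.copies = 0  # counts how many copies were made

    @property
    def metadata(self) -> FakeMetadata:
        self.copies += 1
        return copy.deepcopy(self._metadata)


def summarize_copy_per_file(table: FakeTable, files: list) -> dict:
    """Before: the metadata is read (and copied) once per data file."""
    records_per_spec: dict = {}
    for data_file in files:
        spec_name = table.metadata.specs[data_file.spec_id]  # copy per file
        records_per_spec[spec_name] = (
            records_per_spec.get(spec_name, 0) + data_file.record_count
        )
    return records_per_spec


def summarize_copy_once(table: FakeTable, files: list) -> dict:
    """After: the metadata is copied once and reused for every data file."""
    metadata = table.metadata  # single copy, hoisted out of the loop
    records_per_spec: dict = {}
    for data_file in files:
        spec_name = metadata.specs[data_file.spec_id]
        records_per_spec[spec_name] = (
            records_per_spec.get(spec_name, 0) + data_file.record_count
        )
    return records_per_spec


files = [FakeDataFile(0, 10), FakeDataFile(1, 5), FakeDataFile(0, 7)]
specs = {0: "by_day", 1: "by_hour"}

slow_table = FakeTable(FakeMetadata(specs=dict(specs)))
fast_table = FakeTable(FakeMetadata(specs=dict(specs)))

slow_result = summarize_copy_per_file(slow_table, files)
fast_result = summarize_copy_once(fast_table, files)
```

Both functions produce the same summary; the only difference is that the number of copies grows with the number of files in the first version and stays at one in the second.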

On my workload, which overwrites a few million rows at a time, I saw the time for table.overwrite go from ~20 seconds to ~6 seconds.

Are these changes tested?

Yes, existing unit / integration tests

Are there any user-facing changes?

Just faster writes :)

Development

Successfully merging this pull request may close these issues.

_SnapshotProducer._summary() unreasonably slow
