-
Notifications
You must be signed in to change notification settings - Fork 152
Fix instance cache key collisions with different configurations #530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
23e4e6f
to
36e3c60
Compare
36e3c60
to
9228470
Compare
|
||
// Configuration parameters that should affect cache keys. These are authentication | ||
// and session parameters that distinguish different connections to the same database. | ||
var cacheKeyParams = []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm open to more suggestions. This list is certainly not exhaustive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could also allow users to override this global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding of the issue, users try to pass a unique token here, which is then used to disambiguate two connections (whose parameters otherwise overlap). So maybe it is better to make this completely configurable by the user, without providing any pre-configured values?
|
||
// Configuration parameters that should affect cache keys. These are authentication | ||
// and session parameters that distinguish different connections to the same database. | ||
var cacheKeyParams = []string{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding of the issue, users try to pass a unique token here, which is then used to disambiguate two connections (whose parameters otherwise overlap). So maybe it is better to make this completely configurable by the user, without providing any pre-configured values?
paramString := relevantParams.Encode() | ||
hash := sha256.Sum256([]byte(paramString)) | ||
|
||
return fmt.Sprintf("%s#%x", basePath, hash[:8]) // Use first 8 bytes of hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for my understanding: why only the first 8 bytes of the hash?
paramString := relevantParams.Encode() | ||
hash := sha256.Sum256([]byte(paramString)) | ||
|
||
return fmt.Sprintf("%s#%x", basePath, hash[:8]) // Use first 8 bytes of hash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, how does the #
work here? If we use, e.g., hello.db#myhash
as the DB path, isn't that going to create unique new files with the different hashes instead of, e.g., opening an existing file? I tried to have a look at the C++ side of this: DBInstanceCache::GetOrCreateInstance
calls into either CreateInstanceInternal
or GetInstanceInternal
, both of which call GetDBAbsolutePath(database, *local_fs);
on the database
string (path). I can't find any special-handling of the #
, but maybe I missed it / miss something here?
I just had a look at duckdb/duckdb#13129, and my understanding is that the changes in that PR are not yet in go-duckdb. I wonder if we have to do anything on the go-duckdb side at all? Once that change is in, we just pass our key-value (user-defined) parameters into the connection, and DuckDB will disambiguate? |
The current PR is nonsense as it stands. Let's just say that my mental model was wrong. 😄 From what I understand, duckdb/duckdb#13129 only added validation (which was shipped with DuckDB 1.4.0), and does not allow multiple users with different auth to connect to the same database file, or am I missing something? |
The current cache key generation only uses the database path and ignores query parameters. This causes cache collisions when multiple users connect to the same database instance with different session/auth params, leading to incorrect connection sharing.
Fixes https://github.com/duckdblabs/duckdb-internal/issues/5534