Skip to content

Conversation

mlafeldt
Copy link
Collaborator

@mlafeldt mlafeldt commented Sep 10, 2025

The current cache key generation only uses the database path and ignores query parameters. This causes cache collisions when multiple users connect to the same database instance with different session/auth params, leading to incorrect connection sharing.

Fixes https://github.com/duckdblabs/duckdb-internal/issues/5534

@mlafeldt mlafeldt force-pushed the fix-cache-key-collision branch from 23e4e6f to 36e3c60 Compare September 10, 2025 11:02
@mlafeldt mlafeldt changed the title Fix instance cache key collisions with different configurations WIP: Fix instance cache key collisions with different configurations Sep 10, 2025
@mlafeldt mlafeldt force-pushed the fix-cache-key-collision branch from 36e3c60 to 9228470 Compare September 10, 2025 11:57
@mlafeldt mlafeldt changed the title WIP: Fix instance cache key collisions with different configurations Fix instance cache key collisions with different configurations Sep 10, 2025

// Configuration parameters that should affect cache keys. These are authentication
// and session parameters that distinguish different connections to the same database.
var cacheKeyParams = []string{
Copy link
Collaborator Author

@mlafeldt mlafeldt Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to more suggestions. This list is certainly not exhaustive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also allow users to override this global.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding of the issue, users try to pass a unique token here, which is then used to disambiguate two connections (whose parameters otherwise overlap). So maybe it is better to make this completely configurable by the user, without providing any pre-configured values?

@mlafeldt mlafeldt self-assigned this Sep 12, 2025

// Configuration parameters that should affect cache keys. These are authentication
// and session parameters that distinguish different connections to the same database.
var cacheKeyParams = []string{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding of the issue, users try to pass a unique token here, which is then used to disambiguate two connections (whose parameters otherwise overlap). So maybe it is better to make this completely configurable by the user, without providing any pre-configured values?

paramString := relevantParams.Encode()
hash := sha256.Sum256([]byte(paramString))

return fmt.Sprintf("%s#%x", basePath, hash[:8]) // Use first 8 bytes of hash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my understanding: why only the first 8 bytes of the hash?

paramString := relevantParams.Encode()
hash := sha256.Sum256([]byte(paramString))

return fmt.Sprintf("%s#%x", basePath, hash[:8]) // Use first 8 bytes of hash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, how does the # work here? If we use, e.g., hello.db#myhash as the DB path, isn't that going to create unique new files with the different hashes instead of, e.g., opening an existing file? I tried to have a look at the C++ side of this: DBInstanceCache::GetOrCreateInstance calls into either CreateInstanceInternal or GetInstanceInternal, both of which call GetDBAbsolutePath(database, *local_fs); on the database string (path). I can't find any special-handling of the #, but maybe I missed it / miss something here?

@taniabogatsch
Copy link
Collaborator

I just had a look at duckdb/duckdb#13129, and my understanding is that the changes in that PR are not yet in go-duckdb. I wonder if we have to do anything on the go-duckdb side at all? Once that change is in, we just pass our key-value (user-defined) parameters into the connection, and DuckDB will disambiguate?

@mlafeldt
Copy link
Collaborator Author

The current PR is nonsense as it stands. Let's just say that my mental model was wrong. 😄

From what I understand, duckdb/duckdb#13129 only added validation (which was shipped with DuckDB 1.4.0), and does not allow multiple users with different auth to connect to the same database file, or am I missing something?

@mlafeldt mlafeldt closed this by deleting the head repository Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants