Description
NBXplorer is working amazingly well overall, but I’ve encountered a performance issue with the nbxv1_keypath_info view under specific conditions.
The view is defined as follows:
CREATE OR REPLACE VIEW nbxv1_keypath_info AS
SELECT ws.code,
ws.script,
s.addr,
d.metadata AS descriptor_metadata,
nbxv1_get_keypath(d.metadata, ds.idx) AS keypath,
ds.metadata AS descriptors_scripts_metadata,
ws.wallet_id,
ds.idx,
ds.used,
d.descriptor
FROM ((wallets_scripts ws
JOIN scripts s ON (((s.code = ws.code) AND (s.script = ws.script))))
LEFT JOIN ((wallets_descriptors wd
JOIN descriptors_scripts ds ON (((ds.code = wd.code) AND (ds.descriptor = wd.descriptor))))
JOIN descriptors d ON (((d.code = ds.code) AND (d.descriptor = ds.descriptor)))) ON (((wd.wallet_id = ws.wallet_id) AND (wd.code = ws.code) AND (ds.script = ws.script))));

This view is queried with a WHERE clause that filters by code and script, along with additional predicates, here:
SELECT ts.script,
ts.addr,
ts.derivation,
ts.keypath,
ts.redeem
FROM unnest(@records) AS r (script),
LATERAL
(SELECT script,
addr,
descriptor_metadata->>'derivation' derivation,
keypath,
descriptors_scripts_metadata->>'redeem' redeem,
descriptors_scripts_metadata->>'blindedAddress' blinded_addr
FROM nbxv1_keypath_info ki
WHERE ki.code=@code
AND ki.script=r.script) ts;

Issue Details
In my environment, the data distribution is as follows:
- BTC: 182,250 records
- LTC: 3,600 records
When filtering on the code and script values:
- BTC Query: Returns results in about 300ms.
- LTC Query: Returns results in around 20 seconds (despite LTC having far fewer records)
What’s Happening?
PostgreSQL’s query planner relies on up-to-date table statistics to generate efficient execution plans. These statistics are refreshed automatically by the auto-analyze process, which is triggered once enough rows have been modified since the last analyze. The number of modified rows needed to trigger it is:
total_rows * autovacuum_analyze_scale_factor + autovacuum_analyze_threshold
In our scenario, an auto-analyze is triggered after 72,090 BTC records have been inserted, just before the final BTC record is added. The statistics are therefore fresh for BTC, and any query planned with them performs optimally. The next auto-analyze would only occur after modifying approximately:
182,250 * 0.1 + 50 = 18,275 rows
However, after the BTC batch, only 3,600 LTC records are added. Since 3,600 is well below the 18,275 threshold, PostgreSQL does not trigger a new auto-analyze. As a result, the statistics for the LTC data remain outdated. This discrepancy causes the query planner to choose a suboptimal execution plan for queries filtering on code = 'LTC', leading to significantly slower performance (20 seconds).
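This can be confirmed on a live instance by checking how many rows have changed since the last analyze (a sketch, not part of the original report; n_mod_since_analyze requires PostgreSQL 13+, and n_live_tup is used only as an approximation of the row count the autovacuum daemon actually consults):

-- Rows modified since the last ANALYZE versus the auto-analyze trigger threshold.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_autoanalyze,
       n_live_tup * current_setting('autovacuum_analyze_scale_factor')::float8
         + current_setting('autovacuum_analyze_threshold')::int AS analyze_trigger_threshold
FROM pg_stat_user_tables
WHERE relname = 'wallets_scripts';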
It’s worth noting that after manually running ANALYZE wallets_scripts;, the updated statistics allowed the planner to generate an optimal plan, and query performance improved dramatically.
Reproduction Steps
- Run the following Python script to simulate the scenario:
import requests
import time

HEADERS = {"Content-Type": "application/json"}

def create_wallet(CRYPTO):
    data = {
        "accountNumber": 0,
        "scriptPubKeyType": "Legacy",
        "__scriptPubKeyType": "Taproot",
        "_scriptPubKeyType": "SegwitP2SH"
    }
    response = requests.post(f"http://localhost:32838/v1/cryptos/{CRYPTO}/derivations", json=data, headers=HEADERS)
    if response.status_code == 200:
        derivation_scheme = response.json().get("derivationScheme")
        if derivation_scheme:
            return derivation_scheme
        else:
            print("Error: No derivationScheme in response")
    else:
        print(f"Error creating wallet: {response.status_code} - {response.text}")
    return None

def main():
    for i in range(2025):
        print(f"Iteration {i+1}")
        create_wallet("btc")
        time.sleep(0.1)
    for i in range(40):
        print(f"Iteration {i+1}")
        create_wallet("ltc")
        time.sleep(0.1)

if __name__ == "__main__":
    main()
- Query the view with a filter on the code and script fields (code should be LTC, and the script does not have to exist; you can use a random value).
SELECT ts.script,
ts.addr,
ts.derivation,
ts.keypath,
ts.redeem
FROM unnest(ARRAY['76a914fbf7c85733d88415245534a9c0239b63d116660488ac']) AS r (script),
LATERAL
(SELECT script,
addr,
descriptor_metadata->>'derivation' derivation,
keypath,
descriptors_scripts_metadata->>'redeem' redeem,
descriptors_scripts_metadata->>'blindedAddress' blinded_addr
FROM nbxv1_keypath_info ki
WHERE ki.code='LTC'
AND ki.script=r.script) ts;

- Observe the performance difference based on the code filtering.
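One way to observe the difference (a hedged sketch, not part of the original report) is to wrap the filter on the view in EXPLAIN and compare the plan chosen for code = 'LTC' before and after a manual ANALYZE:

-- Inspect the plan the planner picks for the LTC filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ki.script, ki.addr, ki.keypath
FROM nbxv1_keypath_info ki
WHERE ki.code = 'LTC'
  AND ki.script = '76a914fbf7c85733d88415245534a9c0239b63d116660488ac';

-- Refresh statistics manually, then re-run the EXPLAIN above and compare plans.
ANALYZE wallets_scripts;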
Expected Behavior
Filtering on code should yield consistent performance regardless of the underlying data distribution. Ideally, the query planner should use up-to-date statistics to generate an optimal execution plan, even when the auto-analyze threshold is not met by the LTC records.
Actual Behavior
- BTC Queries: Benefit from a recent auto-analyze (triggered several times while the BTC records were being added, and one last time just before the final record), returning results quickly (around 300ms).
- LTC Queries: Suffer from outdated statistics because no auto-analyze runs after the BTC batch. This results in a suboptimal execution plan and much slower performance (around 3 to 10 seconds).
Environment
- NBXplorer Version: Latest
- PostgreSQL Version: Latest
- PostgreSQL Settings: Default
- Data Distribution: 182,250 BTC records and 3,600 LTC records
Possible Solution
I was able to improve query efficiency by replacing the view with a function taking code and script as parameters and filtering wallets_scripts first, instead of computing the view over every row. With this change the query time dropped from 20 seconds to about 300ms. I have created a PR which will be attached to this issue.
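For reference, a minimal sketch of that approach, assuming only the schema visible in the view definition above (the function name and column types are assumptions, and the actual PR may differ):

-- Sketch only: push the code/script filter into wallets_scripts before the joins,
-- so the planner works on a handful of rows instead of the whole view.
-- Column types below are assumptions; adjust to the real schema.
CREATE OR REPLACE FUNCTION nbxv1_keypath_info_by_script(in_code TEXT, in_script TEXT)
RETURNS TABLE(code TEXT, script TEXT, addr TEXT,
              descriptor_metadata JSONB, keypath TEXT,
              descriptors_scripts_metadata JSONB,
              wallet_id TEXT, idx BIGINT, used BOOLEAN, descriptor TEXT)
LANGUAGE SQL STABLE AS $$
  SELECT ws.code, ws.script, s.addr,
         d.metadata, nbxv1_get_keypath(d.metadata, ds.idx),
         ds.metadata, ws.wallet_id, ds.idx, ds.used, d.descriptor
  FROM wallets_scripts ws
  JOIN scripts s ON s.code = ws.code AND s.script = ws.script
  LEFT JOIN (wallets_descriptors wd
        JOIN descriptors_scripts ds ON ds.code = wd.code AND ds.descriptor = wd.descriptor
        JOIN descriptors d ON d.code = ds.code AND d.descriptor = ds.descriptor)
    ON wd.wallet_id = ws.wallet_id AND wd.code = ws.code AND ds.script = ws.script
  WHERE ws.code = in_code AND ws.script = in_script
$$;

The lateral query shown earlier would then call this function with @code and r.script instead of filtering the view.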
Thank you for your attention to this matter @NicolasDorier!