Description
NBXplorer is working amazingly well overall, but I’ve encountered a performance issue with the nbxv1_keypath_info view under specific conditions.
The view is defined as follows:
CREATE OR REPLACE VIEW nbxv1_keypath_info AS
SELECT ws.code,
ws.script,
s.addr,
d.metadata AS descriptor_metadata,
nbxv1_get_keypath(d.metadata, ds.idx) AS keypath,
ds.metadata AS descriptors_scripts_metadata,
ws.wallet_id,
ds.idx,
ds.used,
d.descriptor
FROM ((wallets_scripts ws
JOIN scripts s ON (((s.code = ws.code) AND (s.script = ws.script))))
LEFT JOIN ((wallets_descriptors wd
JOIN descriptors_scripts ds ON (((ds.code = wd.code) AND (ds.descriptor = wd.descriptor))))
JOIN descriptors d ON (((d.code = ds.code) AND (d.descriptor = ds.descriptor)))) ON (((wd.wallet_id = ws.wallet_id) AND (wd.code = ws.code) AND (ds.script = ws.script))));

This view is queried with a WHERE clause that filters by code and script, along with additional predicates, here:
SELECT ts.script,
ts.addr,
ts.derivation,
ts.keypath,
ts.redeem
FROM unnest(@records) AS r (script),
LATERAL
(SELECT script,
addr,
descriptor_metadata->>'derivation' derivation,
keypath,
descriptors_scripts_metadata->>'redeem' redeem,
descriptors_scripts_metadata->>'blindedAddress' blinded_addr
FROM nbxv1_keypath_info ki
WHERE ki.code=@code
AND ki.script=r.script) ts;

Issue Details
In my environment, the data distribution is as follows:
- BTC: 182,250 records
- LTC: 3,600 records
When filtering on the code and script values:
- BTC Query: Returns results in about 300ms.
- LTC Query: Returns results in around 20 seconds (despite LTC having far fewer records)
What’s Happening?
PostgreSQL’s query planner relies on up-to-date table statistics to generate efficient execution plans. These statistics are refreshed automatically by the auto-analyze process, which is triggered once enough rows have been modified since the last analyze. The number of modified rows needed to trigger it is:
total_rows * autovacuum_analyze_scale_factor + autovacuum_analyze_threshold
In our scenario, an auto-analyze is triggered after 72,090 BTC records have been inserted, just before the final BTC record is added. The statistics are therefore fresh for BTC, and any query planned with them performs optimally. The next auto-analyze would only occur after modifying approximately:
182,250 * 0.1 + 50 = 18,275 rows
However, after the BTC batch, only 3,600 LTC records are added. Since 3,600 is well below the 18,275 threshold, PostgreSQL does not trigger a new auto-analyze. As a result, the statistics for the LTC data remain outdated. This discrepancy causes the query planner to choose a suboptimal execution plan for queries filtering on code = 'LTC', leading to significantly slower performance (20 seconds).
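This can be confirmed on a live instance by checking how many rows have changed since the last analyze (a sketch, not part of the original report; n_mod_since_analyze requires PostgreSQL 13+, and n_live_tup is used only as an approximation of the row count the autovacuum daemon actually consults):

-- Rows modified since the last ANALYZE versus the auto-analyze trigger threshold.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_autoanalyze,
       n_live_tup * current_setting('autovacuum_analyze_scale_factor')::float8
         + current_setting('autovacuum_analyze_threshold')::int AS analyze_trigger_threshold
FROM pg_stat_user_tables
WHERE relname = 'wallets_scripts';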
It’s worth noting that after manually running ANALYZE wallets_scripts;, the updated statistics allowed the planner to generate an optimal plan, and query performance improved dramatically.
Reproduction Steps
- Run the following Python script to simulate the scenario:
import requests
import time

HEADERS = {"Content-Type": "application/json"}

def create_wallet(CRYPTO):
    data = {
        "accountNumber": 0,
        "scriptPubKeyType": "Legacy",
        "__scriptPubKeyType": "Taproot",
        "_scriptPubKeyType": "SegwitP2SH"
    }
    response = requests.post(f"http://localhost:32838/v1/cryptos/{CRYPTO}/derivations", json=data, headers=HEADERS)
    if response.status_code == 200:
        derivation_scheme = response.json().get("derivationScheme")
        if derivation_scheme:
            return derivation_scheme
        else:
            print("Error: No derivationScheme in response")
    else:
        print(f"Error creating wallet: {response.status_code} - {response.text}")
    return None

def main():
    for i in range(2025):
        print(f"Iteration {i+1}")
        create_wallet("btc")
        time.sleep(0.1)
    for i in range(40):
        print(f"Iteration {i+1}")
        create_wallet("ltc")
        time.sleep(0.1)

if __name__ == "__main__":
    main()
- Query the view with a filter on the code and script fields (code should be LTC, and the script does not have to exist; you can use a random value).
SELECT ts.script,
ts.addr,
ts.derivation,
ts.keypath,
ts.redeem
FROM unnest(ARRAY['76a914fbf7c85733d88415245534a9c0239b63d116660488ac']) AS r (script),
LATERAL
(SELECT script,
addr,
descriptor_metadata->>'derivation' derivation,
keypath,
descriptors_scripts_metadata->>'redeem' redeem,
descriptors_scripts_metadata->>'blindedAddress' blinded_addr
FROM nbxv1_keypath_info ki
WHERE ki.code='LTC'
AND ki.script=r.script) ts;

- Observe the performance difference based on the code filtering.
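One way to observe the difference (a hedged sketch, not part of the original report) is to wrap the filter on the view in EXPLAIN and compare the plan chosen for code = 'LTC' before and after a manual ANALYZE:

-- Inspect the plan the planner picks for the LTC filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ki.script, ki.addr, ki.keypath
FROM nbxv1_keypath_info ki
WHERE ki.code = 'LTC'
  AND ki.script = '76a914fbf7c85733d88415245534a9c0239b63d116660488ac';

-- Refresh statistics manually, then re-run the EXPLAIN above and compare plans.
ANALYZE wallets_scripts;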
Expected Behavior
Filtering on code should yield consistent performance regardless of the underlying data distribution. Ideally, the query planner should use up-to-date statistics to generate an optimal execution plan, even when the auto-analyze threshold is not met by the LTC records.
Actual Behavior
- BTC Queries: Benefit from a recent auto-analyze (triggered several times while the BTC records were being added, and one last time just before the final record), returning results quickly (around 300ms).
- LTC Queries: Suffer from outdated statistics because no auto-analyze runs after the BTC batch. This results in a suboptimal execution plan and much slower performance (around 3 to 10 seconds).
Environment
- NBXplorer Version: Latest
- PostgreSQL Version: Latest
- PostgreSQL Settings: Default
- Data Distribution: 182,250 BTC records and 3,600 LTC records
Possible Solution
I was able to improve query efficiency by replacing the view with a function taking code and script as parameters and filtering wallets_scripts first, instead of computing the view over every row. With this change the query time dropped from 20 seconds to about 300ms. I have created a PR which will be attached to this issue.
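For reference, a minimal sketch of that approach, assuming only the schema visible in the view definition above (the function name and column types are assumptions, and the actual PR may differ):

-- Sketch only: push the code/script filter into wallets_scripts before the joins,
-- so the planner works on a handful of rows instead of the whole view.
-- Column types below are assumptions; adjust to the real schema.
CREATE OR REPLACE FUNCTION nbxv1_keypath_info_by_script(in_code TEXT, in_script TEXT)
RETURNS TABLE(code TEXT, script TEXT, addr TEXT,
              descriptor_metadata JSONB, keypath TEXT,
              descriptors_scripts_metadata JSONB,
              wallet_id TEXT, idx BIGINT, used BOOLEAN, descriptor TEXT)
LANGUAGE SQL STABLE AS $$
  SELECT ws.code, ws.script, s.addr,
         d.metadata, nbxv1_get_keypath(d.metadata, ds.idx),
         ds.metadata, ws.wallet_id, ds.idx, ds.used, d.descriptor
  FROM wallets_scripts ws
  JOIN scripts s ON s.code = ws.code AND s.script = ws.script
  LEFT JOIN (wallets_descriptors wd
        JOIN descriptors_scripts ds ON ds.code = wd.code AND ds.descriptor = wd.descriptor
        JOIN descriptors d ON d.code = ds.code AND d.descriptor = ds.descriptor)
    ON wd.wallet_id = ws.wallet_id AND wd.code = ws.code AND ds.script = ws.script
  WHERE ws.code = in_code AND ws.script = in_script
$$;

The lateral query shown earlier would then call this function with @code and r.script instead of filtering the view.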
Thank you for your attention to this matter @NicolasDorier!