
pandas.to_sql fails when writing more than 256 cells to a SQL Warehouse with method="multi" #24

@paulstaab

Description


This is a duplicate of databricks/databricks-sql-python#300. I am reposting it in this repo, as it still occurs with the current versions of databricks-sqlalchemy~=1.0 (1.0.5) and databricks-sql-connector (4.0.2), together with the DBR 15.4 LTS ML versions of pandas (1.5.3) and sqlalchemy (1.4.39).

Problem

When using sqlalchemy's multi-row insert mode, each value currently gets its own named parameter in the generated query:

INSERT INTO default.test_table (numeric_col, string_col) 
VALUES 
   (:numeric_col_m0, :string_col_m0), 
   (:numeric_col_m1, :string_col_m1), 
   (:numeric_col_m2, :string_col_m2), 
   (:numeric_col_m3, :string_col_m3),
   [...]
   (:numeric_col_m998, :string_col_m998), 
   (:numeric_col_m999, :string_col_m999)

with parameters:

{
'numeric_col_m0': 0, 
'string_col_m0': 'AAA', 
'numeric_col_m1': 1,
'string_col_m1': 'AAA', 
'numeric_col_m2': 2,
[...], 
'numeric_col_m999': 999, 
'string_col_m999': 'AAA'
}

Consequently, even very small tables exceed the limit of 256 named parameters:

sqlalchemy.exc.OperationalError: (databricks.sql.exc.RequestError) Error during request to server: BAD_REQUEST: Parameterized query has too many parameters: 2000 parameters were given but the limit is 256.. BAD_REQUEST: Parameterized query has too many parameters: 2000 parameters were given but the limit is 256.
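
A rough back-of-the-envelope check (an illustration only, not connector code) shows why even small frames hit the limit, assuming one named parameter per cell as in the generated INSERT above:

n_rows, n_cols = 1_000, 2                   # shape of the example DataFrame below
param_limit = 256                           # limit reported by the SQL Warehouse

params_needed = n_rows * n_cols             # 2000 -> matches the error message above
max_rows_per_stmt = param_limit // n_cols   # only 128 two-column rows fit into one statement
print(params_needed, max_rows_per_stmt)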

Example

import pandas as pd 
from sqlalchemy import create_engine

sqlalchemy_connection_string = f"databricks://token:{token}@{host}?http_path={http_path}&catalog={catalog}"
engine = create_engine(sqlalchemy_connection_string)
test_data = pd.DataFrame({"numeric_col": range(1_000), "string_col": ["AAA"] * 1_000})
test_data.to_sql("test_table", engine, if_exists="replace", index=False, method="multi")

Workarounds

Inline Parameters

I tried to avoid this issue by using legacy inline parameters. However, I then get a different error:

sqlalchemy_connection_string = f"databricks://token:{token}@{host}?http_path={http_path}&catalog={catalog}"
engine = create_engine(sqlalchemy_connection_string, connect_args={"use_inline_params": "silent"})
test_data = pd.DataFrame({"numeric_col": range(1_000), "string_col": ["AAA"] * 1_000})
test_data.to_sql("test_table", engine, if_exists="replace", index=False, method="multi")
DatabaseError: (databricks.sql.exc.ServerOperationError) [UNBOUND_SQL_PARAMETER] Found the unbound parameter: numeric_col_m0. Please, fix `args` and provide a mapping of the parameter to either a SQL literal or collection constructor functions such as `map()`, `array()`, `struct()`. SQLSTATE: 42P02; line 1 pos 57

Writing row-by-row

The only workaround at the moment seems to be inserting the data row by row by not setting method in the to_sql() call (see the sketch below).
However, this is prohibitively slow even for medium-sized data frames:

[attached image: row-by-row insert timing]
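
For reference, the row-by-row variant is just the example call without method (a minimal sketch reusing engine and test_data from the example above):

# One INSERT statement per row instead of a single multi-row INSERT
test_data.to_sql("test_table", engine, if_exists="replace", index=False)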
