Simplify Joins on Shared Column Name

Currently joining two dataframes on a column of the same name generates ambiguity/duplicates the column.
```python
ctx = SessionContext()
left_df = ctx.from_pydict({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
right_df = ctx.from_pydict({"id": [2, 3, 4], "city": ["New York", "London", "Paris"]})

joined_df = left_df.join(right_df, on="id")
joined_df.select(df.col("id"))
```
Error:
```
Exception: DataFusion error: ...snip..."column 'id' is ambiguous"
...snip..."possible column <hash_1>.id"...snip...possible column <hash_2>.id"...
```

**Describe the solution you'd like**
It would be nice to either:
1.) Have a pyspark like approach where some explicit differentiator in the api drops the duplicate `joined_df = left_df.join(right_df, on=["id"])`
2.) Have a means to refer to the specific column that is unambiguous since internally that is tracked `joined_df = left_df.join(right_df, on="id").drop(joined_df["id"])`

**Describe alternatives you've considered**
This works just is a bit verbose
`joined_df = left_df.join(right_df.with_column_renamed("id", "not_id"), left_on="id", right_on="not_id").drop("not_id")`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Simplify Joins on Shared Column Name #1173

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Simplify Joins on Shared Column Name #1173

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions