Remove hadoop-aws from default Spark configuration in Python and R libraries #2487

@johngrimes

Description

Is your feature request related to a problem? Please describe.
Currently, the org.apache.hadoop:hadoop-aws package is included in the default Spark configuration for the Python and R libraries. Most users do not need AWS S3 support, and bundling this dependency by default can add unnecessary complexity or dependency conflicts in environments that do not use it.

Describe the solution you'd like
Remove the org.apache.hadoop:hadoop-aws package from the default Spark configuration in both the Python and R libraries. Instead, users who require AWS S3 integration should be directed to follow the instructions provided in the documentation: https://pathling.csiro.au/docs/libraries/installation/spark

Describe alternatives you've considered

  • Keeping the package by default: not ideal, as it adds download and configuration overhead for users who do not need AWS support.
  • Making the dependency optional and documented: preferable, and achievable by providing clear instructions for users who want to add it themselves.

Additional context
Removing this dependency will simplify the default configuration and reduce the initial dependency download for most use cases. Users who require S3 support can still enable it by following the provided documentation link.
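For users who do need S3 support after the removal, the opt-in path might look something like the sketch below. This is an illustrative assumption rather than text from the linked documentation: the hadoop-aws version shown is a placeholder, and the commented PathlingContext line assumes the Pathling Python API accepts a pre-built session.

```python
# Hypothetical sketch: re-enabling S3 support by adding hadoop-aws to
# the Spark session yourself, after it is no longer bundled by default.
# The version "3.3.4" is illustrative; it should match the Hadoop
# version shipped with your Spark distribution.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
    .getOrCreate()
)

# The session can then be handed to Pathling, e.g.:
# pc = PathlingContext.create(spark)
```

The exact package coordinates and any additional S3A settings (credentials provider, endpoint, etc.) are covered by the documentation linked above.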

Metadata

Labels

dependencies (pull requests that update a dependency file)

Status

Done
