larger than memory queries? #61

@mckeown12

I'm hoping to use Athena with Dask, running queries that return 10-30 GB of results and then training some distributed ML algorithms on them. Any suggestions for concurrent or distributed I/O for such a task? I've been quite happy with the pandas cursor for smaller, local use, following the examples in the PyAthena documentation, but I still have no idea what I am actually doing. Does the pandas cursor do concurrent I/O, or is it limited to one core?
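For context, the pandas cursor pattern referred to above looks roughly like the following minimal sketch, adapted from the PyAthena documentation. The helper name `fetch_dataframe` and its parameters are hypothetical, introduced only for illustration, and the import path is the one used in recent PyAthena releases, so it may differ in older versions.

```python
def fetch_dataframe(query, s3_staging_dir, region_name):
    """Run an Athena query and return the whole result as one pandas DataFrame.

    Hypothetical helper for illustration; not part of PyAthena itself.
    """
    # Imported lazily so this module loads even where PyAthena is not installed.
    # Module path as in recent PyAthena releases; older releases may differ.
    from pyathena import connect
    from pyathena.pandas.cursor import PandasCursor

    cursor = connect(
        s3_staging_dir=s3_staging_dir,  # S3 location where Athena writes results
        region_name=region_name,
        cursor_class=PandasCursor,
    ).cursor()

    # as_pandas() returns the full result set as a single in-memory DataFrame,
    # which is why this pattern suits small results rather than 10-30 GB.
    return cursor.execute(query).as_pandas()
```

A call would look like `fetch_dataframe("SELECT * FROM example_table LIMIT 10", "s3://example-bucket/athena-results/", "us-west-2")`, where the table, bucket, and region are placeholders rather than real resources.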

I apologize in advance if this question belongs on some other forum. Let me know and I'll gladly move the conversation there. Thanks!
