Shouldn't the API client pass `stream=True` to `requests` when downloading datasets?

By default, `requests` reads the entire response body into memory, which can exhaust system RAM for large files, so I think the API client should pass `stream=True` to allow chunked downloads. The current implementation hardcodes `stream=None` (equivalent to `False`), and this can make the user's system unstable when downloading large datasets.

https://github.com/Kaggle/kaggle-api/blob/b97668bbeab7db7a51784275803cbb3eeec6d481/src/kagglesdk/kaggle_http_client.py#L123
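
To illustrate what I mean by a chunked download, here is a minimal sketch using plain `requests`. It is not the Kaggle client code: the names `save_streamed` and `download`, the 1 MiB chunk size, and the 60 second timeout are all my own assumptions for illustration.

```python
def save_streamed(response, outfile, chunk_size=1 << 20):
    """Write a response body to `outfile` one chunk at a time.

    `response` is anything exposing `iter_content(chunk_size=...)`,
    such as a `requests.Response` obtained with `stream=True`.
    Returns the number of bytes written.
    """
    written = 0
    with open(outfile, "wb") as fh:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written


def download(url, outfile, chunk_size=1 << 20):
    """Hypothetical helper: stream `url` to disk without buffering it all in RAM."""
    import requests  # third-party; imported here so save_streamed has no dependency

    # stream=True defers reading the body until iter_content() is consumed.
    with requests.get(url, stream=True, timeout=60) as response:
        response.raise_for_status()
        return save_streamed(response, outfile, chunk_size)
```

With `stream=True`, memory use stays around one chunk at a time instead of growing with the file size. Without it, the body is already fully loaded when the call returns, so iterating in chunks afterwards no longer saves memory.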

The `download_file` method in the `KaggleApi` class tries to support chunked downloads, but I am not sure this code works as expected: without `stream=True`, the download would already be complete (with the whole body in memory) by the time this point is reached.

https://github.com/Kaggle/kaggle-api/blob/b97668bbeab7db7a51784275803cbb3eeec6d481/src/kaggle/api/kaggle_api_extended.py#L2181

I also think that calling `kaggle.http_client()` outside the `with self.build_kaggle_client() as kaggle:` block, as the current code does, is not recommended, because resources managed by the `kaggle` object might already be closed once the `with` block exits.

```py
with self.build_kaggle_client() as kaggle:
  ...

download_file(..., kaggle.http_client(), ...)
```

For example:
https://github.com/Kaggle/kaggle-api/blob/b97668bbeab7db7a51784275803cbb3eeec6d481/src/kaggle/api/kaggle_api_extended.py#L1187-L1196



	with self.build_kaggle_client() as kaggle:
	  request = ApiDownloadDataFileRequest()
	  request.competition_name = competition
	  request.file_name = file_name
	  response = kaggle.competitions.competition_api_client.download_data_file(request)
	url = response.history[0].url
	outfile = os.path.join(effective_path, url.split('?')[0].split('/')[-1])

	if force or self.download_needed(response, outfile, quiet):
	  self.download_file(response, outfile, kaggle.http_client(), quiet, not force)
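
To make the concern about the `with` block concrete, here is a small self-contained sketch. `HypotheticalClient` is a stand-in I made up; I have not verified what the real object returned by `build_kaggle_client()` releases on exit, so treat this only as an illustration of the general pattern.

```python
class HypotheticalClient:
    """Stand-in for a client that releases its connection on __exit__."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True  # resources are released here
        return False  # do not swallow exceptions

    def http_client(self):
        if self.closed:
            raise RuntimeError("client used after its 'with' block exited")
        return self


def use_outside():
    with HypotheticalClient() as client:
        pass
    # The name `client` is still bound, but the object is already closed.
    return client.http_client()


def use_inside():
    with HypotheticalClient() as client:
        http = client.http_client()
        return http.closed  # evaluated while the block is still open
```

In this sketch `use_outside()` raises `RuntimeError`, while `use_inside()` succeeds. Moving the `download_file(...)` call inside the `with` block would avoid depending on how the client behaves after exit.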

Shouldn't API client pass `stream = True` to the `requests` when downloading datasets? #754

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Shouldn't API client pass stream = True to the requests when downloading datasets? #754

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Shouldn't API client pass `stream = True` to the `requests` when downloading datasets? #754