Add support for Google Cloud Storage #262

Open
wants to merge 11 commits into
base: main

Conversation

kgodlewski
Contributor

@kgodlewski kgodlewski commented Jun 4, 2025

Summary by Sourcery

Add support for Google Cloud Storage by returning provider information from the file‐URL fetch API, routing uploads to provider‐specific implementations, and updating tests and CI to cover both Azure and GCP.

New Features:

  • Support uploading files to Google Cloud Storage using signed URLs
  • Extend fetch_file_storage_urls to return storage provider along with URL

Enhancements:

  • Dispatch file uploads to provider-specific functions (upload_to_azure or upload_to_gcp)
  • Implement resumable, chunked GCP uploads with backoff and retry logic
  • Update end-to-end download logic to use generic signed URL endpoint and HTTP requests
  • Configure pytest for async tests and adjust test fixtures for multiple providers
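
The provider dispatch described in the enhancements above could look roughly like the sketch below. It is illustrative only, not the PR's code: the two uploader functions are hypothetical stubs (the real `upload_to_azure` and `upload_to_gcp` perform network transfers and may take different arguments), and the `path -> (provider, url)` mapping is an assumed shape based on the description of `fetch_file_storage_urls`.

```python
from typing import Callable, Dict, Tuple


# Hypothetical stand-ins for the PR's upload_to_azure / upload_to_gcp.
# They only return a marker string so the routing can be observed.
def upload_to_azure(local_path: str, url: str) -> str:
    return f"azure:{local_path}"


def upload_to_gcp(local_path: str, url: str) -> str:
    return f"gcp:{local_path}"


# Registry mapping a provider name to its upload function.
UPLOADERS: Dict[str, Callable[[str, str], str]] = {
    "azure": upload_to_azure,
    "gcp": upload_to_gcp,
}


def dispatch_upload(local_path: str, provider: str, url: str) -> str:
    """Route one upload to the provider-specific function, or fail loudly."""
    try:
        uploader = UPLOADERS[provider]
    except KeyError:
        raise ValueError(f"Unsupported storage provider: {provider!r}") from None
    return uploader(local_path, url)


# Assumed result shape of fetch_file_storage_urls: path -> (provider, url).
storage_urls: Dict[str, Tuple[str, str]] = {
    "a.bin": ("azure", "https://example.invalid/azure-signed"),
    "b.bin": ("gcp", "https://example.invalid/gcp-signed"),
}

results = {
    path: dispatch_upload(path, provider, url)
    for path, (provider, url) in storage_urls.items()
}
```

A lookup table keeps the unsupported-provider case in a single place; an `if`/`elif` chain would work equally well for two providers.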

CI:

  • Extend GitHub Actions e2e workflow matrix to run against both AZURE and GCP targets and streamline Python version matrix

Tests:

  • Parameterize and extend unit tests to cover both Azure and GCP upload flows and error handling
  • Add tests to verify correct provider dispatch and retry behavior for temporary versus terminal errors


sourcery-ai bot commented Jun 4, 2025

Reviewer's Guide

This PR integrates Google Cloud Storage alongside Azure by refactoring the upload flow into provider‐specific functions, enriching the signed URL API with provider metadata, and updating tests and CI to support multi‐provider scenarios.

File-Level Changes

Change: Introduce multi‐provider dispatch in FileUploaderThread
Details:
  • Extract (provider, url) from fetch_file_storage_urls and pass into _upload_file
  • Change _upload_file signature to accept provider and route to upload_to_azure or upload_to_gcp
  • Raise error on unsupported provider
Files:
  • src/neptune_scale/sync/sync_process.py

Change: Refactor generic upload into provider-specific modules
Details:
  • Remove generic upload_file function
  • Implement upload_to_azure with retry and backoff
  • Add upload_to_gcp module with chunked, resumable transfers and retry logic
Files:
  • src/neptune_scale/sync/sync_process.py
  • src/neptune_scale/sync/google_storage.py

Change: Extend signed‐URL handling to include provider info
Details:
  • Change fetch_file_storage_urls to return dict of path -> (provider, url)
  • Update ApiClient to use signed_url_generic endpoint
  • Adjust e2e download flow to use HTTP GET instead of BlobClient
Files:
  • src/neptune_scale/sync/sync_process.py
  • src/neptune_scale/net/api_client.py
  • tests/e2e/test_fetcher/files.py

Change: Parameterize and refactor unit tests for multi‐provider
Details:
  • Add pytest fixture to loop over 'azure' and 'gcp'
  • Replace mock_upload_file with provider‐aware mock_upload_func
  • Update assertions to validate calls per provider and error classification
Files:
  • tests/unit/test_sync_process.py

Change: Update CI matrix and project configuration for Azure/GCP
Details:
  • Reduce supported Python versions and add env_target matrix for AZURE/GCP
  • Use dynamic secret names per provider and allow self-signed certificates
  • Add pytest asyncio configuration in pyproject.toml
Files:
  • .github/workflows/tests-e2e.yml
  • pyproject.toml
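
For the chunked, resumable GCP transfers listed above, the chunk boundaries can be planned without any network code. In the Google Cloud Storage resumable upload protocol, each chunk is sent with a `Content-Range: bytes START-END/TOTAL` header, and every chunk except the last must be a multiple of 256 KiB. The sketch below only computes those ranges; `plan_chunks` and its default chunk size are hypothetical and are not taken from `google_storage.py`.

```python
from typing import Iterator, Tuple

# Non-final chunks of a GCS resumable upload must be multiples of 256 KiB.
CHUNK_MULTIPLE = 256 * 1024


def plan_chunks(
    total_size: int, chunk_size: int = 8 * CHUNK_MULTIPLE
) -> Iterator[Tuple[int, int, str]]:
    """Yield (offset, length, content_range) for each chunk of a file.

    Empty files are out of scope for this sketch: total_size == 0 yields nothing.
    """
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    start = 0
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1  # inclusive end offset
        yield start, end - start + 1, f"bytes {start}-{end}/{total_size}"
        start = end + 1


# Example: a 600 KiB file split into 256 KiB chunks.
plan = list(plan_chunks(600 * 1024, chunk_size=CHUNK_MULTIPLE))
```

Because each entry carries its own offset and length, a failed chunk can be re-sent with the same header on retry, without first querying the server for its resume position.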

@kgodlewski kgodlewski force-pushed the dev/google-storage branch 25 times, most recently from 8705a7b to f327fc3 Compare June 6, 2025 11:35
@kgodlewski kgodlewski marked this pull request as ready for review June 6, 2025 11:36
@kgodlewski kgodlewski requested review from pitercl and michalsosn and removed request for pitercl June 6, 2025 11:36
@kgodlewski kgodlewski force-pushed the dev/google-storage branch 4 times, most recently from e0cc9f5 to e0a4a6c Compare June 6, 2025 14:32
@kgodlewski kgodlewski force-pushed the dev/google-storage branch from e0a4a6c to ea4b793 Compare June 6, 2025 15:24
NEPTUNE_E2E_CUSTOM_RUN_ID: ${{ vars.E2E_CUSTOM_RUN_ID }}
NEPTUNE_FILE_API_ENABLED: ${{ vars.NEPTUNE_FILE_API_ENABLED }}
NEPTUNE_ALLOW_SELF_SIGNED_CERTIFICATE: true
Member

Is this temporary, or is it supposed to stay?

Contributor Author

This is temporary until the test instance is finalized.

freezegun
numpy
neptune-api @ git+https://github.com/neptune-ai/neptune-api.git@dev/storage-v2
Member

Reminder: to be removed.

@kgodlewski kgodlewski force-pushed the dev/google-storage branch 3 times, most recently from 26cdab3 to 1082c48 Compare June 9, 2025 15:38
Get rid of querying resume position on error: just reupload chunks
@kgodlewski kgodlewski force-pushed the dev/google-storage branch from 1082c48 to 5698574 Compare June 9, 2025 15:40
@kgodlewski kgodlewski requested a review from pitercl June 9, 2025 15:42
@kgodlewski kgodlewski requested a review from pitercl June 11, 2025 13:19
@kgodlewski kgodlewski force-pushed the dev/google-storage branch 4 times, most recently from 3665b65 to 8f521e4 Compare June 11, 2025 14:52
@kgodlewski kgodlewski force-pushed the dev/google-storage branch from 8f521e4 to a915ab0 Compare June 11, 2025 17:48
2 participants