-
|
I'm trying to create an icechunk repo on the Pangeo-EOSC S3-compatible MinIO storage, but I get back:
IcechunkError: x error writing object to object store service error
|
| context:
| 0: icechunk::storage::s3::write_snapshot
| with id=1CECHNKREP0F1RSTCMT0
| at icechunk/src/storage/s3.rs:653
| 1: icechunk::asset_manager::write_snapshot
| at icechunk/src/asset_manager.rs:201
| 2: icechunk::repository::create
| at icechunk/src/repository.rs:146
|
|-> error writing object to object store service error
|-> service error
|-> unhandled error (AccessDenied)
`-> Error { code: "AccessDenied", message: "There were headers present in the request which were not signed", s3_extended_request_id: "dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8", aws_request_id: "18566CF4F5F556CB" }
What should I try next? |
-
|
Hello @rsignell, thank you for reporting this. We do most of our testing using MinIO and we have never seen this. Do you have any proxies between you and MinIO? |
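One quick client-side check for the proxy question is to list the proxy-related environment variables visible to your process. This is only a sketch using the standard library: the helper name `find_proxy_settings` is made up for illustration, and it cannot detect transparent proxies or gateways that sit in the network path.

```python
import os
from typing import Dict, Mapping, Optional

# Conventional proxy variable names (compared case-insensitively).
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def find_proxy_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return the proxy-related environment variables that are set and non-empty."""
    env = os.environ if environ is None else environ
    return {
        name: value
        for name, value in env.items()
        if name.lower() in PROXY_VARS and value
    }


# An empty dict means no proxy is configured through the environment.
print(find_proxy_settings())
```

An empty result does not rule out a reverse proxy in front of the MinIO endpoint itself; that would have to be confirmed with whoever operates the service.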
-
|
Although I'm able to create storage on the Pangeo-EOSC storage, I'm still struggling to create an icechunk repo. I'm close, but my code here:

import icechunk
import os


def open_or_create_icechunk_repo(
    bucket_name: str,
    store_name: str,
    endpoint_url: str,
    region: str = "us-west-2"
):
    """
    Opens or creates an icechunk repository on an S3-compatible storage endpoint.

    This function assumes that AWS credentials (AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY) are available as environment variables.

    Args:
        bucket_name (str): The name of the S3 bucket.
        store_name (str): A unique name for your data store within the bucket.
            This will be used as a prefix.
        endpoint_url (str): The full URL of the S3-compatible endpoint.
        region (str, optional): The AWS region. Defaults to "us-west-2".

    Returns:
        icechunk.Repository: An initialized repository object, or None if an
            error occurs.
    """
    print(f"Attempting to open/create repo '{store_name}' in bucket '{bucket_name}'...")
    print(f"Endpoint: {endpoint_url}, Region: {region}")
    try:
        # 1. Configure the main storage backend for repository metadata.
        #    We pass the endpoint_url here and specify that credentials
        #    should be sourced from the environment.
        storage = icechunk.s3_storage(
            bucket=bucket_name,
            prefix=f"icechunk/{store_name}",
            from_env=True,  # Use AWS credentials from environment variables
            endpoint_url=endpoint_url,
            region=region
        )
        # 2. Set up the general repository configuration.
        config = icechunk.RepositoryConfig.default()
        # 3. Configure the virtual chunk container for the actual data chunks.
        #    It's crucial to also provide the endpoint_url here so the repository
        #    knows where to read/write the data chunks.
        s3_chunk_store = icechunk.s3_store(
            region=region,
            endpoint_url=endpoint_url
        )
        config.set_virtual_chunk_container(
            icechunk.VirtualChunkContainer(
                f"s3://{bucket_name}/",  # The protocol and host it handles
                s3_chunk_store
            )
        )
        # 4. Assemble all configuration options for creation.
        repo_config = {
            "storage": storage,
            "config": config,
        }
        # 5. Initialize the repository. This is the correct method for creation.
        #    Note: This will likely raise an error if the repository already exists.
        #    A more robust implementation would be to try opening first, and only
        #    initialize if that fails.
        print("Attempting to initialize repository...")
        repo = icechunk.Repository.create(**repo_config)
        print("Successfully initialized icechunk repository.")
        return repo
    except Exception as e:
        # A common error here might be that the repository already exists.
        # We can try to open it in that case.
        if "already exists" in str(e):  # This is a guess at the error message
            try:
                print("Repository seems to exist. Attempting to open...")
                # If initialization fails because it exists, we open it.
                # Note: We only need the 'storage' object to open.
                repo = icechunk.Repository.open(storage)
                print("Successfully opened existing repository.")
                return repo
            except Exception as open_e:
                print(f"Failed to open existing repository: {open_e}")
                return None
        else:
            print(f"An error occurred while creating the repository: {e}")
            return None


# --- Example Usage ---
# Before running, make sure to set your environment variables, for example:
# export AWS_ACCESS_KEY_ID='YOUR_ACCESS_KEY'
# export AWS_SECRET_ACCESS_KEY='YOUR_SECRET_KEY'
if __name__ == "__main__":
    # Replace with your actual bucket, store name, and endpoint URL
    MY_BUCKET = "rsignell4-protocoast"
    MY_STORE_NAME = "scientific-dataset-1"
    MY_ENDPOINT_URL = "https://pangeo-eosc-minioapi.vm.fedcloud.eu"
    # Check if credentials are set (basic check)
    if not os.getenv("AWS_ACCESS_KEY_ID") or not os.getenv("AWS_SECRET_ACCESS_KEY"):
        print("Error: AWS credentials are not set in the environment.")
    else:
        icechunk_repo = open_or_create_icechunk_repo(
            bucket_name=MY_BUCKET,
            store_name=MY_STORE_NAME,
            endpoint_url=MY_ENDPOINT_URL
        )
        if icechunk_repo:
            # You can now work with the repository object
            print("Repository is ready to use.")
            # For example, you can check its state:
            # print(f"Repository state: {icechunk_repo.state()}")

is returning: @paraseba do you see my issue? |
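The comment in step 5 of the listing suggests trying to open first and creating only if that fails. A minimal sketch of that pattern follows, written against plain callables so it runs without icechunk or a live endpoint. The function name `open_or_create` and its two parameters are illustrative assumptions, not part of the icechunk API.

```python
from typing import Callable, TypeVar

Repo = TypeVar("Repo")


def open_or_create(open_repo: Callable[[], Repo], create_repo: Callable[[], Repo]) -> Repo:
    """Try to open an existing repository; create one only if opening fails."""
    try:
        return open_repo()
    except Exception as open_error:
        # Any failure to open falls through to creation (see the caveat below).
        print(f"Open failed ({open_error}); creating a new repository instead.")
        return create_repo()
```

With the objects from the listing, the two callables would presumably be `lambda: icechunk.Repository.open(storage)` and `lambda: icechunk.Repository.create(storage=storage, config=config)`. Catching every exception is a blunt choice: a credentials or network failure during open would also trigger a create attempt, so a narrower exception type is preferable if the library exposes one.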
-
|
Woohoo @paraseba, that was it!

import icechunk
import os


def open_or_create_icechunk_repo(
    bucket_name: str,
    store_name: str,
    endpoint_url: str,
    region: str = "us-west-2"
):
    """
    Opens or creates an icechunk repository on an S3-compatible storage endpoint.

    This function assumes that AWS credentials (AWS_ACCESS_KEY_ID and
    AWS_SECRET_ACCESS_KEY) are available as environment variables.

    Args:
        bucket_name (str): The name of the S3 bucket.
        store_name (str): A unique name for your data store within the bucket.
            This will be used as a prefix.
        endpoint_url (str): The full URL of the S3-compatible endpoint.
        region (str, optional): The AWS region. Defaults to "us-west-2".

    Returns:
        icechunk.Repository: An initialized repository object, or None if an
            error occurs.
    """
    print(f"Attempting to open/create repo '{store_name}' in bucket '{bucket_name}'...")
    print(f"Endpoint: {endpoint_url}, Region: {region}")
    try:
        # 1. Configure the main storage backend for repository metadata.
        #    We pass the endpoint_url here and specify that credentials
        #    should be sourced from the environment.
        storage = icechunk.s3_storage(
            bucket=bucket_name,
            prefix=f"icechunk/{store_name}",
            from_env=True,  # Use AWS credentials from environment variables
            endpoint_url=endpoint_url,
            force_path_style=True,
            region=region
        )
        # 2. Set up the general repository configuration.
        config = icechunk.RepositoryConfig.default()
        # 3. Configure the virtual chunk container for the actual data chunks.
        #    It's crucial to also provide the endpoint_url here so the repository
        #    knows where to read/write the data chunks.
        s3_chunk_store = icechunk.s3_store(
            region=region,
            force_path_style=True,
            endpoint_url=endpoint_url
        )
        config.set_virtual_chunk_container(
            icechunk.VirtualChunkContainer(
                f"s3://{bucket_name}/",  # The protocol and host it handles
                s3_chunk_store
            )
        )
        # 4. Assemble all configuration options for creation.
        repo_config = {
            "storage": storage,
            "config": config,
        }
        # 5. Initialize the repository. This is the correct method for creation.
        #    Note: This will likely raise an error if the repository already exists.
        #    A more robust implementation would be to try opening first, and only
        #    initialize if that fails.
        print("Attempting to initialize repository...")
        repo = icechunk.Repository.create(**repo_config)
        print("Successfully initialized icechunk repository.")
        return repo
    except Exception as e:
        # A common error here might be that the repository already exists.
        # We can try to open it in that case.
        if "already exists" in str(e):  # This is a guess at the error message
            try:
                print("Repository seems to exist. Attempting to open...")
                # If initialization fails because it exists, we open it.
                # Note: We only need the 'storage' object to open.
                repo = icechunk.Repository.open(storage)
                print("Successfully opened existing repository.")
                return repo
            except Exception as open_e:
                print(f"Failed to open existing repository: {open_e}")
                return None
        else:
            print(f"An error occurred while creating the repository: {e}")
            return None


# --- Example Usage ---
# Before running, make sure to set your environment variables, for example:
# export AWS_ACCESS_KEY_ID='YOUR_ACCESS_KEY'
# export AWS_SECRET_ACCESS_KEY='YOUR_SECRET_KEY'
if __name__ == "__main__":
    # Replace with your actual bucket, store name, and endpoint URL
    MY_BUCKET = "rsignell4-protocoast"
    MY_STORE_NAME = "scientific-dataset-1"
    MY_ENDPOINT_URL = "https://pangeo-eosc-minioapi.vm.fedcloud.eu"
    # Check if credentials are set (basic check)
    if not os.getenv("AWS_ACCESS_KEY_ID") or not os.getenv("AWS_SECRET_ACCESS_KEY"):
        print("Error: AWS credentials are not set in the environment.")
    else:
        icechunk_repo = open_or_create_icechunk_repo(
            bucket_name=MY_BUCKET,
            store_name=MY_STORE_NAME,
            endpoint_url=MY_ENDPOINT_URL
        )
        if icechunk_repo:
            # You can now work with the repository object
            print("Repository is ready to use.")
            # For example, you can check its state:
            # print(f"Repository state: {icechunk_repo.state()}")

produces: |
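The visible difference between the failing and the working listing is `force_path_style=True` on both `s3_storage` and `s3_store`. As a rough illustration of what path-style addressing means in general S3 terms (this is not icechunk's internal code; the helper `object_url` and the object key `icechunk/demo` are hypothetical), the two addressing styles put the bucket name in different parts of the URL:

```python
from urllib.parse import urlsplit, urlunsplit


def object_url(endpoint_url: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the URL an S3 client would request for `key` in `bucket`."""
    parts = urlsplit(endpoint_url)
    if path_style:
        # Path-style: the bucket is the first path segment.
        netloc, path = parts.netloc, f"/{bucket}/{key}"
    else:
        # Virtual-hosted-style: the bucket becomes a subdomain of the endpoint.
        netloc, path = f"{bucket}.{parts.netloc}", f"/{key}"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


endpoint = "https://pangeo-eosc-minioapi.vm.fedcloud.eu"
bucket = "rsignell4-protocoast"

print(object_url(endpoint, bucket, "icechunk/demo", path_style=True))
# https://pangeo-eosc-minioapi.vm.fedcloud.eu/rsignell4-protocoast/icechunk/demo
print(object_url(endpoint, bucket, "icechunk/demo", path_style=False))
# https://rsignell4-protocoast.pangeo-eosc-minioapi.vm.fedcloud.eu/icechunk/demo
```

Self-hosted S3-compatible services are often reachable only through the path-style form, since the virtual-hosted form needs a DNS entry per bucket. That may be why the request failed here without the flag, although the thread does not confirm the root cause.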
-
|
Here's the working code I ended up with. It creates icechunk storage on an S3-compatible MinIO endpoint (Pangeo-EOSC) and stores references for a virtual dataset whose data chunks are read from NetCDF3 files in S3-compatible Ceph storage (Open Storage Network):
https://nbviewer.org/gist/rsignell/673d031bd27a73a600348bbd6a431d04 |