Commit d1a8fbf

Merge pull request #3 from JJ11teen/dev
Add get_blindly feature

2 parents: 12c7c78 + f430a80

13 files changed: +168, -58 lines

.devcontainer/devcontainer.json
Lines changed: 1 addition & 1 deletion

@@ -1,7 +1,7 @@
 // For format details, see https://aka.ms/devcontainer.json. For config options, see the README at:
 // https://github.com/microsoft/vscode-dev-containers/tree/v0.177.0/containers/python-3
 {
-    "name": "Cloud Mappings Development",
+    "name": "cloud-mappings Development",
     "build": {
         "dockerfile": "Dockerfile",
         "context": "..",

README.md
Lines changed: 38 additions & 3 deletions

@@ -74,14 +74,49 @@ del cm["key"]
 "key" in cm # returns false
 ```
 
-### Cloud Sync
+### Etags
 
-Each `cloud-mapping` keeps an internal dict of [etags](https://en.wikipedia.org/wiki/HTTP_ETag) which it uses to ensure it is only reading/overwriting/deleting data it expects to. If the value in storage is not what the `cloud-mapping` expects, a `cloudmappings.errors.KeySyncError()` will be thrown. If you know what you are doing and want your operation to go through anyway, you will need to sync your `cloud-mapping` with the cloud by calling either `.sync_with_cloud()` to sync all keys or `.sync_with_cloud(key)` to sync a specific key. By default `.sync_with_cloud()` is called on instantiation of a `cloud-mapping` if the underlying provider storage already exists. You may skip this initial sync by passing an additional `sync_initially=False` parameter when you instantiate your `cloud-mapping`.
+Each `cloud-mapping` keeps an internal dict of [etags](https://en.wikipedia.org/wiki/HTTP_ETag) which it uses to ensure it is only reading/overwriting/deleting data it expects to. If the value in storage is not what the `cloud-mapping` expects, a `cloudmappings.errors.KeySyncError()` will be thrown.
+
+If you would like to enable get (read) operations without ensuring etags, you can set `get_blindly=True`. This can be set in the constructor, or dynamically turned on and off directly on the `cloud-mapping` instance. Blindly getting a value that doesn't exist in the cloud will return `None`.
+
+If you know what you are doing and you want an operation other than get to go through despite etags, you will need to sync your `cloud-mapping` with the cloud by calling either `.sync_with_cloud()` to sync all keys or `.sync_with_cloud(key_prefix)` to sync a specific key or subset of keys. By default `.sync_with_cloud()` is called on instantiation of a `cloud-mapping` if the underlying provider storage already exists. You may skip this initial sync by passing an additional `sync_initially=False` parameter when you instantiate your `cloud-mapping`.
 
 ### Serialisation
 
 If you don't call `.with_pickle()` and instead pass your providers configuration directly to the `CloudMapping` class, you will get a "raw" `cloud-mapping` which accepts only byte-likes as values. Along with the `.with_pickle()` serialisation utility, `.with_json()` and `.with_json_zlib()` also exist.
 
 You may build your own serialisation either using [zict](https://zict.readthedocs.io/en/latest/); or by calling `.with_buffers([dumps_1, dumps_2, ..., dumps_N], [loads_1, loads_2, ..., loads_N])`, where `dumps` and `loads` are the ordered functions to serialise and parse your data respectively.
 
-[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+
+
+
+# Development
+
+[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+This project uses `.devcontainer` to describe the environment to use for development. You may use the environment described in this directory (it integrates automatically with vscode's 'remote containers' extension), or you may create your own environment with the same dependencies.
+
+## Dependencies
+Install development dependencies with:
+
+`pip install .[azureblob,azuretable,gcpstorage,awss3,tests]`
+
+## Tests
+Set environment variables for each provider:
+* Azure Blob: `AZURE_TENANT_ID`, `AZURE_CLIENT_ID`, `AZURE_CLIENT_SECRET`
+* Azure Table: `AZURE_TABLE_STORAGE_CONNECTION_STRING`
+* GCP Storage: `GOOGLE_APPLICATION_CREDENTIALS` (path to credentials file)
+* AWS S3: `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`
+
+Run tests with:
+```bash
+pytest
+--azure_blob_storage_account_url <azure-blob-storage-account-url>
+--azure_table
+--gcp_storage_project <gcp-project-id>
+--aws_s3
+--test_container_id <unique-test-run-id>
+```
+You can turn on/off tests for individual providers by including/excluding their parameters in the above command. `--test_container_id` is always required.
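For context, here is a minimal usage sketch of the `get_blindly` behaviour documented in the README hunk above. It assumes `provider` is an already-configured `StorageProvider` instance, and the import path is inferred from this repository's file layout rather than shown in the diff:

```python
from cloudmappings.cloudstoragemapping import CloudMapping

# `provider` stands in for any configured StorageProvider (Azure Blob/Table, GCP Storage, AWS S3)
cm = CloudMapping(storage_provider=provider, sync_initially=False, get_blindly=True)

cm["some-key"]                     # blind get: no etag check; returns None if the key doesn't exist
cm.get_blindly = False             # the flag can also be toggled directly on the instance
cm.sync_with_cloud("some-prefix")  # re-sync etags for a specific key or subset of keys
cm.sync_with_cloud()               # or re-sync every key
```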

setup.cfg
Lines changed: 2 additions & 3 deletions

@@ -1,7 +1,6 @@
 [metadata]
-# replace with your username:
 name = cloud-mappings
-version = 0.8.0
+version = 0.9.0
 author = Lucas Sargent
 author_email = lucas.sargent@eliiza.com.au
 description = MutableMapping interfaces for common cloud storage providers
@@ -26,7 +25,7 @@ install_requires =
 
 [options.extras_require]
 azureblob = azure-identity==1.6.0; azure-storage-blob==12.8.1
-azuretable = azure-identity==1.6.0; azure-data-tables==12.0.0b7
+azuretable = azure-identity==1.6.0; azure-data-tables==12.0.0
 gcpstorage = google-cloud-storage==1.38.0
 awss3 = boto3==1.17.73
 tests = pytest==6.2.4

src/cloudmappings/cloudstoragemapping.py
Lines changed: 10 additions & 7 deletions

@@ -11,9 +11,11 @@ def __init__(
         self,
         storage_provider: StorageProvider,
         sync_initially: bool = True,
+        get_blindly: bool = False,
     ) -> None:
         self._storage_provider = storage_provider
         self._etags = {}
+        self.get_blindly = get_blindly
         if self._storage_provider.create_if_not_exists() and sync_initially:
             self.sync_with_cloud()
 
@@ -22,12 +24,12 @@ def _encode_key(self, unsafe_key: str) -> str:
             raise TypeError("Key must be of type 'str'. Got key:", unsafe_key)
         return self._storage_provider.encode_key(unsafe_key=unsafe_key)
 
-    def sync_with_cloud(self, key: str = None) -> None:
-        prefix_key = self._encode_key(key) if key is not None else None
+    def sync_with_cloud(self, key_prefix: str = None) -> None:
+        key_prefix = None if key_prefix is None else self._encode_key(key_prefix)
         self._etags.update(
             {
                 self._storage_provider.decode_key(k): i
-                for k, i in self._storage_provider.list_keys_and_etags(prefix_key).items()
+                for k, i in self._storage_provider.list_keys_and_etags(key_prefix).items()
             }
         )
 
@@ -36,13 +38,13 @@ def etags(self):
         return self._etags
 
     def __getitem__(self, key: str) -> bytes:
-        if key not in self._etags:
+        if not self.get_blindly and key not in self._etags:
             raise KeyError(key)
-        return self._storage_provider.download_data(key=self._encode_key(key), etag=self._etags[key])
+        return self._storage_provider.download_data(
+            key=self._encode_key(key), etag=None if self.get_blindly else self._etags[key]
+        )
 
     def __setitem__(self, key: str, value: bytes) -> None:
-        if not isinstance(value, bytes):
-            raise ValueError("Value must be bytes like")
         self._etags[key] = self._storage_provider.upload_data(
             key=self._encode_key(key),
             etag=self._etags.get(key, None),
@@ -84,6 +86,7 @@ def with_buffers(cls, input_buffers, output_buffers, *args, **kwargs) -> "CloudMapping":
 
         mapping.sync_with_cloud = raw_mapping.sync_with_cloud
         mapping.etags = raw_mapping.etags
+        mapping.get_blindly = raw_mapping.get_blindly
         return mapping
 
     @classmethod
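Two caller-visible effects of the mapping changes above, sketched with `cm` standing in for any configured raw (byte-valued) `cloud-mapping`: reads with `get_blindly` set skip the local etag dict and pass `etag=None` to the provider, and the bytes-type check on writes now happens in each provider's `upload_data` (see the provider diffs below) rather than in `__setitem__`:

```python
cm["key"] = b"some bytes"  # uploads as before; the provider returns a fresh etag that is cached locally

cm["key"] = "not bytes"    # still raises ValueError("Data must be bytes like"),
                           # but the error now originates in the provider's upload_data

cm.get_blindly = True
cm["unknown-key"]          # __getitem__ no longer raises KeyError here; the provider is called
                           # with etag=None and returns None if the key has no data in the cloud
```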

src/cloudmappings/storageproviders/awss3.py
Lines changed: 4 additions & 0 deletions

@@ -71,9 +71,13 @@ def download_data(self, key: str, etag: str) -> bytes:
         body, existing_etag, _ = self._get_body_etag_version_id_if_exists(key)
         if etag is not None and (body is None or etag != existing_etag):
             self.raise_key_sync_error(key=key, etag=etag)
+        if body is None:
+            return None
         return body.read()
 
     def upload_data(self, key: str, etag: str, data: bytes) -> str:
+        if not isinstance(data, bytes):
+            raise ValueError("Data must be bytes like")
         _, existing_etag, _ = self._get_body_etag_version_id_if_exists(key)
         if etag != existing_etag:
             self.raise_key_sync_error(key=key, etag=etag)

src/cloudmappings/storageproviders/azureblobstorage.py
Lines changed: 23 additions & 9 deletions

@@ -2,7 +2,7 @@
 import json
 
 from azure.core import MatchConditions
-from azure.core.exceptions import ResourceExistsError, ResourceModifiedError
+from azure.core.exceptions import ResourceExistsError, ResourceModifiedError, ResourceNotFoundError
 from azure.storage.blob import ContainerClient
 from azure.identity import DefaultAzureCredential
 
@@ -39,21 +39,35 @@ def create_if_not_exists(self):
         return False
 
     def download_data(self, key: str, etag: str) -> bytes:
+        args = dict(blob=key)
+        if etag is not None:
+            args.update(
+                dict(
+                    etag=etag,
+                    match_condition=MatchConditions.IfNotModified,
+                )
+            )
         try:
-            return self._container_client.download_blob(
-                blob=key,
-                etag=etag,
-                match_condition=MatchConditions.IfNotModified if etag is not None else None,
-            ).readall()
+            return self._container_client.download_blob(**args).readall()
         except ResourceModifiedError as e:
             self.raise_key_sync_error(key=key, etag=etag, inner_exception=e)
+        except ResourceNotFoundError as e:
+            if etag is None:
+                return None
+            self.raise_key_sync_error(key=key, etag=etag, inner_exception=e)
 
     def upload_data(self, key: str, etag: str, data: bytes) -> str:
+        if not isinstance(data, bytes):
+            raise ValueError("Data must be bytes like")
         expecting_blob = etag is not None
-        args = {"overwrite": expecting_blob}
+        args = dict(overwrite=expecting_blob)
         if expecting_blob:
-            args["etag"] = etag
-            args["match_condition"] = MatchConditions.IfNotModified
+            args.update(
+                dict(
+                    etag=etag,
+                    match_condition=MatchConditions.IfNotModified,
+                )
+            )
         bc = self._container_client.get_blob_client(blob=key)
         try:
             response = bc.upload_blob(

src/cloudmappings/storageproviders/azuretablestorage.py
Lines changed: 17 additions & 9 deletions

@@ -3,7 +3,7 @@
 from urllib.parse import quote, unquote
 
 from azure.core import MatchConditions
-from azure.core.exceptions import ResourceExistsError, HttpResponseError
+from azure.core.exceptions import ResourceExistsError, HttpResponseError, ResourceNotFoundError
 from azure.data.tables import TableClient, UpdateMode
 
 from .storageprovider import StorageProvider
@@ -16,7 +16,7 @@ def _chunk_bytes(data: bytes) -> Dict[str, bytes]:
 
 
 def _dechunk_entity(entity: Dict[str, bytes]) -> bytes:
-    return b"".join([v.value for k, v in entity.items() if k.startswith("d_")])
+    return b"".join([v for k, v in entity.items() if k.startswith("d_")])
 
 
 class AzureTableStorageProvider(StorageProvider):
@@ -59,15 +59,23 @@ def create_if_not_exists(self):
         return False
 
     def download_data(self, key: str, etag: str) -> bytes:
-        entity = self._table_client.get_entity(
-            partition_key=key,
-            row_key="cm",
-        )
-        if etag is not None and etag != entity.metadata["etag"]:
-            self.raise_key_sync_error(key=key, etag=etag)
-        return _dechunk_entity(entity)
+        try:
+            entity = self._table_client.get_entity(
+                partition_key=key,
+                row_key="cm",
+            )
+        except ResourceNotFoundError as e:
+            if etag is None:
+                return None
+            self.raise_key_sync_error(key=key, etag=etag, inner_exception=e)
+        else:
+            if etag is not None and etag != entity.metadata["etag"]:
+                self.raise_key_sync_error(key=key, etag=etag)
+            return _dechunk_entity(entity)
 
     def upload_data(self, key: str, etag: str, data: bytes) -> str:
+        if not isinstance(data, bytes):
+            raise ValueError("Data must be bytes like")
         entity = {
             "PartitionKey": key,
             "RowKey": "cm",

src/cloudmappings/storageproviders/googlecloudstorage.py
Lines changed: 5 additions & 1 deletion

@@ -49,13 +49,17 @@ def download_data(self, key: str, etag: str) -> bytes:
             blob_name=key,
         )
         existing_etag = self._parse_etag(b)
-        if etag != existing_etag:
+        if etag is not None and etag != existing_etag:
             self.raise_key_sync_error(key=key, etag=etag)
+        if b is None:
+            return None
         return b.download_as_bytes(
             if_generation_match=b.generation,
         )
 
     def upload_data(self, key: str, etag: str, data: bytes) -> str:
+        if not isinstance(data, bytes):
+            raise ValueError("Data must be bytes like")
         b = self._bucket.get_blob(
             blob_name=key,
         )

src/cloudmappings/storageproviders/storageprovider.py
Lines changed: 1 addition & 1 deletion

@@ -49,7 +49,7 @@ def download_data(self, key: str, etag: str) -> bytes:
     @abstractmethod
     def upload_data(self, key: str, etag: str, data: bytes) -> str:
         """Upload data to cloud storage. Raise KeyCloudSyncError if etag does not match the latest
-        version in the cloud.
+        version in the cloud. Raise ValueError if data is not bytes.
         :param etag: Expected etag if key already exists. Otherwise None
         :return: Etag of newly uploaded data, as str.
         """

tests/cleanup_resources.py
Lines changed: 9 additions & 7 deletions

@@ -11,16 +11,18 @@
 buckets = client.list_buckets()
 
 for bucket in buckets["Buckets"]:
-    s3_bucket = s3.Bucket(bucket["Name"])
-    s3_bucket.objects.all().delete()
-    s3_bucket.object_versions.delete()
-    s3_bucket.delete()
-    print(f"Deleted s3 bucket {bucket['Name']}")
+    if bucket["Name"].startswith("pytest"):
+        s3_bucket = s3.Bucket(bucket["Name"])
+        s3_bucket.objects.all().delete()
+        s3_bucket.object_versions.delete()
+        s3_bucket.delete()
+        print(f"Deleted s3 bucket {bucket['Name']}")
 
 storage_client = storage.Client()
 buckets = storage_client.list_buckets()
 for bucket in buckets:
-    bucket.delete(force=True)
-    print(f"Deleted gcp bucket {bucket.name}")
+    if bucket.name.startswith("pytest"):
+        bucket.delete(force=True)
+        print(f"Deleted gcp bucket {bucket.name}")
 
 print(f"Not deleting Azure containers")
