
Conversation

@geovalexis commented Jun 15, 2025

Motivation

Append mode is not a widely used feature in Azure Blob Storage, but it can be particularly useful for specific scenarios, such as writing logs or updating sentinel or marker files. As the native Azure Blob Storage SDK provides support for append mode, this PR integrates that functionality into the smart_open interface, making it more accessible and convenient for developers.

This implementation comes with a few limitations that are worth noting:

  • An append blob has a maximum size of roughly 195 GiB (50,000 blocks of up to 4 MiB each). This is a limitation of the Azure Blob Storage API itself and cannot be changed.
  • The blob must be created as an append blob in the first place, meaning that we cannot simply append to an existing blob that was created with a different blob type (e.g. block or page blob).

Given these limitations, append mode is not suitable for every use case and is not meant to be used indiscriminately by anyone looking to append data to a blob. Bearing them in mind, however, many developers may find it convenient, especially when they already use smart_open for other operations.
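In pseudo-code, the intended usage looks roughly like this (a sketch, not the PR's exact API surface; `client` is the transport param smart_open's Azure backend already uses, and the blob/container names are made up):

```
import smart_open

# assumed: 'ab' opens the blob as an AppendBlob, creating it if needed
with smart_open.open('azure://mycontainer/app.log', 'ab',
                     transport_params={'client': blob_service_client}) as f:
    f.write(b'worker started\n')
```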

Closes #836

Tests

New tests have been added to cover the new functionality, including:

  • Writing to a new blob in append mode
  • Appending to an existing blob in append mode
  • Attempting to append to a blob that was not created in append mode (which should raise an error)

Also, I took the chance to fix a few existing tests that were failing when using Azurite. The method BlobClient._stage_contents does not exist in the Azure Blob Storage SDK (or at least not in the latest versions), so I replaced it with BlobClient.get_block_list("uncommitted"), which returns the same information.

Checklist

  • Picked a concise, informative and complete title
  • Clearly explained the motivation behind the PR
  • Linked to any existing issues that your PR will be solving
  • Included tests for any new functionality
  • Checked that all unit tests pass

Comment on lines +983 to +999
    def test_append_compressed_gzip(self):
        """
        Does appending into an Azure Blob file work correctly when the file is compressed?
        We should be able to append into a compressed file. We will test this with a Gzip file.
        """
        expected = u'а не спеть ли мне песню... о любви'.encode('utf-8')
        blob_name = "test_append_gzip_%s" % BLOB_NAME

        with smart_open.azure.AppendWriter(CONTAINER_NAME, blob_name, CLIENT) as fout:
            with gzip.GzipFile(fileobj=fout, mode='w') as zipfile:
                zipfile.write(expected)

        with smart_open.azure.Reader(CONTAINER_NAME, blob_name, CLIENT) as fin:
            with gzip.GzipFile(fileobj=fin) as zipfile:
                actual = zipfile.read()

        self.assertEqual(expected, actual)
@ddelange (Collaborator) commented Jun 16, 2025

noting here that not all compression algorithms allow appending. gzip just adds a new header and 'restarts' compression without prior knowledge of the preceding bytes. we should be raising the appropriate errors for algorithms that don't, which might be nice to add a test case for as well.

that being said, could you refactor this test to use the top-level smart_open.open and the builtin (de)compression mechanism using multiple writes?

some pseudo-code:

with smart_open.open('azure://fname.txt.gz', 'wb') as fp:
    fp.write(expected)

with smart_open.open('azure://fname.txt.gz', 'ab') as fp:
    fp.write(expected)

with smart_open.open('azure://fname.txt.gz', 'rb') as fp:
    actual = fp.read()

self.assertEqual(actual, expected * 2)
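The gzip behaviour described above (each append starts a new member, and readers transparently concatenate members) can be demonstrated with the standard library alone. This is an illustrative sketch, independent of smart_open:

```python
import gzip
import io

buf = io.BytesIO()

# First write: a complete gzip member.
with gzip.GzipFile(fileobj=buf, mode='wb') as f:
    f.write(b'hello ')

# Append: gzip starts a new member (fresh header, fresh compression
# state) after the first one, with no knowledge of the prior bytes.
with gzip.GzipFile(fileobj=buf, mode='ab') as f:
    f.write(b'world')

buf.seek(0)
# The gzip reader transparently concatenates all members.
with gzip.GzipFile(fileobj=buf, mode='rb') as f:
    data = f.read()

print(data)  # b'hello world'
```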

Comment on lines +621 to +628
        # Uploads data as an AppendBlob type with automatic block chunking.
        # The AppendBlob is created first if it does not exist, or appended to if it does.
        return self._blob.upload_blob(
            data=b,
            blob_type=azure.storage.blob.BlobType.APPENDBLOB,
            overwrite=False,
            **self._blob_kwargs,
        )
Collaborator

can you implement the _min_part_size buffer mechanic, similar to the azure Writer class? this current implementation is unbuffered, and you'll quickly hit the maximum block count an AppendBlob supports if you do many small writes
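The buffering mechanic the reviewer asks for can be sketched in isolation. The class below is hypothetical (not smart_open's actual Writer implementation): small writes accumulate in memory and are flushed as one appended block once `min_part_size` bytes are buffered, keeping the block count low.

```python
import io


class BufferedAppender:
    """Sketch of a _min_part_size buffer for append writes.

    Hypothetical helper: `append_block` stands in for whatever call
    actually appends one block to the AppendBlob.
    """

    def __init__(self, append_block, min_part_size=4 * 1024 * 1024):
        self._append_block = append_block
        self._min_part_size = min_part_size
        self._buf = io.BytesIO()

    def write(self, b):
        # Accumulate in memory; only flush once the threshold is reached,
        # so many tiny writes become a single appended block.
        self._buf.write(b)
        if self._buf.tell() >= self._min_part_size:
            self._flush()
        return len(b)

    def _flush(self):
        if self._buf.tell():
            self._append_block(self._buf.getvalue())
            self._buf = io.BytesIO()

    def close(self):
        # Flush whatever remains, even if below the threshold.
        self._flush()


# Usage: five 2-byte writes produce a single 10-byte block.
blocks = []
w = BufferedAppender(blocks.append, min_part_size=10)
for _ in range(5):
    w.write(b'ab')
w.close()
# blocks == [b'ababababab']
```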

@ddelange (Collaborator) commented Jul 1, 2025

fyi: a packaging refactor and corresponding fixes for CI rot just merged into develop. below it mentions conflicts, but a local git pull upstream develop will resolve them without conflicts: it's a rename commit moving the tests folder one level up to the top-level directory.

Linked issue: [azure] Support for "append" mode for Azure Blobs