For a while, I’ve been annoyed by intermittent test failures in our tests around GZip compression for `FileStorage::S3`. Here’s a good example: https://app.circleci.com/pipelines/github/edgi-govdata-archiving/web-monitoring-db/2076/workflows/9e189bc7-d909-44bd-93a8-42df47719b2f/jobs/7501, where the output is something like:
```
Error:
FileStorage::S3Test#test_s3_storage_can_write_a_gzipped_stream:
WebMock::NetConnectNotAllowedError: Real HTTP connections are disabled. ...

You can stub this request with the following snippet:

stub_request(:put, "https://test-bucket.s3.us-west-2.amazonaws.com/something.txt").
  with(
    body: "\x1F\x8B\b\x00K\xC6~h\x00\x03\xF3H\xCD\xC9\xC9WH+\xCA\xCFU\b6V\x04\x00\x91-\xB5\xE6\x0E\x00\x00\x00",
    headers: { ... }).
  to_return(status: 200, body: "", headers: {})

registered request stubs:

stub_request(:put, "https://test-bucket.s3.us-west-2.amazonaws.com/something.txt").
  with(
    body: "\x1F\x8B\b\x00J\xC6~h\x00\x03\xF3H\xCD\xC9\xC9WH+\xCA\xCFU\b6V\x04\x00\x91-\xB5\xE6\x0E\x00\x00\x00",
    headers: { ... })

============================================================

lib/file_storage/s3.rb:55:in 'FileStorage::S3#save_file'
test/lib/file_storage/s3_test.rb:126:in 'block in <class:S3Test>'
```
Today I dug into it a bit and realized this happens because Ruby’s Zlib embeds the time of compression in the gzip header. Compressing the same text twice within the same second produces identical bytes, but across a second boundary the outputs differ (note the stubbed and actual bodies above differ by exactly one byte, `K` vs. `J`, in the header’s mtime field), so the stubbed body no longer matches and the test breaks.
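A quick way to see this with only the stdlib (a minimal demonstration, not project code — `gzip_at` is a throwaway helper that makes the timestamp explicit instead of relying on wall-clock timing):

```ruby
require 'zlib'
require 'stringio'

# Compress `text` with an explicit gzip header mtime, the field Zlib
# normally fills in with Time.now.
def gzip_at(text, mtime)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  gz.mtime = mtime  # must be set before any data is written
  gz.write(text)
  gz.close
  io.string
end

a = gzip_at('Hello from S3!', 100)
b = gzip_at('Hello from S3!', 101)
a == b  # => false — only the four mtime bytes in the header differ
```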
There are two ways to fix this for our tests:
1. Change the test to match on the `PUT` URL only and assert that the body decompresses correctly, instead of asserting that the compressed bodies are the same. The current tests (web-monitoring-db/test/lib/file_storage/s3_test.rb, lines 106 to 128 in 6868821):

   ```ruby
   test 's3 storage can write a gzipped file' do
     text = 'Hello from S3!'
     s3_put = stub_request(:put, 'https://test-bucket.s3.us-west-2.amazonaws.com/something.txt')
       .with(body: ActiveSupport::Gzip.compress(text),
             headers: { 'Content-Type' => 'text/plain', 'Content-Encoding' => 'gzip' })
       .to_return(status: 200, body: '', headers: {})

     storage = example_storage(gzip: true)
     storage.save_file('something.txt', text, content_type: 'text/plain')
     assert_requested(s3_put)
   end

   test 's3 storage can write a gzipped stream' do
     text = 'Hello from S3!'
     s3_put = stub_request(:put, 'https://test-bucket.s3.us-west-2.amazonaws.com/something.txt')
       .with(body: ActiveSupport::Gzip.compress(text),
             headers: { 'Content-Type' => 'text/plain', 'Content-Encoding' => 'gzip' })
       .to_return(status: 200, body: '', headers: {})

     storage = example_storage(gzip: true)
     storage.save_file('something.txt', StringIO.new(text), content_type: 'text/plain')
     assert_requested(s3_put)
   end
   ```
2. Change how we compress to set the `mtime` and `orig_name` attributes predictably, so we always get the same output for the same input bytes. Unfortunately, we currently use `ActiveSupport::Gzip.compress` to do the compression (web-monitoring-db/lib/file_storage/s3.rb, line 59 in 6868821):

   ```ruby
   body: @gzip ? ActiveSupport::Gzip.compress(content.try(:read) || content) : content,
   ```

   …and it doesn’t let you set those properties. We’d need to write our own compression helper.
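Option (1) boils down to comparing plaintext rather than compressed bytes. A stdlib-only sketch of that matcher idea (the `body_matches` lambda and `request_body` argument are illustrative stand-ins, not WebMock’s API):

```ruby
require 'zlib'

# Instead of comparing compressed bytes, decompress the request body and
# compare the plaintext. `request_body` stands in for what a WebMock
# `.with { |request| ... }` block would see in `request.body`.
body_matches = lambda do |request_body, expected_text|
  Zlib.gunzip(request_body) == expected_text
end

# The compressed bytes vary with the clock…
compressed = Zlib.gzip('Hello from S3!')
# …but the matcher accepts any compression of the right text.
body_matches.call(compressed, 'Hello from S3!')  # => true
```

In the actual test this could slot in as `stub_request(:put, url).with { |request| body_matches.call(request.body, text) }`, since WebMock’s `.with` accepts a block matcher.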
(1) is expedient, but (2) is probably better, since deterministic output might also let us write to S3 less frequently (or avoid overwriting objects whose content hasn’t changed).
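For (2), the helper could be small. A sketch, assuming we pin `mtime` to a fixed value and never set `orig_name` (the `deterministic_gzip` name is hypothetical, not existing project code):

```ruby
require 'zlib'
require 'stringio'

# Hypothetical replacement for ActiveSupport::Gzip.compress that zeroes
# the gzip header's mtime, so identical input bytes always produce
# identical output bytes.
def deterministic_gzip(content)
  io = StringIO.new
  gz = Zlib::GzipWriter.new(io)
  gz.mtime = 0  # fixed timestamp instead of the default Time.now
  gz.write(content)
  gz.close
  io.string
end

deterministic_gzip('Hello from S3!') == deterministic_gzip('Hello from S3!')  # => true
```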