Releases: alephdata/ingest-file

4.1.2

12 Jun 16:42
ef90ce5

This version contains a patch for a security vulnerability in ingest-file, the component that processes files uploaded to Aleph. We recommend that you update Aleph instances you operate to use the latest patched release of ingest-file.

Please find detailed information about the patched vulnerability below.

How to update

If you operate Aleph using Docker Compose, update the ingest-file service in your Docker Compose configuration to use the image ghcr.io/alephdata/ingest-file:4.1.2.

If you operate Aleph using the Helm chart, update the aleph.ingestfile.image.tag value to 4.1.2.

Summary

Previous versions of ingest-file handled 7zip archives containing symbolic links insecurely. When processing 7zip archives, ingest-file followed symbolic links even if they were targeting files outside of the archive. A maliciously crafted archive would allow an attacker to access arbitrary files in the ingest-file container.

Depending on the exact configuration and deployment method, this might include:

  • Access to files uploaded to Aleph, if you use the file archive rather than object storage such as S3 or Google Cloud Storage, because the file archive is mounted into the container.
  • Access to environment variables.
  • Access to secrets mounted into the container.

Affected versions

All versions of ingest-file prior to 4.1.2 (this release) are affected.

Solution

ingest-file 4.1.2 contains a patch for the security vulnerability. 7zip archives containing symbolic links are now validated, and any archive with a symbolic link pointing to a file outside of the archive is rejected.
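
For illustration only, the kind of check described above can be sketched in Python as follows. This is not the actual patch shipped in ingest-file. The function names `link_escapes_archive` and `validate_links` are hypothetical, and the sketch assumes that member paths inside the archive are already normalised, relative POSIX paths.

```python
import posixpath
from typing import Mapping


def link_escapes_archive(member_path: str, link_target: str) -> bool:
    """Return True if a symlink stored at member_path points outside the archive root.

    Hypothetical helper, not part of ingest-file. Assumes member_path is a
    normalised, relative POSIX path inside the archive.
    """
    # An absolute target always points outside the archive.
    if posixpath.isabs(link_target):
        return True
    # Resolve the target relative to the directory that contains the link.
    base = posixpath.dirname(member_path)
    resolved = posixpath.normpath(posixpath.join(base, link_target))
    # After normalisation, a leading ".." means the path climbed above the root.
    return resolved == ".." or resolved.startswith("../")


def validate_links(links: Mapping[str, str]) -> None:
    """Raise ValueError if any symlink (member path -> target) escapes the archive."""
    for member_path, link_target in links.items():
        if link_escapes_archive(member_path, link_target):
            raise ValueError(
                f"Unsafe symlink in archive: {member_path} -> {link_target}"
            )
```

The sketch only compares paths lexically. It does not handle chains of links, for example a link that passes through a directory which is itself a symbolic link, so a real implementation would need additional checks.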

Credits

OCCRP would like to thank everyone who identified this vulnerability and contributed to its resolution:

  • Responsibly disclosed by InterSecLab
  • Patch by Alex Ștefănescu
  • Research, Testing, Validation: Alex Ștefănescu, Simon Wörpel, Jan Strozyk, Friedrich Lindenberg

4.1.0

13 Mar 11:49
fa740d2

What's Changed

  • Update base image to python:3.9 based on debian bookworm by @stchris in #660

  • Add Workbook metadata to Table entities by @catileptic in #657

  • Improved audio format parsing unit test assertions in e8dd833

  • Allow parsing truncated image files in 05ffe34

  • Update checkout actions by @stchris in #658

  • Add more frequent keepalives to unreliable download in a99b0e6

Full Changelog: 4.0.2...4.1.0

4.0.2

23 Nov 12:52
03206c3

We're announcing the release of Aleph 4.0.2 (and ingest-file 4.0.2) and strongly recommend that users of the 4.x branches update to this release.

What's changed

  • Update to servicelayer 1.23.2, which fixes a significant performance regression that is especially noticeable when more than 10,000 tasks per dataset are being processed

Full Changelog: 4.0.1...4.0.2

4.0.1

07 Nov 10:24
1d50033

We're announcing the release of Aleph 4.0.1 (and ingest-file 4.0.1) and strongly recommend that users of the 4.x branches update to this release.

What's changed

Bugfix

  • Update to servicelayer 1.23.1, which fixes an issue with improper clean-up when a task exhausts its maximum number of retries

Other changes

  • chore: Remove project board action by @stchris in #656
  • chore: Post release announcements to Discourse by @stchris in #655

Full Changelog: 4.0.0...4.0.1

4.0.0

11 Oct 11:26
bd321ec

What's Changed

Other changes

Full Changelog: 3.22.0...4.0.0

3.22.0

27 May 15:22
c67c3fe

Note

Please note that we skipped version 3.21.0. The version immediately preceding this release is therefore 3.20.3.

What's Changed

Full Changelog: 3.20.3...3.22.0

3.20.3

22 Apr 12:44
0fb53b7

⚠️ This release fixes a security vulnerability in ingest-file, the component that handles files uploaded to Aleph. Please update your Aleph instance to the latest patched versions of Aleph and ingest-file. ⚠️

Please refer to the release notes for Aleph 3.15.6 for detailed information.

3.20.2

01 Mar 16:39
9cfd300

What's Changed

  • Fix TIFF processing by @catileptic in #587
    • There was an issue with some types of TIFF files not being properly previewed and OCRd
    • Extended test coverage to prevent regressions in OCR for gif, jpg, jp2, tiff, webp

Full Changelog: 3.20.1...3.20.2

3.20.1

21 Feb 15:08
31d1eb1

What's changed

  • Force installing tesserocr from source instead of using wheels because of sirfz/tesserocr#337. This fixes a regression which might have caused certain image file types to not have been OCRd.
  • Add a clear-cache command to the ingestors CLI, which allows one to clear the ingest cache. It also takes a prefix (for instance ocr: or pdf:).

Full Changelog: 3.20.0...3.20.1

3.20.0

22 Jan 14:09
59733eb

What's Changed

Full Changelog: 3.19.3...3.20.0