build: replace rockylinux with chainguard/wolfi as a base image #423
Conversation
Looks good! My main comment is that I don't think we need the download packages step now, since all of that gets installed in the base images. We can remove that from `unstructured` now as well.
.github/workflows/ci.yml
Outdated
@@ -110,5 +110,12 @@ jobs:
    - name: Test Dockerfile
      run: |
        source .venv/bin/activate
        make docker-dl-packages
If we pull from `base-images`, I don't think we'll need this anymore.
.github/workflows/docker-publish.yml
Outdated
@@ -68,6 +68,7 @@ jobs:
# Clear some space (https://github.com/actions/runner-images/issues/2840)
sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/share/boost

make docker-dl-packages
Same here on download packages.
.gitignore
Outdated
@@ -145,3 +145,6 @@ celerybeat-schedule.db

# temporarily generated files by project-specific Makefile
tmp*

# APK packages for the docker build
docker-packages/*
Same here on download packages.
CHANGELOG.md
Outdated
@@ -1,3 +1,7 @@
## 0.0.71-dev0

* replace rockylinux with chainguard/wolfie as a base image for `amd64`
Small nit: `wolfi` instead of `wolfie`.
Makefile
Outdated
@@ -64,6 +64,10 @@ DOCKER_IMAGE ?= pipeline-family-${PIPELINE_FAMILY}-dev:latest
docker-build:
	PIP_VERSION=${PIP_VERSION} PIPELINE_FAMILY=${PIPELINE_FAMILY} PIPELINE_PACKAGE=${PIPELINE_PACKAGE} ./scripts/docker-build.sh

.PHONY: docker-dl-packages
Same here on download packages.
scripts/docker-dl-packages.sh
Outdated
@@ -0,0 +1,22 @@
#!/bin/bash
Same here on download packages.
LGTM!
…indows (#2)

## PR Summary

1. Merged changes from upstream.
2. Update `unstructuredio_api.spec`.
3. Update `unstructuredio_api.py`.
4. Add additional setup dependencies to the `docs/Windows.md`.

* build(deps): version bumps for maintenance (Unstructured-IO#424)

### Summary

Version bumps for regular maintenance and to address moderate CVEs from security scans.

- bump `unstructured` to `0.14.6`
- bump `unstructured-inference` to `0.7.35`

* build: replace rockylinux with chainguard/wolfi as a base image (Unstructured-IO#423)

### Summary

Updates the Dockerfile to use the Chainguard wolfi-base image to reduce CVEs. Also adds a step in the docker publish job that scans the images and checks for CVEs before publishing.

### Testing

Run `make docker-build` and `make docker-start-api`, then try:

```
from unstructured.partition.api import partition_via_api

elements = partition_via_api(
    filename=filename,
    api_url="http://localhost:8000/general/v0/general",
    api_key="<API-KEY>",
    strategy="hi_res",
)
print("\n\n".join([str(el) for el in elements]))
```

* fix: build and push workflow failing due to missing `-f` option `buildx build` command (Unstructured-IO#425)

I noticed that images on main branch are failing to build (and push) due to missing `-f` parameter in `docker buildx build`.
By default it expects `Dockerfile` to exist, but we only have `Dockerfile-amd64` and `Dockerfile-arm64`

---------

Co-authored-by: christinestraub <christinemstraub@gmail.com>

* fix: update SHA for the base images (both architectures) after `base-images` repo update (Unstructured-IO#427)

build and publish CI steps are failing, because the base images have changed in quay (their SHAs)

* fix: revert to rockylinux SHA that works (arm64) (Unstructured-IO#428)

unnecessary SHA update introduced in Unstructured-IO#427 that needs to be reverted

* fix: re-add `DOCKER_IMAGE` env var in `Test image` step (Unstructured-IO#429)

shell syntax error occurs in docker-publish.yml workflow

* fix: invalid env var setting in `docker-publish` workflow (Unstructured-IO#430)

bug introduced in previous PR causing build failure on main

* fix: `docker-publish` workflow failing on main due to inexisting `ARCH` env var (Unstructured-IO#431)

* build(deps): bump dependency versions (Unstructured-IO#434)

### Summary

Bumps dependency versions for the API. Closes Unstructured-IO#432.

* fix/Fix MS Office filetype errors and harden docker smoketest (Unstructured-IO#436)

# Changes

**Fix for docx and other office files returning `{"detail":"File type None is not supported."}`**

After moving to the wolfi base image, the `mimetypes` lib no longer knows about these file extensions. To avoid issues like this, let's add an explicit mapping for all the file extensions we care about. I added a `filetypes.py` and moved `get_validated_mimetype` over. When this file is imported, we'll call `mimetypes.add_type` for all file extensions we support.

**Update smoke test coverage**

This bug snuck past because we were already providing the mimetype in the docker smoke test. I updated `test_happy_path` to test against the container with and without passing `content_type`. I added some missing filetypes, and sorted the test params by extension so we can see when new types are missing.

# Testing

The new smoke test will verify that all filetypes are working. You can also `make docker-build && make docker-start-api`, and test out the docx in the sample docs dir. On `main`, this file will give you the error above.

```
curl 'http://localhost:8000/general/v0/general' \
  --form 'files=@"fake.docx"'
```

* merge main; validated format: xml, txv, csv, xml, json, html, docs, docx, ppt, pptx, xlsx, xls, pdf
* compilable setting
* update Windows markdown
* Disable debug mode in `unstructuredio_api.spec`
* Enable pdf test case in `test_app.py`

---------

Co-authored-by: Christine Straub <christinemstraub@gmail.com>
Co-authored-by: Michał Martyniak <64484917+micmarty-deepsense@users.noreply.github.com>
Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
Co-authored-by: Austin Walker <austin@unstructured.io>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
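For readers following the MS Office filetype fix mentioned in the commit message above (Unstructured-IO#436), here is a minimal sketch of the idea of registering extensions explicitly with `mimetypes.add_type` at import time. It is not the project's actual `filetypes.py`: the `EXTRA_MIMETYPES` mapping and the `register_mimetypes` and `guess_mimetype` helpers are hypothetical names, and the extension list is only an illustrative subset.

```python
import mimetypes

# Hypothetical subset of extensions; the real mapping in the project's
# filetypes.py may contain different or additional entries.
EXTRA_MIMETYPES = {
    ".doc": "application/msword",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".ppt": "application/vnd.ms-powerpoint",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xls": "application/vnd.ms-excel",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def register_mimetypes(mapping=EXTRA_MIMETYPES):
    """Register each extension so lookups do not depend on the OS mime database."""
    for extension, mime_type in mapping.items():
        mimetypes.add_type(mime_type, extension)


# Run at import time, mirroring the "when this file is imported" behavior
# described in the commit message.
register_mimetypes()


def guess_mimetype(filename):
    """Return the guessed MIME type for a filename, or None if unknown."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type
```

With the mapping registered, `guess_mimetype("fake.docx")` no longer relies on whatever mime database the base image ships, which is the failure mode the commit message attributes to the move to wolfi.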
Summary
Updates the Dockerfile to use the Chainguard wolfi-base image to reduce CVEs. Also adds a step in the docker publish job that scans the images and checks for CVEs before publishing.
Testing
Run `make docker-build` and `make docker-start-api`, then try: