212 commits
fa2aff7
feat(deployment): Migrate package orchestration to Docker Compose (re…
junhaoliao Aug 20, 2025
32ea452
Add garbage collector service to Docker Compose configuration
junhaoliao Aug 22, 2025
217c63f
fix s3 support and various issues
junhaoliao Aug 24, 2025
c229fe0
lint
junhaoliao Aug 24, 2025
a850f69
add Docker Compose launch
junhaoliao Aug 24, 2025
1763e17
Merge branch 'main' into docker-compose
junhaoliao Aug 24, 2025
b283231
remove unused import
junhaoliao Aug 24, 2025
743268c
add proper stop support; reduce stop_grace_period from 10s to 3s
junhaoliao Aug 24, 2025
ee1b9f2
de-duplicate docker-compose.yml
junhaoliao Aug 24, 2025
ec0cfa5
replace hardcoded image name with environment variable for container …
junhaoliao Aug 24, 2025
ab68c25
remove FIXME
junhaoliao Aug 24, 2025
934a83c
remove FIXME
junhaoliao Aug 24, 2025
5bf23c2
Merge branch 'main' into docker-compose
junhaoliao Aug 24, 2025
83cc9d1
reformat
junhaoliao Aug 24, 2025
82abc07
reformat
junhaoliao Aug 24, 2025
0fb2294
Update garbage collector logs directory mapping
junhaoliao Aug 24, 2025
c8ffb94
Remove unused component argument parsers
junhaoliao Aug 24, 2025
83e902a
Remove unused component argument parsers
junhaoliao Aug 24, 2025
5f2e5cd
Refactor dependency checks to include docker-compose status validation
junhaoliao Aug 24, 2025
edfa9c9
Refactor log directory handling to use constant path definitions
junhaoliao Aug 24, 2025
7b3965e
Add constants for archive and stream directory paths
junhaoliao Aug 24, 2025
cd84be8
remove unused component groups and functions
junhaoliao Aug 24, 2025
aa12bdb
Remove unused CONTROLLER_TARGET_NAME constant from start_clp.py
junhaoliao Aug 24, 2025
5365722
fix staging dirs
junhaoliao Aug 24, 2025
21ef703
fix: update command to check if Docker Compose is running
junhaoliao Aug 24, 2025
d2cdfbc
add AWS env credentials support
junhaoliao Aug 24, 2025
4f56709
Merge branch 'main' into docker-compose
junhaoliao Aug 25, 2025
f0db07f
Update container image name in start_clp.py
junhaoliao Aug 26, 2025
655600d
Merge remote-tracking branch 'origin/main' into docker-compose
junhaoliao Sep 3, 2025
5f24ce7
add support for configurable CLP WebUI rate limiting
junhaoliao Sep 3, 2025
3df20fc
update WebUI server path in start_clp.py and docker-compose configura…
junhaoliao Sep 3, 2025
c6f81ad
copy docker-compose.yml in package task
junhaoliao Sep 3, 2025
db9c20f
use absolute paths in archive and stream storage configurations
junhaoliao Sep 3, 2025
ea03e17
refactor: centralize environment variable management and enhance vali…
junhaoliao Sep 4, 2025
60994ee
fix: use List[str] type hint for command parameter in start_clp.py
junhaoliao Sep 4, 2025
7e25d75
refactor: remove `dump_to_env_vars_dict` methods and centralize envir…
junhaoliao Sep 4, 2025
3e24e4e
lint
junhaoliao Sep 4, 2025
cbb9ce1
refactor: modularize and simplify start_clp.py by introducing DockerC…
junhaoliao Sep 4, 2025
3eb8dfe
refactor: remove obsolete node-specific directory configuration comments
junhaoliao Sep 4, 2025
669fa9c
refactor: remove redundant `conf_dir` parameter and use centralized c…
junhaoliao Sep 4, 2025
acee071
refactor: rename `controllers` module to `controller` and update impo…
junhaoliao Sep 4, 2025
3c45cfa
refactor: make `validate_log_directory` private and update references…
junhaoliao Sep 4, 2025
ed20110
refactor: rename `conf_dir` to `_conf_dir` and update references to r…
junhaoliao Sep 4, 2025
4d1f5aa
refactor: make `get_ip_from_hostname` private and update all references
junhaoliao Sep 4, 2025
3a698bb
refactor: extract `transform_for_container_config` method to simplify…
junhaoliao Sep 4, 2025
cca84f4
refactor: centralize and simplify path definitions and container conf…
junhaoliao Sep 4, 2025
537398a
refactor: streamline container service configuration with `transform_…
junhaoliao Sep 4, 2025
42442b8
fix imports
junhaoliao Sep 4, 2025
a3288ae
reorganize imports
junhaoliao Sep 4, 2025
5b36779
fix: adjust volume path formatting in docker-compose
junhaoliao Sep 4, 2025
93882af
fix: correct typo in comment for scheduler healthcheck in docker-compose
junhaoliao Sep 4, 2025
a4d8e1f
refactor: unify volume path definitions and remove redundant staging …
junhaoliao Sep 4, 2025
63e6d72
remove comment
junhaoliao Sep 4, 2025
0531a7b
feat: implement `stop` method in `DockerComposeController` and update…
junhaoliao Sep 4, 2025
f93aec7
refactor: reorder volume definitions in docker-compose for consistenc…
junhaoliao Sep 4, 2025
6a20628
revert error message in start_clp.py
junhaoliao Sep 4, 2025
0935658
refactor: standardize logger messages from "Initializing" to "Provisi…
junhaoliao Sep 4, 2025
ad6192a
remove comment
junhaoliao Sep 4, 2025
9469db4
docs: update multi-node deployment guide and add Docker Compose desig…
junhaoliao Sep 4, 2025
110e9fa
docs: refine Docker Compose design doc for clarity and consistency
junhaoliao Sep 4, 2025
045dde6
refactor: update CLP container configuration to use execution contain…
junhaoliao Sep 4, 2025
fa87d92
refactor(controller): move directory creation logic from `_provision`…
junhaoliao Sep 4, 2025
b6ac2c6
feat(controller): add function to dump shared container configuration…
junhaoliao Sep 4, 2025
bfd8c7b
lint
junhaoliao Sep 4, 2025
2b7959b
refactor(clp-py-utils): update staging directory handling in S3 stora…
junhaoliao Sep 4, 2025
f9eb88c
docs(design): Enhance Docker Compose design doc with diagrams and det…
junhaoliao Sep 4, 2025
0f5bd45
fix title case
junhaoliao Sep 4, 2025
0656928
Update references to docker-compose.yml to docker-compose.yaml.
junhaoliao Sep 4, 2025
66dcbb2
feat(docker): add project name to docker-compose and enhance running …
junhaoliao Sep 10, 2025
09ef298
fix lint
junhaoliao Sep 10, 2025
d3b6a67
fix(taskfile): update default task to `docker-images:package`
junhaoliao Sep 10, 2025
6148a65
feat: reset default ports for container configs.
junhaoliao Sep 17, 2025
0ab99f0
fix(docker): update MongoDB connection string to use internal port
junhaoliao Sep 17, 2025
263be6f
fix(controller): update webui configuration to use container-specific…
junhaoliao Sep 17, 2025
51d1b55
add ownership management for data and logs directories when running a…
junhaoliao Sep 17, 2025
0066386
fix(controller, docker-compose): update user and group ID handling fo…
junhaoliao Sep 17, 2025
bc9ab99
fix(search): enable direct connection for MongoDB client in search.py
junhaoliao Sep 20, 2025
8008138
revert direct connection option for MongoDB client in search.py
junhaoliao Sep 20, 2025
34b60bd
refactor(controller): rename provision methods to set_up_env_for_* to…
junhaoliao Sep 22, 2025
9f63481
reorder private functions
junhaoliao Sep 22, 2025
2344db9
add documentation
junhaoliao Sep 22, 2025
2ba93d7
change visibility of `set_up_env_for` function from public to private…
junhaoliao Sep 22, 2025
ce94225
Improve type hints.
junhaoliao Sep 22, 2025
5225870
docs(clp-package-utils): add docstring for is_docker_compose_running …
junhaoliao Sep 22, 2025
cecf5b2
docs(clp-package-utils): add docstring for check_docker_dependencies …
junhaoliao Sep 22, 2025
0c0c562
docs(clp-package-utils): add docstring for _validate_log_directory fu…
junhaoliao Sep 22, 2025
762884b
remove `None` return type annotation from _validate_log_directory fun…
junhaoliao Sep 22, 2025
fc515db
Rename transform_for_container_config to transform_for_container; add…
junhaoliao Sep 22, 2025
268c144
add docs for x-service-defaults & x-healthcheck-defaults in docker-co…
junhaoliao Sep 22, 2025
c1d7b23
Merge branch 'main' into docker-compose
junhaoliao Sep 24, 2025
fc7aae3
remove garbage-collector's dependency condition on query-scheduler.
junhaoliao Sep 24, 2025
cda0348
Split base services into a separate docker-compose.base.yaml; Launch …
junhaoliao Sep 24, 2025
e506f6d
feat(controller): Add instance ID support for Docker Compose project …
junhaoliao Sep 24, 2025
93034fe
remove network to use default network with bridge driver
junhaoliao Sep 24, 2025
7ebdbb6
Merge branch 'main' into docker-compose
junhaoliao Sep 25, 2025
ca62009
fix(controller): update clp_home to be private variable.
junhaoliao Sep 25, 2025
82cc48c
Use unique ids to launch dev clp package.
junhaoliao Sep 25, 2025
8f6ba1b
remove trailing space
junhaoliao Sep 25, 2025
2aff301
remove `execution_container` field from clp-config.yml template
junhaoliao Sep 25, 2025
5c91bf3
fix(docker-compose): update `include` array syntax to use [] format.
junhaoliao Sep 25, 2025
049e269
update service name from `db` to `database`.
junhaoliao Sep 25, 2025
fc70c05
fix(docker-compose, controller): correct configuration file variable …
junhaoliao Sep 25, 2025
2cb1807
add _HOST postfix to path env var names for consistency
junhaoliao Sep 25, 2025
08472d1
Merge branch 'main' into docker-compose
junhaoliao Sep 25, 2025
09658e5
Merge branch 'main' into docker-compose
junhaoliao Sep 26, 2025
e657feb
fix: the webui service should depend on db-table-creator rather than …
junhaoliao Sep 26, 2025
c79259a
Merge branch 'main' into docker-compose
junhaoliao Oct 1, 2025
af96fc8
lint
junhaoliao Oct 1, 2025
320dd11
Merge branch 'main' into docker-compose
junhaoliao Oct 2, 2025
08644f4
Merge branch 'main' into docker-compose
junhaoliao Oct 2, 2025
dd56d04
fix: Replace container_name with hostname for services to avoid name …
junhaoliao Oct 2, 2025
5298540
refactor(docker-compose): move image definition to service defaults.
junhaoliao Oct 2, 2025
140187f
refactor(docker-compose): Remove image entry from reducer service.
junhaoliao Oct 2, 2025
68586f0
refactor(docker-compose): add default fallback for image and storage …
junhaoliao Oct 2, 2025
cba044a
improve docs
junhaoliao Oct 3, 2025
8a6790f
Merge branch 'main' into docker-compose
junhaoliao Oct 3, 2025
f41e22a
refactor(docker-compose): use list syntax for healthcheck test command.
junhaoliao Oct 3, 2025
da6c17f
Merge branch 'main' into docker-compose
junhaoliao Oct 6, 2025
02fcbe7
fix(docker-compose): revert workers log level to hardcoded "warning".
junhaoliao Oct 6, 2025
6b0e1f2
fix(docker-compose): Fix environment variable name for results cache …
junhaoliao Oct 6, 2025
ddcf760
refactor(clp-package-utils): Move EnvVarsDict type alias below imports.
junhaoliao Oct 6, 2025
4134b1a
refactor(clp-package-utils): Rename LOGS_FILE_MODE to LOG_FILE_ACCESS…
junhaoliao Oct 6, 2025
f483abd
reflow comments to 100 char per line
junhaoliao Oct 6, 2025
7d4f47d
refactor(clp-package-utils): Rename deploy method to start and adjust…
junhaoliao Oct 6, 2025
064f365
refactor(clp-package-utils): Rename _provision method to _set_up_env …
junhaoliao Oct 6, 2025
7beb199
docs(clp-package-utils): Update docstrings to use “sets up” phrasing.
junhaoliao Oct 6, 2025
3507a1b
docs(clp-package-utils): Standardize return docstrings; Remove return…
junhaoliao Oct 6, 2025
eb2754a
refactor(clp-package-utils): Add spacing before log and data director…
junhaoliao Oct 6, 2025
cae66df
refactor(clp-package-utils): Move logs directory assignment after dat…
junhaoliao Oct 6, 2025
dffc2f2
refactor(clp-package-utils): Rename logs_file -> log_file
junhaoliao Oct 6, 2025
c0ef4ff
refactor(clp-package-utils): Rename LOGS_FILE variables to LOG_FILE i…
junhaoliao Oct 6, 2025
95404d0
refactor(clp-package-utils): Remove redundant flush after writing ins…
junhaoliao Oct 6, 2025
e1db192
Clarify resolving to IPv4 in the docstring - Apply suggestions from c…
junhaoliao Oct 6, 2025
630329b
Remove redundant param description in `_get_ip_from_hostname()` - App…
junhaoliao Oct 6, 2025
d9da763
refactor(clp-package-utils): Rename db logging config variable and en…
junhaoliao Oct 6, 2025
c200502
use long form `:readonly` instead of short form `:ro` in binding moun…
junhaoliao Oct 6, 2025
5751ddf
revert unintentional change
junhaoliao Oct 6, 2025
3e9734e
chore(deployment): Add local logging driver to service defaults.
junhaoliao Oct 6, 2025
ab64f4a
Improve docs - Apply suggestions from code review
junhaoliao Oct 7, 2025
2d9906f
chore(deployment): Adjust healthcheck defaults (start interval, perio…
junhaoliao Oct 7, 2025
3635883
refactor(deployment): Use long form volume definitions and use bind o…
junhaoliao Oct 7, 2025
5e0d0d2
refactor(deployment): Use long form port definitions; fix CLP_RESULTS…
junhaoliao Oct 7, 2025
e771800
refactor(deployment): Reorder port definitions to long form order.
junhaoliao Oct 7, 2025
6e878cf
refactor(clp-package-utils): Replace deprecated Pydantic (V2) copy() …
junhaoliao Oct 7, 2025
f2b0967
feat(deployment): Add support for configurable logs input directory m…
junhaoliao Oct 7, 2025
01558ba
lint
junhaoliao Oct 7, 2025
8a3985b
refactor(deployment): Remove secrets usage; replace with environment …
junhaoliao Oct 7, 2025
c691735
docs(design): Add health check defaults.
junhaoliao Oct 7, 2025
787c7d7
refactor(clp-package-utils): Improve error handling for config loadin…
junhaoliao Oct 7, 2025
323cb58
Merge branch 'main' into docker-compose
junhaoliao Oct 9, 2025
8b97bc2
refactor(deployment): Remove outdated comment in reducer healthcheck …
junhaoliao Oct 9, 2025
ccb8f66
refactor(deployment): Remove duplicated docstring from docker-compose…
junhaoliao Oct 9, 2025
40b88a0
docs(design): Remove health check defaults section; add comments to h…
junhaoliao Oct 9, 2025
e58512a
docs(quick-start): Add required containerd, Docker CE, CLI, and compo…
junhaoliao Oct 10, 2025
d7b40b0
Merge branch 'main' into docker-compose
junhaoliao Oct 10, 2025
37a0a8a
restore verbose logging option to start_clp script.
junhaoliao Oct 10, 2025
3cdf245
Merge remote-tracking branch 'origin/main' into docker-compose
junhaoliao Oct 14, 2025
95d900a
refactor(clp-py-utils): Use DEFAULT_PORT for WebUi port configuration.
junhaoliao Oct 14, 2025
f6cbc7f
apply docs suggestions
junhaoliao Oct 15, 2025
5bf0887
refactor(controller): Rename clp_config -> _clp_config for consistent…
junhaoliao Oct 15, 2025
26dfaca
refactor(deployment): Rename Redis environment variables for improved…
junhaoliao Oct 15, 2025
99e983f
Apply suggestions - Reformat multiline statements and remove unnecess…
junhaoliao Oct 15, 2025
0b028dc
Apply suggestions - Rename UID/GID environment variables for improved…
junhaoliao Oct 15, 2025
f11963c
Apply suggestions - Use constants for jobs table names.
junhaoliao Oct 15, 2025
91cb9fc
refactor(controller): Remove redundant `stderr` redirection from subp…
junhaoliao Oct 15, 2025
1d57a89
docs: move high-level comments to individual blocks
junhaoliao Oct 15, 2025
f4254fa
docs(controller): reference issue for revisiting worker count logic
junhaoliao Oct 15, 2025
932c334
refactor(controller): add `--wait` flag to `docker compose up` comman…
junhaoliao Oct 15, 2025
a8cee64
refactor(controller): replace `Dict` with a custom `EnvVarsDict` for …
junhaoliao Oct 16, 2025
f0e1740
lint
junhaoliao Oct 16, 2025
6679c2d
use local variable in assignment in _update_settings_object()
junhaoliao Oct 16, 2025
416c31a
refactor(controller): remove redundant try-except blocks around `subp…
junhaoliao Oct 16, 2025
9ab85d2
add warning log for failed Docker dependency check before attempting …
junhaoliao Oct 16, 2025
1345de3
refactor(controller): add explicit return type annotations for all me…
junhaoliao Oct 16, 2025
25c3812
Merge branch 'main' into docker-compose
junhaoliao Oct 16, 2025
3c3e23d
refactor(scripts): move `instance_id` assignment inside `try` block i…
junhaoliao Oct 16, 2025
7c43aad
revert `--config` argument removal in `stop_clp` to allow specifying …
junhaoliao Oct 16, 2025
63c700a
refactor(config): Rename CLP_DEFAULT_ARCHIVE_* -> CLP_DEFAULT_ARCHIVE…
junhaoliao Oct 16, 2025
0f7c78c
apply docs suggestions
junhaoliao Oct 16, 2025
5e07756
docs(clp-package-utils): Update docstrings for clarity and consistency.
junhaoliao Oct 16, 2025
ff4f06c
remove return type from dump_shared_container_config()
junhaoliao Oct 16, 2025
e381c3d
refactor(clp-package-utils): Rename base_config -> component_config i…
junhaoliao Oct 16, 2025
f426c82
Simplify raised error message for missing configuration file: redis
junhaoliao Oct 16, 2025
73ec68f
Apply suggestion - To alphabetize
junhaoliao Oct 16, 2025
4e707ae
Remove individual log file paths and use shared logging volume
junhaoliao Oct 16, 2025
6f53877
fix(controller): revert removal of "ClpQueryEngine" in server_setting…
junhaoliao Oct 16, 2025
e7addd3
refactor(clp-package-utils): Rename CLP_PACKAGE_CONTAINER -> CLP_PACK…
junhaoliao Oct 16, 2025
d4e81f3
fix(docker-compose): use `:?error` syntax to mark required non-empty …
junhaoliao Oct 16, 2025
a5a497d
disallow empty `username` and `password` in `Database` because our db…
junhaoliao Oct 16, 2025
0dd11ce
fix(docker-compose): Update default database image to mariadb:10-jammy.
junhaoliao Oct 16, 2025
d27a0b1
fix(docker-compose): Add stop_grace_period for compression-worker and…
junhaoliao Oct 16, 2025
a6478e6
apply docs suggestions
junhaoliao Oct 16, 2025
7bbd134
refactor(docker-compose): Rename volume_root_logs_readonly -> volume_…
junhaoliao Oct 16, 2025
04e56f0
ensure `healthcheck` goes after `command`
junhaoliao Oct 16, 2025
0fc91d7
fix(docker-compose): Add AWS credentials environment variables in web…
junhaoliao Oct 16, 2025
f307d94
fix(docker-compose): Update environment variables for archive and str…
junhaoliao Oct 16, 2025
e5a90dd
Replace "docker-compose" (usually refer to v1) with "Docker Compose" …
junhaoliao Oct 16, 2025
3015ec4
lint
junhaoliao Oct 16, 2025
ff43c78
refactor(clp-package-utils): Reorder functions and rename is_docker_c…
junhaoliao Oct 16, 2025
699979a
fix(docker-compose): Add mounts for archive and stream output in garb…
junhaoliao Oct 16, 2025
092a86c
refactor(clp-package-utils): Rename _is_docker_compose_running -> _is…
junhaoliao Oct 16, 2025
8d75171
fix(docker-compose): Rename CLP_AWS_ACCESS_KEY_ID -> CLP_STREAM_OUTPU…
junhaoliao Oct 16, 2025
f53339f
fix(docker-compose): Break long volume mount lines for improved reada…
junhaoliao Oct 16, 2025
18d1109
lint
junhaoliao Oct 16, 2025
9d9bd46
refactor(clp-package-utils): Rename should_compose_run -> should_comp…
junhaoliao Oct 18, 2025
225c6c5
Merge branch 'main' into docker-compose
junhaoliao Oct 18, 2025
127aacc
refactor(clp-package-utils): Reorder private functions
junhaoliao Oct 18, 2025
f5ea3e2
Replace EnvironmentError with OSError as per Ruff os-error-alias (UP024)
junhaoliao Oct 18, 2025
22593d8
refactor(general): Improve error handling for Docker and Docker Compose
junhaoliao Oct 18, 2025
bc69b26
refactor(general): Update dump_container_config call to remove return…
junhaoliao Oct 18, 2025
1d2ab61
refactor(docs): Update exception documentation to use 'raise' format
junhaoliao Oct 18, 2025
d4c921a
add missing `f` in f-string
junhaoliao Oct 18, 2025
772 changes: 772 additions & 0 deletions components/clp-package-utils/clp_package_utils/controller.py

Large diffs are not rendered by default.

190 changes: 107 additions & 83 deletions components/clp-package-utils/clp_package_utils/general.py
Member Author

@kirkrodrigues any reason we didn't make the `validate_and_load_*` functions instance methods?

Member

I think I had a reason but I can't remember it anymore, so probably not an important one.

Member

Can we remove the return value of `dump_shared_container_config`, since nothing uses it (and the docstring doesn't document it)?

@@ -1,5 +1,6 @@
import enum
import errno
import json
import os
import pathlib
import re
@@ -15,6 +16,9 @@
CLP_DEFAULT_CREDENTIALS_FILE_PATH,
CLP_SHARED_CONFIG_FILENAME,
CLPConfig,
CONTAINER_AWS_CONFIG_DIRECTORY,
CONTAINER_CLP_HOME,
CONTAINER_INPUT_LOGS_ROOT_DIR,
DB_COMPONENT_NAME,
QueryEngine,
QUEUE_COMPONENT_NAME,
@@ -42,12 +46,6 @@
EXTRACT_IR_CMD = "i"
EXTRACT_JSON_CMD = "j"

# Paths
Member Author

Had to move this to `clp_config.py` to avoid a circular import.

CONTAINER_AWS_CONFIG_DIRECTORY = pathlib.Path("/") / ".aws"
CONTAINER_CLP_HOME = pathlib.Path("/") / "opt" / "clp"
CONTAINER_INPUT_LOGS_ROOT_DIR = pathlib.Path("/") / "mnt" / "logs"
CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH = pathlib.Path("etc") / "clp-config.yml"

DOCKER_MOUNT_TYPE_STRINGS = ["bind"]


@@ -98,13 +96,6 @@ def __init__(self, clp_home: pathlib.Path, docker_clp_home: pathlib.Path):
self.generated_config_file: Optional[DockerMount] = None


def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None:
try:
validate_path_could_be_dir(data_dir)
except ValueError as ex:
raise ValueError(f"{component_name} data directory is invalid: {ex}")


def get_clp_home():
# Determine CLP_HOME from an environment variable or this script's path
clp_home = None
@@ -132,63 +123,35 @@ def generate_container_name(job_type: str) -> str:
return f"clp-{job_type}-{str(uuid.uuid4())[-4:]}"


def check_dependencies():
def check_docker_dependencies(should_compose_project_be_running: bool, project_name: str):
"""
Checks if Docker and Docker Compose are installed, and whether a Docker Compose project is
running.

:param should_compose_project_be_running:
:param project_name: The Docker Compose project name to check.
:raise OSError: If any Docker dependency is not installed or the Docker Compose project state
doesn't match `should_compose_run`.
"""
try:
subprocess.run(
"command -v docker",
shell=True,
stdout=subprocess.PIPE,
stdout=subprocess.DEVNULL,
stderr=subprocess.STDOUT,
check=True,
)
except subprocess.CalledProcessError:
raise EnvironmentError("docker is not installed or available on the path")
try:
subprocess.run(
["docker", "ps"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=True
)
except subprocess.CalledProcessError:
raise EnvironmentError("docker cannot run without superuser privileges (sudo).")
except subprocess.CalledProcessError as e:
err_msg = "docker is not installed or available on the path"
raise OSError(err_msg) from e


def is_container_running(container_name):
# fmt: off
cmd = [
"docker", "ps",
# Only return container IDs
"--quiet",
"--filter", f"name={container_name}"
]
# fmt: on
proc = subprocess.run(cmd, stdout=subprocess.PIPE)
if proc.stdout.decode("utf-8"):
return True

return False


def is_container_exited(container_name):
# fmt: off
cmd = [
"docker", "ps",
# Only return container IDs
"--quiet",
"--filter", f"name={container_name}",
"--filter", "status=exited"
]
# fmt: on
proc = subprocess.run(cmd, stdout=subprocess.PIPE)
if proc.stdout.decode("utf-8"):
return True

return False


def validate_log_directory(logs_dir: pathlib.Path, component_name: str) -> None:
try:
validate_path_could_be_dir(logs_dir)
except ValueError as ex:
raise ValueError(f"{component_name} logs directory is invalid: {ex}")
is_running = _is_docker_compose_project_running(project_name)
if should_compose_project_be_running and not is_running:
err_msg = f"Docker Compose project '{project_name}' is not running."
raise OSError(err_msg)
if not should_compose_project_be_running and is_running:
err_msg = f"Docker Compose project '{project_name}' is already running."
raise OSError(err_msg)
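For comparison, here is a minimal sketch of the same check that uses `shutil.which` instead of running `command -v docker` through a shell (`command` is a shell builtin, which is why the PR needs `shell=True`). The `is_running` and `which` parameters are hypothetical additions so the logic can be exercised without Docker; they are not part of this PR.

```python
import shutil
from typing import Callable, Optional


def check_docker_dependencies(
    should_compose_project_be_running: bool,
    project_name: str,
    is_running: Callable[[str], bool],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> None:
    """
    Sketch: raises OSError if `docker` isn't on PATH, or if the Compose project's running state
    doesn't match `should_compose_project_be_running`.
    """
    # `shutil.which` returns None when the executable can't be found on PATH.
    if which("docker") is None:
        raise OSError("docker is not installed or available on the path")

    running = is_running(project_name)
    if should_compose_project_be_running and not running:
        raise OSError(f"Docker Compose project '{project_name}' is not running.")
    if not should_compose_project_be_running and running:
        raise OSError(f"Docker Compose project '{project_name}' is already running.")
```

In the package scripts, `is_running` would presumably be `_is_docker_compose_project_running`; injecting it only makes the state-mismatch branches easy to test.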


def validate_port(port_name: str, hostname: str, port: int):
@@ -309,6 +272,19 @@ def generate_container_config(
return container_clp_config, docker_mounts


def generate_docker_compose_container_config(clp_config: CLPConfig) -> CLPConfig:
Member Author

`generate_container_config()` should eventually be deprecated once we migrate the other scripts (compress / decompress / search / management) to Docker Compose.

"""
Copies the given config and transforms mount paths and hosts for Docker Compose.

:param clp_config:
:return: The container config.
"""
container_clp_config = clp_config.model_copy(deep=True)
container_clp_config.transform_for_container()

return container_clp_config
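The deep copy matters because `transform_for_container()` rewrites nested sections in place. The sketch below reproduces the hazard with plain dataclasses, using `copy.deepcopy` as a stand-in for Pydantic's `model_copy(deep=True)`; the `Config` and `Database` classes and the `"database"` host value are illustrative assumptions, not CLP's actual models.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Database:
    host: str = "localhost"
    port: int = 3306


@dataclass
class Config:
    database: Database = field(default_factory=Database)

    def transform_for_container(self) -> None:
        # Hypothetical transform: point at the Compose service name and the default internal port.
        self.database.host = "database"
        self.database.port = 3306


def generate_container_config(config: Config) -> Config:
    # Deep copy: the nested `Database` object is duplicated, so the host config is untouched.
    container_config = copy.deepcopy(config)
    container_config.transform_for_container()
    return container_config


def generate_container_config_shallow(config: Config) -> Config:
    # Shallow copy: both configs share one `Database` object, so the transform leaks back.
    container_config = copy.copy(config)
    container_config.transform_for_container()
    return container_config
```

With the shallow variant, the host-side config would end up pointing at a hostname that only resolves inside the Compose network.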


def generate_worker_config(clp_config: CLPConfig) -> WorkerConfig:
worker_config = WorkerConfig()
worker_config.package = clp_config.package.model_copy(deep=True)
@@ -345,17 +321,15 @@ def dump_container_config(
return config_file_path_on_container, config_file_path_on_host


def dump_shared_container_config(
container_clp_config: CLPConfig, clp_config: CLPConfig
) -> Tuple[pathlib.Path, pathlib.Path]:
def dump_shared_container_config(container_clp_config: CLPConfig, clp_config: CLPConfig):
"""
Dumps the given container config to `CLP_SHARED_CONFIG_FILENAME` in the logs directory, so that
it's accessible in the container.

:param container_clp_config:
:param clp_config:
"""
return dump_container_config(container_clp_config, clp_config, CLP_SHARED_CONFIG_FILENAME)
dump_container_config(container_clp_config, clp_config, CLP_SHARED_CONFIG_FILENAME)


def generate_container_start_cmd(
@@ -431,11 +405,6 @@ def load_config_file(
validate_path_for_container_mount(clp_config.data_directory)
validate_path_for_container_mount(clp_config.logs_directory)

# Make data and logs directories node-specific
Member Author

BREAKING

hostname = socket.gethostname()
clp_config.data_directory /= hostname
clp_config.logs_directory /= hostname

return clp_config


@@ -488,35 +457,44 @@ def validate_and_load_redis_credentials_file(
clp_config.redis.load_credentials_from_file(clp_config.credentials_file_path)


def validate_db_config(clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path):
def validate_db_config(
clp_config: CLPConfig,
component_config: pathlib.Path,
data_dir: pathlib.Path,
logs_dir: pathlib.Path,
):
if not component_config.exists():
raise ValueError(f"{DB_COMPONENT_NAME} configuration file missing: '{component_config}'.")
_validate_data_directory(data_dir, DB_COMPONENT_NAME)
validate_log_directory(logs_dir, DB_COMPONENT_NAME)
_validate_log_directory(logs_dir, DB_COMPONENT_NAME)

validate_port(f"{DB_COMPONENT_NAME}.port", clp_config.database.host, clp_config.database.port)
Comment on lines +460 to 471
Contributor

🛠️ Refactor suggestion | 🟠 Major

Add file type validation for component_config.

The function checks component_config.exists() but doesn't verify it's a file. A directory with the same name would pass validation but cause issues later.

Apply this diff:

     if not component_config.exists():
         raise ValueError(f"{DB_COMPONENT_NAME} configuration file missing: '{component_config}'.")
+    if not component_config.is_file():
+        raise ValueError(f"{DB_COMPONENT_NAME} configuration at '{component_config}' is not a file.")
🧰 Tools
🪛 Ruff (0.14.0)

464-464: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 457 to
468, the current validation only checks component_config.exists() which allows
directories to pass; update the check to ensure component_config.is_file() (or
check exists() and is_file()) and if not raise a ValueError stating the
DB_COMPONENT_NAME configuration file is missing or not a file, so directories
are rejected with a clear error before proceeding.



def validate_queue_config(clp_config: CLPConfig, logs_dir: pathlib.Path):
validate_log_directory(logs_dir, QUEUE_COMPONENT_NAME)
_validate_log_directory(logs_dir, QUEUE_COMPONENT_NAME)

validate_port(f"{QUEUE_COMPONENT_NAME}.port", clp_config.queue.host, clp_config.queue.port)


def validate_redis_config(
clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path, base_config: pathlib.Path
clp_config: CLPConfig,
component_config: pathlib.Path,
data_dir: pathlib.Path,
logs_dir: pathlib.Path,
):
_validate_data_directory(data_dir, REDIS_COMPONENT_NAME)
validate_log_directory(logs_dir, REDIS_COMPONENT_NAME)

if not base_config.exists():
if not component_config.exists():
raise ValueError(
f"{REDIS_COMPONENT_NAME} base configuration at {str(base_config)} is missing."
f"{REDIS_COMPONENT_NAME} configuration file missing: '{component_config}'."
)
_validate_data_directory(data_dir, REDIS_COMPONENT_NAME)
_validate_log_directory(logs_dir, REDIS_COMPONENT_NAME)

validate_port(f"{REDIS_COMPONENT_NAME}.port", clp_config.redis.host, clp_config.redis.port)
Comment on lines 480 to 493
Contributor

⚠️ Potential issue | 🟠 Major

Add file type validation for component_config.

Same issue as validate_db_config—the function only checks component_config.exists() without verifying it's a file.

Apply this diff:

     if not component_config.exists():
         raise ValueError(
             f"{REDIS_COMPONENT_NAME} configuration file missing: '{component_config}'."
         )
+    if not component_config.is_file():
+        raise ValueError(
+            f"{REDIS_COMPONENT_NAME} configuration at '{component_config}' is not a file."
+        )
🧰 Tools
🪛 Ruff (0.14.0)

487-489: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines 480 to
493, the validate_redis_config currently only checks component_config.exists()
but not that it is a regular file; update the check to validate that
component_config.exists() and component_config.is_file(), and if not raise a
ValueError with the same style/message used for missing config files (e.g.,
"{REDIS_COMPONENT_NAME} configuration file missing: '{component_config}'.") so
the function mirrors the validate_db_config behavior and rejects directories.



def validate_reducer_config(clp_config: CLPConfig, logs_dir: pathlib.Path, num_workers: int):
validate_log_directory(logs_dir, REDUCER_COMPONENT_NAME)
_validate_log_directory(logs_dir, REDUCER_COMPONENT_NAME)

for i in range(0, num_workers):
validate_port(
@@ -527,10 +505,17 @@ def validate_reducer_config(clp_config: CLPConfig, logs_dir: pathlib.Path, num_w


def validate_results_cache_config(
clp_config: CLPConfig, data_dir: pathlib.Path, logs_dir: pathlib.Path
clp_config: CLPConfig,
component_config: pathlib.Path,
data_dir: pathlib.Path,
logs_dir: pathlib.Path,
):
if not component_config.exists():
raise ValueError(
f"{RESULTS_CACHE_COMPONENT_NAME} configuration file missing: '{component_config}'."
)
_validate_data_directory(data_dir, RESULTS_CACHE_COMPONENT_NAME)
validate_log_directory(logs_dir, RESULTS_CACHE_COMPONENT_NAME)
_validate_log_directory(logs_dir, RESULTS_CACHE_COMPONENT_NAME)

validate_port(
f"{RESULTS_CACHE_COMPONENT_NAME}.port",
@@ -707,3 +692,42 @@ def get_celery_connection_env_vars_list(container_clp_config: CLPConfig) -> List
]

return env_vars


def _is_docker_compose_project_running(project_name: str) -> bool:
"""
Checks if a Docker Compose project is running.

:param project_name:
:return: Whether at least one instance is running.
:raise OSError: If Docker Compose is not installed or fails.
"""
cmd = ["docker", "compose", "ls", "--format", "json", "--filter", f"name={project_name}"]
try:
output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
running_instances = json.loads(output)
return len(running_instances) >= 1
except subprocess.CalledProcessError as e:
err_msg = "Docker Compose is not installed or not functioning properly."
raise OSError(err_msg) from e


def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None:
try:
validate_path_could_be_dir(data_dir)
except ValueError as ex:
raise ValueError(f"{component_name} data directory is invalid: {ex}")
Comment on lines +715 to +719
🧹 Nitpick | 🔵 Trivial

Add explicit exception chaining.

Both validation helpers should raise with "from ex" so the traceback reports the original ValueError as the direct cause of the new one, which makes debugging easier.

Apply this diff:

 def _validate_data_directory(data_dir: pathlib.Path, component_name: str) -> None:
     try:
         validate_path_could_be_dir(data_dir)
     except ValueError as ex:
-        raise ValueError(f"{component_name} data directory is invalid: {ex}")
+        raise ValueError(f"{component_name} data directory is invalid: {ex}") from ex

Apply the same pattern to _validate_log_directory:

 def _validate_log_directory(logs_dir: pathlib.Path, component_name: str):
     """
     Validates that the logs directory path for a component is valid.
 
     :param logs_dir:
     :param component_name:
     :raises ValueError: If the path is invalid or can't be a directory.
     """
     try:
         validate_path_could_be_dir(logs_dir)
     except ValueError as ex:
-        raise ValueError(f"{component_name} logs directory is invalid: {ex}")
+        raise ValueError(f"{component_name} logs directory is invalid: {ex}") from ex

Also applies to: 722-733

🧰 Tools
🪛 Ruff (0.14.0)

719-719: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


719-719: Avoid specifying long messages outside the exception class

(TRY003)
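TRY003 prefers that the exception class own its message. One way to satisfy both B904 and TRY003 is a small ValueError subclass that builds the message itself. This is a sketch only: InvalidComponentDirectoryError, check_could_be_dir (a simplified stand-in for validate_path_could_be_dir, whose implementation is not shown in this diff), and validate_log_directory_sketch are all hypothetical names. Subclassing ValueError keeps existing "except ValueError" callers working.

```python
import pathlib


class InvalidComponentDirectoryError(ValueError):
    """Hypothetical exception type that builds its own message, as Ruff TRY003 prefers."""

    def __init__(self, component_name: str, dir_kind: str, reason: Exception) -> None:
        super().__init__(f"{component_name} {dir_kind} directory is invalid: {reason}")
        self.component_name = component_name
        self.dir_kind = dir_kind


def check_could_be_dir(path: pathlib.Path) -> None:
    # Simplified, hypothetical stand-in for validate_path_could_be_dir: it only rejects
    # paths that already exist as something other than a directory.
    if path.exists() and not path.is_dir():
        raise ValueError(f"'{path}' exists but is not a directory.")


def validate_log_directory_sketch(logs_dir: pathlib.Path, component_name: str) -> None:
    try:
        check_could_be_dir(logs_dir)
    except ValueError as ex:
        # `from ex` addresses B904; the message template lives in the class (TRY003).
        raise InvalidComponentDirectoryError(component_name, "logs", ex) from ex
```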

🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/general.py around lines
715-719 (and similarly for the _validate_log_directory function at lines
722-733), the ValueError raised after catching the error from
validate_path_could_be_dir does not use exception chaining; update both raises to
include the original exception using "from ex" so the original traceback is
preserved (e.g., raise ValueError(f"{component_name} data directory is invalid:
{ex}") from ex), and apply the identical pattern to the _validate_log_directory
raise.
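To see what "from ex" actually changes, the sketch below compares the three forms Ruff mentions, using a hypothetical check function in place of validate_path_could_be_dir. Without "from", Python still records the original error implicitly in __context__; "from ex" additionally sets __cause__ so the traceback labels it the direct cause; "from None" sets __suppress_context__ so the original is hidden when the traceback is printed.

```python
def check(path: str) -> None:
    # Hypothetical stand-in for validate_path_could_be_dir that always fails.
    raise ValueError(f"'{path}' is not a directory")


def wrap(path: str, mode: str) -> ValueError:
    """Re-raises check()'s error with the given chaining mode and returns the new error."""
    try:
        try:
            check(path)
        except ValueError as ex:
            msg = f"data directory is invalid: {ex}"
            if mode == "implicit":
                raise ValueError(msg)  # the pattern Ruff flags with B904
            if mode == "explicit":
                raise ValueError(msg) from ex  # records ex as __cause__
            raise ValueError(msg) from None  # hides the context in printed tracebacks
    except ValueError as outer:
        return outer
    raise AssertionError("check() was expected to raise")
```

For example, wrap(p, "explicit").__cause__ is the original ValueError, while wrap(p, "implicit").__cause__ is None even though __context__ still holds it.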



+def _validate_log_directory(logs_dir: pathlib.Path, component_name: str):
+    """
+    Validates that the logs directory path for a component is valid.
+
+    :param logs_dir:
+    :param component_name:
+    :raise ValueError: If the path is invalid or can't be a directory.
+    """
+    try:
+        validate_path_could_be_dir(logs_dir)
+    except ValueError as ex:
+        raise ValueError(f"{component_name} logs directory is invalid: {ex}")
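_is_docker_compose_project_running relies on "docker compose ls --format json" printing a JSON array with one object per project; by default (without --all), only running projects are listed. The parsing step can be isolated and tested without Docker. The sketch below assumes each object carries a "Name" key and, as a defensive variation not present in the PR, counts only exact name matches in case the name filter matches more loosely than expected. The function name is_project_listed and the sample output are hypothetical.

```python
import json


def is_project_listed(ls_output: str, project_name: str) -> bool:
    """
    Hypothetical helper: parses `docker compose ls --format json` output and reports
    whether a project named exactly `project_name` is listed.

    Assumes the output is a JSON array of objects, each with a "Name" key.
    """
    projects = json.loads(ls_output or "[]")  # treat empty output as "no projects"
    return any(project.get("Name") == project_name for project in projects)
```

One design point worth double-checking in the PR's version: stderr=subprocess.STDOUT folds any stderr text into the string passed to json.loads, so a warning printed on an otherwise successful run could surface as a json.JSONDecodeError rather than the documented OSError.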
@@ -9,13 +9,13 @@
 from clp_py_utils.clp_config import (
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLP_DEFAULT_DATASET_NAME,
     StorageEngine,
     StorageType,
 )

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLPConfig,
     DockerMount,
     dump_container_config,
@@ -10,13 +10,13 @@
 from clp_py_utils.clp_config import (
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLP_DEFAULT_DATASET_NAME,
     StorageEngine,
 )
 from job_orchestration.scheduler.job_config import InputType

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CONTAINER_INPUT_LOGS_ROOT_DIR,
     dump_container_config,
     generate_container_config,
@@ -10,13 +10,13 @@
     ARCHIVE_MANAGER_ACTION_NAME,
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     StorageEngine,
     StorageType,
 )
 from clp_py_utils.s3_utils import generate_container_auth_options

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     dump_container_config,
     generate_container_config,
     generate_container_name,
@@ -9,14 +9,14 @@
 from clp_py_utils.clp_config import (
     CLP_DB_PASS_ENV_VAR_NAME,
     CLP_DB_USER_ENV_VAR_NAME,
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLP_DEFAULT_DATASET_NAME,
     CLPConfig,
     StorageEngine,
     StorageType,
 )

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     DockerMount,
     DockerMountType,
     dump_container_config,
@@ -7,15 +7,14 @@
 from pathlib import Path
 from typing import Any, List, Optional

-from clp_py_utils.clp_config import Database
+from clp_py_utils.clp_config import CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH, Database
 from clp_py_utils.clp_metadata_db_utils import (
     delete_archives_from_metadata_db,
     get_archives_table_name,
 )
 from clp_py_utils.sql_adapter import SQL_Adapter

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLPConfig,
     get_clp_home,
     load_config_file,
@@ -10,6 +10,7 @@
 import brotli
 import msgpack
 from clp_py_utils.clp_config import (
+    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CLPConfig,
     COMPRESSION_JOBS_TABLE_NAME,
 )
@@ -29,7 +30,6 @@
 )

 from clp_package_utils.general import (
-    CLP_DEFAULT_CONFIG_FILE_RELATIVE_PATH,
     CONTAINER_INPUT_LOGS_ROOT_DIR,
     get_clp_home,
     load_config_file,