docs: Add k8s setup instructions. #28
base: release-0.293-clp-connector
Changes from 6 commits
5f0eb8c
5bae0ce
c82f6eb
ba8cdf6
80ec62d
e40f329
6cafbfc
17d7d31
8aa5cc1
c8666a4
d730404
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
demo-assets/clp-config.yml | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Patterns to ignore when building packages. | ||
# This supports shell glob matching, relative path matching, and | ||
# negation (prefixed with !). Only one pattern per line. | ||
.DS_Store | ||
# Common VCS dirs | ||
.git/ | ||
.gitignore | ||
.bzr/ | ||
.bzrignore | ||
.hg/ | ||
.hgignore | ||
.svn/ | ||
# Common backup files | ||
*.swp | ||
*.bak | ||
*.tmp | ||
*.orig | ||
*~ | ||
|
||
Comment on lines +4 to +19
🧹 Nitpick (assertive): Minor style nit: drop the leading dots for Helm-relative paths
|
||
# Various IDEs | ||
.project | ||
.idea/ | ||
*.tmproj | ||
.vscode/ | ||
|
||
# Demo assets | ||
/demo-assets/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
apiVersion: v2 | ||
name: presto-velox | ||
description: A Helm chart for Kubernetes | ||
|
||
# A chart can be either an 'application' or a 'library' chart. | ||
# | ||
# Application charts are a collection of templates that can be packaged into versioned archives | ||
# to be deployed. | ||
# | ||
# Library charts provide useful utilities or functions for the chart developer. They're included as | ||
# a dependency of application charts to inject those utilities and functions into the rendering | ||
# pipeline. Library charts do not define any templates and therefore cannot be deployed. | ||
type: application | ||
|
||
# This is the chart version. This version number should be incremented each time you make changes | ||
# to the chart and its templates, including the app version. | ||
# Versions are expected to follow Semantic Versioning (https://semver.org/) | ||
version: 0.1.0 | ||
|
||
# This is the version number of the application being deployed. This version number should be | ||
# incremented each time you make changes to the application. Versions are not expected to | ||
# follow Semantic Versioning. They should reflect the version the application is using. | ||
# It is recommended to use it with quotes. | ||
appVersion: "1.16.0" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# Setup local K8s cluster for presto + clp | ||
🛠️ Refactor suggestion: Keep a single H1 and demote the others
-# Launch clp-package
+# ## Launch clp-package
...
-# Create k8s Cluster
+# ## Create k8s Cluster
...
-# Working with helm chart
+# ## Working with helm chart
...
-# Delete k8s Cluster
+# ## Delete k8s Cluster
Also applies to: 26-26, 53-53, 59-59, 94-94
|
||
|
||
## Install docker | ||
|
||
Follow the guide here: [docker] | ||
|
||
## Install kubectl | ||
|
||
`kubectl` is the command-line tool for interacting with Kubernetes clusters. You will use it to | ||
manage and inspect your k3d cluster. | ||
|
||
Follow the guide here: [kubectl] | ||
|
||
## Install k3d | ||
|
||
k3d is a lightweight wrapper to run k3s (Rancher Labs' minimal Kubernetes distribution) in Docker. | ||
|
||
Follow the guide here: [k3d] | ||
|
||
## Install Helm | ||
|
||
Helm is the package manager for Kubernetes. | ||
|
||
Follow the guide here: [helm] | ||
|
||
# Launch clp-package | ||
1. Download the clp-package for testing from our official release page: [clp-json-v0.4.0]. We also provide the demo dataset here: `mongod-256MB-presto-clp.log.tar.gz`. | ||
|
||
2. Untar it. | ||
|
||
3. Replace the content of `/path/to/clp-json-package/etc/clp-config.yml` with the `demo-assets/clp-config.yml` generated by running `demo-assets/init.sh <ip_addr>`, where `<ip_addr>` is the IP address of the host running the clp-package. | ||
|
||
4. Launch: | ||
```bash | ||
# You probably want to run in a 3.11 python environment | ||
sbin/start-clp.sh | ||
``` | ||
Comment on lines +31 to +37
🧹 Nitpick (assertive): Insert blank lines around list items and fenced blocks
3. Replace the content …
-4. Launch:
-```bash
+#
+4. Launch:
+
+```bash
Apply the same pattern to the other ordered-list sections.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
33-33: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
34-34: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
|
||
|
||
5. Compress: | ||
```bash | ||
# You can also use your own dataset | ||
sbin/compress.sh --timestamp-key 't.dollar_sign_date' datasets/mongod-256MB-processed.log | ||
``` | ||
|
||
6. Use the following command to update the CLP metadata database so that the worker can find the archives in the right place: | ||
```bash | ||
# Install mysql-client if necessary | ||
sudo apt update && sudo apt install -y mysql-client | ||
Comment on lines +33 to +48
🧹 Nitpick (assertive): Surround list items and fenced blocks with blank lines
Several ordered-list steps (e.g., 3–6) butt directly against code fences, triggering MD031/MD032 and rendering oddly. Insert a blank line before and after each fenced block and between list items.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~45-~45: Possible missing article found. (AI_HYDRA_LEO_MISSING_THE)
🪛 markdownlint-cli2 (0.17.2)
33-33: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
34-34: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
39-39: Ordered list item prefix (MD029, ol-prefix)
39-39: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
40-40: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
45-45: Ordered list item prefix (MD029, ol-prefix)
45-45: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
46-46: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
|
||
# Find the user and password in /path/to/clp-json-package/etc/credential.yml | ||
mysql -h ${REPLACE_IP} -P 6001 -u ${REPLACE_USER} -p'${REPLACE_PASSWORD}' clp-db -e "UPDATE clp_datasets SET archive_storage_directory = '/var/data/archives/default';" | ||
``` | ||
Comment on lines +45 to +51
🧹 Nitpick (assertive): Minor grammar fix for clarity
-mysql … -e "UPDATE clp_datasets SET archive_storage_directory = '/var/data/archives/default';"
+mysql … -e "UPDATE clp_datasets SET archive_storage_directory = '/var/data/archives/default';"
^ add “the”
🧰 Tools
🪛 LanguageTool
[uncategorized] ~45-~45: Possible missing article found. (AI_HYDRA_LEO_MISSING_THE)
🪛 markdownlint-cli2 (0.17.2)
45-45: Ordered list item prefix (MD029, ol-prefix)
45-45: Lists should be surrounded by blank lines (MD032, blanks-around-lists)
46-46: Fenced code blocks should be surrounded by blank lines (MD031, blanks-around-fences)
|
||
|
||
# Create k8s Cluster | ||
Create a local k8s cluster with the clp-package's archives directory mounted into the cluster nodes: | ||
```bash | ||
k3d cluster create yscope --servers 1 --agents 1 -v $(readlink -f /path/to/clp-json-package/var/data/archives):/var/data/archives | ||
``` | ||
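The `-v` flag in the command above bind-mounts the package's archives directory into the k3d nodes, so the host path has to exist before the cluster is created. A minimal sketch of that pre-flight check, assuming GNU `readlink -f` as in the command above; the helper name `resolve_archives_dir` is hypothetical and not part of this PR:

```bash
set -euo pipefail

# Hypothetical helper: prints the absolute path of the archives directory,
# or fails with a message on stderr if the directory does not exist.
resolve_archives_dir() {
    local -r dir="$1"
    if [[ ! -d "$dir" ]]; then
        echo "Archives directory not found: ${dir}" >&2
        return 1
    fi
    readlink -f "$dir"
}

# Usage (the package path is a placeholder, as in the README):
#   archives="$(resolve_archives_dir /path/to/clp-json-package/var/data/archives)"
#   k3d cluster create yscope --servers 1 --agents 1 -v "${archives}:/var/data/archives"
```

Failing early here is a design choice: it surfaces a mistyped path before the cluster is created rather than later, when queries cannot find the archives.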
|
||
# Working with helm chart | ||
## Install | ||
In `yscope-k8s/templates/presto/presto-coordinator-config.yaml`, replace `${REPLACE_IP}` in `clp.metadata-db-url=jdbc:mysql://${REPLACE_IP}:6001` with the IP address of the host running the clp-package (that is, the same IP address you configured in the clp-package's `etc/clp-config.yml`). | ||
|
||
```bash | ||
cd yscope-k8s | ||
|
||
helm template . | ||
|
||
helm install demo . | ||
``` | ||
|
||
## Use cli: | ||
After all containers are in the "Running" state (check with `kubectl get pods`), forward the coordinator's port: | ||
```bash | ||
kubectl port-forward service/presto-coordinator 8080:8080 | ||
``` | ||
|
||
You can then further forward port 8080 to your local laptop to access Presto's web UI at, e.g., http://localhost:8080 | ||
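The manual `kubectl get pods` check before the port-forward can be scripted. The sketch below is an assumption-laden illustration, not part of the chart: the helper name `all_pods_ready` is hypothetical, and it relies on the default `kubectl get pods --no-headers` column order (NAME, READY, STATUS, ...). It also accepts `Completed`, since one-shot Job pods finish instead of staying in `Running`.

```bash
set -euo pipefail

# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin
# and succeeds only if at least one pod is listed and every pod's STATUS
# column (3rd field) is "Running" or "Completed".
all_pods_ready() {
    local name ready status rest
    local seen=0
    while read -r name ready status rest; do
        [[ -z "$name" ]] && continue
        seen=1
        if [[ "$status" != "Running" && "$status" != "Completed" ]]; then
            return 1
        fi
    done
    [[ "$seen" -eq 1 ]]
}

# Usage: poll until the check passes, then start the port-forward.
#   until kubectl get pods --no-headers | all_pods_ready; do sleep 5; done
#   kubectl port-forward service/presto-coordinator 8080:8080
```

Note that a `Running` status does not guarantee the container has passed its readiness checks; this only automates the check the README describes.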
|
||
To use presto-cli: | ||
```bash | ||
./presto-cli-0.293-executable.jar --catalog clp --schema default --server localhost:8080 | ||
``` | ||
|
||
Example query: | ||
``` | ||
SELECT * FROM default LIMIT 1; | ||
``` | ||
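For scripted checks, the same query can be run non-interactively with the Presto CLI's `--execute` option. This is a sketch under assumptions: the wrapper name `run_presto_query` and the `PRESTO_CLI` override variable are hypothetical, and the server, catalog, and schema simply repeat the values from the command above.

```bash
set -euo pipefail

# Hypothetical wrapper: runs one SQL statement through the Presto CLI and exits.
# PRESTO_CLI may point at a different CLI binary; the default matches the
# file name used in the README.
run_presto_query() {
    local -r query="$1"
    "${PRESTO_CLI:-./presto-cli-0.293-executable.jar}" \
        --server localhost:8080 \
        --catalog clp \
        --schema default \
        --execute "$query"
}

# Usage (requires the port-forward from the previous step to be active):
#   run_presto_query 'SELECT * FROM default LIMIT 1;'
```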
Comment on lines +88 to +90
🧹 Nitpick (assertive): Add language hint to SQL block
The example query lacks a language tag, tripping MD040 and losing syntax highlighting.
-```
+```sql
SELECT * FROM default LIMIT 1;
In yscope-k8s/README.md around lines 88 to 90, the SQL code block is missing a
|
||
|
||
## Uninstall | ||
```bash | ||
helm uninstall demo | ||
``` | ||
|
||
# Delete k8s Cluster | ||
```bash | ||
k3d cluster delete yscope | ||
``` | ||
|
||
|
||
[clp-json-v0.4.0]: https://github.com/y-scope/clp/releases/tag/v0.4.0 | ||
[docker]: https://docs.docker.com/engine/install | ||
[k3d]: https://k3d.io/stable/#installation | ||
[kubectl]: https://kubernetes.io/docs/tasks/tools/#kubectl | ||
[helm]: https://helm.sh/docs/intro/install/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
package: | ||
storage_engine: "clp-s" | ||
database: | ||
type: "mariadb" | ||
host: "${REPLACE_IP}" | ||
port: 6001 | ||
name: "clp-db" | ||
query_scheduler: | ||
host: "${REPLACE_IP}" | ||
port: 6002 | ||
jobs_poll_delay: 0.1 | ||
num_archives_to_search_per_sub_job: 16 | ||
logging_level: "INFO" | ||
queue: | ||
host: "${REPLACE_IP}" | ||
port: 6003 | ||
redis: | ||
host: "${REPLACE_IP}" | ||
port: 6004 | ||
query_backend_database: 0 | ||
compression_backend_database: 1 | ||
reducer: | ||
host: "${REPLACE_IP}" | ||
base_port: 6100 | ||
logging_level: "INFO" | ||
upsert_interval: 100 | ||
results_cache: | ||
host: "${REPLACE_IP}" | ||
port: 6005 | ||
db_name: "clp-query-results" | ||
stream_collection_name: "stream-files" | ||
webui: | ||
host: "localhost" | ||
port: 6000 | ||
logging_level: "INFO" | ||
log_viewer_webui: | ||
host: "localhost" | ||
port: 6006 | ||
Comment on lines +32 to +38
🧹 Nitpick (assertive): Consider parameterizing
Most services rely on the
-webui:
- host: "localhost"
+webui:
+ host: "${REPLACE_IP}"
...
-log_viewer_webui:
- host: "localhost"
+log_viewer_webui:
+ host: "${REPLACE_IP}"
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
#!/usr/bin/env bash | ||
|
||
Comment on lines +1 to +2
🛠️ Refactor suggestion: Enable safer Bash defaults
Fail fast on any unexpected error or unset variable to avoid silently producing a partially-edited config.
#!/usr/bin/env bash
+set -euo pipefail
|
||
if [ "$#" -ne 1 ]; then | ||
echo "Usage: $0 <ip-address>" | ||
exit 1 | ||
fi | ||
|
||
SCRIPT_PATH="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
|
||
IP="$1" | ||
FILE="${SCRIPT_PATH}/clp-config.yml" | ||
|
||
cp "${FILE}.bak" "$FILE" | ||
|
||
sed -i "s|\${REPLACE_IP}|$IP|g" "$FILE" | ||
Comment on lines +13 to +15
🧹 Nitpick (assertive)
The current one-argument form works on GNU sed; macOS (BSD sed) requires an empty string after `-i`:
if sed --version >/dev/null 2>&1; then
    sed -i "s|\${REPLACE_IP}|$IP|g" "$FILE"
else
    sed -i '' "s|\${REPLACE_IP}|$IP|g" "$FILE"
fi
This prevents the script from failing on contributors’ Macs.
|
||
|
||
echo "Replaced \${REPLACE_IP} with $IP in $FILE" |
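The placeholder substitution in the script above can also be sketched without `sed -i`, which sidesteps the GNU/BSD difference by writing to stdout. The helper name `render_config` and the IPv4-only validation are assumptions made for illustration; the script in this PR does neither.

```bash
set -euo pipefail

# Hypothetical helper: prints the template with every literal ${REPLACE_IP}
# replaced by the given address. Writing to stdout avoids `sed -i`, whose
# argument syntax differs between GNU and BSD sed.
render_config() {
    local -r template="$1"
    local -r ip="$2"
    # Assumption: only dotted-quad IPv4 is accepted, so the value cannot
    # contain the `|` delimiter or other characters special to sed.
    local -r ipv4_re='^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
    if [[ ! "$ip" =~ $ipv4_re ]]; then
        echo "Not an IPv4 address: ${ip}" >&2
        return 1
    fi
    sed "s|\${REPLACE_IP}|${ip}|g" "$template"
}

# Usage:
#   render_config clp-config.yml.bak 192.168.1.10 > clp-config.yml
```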
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,8 @@ | ||||||||||||
apiVersion: "v1" | ||||||||||||
kind: "Secret" | ||||||||||||
metadata: | ||||||||||||
name: "aws-credentials" | ||||||||||||
namespace: "default" | ||||||||||||
coderabbitai[bot] marked this conversation as resolved.
|
||||||||||||
type: "Opaque" | ||||||||||||
data: | ||||||||||||
credentials: "W2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gbWluaW9hZG1pbgphd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBtaW5pb2FkbWluCg==" | ||||||||||||
|
Suggested change:
-data:
-  credentials: "W2RlZmF1bHRdCmF3c19hY2Nlc3Nfa2V5X2lkID0gbWluaW9hZG1pbgphd3Nfc2VjcmV0X2FjY2Vzc19rZXkgPSBtaW5pb2FkbWluCg=="
+data:
+  # Base64 of a generated ~/.aws/credentials file
+  credentials: {{ .Values.objectStore.minio.awsCredentials | b64enc | quote }}
🧰 Tools
🪛 Checkov (3.2.334)
[LOW] 1-8: The default namespace should not be used
(CKV_K8S_21)
🤖 Prompt for AI Agents
In yscope-k8s/templates/object-store/aws-credentials.yaml at lines 7-8, the
base64-encoded static credentials containing real keys are committed, risking
secret leaks. Replace the hardcoded base64 string with templated placeholders
that reference values from values.yaml. Then, add the actual secret keys in
values.yaml with the comment # pragma: allowlist secret to document and allow
secret scanning exceptions. Also, update documentation to instruct generating
and injecting secrets locally instead of committing them.
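Generating the credentials locally, as the comment above recommends, could look like the following sketch. The helper names `render_aws_credentials` and `encode_aws_credentials` are hypothetical. The file layout is the standard AWS CLI credentials format with a single `[default]` profile, which is also what the committed Base64 string decodes to.

```bash
set -euo pipefail

# Hypothetical helper: prints an AWS CLI credentials file with a single
# [default] profile for the given access key ID and secret access key.
render_aws_credentials() {
    printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' "$1" "$2"
}

# Hypothetical helper: prints the same file Base64-encoded on one line, the
# form a Secret's `data` field expects. `tr` strips the line wrapping that
# some `base64` implementations add.
encode_aws_credentials() {
    render_aws_credentials "$1" "$2" | base64 | tr -d '\n'
}

# Usage (MinIO's default development credentials; never commit real keys):
#   encode_aws_credentials minioadmin minioadmin
```

If the chart adopts the suggested `b64enc` template, only the plain-text output of `render_aws_credentials` would be needed, since Helm would do the encoding.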
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,157 @@ | ||
apiVersion: "batch/v1" | ||
kind: "Job" | ||
metadata: | ||
name: "bucket-creation" | ||
spec: | ||
template: | ||
spec: | ||
containers: | ||
# Container to create the bucket. To inspect logs, use the following command: | ||
# `kubectl logs job.batch/bucket-creation` | ||
- name: "bucket-creation" | ||
image: "amazon/aws-cli:latest" | ||
command: | ||
- "/bin/bash" | ||
args: | ||
- "/scripts/bucket-creation.sh" | ||
env: | ||
- name: "AWS_ENDPOINT_URL" | ||
value: "http://{{ .Values.objectStore.minio.serviceName }}.default.svc.cluster.local:{{ .Values.objectStore.minio.apiPort }}" | ||
- name: "BUCKET_NAME" | ||
value: "{{ .Values.objectStore.bucketCreation.bucketName }}" | ||
- name: "PUBLIC" | ||
value: "{{ .Values.objectStore.bucketCreation.public }}" | ||
volumeMounts: | ||
- name: "aws-credentials-volume" | ||
mountPath: "/root/.aws" | ||
- name: "scripts-volume" | ||
mountPath: "/scripts" | ||
imagePullPolicy: "IfNotPresent" | ||
|
||
restartPolicy: "Never" | ||
volumes: | ||
- name: "aws-credentials-volume" | ||
secret: | ||
secretName: "aws-credentials" | ||
|
||
- name: "scripts-volume" | ||
configMap: | ||
name: "bucket-creation" | ||
--- | ||
apiVersion: v1 | ||
kind: "ConfigMap" | ||
metadata: | ||
name: "bucket-creation" | ||
data: | ||
bucket-creation.sh: |- | ||
#!/usr/bin/env bash | ||
|
||
# Create a bucket and optionally configure it with public read access | ||
# on an S3-compatible object store such as MinIO | ||
# | ||
# Requirements: | ||
# | ||
# * AWS CLI authentication configured using any supported method---for example: | ||
# * A credentials file in $HOME/.aws/credentials | ||
# * AWS_CONFIG_FILE pointing to a custom credentials file | ||
# * Environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY | ||
# * Environment variables: | ||
# * AWS_ENDPOINT_URL: The S3-compatible object store endpoint URL | ||
# * BUCKET_NAME: The name of the bucket to create | ||
# * NOTE: The bucket is made publicly readable only if PUBLIC is "true". | ||
# * PUBLIC (Optional): If set to "true", configures bucket with public read policy | ||
set -e | ||
set -o pipefail | ||
set -u | ||
|
||
# Emits a log event to stderr with an auto-generated ISO timestamp as well as the given level | ||
# and message. | ||
# | ||
# @param $1: Level string | ||
# @param $2: Message to be logged | ||
log() { | ||
local -r LEVEL=$1 | ||
local -r MESSAGE=$2 | ||
echo "$(date --utc --date="now" +"%Y-%m-%dT%H:%M:%SZ") [${LEVEL}] ${MESSAGE}" >&2 | ||
} | ||
|
||
# Waits for the S3 endpoint to be available, or exits if it's unavailable. | ||
wait_for_s3_availability() { | ||
# Check availability by listing available buckets | ||
log "INFO" "Waiting until ${AWS_ENDPOINT_URL} endpoint becomes available." | ||
local -r MAX_RETRIES=10 | ||
local -r RETRY_DELAY_IN_SECS=6 | ||
for ((retries = 0; retries < MAX_RETRIES; retries++)); do | ||
if aws s3 ls --endpoint-url "$AWS_ENDPOINT_URL" >/dev/null; then | ||
return | ||
fi | ||
log "WARN" "S3 API endpoint unavailable. Retrying in ${RETRY_DELAY_IN_SECS} seconds." | ||
|
||
sleep "$RETRY_DELAY_IN_SECS" | ||
done | ||
|
||
if [[ $retries -eq $MAX_RETRIES ]]; then | ||
log "ERROR" "Maximum retries reached. S3 API endpoint ${AWS_ENDPOINT_URL} didn't respond." | ||
exit 1 | ||
fi | ||
} | ||
|
||
# Creates a bucket | ||
create_bucket() { | ||
# Create log-viewer bucket if it doesn't already exist | ||
log "INFO" "Creating ${BUCKET_S3_URI} bucket." | ||
if ! aws s3api head-bucket --endpoint-url "$AWS_ENDPOINT_URL" --bucket "$BUCKET_NAME" \ | ||
2>/dev/null; then | ||
aws s3api create-bucket --endpoint-url "$AWS_ENDPOINT_URL" --bucket "$BUCKET_NAME" | ||
fi | ||
} | ||
|
||
# Configures a bucket with public read access | ||
configure_bucket() { | ||
# Define and apply the bucket policy for public read access | ||
log "INFO" "Applying public read access policy to ${BUCKET_S3_URI}" | ||
local -r POLICY=$( | ||
cat <<EOP | ||
{ | ||
"Version": "2012-10-17", | ||
"Statement": [ | ||
{ | ||
"Effect": "Allow", | ||
"Principal": "*", | ||
"Action": "s3:GetObject", | ||
"Resource": "arn:aws:s3:::${BUCKET_NAME}/*" | ||
} | ||
] | ||
} | ||
EOP | ||
) | ||
if ! aws s3api put-bucket-policy \ | ||
--endpoint-url "$AWS_ENDPOINT_URL" \ | ||
--bucket "$BUCKET_NAME" \ | ||
--policy "$POLICY"; then | ||
log "ERROR" "Failed to set bucket policy for ${BUCKET_S3_URI}" | ||
exit 1 | ||
fi | ||
} | ||
|
||
# Validate required environment variables | ||
readonly REQUIRED_ENV_VARS=( | ||
# Example: "http://minio:9000" | ||
"AWS_ENDPOINT_URL" | ||
|
||
# Example: "logs" | ||
"BUCKET_NAME" | ||
) | ||
for var in "${REQUIRED_ENV_VARS[@]}"; do | ||
if ! [[ -v "$var" ]]; then | ||
log "ERROR" "$var environment variable must be set." | ||
exit 1 | ||
fi | ||
done | ||
|
||
readonly BUCKET_S3_URI="s3://${BUCKET_NAME}" | ||
|
||
wait_for_s3_availability | ||
create_bucket | ||
if [[ "${PUBLIC:-false}" = "true" ]]; then | ||
configure_bucket | ||
fi | ||
|
||
log "INFO" "Bucket ${BUCKET_NAME} created and configured successfully." |
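The retry loop in `wait_for_s3_availability` can be factored into a reusable helper. The sketch below is one possible shape, not the script's actual structure: the function name `retry` and its argument order are assumptions.

```bash
set -euo pipefail

# Hypothetical helper: runs a command up to max_retries times, sleeping
# delay_secs between failed attempts. Returns 0 on the first success and 1
# if every attempt fails.
retry() {
    local -r max_retries="$1"
    local -r delay_secs="$2"
    shift 2
    local attempt
    for ((attempt = 1; attempt <= max_retries; attempt++)); do
        if "$@"; then
            return 0
        fi
        if ((attempt < max_retries)); then
            sleep "$delay_secs"
        fi
    done
    return 1
}

# Usage, mirroring the limits in wait_for_s3_availability:
#   retry 10 6 aws s3 ls --endpoint-url "$AWS_ENDPOINT_URL" >/dev/null
```

Returning a status instead of calling `exit` leaves the decision to abort with the caller, and it removes the need for the post-loop `retries -eq MAX_RETRIES` check, which is always true once the loop finishes without returning.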
🧹 Nitpick (assertive)
Add companion ignore rules for derived or editor artefacts
You already ignore the generated `clp-config.yml`. It may be worth also excluding its backup variants and common editor swap files to keep the repo clean.