This repository contains the Kubernetes manifests and deployment tools for deploying and managing OpenCitations services on a Kubernetes cluster. To begin working with this infrastructure, clone this repository or fork it to your own GitHub account.
Prerequisites:
- Python 3.11
- Running Kubernetes cluster with kubectl configured
- Helm package manager
- Git (for Fleet integration)
- Storage system (NFS or alternative)
Before beginning the deployment, you must prepare the OpenCitations databases:
- Download the Meta and Index databases from https://opencitations.net/download
- Place the databases on the storage system the infrastructure will use
- Note the storage paths, as you will need them in the configuration
Create and configure your environment file:
cp .env.example .env
Edit the `.env` file with your specific settings for services, infrastructure, and Git integration.
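Before running any deployment step it can help to inspect what the `.env` file actually contains. The parsing rules that `deploy.py` applies are not documented here, so the following is only a minimal sketch that assumes plain `KEY=VALUE` lines; `parse_env` and `load_env` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped, an optional
    leading 'export ' is dropped, and one pair of matching outer quotes
    is removed from the value. Inline comments after a value are NOT
    stripped in this sketch.
    """
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        # Split on the FIRST '=' only, so base64 padding in values survives.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment, ignore it
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def load_env(path: str = ".env") -> dict[str, str]:
    """Read and parse an env file from disk."""
    return parse_env(Path(path).read_text(encoding="utf-8"))
```

Splitting on the first `=` keeps values such as a base64-encoded `RCLONE_CONFIG` (which may end in `==`) intact, and quoted values like `BACKUP_SCHEDULE="0 2 * * *"` come back without their quotes.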
If you use NFS:
- Edit `preliminary/00-secrets.yaml` with your NFS configuration
- Modify `preliminary/02-storage.yaml` with your NFS paths and settings
- Ensure the following variables are properly set in `.env`: `NFS_SERVER`, `NFS_CERT_PATH`, `NFS_DATA_PATH`, `NFS_LOG_PATH`, `NFS_LOG_TRAEFIK_SUBPATH`
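A quick sanity check of the NFS variables can catch typos before the storage manifests are applied. This is a sketch under stated assumptions: it takes an already-parsed dict of `.env` values, it assumes the three `*_PATH` variables are absolute export paths on the NFS server, and it assumes `NFS_LOG_TRAEFIK_SUBPATH` ends up as a Kubernetes volume `subPath` (which Kubernetes requires to be relative). `check_nfs_vars` is a hypothetical name.

```python
REQUIRED_NFS_VARS = (
    "NFS_SERVER",
    "NFS_CERT_PATH",
    "NFS_DATA_PATH",
    "NFS_LOG_PATH",
    "NFS_LOG_TRAEFIK_SUBPATH",
)


def check_nfs_vars(env: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the NFS settings look sane."""
    problems = [
        f"{name} is missing or empty"
        for name in REQUIRED_NFS_VARS
        if not env.get(name, "").strip()
    ]
    # Assumption: these are absolute export paths on the NFS server.
    for name in ("NFS_CERT_PATH", "NFS_DATA_PATH", "NFS_LOG_PATH"):
        value = env.get(name, "")
        if value and not value.startswith("/"):
            problems.append(f"{name} should be an absolute path, got {value!r}")
    # Assumption: the Traefik log subpath is used as a relative volume subPath.
    subpath = env.get("NFS_LOG_TRAEFIK_SUBPATH", "")
    if subpath.startswith("/"):
        problems.append("NFS_LOG_TRAEFIK_SUBPATH should be relative (no leading '/')")
    return problems
```

An empty result only means the values are present and well-formed; it does not prove the NFS server is reachable or that the exports exist.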
If you use a different storage system:
- Modify both `preliminary/00-secrets.yaml` and `preliminary/02-storage.yaml` to match your storage system's requirements
- Update the corresponding storage variables in `.env`
- Edit `preliminary/03-traefik-values.yaml`:
  - Modify the `additionalArguments` section for the HTTPS certificate configuration
  - If you are not using MetalLB (for instance, in a cloud environment), remove the MetalLB-specific configuration
Update the domain addresses in the `.env` configuration file for:
- Web service manifests
- API service manifests
- Any other services with web addresses
Configure your service requirements in the `.env` file. Each service has specific resource needs that should be adjusted to your infrastructure's capacity.
Important considerations:
- CPU and memory requests should be set according to your cluster's available resources
- Storage sizes should accommodate your data requirements plus growth
- Service versions should match your deployment requirements
- Ports should be available and not conflict with other services
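CPU and memory values use Kubernetes quantity notation (`500m` CPU, `256Mi` memory, and so on), which makes it easy to miscount when comparing against node capacity. The sketch below converts quantities to plain numbers and checks whether a set of requests fits a given capacity. It is an illustration only: the helper names are hypothetical, it does not read the repository's `.env` keys (whose names are not listed here), and summing against a single capacity ignores the fact that the scheduler places each pod on one node.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) suffixes used by Kubernetes quantities.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
    "P": Decimal(10) ** 15,
    "E": Decimal(10) ** 18,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert '500m', '2', '256Mi', '1G' ... into a plain number (cores or bytes)."""
    q = quantity.strip()
    for suffix, factor in _BINARY.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    if q and q[-1] in _DECIMAL:
        return Decimal(q[:-1]) * _DECIMAL[q[-1]]
    return Decimal(q)  # no suffix; also covers exponent forms such as '1e3'


def fits(requests: list[str], capacity: str) -> bool:
    """True if the summed requests do not exceed the given capacity."""
    total = sum((parse_quantity(r) for r in requests), Decimal(0))
    return total <= parse_quantity(capacity)
```

Note the difference between binary and decimal suffixes: `2Gi` is 2,147,483,648 bytes while `2G` is 2,000,000,000, so two `1Gi` requests do not fit into `2G`.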
The OpenCitations infrastructure currently includes a showcase website built with WordPress. To set up a showcase site with the same stack, uncomment the 03 and 04 YAML sections and update the WordPress configuration parameters in the `.env` file.
The infrastructure includes an automated backup system for WordPress using rclone and pCloud storage (YAML 04). The system performs daily backups of:
- WordPress database (SQL dump)
- WordPress files
- MariaDB raw data
Configure the backup system by setting the appropriate variables in your `.env`:
WORDPRESS_SUBPATH=wordpress_prod
MARIADB_SUBPATH=mariadb_prod
BACKUP_SCHEDULE="0 2 * * *"
BACKUP_RETENTION_DAYS=90
PCLOUD_BACKUP_FOLDER=backup/wordpress
RCLONE_CONFIG=your_base64_encoded_config
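`RCLONE_CONFIG` expects a base64-encoded configuration. A reasonable source is the `rclone.conf` that contains your pCloud remote (`rclone config file` prints where it lives). The sketch below encodes it as a single line; the function names are hypothetical. With `BACKUP_SCHEDULE="0 2 * * *"`, the job runs every day at 02:00.

```python
import base64
from pathlib import Path


def encode_rclone_config(config_text: str) -> str:
    """Base64-encode rclone.conf contents as ONE line, suitable for RCLONE_CONFIG."""
    return base64.b64encode(config_text.encode("utf-8")).decode("ascii")


def decode_rclone_config(encoded: str) -> str:
    """Reverse of encode_rclone_config, useful to double-check the value in .env."""
    return base64.b64decode(encoded).decode("utf-8")


def encode_rclone_config_file(path: str) -> str:
    """Read an rclone.conf from disk and return the encoded value."""
    return encode_rclone_config(Path(path).read_text(encoding="utf-8"))
```

Python's `b64encode` never inserts line breaks, whereas GNU `base64` wraps its output at 76 characters by default (use `base64 -w 0` there), and a wrapped value would break a one-line `.env` entry.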
If you install WordPress and/or the backup system, remove the `_OPTIONAL` suffix from the file names.
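Renaming can be done by hand, but a small helper makes the intent explicit and supports a dry run. This is a sketch assuming the suffix sits either before the extension (`04-backup_OPTIONAL.yaml`) or after it (`04-backup.yaml_OPTIONAL`); the exact naming used in the repository is not stated here, and the file names below are hypothetical.

```python
from pathlib import Path

SUFFIX = "_OPTIONAL"


def strip_optional_suffix(filename: str) -> str:
    """Return the file name without the _OPTIONAL marker (unchanged if absent)."""
    path = Path(filename)
    if path.name.endswith(SUFFIX):  # e.g. 04-backup.yaml_OPTIONAL
        return str(path.with_name(path.name[: -len(SUFFIX)]))
    if path.stem.endswith(SUFFIX):  # e.g. 04-backup_OPTIONAL.yaml
        return str(path.with_name(path.stem[: -len(SUFFIX)] + path.suffix))
    return filename


def enable_optional(paths: list[str], dry_run: bool = True) -> list[tuple[str, str]]:
    """Plan (and, if dry_run is False, perform) renames for the given files only."""
    renames = []
    for old in paths:
        new = strip_optional_suffix(old)
        if new != old:
            renames.append((old, new))
            if not dry_run:
                Path(old).rename(new)
    return renames
```

Passing explicit paths, rather than renaming every `_OPTIONAL` file, keeps WordPress and the backup system independently selectable. If you want Git to track the rename, `git mv` works just as well.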
For details on the WordPress backup, see `docs/wp-backup.md`; for the Redis-based API token implementation, see `docs/oc-api-token.md`.
Fleet provides automated deployment capabilities through Git repository monitoring.
- Create a private Git repository for your production manifests
- Configure the Fleet variables in `.env`:
PRIVATE_REPO_URL=https://github.com/your-org/your-repo
GIT_USERNAME=your-username
GIT_TOKEN=your-personal-access-token
- Navigate to Rancher UI → Continuous Delivery
- Create a new Git Repository with:
  - Name: opencitations-fleet
  - Repository URL: your private repository URL
  - Branch: main
  - Paths: ./ (root directory)
  - Target cluster: your cluster name
- Create a Fleet configuration file, `fleet.yaml`:
namespace: opencitations
targetCustomizations:
- name: production
  clusterSelector:
    matchLabels:
      environment: production
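- Commit and push `fleet.yaml` to the root of your private repository. Fleet reads it from the repository when it deploys the bundle; it is not a Kubernetes resource, so it is not applied with `kubectl apply`.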
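The `production` customization only takes effect on clusters that carry the label `environment: production`. `matchLabels` is a logical AND: every listed key must be present on the cluster with exactly that value. As far as we understand Fleet's behaviour, target customizations are checked in order and the first match is used. The sketch below models only that `matchLabels` logic; it ignores `matchExpressions` and other selector fields, skips entries without a `clusterSelector`, and its function names are hypothetical.

```python
from typing import Optional


def matches_labels(match_labels: dict[str, str], cluster_labels: dict[str, str]) -> bool:
    """True if every key/value pair in matchLabels is present on the cluster."""
    return all(cluster_labels.get(key) == value for key, value in match_labels.items())


def pick_customization(customizations: list[dict], cluster_labels: dict[str, str]) -> Optional[str]:
    """Return the name of the first customization whose matchLabels fit the cluster."""
    for entry in customizations:
        selector = entry.get("clusterSelector")
        if selector is None:
            continue  # this sketch only models clusterSelector entries
        if matches_labels(selector.get("matchLabels", {}), cluster_labels):
            return entry["name"]
    return None
```

Extra labels on the cluster, such as a region label, do not prevent a match. A missing or different `environment` value does.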
Install Python dependencies:
pip3.11 install -r requirements.txt
The deployment script (`deploy.py`) provides several options:
python3.11 ./deploy.py -i # Initialize infrastructure
python3.11 ./deploy.py -p # Preview manifest or preliminary files with variable substitution
python3.11 ./deploy.py -f # Create Fleet-ready production files
python3.11 ./deploy.py # Deploy all services
python3.11 ./deploy.py <manifests/0x-manifest.yaml> # Deploy a specific manifest file
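To make the relationship between these options explicit, here is a hypothetical sketch of how such a command line could be parsed with `argparse`. It is not taken from `deploy.py`, whose actual argument handling may differ. It only shows that `-i`, `-p` and `-f` select mutually exclusive modes and that an optional manifest path narrows a deployment or preview to one file.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented deploy.py options."""
    parser = argparse.ArgumentParser(description="Sketch of a deploy.py-style CLI")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-i", action="store_true", help="initialize infrastructure (preliminary/)")
    mode.add_argument("-p", action="store_true", help="preview variable substitution")
    mode.add_argument("-f", action="store_true", help="create Fleet-ready production files")
    parser.add_argument("manifest", nargs="?", help="limit the action to this manifest file")
    return parser


def resolve_mode(argv: list[str]) -> tuple[str, Optional[str]]:
    """Map arguments to (mode, manifest); a manifest of None means 'all'."""
    args = build_parser().parse_args(argv)
    if args.i:
        return ("init", None)
    if args.p:
        return ("preview", args.manifest)
    if args.f:
        return ("fleet", None)
    return ("deploy", args.manifest)
```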
The first time you deploy, you need to initialize the infrastructure by deploying all manifests in the `preliminary` folder. Use the `-i` switch of `deploy.py` to do this:
python3.11 ./deploy.py -i
This will guide you through:
- Creating Kubernetes secrets
- Setting up MetalLB (if used)
- Configuring storage
- Installing Traefik
- Setting up the Traefik dashboard (optional)
python3.11 ./deploy.py -p <manifests/0x-manifest.yaml> # Preview how variables will be substituted
This allows you to verify your configuration before deployment by showing how environment variables will be substituted in your YAML files.
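The preview step is essentially template substitution. As a minimal sketch, assuming the manifests use shell-style `${VAR}` placeholders (the syntax `deploy.py` actually expects is not stated here), Python's `string.Template` can render a manifest and list the placeholders that have no value yet. The function names are hypothetical.

```python
from string import Template


def render(manifest_text: str, env: dict[str, str]) -> str:
    """Replace $VAR / ${VAR} placeholders; raises KeyError for an undefined variable."""
    return Template(manifest_text).substitute(env)


def missing_variables(manifest_text: str, env: dict[str, str]) -> list[str]:
    """List placeholders that have no value in env, in order of first use."""
    missing: list[str] = []
    for match in Template.pattern.finditer(manifest_text):
        name = match.group("named") or match.group("braced")
        if name and name not in env and name not in missing:
            missing.append(name)
    return missing
```

With this approach a literal dollar sign in a manifest (for example inside an embedded shell script) has to be written as `$$`. `Template.safe_substitute` is an alternative that leaves unknown placeholders untouched instead of raising.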
python3.11 ./deploy.py -f # For Fleet-managed deployments
This will process all manifests and push them to your Fleet repository.
To configure Fleet, set the variables from the Fleet section of `.env.example` in your `.env`:
PRIVATE_REPO_URL=https://github.com/username/private-repo.git
GIT_USERNAME=your-username
GIT_TOKEN=your-personal-access-token
If you cloned the repository onto a server in the Kubernetes cluster, you can deploy the manifests directly with these commands (remember to initialize the infrastructure first):
python3.11 ./deploy.py # Deploy all manifests
python3.11 ./deploy.py manifests/0x-manifest.yaml # Deploy a specific service
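The numbered file names (`0x-manifest.yaml`) suggest an intended order. The hypothetical sketch below shows one way to apply a directory of manifests in prefix order with `kubectl apply -f`, skipping files that still carry the `_OPTIONAL` suffix. It is not how `deploy.py` is implemented, and by default it only returns the commands it would run.

```python
import re
import subprocess
from pathlib import Path

_PREFIX = re.compile(r"^(\d+)-")


def manifest_sort_key(name: str) -> tuple[float, str]:
    """Sort by numeric prefix (01-, 02-, ...); files without a prefix go last."""
    match = _PREFIX.match(name)
    return (int(match.group(1)) if match else float("inf"), name)


def ordered_manifests(directory: str) -> list[Path]:
    """All *.yaml files in prefix order, skipping disabled *_OPTIONAL files."""
    paths = [p for p in Path(directory).glob("*.yaml") if not p.stem.endswith("_OPTIONAL")]
    return sorted(paths, key=lambda p: manifest_sort_key(p.name))


def apply_commands(paths: list[Path]) -> list[list[str]]:
    """Build one 'kubectl apply -f <file>' command per manifest."""
    return [["kubectl", "apply", "-f", str(p)] for p in paths]


def deploy(directory: str = "manifests", dry_run: bool = True) -> list[list[str]]:
    """Return the planned commands; run them only when dry_run is False."""
    commands = apply_commands(ordered_manifests(directory))
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing manifest
    return commands
```

Stopping at the first failure (`check=True`) avoids applying later manifests that may depend on resources from an earlier one.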
If you encounter issues during deployment:
- Check the logs of the deployment script
- Verify your environment variables in `.env`
- Ensure all prerequisites are properly installed
- Check your Kubernetes cluster's status and connectivity
- Verify storage system accessibility
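Several of the checks above can be scripted. The sketch below verifies the Python version and whether `kubectl`, `helm` and `git` are on the `PATH`, and optionally probes the cluster with `kubectl cluster-info`. The helper names are hypothetical. Storage accessibility depends on your storage system and is not covered.

```python
import shutil
import subprocess
import sys

REQUIRED_TOOLS = ("kubectl", "helm", "git")


def check_prerequisites(tools: tuple[str, ...] = REQUIRED_TOOLS) -> dict[str, bool]:
    """Report which prerequisites are satisfied on this machine."""
    results = {"python>=3.11": sys.version_info >= (3, 11)}
    for tool in tools:
        results[tool] = shutil.which(tool) is not None
    return results


def cluster_reachable(timeout: int = 15) -> bool:
    """True if 'kubectl cluster-info' succeeds with the current kubeconfig."""
    if shutil.which("kubectl") is None:
        return False
    try:
        proc = subprocess.run(["kubectl", "cluster-info"], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0
```

A `False` for `cluster_reachable` usually points to a kubeconfig or network problem rather than to the manifests themselves.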
For more detailed troubleshooting, consult the Kubernetes and Fleet documentation.