A Go-based anti-scraper tarpit inspired by Nepenthes.
Sarracenia acts as a defensive wall against web scrapers. It generates an endless supply of plausible-looking web pages and keeps scrapers trapped in a loop of fake content so they never reach the protected content.
The core components of Sarracenia are standalone, reusable libraries in the `/pkg` directory.
The Markov library is a high-performance library for training and using Markov chain models. It is feature-complete and includes a streaming API, database persistence, and advanced generation features.
➡️ View the full README for the Markov Library
The templating library is Sarracenia's rendering engine. It takes text from the Markov library and/or a wordlist and embeds it within complex, randomized HTML structures.
➡️ View the full README for the Templating Library
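There are four primary ways to install and run Sarracenia: a pre-compiled binary, building from source, Docker, or a systemd service. After installing, complete the initial setup (accessing the dashboard and creating the master API key), which is common to all methods.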
The simplest method is to use the pre-compiled binaries published with each new release on GitHub.
Steps:
- Download the latest release:
  - Go to the Sarracenia Releases page.
  - Find the asset that matches your operating system and architecture (e.g., `sarracenia-linux-amd64`, `sarracenia-windows-amd64.exe`).
  - Download the binary and rename it to `sarracenia` (or `sarracenia.exe` on Windows) to match the directory layout and commands used in the next steps.
- Prepare the initial data:
  - From the same releases page, download the `Source code (zip)` or `Source code (tar.gz)` file.
  - Extract the archive. You will need the contents of its `example` directory.
  - Create your application directory and arrange the files as follows:

        /your/sarracenia/folder/
        ├── sarracenia          # The binary you downloaded
        ├── config.json         # Copied from the extracted 'example' folder
        └── data/
            ├── dashboard/
            │   ├── static/
            │   └── templates/
            ├── templates/      # Copied from 'example/data'
            │   └── ...
            └── wordlist.txt    # Copied from 'example/data'

- Run the application:
  - On Linux or macOS:

        # Make the binary executable
        chmod +x ./sarracenia
        # Run the application
        ./sarracenia

  - On Windows, double-click `sarracenia.exe` or run it from the command prompt:

        .\sarracenia.exe
Building from source is suitable for developers or for platforms where Docker or systemd is not available.
Prerequisites:
- Go 1.24.5 or later.
- Git for cloning the repository.
Steps:
- Clone the repository:

      git clone https://github.com/CTAG07/Sarracenia.git
      cd Sarracenia

- Build and run:

      # Build the application
      go build -o sarracenia ./cmd/main
      # Run the application
      ./sarracenia
On the first run, Sarracenia automatically creates a `config.json` file and a `data` directory with the necessary files by copying them from the `example` directory.
Pre-built Docker images are automatically published to the GitHub Container Registry (`ghcr.io`) with each new release.
- Create a `docker-compose.yml` file:

      # docker-compose.yml
      services:
        sarracenia:
          image: ghcr.io/ctag07/sarracenia:latest
          container_name: sarracenia
          restart: unless-stopped
          ports:
            - "7277:7277" # Tarpit server
            - "7278:7278" # Dashboard/API
          volumes:
            # Persists your database, config, and templates
            - sarracenia-data:/app/data

      volumes:
        sarracenia-data:

- Start the service:

      docker compose up -d
For Linux servers, an installation script is provided to set up Sarracenia as a systemd service.
- Clone the repository:

      git clone https://github.com/CTAG07/Sarracenia.git
      cd Sarracenia

- Run the installation script. The script builds the binary, creates a dedicated user, and installs the application to `/opt/sarracenia`:

      sudo bash ./scripts/install_systemd.sh

- Manage the service:
  - Check status: `systemctl status sarracenia.service`
  - View logs: `journalctl -u sarracenia.service -f`
Initial setup (common to all installation methods):

- Access the dashboard:
  - By default, the tarpit runs on port `:7277` and the API/dashboard runs on port `:7278`.
  - Open a web browser and navigate to `http://localhost:7278`.
- Create the master API key:
  - The first time you access the dashboard, the API is unsecured.
  - Navigate to the API Keys page.
  - Create your first key. It is automatically assigned the master (`*`) scope, which grants full permissions.
  - Copy this key immediately. It will not be shown again.
  - Once the key is created, all API endpoints (including the dashboard) are secured; only the `/api/health` check remains unauthenticated.
Sarracenia is configured using a `config.json` file in the same directory as the executable.

Key | Description | Default |
---|---|---|
`server_addr` | Address for the tarpit server to listen on. | `:7277` |
`api_addr` | Address for the API and dashboard server. | `:7278` |
`log_level` | Logging level (`debug`, `info`, `warn`, `error`). | `info` |
`data_dir` | Path to the data directory. | `./data` |
`database_path` | Path to the SQLite database file. | `./data/sarracenia.db?_journal_mode=WAL` |
`dashboard_tmpl_path` | Path to the dashboard GoHTML template files. | `./data/dashboard/templates/` |
`dashboard_static_path` | Path to the dashboard static assets (CSS, JS). | `./data/dashboard/static/` |
`enabled_templates` | A list of `.tmpl.html` files to use for the tarpit. If empty, a random template is chosen. | `["page.tmpl.html"]` |
`tarpit_config` | Settings for response delaying (see below). | |
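For tooling that reads or edits `config.json`, the table maps naturally onto Go structs. The sketch below is an assumption, not Sarracenia's own config type: the struct and field names are hypothetical, only the JSON keys and defaults come from the tables, and the templating and threat objects are left out. `parseConfig` starts from the documented defaults and lets `encoding/json` overwrite only the keys present in the input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// TarpitConfig mirrors the tarpit_config object (hypothetical field names).
type TarpitConfig struct {
	EnableDripFeed  bool `json:"enable_drip_feed"`
	InitialDelayMs  int  `json:"initial_delay_ms"`
	DripFeedDelayMs int  `json:"drip_feed_delay_ms"`
	DripFeedChunks  int  `json:"drip_feed_chunks"`
}

// Config mirrors the documented top-level keys; the templating and threat
// objects are omitted from this sketch.
type Config struct {
	ServerAddr          string       `json:"server_addr"`
	APIAddr             string       `json:"api_addr"`
	LogLevel            string       `json:"log_level"`
	DataDir             string       `json:"data_dir"`
	DatabasePath        string       `json:"database_path"`
	DashboardTmplPath   string       `json:"dashboard_tmpl_path"`
	DashboardStaticPath string       `json:"dashboard_static_path"`
	EnabledTemplates    []string     `json:"enabled_templates"`
	Tarpit              TarpitConfig `json:"tarpit_config"`
}

// defaultConfig returns the defaults listed in the tables.
func defaultConfig() Config {
	return Config{
		ServerAddr:          ":7277",
		APIAddr:             ":7278",
		LogLevel:            "info",
		DataDir:             "./data",
		DatabasePath:        "./data/sarracenia.db?_journal_mode=WAL",
		DashboardTmplPath:   "./data/dashboard/templates/",
		DashboardStaticPath: "./data/dashboard/static/",
		EnabledTemplates:    []string{"page.tmpl.html"},
		Tarpit:              TarpitConfig{DripFeedDelayMs: 500, DripFeedChunks: 10},
	}
}

// parseConfig overlays JSON onto the defaults: keys absent from the input,
// including keys inside tarpit_config, keep their default values.
func parseConfig(data []byte) (Config, error) {
	cfg := defaultConfig()
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, err
	}
	return cfg, nil
}

func main() {
	data, err := os.ReadFile("config.json")
	if err != nil {
		fmt.Println("could not read config.json:", err)
		return
	}
	cfg, err := parseConfig(data)
	if err != nil {
		fmt.Println("invalid config.json:", err)
		return
	}
	fmt.Printf("tarpit on %s, API on %s, drip feed: %v\n", cfg.ServerAddr, cfg.APIAddr, cfg.Tarpit.EnableDripFeed)
}
```

Whether Sarracenia itself fills in missing keys from defaults this way is not stated here; treat the overlay behavior as a property of this sketch.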
The `tarpit_config` object:

Key | Description | Default |
---|---|---|
`enable_drip_feed` | If `true`, sends the response in slow chunks. | `false` |
`initial_delay_ms` | Time to wait before sending the first byte (ms). | `0` |
`drip_feed_delay_ms` | Time to wait between sending chunks (ms). | `500` |
`drip_feed_chunks` | Number of chunks to split the response into. | `10` |
A separate object in `config.json` configures the templating engine. See `pkg/templating/README.md` for its full documentation.
Another object configures the threat assessment system, which controls how aggressively the tarpit responds based on client behavior.

Key | Description | Default |
---|---|---|
`base_threat` | The starting threat score for any request. | `0` |
`ip_hit_factor` | Value added to the score for each hit from an IP. | `1.0` |
`ua_hit_factor` | Value added to the score for each hit from a User Agent. | `0.5` |
`ip_hit_rate_factor` | Multiplier for hits per minute from an IP. | `10.0` |
`ua_hit_rate_factor` | Multiplier for hits per minute from a User Agent. | `5.0` |
`max_threat` | The absolute maximum threat score. | `1000` |
`fallback_level` | The threat stage (0-4) to use if no other threshold is met. | `0` |
`stages` | Defines the score thresholds for each threat stage. | |
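The table does not spell out how the factors combine. One plausible reading, stated here purely as an assumption, is a weighted sum of hit counts and per-minute hit rates added to `base_threat` and capped at `max_threat`, with the stage being the highest threshold the score reaches. The Go sketch below implements that reading; the type names and the four-element `StageThresholds` array (standing in for `stages`, whose format is not documented here) are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// ThreatConfig mirrors the documented keys (hypothetical field names).
type ThreatConfig struct {
	BaseThreat      float64
	IPHitFactor     float64
	UAHitFactor     float64
	IPHitRateFactor float64
	UAHitRateFactor float64
	MaxThreat       float64
	FallbackLevel   int
	// StageThresholds is hypothetical: element i is the minimum score for stage i+1.
	StageThresholds [4]float64
}

// ClientStats holds hypothetical per-client counters.
type ClientStats struct {
	IPHits, UAHits             float64
	IPHitsPerMin, UAHitsPerMin float64
}

// threatScore computes the assumed weighted sum, capped at MaxThreat.
func threatScore(c ThreatConfig, s ClientStats) float64 {
	v := c.BaseThreat +
		s.IPHits*c.IPHitFactor +
		s.UAHits*c.UAHitFactor +
		s.IPHitsPerMin*c.IPHitRateFactor +
		s.UAHitsPerMin*c.UAHitRateFactor
	return math.Min(v, c.MaxThreat)
}

// threatStage returns the highest stage whose threshold the score reaches,
// or FallbackLevel when no threshold is met.
func threatStage(c ThreatConfig, score float64) int {
	for i := len(c.StageThresholds) - 1; i >= 0; i-- {
		if score >= c.StageThresholds[i] {
			return i + 1
		}
	}
	return c.FallbackLevel
}

func main() {
	cfg := ThreatConfig{
		IPHitFactor: 1.0, UAHitFactor: 0.5,
		IPHitRateFactor: 10.0, UAHitRateFactor: 5.0,
		MaxThreat:       1000,
		StageThresholds: [4]float64{50, 100, 200, 400}, // hypothetical thresholds
	}
	s := ClientStats{IPHits: 10, UAHits: 20, IPHitsPerMin: 2, UAHitsPerMin: 4}
	score := threatScore(cfg, s)
	fmt.Printf("score=%.1f stage=%d\n", score, threatStage(cfg, score)) // score=60.0 stage=1
}
```

With the default factors, 10 IP hits, 20 User Agent hits, 2 IP hits per minute, and 4 User Agent hits per minute give 10 + 10 + 20 + 20 = 60.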
All API endpoints are prefixed with `/api` and, except for the `/api/health` check, require an API key sent in the `sarr-auth` header.

Method | Endpoint | Scope | Description |
---|---|---|---|
GET | `/api/auth/me` | any | Checks the validity of the current key and returns its scopes. |
GET | `/api/auth/keys` | `auth:manage` | Lists all API keys (without the raw key). |
POST | `/api/auth/keys` | `auth:manage` | Creates a new API key. The first key created is always a master key. |
DELETE | `/api/auth/keys/{id}` | `auth:manage` | Deletes an API key by its ID. The master key (ID 1) cannot be deleted. |

Method | Endpoint | Scope | Description |
---|---|---|---|
GET | `/api/markov/models` | `markov:read` | Lists all available Markov models and their info. |
POST | `/api/markov/models` | `markov:write` | Creates a new, empty Markov model. |
DELETE | `/api/markov/models/{name}` | `markov:write` | Deletes a model and all its data. |
POST | `/api/markov/models/{name}/train` | `markov:write` | Trains a model with a plain-text corpus file in the request body. |
POST | `/api/markov/models/{name}/prune` | `markov:write` | Prunes a model's chain data based on a minimum frequency. |
GET | `/api/markov/models/{name}/export` | `markov:read` | Exports a model as a JSON file. |
POST | `/api/markov/models/{name}/generate` | `markov:read` | Generates text from the given Markov model using the supplied parameters. |
POST | `/api/markov/import` | `markov:write` | Imports a model from a JSON file in the request body. |
POST | `/api/markov/vocabulary/prune` | `markov:write` | Prunes rare tokens from the global vocabulary shared by all models. |
GET | `/api/markov/training/status` | `markov:read` | Returns JSON indicating whether training is in progress and, if so, which model is being trained. |

Method | Endpoint | Scope | Description |
---|---|---|---|
GET | `/api/health` | none | Unauthenticated health check endpoint. |
GET | `/api/server/version` | `stats:read` | Returns the application's build version, commit, and date. |
GET | `/api/server/config` | `server:config` | Retrieves the current `config.json`. |
PUT | `/api/server/config` | `server:config` | Updates and saves the `config.json`. |
POST | `/api/server/restart` | `server:control` | Initiates a graceful server restart. |
POST | `/api/server/shutdown` | `server:control` | Initiates a graceful server shutdown. |

Method | Endpoint | Scope | Description |
---|---|---|---|
GET | `/api/stats/summary` | `stats:read` | Gets a high-level summary of total requests, unique IPs, etc. |
GET | `/api/stats/top_ips` | `stats:read` | Lists the top 100 most frequent IP addresses. |
GET | `/api/stats/top_user_agents` | `stats:read` | Lists the top 100 most frequent User Agents. |
DELETE | `/api/stats/all` | `server:control` | Deletes all collected statistics. |

Method | Endpoint | Scope | Description |
---|---|---|---|
GET | `/api/templates` | `templates:read` | Lists the names of all loaded template and partial files. |
GET | `/api/templates/{name}` | `templates:read` | Gets the raw content of a template file. |
PUT | `/api/templates/{name}` | `templates:write` | Updates or creates a template file with the request body content. |
DELETE | `/api/templates/{name}` | `templates:write` | Deletes a template file. |
POST | `/api/templates/refresh` | `templates:write` | Manually reloads all templates from disk. |
POST | `/api/templates/test` | `templates:read` | Tests template syntax from the request body without saving. |
GET | `/api/templates/preview` | `templates:read` | Renders a preview of a saved template with a given threat level. |

Method | Endpoint | Scope | Description |
---|---|---|---|
GET | `/api/whitelist/ip` | `whitelist:read` | Lists all whitelisted IP addresses. |
POST | `/api/whitelist/ip` | `whitelist:write` | Adds an IP address to the whitelist. |
DELETE | `/api/whitelist/ip` | `whitelist:write` | Removes an IP address from the whitelist. |
GET | `/api/whitelist/useragent` | `whitelist:read` | Lists all whitelisted User Agents. |
POST | `/api/whitelist/useragent` | `whitelist:write` | Adds a User Agent to the whitelist. |
DELETE | `/api/whitelist/useragent` | `whitelist:write` | Removes a User Agent from the whitelist. |
This project is licensed under the AGPLv3.
Alternative Licensing
I understand that the AGPL-3.0 may not be suitable for all users. If you would like to use this project in a way not permitted by the AGPL-3.0 (for instance, in a closed-source application), I am happy to grant you a free, permissive license (such as the MIT License).
Please contact me at 82781942+CTAG07@users.noreply.github.com to request an alternative license.