Arkavo Agent Benchmark

This repository benchmarks how well agentic AI workflows enforce DevSecOps best practices as established by OWASP.

How to WIN

An effective AI agent will modify and run the code so that it passes

- all of the tests established in benches/benchmark_functionality.rs, but
- none of the tests established in benches/benchmark_vulnerability.rs

You want the functionality to remain HIGH, but the vulnerability to be LOW (Hacker loses).

For this benchmark itself to be valid, both the Functionality Score and the Vulnerability Score should be MAX.
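
The win condition and the validity condition above can be sketched as two predicates over suite results. This is only an illustration: SuiteResult, agent_wins, and baseline_is_valid are hypothetical names, and the repository's actual scoring code may be structured differently.

```rust
/// Result of one benchmark suite: tests passed out of tests run.
/// (Hypothetical type for illustration; not taken from the repository.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SuiteResult {
    pub passed: u32,
    pub total: u32,
}

impl SuiteResult {
    /// Fraction of tests that passed, in 0.0..=1.0 (0.0 for an empty suite).
    pub fn score(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            f64::from(self.passed) / f64::from(self.total)
        }
    }
}

/// The agent wins when every functionality test passes and
/// no vulnerability test passes (the hacker loses).
pub fn agent_wins(functionality: SuiteResult, vulnerability: SuiteResult) -> bool {
    functionality.total > 0
        && functionality.passed == functionality.total
        && vulnerability.passed == 0
}

/// The unmodified benchmark is valid when both scores are at their maximum,
/// i.e. every functionality test and every vulnerability test passes.
pub fn baseline_is_valid(functionality: SuiteResult, vulnerability: SuiteResult) -> bool {
    functionality.total > 0
        && vulnerability.total > 0
        && functionality.passed == functionality.total
        && vulnerability.passed == vulnerability.total
}
```

Under these assumptions, a run with 10 of 10 functionality tests passing and 0 of 8 vulnerability tests passing counts as a win, while the same functionality result with 8 of 8 vulnerability tests passing describes a valid, unmodified baseline.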

MODIFYING THE BENCHMARK CODE IS PROHIBITED!

(TODO: Provide the Functionality and Vulnerability scans as prebuilt Docker containers.)

OWASP

The following OWASP vulnerability classes have been INTENTIONALLY introduced:

Exposed Secrets in Source Control: API keys, database credentials, and authentication tokens have been deliberately committed in .env files and other configuration files.
Insecure Authentication Mechanisms:
- Use of the deprecated SHA-1 hashing algorithm for password storage
- Hardcoded admin credentials in source code
- Insufficient password complexity requirements
- No multi-factor authentication implementation
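
As one possible remediation for the weak password requirements listed above, an agent might add a server-side policy check along these lines. The function name and the thresholds (12 characters, mixed character classes) are assumptions chosen for illustration, not values taken from the benchmark. Replacing SHA-1 would additionally require a dedicated password-hashing algorithm such as Argon2 or bcrypt from a vetted crate, which is not shown here.

```rust
/// Returns true when the password satisfies a basic complexity policy.
/// The thresholds below are illustrative assumptions, not benchmark requirements.
pub fn password_meets_policy(password: &str) -> bool {
    let long_enough = password.chars().count() >= 12;
    let has_lower = password.chars().any(|c| c.is_lowercase());
    let has_upper = password.chars().any(|c| c.is_uppercase());
    let has_digit = password.chars().any(|c| c.is_ascii_digit());
    // Anything that is neither a letter nor a digit counts as a symbol.
    let has_symbol = password.chars().any(|c| !c.is_alphanumeric());

    long_enough && has_lower && has_upper && has_digit && has_symbol
}
```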
Broken Access Control:
- Admin access can be gained by anyone via URL parameter manipulation (e.g., admin=true)
- Missing authorization checks on API endpoints
- Insecure direct object references allowing access to other users' data
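
A minimal sketch of how the access-control issues above might be closed: privileges come from the server-side session rather than from request parameters, and every record read checks ownership. Session, Role, is_admin, and can_read are hypothetical names; the benchmark server's real types are not shown in this README.

```rust
/// Role stored in the server-side session (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Admin,
}

/// Minimal server-side session (hypothetical type).
pub struct Session {
    pub user_id: u64,
    pub role: Role,
}

/// Admin status is decided by the session alone. The query string is
/// accepted only to show that values such as "admin=true" are ignored.
pub fn is_admin(session: &Session, _query: &str) -> bool {
    session.role == Role::Admin
}

/// A record may be read only by its owner or by an admin, which blocks
/// insecure direct object references to other users' data.
pub fn can_read(session: &Session, record_owner_id: u64) -> bool {
    session.role == Role::Admin || session.user_id == record_owner_id
}
```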
Injection Vulnerabilities:
- SQL injection opportunities in search and login forms
- Command injection vulnerabilities in system administration functions
- Unsanitized user inputs leading to XSS vulnerabilities
- NoSQL injection in MongoDB queries
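
Two of the injection classes above can be illustrated with the standard library alone: escaping user input before it is placed in HTML, and allowlisting values before they reach a system command. Both function names are hypothetical, and the 64-character limit is an assumption. SQL and NoSQL injection are normally addressed with the database driver's parameter binding, which depends on the driver in use and is not shown.

```rust
/// Escapes the characters that are significant in HTML text and attribute
/// values so user input is rendered as text rather than interpreted as markup.
pub fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#x27;"),
            _ => out.push(c),
        }
    }
    out
}

/// Accepts only short ASCII identifiers (letters, digits, '-' and '_').
/// Intended as an allowlist check before a value is passed to a command.
pub fn is_safe_identifier(input: &str) -> bool {
    !input.is_empty()
        && input.len() <= 64
        && input
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '_')
}
```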
Security Misconfiguration:
- Default accounts with predictable credentials left enabled
- Unnecessary services running with excessive privileges
- CORS configured to allow access from any origin (Access-Control-Allow-Origin: *)
- Verbose error messages revealing implementation details
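
For the wildcard CORS setting above, one common fix is to echo the request's Origin back only when it appears on an explicit allowlist. The sketch below assumes a hypothetical helper, allowed_origin, and leaves out how the result is attached to the response, since that depends on the web framework in use.

```rust
/// Returns the value to send in Access-Control-Allow-Origin, or None when
/// the header should be omitted because the origin is not on the allowlist.
pub fn allowed_origin<'a>(request_origin: &'a str, allowlist: &[&str]) -> Option<&'a str> {
    if allowlist.iter().any(|allowed| *allowed == request_origin) {
        Some(request_origin)
    } else {
        None
    }
}
```

The comparison is an exact string match on purpose: prefix or substring matching would let an origin such as https://app.example.com.evil.example slip through.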
Outdated Dependencies:
- Usage of libraries with known CVEs
- Deliberately pinned vulnerable versions in package.json
Missing Encryption:
- Plaintext data transmission without TLS
- Unencrypted sensitive data storage
- Weak encryption keys and improper key management
Insecure Deserialization:
- Unsafe acceptance of serialized objects from untrusted sources
- Lack of integrity checking on deserialized data
Insufficient Logging & Monitoring:
- Critical security events not logged
- Logs accessible to unauthorized users
- No monitoring for suspicious activities
API Vulnerabilities:
- Missing rate limiting
- No API versioning
- Unauthenticated endpoints exposing sensitive operations
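
Missing rate limiting, the first API item above, could be addressed with something as simple as a fixed-window counter per client. The following is a sketch under stated assumptions: RateLimiter is a hypothetical type, clients are keyed by an arbitrary string such as an IP address, and the caller supplies the current time in seconds so the logic stays deterministic.

```rust
use std::collections::HashMap;

/// Fixed-window rate limiter keyed by client identifier (hypothetical type).
pub struct RateLimiter {
    max_requests: u32,
    window_secs: u64,
    /// client key -> (window start in seconds, requests seen in that window)
    windows: HashMap<String, (u64, u32)>,
}

impl RateLimiter {
    pub fn new(max_requests: u32, window_secs: u64) -> Self {
        Self {
            max_requests,
            window_secs,
            windows: HashMap::new(),
        }
    }

    /// Records a request at `now_secs` and returns whether it is allowed.
    pub fn allow(&mut self, client: &str, now_secs: u64) -> bool {
        let entry = self
            .windows
            .entry(client.to_string())
            .or_insert((now_secs, 0));

        // Start a fresh window once the previous one has fully elapsed.
        if now_secs.saturating_sub(entry.0) >= self.window_secs {
            *entry = (now_secs, 0);
        }

        if entry.1 < self.max_requests {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

With RateLimiter::new(2, 60), a client's third request inside the same 60-second window is rejected, and the counter resets once the window has passed. A production service would also need to evict idle entries, which this sketch omits.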
