"Redefining how AI APIs communicate with the web — built from scratch in C and Assembly."
For detailed benchmark results comparing NeuroHTTP and NGINX, see benchmark.md
Below are the live screenshots from the actual benchmark runs.
Experience the raw performance and intelligence of NeuroHTTP in action.
This demo showcases NeuroHTTP — a high-performance, AI-driven web server built entirely in C and Assembly.
NeuroHTTP (codename: AIMux) is a next-generation web server purpose-built for the AI era.
Written entirely in C and Assembly, it delivers low-level performance, predictable latency, and extreme stability under heavy load — without relying on any external frameworks.
💡 Its mission is to redefine the networking layer for AI systems, enabling efficient handling of model streams (LLMs) and data-intensive APIs.
- 🧠 Real-time AI responses — token-by-token streaming for LLMs and chat models (see the sketch below).
- ⚙️ Fast, low-overhead JSON processing optimized for inference workloads.
- ⚡ Concurrent model routing (AI multiplexing) for parallel AI endpoints.
- 🔌 Seamless integration with HTTP/3, WebSockets, and gRPC.
Goal: Build the world’s first AI-native web server — capable of real-time, high-throughput inference APIs with near-zero overhead.
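To make the streaming bullet above concrete, here is a minimal sketch of pushing one HTTP/1.1 chunk per model token. This is hypothetical code, not NeuroHTTP's actual `stream.c`: `fd` is assumed to be an already-accepted client socket, and `next_token()` is a placeholder for whatever produces tokens from an LLM stream.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical sketch: emit one HTTP/1.1 chunk per model token. */
static void send_chunk(int fd, const char *data, size_t len)
{
    char head[32];
    int n = snprintf(head, sizeof head, "%zx\r\n", len);
    send(fd, head, (size_t)n, 0);   /* chunk size in hex */
    send(fd, data, len, 0);         /* chunk payload     */
    send(fd, "\r\n", 2, 0);         /* chunk terminator  */
}

void stream_tokens(int fd, const char *(*next_token)(void))
{
    const char *hdr =
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Transfer-Encoding: chunked\r\n\r\n";
    send(fd, hdr, strlen(hdr), 0);

    const char *tok;
    while ((tok = next_token()) != NULL)  /* one chunk per token */
        send_chunk(fd, tok, strlen(tok));

    send(fd, "0\r\n\r\n", 5, 0);          /* final zero-length chunk */
}
```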
📊 For detailed performance metrics, see benchmark.md.
- 🧠 First AI-native web server, built from scratch for intelligent workloads.
- ⚙️ Written in C + Assembly for unmatched performance and low latency.
- 🌍 Designed for the AI API era — scalable, modular, and open-source.
| Feature | Description |
|---|---|
| ⚡ Smart Threading | Dynamic load balancing for AI workloads. |
| 🧠 AI Stream Mode | Token-based real-time responses (HTTP/1.1, HTTP/3, WS). |
| 🧩 Fast JSON Parser | Assembly-optimized, SIMD-accelerated. |
| 🔐 API Keys & Quotas | Built-in auth and rate control (see the sketch below). |
| 🛰️ gRPC / HTTP/3 | Modern, low-latency protocol support. |
| 🧰 C Plugin System | Extend core via loadable modules. |
| 📊 Live Metrics | Real-time latency and throughput stats. |
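The quota row above implies some form of rate control; a classic token bucket is one way such a limiter could work. The sketch below is purely illustrative: the struct layout, refill policy, and names are assumptions, not NeuroHTTP's actual limiter code.

```c
#include <time.h>

/* Illustrative token bucket: `rate` tokens refill per second,
 * capped at `burst`. Returns 1 if the request may proceed. */
typedef struct {
    double tokens;        /* current token balance       */
    double rate;          /* refill rate, tokens/second  */
    double burst;         /* maximum bucket capacity     */
    struct timespec last; /* time of the previous check  */
} bucket_t;

int bucket_allow(bucket_t *b)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    double dt = (double)(now.tv_sec - b->last.tv_sec)
              + (double)(now.tv_nsec - b->last.tv_nsec) / 1e9;
    b->last = now;

    b->tokens += dt * b->rate;            /* refill since last call */
    if (b->tokens > b->burst)
        b->tokens = b->burst;

    if (b->tokens < 1.0)
        return 0;                         /* over quota: reject */
    b->tokens -= 1.0;
    return 1;
}
```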
NeuroHTTP is a compact, low-level server core built entirely in C and Assembly,
focused on speed, control, and AI-native processing — no frameworks, no overhead.
| Layer | Purpose |
|---|---|
| 🧠 AI Router | Direct integration with AI models for smart request handling. |
| ⚙️ Worker Engine | Lightweight thread pool for parallel, CPU-bound tasks. |
| 🔒 Security Layer | Inline packet inspection and basic request filtering. |
| ⚡ Cache System | Fast in-memory cache with auto-expiry. |
| 🔌 Plugin Loader | Extend functionality via loadable C modules (see the sketch below). |
🧠 Essence: A self-optimizing, AI-aware server core — minimal, fast, and built for the future.
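The plugin loader can be pictured as a thin wrapper around `dlopen`. The ABI below, a single exported `plugin_init()` entry point, is an assumption made for illustration; the project's real interface is whatever `include/plugin.h` declares.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical plugin ABI: each .so exports plugin_init(),
 * which registers its hooks with the server core. */
typedef int (*plugin_init_fn)(void);

int load_plugin(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    plugin_init_fn init = (plugin_init_fn)dlsym(handle, "plugin_init");
    if (!init) {
        fprintf(stderr, "missing plugin_init: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }
    return init();  /* plugin registers itself with the core */
}
```

Under this scheme, a module like `plugins/limiter.c` would be compiled with `-shared -fPIC` and the host linked with `-ldl`.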
The AIONIC AI Web Server is organized for modularity, performance, and clarity — following a clean separation between core, AI, ASM, and plugin layers.
```
/neurohttp
├── .github/ # GitHub Actions CI/CD workflows and automation
├── benchmarks/ # Performance benchmarking scripts and reports
│ └── benchmark.py # Python benchmark runner for latency and throughput tests
│
├── config/ # Configuration files
│ └── aionic.conf # Default server configuration (port, threads, cache, AI models)
│
├── docs/ # Technical documentation
│ ├── ARCHITECTURE.md # Detailed system design & module interactions
│ ├── PERFORMANCE.md # Performance tuning, memory footprint, benchmarks
│ └── ROADMAP.md # Planned features and development milestones
│
├── include/ # Public header files for all modules
│ ├── ai/ # AI-related headers
│ │ ├── prompt_router.h # AI model routing & dispatching layer
│ │ ├── stats.h # AI stats collection & monitoring
│ │ └── tokenizer.h # Tokenization and prompt pre-processing logic
│ ├── cache.h # In-memory caching system
│ ├── config.h # Server configuration loader
│ ├── firewall.h # Request filtering and security layer
│ ├── optimizer.h # Runtime performance optimizer
│ ├── parser.h # HTTP request parser (manual implementation in C)
│ ├── plugin.h # Plugin manager and shared library loader
│ ├── router.h # Core HTTP router and route dispatcher
│ ├── server.h # Main server core and worker thread manager
│ ├── stream.h # Streaming and chunked response system
│ ├── utils.h # Helper utilities (logging, timing, etc.)
│ └── utils.sh # Developer utility scripts
│
├── plugins/ # Dynamically loadable plugins (extensible features)
│ ├── limiter.c # Rate limiting / request throttling plugin
│ ├── logstats.c # Real-time log statistics collector
│ └── openai_proxy.c # Proxy integration for external AI APIs
│
├── src/ # Core source code
│ ├── ai/ # AI logic implementation
│ │ ├── prompt_router.c # Handles model selection and routing
│ │ ├── stats.c # Collects and exposes AI processing stats
│ │ └── tokenizer.c # Efficient tokenization engine
│ ├── asm/ # Assembly-optimized performance routines
│ │ ├── crc32.s # CRC32 checksum calculation (fast path)
│ │ ├── json_fast.s # Accelerated JSON parsing
│ │ └── memcpy_asm.s # Optimized memory copy routine
│ ├── cache.c # Caching implementation
│ ├── common.h # Common macros and type definitions
│ ├── config.c # Config file parser
│ ├── firewall.c # Firewall & security checks
│ ├── main.c # Entry point and main loop (server bootstrap)
│ ├── optimizer.c # Runtime performance optimizer logic
│ ├── parser.c # HTTP parser and header extractor
│ ├── plugin.c # Plugin loader and registry
│ ├── router.c # Core HTTP route resolution and dispatch
│ ├── server.c # Thread pool, socket management, and event loop
│ ├── stream.c # Streaming API and chunked response handler
│ └── utils.c # Utility functions (logging, timers, memory)
│
├── tests/ # Unit and integration tests
│ ├── test_json.c # Tests for JSON parser
│ ├── test_server.c # Core server tests
│ └── test_streaming.c # Streaming response validation
│
├── .gitignore # Ignored files and directories
├── CODE_OF_CONDUCT.md # Contributor behavior guidelines
├── CONTRIBUTING.md # How to contribute to AIONIC
├── LICENSE # Open-source license (MIT / custom)
├── Makefile # Build system (C + ASM compilation)
├── README.md # Main documentation and usage instructions
├── SECURITY.md # Security policy and vulnerability reporting
└── stats.json          # Runtime stats snapshot (for debugging)
```

- Language: C99 / C11
- Low-level optimizations: x86 / x86_64 Assembly
- Networking: `epoll` (Linux) or `libuv` (see the event-loop sketch below)
- TLS: `mbedtls` or `wolfSSL`
- gRPC support: `protobuf-c`
- Build tools: `make`, `cmake`, `clang`
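Since the stack lists `epoll` as the Linux backend, the sketch below shows the generic accept-and-dispatch loop such servers run. It is the textbook pattern under stated assumptions (`listen_fd` already bound and listening, `handle_request()` standing in for the parser/router pipeline), not the project's actual `server.c`.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 64

/* Generic epoll loop: register the listening socket, then accept
 * new clients and dispatch readable ones to the request handler. */
void event_loop(int listen_fd, void (*handle_request)(int fd))
{
    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    struct epoll_event events[MAX_EVENTS];
    for (;;) {
        int n = epoll_wait(ep, events, MAX_EVENTS, -1);
        for (int i = 0; i < n; i++) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {              /* new connection */
                int client = accept(listen_fd, NULL, NULL);
                struct epoll_event cev = { .events = EPOLLIN,
                                           .data.fd = client };
                epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
            } else {                            /* readable client */
                handle_request(fd);
                epoll_ctl(ep, EPOLL_CTL_DEL, fd, NULL);
                close(fd);                      /* keep-alive omitted */
            }
        }
    }
}
```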
The first version focuses on simplicity and raw performance.
- Handles `HTTP POST` requests at `/v1/chat` (see the sketch below)
- Accepts a JSON body containing `{ "prompt": "..." }`
- Responds with `{ "response": "Hello, AI world!" }`
- Supports chunked streaming responses
- Easily testable via `curl`
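For reference, the MVP contract above (a JSON prompt in, `{ "response": "Hello, AI world!" }` out) could be served by a handler as small as the following sketch. The function name, signature, and fixed reply are illustrative, not the project's actual `router.c` API; a real build would parse the prompt out of `body` first.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Illustrative MVP handler for POST /v1/chat: returns the fixed
 * JSON reply from the spec above. `body` would come from parser.c. */
void handle_chat(int fd, const char *body)
{
    const char *json = "{\"response\": \"Hello, AI world!\"}";
    char resp[256];

    (void)body;  /* prompt parsing elided in this sketch */

    int n = snprintf(resp, sizeof resp,
                     "HTTP/1.1 200 OK\r\n"
                     "Content-Type: application/json\r\n"
                     "Content-Length: %zu\r\n\r\n%s",
                     strlen(json), json);
    send(fd, resp, (size_t)n, 0);
}
```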
```bash
curl http://localhost:8080
curl http://localhost:8080/health
curl http://localhost:8080/stats
curl -X POST http://localhost:8080/v1/chat -d '{"prompt":"Hello"}'
curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Hello"}' http://localhost:8080/v1/chat
```

Expected output:

```json
{"response": "Hello, AI world!"}
```
| Phase | Description |
|---|---|
| Phase 1 | Core HTTP server with streaming responses |
| Phase 2 | WebSocket support for AI streaming |
| Phase 3 | Optimized C/ASM JSON parser |
| Phase 4 | Modular plug-in system for custom extensions |
| Phase 5 | Open-source release with detailed benchmarks vs NGINX |
### Run:

```bash
wrk -t4 -c100 -d30s --latency http://localhost:8080/
```

| Server | Requests/sec | Total Requests | Avg Latency | p50 | p75 | p90 | p99 | Max Latency | Transfer/sec |
|---|---|---|---|---|---|---|---|---|---|
| nginx | 6743 | 202,701 | 80ms | 11.79ms | 13.23ms | 218.75ms | 1.12s | 1.33s | 1.02MB |
| NeuroHTTP | 1621 | 48,762 | 61.22ms | 43.08ms | 100.72ms | 104.62ms | 114.27ms | 135.85ms | 4.94MB |
- The web was built for documents.
- Then came applications.
- Now it’s time for AI.
NeuroHTTP aims to redefine how AI models are served at scale, providing a native AI transport layer that’s fast, flexible, and open.
- Running AI chat models with streaming responses (like GPT, Claude, Mistral)
- Hosting LangChain or LLM orchestration pipelines
- Serving gRPC-based AI inference APIs
- Building multi-model routers for AI backends
Releasing NeuroHTTP on GitHub under the MIT License will attract:
- Developer communities on Reddit, Hacker News, and GitHub
- Early adoption by AI startups needing real-time serving
- Collaboration similar to what happened with Caddy, Envoy, and NGINX
NeuroHTTP can be built directly from source on any Linux or Unix-like system.
Make sure you have the following installed:
- GCC / Clang
- Make
- `libpthread`, `libssl`, and `zlib` (for HTTP/3 and threading support)
```bash
# Clone the repository
git clone https://github.com/okba14/NeuroHTTP.git

# Navigate into the project directory
cd NeuroHTTP

# Build the project
make all

# Run the NeuroHTTP server
./bin/aionic
```

| 🧠 Test | 💻 Command | 🧾 Result | ⭐ Rating | 📋 Notes |
|---|---|---|---|---|
| Root Endpoint | `curl http://localhost:8080` | Professional HTML page | ⭐⭐⭐⭐⭐ | Proper root redirect, integrated CSS, `Server: AIONIC/1.0` |
| Health Check | `curl http://localhost:8080/health` | Valid JSON response | ⭐⭐⭐⭐⭐ | Accurate health data, `Content-Type: application/json` |
| Statistics | `curl http://localhost:8080/stats` | `{"requests":0,"responses":0,"uptime":0,"active_connections":0,"timestamp":1760719653}` | ⭐⭐⭐⭐⭐ | Real-time metrics with timestamp support |
| POST /v1/chat | `curl -X POST -d '{"prompt":"Hello"}' http://localhost:8080/v1/chat` | Valid JSON | ⭐⭐⭐⭐⭐ | Auto JSON parsing, smart response formatting |
| POST /v1/chat (JSON header) | `curl -X POST -H "Content-Type: application/json" -d '{"prompt":"Hello"}' http://localhost:8080/v1/chat` | `{"response":"Hello! ...","model":"aionic-1.0","timestamp":1760719677}` | ⭐⭐⭐⭐⭐ | Full Content-Type support, detailed metadata |
| 🧪 Test | 🧾 Result | ⭐ Rating | 📋 Notes |
|---|---|---|---|
| XML Accept Header | Returns JSON fallback | ⭐⭐⭐⭐ | Graceful degradation, safe fallback behavior |
| Message Variants | Consistent structured responses | ⭐⭐⭐⭐⭐ | Handles English, Arabic, long text & special chars (100% success) |
| Invalid JSON | HTML 400 Error page | ⭐⭐⭐⭐⭐ | JSON syntax error detected, styled error page with details |
| Unknown Routes | HTML 404 Error page | ⭐⭐⭐⭐⭐ | Consistent theme, redirect links to home |
| Concurrent Load (50 requests) | All requests processed successfully | ⭐⭐⭐⭐⭐ | Stable under load, zero request loss, consistent throughput |
| ⚙️ Module | 🧩 Log Snippet | ⭐ Rating | 📋 Notes |
|---|---|---|---|
| Optimizer | `[OPTIMIZER] Performance degradation detected...` | ⭐⭐⭐⭐⭐ | Automatic performance recovery, full CPU optimization cycle |
| JSON Processor | `DEBUG: JSON parsed successfully using fast tokenizer` | ⭐⭐⭐⭐⭐ | Custom fast tokenizer, zero parsing errors |
| Router System | `DEBUG: Parsed request - Method: 1, Path: /v1/chat` | ⭐⭐⭐⭐⭐ | Accurate routing, precise response length tracking, full HTTP method support |
Overall Rating: ⭐⭐⭐⭐⭐
Summary: AIONIC/1.0 demonstrates exceptional reliability, accurate routing, intelligent request handling, professional error recovery, and stable performance under concurrent load.
- ✅ Precise routing and response length calculation
- ✅ Full HTTP methods support (GET, POST, etc.)
- ✅ Intelligent JSON parsing with fast tokenizer
- ✅ Robust performance under pressure
- ✅ Elegant, consistent HTML error pages
Contributions are welcome! Whether you want to optimize Assembly routines, design the plugin API, or test benchmarks — your help is appreciated.
1. Fork the repository
2. Create a new branch (`feature/your-feature`)
3. Submit a pull request
License: MIT — free for both commercial and academic use.
Credits: Built by GUIAR OQBA, with ❤️ from the open-source community.
👨‍💻 GUIAR OQBA 🇩🇿
Creator of NeuroHTTP — passionate about low-level performance, AI infrastructure, and modern web systems.
“Empowering the next generation of AI-native infrastructure — from Elkantara, Algeria.”
© 2025 GUIAR OQBA — All rights reserved.
If you believe in the vision of a fast, AI-native web layer, please ⭐ the repository and share it.
Every star fuels the open-source ecosystem and helps NeuroHTTP evolve. 🚀
💬 “Fast. Modular. AI-Native. — That’s NeuroHTTP.”
✨ Join the mission to redefine how the web talks to AI — one packet at a time.

