The system is a distributed line server designed for scalable and efficient access to specific lines of a large text file.
- Built in Python 3.13 using the aiohttp framework for asynchronous HTTP handling.
- Follows a microservices architecture, with NGINX as a load balancer.
- Each service instance handles a range of line numbers (e.g., lines 1–100, 101–200).
- Index files store byte offsets where each line begins.
- Each index entry is 13 bytes: 12 bytes for the byte offset and 1 byte for the trailing `\n`.
- Knowing the fixed entry size allows calculation of the offset in the index file for a given line.
- The offset is then used to retrieve the corresponding line from the main file (see the sketch below).
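As a minimal sketch of this two-step lookup (the function and constant names are illustrative, not the project's actual `FileIndexRepository`/`FileRepository` API):

```python
ENTRY_SIZE = 13  # 12 bytes for the byte offset + 1 byte for the trailing "\n"

def read_line(index_path: str, data_path: str, line_number: int) -> str:
    # Entries are fixed-width, so the position of a line's entry in the
    # index file is a simple multiplication.
    with open(index_path, "rb") as index_file:
        index_file.seek((line_number - 1) * ENTRY_SIZE)
        offset = int(index_file.read(ENTRY_SIZE - 1))  # drop the trailing "\n"

    # Seek straight to that byte offset in the main data file and read one line.
    with open(data_path, "rb") as data_file:
        data_file.seek(offset)
        return data_file.readline().decode("utf-8").rstrip("\n")
```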
- `FileIndexRepository`: Maps line numbers to byte offsets using the index file.
- `FileRepository`: Reads from the main data file using those offsets.
- NGINX: Routes requests to the appropriate service shard based on the line number.
- Dockerized for ease of deployment and scaling.
- Index file allows O(1) access to any line.
- The main file is read using offsets from the index file.
- Files are sharded and assigned to instances based on line number ranges (see the sketch below).
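For illustration, the line-to-shard mapping is simple integer arithmetic; the shard size and zero-based numbering here are assumptions mirroring the 100-line ranges above:

```python
SHARD_SIZE = 100  # assumed: each instance serves a 100-line range

def shard_for_line(line_number: int) -> int:
    # Lines 1-100 map to shard 0, lines 101-200 to shard 1, and so on.
    return (line_number - 1) // SHARD_SIZE
```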
Tested on 5 GB and 10 GB files using Locust for load testing, with 1,000+ concurrent connections to a single instance.
- Average latency: ~50 ms
- Test environment: MacBook Air M1, 16 GB RAM, 512 GB SSD
The system is I/O-bound and performs well due to efficient byte-offset access.
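A Locust script along these lines can reproduce the load test (the user class, pacing, and line range are assumptions, not the original test script):

```python
import random

from locust import HttpUser, between, task

class LineServerUser(HttpUser):
    # Short wait times to drive high concurrency against a single instance.
    wait_time = between(0.01, 0.1)

    @task
    def get_random_line(self) -> None:
        # Request a random line within the shard's range (lines 1-100 here).
        self.client.get(f"/lines/{random.randint(1, 100)}")
```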
- Easily handled by a single instance.
- Requires more workers per shard.
- One NGINX instance should suffice for routing.
- Requires horizontal scaling: more service replicas and potentially multiple file copies.
- Use DNS or GEO-based routing for geographic distribution.
- Add monitoring and auto-scaling for reliability.
- aiohttp: Async server framework, ideal for non-blocking I/O (see the endpoint sketch after this list).
- NGINX: Reverse proxy and load balancer, enables efficient routing and fault tolerance.
- Docker: Simplifies deployment, scalability, and consistency across environments.
- Pydantic: Provides type-safe settings management and validation.
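As a minimal illustration of an aiohttp endpoint for this kind of service (handler and route names are assumptions, and the placeholder body stands in for the repository lookup):

```python
from aiohttp import web

async def get_line(request: web.Request) -> web.Response:
    line_number = int(request.match_info["line_number"])
    # The real service would resolve the byte offset via the index file here;
    # a placeholder body keeps the sketch self-contained.
    return web.Response(text=f"line {line_number}\n")

app = web.Application()
app.add_routes([web.get("/lines/{line_number}", get_line)])

if __name__ == "__main__":
    web.run_app(app, port=8001)
```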
- aiohttp official documentation
- Articles on file indexing strategies
- Benchmarking practices for high-throughput systems
- Planning & Design: ~3 hours
- Implementation: ~5 hours
- Testing & Optimization: ~4 hours
- Documentation: ~2 hours
Total: ~14–16 hours
Given unlimited time, I would:
- Add support for streaming responses to reduce memory use on large payloads (see the sketch after this list).
- Improve automated test coverage, including load tests and edge cases.
- Support reading from the end of the file, which might be more efficient for lines near the end.
- Add structured logging for observability.
- Improve fault tolerance in case of file corruption or shard failure.
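A rough sketch of the streaming idea using aiohttp's `StreamResponse` (the chunk size, data path, and offset handling are assumptions; blocking file reads are shown for brevity, where a real handler would offload them to a thread):

```python
from aiohttp import web

CHUNK_SIZE = 64 * 1024  # assumed chunk size
DATA_PATH = "LOCAL_PROD_DATA/your_data_file.txt"  # path from the setup steps

async def stream_line(request: web.Request) -> web.StreamResponse:
    offset = 0  # in the real service, this would come from the index lookup
    response = web.StreamResponse()
    await response.prepare(request)
    with open(DATA_PATH, "rb") as data_file:
        data_file.seek(offset)
        while True:
            # readline(size) returns at most CHUNK_SIZE bytes, stopping at "\n".
            chunk = data_file.readline(CHUNK_SIZE)
            if not chunk:
                break
            await response.write(chunk)
            if chunk.endswith(b"\n"):
                break
    await response.write_eof()
    return response
```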
- Install Python 3.13:
  ```bash
  # On macOS using Homebrew
  brew install python@3.13

  # Verify installation
  python3.13 --version
  ```
- Create and activate a virtual environment:
  ```bash
  # Create virtual environment
  python3.13 -m venv venv

  # Activate virtual environment
  source venv/bin/activate  # On Unix/macOS
  ```
- Install dependencies:
  ```bash
  make install
  ```
- Run the test suite:
  ```bash
  make test
  ```
- Clone the project and ensure Docker is installed.
- Run the service:
  ```bash
  ./run.sh
  ```
  The service starts at http://localhost:8001, serving lines 1 to 100.
- Prepare Data:
  - Place your text file in the `LOCAL_PROD_DATA` directory.
  - Ensure each line ends with `\n`.
- Generate Index File:
  ```bash
  python commands/create_index_file.py LOCAL_PROD_DATA/your_data_file.txt
  ```
  This creates `your_data_file.txt_index.txt`.
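For illustration, assuming offsets are zero-padded to 12 decimal digits (consistent with the 13-byte entries described earlier), the generated index file would look like:

```
000000000000
000000000042
000000000107
```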
- Configure Docker:
  Update `docker-compose-single-instance.yml`:
  ```yaml
  - INDEX_FILE_PATH=/app/LOCAL_PROD_DATA/your_data_file.txt_index.txt
  - INDEX_FIRST_LINE_NUMBER=1
  - INDEX_LAST_LINE_NUMBER=<your_max_line_number>
  - FILE_REPOSITORY_FILE_PATH=/app/LOCAL_PROD_DATA/your_data_file.txt
  ```
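These environment variables are presumably loaded through the Pydantic settings mentioned earlier; a minimal sketch of such a model (the class name, field names, and use of the `pydantic-settings` package are assumptions about the project layout):

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Field names map case-insensitively to the environment variables above,
    # e.g. INDEX_FILE_PATH -> index_file_path.
    index_file_path: str
    index_first_line_number: int
    index_last_line_number: int
    file_repository_file_path: str

settings = Settings()  # reads and validates values from the environment
```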
- Start the Service:
  ```bash
  ./run.sh
  ```
- Test Endpoint:
  ```bash
  curl http://localhost:8001/lines/1
  ```