roym44/large-scale-app

Distributed Web Crawler & Cache System

A comprehensive distributed, service-oriented system demonstrating key concepts in large-scale distributed computing including Remote Procedure Call (RPC), Service Discovery, Distributed Caching, and Message Queuing.

🏗️ System Architecture

The system consists of three core services that work together to form a robust distributed architecture:

Core Services

  • Registry Service - Service discovery and health monitoring

    • Manages service registration and discovery
    • Performs health checks every 10 seconds
    • Automatically unregisters failed services after 3 retry attempts
    • Uses Chord Distributed Hash Table for data storage
  • Cache Service - Distributed in-memory caching

    • Provides distributed key-value storage
    • Implements Chord DHT for data distribution and replication
    • Supports high availability and fault tolerance
    • Integrates with Registry Service for discovery
  • Test Service (Web Crawler) - Application layer with async capabilities

    • Extracts links from URLs
    • Demonstrates basic distributed application functionality
    • Supports asynchronous message processing via ZeroMQ
    • Provides client-side testing capabilities
    • Integrates with both Registry and Cache services
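The link-extraction step of the Test Service can be illustrated with a short Go sketch. This is not the repository's crawler (the build script installs beautifulsoup4, which suggests the real extraction runs in Python); `extractLinks` and `hrefPattern` are hypothetical names, and the regular expression handles only double-quoted `href` attributes, which is far less robust than a real HTML parser.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefPattern matches double-quoted href attributes only (an assumption made
// to keep the sketch short; a real crawler should use an HTML parser).
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*"([^"]+)"`)

// extractLinks returns every href found in body as an absolute URL,
// resolved against base.
func extractLinks(base, body string) ([]string, error) {
	baseURL, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip hrefs that are not valid URLs
		}
		links = append(links, baseURL.ResolveReference(ref).String())
	}
	return links, nil
}

func main() {
	page := `<a href="/about">About</a> <a href="page.html">Page</a>`
	links, err := extractLinks("https://example.com/docs/", page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Resolving each link against the page URL keeps relative and absolute references comparable, which matters if the results are later used as cache keys.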

Communication Protocols

  • gRPC - Primary communication between services
  • ZeroMQ - Asynchronous message queuing (Test Service)
  • Chord DHT - Distributed hash table for data storage
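To make the Chord idea concrete, here is a minimal sketch of how a key is assigned to a node on the identifier ring: the key is hashed, and the first node at or after that position (wrapping around) owns it. The names `Ring`, `ringID`, `successorOf`, and `Owner` are hypothetical, the 32-bit identifier space is a simplification, and the sketch uses a globally sorted node list instead of Chord's finger tables, so it shows the ownership rule only, not the routing protocol or the repository's implementation.

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
)

// ringID hashes a string onto a 32-bit identifier ring. Truncating SHA-1 to
// 32 bits is an assumption made to keep the sketch short.
func ringID(s string) uint32 {
	sum := sha1.Sum([]byte(s))
	return binary.BigEndian.Uint32(sum[:4])
}

// Ring holds node identifiers in ascending order plus their names.
// Identifier collisions between nodes are ignored in this sketch.
type Ring struct {
	ids   []uint32
	names map[uint32]string
}

// NewRing places each named node on the ring at the hash of its name.
func NewRing(nodes ...string) *Ring {
	r := &Ring{names: make(map[uint32]string)}
	for _, n := range nodes {
		id := ringID(n)
		r.ids = append(r.ids, id)
		r.names[id] = n
	}
	sort.Slice(r.ids, func(i, j int) bool { return r.ids[i] < r.ids[j] })
	return r
}

// successorOf returns the first node whose identifier is >= id,
// wrapping to the lowest identifier when no such node exists.
func (r *Ring) successorOf(id uint32) string {
	i := sort.Search(len(r.ids), func(i int) bool { return r.ids[i] >= id })
	if i == len(r.ids) {
		i = 0
	}
	return r.names[r.ids[i]]
}

// Owner returns the node responsible for key.
func (r *Ring) Owner(key string) string { return r.successorOf(ringID(key)) }

func main() {
	ring := NewRing("root", "replica-1", "replica-2")
	for _, key := range []string{"https://example.com", "https://go.dev"} {
		fmt.Printf("%s -> %s\n", key, ring.Owner(key))
	}
}
```

Because ownership depends only on ring positions, adding or removing one node moves just the keys between that node and its predecessor, which is the property that makes this scheme attractive for a distributed cache.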

[Figure: System Architecture diagram]

🚀 Quick Start

Prerequisites

  • Visual Studio Code with Dev Container extension
  • Docker (for containerized development environment)
  • Git (for cloning the repository)

Development Environment Setup

  1. Clone the repository:

    git clone git@github.com:TAULargeScaleWorkshop/RLAD.git
  2. Open in VS Code:

    • Open VS Code
    • Go to File → Open Workspace from File...
    • Select the large-scale-workshop.code-workspace file
    • Click Reopen in Container

    The project will be mounted at /workspaces/RLAD/

    ⚠️ Important: Ensure the project folder is named RLAD. Using any other folder name may cause configuration and file path issues.

Building the System

  1. Install dependencies and build:

    ./build.sh

    This script will:

    • Install Python dependencies (beautifulsoup4, requests)
    • Fix OpenJDK configuration
    • Build the Go application
    • Create output directories
  2. Start all services:

    ./output/start.sh

    This launches:

    • 3 Registry Service instances (1 root + 2 replicas)
    • 3 Cache Service instances (1 root + 2 replicas)
    • 3 Test Service instances

    Wait for the "APP READY" message to confirm all services are running.

📁 Project Structure

RLAD/
├── config/                # Configuration definitions
├── services/              # Core service implementations
│   ├── reg-service/       # Registry Service
│   ├── cache-service/     # Cache Service
│   └── test-service/      # Test Service
├── interop/               # Interoperability components
├── utils/                 # Utilities and scripts
├── output/                # Build outputs and logs
├── main.go                # Application entry point
├── build.sh               # Build script
└── README.md              # This file

🧪 Testing

Each service includes comprehensive unit tests and client-side testing capabilities:

Running Tests

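# Run each command from the repository root; the subshells keep the
# working directory unchanged between commands.

# Test Service
(cd services/test-service/client/ && go test -v)

# Cache Service
(cd services/cache-service/client/ && go test -v)

# Registry Service
(cd services/reg-service/client/ && go test -v)

# Chord DHT Implementation
(cd services/reg-service/servant/dht/ && go test -v)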

Test Coverage

  • Service Clients - Test service communication and API functionality
  • Chord DHT - Test distributed hash table operations
  • Integration Tests - Test inter-service communication
  • Async Message Processing - Test ZeroMQ integration

🔧 Configuration

Services are configured using YAML files located in services/{service-name}/service/:

Example Configurations

Registry Service Root:

type: "RegService"
listen_port: 8502
name: root

Cache Service:

type: "CacheService"
registry_addresses:
  - "127.0.0.1:8502"
  - "127.0.0.1:8503"
  - "127.0.0.1:8504"
name: root
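As an illustration of how these files map onto a Go structure, the sketch below reads the flat subset of YAML shown above using only the standard library. The `Config` struct and `parseConfig` function are hypothetical, cover only the fields that appear in the two examples, and are not the repository's loader; a real project would more likely use a full YAML library.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Config covers only the fields shown in the example configurations.
type Config struct {
	Type              string
	Name              string
	ListenPort        int
	RegistryAddresses []string
}

// unquote trims surrounding whitespace and double quotes.
func unquote(s string) string {
	return strings.Trim(strings.TrimSpace(s), `"`)
}

// parseConfig handles "key: value" scalars and "- item" entries that follow
// registry_addresses. It is not a general YAML parser.
func parseConfig(src string) (Config, error) {
	var cfg Config
	inAddresses := false
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "- ") {
			if !inAddresses {
				return cfg, fmt.Errorf("unexpected list item %q", line)
			}
			cfg.RegistryAddresses = append(cfg.RegistryAddresses, unquote(line[2:]))
			continue
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			return cfg, fmt.Errorf("malformed line %q", line)
		}
		key, value = strings.TrimSpace(key), unquote(value)
		inAddresses = false
		switch key {
		case "type":
			cfg.Type = value
		case "name":
			cfg.Name = value
		case "listen_port":
			port, err := strconv.Atoi(value)
			if err != nil {
				return cfg, fmt.Errorf("listen_port: %w", err)
			}
			cfg.ListenPort = port
		case "registry_addresses":
			inAddresses = true
		default:
			return cfg, fmt.Errorf("unknown key %q", key)
		}
	}
	return cfg, sc.Err()
}

func main() {
	cfg, err := parseConfig("type: \"RegService\"\nlisten_port: 8502\nname: root\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled field fails at startup instead of silently falling back to a zero value.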

📊 Monitoring and Logs

  • Logs Location: ./output/logs/
  • Service Health: Registry Service monitors all services every 10 seconds
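  • Process Management: Use ps -ao pid= | xargs kill to stop all instances. This signals every process that ps -a lists, not only the services, so run it only inside the Dev Container.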
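The health-monitoring policy (a probe every 10 seconds, removal after 3 failed attempts) can be sketched as follows. This is a minimal illustration and not the repository's code: the `Registry` type, `CheckAll`, and the probe callback are hypothetical names, and interpreting the 3 attempts as consecutive failures is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// maxFailures mirrors the "3 retry attempts" described in this README.
const maxFailures = 3

// errDown is the error returned by the simulated probe.
var errDown = errors.New("unreachable")

// Registry is a hypothetical in-memory stand-in for the registry's state.
type Registry struct {
	failures map[string]int // address -> consecutive failed probes
}

func NewRegistry() *Registry {
	return &Registry{failures: make(map[string]int)}
}

// Register adds a service address with a clean failure count.
func (r *Registry) Register(addr string) { r.failures[addr] = 0 }

// IsRegistered reports whether addr is still known to the registry.
func (r *Registry) IsRegistered(addr string) bool {
	_, ok := r.failures[addr]
	return ok
}

// CheckAll probes every registered address once. A successful probe resets
// the failure count; the third consecutive failure unregisters the address.
func (r *Registry) CheckAll(probe func(addr string) error) {
	for addr := range r.failures {
		if err := probe(addr); err != nil {
			r.failures[addr]++
			if r.failures[addr] >= maxFailures {
				delete(r.failures, addr)
			}
			continue
		}
		r.failures[addr] = 0
	}
}

func main() {
	reg := NewRegistry()
	reg.Register("127.0.0.1:9000")
	reg.Register("127.0.0.1:9001")

	// Simulated probe: the service on port 9001 is always down.
	probe := func(addr string) error {
		if addr == "127.0.0.1:9001" {
			return errDown
		}
		return nil
	}

	// A real service would drive this loop with time.NewTicker(10 * time.Second).
	for round := 1; round <= 3; round++ {
		reg.CheckAll(probe)
		fmt.Printf("round %d: 9001 registered = %v\n", round, reg.IsRegistered("127.0.0.1:9001"))
	}
}
```

Passing the probe as a function keeps the policy testable without a network; in the running system the probe would presumably be a gRPC call to the monitored service.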

🛠️ Development

Adding New Services

  1. Create service directory in services/
  2. Implement service interface
  3. Add configuration YAML
  4. Update main.go switch statement
  5. Add to start.sh script
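Steps 2 and 4 above might look roughly like the sketch below: a shared interface plus a switch that maps the configuration's `type` field to an implementation. The `Service` interface, the `newService` function, and the `"TestService"` type string are assumptions made for illustration; only `"RegService"` and `"CacheService"` appear in the example configurations, and the actual `main.go` may be organized differently.

```go
package main

import "fmt"

// Service is a hypothetical common interface for all services.
type Service interface {
	Start() error
}

// Placeholder implementations; real services would start gRPC servers here.
type regService struct{}
type cacheService struct{}
type testService struct{}

func (regService) Start() error   { return nil }
func (cacheService) Start() error { return nil }
func (testService) Start() error  { return nil }

// newService maps the "type" field of a configuration to an implementation.
// Adding a new service means adding one more case here.
func newService(serviceType string) (Service, error) {
	switch serviceType {
	case "RegService":
		return regService{}, nil
	case "CacheService":
		return cacheService{}, nil
	case "TestService": // assumed name, not confirmed by the examples
		return testService{}, nil
	default:
		return nil, fmt.Errorf("unknown service type %q", serviceType)
	}
}

func main() {
	for _, t := range []string{"RegService", "CacheService", "Unknown"} {
		svc, err := newService(t)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s started: %v\n", t, svc.Start() == nil)
	}
}
```

Returning an error for unknown types, instead of silently defaulting, makes a typo in a configuration file visible at startup.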

Key Technologies

  • Go 1.22.2 - Primary language
  • gRPC - Service communication
  • ZeroMQ - Message queuing
  • Chord DHT - Distributed storage
  • YAML - Configuration
  • MetaFFI - Language interoperability

🐛 Troubleshooting

Common Issues

  1. Build Failures:

    • Ensure you're in the Dev Container environment
    • Check that the folder is named RLAD
    • Verify Docker is running
  2. Service Startup Issues:

    • Check logs in ./output/logs/
    • Ensure ports are available
    • Verify configuration files
  3. Network Issues:

    • Check service discovery via Registry Service
    • Verify gRPC connections
    • Ensure ZeroMQ ports are accessible

Debug Commands

# Check running processes
ps aux | grep large-scale-workshop

# View service logs
tail -f ./output/logs/RegService1_root.log

# Kill all services (signals every process listed by ps -a, so run only inside the Dev Container)
ps -ao pid= | xargs kill

# Check port usage
netstat -tulpn | grep :850

📚 Additional Resources

Important: Please ensure that the project folder is named RLAD. Using any other folder name may cause issues with the system's configuration and file paths.

Usage

Building

In the root directory run ./build.sh to install required dependencies and build the app to ./output.

Running

Run the app using ./output/start.sh to start three instances of each service type: Registry, Cache, and Test.


Note: This system is designed for educational purposes and demonstrates key concepts in distributed systems. For production use, additional considerations for security, monitoring, and scalability would be required.

About

A distributed web crawler and cache system, service-oriented architecture.
