
The Digital Bunch - LLM Data Processing Task

Overview

A Node.js application for processing data using Large Language Models (LLM).

Prerequisites

  • Node.js 22 or higher
  • Docker
  • OpenAI API key

Assumptions

  • It always spins up 4 worker threads.
  • The gpt-4o-mini model was chosen on the assumption that minor accuracy trade-offs in edge cases are acceptable given its cost efficiency, which matters in a production environment where millions of entries would be processed. If that trade-off is not acceptable, switch the model in the code to gpt-4o (see the sketch after this list).
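
A minimal sketch of how these two assumptions could look in code. The WORKER_COUNT and MODEL constants, the OPENAI_MODEL variable, and the worker.js file name are illustrative, not the repository's actual identifiers.

// worker pool sketch (illustrative names)
import { Worker } from "node:worker_threads";

// Fixed pool size per the assumption above.
const WORKER_COUNT = 4;

// Swap to "gpt-4o" here if gpt-4o-mini's accuracy trade-off is not acceptable.
const MODEL = process.env.OPENAI_MODEL ?? "gpt-4o-mini";

// Spin up the fixed pool of worker threads, passing the chosen model to each.
const workers = Array.from({ length: WORKER_COUNT }, () =>
  new Worker(new URL("./worker.js", import.meta.url), { workerData: { model: MODEL } })
);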

What could be upgraded?

For a production environment it would be wise to upgrade the script to respect the rate-limit headers returned by the OpenAI API, as sketched below.
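
A hedged sketch of that upgrade using plain fetch against the Chat Completions endpoint; the exact header names and back-off policy should be verified against OpenAI's current documentation.

// rate-limit-aware request sketch (verify header names against OpenAI docs)
async function completeWithBackoff(body: unknown): Promise<unknown> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(body),
  });

  // OpenAI reports the remaining request budget in its response headers.
  const remaining = Number(res.headers.get("x-ratelimit-remaining-requests") ?? "1");

  if (res.status === 429 || remaining === 0) {
    // Back off before retrying; a production version would parse the
    // "x-ratelimit-reset-requests" header instead of using a fixed delay.
    await new Promise((resolve) => setTimeout(resolve, 1000));
    return completeWithBackoff(body);
  }
  return res.json();
}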

Getting Started

1. Start Redis

Launch Redis using Docker Compose:

docker compose up -d

2. Install Dependencies

npm install

3. Build the Project

npm run build

Usage

Process Data

Run the main application:

npm start

Generate Test Data

Generate custom test datasets:

# Generate data with specified number of entries per file
npm run generate-data -- -c <count>

Where <count> is the number of entries in each file (defaults to 1000).
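
A minimal sketch of how the -c flag and its default of 1000 could be parsed with Node's built-in argument parser; the option name and the console output are illustrative, not the repository's actual code.

// generate-data CLI sketch (illustrative)
import { parseArgs } from "node:util";

const { values } = parseArgs({
  options: { count: { type: "string", short: "c" } },
});

// Fall back to 1000 entries per file when -c is not given.
const count = Number(values.count ?? "1000");
console.log(`Generating ${count} entries per file...`);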

Configuration

The application requires proper environment configuration to work correctly. See .env.tpl for required variables.
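
For illustration, a startup check for the required variables; OPENAI_API_KEY follows from the prerequisites, while the Redis variable names are assumptions. Consult .env.tpl for the authoritative list.

// environment check sketch (names other than OPENAI_API_KEY are assumptions; see .env.tpl)
import "dotenv/config";

const required = ["OPENAI_API_KEY", "REDIS_HOST", "REDIS_PORT"];
const missing = required.filter((name) => !process.env[name]);

if (missing.length > 0) {
  throw new Error(`Missing environment variables: ${missing.join(", ")}`);
}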

License

This project is licensed under the MIT License.


Author: Maciej Lisowski
