RB-DRUID-INDEXER

A simple distributed Druid indexer task manager for Kafka ingestion

Overview

rb-druid-indexer is a cluster-compatible service designed to manage the indexing of Kafka data streams into Druid. It handles task announcements, generates configuration specification files, and submits tasks to the Druid Supervisor.

How rb-druid-indexer fits in our new indexing system or yours

In the old system, Druid indexing relied on ShardSpec with druid-realtime, where tasks were split into multiple shards across nodes for parallel processing. That approach, defined in static realtime spec files and tied to hard-to-deploy nodes, made shard management complex and limited scalability. In contrast, the new system uses rb-druid-indexer, which submits single tasks, without shard splitting, to the Druid router. The router automatically distributes the tasks across the Druid indexer nodes, and balancing is left to the Overlord.

The diagram below shows the difference at a glance.

Features

Compatible with multiple Druid routers
Automatic discovery of Druid routers
Cluster compatible, with failover support through ZooKeeper
Automatic task management and load balancing when submitting or deleting tasks

Configuration

The configuration for rb-druid-indexer is defined in a YAML file, /etc/rb-druid-indexer/config.yml, which is generated by the druid-indexer cookbook. It includes settings for ZooKeeper and for the tasks to run, including each task's dimensions and metrics. Below is an example configuration file:

zookeeper_servers:
  - "rb-malvarez1.node:2181"
  - "rb-malvarez3.node:2181"
  - "rb-malvarez2.node:2181"
discovery_path: "/druid/discovery/druid:router"

tasks:
  - task_name: "rb_monitor"
    feed: "rb_monitor"
    spec: "rb_monitor"
    kafka_brokers:
      - "rb-malvarez1.node:9092"
    dimensions_exclusions:
        - "unit"
        - "type"
        - "value"
    metrics:
        - type: count
          name: events
        - type: doubleSum
          name: sum_value
          fieldName: value
        - type: doubleMax
          name: max_value
          fieldName: value
        - type: doubleMin
          name: min_value
          fieldName: value
  - task_name: "rb_flow"
    feed: "rb_flow_post"
    spec: "rb_flow"
    kafka_brokers:
      - "rb-malvarez1.node:9092"
      - "rb-malvarez3.node:9092"
      - "rb-malvarez2.node:9092"
    dimensions:
      - "application_id_name"
      - "building"
      - "building_uuid"
      - "campus"
      - "campus_uuid"
      - "client_accounting_type"
      - "client_auth_type"
      - "client_fullname"
      - "client_gender"
      - "client_id"
      - "client_latlong"
      - "client_loyality"
      - "client_mac"
      - "client_mac_vendor"
      - "client_rssi"
      - "client_vip"
      - "conversation"
      - "coordinates_map"
      - "deployment"
      - "deployment_uuid"
      - "direction"
    dimensions_exclusions:
      - "bytes"
      - "pkts"
      - "flow_end_reason"
      - "first_switched"
      - "wan_ip_name"
    metrics:
      - type: count
        name: events
      - type: longSum
        name: sum_bytes
        fieldName: bytes
      - type: longSum
        name: sum_pkts
        fieldName: pkts
      - type: longSum
        name: sum_rssi
        fieldName: client_rssi_num
      - type: hyperUnique
        name: clients
        fieldName: client_mac
      - type: hyperUnique
        name: wireless_stations
        fieldName: wireless_station
...

zookeeper_servers

Description: A list of ZooKeeper servers used for leadership checks and coordination.
Type: Array of strings.
Example:
- "127.0.0.1:2181"
- "127.0.0.2:2181"
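The README does not describe how the leadership check works. One common approach is the standard ZooKeeper election recipe, in which every instance creates an ephemeral sequential znode and the one with the lowest sequence number acts as leader. The sketch below shows only the selection step over a list of child node names. The node naming and the `leader` helper are assumptions for illustration, not the project's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// sequence extracts the 10-digit counter that ZooKeeper appends to
// sequential znode names, e.g. "n_0000000003" -> 3.
func sequence(name string) (int, error) {
	if len(name) < 10 {
		return 0, fmt.Errorf("znode %q has no sequence suffix", name)
	}
	return strconv.Atoi(name[len(name)-10:])
}

// leader returns the candidate with the lowest sequence number.
func leader(children []string) (string, error) {
	best, bestSeq, found := "", 0, false
	for _, c := range children {
		seq, err := sequence(c)
		if err != nil {
			return "", err
		}
		if !found || seq < bestSeq {
			best, bestSeq, found = c, seq, true
		}
	}
	if !found {
		return "", errors.New("no candidates")
	}
	return best, nil
}

func main() {
	// Hypothetical children of an election znode.
	children := []string{"n_0000000007", "n_0000000003", "n_0000000012"}
	l, err := leader(children)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("leader:", l) // prints "leader: n_0000000003"
}
```

Because the candidate znodes are ephemeral, a crashed leader's node disappears with its session and the next-lowest candidate takes over, which is the usual basis for failover in this recipe.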

discovery_path

Description: The ZooKeeper path where Druid routers are announced. This field is optional.
Type: String.
Example:
- "/druid/discovery/druid:router"
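As a rough sketch of what router discovery could look like once a child znode under this path has been read, the code below turns an announcement payload into a router URL. It assumes the payload follows the Apache Curator service-discovery JSON layout, with `address` and `port` fields, and that the router is reached over plain HTTP. The `announcement` type and `routerURL` helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"strconv"
)

// announcement holds the fields read from a service-discovery znode.
// Assumption: the payload follows the Curator ServiceInstance JSON layout.
type announcement struct {
	Name    string `json:"name"`
	Address string `json:"address"`
	Port    int    `json:"port"`
}

// routerURL parses an announcement and returns the router's base URL.
func routerURL(data []byte) (string, error) {
	var a announcement
	if err := json.Unmarshal(data, &a); err != nil {
		return "", fmt.Errorf("invalid announcement: %w", err)
	}
	if a.Address == "" || a.Port == 0 {
		return "", fmt.Errorf("announcement is missing address or port")
	}
	// JoinHostPort also brackets IPv6 addresses correctly.
	return "http://" + net.JoinHostPort(a.Address, strconv.Itoa(a.Port)), nil
}

func main() {
	// Hypothetical payload of a znode under /druid/discovery/druid:router.
	payload := []byte(`{"name":"druid:router","address":"rb-malvarez1.node","port":8888}`)
	u, err := routerURL(payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u) // prints "http://rb-malvarez1.node:8888"
}
```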

tasks

Description: A list of tasks to be managed by the indexer. Each task contains the following attributes:

task_name

Description: The name of the task. This is used to identify the task in the system.
Type: String.
Example:
- "rb_monitor"
- "rb_flow"

spec

Description: The name of the spec file associated with the task (for realtime configuration).
Type: String.
Example:
- "rb_flow"

feed

Description: The name of the Kafka feed associated with the task. This specifies which feed to listen to.
Type: String.
Example:
- "rb_monitor"
- "rb_flow_post"

kafka_brokers

Description: The list of Kafka brokers passed to the Druid supervisor.
Type: Array of strings.
Example:
- "kafka.service:9092"

dimensions

Description: The list of dimensions for the datasource.
Type: Array of strings.
Example:
- "lan_ip"

dimensions_exclusions

Description: The list of dimensions that will be excluded from the datasource.
Type: Array of strings.
Example:
- "wan_ip"

metrics

Description: The list of metrics that will be used for the datasource.
Type: Array of objects.
Example:
- type: count
  name: events
- type: longSum
  name: sum_bytes
  fieldName: bytes

Project Structure

rb-druid-indexer
├── LICENSE
├── Makefile
├── README.md
├── assets
│   ├── arch_img_new.png
│   ├── image.png
│   └── old_vs_new.png
├── config
│   ├── config.go
│   └── config_test.go
├── druid
│   ├── config
│   │   ├── config.go
│   │   └── config_test.go
│   ├── realtime.go
│   ├── realtime_test.go
│   ├── router.go
│   └── router_test.go
├── example_config.yml
├── go.mod
├── go.sum
├── integration
│   ├── config.yml
│   ├── docker-compose.yml
│   ├── environment
│   ├── rb_create_topics.sh
│   ├── rb_generate_compose.sh
│   ├── rb_produce_syn_data.sh
│   └── rb_run_integration_tests.sh
├── logger
│   ├── logger.go
│   └── logger_test.go
├── main.go
├── main_test.go
├── packaging
│   └── rpm
│       ├── Makefile
│       ├── rb-druid-indexer.service
│       └── rb-druid-indexer.spec
├── rb-druid-indexer
└── zkclient
    ├── client.go
    ├── client_test.go
    ├── election.go
    ├── election_test.go
    ├── task_announcer.go
    └── task_announcer_test.go

Getting Started

Prerequisites

Before getting started with rb-druid-indexer, ensure your runtime environment meets the following requirements:

Programming Language: Go
Package Manager: Go modules

Installation

Install rb-druid-indexer using one of the following methods:

Build from source:

Clone the rb-druid-indexer repository:

❯ git clone https://github.com/redBorder/rb-druid-indexer

Navigate to the project directory:

❯ cd rb-druid-indexer

Build the project. With Go modules, go build also downloads the dependencies:

❯ go build

Usage

Run rb-druid-indexer with the following command:

❯ ./rb-druid-indexer --config example_config.yml

Contributing

💬 Join the Discussions: Share your insights, provide feedback, or ask questions.
🐛 Report Issues: Submit bugs found or log feature requests for the rb-druid-indexer project.
💡 Submit Pull Requests: Review open PRs, and submit your own PRs.

Contributing Guidelines

Fork the Repository: Start by forking the project repository to your GitHub account.
Clone Locally: Clone the forked repository to your local machine using a git client, replacing <your-username> with your GitHub user name.
```
git clone https://github.com/<your-username>/rb-druid-indexer
```
Create a New Branch: Always work on a new branch, giving it a descriptive name.
```
git checkout -b new-feature-x
```
Make Your Changes: Develop and test your changes locally.
Commit Your Changes: Commit with a clear message describing your updates.
```
git commit -m 'Implemented new feature x.'
```
Push to GitHub: Push the changes to your forked repository.
```
git push origin new-feature-x
```
Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!

License

This project is licensed under the AGPL-3.0 License. For more details, refer to the LICENSE file.

Author

This project is developed for redBorder and the open source community by Miguel Álvarez <malvarez@redborder.com>.
