Home
Pinot is a real-time distributed OLAP datastore, used at LinkedIn to deliver scalable real-time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). At LinkedIn, it powers dozens of internal and customer-facing analytical applications, such as profile and page views, with interactive-level response times.
In many analytical applications, a low latency between an event occurring and it becoming queryable opens new possibilities for data analysis. Pinot supports near-real-time ingestion of events through Kafka as well as batch ingestion through Hadoop. Because data sizes and query rates vary widely between applications, Pinot is designed to scale horizontally and to query data sets with billions of rows at sub-second latency.
Pinot is queried through the Pinot Query Language (PQL), a SQL-like query language. For example, the following query behaves as one would normally expect:
```sql
SELECT SUM(saleValue) FROM sales
WHERE year BETWEEN 2012 AND 2015
AND quarter = 'Q4'
GROUP BY region, department
```
Pinot is well suited for analytical use cases on immutable, append-only data that require low latency between an event being ingested and it becoming available for querying. Its key features are:
- A column-oriented database
- Pluggable indexing mechanisms: sorted index, bitmap index, and posting-list-based inverted index
- Near-real-time ingestion from Kafka and batch ingestion from Hadoop
- A SQL-like language that supports selection, aggregation, filtering, group-by, order-by, and distinct queries on fact data
- Support for multi-valued fields
- Horizontally scalable and fault tolerant
Because of the design choices made to achieve these goals, Pinot has certain limitations:
- Pinot cannot be treated as a regular database: it is not a source of truth and does not support mutating data
- No full-text search
- No user-defined functions
Pinot works very well for querying time-series data with many dimensions and metrics. For example, it can answer analytical questions over profile views or ad campaign performance, such as who viewed this profile in the last few weeks, or how many ads were clicked per campaign.
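As an illustration, such a question might be phrased as the following PQL query. The table and column names here (`profileViews`, `viewedProfileId`, `daysSinceEpoch`, `viewerOccupation`) are hypothetical and only sketch the shape of a time-series query:

```sql
SELECT COUNT(*)
FROM profileViews
WHERE viewedProfileId = 12345
AND daysSinceEpoch BETWEEN 16000 AND 16014
GROUP BY viewerOccupation
```

The filter on a time column combined with a group-by over a dimension is the typical pattern for this kind of workload.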
Before we get to the quickstart, let's go over some terminology.
- Table: A table is a logical abstraction referring to a collection of related data. It consists of columns and rows (documents). The table schema defines the column names and their metadata.
- Segment: A logical table is divided into multiple physical units referred to as segments.
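To make the table/schema terminology concrete, a schema for the `sales` table from the query example above might look roughly like the following. This is a sketch only; the exact field names and structure depend on the Pinot version:

```json
{
  "schemaName": "sales",
  "dimensionFieldSpecs": [
    { "name": "region", "dataType": "STRING" },
    { "name": "department", "dataType": "STRING" },
    { "name": "quarter", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "saleValue", "dataType": "DOUBLE" }
  ],
  "timeFieldSpec": {
    "incomingGranularitySpec": {
      "name": "daysSinceEpoch",
      "dataType": "LONG",
      "timeType": "DAYS"
    }
  }
}
```

Dimensions are the columns you filter and group by, while metrics are the columns you aggregate.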
Pinot has the following components:
- Pinot Controller: Manages the nodes in the cluster, leveraging Apache Helix. Responsibilities:
  - Handles all create, update, and delete operations on tables and segments.
  - Computes the assignment of tables and their segments to Pinot Servers.
- Pinot Server: Hosts one or more physical segments. Responsibilities:
  - When assigned a pre-created segment, downloads and loads it; when assigned a Kafka topic, starts consuming from a subset of the topic's partitions.
  - Processes queries forwarded by the Pinot Broker and returns the responses to the broker.
- Pinot Broker: Accepts queries from clients, routes them to the appropriate servers (based on the routing strategy), and merges the responses from those servers before returning the result to the client.
[Insert Image here]
In this quickstart, we will load baseball stats from 1878 to 2013 into Pinot and run queries against them. The baseball data contains 100,000 records and 15 columns.
```shell
git clone https://github.com/linkedin/pinot.git
cd pinot
mvn install package -DskipTests
wget [insert link here]
tar -xzf pinot-*-pkg.tar.gz
```
Execute the quickstart script in the bin folder:
```shell
cd bin
./quickstart.sh
```
You should see the following output.
Pinot comes with a simple web interface for querying the data it holds.
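Once the quickstart is running, you can try a query like the following in the web interface. The table name `baseballStats` and columns such as `playerName`, `yearID`, and `homeRuns` are assumed to match the quickstart data set; adjust them to whatever the loaded schema actually defines. Note that PQL uses a `TOP` clause on group-by queries to limit the number of groups returned:

```sql
SELECT SUM(homeRuns)
FROM baseballStats
WHERE yearID BETWEEN 2000 AND 2010
GROUP BY playerName TOP 5
```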