
[Phase 1] Rework Data Prepper section #9976


Open · wants to merge 6 commits into main
160 changes: 0 additions & 160 deletions _data-prepper/getting-started.md

This file was deleted.

20 changes: 20 additions & 0 deletions _data-prepper/getting-started/concepts.md
@@ -0,0 +1,20 @@
---
layout: default
title: Concepts
nav_order: 10
grand_parent: OpenSearch Data Prepper
parent: Getting started with OpenSearch Data Prepper
---

# Key concepts and fundamentals

Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs; you can even plug in your own implementations. A Data Prepper pipeline consists of the following components:

- One [source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/)
- One or more [sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/)
- (Optional) One [buffer]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/)
- (Optional) One or more [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/)

Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of Data Prepper can have one or more pipelines.
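To make these components concrete, the following is a minimal sketch of a `pipelines.yaml` entry, assuming the built-in `random` source and `stdout` sink plugins are available in your distribution:

```
# A minimal pipeline: generates random data and writes it to the console.
simple-sample-pipeline:
  source:
    random:
  sink:
    - stdout:
```

Because this sketch defines no `buffer` or `processor`, Data Prepper falls back to the default `bounded_blocking` buffer and a no-op processor.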

<!----Add additional concepts here---->
27 changes: 27 additions & 0 deletions _data-prepper/getting-started/getting-started.md
@@ -0,0 +1,27 @@
---
layout: default
title: Getting started with OpenSearch Data Prepper
nav_order: 5
has_children: true
has_toc: false
redirect_from:
- /clients/data-prepper/get-started/
- /data-prepper/getting-started/
items:
- heading: "Understand key concepts"
description: "Learn about the core components and architecture of Data Prepper."
link: "/data-prepper/getting-started/concepts/"
- heading: "Install and configure Data Prepper"
description: "Set up Data Prepper for your environment and configure basic settings."
link: "/data-prepper/getting-started/install-and-configure/"
- heading: "Run Data Prepper"
description: "Start the service and verify that Data Prepper is running correctly."
link: "/data-prepper/getting-started/run-data-prepper/"
---

# Getting started with OpenSearch Data Prepper

This section provides the foundational steps for using OpenSearch Data Prepper. It covers the initial setup, introduces core concepts, and guides you through creating and managing Data Prepper pipelines. Whether your focus is on log collection, trace analysis, or specific use cases, these resources will help you begin working effectively with Data Prepper.

{% include list.html list_items=page.items %}

68 changes: 68 additions & 0 deletions _data-prepper/getting-started/install-and-configure.md
@@ -0,0 +1,68 @@
---
layout: default
title: Install and configure OpenSearch Data Prepper
nav_order: 10
grand_parent: OpenSearch Data Prepper
parent: Getting started with OpenSearch Data Prepper
---

# Install and configure OpenSearch Data Prepper

This page guides you through the process of installing and configuring OpenSearch Data Prepper. You can install Data Prepper using a pre-built Docker image or by building the project from source, depending on your environment and requirements.

After installation, you must configure a set of required files that define how Data Prepper runs and processes data. This includes specifying pipeline definitions, server settings, and optional logging configurations. Configuration details vary slightly depending on the version you are using.

Use this guide to prepare your environment and set up Data Prepper for trace analytics, log ingestion, or other supported use cases.

## 1. Installing Data Prepper

There are two ways to install Data Prepper: you can run the Docker image or build from source.

The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:

```
docker pull opensearchproject/data-prepper:latest
```
{% include copy.html %}
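After pulling the image, you can start a container. The following is a minimal sketch that assumes a Data Prepper 2.x image and the configuration paths used inside those images; verify the mount paths for your version:

```
# Mount a local pipeline definition into the container (path assumed for 2.x images).
docker run --name data-prepper \
  -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
  opensearchproject/data-prepper:latest
```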

If you have special requirements that require you to build from source, or if you want to contribute, see the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md).

## 2. Configuring Data Prepper

Two configuration files are required to run a Data Prepper instance. Optionally, you can supply a Log4j 2 configuration file. See [Configuring Log4j]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-log4j/) for more information. The following list describes the purpose of each configuration file:

* `pipelines.yaml`: This file describes which data pipelines to run, including sources, processors, and sinks.
* `data-prepper-config.yaml`: This file contains Data Prepper server settings that allow you to interact with exposed Data Prepper server APIs (see the sketch after this list).
* `log4j2-rolling.properties` (optional): This file contains Log4j 2 configuration options and can be a JSON, YAML, XML, or .properties file type.
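For example, a minimal `data-prepper-config.yaml` sketch can contain a single setting; the following assumes you want to disable SSL on the server APIs for local experimentation (not recommended for production):

```
# Disable TLS on the Data Prepper server APIs (local testing only).
ssl: false
```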

For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path, as shown in the following example:

```
java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml
```

Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.
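For example, the full command with a custom Log4j 2 configuration file looks like the following:

```
java "-Dlog4j.configurationFile=config/log4j2.properties" -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml
```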


Starting with Data Prepper 2.0, you can launch Data Prepper by using the following `data-prepper` script, which does not require any additional command-line arguments:

```
bin/data-prepper
```

Configuration files are read from specific subdirectories in the application's home directory (see the example layout after this list):
1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration.
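Assuming a single pipeline file named `pipelines.yaml`, the home directory layout looks similar to the following:

```
data-prepper/
├── bin/
│   └── data-prepper
├── config/
│   ├── data-prepper-config.yaml
│   └── log4j2.properties
└── pipelines/
    └── pipelines.yaml
```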

You can still supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. See the following example:

```
bin/data-prepper pipelines.yaml data-prepper-config.yaml
```

The Log4j 2 configuration file is read from the `config/log4j2.properties` file located in the application's home directory.

To configure Data Prepper, see the following information for each use case:

* [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/): Learn how to collect trace data and customize a pipeline that ingests and transforms that data.
* [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up Data Prepper for log observability.