diff --git a/_data-prepper/getting-started.md b/_data-prepper/getting-started.md deleted file mode 100644 index 5dc90316d0f..00000000000 --- a/_data-prepper/getting-started.md +++ /dev/null @@ -1,160 +0,0 @@ ---- -layout: default -title: Getting started with OpenSearch Data Prepper -nav_order: 5 -redirect_from: - - /clients/data-prepper/get-started/ ---- - -# Getting started with OpenSearch Data Prepper - -OpenSearch Data Prepper is an independent component, not an OpenSearch plugin, that converts data for use with OpenSearch. It's not bundled with the all-in-one OpenSearch installation packages. - -If you are migrating from Open Distro Data Prepper, see [Migrating from Open Distro]({{site.url}}{{site.baseurl}}/data-prepper/migrate-open-distro/). -{: .note} - -## 1. Installing Data Prepper - -There are two ways to install Data Prepper: you can run the Docker image or build from source. - -The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command: - -``` -docker pull opensearchproject/data-prepper:latest -``` -{% include copy.html %} - -If you have special requirements that require you to build from source, or if you want to contribute, see the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md). - -## 2. Configuring Data Prepper - -Two configuration files are required to run a Data Prepper instance. Optionally, you can configure a Log4j 2 configuration file. See [Configuring Log4j]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-log4j/) for more information. The following list describes the purpose of each configuration file: - -* `pipelines.yaml`: This file describes which data pipelines to run, including sources, processors, and sinks. -* `data-prepper-config.yaml`: This file contains Data Prepper server settings that allow you to interact with exposed Data Prepper server APIs. -* `log4j2-rolling.properties` (optional): This file contains Log4j 2 configuration options and can be a JSON, YAML, XML, or .properties file type. - -For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following configuration path example: - -``` -java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml -``` - -Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory. - - -Starting with Data Prepper 2.0, you can launch Data Prepper by using the following `data-prepper` script that does not require any additional command line arguments: - -``` -bin/data-prepper -``` - -Configuration files are read from specific subdirectories in the application's home directory: -1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files. -2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration. - -You can supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. 
See the following example: -``` -bin/data-prepper pipelines.yaml data-prepper-config.yaml -``` - -The Log4j 2 configuration file is read from the `config/log4j2.properties` file located in the application's home directory. - -To configure Data Prepper, see the following information for each use case: - -* [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/): Learn how to collect trace data and customize a pipeline that ingests and transforms that data. -* [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up Data Prepper for log observability. - -## 3. Defining a pipeline - -Create a Data Prepper pipeline file named `pipelines.yaml` using the following configuration: - -```yml -simple-sample-pipeline: - workers: 2 - delay: "5000" - source: - random: - sink: - - stdout: -``` -{% include copy.html %} - -## 4. Running Data Prepper - -Run the following command with your pipeline configuration YAML. - -```bash -docker run --name data-prepper \ - -v /${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \ - opensearchproject/data-prepper:latest - -``` -{% include copy.html %} - -The example pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`). For examples of more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/clients/data-prepper/pipelines/). - -After starting Data Prepper, you should see log output and some UUIDs after a few seconds: - -```yml -2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900 -2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer -2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer -2021-09-30T20:19:45,687 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer -2021-09-30T20:19:46,191 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer -2021-09-30T20:19:46,694 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer -2021-09-30T20:19:47,200 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer -2021-09-30T20:19:49,181 [simple-test-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - simple-test-pipeline Worker: Processing 6 records from buffer -07dc0d37-da2c-447e-a8df-64792095fb72 -5ac9b10a-1d21-4306-851a-6fb12f797010 -99040c79-e97b-4f1d-a70b-409286f2a671 -5319a842-c028-4c17-a613-3ef101bd2bdd -e51e700e-5cab-4f6d-879a-1c3235a77d18 -b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90 -``` -The remainder of this page provides examples for running Data Prepper from the Docker image. If you -built it from source, refer to the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) for more information. - -However you configure your pipeline, you'll run Data Prepper the same way. You run the Docker -image and modify both the `pipelines.yaml` and `data-prepper-config.yaml` files. 
- -For Data Prepper 2.0 or later, use this command: - -``` -docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest -``` -{% include copy.html %} - -For Data Prepper versions earlier than 2.0, use this command: - -``` -docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x -``` -{% include copy.html %} - -Once Data Prepper is running, it processes data until it is shut down. Once you are done, shut it down with the following command: - -``` -POST /shutdown -``` -{% include copy-curl.html %} - -### Additional configurations - -For Data Prepper 2.0 or later, the Log4j 2 configuration file is read from `config/log4j2.properties` in the application's home directory. By default, it uses `log4j2-rolling.properties` in the *shared-config* directory. - -For Data Prepper 1.5 or earlier, optionally add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command if you want to pass a custom log4j2 properties file. If no properties file is provided, Data Prepper defaults to the log4j2.properties file in the *shared-config* directory. - -## Next steps - -Trace analytics is an important Data Prepper use case. If you haven't yet configured it, see [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/). - -Log ingestion is also an important Data Prepper use case. To learn more, see [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/). - -To learn how to run Data Prepper with a Logstash configuration, see [Migrating from Logstash]({{site.url}}{{site.baseurl}}/data-prepper/migrating-from-logstash-data-prepper/). - -For information on how to monitor Data Prepper, see [Monitoring]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/monitoring/). - -## More examples - -For more examples of Data Prepper, see [examples](https://github.com/opensearch-project/data-prepper/tree/main/examples/) in the Data Prepper repo. diff --git a/_data-prepper/getting-started/concepts.md b/_data-prepper/getting-started/concepts.md new file mode 100644 index 00000000000..10b2d9e4b14 --- /dev/null +++ b/_data-prepper/getting-started/concepts.md @@ -0,0 +1,20 @@ +--- +layout: default +title: Concepts +nav_order: 10 +grand_parent: OpenSearch Data Prepper +parent: Getting started with OpenSearch Data Prepper +--- + +# Key concepts and fundamentals + +Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs, even allowing you to plug in your own implementations. 
A Data Prepper pipeline consists of the following components:
+
+- One [source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/)
+- One or more [sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/)
+- (Optional) One [buffer]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/)
+- (Optional) One or more [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/)
+
+Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of Data Prepper can have one or more pipelines.
+
+ 
\ No newline at end of file
diff --git a/_data-prepper/getting-started/getting-started.md b/_data-prepper/getting-started/getting-started.md
new file mode 100644
index 00000000000..ce7634541d4
--- /dev/null
+++ b/_data-prepper/getting-started/getting-started.md
@@ -0,0 +1,27 @@
+---
+layout: default
+title: Getting started with OpenSearch Data Prepper
+nav_order: 5
+has_children: true
+has_toc: false
+redirect_from:
+  - /clients/data-prepper/get-started/
+  - /data-prepper/getting-started/
+items:
+  - heading: "Understand key concepts"
+    description: "Learn about the core components and architecture of Data Prepper."
+    link: "/data-prepper/getting-started/concepts/"
+  - heading: "Install and configure Data Prepper"
+    description: "Set up Data Prepper for your environment and configure basic settings."
+    link: "/data-prepper/getting-started/install-and-configure/"
+  - heading: "Run Data Prepper"
+    description: "Start the service and verify that Data Prepper is running correctly."
+    link: "/data-prepper/getting-started/run-data-prepper/"
+---
+
+# Getting started with OpenSearch Data Prepper
+
+This section provides the foundational steps for using OpenSearch Data Prepper. It covers the initial setup, introduces core concepts, and guides you through creating and managing Data Prepper pipelines. Whether your focus is on log collection, trace analysis, or other specific use cases, these resources will help you begin working effectively with Data Prepper.
+
+{% include list.html list_items=page.items %}
+
diff --git a/_data-prepper/getting-started/install-and-configure.md b/_data-prepper/getting-started/install-and-configure.md
new file mode 100644
index 00000000000..736e2a5cbeb
--- /dev/null
+++ b/_data-prepper/getting-started/install-and-configure.md
@@ -0,0 +1,68 @@
+---
+layout: default
+title: Install and configure OpenSearch Data Prepper
+nav_order: 10
+grand_parent: OpenSearch Data Prepper
+parent: Getting started with OpenSearch Data Prepper
+---
+
+# Install and configure OpenSearch Data Prepper
+
+This page guides you through the process of installing and configuring OpenSearch Data Prepper. You can install Data Prepper using a pre-built Docker image or by building the project from source, depending on your environment and requirements.
+
+After installation, you must configure a set of required files that define how Data Prepper runs and processes data. This includes specifying pipeline definitions, server settings, and optional logging configurations. Configuration details vary slightly depending on the version you are using.
+
+Use this guide to prepare your environment and set up Data Prepper for trace analytics, log ingestion, or other supported use cases.
+
+## 1. Installing Data Prepper
+
+There are two ways to install Data Prepper: you can run the Docker image or build from source.
+
+The easiest way to use Data Prepper is by running the Docker image. We suggest that you use this approach if you have [Docker](https://www.docker.com) available. Run the following command:
+
+```
+docker pull opensearchproject/data-prepper:latest
+```
+{% include copy.html %}
+
+If you have special requirements that require you to build from source, or if you want to contribute, see the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md).
+
+## 2. Configuring Data Prepper
+
+Two configuration files are required to run a Data Prepper instance. Optionally, you can provide a Log4j 2 configuration file. See [Configuring Log4j]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-log4j/) for more information. The following list describes the purpose of each configuration file:
+
+* `pipelines.yaml`: This file describes which data pipelines to run, including sources, processors, and sinks.
+* `data-prepper-config.yaml`: This file contains Data Prepper server settings that allow you to interact with exposed Data Prepper server APIs. A minimal example appears at the end of this page.
+* `log4j2-rolling.properties` (optional): This file contains Log4j 2 configuration options and can be a JSON, YAML, XML, or .properties file type.
+
+For Data Prepper versions earlier than 2.0, the `.jar` file expects the pipeline configuration file path to be followed by the server configuration file path. See the following example:
+
+```
+java -jar data-prepper-core-$VERSION.jar pipelines.yaml data-prepper-config.yaml
+```
+
+Optionally, you can add `"-Dlog4j.configurationFile=config/log4j2.properties"` to the command to pass a custom Log4j 2 configuration file. If you don't provide a properties file, Data Prepper defaults to the `log4j2.properties` file in the `shared-config` directory.
+
+Starting with Data Prepper 2.0, you can launch Data Prepper by using the `data-prepper` script, which does not require any additional command-line arguments:
+
+```
+bin/data-prepper
+```
+
+Configuration files are read from specific subdirectories in the application's home directory:
+1. `pipelines/`: Used for pipeline configurations. Pipeline configurations can be written in one or more YAML files.
+2. `config/data-prepper-config.yaml`: Used for the Data Prepper server configuration.
+
+You can supply your own pipeline configuration file path followed by the server configuration file path. However, this method will not be supported in a future release. See the following example:
+
+```
+bin/data-prepper pipelines.yaml data-prepper-config.yaml
+```
+
+The Log4j 2 configuration file is read from the `config/log4j2.properties` file located in the application's home directory.
+
+To configure Data Prepper, see the following information for each use case:
+
+* [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/): Learn how to collect trace data and customize a pipeline that ingests and transforms that data.
+* [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/): Learn how to set up Data Prepper for log observability.
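+
+For reference, the following is a minimal sketch of the `data-prepper-config.yaml` server settings file described earlier on this page. The values shown are illustrative assumptions rather than recommendations; see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/configuring-data-prepper/) for the full list of supported settings.
+
+```yml
+# Disables TLS on the core Data Prepper server APIs. Suitable only for local testing.
+ssl: false
+# The port on which the server APIs (such as /list and /shutdown) listen. 4900 is the default.
+serverPort: 4900
+```
+{% include copy.html %}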
\ No newline at end of file
diff --git a/_data-prepper/getting-started/run-data-prepper.md b/_data-prepper/getting-started/run-data-prepper.md
new file mode 100644
index 00000000000..48614d07cf8
--- /dev/null
+++ b/_data-prepper/getting-started/run-data-prepper.md
@@ -0,0 +1,128 @@
+---
+layout: default
+title: Running Data Prepper
+nav_order: 15
+grand_parent: OpenSearch Data Prepper
+parent: Getting started with OpenSearch Data Prepper
+---
+
+# Running Data Prepper
+
+This section explains how to run OpenSearch Data Prepper using a defined pipeline configuration. Before starting the service, you must create a valid pipeline YAML file that defines the data flow, from source to sink, with optional processors and buffers.
+
+You can run Data Prepper using a Docker container or a local build, depending on your setup. This page provides examples for running the Docker image, along with configuration options for different versions of Data Prepper. Once launched, Data Prepper begins processing data according to the specified pipeline and continues until it is manually shut down.
+
+## Defining a pipeline
+
+Create a Data Prepper pipeline file named `pipelines.yaml`, similar to the following sample configuration:
+
+```yml
+simple-sample-pipeline:
+  workers: 2
+  delay: "5000"
+  source:
+    random:
+  sink:
+    - stdout:
+```
+{% include copy.html %}
+
+### Basic pipeline configurations
+
+To understand how the pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses the YAML file format. For more information and examples, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
+
+#### Minimal configuration
+
+The following minimal pipeline configuration reads from the file source and writes the data to another file on the same path. It uses the default options for the `buffer` and `processor` components.
+
+```yml
+sample-pipeline:
+  source:
+    file:
+      path: <path/to/input-file>
+  sink:
+    - file:
+        path: <path/to/output-file>
+```
+
+#### Comprehensive configuration
+
+The following comprehensive pipeline configuration uses both required and optional components:
+
+```yml
+sample-pipeline:
+  workers: 4 # Number of workers
+  delay: 100 # in milliseconds, how often the workers should run
+  source:
+    file:
+      path: <path/to/input-file>
+  buffer:
+    bounded_blocking:
+      buffer_size: 1024 # max number of events the buffer will accept
+      batch_size: 256 # max number of events the buffer will drain for each read
+  processor:
+    - string_converter:
+        upper_case: true
+  sink:
+    - file:
+        path: <path/to/output-file>
+```
+
+In the given pipeline configuration, the `source` component reads string events from the `input-file` and pushes the data to a bounded buffer with a maximum size of `1024`. The `workers` component specifies `4` concurrent threads that will process events from the buffer, each reading a maximum of `256` events from the buffer every `100` milliseconds. Each `workers` component runs the `string_converter` processor, which converts the strings to uppercase and writes the processed output to the `output-file`.
+
+## Running Data Prepper with Docker
+
+Run the following command with your pipeline configuration YAML:
+
+```bash
+docker run --name data-prepper \
+    -v /${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml \
+    opensearchproject/data-prepper:latest
+```
+{% include copy.html %}
+
+The example pipeline configuration above demonstrates a simple pipeline with a source (`random`) sending data to a sink (`stdout`).
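+
+Once the container starts, you can verify that the server is up by querying the Data Prepper core APIs. The following check is a sketch that assumes port 4900 is published (as in the version-specific commands later on this page) and that TLS is disabled in your `data-prepper-config.yaml`; if TLS is enabled, use `https` instead:
+
+```bash
+# Lists the pipelines currently running on this Data Prepper instance.
+curl -X GET http://localhost:4900/list
+```
+{% include copy.html %}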
+
+For examples of more advanced pipeline configurations, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
+
+After starting Data Prepper, you should see log output and some UUIDs after a few seconds:
+
+```yml
+2021-09-30T20:19:44,147 [main] INFO com.amazon.dataprepper.pipeline.server.DataPrepperServer - Data Prepper server running at :4900
+2021-09-30T20:19:44,681 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
+2021-09-30T20:19:45,183 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
+2021-09-30T20:19:45,687 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
+2021-09-30T20:19:46,191 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
+2021-09-30T20:19:46,694 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
+2021-09-30T20:19:47,200 [random-source-pool-0] INFO com.amazon.dataprepper.plugins.source.RandomStringSource - Writing to buffer
+2021-09-30T20:19:49,181 [simple-test-pipeline-processor-worker-1-thread-1] INFO com.amazon.dataprepper.pipeline.ProcessWorker - simple-test-pipeline Worker: Processing 6 records from buffer
+07dc0d37-da2c-447e-a8df-64792095fb72
+5ac9b10a-1d21-4306-851a-6fb12f797010
+99040c79-e97b-4f1d-a70b-409286f2a671
+5319a842-c028-4c17-a613-3ef101bd2bdd
+e51e700e-5cab-4f6d-879a-1c3235a77d18
+b4ed2d7e-cf9c-4e9d-967c-b18e8af35c90
+```
+The remainder of this page provides examples for running Data Prepper from the Docker image. If you built Data Prepper from source, refer to the [Developer Guide](https://github.com/opensearch-project/data-prepper/blob/main/docs/developer_guide.md) for more information.
+
+However you configure your pipeline, you'll run Data Prepper the same way: run the Docker image and modify both the `pipelines.yaml` and `data-prepper-config.yaml` files.
+
+For Data Prepper 2.0 or later, use this command:
+
+```
+docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/config/data-prepper-config.yaml opensearchproject/data-prepper:latest
+```
+{% include copy.html %}
+
+For Data Prepper versions earlier than 2.0, use this command:
+
+```
+docker run --name data-prepper -p 4900:4900 -v ${PWD}/pipelines.yaml:/usr/share/data-prepper/pipelines.yaml -v ${PWD}/data-prepper-config.yaml:/usr/share/data-prepper/data-prepper-config.yaml opensearchproject/data-prepper:1.x
+```
+{% include copy.html %}
+
+Once Data Prepper is running, it processes data until it is shut down. When you are done, shut it down with the following command:
+
+```
+POST /shutdown
+```
+{% include copy-curl.html %}
\ No newline at end of file
diff --git a/_data-prepper/index.md b/_data-prepper/index.md
index 63ff2fd07c1..505303d875c 100644
--- a/_data-prepper/index.md
+++ b/_data-prepper/index.md
@@ -10,70 +10,33 @@ redirect_from:
   - /clients/data-prepper/index/
   - /monitoring-plugins/trace/data-prepper/
   - /data-prepper/index/
+tutorial_cards:
+  - heading: "Trace analytics"
+    description: "Visualize event flows and find performance issues."
+    link: "/data-prepper/common-use-cases/trace-analytics/"
+  - heading: "Log analytics"
+    description: "Search, analyze, and gain insights from logs."
+ link: "/data-prepper/common-use-cases/log-analytics/" +items: + - heading: "Getting started with OpenSearch Data Prepper" + description: "Set up Data Prepper and start processing data." + link: "/data-prepper/getting-started/" + - heading: "Get familiar with Data Prepper pipelines" + description: "Learn how to build and configure pipelines." + link: "/data-prepper/pipelines/pipelines/" + - heading: "Explore common use cases" + description: "See how Data Prepper supports key use cases." + link: "/data-prepper/common-use-cases/common-use-cases/" --- # OpenSearch Data Prepper OpenSearch Data Prepper is a server-side data collector capable of filtering, enriching, transforming, normalizing, and aggregating data for downstream analysis and visualization. Data Prepper is the preferred data ingestion tool for OpenSearch. It is recommended for most data ingestion use cases in OpenSearch and for processing large, complex datasets. -With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics. [Trace analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/trace-analytics/) can help you visualize event flows and identify performance problems. [Log analytics]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/log-analytics/) equips you with tools to enhance your search capabilities, conduct comprehensive analysis, and gain insights into your applications' performance and behavior. +With Data Prepper you can build custom pipelines to improve the operational view of applications. Two common use cases for Data Prepper are trace analytics and log analytics. -## Key concepts and fundamentals +{% include cards.html cards=page.tutorial_cards %} -Data Prepper ingests data through customizable [pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/). These pipelines consist of pluggable components that you can customize to fit your needs, even allowing you to plug in your own implementations. A Data Prepper pipeline consists of the following components: +## Using OpenSearch Data Prepper -- One [source]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/sources/) -- One or more [sinks]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/sinks/) -- (Optional) One [buffer]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/buffers/buffers/) -- (Optional) One or more [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/) - -Each pipeline contains two required components: `source` and `sink`. If a `buffer`, a `processor`, or both are missing from the pipeline, then Data Prepper uses the default `bounded_blocking` buffer and a no-op processor. Note that a single instance of Data Prepper can have one or more pipelines. - -## Basic pipeline configurations - -To understand how the pipeline components function within a Data Prepper configuration, see the following examples. Each pipeline configuration uses a `yaml` file format. For more information, see [Pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/) for more information and examples. - -### Minimal configuration - -The following minimal pipeline configuration reads from the file source and writes the data to another file on the same path. It uses the default options for the `buffer` and `processor` components. 
-
-```yml
-sample-pipeline:
-  source:
-    file:
-      path: <path/to/input-file>
-  sink:
-    - file:
-        path: <path/to/output-file>
-```
-
-### Comprehensive configuration
-
-The following comprehensive pipeline configuration uses both required and optional components:
-
-```yml
-sample-pipeline:
-  workers: 4 # Number of workers
-  delay: 100 # in milliseconds, how often the workers should run
-  source:
-    file:
-      path: <path/to/input-file>
-  buffer:
-    bounded_blocking:
-      buffer_size: 1024 # max number of events the buffer will accept
-      batch_size: 256 # max number of events the buffer will drain for each read
-  processor:
-    - string_converter:
-        upper_case: true
-  sink:
-    - file:
-        path: <path/to/output-file>
-```
-
-In the given pipeline configuration, the `source` component reads string events from the `input-file` and pushes the data to a bounded buffer with a maximum size of `1024`. The `workers` component specifies `4` concurrent threads that will process events from the buffer, each reading a maximum of `256` events from the buffer every `100` milliseconds. Each `workers` component runs the `string_converter` processor, which converts the strings to uppercase and writes the processed output to the `output-file`.
-
-## Next steps
-
-- [Getting started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/).
-- [Get familiar with Data Prepper pipelines]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines/).
-- [Explore common use cases]({{site.url}}{{site.baseurl}}/data-prepper/common-use-cases/common-use-cases/).
+{% include list.html list_items=page.items %}
\ No newline at end of file
diff --git a/_data-prepper/managing-data-prepper/configuring-data-prepper.md b/_data-prepper/managing-data-prepper/configuring-data-prepper.md
index ab5f3aa0667..52b3f775d3b 100644
--- a/_data-prepper/managing-data-prepper/configuring-data-prepper.md
+++ b/_data-prepper/managing-data-prepper/configuring-data-prepper.md
@@ -103,7 +103,7 @@ check_interval | No | Duration | Specifies the time between checks of the heap s
 
 ### Extension plugins
 
-Data Prepper provides support for user-configurable extension plugins. Extension plugins are common configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/index/#key-concepts-and-fundamentals).
+Data Prepper provides support for user-configurable extension plugins. Extension plugins are common configurations shared across pipeline plugins, such as [sources, buffers, processors, and sinks]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/concepts/).
 
 ### AWS extension plugins
 
diff --git a/_data-prepper/managing-data-prepper/configuring-log4j.md b/_data-prepper/managing-data-prepper/configuring-log4j.md
index fe256e0da5e..ac4010bed05 100644
--- a/_data-prepper/managing-data-prepper/configuring-log4j.md
+++ b/_data-prepper/managing-data-prepper/configuring-log4j.md
@@ -5,6 +5,8 @@ parent: Managing OpenSearch Data Prepper
 nav_order: 20
 ---
 
+
+
 # Configuring Log4j
 
 You can configure logging using Log4j in OpenSearch Data Prepper.
diff --git a/_data-prepper/managing-data-prepper/managing-data-prepper.md b/_data-prepper/managing-data-prepper/managing-data-prepper.md index 204510be248..1e42b965f7e 100644 --- a/_data-prepper/managing-data-prepper/managing-data-prepper.md +++ b/_data-prepper/managing-data-prepper/managing-data-prepper.md @@ -7,4 +7,6 @@ nav_order: 20 # Managing OpenSearch Data Prepper -You can perform administrator functions for OpenSearch Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation. \ No newline at end of file +You can perform administrator functions for OpenSearch Data Prepper, including system configuration, interacting with core APIs, Log4j configuration, and monitoring. You can set up peer forwarding to coordinate multiple Data Prepper nodes when using stateful aggregation. + + \ No newline at end of file diff --git a/_data-prepper/managing-data-prepper/monitoring.md b/_data-prepper/managing-data-prepper/monitoring.md index cb29e49a518..10199b90100 100644 --- a/_data-prepper/managing-data-prepper/monitoring.md +++ b/_data-prepper/managing-data-prepper/monitoring.md @@ -4,6 +4,8 @@ title: Monitoring parent: Managing OpenSearch Data Prepper nav_order: 25 --- + + # Monitoring OpenSearch Data Prepper with metrics diff --git a/_data-prepper/managing-data-prepper/peer-forwarder.md b/_data-prepper/managing-data-prepper/peer-forwarder.md index 9d54aef87c9..13ad9361f7e 100644 --- a/_data-prepper/managing-data-prepper/peer-forwarder.md +++ b/_data-prepper/managing-data-prepper/peer-forwarder.md @@ -5,6 +5,8 @@ nav_order: 12 parent: Managing OpenSearch Data Prepper --- + + # Peer forwarder Peer forwarder is an HTTP service that performs peer forwarding of an `event` between OpenSearch Data Prepper nodes for aggregation. This HTTP service uses a hash-ring approach to aggregate events and determine which Data Prepper node it should handle on a given trace before rerouting it to that node. Currently, peer forwarder is supported by the `aggregate`, `service_map_stateful`, and `otel_traces_raw` [processors]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/processors/). diff --git a/_data-prepper/managing-data-prepper/source-coordination.md b/_data-prepper/managing-data-prepper/source-coordination.md index 5dc85e50a7c..5a77ba501e1 100644 --- a/_data-prepper/managing-data-prepper/source-coordination.md +++ b/_data-prepper/managing-data-prepper/source-coordination.md @@ -5,6 +5,8 @@ nav_order: 35 parent: Managing OpenSearch Data Prepper --- + + # Source coordination _Source coordination_ is the concept of coordinating and distributing work between OpenSearch Data Prepper data sources in a multi-node environment. Some data sources, such as Amazon Kinesis or Amazon Simple Queue Service (Amazon SQS), handle coordination natively. Other data sources, such as OpenSearch, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and JDBC/ODBC, do not support source coordination. diff --git a/_data-prepper/migrating-from-logstash-data-prepper.md b/_data-prepper/migrating-from-logstash-data-prepper.md index 13548092dce..8e442f3ebc8 100644 --- a/_data-prepper/migrating-from-logstash-data-prepper.md +++ b/_data-prepper/migrating-from-logstash-data-prepper.md @@ -29,7 +29,7 @@ As of the Data Prepper 1.2 release, the following plugins from the Logstash conf ## Running Data Prepper with a Logstash configuration -1. 
To install Data Prepper's Docker image, see Installing Data Prepper in [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started#1-installing-data-prepper). +1. To install Data Prepper's Docker image, see Installing Data Prepper in [Getting Started with OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#1-installing-data-prepper). 2. Run the Docker image installed in Step 1 by supplying your `logstash.conf` configuration. diff --git a/_data-prepper/pipelines/configuration/processors/convert-entry-type.md b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md index cc707832ad7..4d191adbb85 100644 --- a/_data-prepper/pipelines/configuration/processors/convert-entry-type.md +++ b/_data-prepper/pipelines/configuration/processors/convert-entry-type.md @@ -47,7 +47,7 @@ type-conv-pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). For example, before you run the `convert_entry_type` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/delete-entries.md b/_data-prepper/pipelines/configuration/processors/delete-entries.md index f30bccae232..5a1b940d09e 100644 --- a/_data-prepper/pipelines/configuration/processors/delete-entries.md +++ b/_data-prepper/pipelines/configuration/processors/delete-entries.md @@ -41,7 +41,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). For example, before you run the `delete_entries` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/mutate-string.md b/_data-prepper/pipelines/configuration/processors/mutate-string.md index b84e63ea61b..3c6269fdaac 100644 --- a/_data-prepper/pipelines/configuration/processors/mutate-string.md +++ b/_data-prepper/pipelines/configuration/processors/mutate-string.md @@ -53,7 +53,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log`. After that, replace the `path` of the file source in your `pipeline.yaml` file with your file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log`. 
After that, replace the `path` of the file source in your `pipeline.yaml` file with your file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). Before you run OpenSearch Data Prepper, the source appears in the following format: @@ -105,7 +105,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with your file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with your file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). Before you run Data Prepper, the source appears in the following format: @@ -150,7 +150,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with the correct file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with the correct file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). Before you run Data Prepper, the source appears in the following format: @@ -195,7 +195,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with the correct file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with the correct file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). Before you run Data Prepper, the source appears in the following format: @@ -241,7 +241,7 @@ pipeline: ``` {% include copy.html %} -Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with the correct file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log`. After that, replace the `path` in the file source of your `pipeline.yaml` file with the correct file path. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). 
Before you run Data Prepper, the source appears in the following format: diff --git a/_data-prepper/pipelines/configuration/processors/rename-keys.md b/_data-prepper/pipelines/configuration/processors/rename-keys.md index a2f1711ebf4..c14d3c69b2e 100644 --- a/_data-prepper/pipelines/configuration/processors/rename-keys.md +++ b/_data-prepper/pipelines/configuration/processors/rename-keys.md @@ -44,7 +44,7 @@ pipeline: {% include copy.html %} -Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +Next, create a log file named `logs_json.log` and replace the `path` in the file source of your `pipeline.yaml` file with that filepath. For more information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). For example, before you run the `rename_keys` processor, if the `logs_json.log` file contains the following event record: diff --git a/_data-prepper/pipelines/configuration/processors/trace-peer-forwarder.md b/_data-prepper/pipelines/configuration/processors/trace-peer-forwarder.md index 2665b985f72..4f5b70f1076 100644 --- a/_data-prepper/pipelines/configuration/processors/trace-peer-forwarder.md +++ b/_data-prepper/pipelines/configuration/processors/trace-peer-forwarder.md @@ -14,7 +14,7 @@ You should use `trace_peer_forwarder` for Trace Analytics pipelines when you hav ## Usage -To get started with `trace_peer_forwarder`, first configure [peer forwarder]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/peer-forwarder/). Then create a `pipeline.yaml` file and specify `trace peer forwarder` as the processor. You can configure `peer forwarder` in your `data-prepper-config.yaml` file. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/#2-configuring-data-prepper). +To get started with `trace_peer_forwarder`, first configure [peer forwarder]({{site.url}}{{site.baseurl}}/data-prepper/managing-data-prepper/peer-forwarder/). Then create a `pipeline.yaml` file and specify `trace peer forwarder` as the processor. You can configure `peer forwarder` in your `data-prepper-config.yaml` file. For more detailed information, see [Configuring OpenSearch Data Prepper]({{site.url}}{{site.baseurl}}/data-prepper/getting-started/install-and-configure/#2-configuring-data-prepper). See the following example `pipeline.yaml` file: diff --git a/_data-prepper/pipelines/functions.md b/_data-prepper/pipelines/functions.md deleted file mode 100644 index caed78ac550..00000000000 --- a/_data-prepper/pipelines/functions.md +++ /dev/null @@ -1,18 +0,0 @@ ---- -layout: default -title: Functions -parent: Pipelines -nav_order: 10 -has_children: true ---- - -# Functions - -OpenSearch Data Prepper offers a range of built-in functions that can be used within expressions to perform common data preprocessing tasks, such as calculating lengths, checking for tags, retrieving metadata, searching for substrings, checking IP address ranges, and joining list elements. 
These functions include the following: - -- [`cidrContains()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/cidrcontains/) -- [`contains()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/contains/) -- [`getMetadata()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/) -- [`hasTags()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/has-tags/) -- [`join()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/join/) -- [`length()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/length/) \ No newline at end of file diff --git a/_data-prepper/pipelines/cidrcontains.md b/_data-prepper/pipelines/functions/cidrcontains.md similarity index 95% rename from _data-prepper/pipelines/cidrcontains.md rename to _data-prepper/pipelines/functions/cidrcontains.md index 898f1bc1f58..65a1a7eb606 100644 --- a/_data-prepper/pipelines/cidrcontains.md +++ b/_data-prepper/pipelines/functions/cidrcontains.md @@ -4,6 +4,8 @@ title: cidrContains() parent: Functions grand_parent: Pipelines nav_order: 5 +redirect_from: + - /data-prepper/pipelines/cidrcontains/ --- # cidrContains() diff --git a/_data-prepper/pipelines/contains.md b/_data-prepper/pipelines/functions/contains.md similarity index 96% rename from _data-prepper/pipelines/contains.md rename to _data-prepper/pipelines/functions/contains.md index 657f66bd28e..565acdc08ee 100644 --- a/_data-prepper/pipelines/contains.md +++ b/_data-prepper/pipelines/functions/contains.md @@ -4,6 +4,8 @@ title: contains() parent: Functions grand_parent: Pipelines nav_order: 10 +redirect_from: + - /data-prepper/pipelines/contains/ --- # contains() diff --git a/_data-prepper/pipelines/functions/functions.md b/_data-prepper/pipelines/functions/functions.md new file mode 100644 index 00000000000..1c5cb24d74f --- /dev/null +++ b/_data-prepper/pipelines/functions/functions.md @@ -0,0 +1,34 @@ +--- +layout: default +title: Functions +parent: Pipelines +nav_order: 10 +has_children: true +redirect_from: + - /data-prepper/pipelines/functions/ +tutorial_cards: + - heading: "cidrContains()" + description: "Checks if an IP is in a CIDR block." + link: "/data-prepper/pipelines/cidrcontains/" + - heading: "contains()" + description: "Checks if a value exists in a string or list." + link: "/data-prepper/pipelines/contains/" + - heading: "getMetadata()" + description: "Retrieves metadata from a record." + link: "/data-prepper/pipelines/get-metadata/" + - heading: "hasTags()" + description: "Checks if a record has specific tags." + link: "/data-prepper/pipelines/has-tags/" + - heading: "join()" + description: "Combines list items into a string." + link: "/data-prepper/pipelines/join/" + - heading: "length()" + description: "Gets the length of a string or list." + link: "/data-prepper/pipelines/length/" +--- + +# Functions + +OpenSearch Data Prepper offers a range of built-in functions that can be used within expressions to perform common data preprocessing tasks, such as calculating lengths, checking for tags, retrieving metadata, searching for substrings, checking IP address ranges, and joining list elements. 
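+
+As a brief illustration, the following hypothetical pipeline snippet uses two of these functions in a `drop_events` processor condition. The field name `/message` and the filtering rules are assumptions made for this example:
+
+```yml
+log-pipeline:
+  source:
+    http:
+  processor:
+    - drop_events:
+        # Drop events whose message is empty or contains "DEBUG".
+        drop_when: 'length(/message) == 0 or contains(/message, "DEBUG")'
+  sink:
+    - stdout:
+```
+{% include copy.html %}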
+ +{% include cards.html cards=page.tutorial_cards %} \ No newline at end of file diff --git a/_data-prepper/pipelines/get-metadata.md b/_data-prepper/pipelines/functions/get-metadata.md similarity index 93% rename from _data-prepper/pipelines/get-metadata.md rename to _data-prepper/pipelines/functions/get-metadata.md index fc89ed51d6c..e0753322050 100644 --- a/_data-prepper/pipelines/get-metadata.md +++ b/_data-prepper/pipelines/functions/get-metadata.md @@ -4,6 +4,8 @@ title: getMetadata() parent: Functions grand_parent: Pipelines nav_order: 15 +redirect_from: + - /data-prepper/pipelines/get-metadata/ --- # getMetadata() diff --git a/_data-prepper/pipelines/has-tags.md b/_data-prepper/pipelines/functions/has-tags.md similarity index 94% rename from _data-prepper/pipelines/has-tags.md rename to _data-prepper/pipelines/functions/has-tags.md index 85058429936..e65b541f5d9 100644 --- a/_data-prepper/pipelines/has-tags.md +++ b/_data-prepper/pipelines/functions/has-tags.md @@ -4,6 +4,8 @@ title: hasTags() parent: Functions grand_parent: Pipelines nav_order: 20 +redirect_from: + - /data-prepper/pipelines/has-tags/ --- # hasTags() diff --git a/_data-prepper/pipelines/join.md b/_data-prepper/pipelines/functions/join.md similarity index 93% rename from _data-prepper/pipelines/join.md rename to _data-prepper/pipelines/functions/join.md index 3a4d77d5c2e..17305bec3d0 100644 --- a/_data-prepper/pipelines/join.md +++ b/_data-prepper/pipelines/functions/join.md @@ -4,6 +4,8 @@ title: join() parent: Functions grand_parent: Pipelines nav_order: 25 +redirect_from: + - /data-prepper/pipelines/join/ --- # join() diff --git a/_data-prepper/pipelines/length.md b/_data-prepper/pipelines/functions/length.md similarity index 90% rename from _data-prepper/pipelines/length.md rename to _data-prepper/pipelines/functions/length.md index fca4b10df2a..53a620687f2 100644 --- a/_data-prepper/pipelines/length.md +++ b/_data-prepper/pipelines/functions/length.md @@ -4,6 +4,8 @@ title: length() parent: Functions grand_parent: Pipelines nav_order: 30 +redirect_from: + - /data-prepper/pipelines/length/ --- # length()