revamp kinesis analytics docs #1324


Merged 1 commit on Jun 12, 2024
190 changes: 64 additions & 126 deletions content/en/user-guide/aws/kinesisanalytics/index.md
tags: ["Pro image"]
---

## Introduction

Kinesis Data Analytics is a service offered by Amazon Web Services (AWS) that enables you to process and analyze streaming data in real time.
Kinesis Data Analytics allows you to apply transformations, filtering, and enrichment to streaming data using standard SQL syntax.
You can also run Java or Scala programs against streaming sources to perform various operations on the data using Apache Flink.

LocalStack allows you to use the Kinesis Data Analytics APIs in your local environment.
The API coverage is available on:

* [Kinesis Data Analytics V1](https://docs.localstack.cloud/references/coverage/coverage_kinesisanalytics/)
* [Kinesis Data Analytics V2](https://docs.localstack.cloud/references/coverage/coverage_kinesisanalyticsv2/)

These pages detail the extent of Kinesis Data Analytics integration with LocalStack.

## Getting started

This guide is designed for users new to Kinesis Data Analytics and assumes basic knowledge of the AWS CLI and our [`awslocal`](https://github.com/localstack/awscli-local) wrapper script.

Start your LocalStack container using your preferred method.

We will demonstrate how to create a Kinesis Analytics application using the AWS CLI.

### Create an application

You can create a Kinesis Analytics application using the [`CreateApplication`](https://docs.aws.amazon.com/kinesisanalytics/latest/APIReference/API_CreateApplication.html) API by running the following command:

{{< command >}}
$ awslocal kinesisanalytics create-application \
--application-name test-analytics-app
{{< /command >}}

You should see output similar to the following:

```bash
{
"ApplicationSummary": {
"ApplicationName": "test-analytics-app",
"ApplicationARN": "arn:aws:kinesisanalytics:us-east-1:000000000000:application/test-analytics-app",
"ApplicationStatus": "READY"
}
}
```
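If you script your setup, the same call can be made with boto3 pointed at LocalStack. The sketch below is illustrative: the endpoint URL is LocalStack's default edge endpoint, and `application_arn` is a hypothetical helper that merely mirrors the ARN shape shown in the output above.

```python
LOCALSTACK_ENDPOINT = "http://localhost.localstack.cloud:4566"  # assumed default LocalStack endpoint


def application_arn(region: str, account_id: str, name: str) -> str:
    """Mirror the ARN shape shown in the CreateApplication output above."""
    return f"arn:aws:kinesisanalytics:{region}:{account_id}:application/{name}"


def create_application(name: str, region: str = "us-east-1") -> dict:
    """Create a v1 Kinesis Analytics application against LocalStack and return its summary."""
    import boto3  # imported here so the pure helper above stays dependency-free

    client = boto3.client(
        "kinesisanalytics", endpoint_url=LOCALSTACK_ENDPOINT, region_name=region
    )
    return client.create_application(ApplicationName=name)["ApplicationSummary"]


# Usage (requires a running LocalStack container):
# summary = create_application("test-analytics-app")
# assert summary["ApplicationStatus"] == "READY"
```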

### Describe the application

You can describe the application using the [`DescribeApplication`](https://docs.aws.amazon.com/kinesisanalytics/latest/APIReference/API_DescribeApplication.html) API by running the following command:

{{< command >}}
$ awslocal kinesisanalytics describe-application \
--application-name test-analytics-app
{{< /command >}}

You should see output similar to the following:


```bash
{
"ApplicationDetail": {
"ApplicationName": "test-analytics-app",
"ApplicationARN": "arn:aws:kinesisanalytics:us-east-1:000000000000:application/test-analytics-app",
"ApplicationStatus": "READY",
"CreateTimestamp": 1718194721.567,
"InputDescriptions": [],
"OutputDescriptions": [],
"ReferenceDataSourceDescriptions": [],
"CloudWatchLoggingOptionDescriptions": [],
"ApplicationVersionId": 1
}
}
```
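When you poll `DescribeApplication` from code, you typically branch on only a few of these fields. The following is a minimal sketch with a hypothetical `summarize_application` helper; the `sample` dict mirrors the documented output above rather than a live response:

```python
def summarize_application(detail: dict) -> dict:
    """Extract the fields you typically branch on from an ApplicationDetail payload."""
    return {
        "name": detail["ApplicationName"],
        "ready": detail["ApplicationStatus"] == "READY",
        "version": detail["ApplicationVersionId"],
        "has_inputs": bool(detail["InputDescriptions"]),
    }


# Mirrors the sample DescribeApplication output shown above (not live data)
sample = {
    "ApplicationName": "test-analytics-app",
    "ApplicationStatus": "READY",
    "ApplicationVersionId": 1,
    "InputDescriptions": [],
}

print(summarize_application(sample))
# {'name': 'test-analytics-app', 'ready': True, 'version': 1, 'has_inputs': False}
```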

### Tag the application

Add tags to the application using the [`TagResource`](https://docs.aws.amazon.com/kinesisanalytics/latest/APIReference/API_TagResource.html) API by running the following command:

{{< command >}}
$ awslocal kinesisanalytics tag-resource \
--resource-arn arn:aws:kinesisanalytics:us-east-1:000000000000:application/test-analytics-app \
--tags Key=test,Value=test
{{< /command >}}
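The CLI's `Key=...,Value=...` shorthand corresponds to the `Tags` list shape the API expects. When tagging from code, a plain dict is often the more natural starting point; a small conversion sketch (the `to_tag_list` helper is illustrative, not part of any SDK):

```python
def to_tag_list(tags: dict) -> list:
    """Convert a plain dict into the [{"Key": ..., "Value": ...}] shape used by TagResource."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


print(to_tag_list({"test": "test"}))
# [{'Key': 'test', 'Value': 'test'}]
```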

You can list the tags for the application using the [`ListTagsForResource`](https://docs.aws.amazon.com/kinesisanalytics/latest/APIReference/API_ListTagsForResource.html) API by running the following command:

{{< command >}}
$ awslocal kinesisanalytics list-tags-for-resource \
--resource-arn arn:aws:kinesisanalytics:us-east-1:000000000000:application/test-analytics-app
{{< /command >}}

You should see output similar to the following:

```bash
{
"Tags": [
{
"Key": "test",
"Value": "test"
}
]
}
```
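Going the other way, a response like the one above is easier to work with as a plain dict. A small sketch with a hypothetical `tags_to_dict` helper; the `sample` response mirrors the documented output:

```python
def tags_to_dict(response: dict) -> dict:
    """Flatten a ListTagsForResource response into a plain {key: value} dict."""
    return {tag["Key"]: tag["Value"] for tag in response.get("Tags", [])}


# Mirrors the sample ListTagsForResource output shown above
sample = {"Tags": [{"Key": "test", "Value": "test"}]}
print(tags_to_dict(sample))
# {'test': 'test'}
```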

## Current Limitations

- LocalStack provides basic emulation of version 1 of the Kinesis Data Analytics API. However, SQL queries are not fully supported and lack parity with AWS.
- LocalStack provides CRUD mocking of version 2 of the Kinesis Data Analytics API.