The S3 connector Parquet format lets you export data from Kafka topics to S3 objects in Parquet format.

Note: testing has so far been minimal.
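The export is driven by the Kafka Connect S3 sink's `format.class` property, which `example/sink.json` points at this project's Parquet format. As a rough, hypothetical sketch of such a configuration (the property names follow the stock Confluent S3 sink; the `format.class` value and bucket name are placeholders, not confirmed against this repository):

```json
{
  "name": "s3-parquet-sink",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "s3_topic",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "<this project's ParquetFormat class>",
    "s3.bucket.name": "<your-bucket>",
    "s3.region": "us-east-1",
    "flush.size": "3"
  }
}
```

A `flush.size` of 3 would explain why the walkthrough below produces at least three messages before expecting a file in S3.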
To build and run the example:

- Build the shaded jar:

  ```sh
  mvn clean package
  ```

- Insert your AWS credentials into `example/.env` (a sketch of this file follows the command listing below).
- Create a bucket in S3 and insert the bucket name in `example/sink.json` (a sample configuration is sketched above).
- Copy `target/parquet-format-5.2.0-SNAPSHOT-shaded.jar` to the `example/jars/` folder.
- Run the following steps:
```sh
cd example
docker-compose up -d

# Create the topic
docker exec -it connect bash -c \
  "kafka-topics --zookeeper zookeeper \
  --topic s3_topic --create \
  --replication-factor 1 --partitions 1"

# Add the connector with ParquetFormat `format.class`
curl -X POST \
  -H 'Host: connect.example.com' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  http://localhost:8083/connectors -d @config/sink.json

# Open a shell
docker exec -it connect bash

# Produce at least 3 Avro messages to the topic
kafka-avro-console-producer --broker-list kafka:9092 \
  --property schema.registry.url=http://schema_registry:8081/ --topic s3_topic \
  --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'
{"f1":"1"}
{"f1":"2"}
{"f1":"3"}

docker-compose down
```
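For reference, the `example/.env` file from the setup list holds the AWS credentials passed to the containers. A minimal sketch, assuming the standard AWS SDK variable names (check the compose file for the names it actually forwards):

```sh
# example/.env — credentials for the S3 sink (variable names assumed, not verified)
AWS_ACCESS_KEY_ID=<your-access-key-id>
AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
```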
Check that files exist in your bucket, e.g.

`/topics/s3_topic/partition=0/s3_topic+0+0000000000.parquet`

(the object key follows the connector's `<topic>+<kafkaPartition>+<startOffset>.parquet` naming).
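If you have the AWS CLI configured, one way to list the exported objects (`my-bucket` is a placeholder for your bucket name):

```sh
aws s3 ls --recursive s3://my-bucket/topics/s3_topic/
```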
Copy the file(s) to a local folder and verify the content.
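For example, with the AWS CLI (again, `my-bucket` is a placeholder):

```sh
# Download the exported object for local inspection
aws s3 cp s3://my-bucket/topics/s3_topic/partition=0/s3_topic+0+0000000000.parquet ~/Downloads/
```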
Then inspect the records with `parquet-tools`:

```sh
parquet-tools cat -j ~/Downloads/s3_topic+0+0000000000.parquet
{"f1":"1"}
{"f1":"2"}
{"f1":"3"}
```