Commit 1483136

kristoffSC (Krzysztof Chmielewski) authored and committed
Init commit
0 parents  commit 1483136

42 files changed

+3141
-0
lines changed

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
/.idea/
.idea
target
/flink.http.connector.iml
/src/main/flink-http-connector.iml
/src/main/main.iml
/src/test/test.iml
/flink-http-connector.iml

.gitlab-ci.yml

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
image: maven:3.6.3-jdk-11

stages:
  - pre
  - build
  - test
  - visualize
  - deploy
  - .post

variables:
  MAVEN_CLI_OPTS: "--batch-mode"
  MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
  JAVA_ADDITIONAL_OPTS: "-Dorg.slf4j.simpleLogger.showDateTime=true -Dorg.slf4j.simpleLogger.dateTimeFormat=HH:mm:ss,SSS"
  FF_USE_FASTZIP: "true"

cache:
  paths:
    - .m2/repository/
    - target/

build:
  stage: build
  script:
    - mvn $MAVEN_CLI_OPTS $JAVA_ADDITIONAL_OPTS compile

test:
  stage: test
  script:
    - mvn $MAVEN_CLI_OPTS $JAVA_ADDITIONAL_OPTS $JAVA_DOCKER_OPTS test integration-test
    - cat target/site/jacoco/index.html | grep -o 'Total[^%]*%'
  artifacts:
    paths:
      - target/site/jacoco/jacoco.xml
      - target/site/jacoco/jacoco.html

coverage:
  # Must be in a stage later than test-jdk11's stage.
  # The `visualize` stage does not exist by default.
  # Please define it first, or choose an existing stage like `deploy`.
  stage: visualize
  image: registry.gitlab.com/haynes/jacoco2cobertura:1.0.7
  script:
    # convert report from jacoco to cobertura, using relative project path
    - python /opt/cover2cover.py target/site/jacoco/jacoco.xml $CI_PROJECT_DIR/src/main/java/ > target/site/cobertura.xml
  needs: ["test"]
  artifacts:
    reports:
      cobertura: target/site/cobertura.xml

README.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
# flink-http-connector
An HTTP TableLookup connector that pulls data from an external system via the HTTP GET method.
The goal of this connector is to be usable in a Flink SQL statement as a standard table that can later be joined with other streams using pure Flink SQL.

Currently, the connector supports only Lookup Joins [1] and expects JSON as the response body.

The connector supports only STRING types.

## Prerequisites
* Java 11
* Maven 3
* Flink 1.14+

## Structure and further work
The main code can be found in the `com.getindata.connectors.http.table` package, plus additional classes directly in `com.getindata.connectors.http`.

The `com.getindata.connectors.http.stream` package is a pure PoC and is currently not meant to be used. Its purpose was to test the new Unified Source design for Flink source connectors
[2]. The implementation requires further development.

## Usage
Flink SQL table definition:

```roomsql
CREATE TABLE Customers (
  id STRING,
  id2 STRING,
  msg STRING,
  uuid STRING,
  isActive STRING,
  balance STRING
) WITH (
  'connector' = 'rest-lookup',
  'url' = 'http://localhost:8080/client',
  'asyncPolling' = 'true',
  'field.isActive.path' = '$.details.isActive',
  'field.balance.path' = '$.details.nestedDetails.balance'
)
```
Using the _Customers_ table in a Flink SQL Lookup Join:

```roomsql
SELECT o.id, o.id2, c.msg, c.uuid, c.isActive, c.balance FROM Orders AS o
JOIN Customers FOR SYSTEM_TIME AS OF o.proc_time AS c ON o.id = c.id AND o.id2 = c.id2
```

The columns and their values used in the JOIN `ON` condition are sent as HTTP GET parameters, with the column name used as the request parameter name.
For example:
``
http://localhost:8080/client/service?id=1&uuid=2
``
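
The mapping from join keys to GET parameters can be sketched as follows. This is a minimal illustration, not the connector's actual code: the class `LookupUrlSketch` and the helper `buildLookupUrl` are hypothetical names, and the sketch assumes every join key is passed as a URL-encoded query parameter appended to the configured `url`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class LookupUrlSketch {

    // Hypothetical helper (not part of the connector): appends the join-key
    // columns and their values to the base URL as GET query parameters.
    static String buildLookupUrl(String baseUrl, Map<String, String> joinKeys) {
        String query = joinKeys.entrySet().stream()
            .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
            .collect(Collectors.joining("&"));
        return query.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    // URL-encodes a single parameter name or value as UTF-8.
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Join keys from the ON condition: o.id = c.id AND o.id2 = c.id2
        Map<String, String> joinKeys = new LinkedHashMap<>();
        joinKeys.put("id", "1");
        joinKeys.put("id2", "2");

        // prints http://localhost:8080/client?id=1&id2=2
        System.out.println(buildLookupUrl("http://localhost:8080/client", joinKeys));
    }
}
```

A `LinkedHashMap` is used so that the parameters appear in the same order as the join conditions; whether the connector guarantees any particular order is not stated in this README.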

## HTTP response to table schema mapping
The mapping from the HTTP JSON response to the SQL table schema is done via JSON Paths [3].
This is achieved with the `com.jayway.jsonpath:json-path` library.

If no `root` or `field.#.path` option is defined, the connector uses the column name as the JSON path and looks for a JSON node with that name in the received JSON. If no node with the given name is found, the connector returns `null` as the value for this field.

If the `field.#.path` option is defined, the connector uses the JSON path given in the option's value to find the JSON data for this column.
For example, with `'field.isActive.path' = '$.details.isActive'`, the value for table column `isActive` is taken from the `$.details.isActive` node of the received JSON.
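
The lookup rule described above can be sketched in plain Java. This is only an illustration under stated assumptions: the connector itself relies on the `com.jayway.jsonpath:json-path` library, whereas this sketch handles only simple dotted paths such as `$.details.isActive` and treats the parsed response as nested `Map`s. The class `FieldPathSketch` and the methods `resolve` and `valueFor` are hypothetical names, and the `root` option is deliberately left out.

```java
import java.util.Map;

public class FieldPathSketch {

    // Hypothetical helper: resolves a simple dotted path such as
    // "$.details.isActive" against a parsed JSON object (nested Maps).
    // Returns null when any node along the path is missing.
    static Object resolve(Map<String, Object> json, String path) {
        String trimmed = path.startsWith("$.") ? path.substring(2) : path;
        Object current = json;
        for (String key : trimmed.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    // Uses the configured 'field.<column>.path' option if present,
    // otherwise falls back to the column name as the path.
    static Object valueFor(String column, Map<String, String> options, Map<String, Object> json) {
        String path = options.getOrDefault("field." + column + ".path", column);
        return resolve(json, path);
    }

    public static void main(String[] args) {
        Map<String, Object> json = Map.of(
            "msg", "hello",
            "details", Map.of("isActive", "true"));
        Map<String, String> options = Map.of("field.isActive.path", "$.details.isActive");

        System.out.println(valueFor("msg", options, json));      // hello (column name used as path)
        System.out.println(valueFor("isActive", options, json)); // true (configured path used)
        System.out.println(valueFor("balance", options, json));  // null (no matching node)
    }
}
```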

## Connector Options
| Option | Required | Description/Value |
| -------------- | ----------- | -------------- |
| connector | required | Must be set to _rest-lookup_. |
| url | required | The base URL used for GET requests, for example _http://localhost:8080/client_. |
| asyncPolling | optional | true/false - determines whether async polling should be used. The mechanism is based on Flink's Async I/O. |
| root | optional | Sets the JSON root node for the entire table. The value should be given as a JSON Path [3], for example `$.details`. |
| field.#.path | optional | The JSON Path in the response model that should be used for the given `#` field. If the `root` option is defined, it is added to the field path. The value must be given in JSON Path format [3], for example `$.details.nestedDetails.balance`. |

## TODO
- Implement caches.
- Add support for other Flink types. Currently, only the STRING type is fully supported.
- Think about a retry policy for HTTP requests.
- Use Flink Format [4] to parse the JSON response.
- Add a configurable timeout value.
- Check other `//TODO`'s.

###
[1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/queries/joins/#lookup-join
</br>
[2] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/sources/
</br>
[3] https://support.smartbear.com/alertsite/docs/monitors/api/endpoint/jsonpath.html
</br>
[4] https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/json/
