Commit ed998d6

Singer/Meltano: Add example github-to-cratedb
It uses the `meltano-target-cratedb` Singer component. https://github.com/crate-workbench/meltano-target-cratedb
1 parent c9a59ec commit ed998d6

12 files changed

+503
-1
lines changed

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
1+
name: Python SQLAlchemy
2+
3+
on:
4+
pull_request:
5+
branches: ~
6+
paths:
7+
- '.github/workflows/test-singer-meltano.yml'
8+
- 'framework/singer-meltano/**'
9+
- 'requirements.txt'
10+
push:
11+
branches: [ main ]
12+
paths:
13+
- '.github/workflows/test-singer-meltano.yml'
14+
- 'framework/singer-meltano/**'
15+
- 'requirements.txt'
16+
17+
# Allow job to be triggered manually.
18+
workflow_dispatch:
19+
20+
# Run job each night after CrateDB nightly has been published.
21+
schedule:
22+
- cron: '0 3 * * *'
23+
24+
# Cancel in-progress jobs when pushing to the same branch.
25+
concurrency:
26+
cancel-in-progress: true
27+
group: ${{ github.workflow }}-${{ github.ref }}
28+
29+
jobs:
30+
test:
31+
name: "
32+
Python: ${{ matrix.python-version }}
33+
CrateDB: ${{ matrix.cratedb-version }}
34+
on ${{ matrix.os }}"
35+
runs-on: ${{ matrix.os }}
36+
strategy:
37+
fail-fast: false
38+
matrix:
39+
os: [ 'ubuntu-latest' ]
40+
python-version: [ '3.10', '3.11' ]
41+
cratedb-version: [ 'nightly' ]
42+
43+
services:
44+
cratedb:
45+
image: crate/crate:nightly
46+
ports:
47+
- 4200:4200
48+
- 5432:5432
49+
50+
steps:
51+
52+
- name: Acquire sources
53+
uses: actions/checkout@v4
54+
55+
- name: Set up Python
56+
uses: actions/setup-python@v5
57+
with:
58+
python-version: ${{ matrix.python-version }}
59+
architecture: x64
60+
cache: 'pip'
61+
cache-dependency-path: |
62+
requirements.txt
63+
framework/singer-meltano/requirements.txt
64+
framework/singer-meltano/requirements-dev.txt
65+
66+
- name: Install utilities
67+
run: |
68+
pip install -r requirements.txt
69+
70+
- name: Validate framework/singer-meltano
71+
run: |
72+
ngr test --accept-no-venv framework/singer-meltano

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -1,8 +1,9 @@
1+
.DS_Store
12
.idea
3+
.env
24
.venv*
35
__pycache__
46
.coverage
57
coverage.xml
68
mlruns/
79
archive/
8-
logs.log

framework/singer-meltano/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
.meltano
2+
output

framework/singer-meltano/README.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
# Meltano Examples
2+
3+
Concise examples of working with [CrateDB] and [Meltano] to design and
4+
run flexible ELT tasks. All recipes use [meltano-target-cratedb]
5+
for writing data to CrateDB.
6+
7+
## What's inside
8+
9+
- `singerfile-to-cratedb`: Acquire data from a Singer file, and load it into
10+
a CrateDB database table.
11+
12+
- `github-to-cratedb`: Acquire repository metadata from the GitHub API, and load
13+
it, separated per entity, into 32 CrateDB database tables.
14+
15+
## Prerequisites
16+
17+
Before running the examples within the subdirectories, make sure to install
18+
Meltano and its dependencies.
19+
20+
```shell
21+
python3 -m venv .venv
22+
source .venv/bin/activate
23+
pip install -r requirements.txt
24+
```
25+
26+
## Usage
27+
28+
Then, explore the individual Meltano projects. Invoke them either from within
29+
their directories, or by using the `--cwd` option from the root folder.
30+
31+
```shell
32+
meltano --cwd github-to-cratedb install
33+
meltano --cwd github-to-cratedb run tap-github target-cratedb
34+
```
35+
36+
## Software Tests
37+
```shell
38+
pip install -r requirements-dev.txt
39+
poe check
40+
```
41+
42+
43+
[CrateDB]: https://cratedb.com/product
44+
[Meltano]: https://meltano.com/
45+
[meltano-target-cratedb]: https://github.com/crate-workbench/meltano-target-cratedb
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
# Meltano GitHub -> CrateDB example
2+
3+
## About
4+
5+
Acquire repository metadata from the GitHub API, and insert it into CrateDB
6+
database tables, using [meltano-target-cratedb].
7+
8+
It follows the canonical example demonstrated in the [Meltano Getting Started Tutorial].
9+
10+
## Configuration
11+
12+
### tap-github
13+
14+
For accessing the GitHub API, you will need an authentication token. It
15+
can be acquired at [GitHub Developer Settings » Tokens].
16+
17+
To configure the recipe, store the token in the `TAP_GITHUB_AUTH_TOKEN`
18+
environment variable, either interactively, or by creating a dotenv
19+
configuration file `.env`.
20+
21+
```shell
22+
TAP_GITHUB_AUTH_TOKEN='ghp_hmQR3XTFWkfIcuyjRTBuVrRt6mnL1j2mMPT8'
23+
```
24+
25+
Then, in `meltano.yml`, identify the `tap-github` section in `plugins.extractors`,
26+
and adjust the value of `config.repositories` to correspond to the repository
27+
you intend to scrape.
28+
29+
### target-cratedb
30+
31+
Within the `target-cratedb` section in `plugins.loaders`, adjust
32+
`config.sqlalchemy_url` to match your database connectivity settings.
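
To see which parts of the connection URL carry which setting, it can be taken apart with the Python standard library. This is an illustrative sketch only: `describe_sqlalchemy_url` is a hypothetical helper, and the second URL uses a made-up host and credentials.

```python
from urllib.parse import urlsplit


def describe_sqlalchemy_url(url: str) -> dict:
    """Split a database URL into the parts you would typically adjust."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not specify a port
    }


# The value used in this example project.
print(describe_sqlalchemy_url("crate://crate@localhost/"))

# A hypothetical remote server with explicit credentials and port.
print(describe_sqlalchemy_url("crate://admin:secret@db.example.org:4200/"))
```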
33+
34+
35+
## Usage
36+
37+
Install dependencies.
38+
```shell
39+
meltano install
40+
```
41+
42+
Invoke data transfer to JSONL files.
43+
```shell
44+
meltano run tap-github target-jsonl
45+
cat github-to-cratedb/output/commits.jsonl
46+
```
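
For a quick sanity check of the JSONL output beyond `cat`, a short Python sketch can count the records and list the field names. It assumes one JSON object per line. `summarize_jsonl` is a hypothetical helper, not part of Meltano or `target-jsonl`.

```python
import json
from pathlib import Path


def summarize_jsonl(path: str) -> tuple[int, list[str]]:
    """Return the number of records and the sorted set of field names."""
    count = 0
    keys: set[str] = set()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            count += 1
            keys.update(record)  # adds the keys of the JSON object
    return count, sorted(keys)
```

For example, `summarize_jsonl("github-to-cratedb/output/commits.jsonl")` reports how many commit records were extracted and which fields they carry.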
47+
48+
Invoke data transfer to CrateDB database.
49+
```shell
50+
meltano run tap-github target-cratedb
51+
```
52+
53+
## Screenshot
54+
55+
Enjoy the release notes.
56+
```sql
57+
SELECT repo, tag_name, body FROM melty.releases ORDER BY tag_name DESC;
58+
```
59+
60+
![image](https://github.com/crate-workbench/cratedb-toolkit/assets/453543/ac37c9cc-8e42-4c7c-84aa-64498bf48f4d)
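
The same query can also be submitted programmatically. The following sketch posts a statement to CrateDB's HTTP `_sql` endpoint using only the standard library. It assumes CrateDB listens on `http://localhost:4200` without authentication. The helper names `build_sql_request` and `run_sql` are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def build_sql_request(stmt: str, base_url: str = "http://localhost:4200") -> Request:
    """Prepare a POST request carrying the SQL statement as JSON."""
    payload = json.dumps({"stmt": stmt}).encode("utf-8")
    return Request(
        f"{base_url}/_sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(stmt: str, base_url: str = "http://localhost:4200") -> dict:
    """Send the statement and return the decoded JSON response."""
    with urlopen(build_sql_request(stmt, base_url), timeout=10) as response:
        return json.load(response)
```

Against a running instance, `run_sql("SELECT repo, tag_name FROM melty.releases ORDER BY tag_name DESC")` returns a dictionary whose `cols` and `rows` entries hold the column names and result rows.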
61+
62+
## Troubleshooting
63+
64+
If you see an error like the following, please verify the GitHub authentication
65+
token stored in the `TAP_GITHUB_AUTH_TOKEN` environment variable.
66+
```python
67+
singer_sdk.exceptions.RetriableAPIError: 401 Client Error: b'{"message":"This endpoint requires you to be authenticated.","documentation_url":"https://docs.github.com/graphql/guides/forming-calls-with-graphql#authenticating-with-graphql"}' (Reason: Unauthorized) for path: /graphql cmd_type=elb consumer=False name=tap-github producer=True stdio=stderr string_id=tap-github
68+
```
69+
70+
## Development
71+
In order to link the sandbox to a development installation of [meltano-target-cratedb],
72+
configure the `pip_url` of the component like this:
73+
```yaml
74+
pip_url: --editable=/path/to/sources/meltano-target-cratedb
75+
```
76+
77+
78+
[GitHub Developer Settings » Tokens]: https://github.com/settings/tokens
79+
[Meltano Getting Started Tutorial]: https://docs.meltano.com/getting-started/part1
80+
[meltano-target-cratedb]: https://github.com/crate-workbench/meltano-target-cratedb
81+
[tap-github]: https://hub.meltano.com/extractors/tap-github/
82+
[target-jsonl]: https://hub.meltano.com/loaders/target-jsonl/
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# A Meltano project is just a directory on your filesystem containing text-based files.
2+
# At a minimum, a Meltano project must contain a project file named `meltano.yml`,
3+
# which contains your project configuration, and tells Meltano that a particular
4+
# directory is a Meltano project.
5+
---
6+
version: 1
7+
default_environment: dev
8+
send_anonymous_usage_stats: false
9+
project_id: f14797b9-9d1c-414c-851c-c91e08ddbc2e
10+
11+
environments:
12+
- name: dev
13+
- name: staging
14+
- name: prod
15+
16+
plugins:
17+
18+
# Configure data source.
19+
# In Singer jargon, it is a "tap"; Meltano calls it an "extractor".
20+
extractors:
21+
22+
- name: tap-github
23+
variant: cratedb
24+
namespace: cratedb
25+
pip_url: git+https://github.com/crate-workbench/tap-github.git@cratedb
26+
# Note: Configure your GitHub repository here.
27+
config:
28+
start_date: '2023-12-01'
29+
repositories:
30+
- crate-workbench/cratedb-toolkit
31+
32+
# Configure data sinks.
33+
# In Singer jargon, it is a "target"; Meltano calls it a "loader".
34+
loaders:
35+
36+
- name: target-jsonl
37+
variant: andyh1203
38+
pip_url: target-jsonl
39+
40+
- name: target-cratedb
41+
namespace: cratedb
42+
variant: cratedb
43+
# Acquire from PyPI.
44+
pip_url: meltano-target-cratedb
45+
# Acquire from GitHub.
46+
# pip_url: git+https://github.com/crate-workbench/meltano-target-cratedb.git
47+
48+
# Note: Configure your database server and credentials here.
49+
config:
50+
sqlalchemy_url: crate://crate@localhost/
51+
add_record_metadata: true
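
Settings under `config` can usually also be supplied through environment variables, which is how `TAP_GITHUB_AUTH_TOKEN` reaches the `auth_token` setting of `tap-github`. The sketch below derives such names, assuming Meltano's common `<PLUGIN_NAME>_<SETTING_NAME>` convention applies to the plugins in this project. `setting_env_var` is a hypothetical helper, not a Meltano API.

```python
def setting_env_var(plugin_name: str, setting_name: str) -> str:
    """Derive the environment variable name for a plugin setting.

    Assumption: dashes become underscores and the result is upper-cased.
    """
    return f"{plugin_name}_{setting_name}".replace("-", "_").upper()


print(setting_env_var("tap-github", "auth_token"))
print(setting_env_var("target-cratedb", "sqlalchemy_url"))
```

Under that assumption, the database URL could be overridden per environment without editing `meltano.yml`.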
