This repository was archived by the owner on Sep 23, 2024. It is now read-only.

Cannot override discovered schema #128

@deanmorin


Describe the bug
Changing the schema for a stream in the catalog file has no effect, since it's always overwritten with the discovered schema in refresh_streams_schema.

To Reproduce
Steps to reproduce the behavior:

  1. Create a test postgres database with a table that has a jsonb column:

    CREATE TABLE a (a integer PRIMARY KEY, data jsonb);
    INSERT INTO a VALUES (1, '{}');
    
  2. Create config files for the tap and target, for example:

    tap_config.json

    {
      "host": "127.0.0.1",
      "port": 5432,
      "user": "myuser",
      "password": "mypass",
      "dbname": "tap_postgres",
      "filter_schemas": "public",
      "logical_poll_total_seconds": 60
    }

    target_config.json

    {
      "host": "127.0.0.1",
      "port": 5432,
      "user": "myuser",
      "password": "mypass",
      "dbname": "target_postgres",
      "default_target_schema": "public"
    }
  3. Install the tap and create catalog.json

    $ mkvirtualenv tap-postgres
    $ pip install pipelinewise-tap-postgres==1.8.1
    $ tap-postgres --config tap_config.json --discover > catalog.json
    # Modify the catalog
    # In the metadata section where breadcrumb = [],  add:
    #             "selected": true,
    #             "replication-method": "FULL_TABLE",
    # and under schema->properties->data->type change it to:
    #             ["null", "string"]
    $ deactivate
  4. Install the target

    $ mkvirtualenv target-postgres
    $ pip install pipelinewise-target-postgres==2.1.1
    $ deactivate
  5. Run the pipeline

    $ ~/.virtualenvs/tap-postgres/bin/tap-postgres \
          --config tap_config.json \
          --properties catalog.json \
        | ~/.virtualenvs/target-postgres/bin/target-postgres \
          --config target_config.json
  6. Check the table created in the target

    target_postgres=# SELECT pg_typeof("data") FROM a;
     pg_typeof
    -----------
     jsonb
    (1 row)
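
The catalog edits described in step 3 can also be scripted instead of made by hand. The following is a minimal sketch, not part of the tap: `override_catalog` is a hypothetical helper, the inline `catalog` dict is a trimmed stand-in for the output of `--discover`, and the stream id `public-a` and the discovered type shown for `data` are assumptions about what discovery produces for the table above.

```python
def override_catalog(catalog, stream_id, column, new_type):
    """Select a stream for FULL_TABLE replication and override one column type.

    Hypothetical helper: mutates the catalog dict in place and returns it.
    """
    for stream in catalog["streams"]:
        if stream.get("tap_stream_id") != stream_id:
            continue
        # The stream-level metadata entry is the one with an empty breadcrumb.
        for entry in stream.get("metadata", []):
            if entry.get("breadcrumb") == []:
                entry.setdefault("metadata", {}).update(
                    {"selected": True, "replication-method": "FULL_TABLE"}
                )
        # schema -> properties -> <column> -> type
        stream["schema"]["properties"][column]["type"] = new_type
        return catalog
    raise KeyError(f"stream {stream_id!r} not found in catalog")


# Trimmed stand-in for catalog.json. The stream id and the discovered
# type for "data" are assumptions, not copied from real tap output.
catalog = {
    "streams": [
        {
            "tap_stream_id": "public-a",
            "table_name": "a",
            "schema": {
                "type": "object",
                "properties": {
                    "a": {"type": ["integer"]},
                    "data": {"type": ["null", "object"]},
                },
            },
            "metadata": [
                {"breadcrumb": [], "metadata": {"table-key-properties": ["a"]}},
                {"breadcrumb": ["properties", "data"], "metadata": {"sql-datatype": "jsonb"}},
            ],
        }
    ]
}

updated = override_catalog(catalog, "public-a", "data", ["null", "string"])
```

With a real file, the dict would be loaded with `json.load` and written back with `json.dump` before running the pipeline in step 5.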
    

Expected behavior
If a catalog file is provided, its schema should take precedence over the discovered schema for that stream. The data type in the target should be character varying.
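
To make the expected precedence concrete, here is a rough sketch of a merge in which column definitions from the catalog file win over the discovered ones. This is not the tap's code: `merge_schema` is a hypothetical function, and the discovered type shown for `data` is only a placeholder.

```python
import copy


def merge_schema(discovered, provided):
    """Return a copy of the discovered schema in which column definitions
    from the provided (catalog file) schema take precedence.

    Hypothetical sketch; neither input is modified.
    """
    merged = copy.deepcopy(discovered)
    merged_props = merged.setdefault("properties", {})
    for column, definition in provided.get("properties", {}).items():
        # Only override columns that discovery still reports, so a column
        # dropped from the database does not reappear from a stale catalog.
        if column in merged_props:
            merged_props[column] = copy.deepcopy(definition)
    return merged


# Placeholder for what discovery might report for table "a" (assumption).
discovered = {
    "type": "object",
    "properties": {
        "a": {"type": ["integer"]},
        "data": {"type": ["null", "object"]},
    },
}
# The override from the modified catalog.json.
provided = {"properties": {"data": {"type": ["null", "string"]}}}

merged = merge_schema(discovered, provided)
```

Under this approach `data` would be emitted as a nullable string, which is what would let the target create a character varying column, while columns left untouched in the catalog keep their discovered types.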

Screenshots
N/A

Your environment

  • Version of tap: [e.g. 1.8.1]
  • Version of python [e.g. 3.9.7]

Additional context
I discovered this while using meltano.


Labels: bug (Something isn't working)
