opendatadiscovery/odd-collector-profiler

ODD Collector Profiler

Reads a batch of data, uses DataProfiler to compute statistics, and maps them to ODD DataEntity statistics.
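
The batch-to-statistics idea can be illustrated with a small standard-library sketch. It does not use DataProfiler's API or the ODD models: the function name `profile_batch` and the statistic keys (`count`, `nulls_count`, `unique_count`, `low_value`, `high_value`, `mean_value`) are hypothetical names chosen for illustration only.

```python
from statistics import mean
from typing import Any, Dict, Iterable, List


def profile_batch(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Compute simple per-column statistics for one batch of rows.

    Hypothetical stand-in for the profiling step; the real collector
    delegates this work to DataProfiler.
    """
    # Pivot the row-oriented batch into per-column value lists.
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)

    stats: Dict[str, Dict[str, Any]] = {}
    for name, values in columns.items():
        non_null = [v for v in values if v is not None]
        column_stats: Dict[str, Any] = {
            "count": len(values),
            "nulls_count": len(values) - len(non_null),
            "unique_count": len(set(non_null)),
        }
        # Add numeric statistics only when every non-null value is a number.
        numeric = [
            v for v in non_null
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        if numeric and len(numeric) == len(non_null):
            column_stats["low_value"] = min(numeric)
            column_stats["high_value"] = max(numeric)
            column_stats["mean_value"] = mean(numeric)
        stats[name] = column_stats
    return stats


batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": None},
    {"id": 3, "name": "a"},
]
batch_stats = profile_batch(batch)
```

In this sketch each column gets one dictionary of statistics, which is roughly the shape that would then be mapped onto a dataset field in the platform.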

Supported data sources

  • Postgres
  • Azure SQL

Config example

Key                         Value
default_pulling_interval    Interval at which the collector collects statistics and sends them to ODD Platform
token                       Token created during collector registration, via the UI or programmatically
platform_host_url           ODD Platform host URL
profilers                   List of profiler configs, one per data source

collector-config.yaml

default_pulling_interval: 360
token:  <COLLECTOR_TOKEN>
platform_host_url: http://localhost:8080
profilers:
  - type: postgres
    name: my_postgres
    host: localhost
    port: 5432
    username: postgres
    password: ""
    database: db
    tables: ["some_table"]
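
Before starting the collector, it can help to check that a config has the keys listed in the table above. The following is a minimal sketch, not the collector's own validation: it works on a plain dictionary that mirrors collector-config.yaml (loading the YAML file is left out), and the helper name `find_config_problems` as well as the assumption that every profiler entry needs `type` and `name` are illustrative.

```python
from typing import Any, Dict, List

# Top-level keys taken from the config table.
REQUIRED_TOP_LEVEL_KEYS = (
    "default_pulling_interval",
    "token",
    "platform_host_url",
    "profilers",
)

# Dictionary equivalent of the collector-config.yaml example.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "default_pulling_interval": 360,
    "token": "<COLLECTOR_TOKEN>",
    "platform_host_url": "http://localhost:8080",
    "profilers": [
        {
            "type": "postgres",
            "name": "my_postgres",
            "host": "localhost",
            "port": 5432,
            "username": "postgres",
            "password": "",
            "database": "db",
            "tables": ["some_table"],
        }
    ],
}


def find_config_problems(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    for key in REQUIRED_TOP_LEVEL_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")

    profilers = config.get("profilers", [])
    if not isinstance(profilers, list):
        problems.append("profilers must be a list")
        return problems

    for index, profiler in enumerate(profilers):
        if not isinstance(profiler, dict):
            problems.append(f"profilers[{index}] must be a mapping")
            continue
        # Assumption: every profiler entry needs at least a type and a name.
        for key in ("type", "name"):
            if key not in profiler:
                problems.append(f"profilers[{index}] is missing '{key}'")
    return problems


problems = find_config_problems(EXAMPLE_CONFIG)
```

Returning a list of problems instead of raising on the first one lets all issues in a config be reported at once.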

Docker build

docker build . -t odd_collector_profiler

M1 issues

pyodbc

On M1, pyodbc must be installed as no-binary, that is, built from source. The command below adds that setting to poetry.toml:

poetry config --local installer.no-binary pyodbc

grpcio

Requires the following environment variables:

export GRPC_PYTHON_BUILD_SYSTEM_OPENSSL=1
export GRPC_PYTHON_BUILD_SYSTEM_ZLIB=1

tensorflow

DataProfiler uses the tensorflow package for auto-labeling, and there is no prebuilt .whl for M1. It must be built and installed manually; see the TensorFlow documentation.
