hotgluexyz/gluestick-ts

Gluestick TypeScript

A TypeScript library for data processing and ETL operations on the hotglue iPaaS platform, built on Polars for high-performance data manipulation. It supports multiple export formats, including CSV, JSON, JSONL, Parquet, and the Singer specification format.

Installation

npm install @hotglue/gluestick-ts

Quick Start

import * as gs from '@hotglue/gluestick-ts';

// Create a Reader to access your data
const reader = new gs.Reader();

// Get available data streams
const streams = reader.keys();
console.log('Available streams:', streams);

// Read and process a specific stream
const dataFrame = reader.get('your_stream_name', { catalogTypes: true });

// Export processed data (defaults to Singer format)
gs.toExport(dataFrame, 'output_name', './etl-output');

// Export as CSV
gs.toExport(dataFrame, 'output_name', './etl-output', { exportFormat: 'csv' });

Core Components

Reader Class

The Reader class is your main interface for accessing data streams:

const reader = new gs.Reader(inputDir?, rootDir?);

Methods:

  • get(stream, options?) - Read a specific stream as a Polars DataFrame (the return type is DataFrame | null)
  • keys() - Get all available stream names
  • getPk(stream) - Get primary keys for a stream from catalog

Options:

  • catalogTypes: boolean - Use catalog for automatic type inference
  • Any other options are passed through to Polars when reading. See the Polars ReadCSV and ReadParquet options for more information.

Export Functions

Export your processed data in multiple formats:

gs.toExport(dataFrame, outputName, outputDir, options?);

Supported formats:

  • Singer (default) - Singer specification format for data integration
  • CSV - Comma-separated values
  • JSON - Single JSON array
  • JSONL - Newline-delimited JSON
  • Parquet - Columnar storage format
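The difference between the JSON and JSONL formats is easy to miss, so here is a minimal sketch in plain TypeScript. It does not call gluestick-ts, and the helper names (toJsonArray, toJsonLines) and sample rows are made up for illustration; the exact bytes the library writes may differ.

```typescript
// Illustrative rows; not taken from any real stream.
type Row = Record<string, unknown>;

const rows: Row[] = [
  { id: 1, name: 'Ada' },
  { id: 2, name: 'Grace' },
];

// JSON: the whole dataset is serialized as one array in one document.
function toJsonArray(data: Row[]): string {
  return JSON.stringify(data);
}

// JSONL: each row is serialized as its own JSON object, one per line.
function toJsonLines(data: Row[]): string {
  return data.map((row) => JSON.stringify(row)).join('\n');
}

const asJson = toJsonArray(rows);
const asJsonl = toJsonLines(rows);

console.log(asJson);
console.log(asJsonl);
```

A JSON file has to be parsed as a whole, while a JSONL file can be read and parsed one line at a time, which tends to suit larger outputs.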

Development

Build the project:

npm run build

Run examples:

# Run CSV processing example
npm run run:example:csv

# Run Parquet processing example
npm run run:example:parquet

API Reference

Reader Constructor

new Reader(inputDir?: string, rootDir?: string)
  • inputDir - Custom input directory (default: ${rootDir}/sync-output)
  • rootDir - Root directory (default: process.env.ROOT_DIR || '.')
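The documented defaults can be sketched as a small standalone function. This is an illustration of the rules listed above, not the library's source: resolveReaderDirs is a hypothetical helper, the environment is passed in explicitly so the sketch runs anywhere, and the assumption that an explicit rootDir argument takes precedence over ROOT_DIR follows from the word "default" rather than from the code.

```typescript
interface ReaderDirs {
  rootDir: string;
  inputDir: string;
}

// Hypothetical helper mirroring the documented defaults.
// `env` stands in for process.env.
function resolveReaderDirs(
  env: Record<string, string | undefined>,
  inputDir?: string,
  rootDir?: string
): ReaderDirs {
  // rootDir default: process.env.ROOT_DIR || '.'
  const root = rootDir ?? (env['ROOT_DIR'] || '.');
  // inputDir default: ${rootDir}/sync-output
  const input = inputDir ?? `${root}/sync-output`;
  return { rootDir: root, inputDir: input };
}

console.log(resolveReaderDirs({}));
console.log(resolveReaderDirs({ ROOT_DIR: '/data' }));
console.log(resolveReaderDirs({ ROOT_DIR: '/data' }, '/tmp/in'));
```

In a real job you would pass process.env as the first argument.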

Reader Methods

get(stream: string, options?: ReadOptions): DataFrame | null

Read a data stream as a Polars DataFrame.

const df = reader.get('users', { catalogTypes: true });

Options:

  • catalogTypes: boolean - Use catalog for automatic type inference
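Because get is typed as returning DataFrame | null, callers may want to handle the null case before transforming the data. The sketch below shows one way to do that. StreamSource is a structural stand-in for the documented Reader methods, and requireStream and fakeSource are hypothetical names introduced so the example runs without the library installed. The README does not say when null is returned; a missing stream is assumed here.

```typescript
// Structural stand-in for the documented Reader surface (an assumption for
// this sketch; a real gs.Reader would return Polars DataFrames).
interface StreamSource<T> {
  keys(): string[];
  get(stream: string, options?: { catalogTypes?: boolean }): T | null;
}

// Hypothetical helper: return the stream's data or fail with a clear message.
function requireStream<T>(source: StreamSource<T>, stream: string): T {
  const data = source.get(stream, { catalogTypes: true });
  if (data === null) {
    const known = source.keys().join(', ');
    throw new Error(`Stream "${stream}" is not available. Known streams: ${known}`);
  }
  return data;
}

// Fake in-memory source used only to exercise the helper.
const fakeStreams: Record<string, string[]> = { users: ['row-1', 'row-2'] };
const fakeSource: StreamSource<string[]> = {
  keys: () => Object.keys(fakeStreams),
  get: (stream) => fakeStreams[stream] ?? null,
};

const users = requireStream(fakeSource, 'users');
console.log(users.length);
```

Failing early keeps a missing input from surfacing later as a harder-to-trace error inside the transformation code.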

keys(): string[]

Get all available stream names.

const streams = reader.keys();
// Returns: ['users', 'orders', 'products']

getPk(stream: string): string[] | null

Get primary keys for a stream from the catalog.

const primaryKeys = reader.getPk('users');
// Returns: ['id']
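Since getPk can return null, its result needs a small amount of handling before it is passed on as the keys export option. A minimal sketch follows, assuming you want to omit keys entirely when no primary key is known. keysOption is a hypothetical helper, not part of gluestick-ts.

```typescript
interface KeyOptions {
  keys?: string[];
}

// Hypothetical helper: turn a getPk-style result into an options fragment.
// Returns {} when there are no keys, so spreading it leaves `keys` unset.
function keysOption(primaryKeys: string[] | null, fallback?: string[]): KeyOptions {
  const keys = primaryKeys ?? fallback;
  return keys !== undefined && keys.length > 0 ? { keys } : {};
}

console.log(keysOption(['id']));
console.log(keysOption(null));
console.log(keysOption(null, ['user_id']));

// Possible use with the documented API (not executed here):
// gs.toExport(df, 'users', './etl-output', {
//   exportFormat: 'singer',
//   ...keysOption(reader.getPk('users')),
// });
```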

Export Function

toExport(
  dataFrame: DataFrame,
  outputName: string,
  outputDir: string,
  options?: ExportOptions
): void

Parameters:

  • dataFrame - Polars DataFrame to export
  • outputName - Name for the output file (without extension)
  • outputDir - Directory to write the output file
  • options - Export configuration options

Export Options:

interface ExportOptions {
  exportFormat?: 'csv' | 'json' | 'jsonl' | 'parquet' | 'singer';
  outputFilePrefix?: string;
  keys?: string[];  // Primary keys for the data
  stringifyObjects?: boolean;
  reservedVariables?: Record<string, string>;
  allowObjects?: boolean;  // For Singer format
  schema?: SingerHeaderMap;  // For Singer format
}

Examples:

// Export as CSV with prefix
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'csv',
  outputFilePrefix: 'tenant_123_',
  keys: ['user_id']
});

// Export as Singer format
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'singer',
  allowObjects: true,
  keys: ['user_id']
});

Singer Format Support

Export data in Singer specification format for data integration pipelines:

// Basic Singer export
gs.toExport(dataFrame, 'users', './output', {
  exportFormat: 'singer',
  keys: ['id']
});

// Singer export with object support
gs.toExport(dataFrame, 'users', './output', {
  exportFormat: 'singer',
  allowObjects: true,
  keys: ['id']
});

The Singer export automatically generates SCHEMA, RECORD, and STATE messages according to the Singer specification.
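For readers unfamiliar with the Singer specification, the sketch below builds the three message types by hand as newline-delimited JSON. It follows the message shapes defined by the Singer specification (type, stream, schema, key_properties, record, value) but is not gluestick-ts's implementation; toSingerLines, the sample schema, and the empty STATE value are assumptions for illustration, and the library's actual output may contain more detail.

```typescript
type JsonSchemaProps = Record<string, { type: string[] }>;

type SingerMessage =
  | { type: 'SCHEMA'; stream: string; schema: { type: 'object'; properties: JsonSchemaProps }; key_properties: string[] }
  | { type: 'RECORD'; stream: string; record: Record<string, unknown> }
  | { type: 'STATE'; value: Record<string, unknown> };

// Hypothetical helper: one SCHEMA line, one RECORD line per row, one STATE line.
function toSingerLines(
  stream: string,
  keys: string[],
  properties: JsonSchemaProps,
  records: Record<string, unknown>[]
): string[] {
  const messages: SingerMessage[] = [
    { type: 'SCHEMA', stream, schema: { type: 'object', properties }, key_properties: keys },
    ...records.map((record): SingerMessage => ({ type: 'RECORD', stream, record })),
    { type: 'STATE', value: {} }, // empty state is a placeholder assumption
  ];
  return messages.map((message) => JSON.stringify(message));
}

const singerLines = toSingerLines(
  'users',
  ['id'],
  { id: { type: ['integer'] }, name: { type: ['string', 'null'] } },
  [
    { id: 1, name: 'Ada' },
    { id: 2, name: null },
  ]
);

console.log(singerLines.join('\n'));
```

The keys option from the examples above corresponds to key_properties in the SCHEMA message.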
