A powerful TypeScript library for data processing and ETL operations on the hotglue iPaaS platform, built on Polars for high-performance data manipulation. Supports multiple export formats, including CSV, JSON, Parquet, and the Singer specification.
```bash
npm install @hotglue/gluestick-ts
```
```typescript
import * as gs from '@hotglue/gluestick-ts';

// Create a Reader to access your data
const reader = new gs.Reader();

// Get available data streams
const streams = reader.keys();
console.log('Available streams:', streams);

// Read and process a specific stream
const dataFrame = reader.get('your_stream_name', { catalogTypes: true });

// Export processed data (defaults to singer)
gs.toExport(dataFrame, 'output_name', './etl-output');

// Export as CSV
gs.toExport(dataFrame, 'output_name', './etl-output', { exportFormat: 'csv' });
```
The `Reader` class is your main interface for accessing data streams:
```typescript
const reader = new gs.Reader(inputDir?, rootDir?);
```
Methods:

- `get(stream, options)` - Read a specific stream as a Polars DataFrame
- `keys()` - Get all available stream names
- `getPk(stream)` - Get primary keys for a stream from the catalog

Options:

- `catalogTypes: boolean` - Use the catalog for automatic type inference
- Other options are passed through to Polars when reading; see the ReadCSV and ReadParquet options for more information (a passthrough sketch follows below)
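As an illustration of that passthrough, here is a hedged sketch; `hasHeader` and `sep` are standard nodejs-polars `readCSV` options, but whether `get` forwards them unchanged is an assumption about this library's internals:

```typescript
import * as gs from '@hotglue/gluestick-ts';

const reader = new gs.Reader();

// catalogTypes is handled by gluestick itself; the remaining options
// are assumed to be forwarded to the underlying Polars CSV reader.
const orders = reader.get('orders', {
  catalogTypes: true,
  hasHeader: true,
  sep: ',',
});
```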
Export your processed data in multiple formats:
```typescript
gs.toExport(dataFrame, outputName, outputDir, options?);
```
Supported formats:
- Singer (default) - Singer specification format for data integration
- CSV - Comma-separated values
- JSON - Single JSON array
- JSONL - Newline-delimited JSON
- Parquet - Columnar storage format
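For example, the same DataFrame can be written in several formats just by varying `exportFormat` (the exact output file naming beyond the `outputName` base is up to the library):

```typescript
// Write the same DataFrame in three of the supported formats
gs.toExport(dataFrame, 'users', './output', { exportFormat: 'jsonl' });
gs.toExport(dataFrame, 'users', './output', { exportFormat: 'parquet' });
gs.toExport(dataFrame, 'users', './output'); // Singer, the default
```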
Build the project:

```bash
npm run build
```

Run examples:

```bash
# Run CSV processing example
npm run run:example:csv

# Run Parquet processing example
npm run run:example:parquet
```
```typescript
new Reader(inputDir?: string, rootDir?: string)
```

- `inputDir` - Custom input directory (default: `${rootDir}/sync-output`)
- `rootDir` - Root directory (default: `process.env.ROOT_DIR || '.'`)
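For instance, to read from a custom location (the paths here are placeholders):

```typescript
// Default: reads from `${ROOT_DIR || '.'}/sync-output`
const reader = new gs.Reader();

// Explicit input and root directories (placeholder paths)
const customReader = new gs.Reader('./my-sync-output', '/tmp/etl-root');
```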
`get(stream, options)` - Read a data stream as a Polars DataFrame.
```typescript
const df = reader.get('users', { catalogTypes: true });
```
Options:

- `catalogTypes: boolean` - Use the catalog for automatic type inference
`keys()` - Get all available stream names.
```typescript
const streams = reader.keys();
// Returns: ['users', 'orders', 'products']
```
`getPk(stream)` - Get primary keys for a stream from the catalog.
```typescript
const primaryKeys = reader.getPk('users');
// Returns: ['id']
```
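A common pattern (a sketch, assuming the catalog defines primary keys for the stream) is to feed `getPk` straight into the `keys` export option:

```typescript
const users = reader.get('users', { catalogTypes: true });

// Reuse the catalog's primary keys when exporting
gs.toExport(users, 'users', './output', {
  exportFormat: 'singer',
  keys: reader.getPk('users'),
});
```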
```typescript
toExport(
  dataFrame: DataFrame,
  outputName: string,
  outputDir: string,
  options?: ExportOptions
): void
```

Parameters:

- `dataFrame` - Polars DataFrame to export
- `outputName` - Name for the output file (without extension)
- `outputDir` - Directory to write the output file
- `options` - Export configuration options
Export Options:

```typescript
interface ExportOptions {
  exportFormat?: 'csv' | 'json' | 'jsonl' | 'parquet' | 'singer';
  outputFilePrefix?: string;
  keys?: string[]; // Primary keys for the data
  stringifyObjects?: boolean;
  reservedVariables?: Record<string, string>;
  allowObjects?: boolean; // For Singer format
  schema?: SingerHeaderMap; // For Singer format
}
```
Examples:

```typescript
// Export as CSV with prefix
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'csv',
  outputFilePrefix: 'tenant_123_',
  keys: ['user_id']
});

// Export as Singer format
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'singer',
  allowObjects: true,
  keys: ['user_id']
});
```
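Flat formats such as CSV and JSONL cannot represent nested object columns directly; the sketch below assumes, based on the option's name rather than confirmed library behavior, that `stringifyObjects` serializes such columns to JSON strings:

```typescript
// Assumption: stringifyObjects serializes nested object columns
// (e.g. an `address` struct) to JSON strings in the output rows
gs.toExport(dataFrame, 'processed_users', './output', {
  exportFormat: 'jsonl',
  stringifyObjects: true,
  keys: ['user_id']
});
```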
Export data in the Singer specification format for data integration pipelines:
```typescript
// Basic Singer export
gs.toExport(dataFrame, 'users', './output', {
  exportFormat: 'singer',
  keys: ['id']
});

// Singer export with object support
gs.toExport(dataFrame, 'users', './output', {
  exportFormat: 'singer',
  allowObjects: true,
  keys: ['id']
});
```
The Singer export automatically generates SCHEMA, RECORD, and STATE messages according to the Singer specification.
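To make that concrete, a Singer message stream for the `users` export above would look roughly like the following; the exact schema and state payloads depend on the DataFrame and the library's internals, so treat this as an illustration of the Singer spec rather than guaranteed output:

```typescript
// Illustrative Singer output (one JSON message per line):
//
// {"type": "SCHEMA", "stream": "users",
//  "schema": {"properties": {"id": {"type": "integer"}, "name": {"type": "string"}}},
//  "key_properties": ["id"]}
// {"type": "RECORD", "stream": "users", "record": {"id": 1, "name": "Ada"}}
// {"type": "RECORD", "stream": "users", "record": {"id": 2, "name": "Linus"}}
// {"type": "STATE", "value": {}}
```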