work towards 1.14.0 #298

Merged
merged 10 commits into from
Apr 7, 2025
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@

# SmarterCSV 1.x Change Log

## 1.14.0 (2025-04-07)
* adding advanced configuration options for writing CSV files. ([issue 297](https://github.com/tilo/smarter_csv/issues/297) thanks to Robert Reiz, [issue 296](https://github.com/tilo/smarter_csv/issues/296))

## 1.13.1 (2024-12-12)
* fix bug with SmarterCSV.generate with `force_quotes: true` ([issue 294](https://github.com/tilo/smarter_csv/issues/294))

1 change: 1 addition & 0 deletions CONTRIBUTORS.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,3 +58,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed
* [Simon Rentzke](https://github.com/simonrentzke)
* [Randall B](https://github.com/randall-coding)
* [Matthew Kennedy](https://github.com/MattKitmanLabs)
* [Robert Reiz](https://github.com/reiz)
13 changes: 7 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@

# SmarterCSV

[![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [View on RubyGems](https://rubygems.org/gems/smarter_csv) [View on RubyToolbox](https://www.ruby-toolbox.com/search?q=smarter_csv)
![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [View on RubyGems](https://rubygems.org/gems/smarter_csv) [View on RubyToolbox](https://www.ruby-toolbox.com/search?q=smarter_csv)

SmarterCSV provides a convenient interface for reading and writing CSV files and data.

Expand Down Expand Up @@ -35,7 +35,8 @@ Or install it yourself as:
# Documentation

* [Introduction](docs/_introduction.md)
* [The Basic API](docs/basic_api.md)
* [The Basic Read API](docs/basic_read_api.md)
* [The Basic Write API](docs/basic_write_api.md)
* [Batch Processing](./docs/batch_processing.md)
* [Configuration Options](docs/options.md)
* [Row and Column Separators](docs/row_col_sep.md)
Expand All @@ -45,10 +46,10 @@ Or install it yourself as:
* [Value Converters](docs/value_converters.md)

# Articles
* [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
* [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
* [Faster Parsing CSV with Parallel Processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing) by [Jack lin](https://github.com/xjlin0/)
* [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV
* [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38)
* [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/)
* [Faster Parsing CSV with Parallel Processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing) by [Jack lin](https://github.com/xjlin0/)
* [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV

# [ChangeLog](./CHANGELOG.md)

5 changes: 3 additions & 2 deletions docs/_introduction.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [**Introduction**](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
Expand Down Expand Up @@ -53,4 +54,4 @@ The CSV processing also needed to be robust against variations in the input data
(planned feature)

---------------
PREVIOUS [README](../README.md) | NEXT: [The Basic API](./basic_api.md)
PREVIOUS [README](../README.md) | NEXT: [The Basic Read API](./basic_read_api.md)
45 changes: 3 additions & 42 deletions docs/basic_api.md → docs/basic_read_api.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [**The Basic API**](./basic_api.md)
* [**The Basic Read API**](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
Expand Down Expand Up @@ -70,46 +71,6 @@ It cal also be used with a block:
This allows you access to the internal state of the `reader` instance after processing.


## Interface for Writing CSV

To generate a CSV file, we use the `<<` operator to append new data to the file.

The input operator for adding data to a CSV file `<<` can handle single hashes, array-of-hashes, or array-of-arrays-of-hashes, and can be called one or multiple times for each file.

One smart feature of writing CSV data is the discovery of headers.

If you have hashes of data, where each hash can have different keys, the `SmarterCSV::Writer` automatically discovers the superset of keys as the headers of the CSV file. This can be disabled by providing one of the options `headers`, `map_headers`, or `discover_headers: false`.


### Simplified Interface

The simplified interface takes a block:

```
SmarterCSV.generate(filename, options) do |csv_writer|

  MyModel.find_in_batches(batch_size: 100) do |batch|
    batch.pluck(:name, :description, :instructor).each do |record|
      csv_writer << record
    end
  end

end
```

### Full Interface

```
writer = SmarterCSV::Writer.new(file_path, options)

MyModel.find_in_batches(batch_size: 100) do |batch|
  batch.pluck(:name, :description, :instructor).each do |record|
    writer << record # append to the writer instance created above
  end
end

writer.finalize
```

## Rescue from Exceptions

While SmarterCSV uses sensible defaults to process the most common CSV files, it will raise exceptions if it can not auto-detect `col_sep`, `row_sep`, or if it encounters other problems. Therefore please rescue from `SmarterCSV::Error`, and handle outliers according to your requirements.
Expand Down Expand Up @@ -154,4 +115,4 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv
```

----------------
PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Batch Processing](./batch_processing.md)
PREVIOUS: [Introduction](./_introduction.md) | NEXT: [The Basic Write API](./basic_write_api.md)
160 changes: 160 additions & 0 deletions docs/basic_write_api.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,160 @@

### Contents

* [Introduction](./_introduction.md)
* [The Basic Read API](./basic_read_api.md)
* [**The Basic Write API**](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
* [Header Transformations](./header_transformations.md)
* [Header Validations](./header_validations.md)
* [Data Transformations](./data_transformations.md)
* [Value Converters](./value_converters.md)

--------------

# SmarterCSV Basic Write API

Let's explore the basic API for writing CSV files. There is a simplified API (backwards compatible with previous SmarterCSV versions) and the full API, which allows you to access the internal state of the writer instance after processing.

## Writing CSV Files

To generate a CSV file, we use the `<<` operator to append new data to the file.

The `<<` operator can handle single hashes, array-of-hashes, or array-of-arrays-of-hashes, and can be called once or multiple times while creating a file.

### Auto-Discovery of Headers

By default, the `SmarterCSV::Writer` discovers all keys that are present in the input data and, as they become known, appends them to the CSV headers. This ensures that all data will be included in the output CSV file.

If you want to customize the output file, or only include select headers, check the section about Advanced Features below.
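
The idea behind header discovery can be sketched in plain Ruby. This is only an illustration of the behavior described above, not SmarterCSV's actual implementation; the helper name `discover_headers` and the sample data are hypothetical.

```ruby
# Hypothetical helper (not part of SmarterCSV): collects the union of all
# Hash keys, in the order in which they are first seen.
def discover_headers(rows)
  rows.each_with_object([]) do |row, headers|
    row.each_key { |key| headers << key unless headers.include?(key) }
  end
end

rows = [
  { name: 'Alice', email: 'alice@example.com' },
  { name: 'Bob', city: 'Vienna' },
]

headers = discover_headers(rows)

# A row that lacks a key produces an empty field for that column.
lines = rows.map { |row| headers.map { |h| row[h].to_s }.join(',') }

puts headers.join(',')
puts lines
```

Because the header list is the superset of all keys, no value from the input is dropped, even when individual hashes have different keys.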

### Auto-Quoting of Problematic Values

CSV files use some special characters that are important for the CSV format to function:
* @row_sep : typically `\n` the newline (line feed) character
* @col_sep : typically `,` the comma
* @quote_char : typically `"` the double-quote

When the data for a given field in a CSV row contains any of these characters, we need to prevent them from breaking the CSV file format.

`SmarterCSV::Writer` automatically detects if a field contains any of these three characters. If a field contains the `@quote_char`, each occurrence is prefixed by another `@quote_char` as per CSV conventions.
In all of these cases, the corresponding field is put in double-quotes.
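
The quoting rule above can be sketched in plain Ruby. This is a minimal illustration under the stated defaults, not the library's actual code; `quote_field` and its keyword arguments are hypothetical names.

```ruby
# Hypothetical helper (not part of SmarterCSV): quotes a field only if it
# contains the column separator, the row separator, or the quote character.
def quote_field(value, col_sep: ',', row_sep: "\n", quote_char: '"')
  str = value.to_s
  special = [col_sep, row_sep, quote_char]
  return str unless special.any? { |char| str.include?(char) }

  # Escape embedded quote characters by doubling them, then wrap the field.
  escaped = str.gsub(quote_char) { quote_char * 2 }
  "#{quote_char}#{escaped}#{quote_char}"
end

puts quote_field('plain')
puts quote_field('Smith, John')
puts quote_field('say "hi"')
```

Fields without special characters pass through unchanged, so the output stays compact unless quoting is actually needed.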


### Simplified Interface

The simplified interface takes a block:

```
SmarterCSV.generate(filename, options) do |csv_writer|

  MyModel.find_in_batches(batch_size: 100) do |batch|
    batch.pluck(:name, :description, :instructor).each do |record|
      csv_writer << record
    end
  end

end
```

### Full Interface

```
writer = SmarterCSV::Writer.new(file_path, options)

MyModel.find_in_batches(batch_size: 100) do |batch|
  batch.pluck(:name, :description, :instructor).each do |record|
    writer << record # append to the writer instance created above
  end
end

writer.finalize
```

## Advanced Features: Customizing the Output Format

You can customize the output format through several options.

In the options, you can pass in any of these parameters:
* `headers`, which limits the CSV headers to just the specified list.
* `map_headers`, which maps a given list of Hash keys to custom strings, and limits the CSV headers to just those.
* `value_converters`, which specifies a hash with more advanced value transformations.

### Limited Headers

You can use the `headers` option to limit the CSV headers to only a subset of the Hash keys in your data.
This switches off the automatic detection of headers, and limits the CSV output file to only the headers you provide in this option.
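
The effect of limiting the headers can be sketched in plain Ruby. This only illustrates the behavior described above and does not call SmarterCSV; the sample keys and data are hypothetical.

```ruby
# Illustration only: keys not listed in `headers` are left out of the output.
headers = [:name, :email]

rows = [
  { name: 'Alice', email: 'alice@example.com', password_digest: 'secret' },
  { name: 'Bob', email: 'bob@example.com', admin: true },
]

header_line = headers.join(',')
data_lines = rows.map { |row| row.values_at(*headers).join(',') }

puts header_line
puts data_lines
```

This is useful when the input hashes carry fields, such as internal IDs or credentials, that should never appear in the exported file.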


### Mapping Headers

Similar to the `headers` option, you can define `map_headers` to rename a given set of Hash keys to custom strings in the CSV header. Only the mapped keys are included in the output, and this switches off the automatic detection of headers.
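
A plain-Ruby sketch of the mapping idea follows. It does not call SmarterCSV; the mapping and the sample data are hypothetical and only illustrate how Hash keys could be renamed in the header line.

```ruby
# Illustration only: each Hash key is mapped to a custom header string.
map_headers = { name: 'Full Name', email: 'E-Mail Address' }

rows = [
  { name: 'Alice', email: 'alice@example.com', age: 42 },
  { name: 'Bob', email: 'bob@example.com', age: 37 },
]

# The header line uses the custom strings; the data uses the original keys.
header_line = map_headers.values.join(',')
data_lines = rows.map { |row| row.values_at(*map_headers.keys).join(',') }

puts header_line
puts data_lines
```

Note that `:age` is dropped because it has no entry in the mapping.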


### Per Key Value Converters


Using per-key value converters, you can control how specific hash keys in your data are converted in the output.

Example 1:

```
options = {
  value_converters: {
    active: ->(v) { !!v ? 'YES' : 'NO' },
  }
}
```

This maps the boolean value of the hash key `:active` into strings `"YES"`, `"NO"`.

Example 2:

```
options = {
  value_converters: {
    active: ->(v) { !!v ? '✅' : '❌' },
    balance: ->(v) do
      case v
      when Float
        '$%.2f' % v.round(2)
      when Integer
        "$#{v}"
      else
        v.to_s
      end
    end,
  }
}
```

This maps the hash key `:balance` to a string. Floats are rounded and displayed with 2 decimals and prefixed by `$`. Integers are prefixed by `$`.
The boolean value of the key `:active` is mapped into an emoji.
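
To see what such a converter produces, the `balance` lambda from Example 2 can be called directly in plain Ruby, outside of SmarterCSV. The sample input values are made up for illustration.

```ruby
# Same logic as the :balance converter in Example 2, assigned to a local
# variable so it can be called on its own.
balance = ->(v) do
  case v
  when Float
    '$%.2f' % v.round(2)
  when Integer
    "$#{v}"
  else
    v.to_s
  end
end

puts balance.call(12.5)  # a Float is formatted with two decimals
puts balance.call(7)     # an Integer is only prefixed with "$"
puts balance.call(nil)   # anything else falls back to to_s
```

Trying a converter in isolation like this is a quick way to check edge cases such as `nil` before wiring it into the options.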

### Global Value Converters

You can also use the special keyword `:_all` to define transformations that are applied to each field of the CSV file.

```
options = {
  disable_auto_quoting: true, # ⚠️ Important: turn off auto-quoting because we're messing with it below
  value_converters: {
    active: ->(v) { !!v ? 'YES' : 'NO' },
    _all: ->(k, v) { v.is_a?(String) ? "\"#{v}\"" : v } # only double-quote string fields
  }
}
```

Using the `:_all` keyword, you can set up rules to convert the values of all hash keys. This is applied after all per-key conversions are made.

This example puts double-quotes around all String-value data, but leaves other types unchanged.

Note that when you add quote characters around fields yourself, you need to set `disable_auto_quoting`, otherwise your quote characters would be quoted again.
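
The order of conversions can be sketched in plain Ruby. This is an illustration of the rule that `:_all` runs after the per-key converters, not SmarterCSV's actual implementation; the helper `apply_converters` is a hypothetical name.

```ruby
# Hypothetical helper (not part of SmarterCSV): applies the per-key converter
# first, then the :_all converter, to every value of a row.
def apply_converters(row, converters)
  all = converters[:_all]
  row.each_with_object({}) do |(key, value), out|
    per_key = converters[key]
    value = per_key.call(value) if per_key
    value = all.call(key, value) if all
    out[key] = value
  end
end

converters = {
  active: ->(v) { v ? 'YES' : 'NO' },
  _all: ->(_k, v) { v.is_a?(String) ? "\"#{v}\"" : v },
}

result = apply_converters({ name: 'Alice', active: true, age: 42 }, converters)
puts result.values.join(',')
```

Because `:active` is first converted to the String `'YES'`, the `:_all` rule then quotes it as well, while the Integer `42` is left untouched.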

## More Examples

Check out the [RSpec tests](../spec/smarter_csv/writer_spec.rb) for more examples.

----------------
PREVIOUS: [The Basic Read API](./basic_read_api.md) | NEXT: [Batch Processing](./batch_processing.md)
5 changes: 3 additions & 2 deletions docs/batch_processing.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [**Batch Processing**](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
Expand Down Expand Up @@ -65,4 +66,4 @@ and how the `process` method returns the number of chunks when called with a blo
```

----------------
PREVIOUS: [The Basic API](./basic_api.md) | NEXT: [Configuration Options](./options.md)
PREVIOUS: [The Basic Write API](./basic_write_api.md) | NEXT: [Configuration Options](./options.md)
3 changes: 2 additions & 1 deletion docs/data_transformations.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
3 changes: 2 additions & 1 deletion docs/examples.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
3 changes: 2 additions & 1 deletion docs/header_transformations.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
3 changes: 2 additions & 1 deletion docs/header_validations.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [Row and Column Separators](./row_col_sep.md)
17 changes: 11 additions & 6 deletions docs/options.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [**Configuration Options**](./options.md)
* [Row and Column Separators](./row_col_sep.md)
Expand All @@ -20,14 +21,18 @@
| Option | Default | Explanation |
---------------------------------------------------------------------------------------------------------------------------------
| :row_sep | $/ | Separates rows; Defaults to your OS row separator. `\n` on UNIX, `\r\n` on Windows |
| :col_sep | "," | Separates each value in a row |
| :quote_char | '"' | |
| :col_sep | "," | Separates each value in a row |
| :quote_char | '"' | To quote CSV fields. |
| :force_quotes | false | Forces each individual value to be quoted |
| :discover_headers | true | Automatically detects all keys in the input before writing the header |
| | | This can be disabled by providing `headers` or `map_headers` options. |
| :headers | [] | You can provide the specific list of keys from the input you'd like to be used as headers in the CSV file |
| | | ⚠️ This disables automatic header detection! |
| :map_headers | {} | Similar to `headers`, but also maps each desired key to a user-specified value that is used as the header. |
| | | ⚠️ This disables automatic header detection! |
| :discover_headers | true | Automatically detects all keys in the input before writing the header |
| | | Do not manually set this to `false`. ⚠️ |
| | | But you can set this to `true` when using `map_headers` option. |
| :disable_auto_quoting | false | To manually disable auto-quoting of special characters. ⚠️ Be careful with this! |


## CSV Reading

3 changes: 2 additions & 1 deletion docs/row_col_sep.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
### Contents

* [Introduction](./_introduction.md)
* [The Basic API](./basic_api.md)
* [The Basic Read API](./basic_read_api.md)
* [The Basic Write API](./basic_write_api.md)
* [Batch Processing](././batch_processing.md)
* [Configuration Options](./options.md)
* [**Row and Column Separators**](./row_col_sep.md)