diff --git a/CHANGELOG.md b/CHANGELOG.md index 4d78809..1766b99 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,9 @@ # SmarterCSV 1.x Change Log +## 1.14.0 (2025-04-07) + * adding advanced configuration options for writing CSV files. ([issue 297](https://github.com/tilo/smarter_csv/issues/297) thanks to Robert Reiz, [issue 296](https://github.com/tilo/smarter_csv/issues/296)) + ## 1.13.1 (2024-12-12) * fix bug with SmarterCSV.generate with `force_quotes: true` ([issue 294](https://github.com/tilo/smarter_csv/issues/294)) diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md index 8dc180e..637fc00 100644 --- a/CONTRIBUTORS.md +++ b/CONTRIBUTORS.md @@ -58,3 +58,4 @@ A Big Thank you to everyone who filed issues, sent comments, and who contributed * [Simon Rentzke](https://github.com/simonrentzke) * [Randall B](https://github.com/randall-coding) * [Matthew Kennedy](https://github.com/MattKitmanLabs) + * [Robert Reiz](https://github.com/reiz) diff --git a/README.md b/README.md index 76dff8f..51c0a44 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,7 @@ # SmarterCSV - [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [View on RubyGems](https://rubygems.org/gems/smarter_csv) [View on RubyToolbox](https://www.ruby-toolbox.com/search?q=smarter_csv) + ![Gem Version](https://img.shields.io/gem/v/smarter_csv) [![codecov](https://codecov.io/gh/tilo/smarter_csv/branch/main/graph/badge.svg?token=1L7OD80182)](https://codecov.io/gh/tilo/smarter_csv) [View on RubyGems](https://rubygems.org/gems/smarter_csv) [View on RubyToolbox](https://www.ruby-toolbox.com/search?q=smarter_csv) SmarterCSV provides a convenient interface for reading and writing CSV files and data. 
@@ -35,7 +35,8 @@ Or install it yourself as: # Documentation * [Introduction](docs/_introduction.md) - * [The Basic API](docs/basic_api.md) + * [The Basic Read API](docs/basic_read_api.md) + * [The Basic Write API](docs/basic_write_api.md) * [Batch Processing](./docs/batch_processing.md) * [Configuration Options](docs/options.md) * [Row and Column Separators](docs/row_col_sep.md) @@ -45,10 +46,10 @@ Or install it yourself as: * [Value Converters](docs/value_converters.md) # Articles -* [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38) -* [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/) -* [Faster Parsing CSV with Parallel Processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing) by [Jack lin](https://github.com/xjlin0/) -* [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV + * [Parsing CSV Files in Ruby with SmarterCSV](https://tilo-sloboda.medium.com/parsing-csv-files-in-ruby-with-smartercsv-6ce66fb6cf38) + * [Processing 1.4 Million CSV Records in Ruby, fast ](https://lcx.wien/blog/processing-14-million-csv-records-in-ruby/) + * [Faster Parsing CSV with Parallel Processing](http://xjlin0.github.io/tech/2015/05/25/faster-parsing-csv-with-parallel-processing) by [Jack lin](https://github.com/xjlin0/) + * [The original post](http://www.unixgods.org/Ruby/process_csv_as_hashes.html) that started SmarterCSV # [ChangeLog](./CHANGELOG.md) diff --git a/docs/_introduction.md b/docs/_introduction.md index a4e5aa2..ee37ed8 100644 --- a/docs/_introduction.md +++ b/docs/_introduction.md @@ -2,7 +2,8 @@ ### Contents * [**Introduction**](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * 
[Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) @@ -53,4 +54,4 @@ The CSV processing also needed to be robust against variations in the input data (planned feature) --------------- -PREVIOUS [README](../README.md) | NEXT: [The Basic API](./basic_api.md) +PREVIOUS [README](../README.md) | NEXT: [The Basic Read API](./basic_read_api.md) diff --git a/docs/basic_api.md b/docs/basic_read_api.md similarity index 74% rename from docs/basic_api.md rename to docs/basic_read_api.md index 2947481..06247dd 100644 --- a/docs/basic_api.md +++ b/docs/basic_read_api.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [**The Basic API**](./basic_api.md) + * [**The Basic Read API**](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) @@ -70,46 +71,6 @@ It cal also be used with a block: This allows you access to the internal state of the `reader` instance after processing. -## Interface for Writing CSV - -To generate a CSV file, we use the `<<` operator to append new data to the file. - -The input operator for adding data to a CSV file `<<` can handle single hashes, array-of-hashes, or array-of-arrays-of-hashes, and can be called one or multiple times for each file. - -One smart feature of writing CSV data is the discovery of headers. - -If you have hashes of data, where each hash can have different keys, the `SmarterCSV::Reader` automatically discovers the superset of keys as the headers of the CSV file. This can be disabled by either providing one of the options `headers`, `map_headers`, or `discover_headers: false`. 
- - -### Simplified Interface - -The simplified interface takes a block: - - ``` - SmarterCSV.generate(filename, options) do |csv_writer| - - MyModel.find_in_batches(batch_size: 100) do |batch| - batch.pluck(:name, :description, :instructor).each do |record| - csv_writer << record - end - end - - end - ``` - -### Full Interface - - ``` - writer = SmarterCSV::Writer.new(file_path, options) - - MyModel.find_in_batches(batch_size: 100) do |batch| - batch.pluck(:name, :description, :instructor).each do |record| - csv_writer << record - end - - writer.finalize - ``` - ## Rescue from Exceptions While SmarterCSV uses sensible defaults to process the most common CSV files, it will raise exceptions if it can not auto-detect `col_sep`, `row_sep`, or if it encounters other problems. Therefore please rescue from `SmarterCSV::Error`, and handle outliers according to your requirements. @@ -154,4 +115,4 @@ $ hexdump -C spec/fixtures/bom_test_feff.csv ``` ---------------- -PREVIOUS: [Introduction](./_introduction.md) | NEXT: [Batch Processing](./batch_processing.md) +PREVIOUS: [Introduction](./_introduction.md) | NEXT: [The Basic Write API](./basic_write_api.md) diff --git a/docs/basic_write_api.md b/docs/basic_write_api.md new file mode 100644 index 0000000..360b1f4 --- /dev/null +++ b/docs/basic_write_api.md @@ -0,0 +1,160 @@ + +### Contents + + * [Introduction](./_introduction.md) + * [The Basic Read API](./basic_read_api.md) + * [**The Basic Write API**](./basic_write_api.md) + * [Batch Processing](././batch_processing.md) + * [Configuration Options](./options.md) + * [Row and Column Separators](./row_col_sep.md) + * [Header Transformations](./header_transformations.md) + * [Header Validations](./header_validations.md) + * [Data Transformations](./data_transformations.md) + * [Value Converters](./value_converters.md) + +-------------- + +# SmarterCSV Basic Write API + +Let's explore the basic API for writing CSV files. 
There is a simplified API (backwards compatible with previous SmarterCSV versions) and the full API, which allows you to access the internal state of the writer instance after processing.
+
+## Writing CSV Files
+
+To generate a CSV file, we use the `<<` operator to append new data to the file.
+
+The `<<` operator can handle single hashes, array-of-hashes, or array-of-arrays-of-hashes, and can be called one or multiple times in order to create a file.
+
+### Auto-Discovery of Headers
+
+By default, the `SmarterCSV::Writer` discovers all keys that are present in the input data, and as they become known, appends them to the CSV headers. This ensures that all data will be included in the output CSV file.
+
+If you want to customize the output file, or only include select headers, check the section about Advanced Features below.
+
+### Auto-Quoting of Problematic Values
+
+CSV files use some special characters that are important for the CSV format to function:
+* @row_sep : typically `\n`, the newline
+* @col_sep : typically `,`, the comma
+* @quote_char : typically `"`, the double-quote
+
+When the data for a given field in a CSV row contains any of these characters, we need to prevent them from breaking the CSV file format.
+
+`SmarterCSV::Writer` automatically detects if a field contains any of these three characters. If a field contains the `@quote_char`, each occurrence will be prefixed by another `@quote_char` as per CSV conventions.
+In either case the corresponding field will be put in double-quotes.
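The auto-quoting rule described above can be sketched in plain Ruby, without the gem. This is only an illustration under stated assumptions: `quote_field` is a hypothetical helper name, not part of the SmarterCSV API, and its keyword arguments simply stand in for the writer's `col_sep`, `row_sep`, and `quote_char` options.

```ruby
# Hypothetical helper (not the SmarterCSV API): a sketch of the auto-quoting rule.
def quote_field(field, col_sep: ',', row_sep: "\n", quote_char: '"')
  str = field.to_s
  special = Regexp.union(col_sep, row_sep, quote_char)

  # Fields without any special character are written unchanged.
  return str unless str.match?(special)

  # Double every embedded quote character, then wrap the whole field in quotes.
  quote_char + str.gsub(quote_char, quote_char * 2) + quote_char
end

puts quote_field('New York')            # no special character, left as-is
puts quote_field('New York, New York')  # contains the column separator
puts quote_field('New "York')           # contains the quote character
```

Doubling the quote character is what lets a CSV reader tell an embedded quote apart from the end of the field.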
+
+
+### Simplified Interface
+
+The simplified interface takes a block:
+
+  ```
+  SmarterCSV.generate(filename, options) do |csv_writer|
+
+    MyModel.find_in_batches(batch_size: 100) do |batch|
+      batch.pluck(:name, :description, :instructor).each do |record|
+        csv_writer << record
+      end
+    end
+
+  end
+  ```
+
+### Full Interface
+
+  ```
+  writer = SmarterCSV::Writer.new(file_path, options)
+
+  MyModel.find_in_batches(batch_size: 100) do |batch|
+    batch.pluck(:name, :description, :instructor).each do |record|
+      writer << record
+    end
+  end
+  writer.finalize
+  ```
+
+## Advanced Features: Customizing the Output Format
+
+You can customize the output format through different features.
+
+In the options, you can pass in any of these parameters to customize your output format.
+* `headers`, which limits the CSV headers to just the specified list.
+* `map_headers`, which maps a given list of Hash keys to custom strings, and limits the CSV headers to just those.
+* `value_converters`, which specifies a hash with more advanced value transformations.
+
+### Limited Headers
+
+You can use the `headers` option to limit the CSV headers to only a subset of Hash keys from your data.
+This will switch off the automatic detection of headers, and limit the CSV output file to only the CSV headers you provide in this option.
+
+
+### Mapping Headers
+
+Similar to the `headers` option, you can define `map_headers` to rename a given set of Hash keys to custom strings in the CSV header. This will switch off the automatic detection of headers, unless you also set `discover_headers: true`.
+
+
+### Per Key Value Converters
+
+
+Using per-key value converters, you can control how specific hash keys in your data are converted in the output.
+
+Example 1:
+
+```
+  options = {
+    value_converters: {
+      active: ->(v) { !!v ? 'YES' : 'NO' },
+    }
+  }
+```
+
+This maps the boolean value of the hash key `:active` into the strings `"YES"` or `"NO"`.
+
+Example 2:
+
+```
+  options = {
+    value_converters: {
+      active: ->(v) { !!v ? '✅' : '❌' },
+      balance: ->(v) do
+        case v
+        when Float
+          '$%.2f' % v.round(2)
+        when Integer
+          "$#{v}"
+        else
+          v.to_s
+        end
+      end,
+    }
+  }
+```
+
+This maps the hash key `:balance` to a string. Floats are rounded, displayed with 2 decimals, and prefixed by `$`. Integers are prefixed by `$`.
+The boolean value of the key `:active` is mapped into an emoji.
+
+### Global Value Converters
+
+You can also use the special keyword `:_all` to define transformations that are applied to each field of the CSV file.
+
+```
+  options = {
+    disable_auto_quoting: true, # ⚠️ Important: turn off auto-quoting because we're messing with it below
+    value_converters: {
+      active: ->(v) { !!v ? 'YES' : 'NO' },
+      _all: ->(k, v) { v.is_a?(String) ? "\"#{v}\"" : v } # only double-quote string fields
+    }
+  }
+```
+
+Using the `:_all` keyword, you can set up rules to convert all hash keys. The `:_all` converter receives both the key and the value, and is applied after all per-key conversions are made.
+
+This example puts double-quotes around all String-value data, including strings produced by per-key converters, but leaves other types unchanged.
+
+Note that when your converters add quote characters around fields themselves, you need to set the top-level option `disable_auto_quoting: true`.
+
+## More Examples
+
+Check out the [RSpec tests](../spec/smarter_csv/writer_spec.rb) for more examples.
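As one more illustration, the converter ordering described in this document (per-key converters first, then `:_all`) can be sketched in plain Ruby without the gem. This is an assumption-laden sketch rather than the library's implementation: `convert_row` is a hypothetical helper name, and it only mimics the ordering, not the writing or quoting of CSV output.

```ruby
# Hypothetical helper (not the SmarterCSV API): per-key converters run first,
# then the :_all converter, which receives both the key and the value.
def convert_row(row, converters)
  all = converters[:_all]
  row.to_h do |key, value|
    value = converters[key].call(value) if key != :_all && converters.key?(key)
    value = all.call(key, value) if all
    [key, value]
  end
end

converters = {
  active: ->(v) { v ? 'YES' : 'NO' },
  _all: ->(_k, v) { v.is_a?(String) ? "\"#{v}\"" : v } # only quote string values
}

row = { name: 'Alice', age: 42, active: true }
p convert_row(row, converters)
```

Because `:active` is converted to a string before `:_all` runs, it ends up quoted as well, while the integer `:age` passes through untouched.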
+ +---------------- +PREVIOUS: [The Basic Read API](./basic_read_api.md) | NEXT: [Batch Processing](./batch_processing.md) diff --git a/docs/batch_processing.md b/docs/batch_processing.md index 6bd392e..0b511c6 100644 --- a/docs/batch_processing.md +++ b/docs/batch_processing.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [**Batch Processing**](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) @@ -65,4 +66,4 @@ and how the `process` method returns the number of chunks when called with a blo ``` ---------------- -PREVIOUS: [The Basic API](./basic_api.md) | NEXT: [Configuration Options](./options.md) +PREVIOUS: [The Basic Write API](./basic_write_api.md) | NEXT: [Configuration Options](./options.md) diff --git a/docs/data_transformations.md b/docs/data_transformations.md index b099f02..4d3da9e 100644 --- a/docs/data_transformations.md +++ b/docs/data_transformations.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) diff --git a/docs/examples.md b/docs/examples.md index c58d1d9..62a037c 100644 --- a/docs/examples.md +++ b/docs/examples.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) diff --git a/docs/header_transformations.md b/docs/header_transformations.md index 8e99af6..3595860 100644 --- 
a/docs/header_transformations.md +++ b/docs/header_transformations.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) diff --git a/docs/header_validations.md b/docs/header_validations.md index a8a4956..3365076 100644 --- a/docs/header_validations.md +++ b/docs/header_validations.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) diff --git a/docs/options.md b/docs/options.md index 497e363..6e6774f 100644 --- a/docs/options.md +++ b/docs/options.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [**Configuration Options**](./options.md) * [Row and Column Separators](./row_col_sep.md) @@ -20,14 +21,18 @@ | Option | Default | Explanation | --------------------------------------------------------------------------------------------------------------------------------- | :row_sep | $/ | Separates rows; Defaults to your OS row separator. `/n` on UNIX, `/r/n` oon Windows | - | :col_sep | "," | Separates each value in a row | - | :quote_char | '"' | | + | :col_sep | "," | Separates each value in a row | + | :quote_char | '"' | To quote CSV fields. 
| | :force_quotes | false | Forces each individual value to be quoted | - | :discover_headers | true | Automatically detects all keys in the input before writing the header | - | | | This can be disabled by providing `headers` or `map_headers` options. | | :headers | [] | You can provide the specific list of keys from the input you'd like to be used as headers in the CSV file | + | | | ⚠️ This disables automatic header detection! | | :map_headers | {} | Similar to `headers`, but also maps each desired key to a user-specified value that is uesd as the header. | - | + | | | ⚠️ This disables automatic header detection! | + | :discover_headers | true | Automatically detects all keys in the input before writing the header | + | | | Do not manually set this to `false`. ⚠️ | + | | | But you can set this to `true` when using `map_headers` option. | + | :disable_auto_quoting | false | To manually disable auto-quoting of special characters. ⚠️ Be careful with this! | + ## CSV Reading diff --git a/docs/row_col_sep.md b/docs/row_col_sep.md index b25a006..9dede76 100644 --- a/docs/row_col_sep.md +++ b/docs/row_col_sep.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [**Row and Column Separators**](./row_col_sep.md) diff --git a/docs/value_converters.md b/docs/value_converters.md index b69b8dc..ba7d9aa 100644 --- a/docs/value_converters.md +++ b/docs/value_converters.md @@ -2,7 +2,8 @@ ### Contents * [Introduction](./_introduction.md) - * [The Basic API](./basic_api.md) + * [The Basic Read API](./basic_read_api.md) + * [The Basic Write API](./basic_write_api.md) * [Batch Processing](././batch_processing.md) * [Configuration Options](./options.md) * [Row and Column Separators](./row_col_sep.md) @@ -13,7 +14,7 @@ -------------- -# Using Value 
Converters +# Using Value Converters for Reading CSV Value Converters allow you to do custom transformations specific rows, to help you massage the data so it fits the expectations of your down-stream process, such as creating a DB record. diff --git a/lib/smarter_csv/parser.rb b/lib/smarter_csv/parser.rb index abe68c9..836027a 100644 --- a/lib/smarter_csv/parser.rb +++ b/lib/smarter_csv/parser.rb @@ -88,7 +88,9 @@ def parse_csv_line_ruby(line, options, header_size = nil) # Check for unclosed quotes at the end of the line if in_quotes + # :nocov: raise MalformedCSV, "Unclosed quoted field detected in line: #{line}" + # :nocov: end # Process the remaining field diff --git a/lib/smarter_csv/reader.rb b/lib/smarter_csv/reader.rb index c7f76b5..eb0c924 100644 --- a/lib/smarter_csv/reader.rb +++ b/lib/smarter_csv/reader.rb @@ -112,7 +112,9 @@ def process(&block) # rubocop:disable Lint/UnusedMethodArgument raise MalformedCSV, "Unclosed quoted field detected in multiline data" else # Quotes are balanced; proceed without raising an error. + # :nocov: break + # :nocov: end end end diff --git a/lib/smarter_csv/version.rb b/lib/smarter_csv/version.rb index 0a7138f..f65d8fa 100644 --- a/lib/smarter_csv/version.rb +++ b/lib/smarter_csv/version.rb @@ -1,5 +1,5 @@ # frozen_string_literal: true module SmarterCSV - VERSION = "1.13.1" + VERSION = "1.14.0" end diff --git a/lib/smarter_csv/writer.rb b/lib/smarter_csv/writer.rb index ed5b474..afa13ac 100644 --- a/lib/smarter_csv/writer.rb +++ b/lib/smarter_csv/writer.rb @@ -1,15 +1,17 @@ # frozen_string_literal: true +require 'tempfile' + module SmarterCSV # # Generate CSV files # # Create an instance of the Writer class with the filename and options. - # call `<<` one or mulltiple times to append data to the file. + # call `<<` one or multiple times to append data to the file. # call `finalize` to save the file. 
# # The `<<` method can take different arguments: - # * a signle Hash + # * a single Hash # * an array of Hashes # * nested arrays of arrays of Hashes # @@ -29,6 +31,7 @@ module SmarterCSV # headers : defaults to [] # force_quotes: defaults to false # map_headers: defaults to {}, can be a hash of key -> value mappings + # value_converters: optional hash of key -> lambda to control serialization # IMPORTANT NOTES: # * Data hashes could contain strings or symbols as keys. @@ -41,30 +44,32 @@ class Writer def initialize(file_path, options = {}) @options = options - @row_sep = options[:row_sep] || $/ # Defaults to system's row separator. RFC4180 "\r\n" + @row_sep = options[:row_sep] || $/ @col_sep = options[:col_sep] || ',' @quote_char = options[:quote_char] || '"' @force_quotes = options[:force_quotes] == true - @discover_headers = true # defaults to true + @disable_auto_quoting = options[:disable_auto_quoting] == true + @value_converters = options[:value_converters] || {} + @map_all_keys = @value_converters.has_key?(:_all) + @mapped_keys = @value_converters.keys - [:_all] + + @discover_headers = true if options.has_key?(:discover_headers) - # passing in the option overrides the default behavior - @discover_headers = options[:discover_headers] == true + @discover_headers = options[:discover_headers] == true # ⚠️ this option should not be exposed else - # disable discover_headers when headers are given explicitly @discover_headers = !(options.has_key?(:map_headers) || options.has_key?(:headers)) end - @headers = [] # start with empty headers - @headers = options[:headers] if options.has_key?(:headers) # unless explicitly given + + @headers = [] + @headers = options[:headers] if options.has_key?(:headers) @headers = options[:map_headers].keys if options.has_key?(:map_headers) && !options.has_key?(:headers) @map_headers = options[:map_headers] || {} @output_file = File.open(file_path, 'w+') - # hidden state: @temp_file = Tempfile.new('tempfile', '/tmp') @quote_regex = 
Regexp.union(@col_sep, @row_sep, @quote_char) end - # this can be called many times in order to append lines to the csv file def <<(data) case data when Hash @@ -74,14 +79,15 @@ def <<(data) when NilClass # ignore else + # :nocov: raise InvalidInputData, "Invalid data type: #{data.class}. Must be a Hash or an Array." + # :nocov: end end def finalize - # Map headers if :map_headers option is provided mapped_headers = @headers.map { |header| @map_headers[header] || header } - mapped_headers = mapped_headers.map{|x| escape_csv_field(x)} if @force_quotes + mapped_headers = mapped_headers.map { |x| escape_csv_field(x) } if @force_quotes @temp_file.rewind @output_file.write(mapped_headers.join(@col_sep) + @row_sep) @@ -100,17 +106,43 @@ def process_hash(hash) @headers.concat(new_keys) end - # Reorder the hash to match the current headers order and fill missing fields - ordered_row = @headers.map { |header| hash[header] || '' } + # Reorder the hash to match the current headers order and fill + map missing keys + ordered_row = @headers.map do |header| + has_header = hash.key?(header) + value = has_header ? 
hash[header] : '' # default to empty value + + # first map individual keys + value = map_value(header, value) if @mapped_keys.include?(header) + + # then apply general mapping rules + value = map_all_values(header, value) if @map_all_keys + + escape_csv_field(value) # for backwards compatibility + end - @temp_file.write ordered_row.map { |value| escape_csv_field(value) }.join(@col_sep) + @row_sep + @temp_file.write ordered_row.join(@col_sep) + @row_sep + end + + def map_value(key, value) + @value_converters[key].call(value) + end + + def map_all_values(key, value) + @value_converters[:_all].call(key, value) end def escape_csv_field(field) - if @force_quotes || field.to_s.match(@quote_regex) - "\"#{field}\"" + str = field.to_s + return str if @disable_auto_quoting + + # double-quote fields if we force that, or if the field contains the comma, new-line, or quote character + contains_special_char = str.to_s.match(@quote_regex) + if @force_quotes || contains_special_char + str = str.gsub(@quote_char, @quote_char * 2) if contains_special_char # escape double-quote + + "\"#{str}\"" else - field.to_s + str end end end diff --git a/spec/smarter_csv/writer_spec.rb b/spec/smarter_csv/writer_spec.rb index 57b8b8a..24825fc 100644 --- a/spec/smarter_csv/writer_spec.rb +++ b/spec/smarter_csv/writer_spec.rb @@ -123,9 +123,9 @@ end context "when map_headers is given explicitly" do - let(:options) { {map_headers: {name: "Person", country: "Country"}} } + let(:options) { {map_headers: { name: "Person", country: "Country"} } } - it 'writes the given headers and data correctly' do + it 'writes the given headers and data correctly and does not auto-discover headers' do create_csv_file output = File.read(file_path) @@ -137,6 +137,27 @@ expect(output).to include("Alex,USA#{row_sep}") end end + + context "when map_headers is given explicitly" do + let(:options) do + { + map_headers: { name: "Person", country: "Country" }, + discover_headers: true # still auto-discover other headers + } + 
end + + it 'writes the given headers and data correctly and auto-discovers all headers' do + create_csv_file + + output = File.read(file_path) + + expect(output).to include("Person,Country,age,city,state#{row_sep}") + expect(output).to include("John,,30,New York#{row_sep}") + expect(output).to include("Jane,USA,25,#{row_sep}") + expect(output).to include("Mike,,35,Chicago,IL#{row_sep}") + expect(output).to include("Alex,USA,,,#{row_sep}") + end + end end context 'when headers are given explicitly' do @@ -194,169 +215,177 @@ end end - context 'Initialization with Default Options' do - it 'initializes with default options' do - writer = SmarterCSV::Writer.new(file_path) - expect(writer.instance_variable_get(:@discover_headers)).to be true - expect(writer.instance_variable_get(:@headers)).to eq([]) - expect(writer.instance_variable_get(:@col_sep)).to eq(',') - end - end + context 'when automatic header discovery is disabled' do + context 'when we give explicit list of headers' do + let(:options) do + { + headers: [:name, :city, :state] # giving an explicit headers list will disable header discovery + } + end - context 'Initialization with Custom Options' do - it 'initializes with custom options' do - options = { discover_headers: false, headers: ['a', 'b'], col_sep: ';', force_quotes: true, map_headers: { 'a' => 'A' } } - writer = SmarterCSV::Writer.new(file_path, options) - expect(writer.instance_variable_get(:@discover_headers)).to be false - expect(writer.instance_variable_get(:@headers)).to eq(['a', 'b']) - expect(writer.instance_variable_get(:@col_sep)).to eq(';') - expect(writer.instance_variable_get(:@force_quotes)).to be true - expect(writer.instance_variable_get(:@map_headers)).to eq({ 'a' => 'A' }) - end - end + it 'limits the CSV file to only the given headers' do + create_csv_file - context 'Appending Data' do - it 'appends multiple hashes over multiple calls' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{ a: 1, b: 2 }, {c: 3}] - writer << 
[{ d: 4, a: 5 }] - writer.finalize - output = File.read(file_path) + output = File.read(file_path) - expect(output).to include("a,b,c,d#{row_sep}") - expect(output).to include("1,2#{row_sep}") - expect(output).to include(",,3#{row_sep}") - expect(output).to include("5,,,4#{row_sep}") + expect(output).to include("name,city,state#{row_sep}") + expect(output).to include("John,New York,#{row_sep}") + expect(output).to include("Jane,,#{row_sep}") + expect(output).to include("Mike,Chicago,IL#{row_sep}") + expect(output).to include("Alex,,#{row_sep}") + end end - it 'appends with missing fields' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{ a: 1, b: 2 }, { a: 3 }] - writer.finalize + context 'when we explicitly disable header discovery' do + let(:options) do + { discover_headers: false } # THIS SHOULD NOT BE USED LIKE THIS!! + end - expect(File.read(file_path)).to eq("a,b#{row_sep}1,2#{row_sep}3,#{row_sep}") + it 'limits the CSV file to only the given headers' do + create_csv_file + + output = File.read(file_path) + expect(output).to eq "\n\n\n\n\n" # THIS SHOULD NOT BE USED LIKE THIS!! 
+ end end end - context 'Finalizing the Output File' do - it 'maps headers' do - options = { map_headers: { a: 'A', b: 'B' } } - writer = SmarterCSV::Writer.new(file_path, options) - writer << [{ a: 1, b: 2 }] - writer.finalize + context 'when quoted CSV fields' do + describe 'when quote_char' do + let(:options) { {} } + let(:data_batches) do + [ + { name: 'John', age: 30, city: 'New "York' }, + ] + end + + it 'auto-escapes quote_char' do + create_csv_file - expect(File.read(file_path)).to eq("A,B#{row_sep}1,2#{row_sep}") + output = File.read(file_path) + expect(output).to include("name,age,city#{row_sep}") + expect(output).to include('John,30,"New ""York"') + end end - it 'writes header and appends content to output file' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{ a: 1, b: 2 }] - writer.finalize - expect(File.read(file_path)).to eq("a,b#{row_sep}1,2#{row_sep}") - end + describe 'when special_char row_sep' do + let(:options) { {} } + let(:data_batches) do + [ + { name: 'John', age: 30, city: "New \nYork" }, + ] + end - it 'properly closes the output file' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{ a: 1, b: 2 }] - writer.finalize + it 'auto-escapes row_sep' do + create_csv_file - expect(File).to be_exist(file_path) + output = File.read(file_path) + expect(output).to include("name,age,city#{row_sep}") + expect(output).to match(/John,30,"New \nYork"/) + end end - end - context 'CSV Field Escaping' do - it 'does not quote fields without commas unless force_quotes is enabled' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{ a: 'hello', b: 'world' }] - writer.finalize + describe 'when comma' do + let(:options) { {} } + let(:data_batches) do + [ + { name: 'John', age: 30, city: "New York, New York" }, + ] + end - expect(File.read(file_path)).to eq("a,b#{row_sep}hello,world#{row_sep}") - end + it 'auto-escapes comma' do + create_csv_file - it 'quotes fields with column separator' do - writer = 
SmarterCSV::Writer.new(file_path) - writer << [{ a: 'hello, world', b: 'test' }] - writer.finalize + output = File.read(file_path) + expect(output).to include("name,age,city#{row_sep}") + expect(output).to match(/John,30,"New York, New York"/) + end + end + end - expect(File.read(file_path)).to eq("a,b#{row_sep}\"hello, world\",test#{row_sep}") + context 'Value Converters' do + let(:options) do + { + value_converters: { + active: ->(v) { v ? 'YES' : 'NO' }, + } + } end - it 'quotes all fields when force_quotes is enabled' do - options = { force_quotes: true } + it 'applies value converters to matching keys' do writer = SmarterCSV::Writer.new(file_path, options) - writer << [{ a: 'hello', b: 'world' }] + writer << { name: 'Alice', age: 42, active: true, balance: 234.235 } writer.finalize - expect(File.read(file_path)).to eq("\"a\",\"b\"#{row_sep}\"hello\",\"world\"#{row_sep}") + output = File.read(file_path) + expect(output).to include("name,age,active,balance#{row_sep}") + expect(output).to include("Alice,42,YES,234.235#{row_sep}") end - context 'force_quotes also applies to headers' do - let(:options) { {force_quotes: true} } - let(:data) do - { name: 'John', age: 30, city: 'New York' } + describe 'when doing advanced mapping' do + let(:options) do + { + disable_auto_quoting: true, # ⚠️ Important: turn off auto-quoting because we're messing with it below + value_converters: { + active: ->(v) { v ? '✅' : '❌' }, + balance: ->(v) do + case v + when Float + '$%.2f' % v.round(2) + when Integer + "$#{v}" + else + v.to_s + end + end, + _all: ->(k, v) { v.is_a?(String) ? 
"\"#{v}\"" : v } # only double-quote string fields + } + } end - - it 'writes the given headers and data correctly' do + it 'applies all mappings in the correct order' do writer = SmarterCSV::Writer.new(file_path, options) - writer << data + writer << { name: 'Alice', age: 42, active: true, balance: 234.235 } + writer << { name: 'Joe', age: 53, active: false, balance: 32100 } writer.finalize - output = File.read(file_path) - expect(output).to include("\"name\",\"age\",\"city\"#{row_sep}") - expect(output).to include("\"John\",\"30\",\"New York\"#{row_sep}") + output = File.read(file_path) + expect(output).to include("name,age,active,balance#{row_sep}") + expect(output).to include("\"Alice\",42,\"✅\",\"$234.24\"#{row_sep}") + expect(output).to include("\"Joe\",53,\"❌\",\"$32100\"#{row_sep}") end end - end - context 'Edge Cases' do - it 'handles empty hash' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{}] - writer.finalize - - expect(File.read(file_path)).to eq("#{row_sep}#{row_sep}") - end - - it 'handles empty array' do - writer = SmarterCSV::Writer.new(file_path) - writer << [] - writer.finalize - - expect(File.read(file_path)).to eq("#{row_sep}") - end + it 'uses default serialization for fields without a converter' do + partial_options = { + headers: [:name, :age, :active], + value_converters: { + age: ->(v) { v.to_s } + } + } - it 'handles special characters in data' do - writer = SmarterCSV::Writer.new(file_path) - writer << [{ a: "hello#{row_sep}world", b: 'quote"test' }] + writer = SmarterCSV::Writer.new(file_path, partial_options) + writer << { name: 'Bob', age: 50, active: false } writer.finalize - expect(File.read(file_path)).to eq("a,b#{row_sep}\"hello#{row_sep}world\",\"quote\"test\"#{row_sep}") - end - end - - context 'Error Handling' do - it 'raises an error for invalid input data' do - expect do - writer = SmarterCSV::Writer.new(file_path) - writer << "this is invalid" - end.to raise_error SmarterCSV::InvalidInputData + output = 
File.read(file_path) + expect(output).to include("Bob,50,false#{row_sep}") end - it 'handles file access issues' do - allow(File).to receive(:open).and_raise(Errno::EACCES) - - expect do - SmarterCSV::Writer.new(file_path) - end.to raise_error(Errno::EACCES) - end + it 'handles rows where only some fields use converters' do + partial_options = { + headers: [:name, :age, :active], + value_converters: { + active: ->(v) { v ? 'True' : 'False' } + } + } - it 'handles tempfile issues' do - allow(Tempfile).to receive(:new).and_raise(Errno::ENOENT) + writer = SmarterCSV::Writer.new(file_path, partial_options) + writer << { name: 'Charlie', age: 29, active: true } + writer.finalize - expect do - SmarterCSV::Writer.new(file_path) - end.to raise_error(Errno::ENOENT) + output = File.read(file_path) + expect(output).to include("Charlie,29,True#{row_sep}") end end end