
Commit 4a70610

jsxs0 and kou authored
Fix: Allow \r in unquoted fields when row separator doesn't contain \r (#346)
Fixes #60

This has been bugging me for a while - the CSV parser was rejecting `\r` characters in unquoted fields even when the row separator was something completely different like `\n` or a custom separator.

For example, this would fail unnecessarily:

```ruby
CSV.parse("field1,field\rwith\rcr,field3\n", row_sep: "\n")
```

The problem was in `prepare_unquoted` where we were hardcoding `"\r\n"` instead of checking what the actual row separator was.

**What changed:**

- Now we only exclude characters that are actually part of the row separator
- If your row separator is `\n`, then `\r` is allowed in unquoted fields
- If your row separator is `\r\n`, then both `\r` and `\n` are still properly excluded
- Quoted fields work exactly the same as before

**Testing:**

- Updated the tests that were expecting the old behavior
- Added comprehensive tests for different row separator scenarios
- All existing tests still pass

This makes the parser more flexible while keeping it safe for the cases where `\r` should actually be restricted.

---------

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
1 parent 0bcb71f commit 4a70610
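To see how the installed csv version treats the input from the commit message, something like the following could be used. This is only a sketch: `parse_or_error` is a hypothetical helper, not part of the csv library, and the result for the unquoted input depends on whether the installed csv includes this commit. The quoted input should parse the same way either way, since the commit message states that quoted fields are unaffected.

```ruby
require "csv"

# Hypothetical helper (not part of csv): returns the parsed rows, or the
# error message if the installed csv version rejects the input.
def parse_or_error(data, **options)
  CSV.parse(data, **options)
rescue CSV::MalformedCSVError => error
  error.message
end

unquoted = "field1,field\rwith\rcr,field3\n"
quoted   = "field1,\"field\rwith\rcr\",field3\n"

# Version-dependent: rows with this commit applied, an error message without it.
p parse_or_error(unquoted, row_sep: "\n")

# Quoted fields may contain "\r" regardless of this change.
p parse_or_error(quoted, row_sep: "\n")
```

Quoting the field is therefore a workaround for csv versions that do not yet include this change.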

4 files changed: +53 -25 lines changed

lib/csv/parser.rb

Lines changed: 1 addition & 1 deletion
@@ -675,7 +675,7 @@ def prepare_quoted
     def prepare_unquoted
       return if @quote_character.nil?
 
-      no_unquoted_values = "\r\n".encode(@encoding)
+      no_unquoted_values = Regexp.escape(@row_separator).encode(@encoding)
       no_unquoted_values << @escaped_first_column_separator
       unless @liberal_parsing
         no_unquoted_values << @escaped_quote_character
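
The effect of deriving the excluded characters from the row separator can be sketched in isolation. This is not the parser's actual code: `unquoted_value_pattern` and its keyword arguments are hypothetical, and the sketch assumes the escaped string ends up inside a negated character class, so that each character of a multi-character separator is excluded individually (as the commit message describes for `\r\n`).

```ruby
# Hypothetical stand-in for the idea behind prepare_unquoted: build a pattern
# that matches a run of characters allowed in an unquoted field.
def unquoted_value_pattern(row_separator, column_separator: ",", quote_character: '"')
  # Escape each piece so it is safe inside a regexp character class.
  excluded = [row_separator, column_separator[0], quote_character]
             .map { |piece| Regexp.escape(piece) }
             .join
  Regexp.new("[^#{excluded}]+")
end

lf_pattern   = unquoted_value_pattern("\n")
crlf_pattern = unquoted_value_pattern("\r\n")
pipe_pattern = unquoted_value_pattern("|")

p "field\rwith\rcr,field3"[lf_pattern]   # "\r" is allowed when rows end with "\n"
p "field\r2,field3"[crlf_pattern]        # the match stops at "\r" when rows end with "\r\n"
p "a\rb|c"[pipe_pattern]                 # "\r" is allowed with a custom "|" separator
```

With the previous hardcoded `"\r\n"`, all three patterns would have stopped at the first `\r`.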

test/csv/parse/test_general.rb

Lines changed: 43 additions & 14 deletions
@@ -138,28 +138,57 @@ def test_non_regex_edge_cases
     end
   end
 
-  def test_malformed_csv_cr_first_line
+  def test_unquoted_cr_with_lf_row_separator
+    data = "field1,field\rwith\rcr,field3\nrow2,data,here\n"
+    expected = [
+      ["field1", "field\rwith\rcr", "field3"],
+      ["row2", "data", "here"]
+    ]
+    assert_equal(expected, CSV.parse(data, row_sep: "\n"))
+  end
+
+  def test_unquoted_cr_with_custom_row_separator
+    data = "field1,field\rwith\rcr,field3|row2,data,here|"
+    expected = [
+      ["field1", "field\rwith\rcr", "field3"],
+      ["row2", "data", "here"]
+    ]
+    assert_equal(expected, CSV.parse(data, row_sep: "|"))
+  end
+
+  def test_unquoted_cr_with_crlf_row_separator
+    data = "field1,field\r2,field3\r\nrow2,data,here\r\n"
     error = assert_raise(CSV::MalformedCSVError) do
-      CSV.parse_line("1,2\r,3", row_sep: "\n")
+      CSV.parse(data, row_sep: "\r\n")
     end
     assert_equal("Unquoted fields do not allow new line <\"\\r\"> in line 1.",
                  error.message)
   end
 
-  def test_malformed_csv_cr_middle_line
-    csv = <<-CSV
-line,1,abc
-line,2,"def\nghi"
+  def test_quoted_cr_with_custom_row_separator
+    data = "field1,\"field\rwith\rcr\",field3|row2,data,here|"
+    expected = [
+      ["field1", "field\rwith\rcr", "field3"],
+      ["row2", "data", "here"]
+    ]
+    assert_equal(expected, CSV.parse(data, row_sep: "|"))
+  end
 
-line,4,some\rjunk
-line,5,jkl
-    CSV
+  def test_unquoted_cr_in_middle_line
+    csv = "line,1,abc\nline,2,\"def\nghi\"\nline,4,some\rjunk\nline,5,jkl\n"
+    result = CSV.parse(csv)
+    expected = [
+      ["line", "1", "abc"],
+      ["line", "2", "def\nghi"],
+      ["line", "4", "some\rjunk"],
+      ["line", "5", "jkl"]
+    ]
+    assert_equal(expected, result)
+  end
 
-    error = assert_raise(CSV::MalformedCSVError) do
-      CSV.parse(csv)
-    end
-    assert_equal("Unquoted fields do not allow new line <\"\\r\"> in line 4.",
-                 error.message)
+  def test_empty_rows_with_cr
+    result = CSV.parse("\n" + "\r")
+    assert_equal([[], ["\r"]], result)
   end
 
   def test_malformed_csv_unclosed_quote

test/csv/parse/test_invalid.rb

Lines changed: 0 additions & 10 deletions
@@ -1,18 +1,8 @@
-# -*- coding: utf-8 -*-
 # frozen_string_literal: false
 
 require_relative "../helper"
 
 class TestCSVParseInvalid < Test::Unit::TestCase
-  def test_no_column_mixed_new_lines
-    error = assert_raise(CSV::MalformedCSVError) do
-      CSV.parse("\n" +
-                "\r")
-    end
-    assert_equal("New line must be <\"\\n\"> not <\"\\r\"> in line 2.",
-                 error.message)
-  end
-
   def test_ignore_invalid_line
     csv = CSV.new(<<-CSV, headers: true, return_headers: true)
 head1,head2,head3

test/csv/parse/test_liberal_parsing.rb

Lines changed: 9 additions & 0 deletions
@@ -80,6 +80,15 @@ def test_space_quote
                  CSV.parse(input, liberal_parsing: true))
   end
 
+  def test_unquoted_cr_with_custom_row_separator
+    data = "field1,field\rwith\rcr,field3|row2,data,here|"
+    expected = [
+      ["field1", "field\rwith\rcr", "field3"],
+      ["row2", "data", "here"]
+    ]
+    assert_equal(expected, CSV.parse(data, row_sep: "|", liberal_parsing: true))
+  end
+
   def test_double_quote_outside_quote
     data = %Q{a,""b""}
     error = assert_raise(CSV::MalformedCSVError) do
