Commit 476c110

Add new features and refactor code for v0.1.0 release
New features:
- Add parallel execution and support for multiple file processing
- Enable input and output from/to directories
- Allow custom join separators when merging multiple values in columns
- Add option to disable value quoting in output files
- Add option to suppress headers in output files
- Improve interface clarity and usability

Refactoring:
- Refactor Convertor class to store state in instance variables, replacing static methods for state passing
- Change static functions in Convertor to class methods
- Convert recursive logic in processRoot and processItem to iteration
- Implement static factory methods for instantiating Convertor
- Migrate command line options processing and usage text to PicoCLI
1 parent 0c190d5 commit 476c110
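
The commit message mentions converting the recursive logic in processRoot and processItem to iteration. The actual Convertor code is not shown on this page, so the following is only a minimal sketch of the general technique: a depth-first tree walk rewritten with an explicit stack. The `Node` class and both `visit*` methods are hypothetical names used for illustration and are not taken from the project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class IterativeTraversalSketch {

    // Hypothetical stand-in for an XML element; not the project's data model.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Recursive version: depth is bounded by the call stack.
    static List<String> visitRecursive(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        out.add(node.name);
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    // Iterative version: same document order, using an explicit stack.
    static List<String> visitIterative(Node root) {
        List<String> out = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            out.add(node.name);
            // Push children in reverse so the first child is popped first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node("root")
                .add(new Node("item").add(new Node("id")).add(new Node("name")))
                .add(new Node("item"));
        System.out.println(visitIterative(root)); // prints [root, item, id, name, item]
    }
}
```

Both methods return the same visiting order; the iterative one trades call-stack depth for a heap-allocated deque, which is the usual motivation for this kind of refactoring on deeply nested input.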

8 files changed: +643 -435 lines changed

LICENSE.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# MIT License
+
+Copyright (c) 2025 peter277 ([https://github.com/peter277](https://github.com/peter277))
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+# Acknowledgments
+
+This project was initially based on [xml2csv](https://github.com/fordfrog/xml2csv),
+created by Miroslav Šulc and released under the MIT License.
+
+Significant modifications and extensions have been made in this version.

README.md

Lines changed: 89 additions & 58 deletions
@@ -1,6 +1,6 @@
 # xml2table

-Simple XML to flat file (e.g. CSV, TSV) conversion utility.
+Simple XML to flat file (e.g. CSV, TSV) conversion utility, modified and extended from [xml2csv](https://github.com/fordfrog/xml2csv) project.

 ## What it does exactly?

@@ -45,63 +45,94 @@ located.

 Here is the usage information that xml2table outputs if run without parameters:

-Usage: xml2table --columns <columns> --input <file> --output <file> --item-name <xpath>
-
-General command line switches:
-
---columns <columns>
-List of columns that should be output to the CSV file. These names must
-correspond to the element names within the item element.
---input <file>
-Path to the input XML file.
---item-name
-XPath which refers to XML element which will be converted to a row. It cannot
-end with slash (/).
---join
-Join values of multiple elements into single value using (, ) as a separator.
-By default value of the first element is saved to CSV.
---output <file>
-Path to the output CSV file. Output file content is always in UTF-8 encoding.
---separator <character>
-Character that should be used to separate fields. Default value is (;).
---trim
-Trim values. By default values are not trimmed.
-
-Filtering rows:
-
---filter-column <name>
-Column on which the filter should be applied. When specifying filter command
-line switches, you must use this switch as the first one as it initializes
-new filter. You can specify more filters, each one beginning with this
-switch. You can filter the rows even on columns that are not part of the
-output. Filtering is performed before remapping.
-..filter.values <file>
-Path to file containing values that the filter should use. Empty rows are
-added to the values too.
---filter-exclude
-Excludes all rows where the column value matches one of the specified values.
---filter-include
-Includes all rows where the column value matches one of the specified values.
-This is the default behavior if --filter-exclude|--filter-include is not
-specified.
-
-Remapping (replacing) values:
-
---remap-column <name>
-Column in which original values should be replaced with values from map
-file. When specifying remapping command line switches, you must use this
-switch as the first one as it initializes new remapping. You can specify
-more remappings, each one beginning with this switch. Remapping is performed
-after filtering.
---remap-map <file>
-Path to file containing original value and new value pairs. The file uses
-CSV format. Values can be escaped either using single-quote (') or
-double-quote ("). Quotes within values can be escaped either doubling them
-("" and '') or backslash-escaping them (\" and \').
-
-Characters encoding:
-
-Application expects all files being in UTF-8 encoding.
+Usage: xml2table [-hV] ([--parallel[=<threads>]] (--input-file=<file>...
+[--input-file=<file>...]... | --input-dir=<dir>)
+(--output-file=<file> | --output-dir=<dir>))
+(--row-item-name=<XPath> --columns=<child XPath>[,<child
+XPath>...] [--columns=<child XPath>[,<child XPath>...]]...
+[--separator=<string>] [--no-quote] [--no-header] [--trim]
+[--join-values] [--join-separator=<string>])
+[--filter-column=<name> --filter-values=<file>
+[--filter-exclude]]... [--remap-column=<name>
+--remap-map=<file>]...
+
+Convert XML to flat files. The application reads and writes files using UTF-8
+encoding.
+
+-h, --help Show this help message and exit.
+-V, --version Print version information and exit.
+
+File processing options:
+
+--parallel[=<threads>] Enable parallel execution.
+Optionally specify number of threads to run in
+parallel (max: lesser of available processors
+and number of input files). If used without a
+value, uses max threads. If omitted, runs
+single-threaded.
+--input-file=<file>... Path to the input XML file(s).
+--input-dir=<dir> Path to input directory containing XML files.
+Mutually exclusive with --input-file option.
+--output-file=<file> Path to the output file.
+--output-dir=<dir> Path to output directory. Will be created if it
+does not exist. Output file name will be the
+same as input file name with the extension
+replaced by .txt. Mutually exclusive with
+--output-file option.
+
+General options:
+
+--row-item-name=<XPath>
+Parent XPath referring to the XML element that
+will be traversed using child XPath
+specifications from --columns and converted into
+a row. It cannot end with a slash (/).
+--columns=<child XPath>[,<child XPath>...]
+List of columns that should be output to the flat
+file. Columns are specified as child XPath
+expressions relative to --row-item-name.
+--separator=<string> String that should be used to separate output
+columns.
+Default: ,
+--no-quote Do not quote values in flat file output. By
+default all values are quoted.
+--no-header Do not output header line with column names. By
+default header line is output.
+--trim Trim leading and trailing whitespace from output
+values. By default values are not trimmed.
+--join-values Join multiple values matched by a child XPath for
+a column into a single string using a separator
+(default: ||). By default, the first matched
+value for the column is selected and stored.
+--join-separator=<string>
+Separator used to join multiple values matched by
+a child XPath for a column when the
+--join-values option is enabled.
+Default: ||
+
+Filtering options:
+
+--filter-column=<name> Name of the column to filter on. You can specify
+multiple filters by using this option group
+multiple times.
+--filter-values=<file> Path to file containing values that the filter
+should use. Empty rows are added to the values
+too.
+--filter-exclude Invert filter to exclude matching rows instead of
+the default of including them.
+
+Remapping (value replacement) options:
+
+--remap-column=<name> Name of the column to remap. You can specify
+multiple remap rules by using this option group
+multiple times. Remapping is done after
+filtering.
+--remap-map=<file> Path to file containing original value and new
+value pairs. The file uses CSV format. Values
+can be escaped either using single-quote (') or
+double-quote ("). Quotes within values can be
+escaped by either doubling them ("" and '') or
+backslash-escaping them (\" and \').

 ## License and Acknowledgements

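
The new usage text in the README diff describes two rules that are easy to misread: how `--parallel` picks a thread count, and how `--output-dir` names output files. The sketch below restates both rules in Java as a reading aid. It is not the project's code: the class and method names are hypothetical, and clamping an over-large requested thread count to the maximum is an assumption, since the usage text only states what the maximum is.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class UsageRulesSketch {

    // Hypothetical helper. 'requested' is null when --parallel is given without a value.
    static int effectiveThreads(boolean parallel, Integer requested, int processors, int inputFiles) {
        if (!parallel) {
            return 1; // option omitted: single-threaded
        }
        // Documented maximum: lesser of available processors and number of input files.
        int max = Math.max(1, Math.min(processors, inputFiles));
        if (requested == null) {
            return max; // --parallel without a value: use the maximum
        }
        // Assumption: out-of-range requests are clamped rather than rejected.
        return Math.max(1, Math.min(requested, max));
    }

    // Hypothetical helper: same file name as the input, extension replaced by .txt.
    static Path outputPathFor(Path inputFile, Path outputDir) {
        String name = inputFile.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outputDir.resolve(base + ".txt");
    }

    public static void main(String[] args) {
        // 8 processors, 3 input files, --parallel without a value
        System.out.println(effectiveThreads(true, null, 8, 3)); // prints 3
        System.out.println(outputPathFor(Paths.get("in", "orders.xml"), Paths.get("out")));
    }
}
```

With three input files the thread count never exceeds three, however many processors are available, because each file is the unit of parallel work in this reading of the usage text.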