|
1 | 1 | # xml2table
|
2 | 2 |
|
3 |
| -Simple XML to flat file (e.g. CSV, TSV) conversion utility. |
| 3 | +Simple XML to flat file (e.g. CSV, TSV) conversion utility, modified and extended from the [xml2csv](https://github.com/fordfrog/xml2csv) project. |
4 | 4 |
|
5 | 5 | ## What does it do exactly?
|
6 | 6 |
|
@@ -45,63 +45,94 @@ located.
|
45 | 45 |
|
46 | 46 | Here is the usage information that xml2table outputs if run without parameters:
|
47 | 47 |
|
48 |
| - Usage: xml2table --columns <columns> --input <file> --output <file> --item-name <xpath> |
49 |
| - |
50 |
| - General command line switches: |
51 |
| - |
52 |
| - --columns <columns> |
53 |
| - List of columns that should be output to the CSV file. These names must |
54 |
| - correspond to the element names within the item element. |
55 |
| - --input <file> |
56 |
| - Path to the input XML file. |
57 |
| - --item-name |
58 |
| - XPath which refers to XML element which will be converted to a row. It cannot |
59 |
| - end with slash (/). |
60 |
| - --join |
61 |
| - Join values of multiple elements into single value using (, ) as a separator. |
62 |
| - By default value of the first element is saved to CSV. |
63 |
| - --output <file> |
64 |
| - Path to the output CSV file. Output file content is always in UTF-8 encoding. |
65 |
| - --separator <character> |
66 |
| - Character that should be used to separate fields. Default value is (;). |
67 |
| - --trim |
68 |
| - Trim values. By default values are not trimmed. |
69 |
| - |
70 |
| - Filtering rows: |
71 |
| - |
72 |
| - --filter-column <name> |
73 |
| - Column on which the filter should be applied. When specifying filter command |
74 |
| - line switches, you must use this switch as the first one as it initializes |
75 |
| - new filter. You can specify more filters, each one beginning with this |
76 |
| - switch. You can filter the rows even on columns that are not part of the |
77 |
| - output. Filtering is performed before remapping. |
78 |
| - --filter-values <file> |
79 |
| - Path to file containing values that the filter should use. Empty rows are |
80 |
| - added to the values too. |
81 |
| - --filter-exclude |
82 |
| - Excludes all rows where the column value matches one of the specified values. |
83 |
| - --filter-include |
84 |
| - Includes all rows where the column value matches one of the specified values. |
85 |
| - This is the default behavior if --filter-exclude|--filter-include is not |
86 |
| - specified. |
87 |
| - |
88 |
| - Remapping (replacing) values: |
89 |
| - |
90 |
| - --remap-column <name> |
91 |
| - Column in which original values should be replaced with values from map |
92 |
| - file. When specifying remapping command line switches, you must use this |
93 |
| - switch as the first one as it initializes new remapping. You can specify |
94 |
| - more remappings, each one beginning with this switch. Remapping is performed |
95 |
| - after filtering. |
96 |
| - --remap-map <file> |
97 |
| - Path to file containing original value and new value pairs. The file uses |
98 |
| - CSV format. Values can be escaped either using single-quote (') or |
99 |
| - double-quote ("). Quotes within values can be escaped either doubling them |
100 |
| - ("" and '') or backslash-escaping them (\" and \'). |
101 |
| - |
102 |
| -Characters encoding: |
103 |
| - |
104 |
| - Application expects all files being in UTF-8 encoding. |
| 48 | + Usage: xml2table [-hV] ([--parallel[=<threads>]] (--input-file=<file>... |
| 49 | + [--input-file=<file>...]... | --input-dir=<dir>) |
| 50 | + (--output-file=<file> | --output-dir=<dir>)) |
| 51 | + (--row-item-name=<XPath> --columns=<child XPath>[,<child |
| 52 | + XPath>...] [--columns=<child XPath>[,<child XPath>...]]... |
| 53 | + [--separator=<string>] [--no-quote] [--no-header] [--trim] |
| 54 | + [--join-values] [--join-separator=<string>]) |
| 55 | + [--filter-column=<name> --filter-values=<file> |
| 56 | + [--filter-exclude]]... [--remap-column=<name> |
| 57 | + --remap-map=<file>]... |
| 58 | + |
| 59 | + Convert XML to flat files. The application reads and writes files using UTF-8 |
| 60 | + encoding. |
| 61 | + |
| 62 | + -h, --help Show this help message and exit. |
| 63 | + -V, --version Print version information and exit. |
| 64 | + |
| 65 | + File processing options: |
| 66 | + |
| 67 | + --parallel[=<threads>] Enable parallel execution. |
| 68 | + Optionally specify number of threads to run in |
| 69 | + parallel (max: lesser of available processors |
| 70 | + and number of input files). If used without a |
| 71 | + value, uses max threads. If omitted, runs |
| 72 | + single-threaded. |
| 73 | + --input-file=<file>... Path to the input XML file(s). |
| 74 | + --input-dir=<dir> Path to input directory containing XML files. |
| 75 | + Mutually exclusive with --input-file option. |
| 76 | + --output-file=<file> Path to the output file. |
| 77 | + --output-dir=<dir> Path to output directory. Will be created if it |
| 78 | + does not exist. Output file name will be the |
| 79 | + same as input file name with the extension |
| 80 | + replaced by .txt. Mutually exclusive with |
| 81 | + --output-file option. |
| 82 | + |
| 83 | + General options: |
| 84 | + |
| 85 | + --row-item-name=<XPath> |
| 86 | + Parent XPath referring to the XML element that |
| 87 | + will be traversed using child XPath |
| 88 | + specifications from --columns and converted into |
| 89 | + a row. It cannot end with a slash (/). |
| 90 | + --columns=<child XPath>[,<child XPath>...] |
| 91 | + List of columns that should be output to the flat |
| 92 | + file. Columns are specified as child XPath |
| 93 | + expressions relative to --row-item-name. |
| 94 | + --separator=<string> String that should be used to separate output |
| 95 | + columns. |
| 96 | + Default: , |
| 97 | + --no-quote Do not quote values in flat file output. By |
| 98 | + default all values are quoted. |
| 99 | + --no-header Do not output header line with column names. By |
| 100 | + default header line is output. |
| 101 | + --trim Trim leading and trailing whitespace from output |
| 102 | + values. By default values are not trimmed. |
| 103 | + --join-values Join multiple values matched by a child XPath for |
| 104 | + a column into a single string using a separator |
| 105 | + (default: ||). By default, the first matched |
| 106 | + value for the column is selected and stored. |
| 107 | + --join-separator=<string> |
| 108 | + Separator used to join multiple values matched by |
| 109 | + a child XPath for a column when the |
| 110 | + --join-values option is enabled. |
| 111 | + Default: || |
| 112 | + |
| 113 | + Filtering options: |
| 114 | + |
| 115 | + --filter-column=<name> Name of the column to filter on. You can specify |
| 116 | + multiple filters by using this option group |
| 117 | + multiple times. |
| 118 | + --filter-values=<file> Path to file containing values that the filter |
| 119 | + should use. Empty rows are added to the values |
| 120 | + too. |
| 121 | + --filter-exclude Invert filter to exclude matching rows instead of |
| 122 | + the default of including them. |
| 123 | + |
| 124 | + Remapping (value replacement) options: |
| 125 | + |
| 126 | + --remap-column=<name> Name of the column to remap. You can specify |
| 127 | + multiple remap rules by using this option group |
| 128 | + multiple times. Remapping is done after |
| 129 | + filtering. |
| 130 | + --remap-map=<file> Path to file containing original value and new |
| 131 | + value pairs. The file uses CSV format. Values |
| 132 | + can be escaped either using single-quote (') or |
| 133 | + double-quote ("). Quotes within values can be |
| 134 | + escaped by either doubling them ("" and '') or |
| 135 | + backslash-escaping them (\" and \'). |
105 | 136 |
|
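To make the new option syntax concrete, here is a minimal sketch of one invocation that uses only switches listed in the usage text above. The sample data, the `xml2table-demo` directory, and the element names (`catalog`, `book`, `title`, `author`) are hypothetical. The sketch also assumes a launcher named `xml2table` is available on `PATH`; if it is not, the command is printed but not run.

```shell
#!/bin/sh
# Hypothetical sample data: directory, file, and element names are illustrative only.
mkdir -p xml2table-demo
cat > xml2table-demo/books.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book><title>First</title><author>A</author><author>B</author></book>
  <book><title>Second</title><author>C</author></book>
</catalog>
EOF

# Build the command from options documented in the usage text:
# one row per /catalog/book element, two columns given as child XPaths,
# and multiple <author> matches joined with ';' instead of the default '||'.
set -- xml2table \
  --input-file=xml2table-demo/books.xml \
  --output-file=xml2table-demo/books.csv \
  --row-item-name=/catalog/book \
  --columns=title,author \
  --join-values \
  --join-separator=';' \
  --trim

# Show the command that would be executed.
printf '%s ' "$@"
echo

# Assumption: an "xml2table" launcher is on PATH. Skip the run otherwise.
if command -v xml2table >/dev/null 2>&1; then
  "$@" && cat xml2table-demo/books.csv
else
  echo "xml2table not found on PATH; the command above was not run" >&2
fi
```

Without `--join-values`, the help text says only the first matched value per column is kept, so the first book would presumably carry a single author. Adding `--filter-column`/`--filter-values` or `--remap-column`/`--remap-map` pairs to the same command would apply filtering first and remapping second, as described in the remapping options.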
106 | 137 | ## License and Acknowledgements
|
107 | 138 |
|
|