Releases: AbsaOSS/cobrix
Minor bugfix release
- #481 ASCII control characters are now ignored instead of being replaced with spaces. A new string trimming policy (`keep_all`) allows keeping all control characters in strings (including `0x00`).
- #484 Fix parsing of ASCII files so that only full records are parsed. The old behavior can be restored with `.option("allow_partial_records", "true")`.
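The difference between replacing, ignoring, and keeping control characters (#481) can be illustrated with a small standalone sketch. This is not Cobrix code: the function name `decode_ascii` and the mode names `legacy_spaces` and `ignore` are hypothetical, and "ignored" is read here as "dropped from the output". Only `keep_all` is a name taken from the release note.

```python
def decode_ascii(raw: bytes, mode: str = "ignore") -> str:
    """Illustrative handling of ASCII control characters (code points below 0x20)."""
    text = raw.decode("latin-1")  # one byte maps to one character
    if mode == "keep_all":
        # Keep every character, including 0x00.
        return text
    if mode == "legacy_spaces":
        # Older behavior as described in the note: control characters become spaces.
        return "".join(" " if ord(ch) < 0x20 else ch for ch in text)
    if mode == "ignore":
        # Newer behavior as read here: control characters are dropped.
        return "".join(ch for ch in text if ord(ch) >= 0x20)
    raise ValueError(f"unknown mode: {mode}")


sample = b"AB\x00C\x09D"
print(repr(decode_ascii(sample, "legacy_spaces")))
print(repr(decode_ascii(sample, "ignore")))
print(repr(decode_ascii(sample, "keep_all")))
```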
Minor bugfix release
- #474 Fix numeric decoder of unsigned DISPLAY format. The decoder is now stricter and does not allow sign overpunching for unsigned numbers.
- #477 Fixed NotSerializableException when using non-default logger implementations (Thanks @joaquin021).
Minor feature release
- Improved the schema flattening method `SparkUtils.flattenSchema()` for dataframes that have arrays. Array size metadata is used to determine the maximum number of array elements, making it much faster for dataframes produced from mainframe files.
- #324 Allow removing FILLERs from the AST when parsing using `parseSimple()`. The signature of the method has changed: the boolean arguments now reflect more clearly what they do.
- #466 Added `maxElements` and `minElements` to Spark schema metadata for array fields created from fields with `OCCURS`. This allows knowing the maximum number of elements in arrays when flattening the schema.
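Why array size metadata helps flattening can be sketched without Spark. The example below models a schema as plain dictionaries and expands each array into a fixed number of columns using its `maxElements` value, so no pass over the data is needed to find the largest array. The dictionary layout, the function name `flatten_columns`, and the `NAME_index_CHILD` column naming are assumptions made for illustration, not the behavior of `SparkUtils.flattenSchema()`.

```python
def flatten_columns(fields, prefix=""):
    """Return flat column names for a schema modeled as nested dicts (hypothetical layout)."""
    names = []
    for field in fields:
        name = prefix + field["name"]
        if field["type"] == "struct":
            names += flatten_columns(field["fields"], name + "_")
        elif field["type"] == "array":
            # The upper bound comes from schema metadata, not from scanning the data.
            max_elements = field["metadata"]["maxElements"]
            element = field["element"]
            for i in range(max_elements):
                if element["type"] == "struct":
                    names += flatten_columns(element["fields"], f"{name}_{i}_")
                else:
                    names.append(f"{name}_{i}")
        else:
            names.append(name)
    return names


schema = [
    {"name": "ID", "type": "integer"},
    {
        "name": "ITEMS",
        "type": "array",
        "metadata": {"minElements": 0, "maxElements": 2},
        "element": {
            "type": "struct",
            "fields": [
                {"name": "CODE", "type": "string"},
                {"name": "QTY", "type": "integer"},
            ],
        },
    },
]

print(flatten_columns(schema))
```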
Minor bugfix release
- #459 Fixed signed overpunch for ASCII files.
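For readers unfamiliar with sign overpunching, here is a minimal standalone sketch of decoding a trailing overpunched sign in ASCII data. It uses the common convention where `{` and `A` to `I` encode a positive last digit 0 to 9, and `}` and `J` to `R` encode a negative last digit 0 to 9. The function name `decode_display`, the `signed` flag, and returning `None` for invalid input are illustrative assumptions, not Cobrix's implementation.

```python
DIGITS = "0123456789"
# Trailing overpunch characters mapped to the digit they carry.
POSITIVE = {"{": 0, **{chr(ord("A") + i): i + 1 for i in range(9)}}  # A..I -> 1..9
NEGATIVE = {"}": 0, **{chr(ord("J") + i): i + 1 for i in range(9)}}  # J..R -> 1..9


def decode_display(text: str, signed: bool = True):
    """Decode a DISPLAY number with an optional trailing overpunched sign."""
    if not text:
        return None
    body, last = text[:-1], text[-1]
    if any(ch not in DIGITS for ch in body):
        return None
    if last in DIGITS:
        return int(body + last)
    if not signed:
        # Strict handling: an unsigned field must not carry an overpunched sign.
        return None
    if last in POSITIVE:
        return int(body + str(POSITIVE[last]))
    if last in NEGATIVE:
        return -int(body + str(NEGATIVE[last]))
    return None


print(decode_display("12C"))                # positive, last digit 3
print(decode_display("12L"))                # negative, last digit 3
print(decode_display("12C", signed=False))  # rejected for unsigned fields
```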
Minor bugfix release
- #451 Fixed COMP-9 (Cobrix extension for little-endian binary fields).
Minor bugfix release
Minor feature release
- #430 Added support for 'twisted' RDW headers when big-endian or little-endian RDWs use unexpected RDW bytes.
Minor feature release
- #420 Add experimental support for fixed blocked (FB) record format.
- #422 Fixed decoding of the 'broken pipe' (`¦`) character from EBCDIC.
- #424 Fixed an ASCII reader corner case.
Feature Release
- #412 Add support for variable block (VB aka VBVR) record format. Options to adjust BDW settings are added:
  - `is_bdw_big_endian` - specifies if BDW is big-endian (false by default)
  - `bdw_adjustment` - specifies how the value of a BDW is different from the block payload. For example, if the size in BDW headers includes the BDW record itself, use `.option("bdw_adjustment", "-4")`.
  - Options `is_record_sequence` and `is_xcom` are deprecated. Use `.option("record_format", "V")` instead.
- #417 Multisegment ASCII text files now have direct support using `record_format = D`.
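The arithmetic behind the two BDW options can be sketched as follows. The example assumes, for illustration only, a 4-byte header whose first two bytes hold the block length; the actual byte positions Cobrix reads may differ. The function name `block_payload_length` and its parameters are hypothetical stand-ins for `is_bdw_big_endian` and `bdw_adjustment`.

```python
def block_payload_length(header: bytes, big_endian: bool = False, adjustment: int = 0) -> int:
    """Compute the block payload size from a 4-byte block descriptor word (BDW)."""
    if len(header) != 4:
        raise ValueError("a BDW is expected to be 4 bytes long")
    # Assumption: the length is stored in the first two bytes of the header.
    raw = int.from_bytes(header[:2], "big" if big_endian else "little")
    length = raw + adjustment
    if length < 0:
        raise ValueError("adjusted block length is negative")
    return length


# The header says 104 bytes, and that value includes the 4-byte BDW itself,
# so an adjustment of -4 yields the payload size.
print(block_payload_length(b"\x00\x68\x00\x00", big_endian=True, adjustment=-4))

# A little-endian header that already stores the payload size needs no adjustment.
print(block_payload_length(b"\x64\x00\x00\x00"))
```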
Feature Release
- #405 Fix extracting records that contain redefines of the top level GROUPs.
- #406 Use 'collapse_root' retention policy by default. This is a breaking change. To restore the original behavior, add `.option("schema_retention_policy", "keep_original")`.
- #407 The layout positions summary generated by the parser now contains level numbers for root level GROUPs. This is a breaking change if you have unit tests that depend on the formatting of the layout positions output.
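The effect of the two retention policies in #406 can be approximated with plain dictionaries. The sketch assumes that `collapse_root` promotes the children of root-level groups to top-level fields, while `keep_original` leaves the record nested as it is in the copybook. The function `apply_retention_policy` and the sample record are hypothetical and only illustrate why downstream column names change.

```python
def apply_retention_policy(record: dict, policy: str = "collapse_root") -> dict:
    """Illustrate how root-level groups might be kept or collapsed."""
    if policy == "keep_original":
        return record
    if policy == "collapse_root":
        collapsed = {}
        for name, value in record.items():
            if isinstance(value, dict):
                # Children of a root-level group become top-level fields.
                collapsed.update(value)
            else:
                collapsed[name] = value
        return collapsed
    raise ValueError(f"unknown policy: {policy}")


row = {"RECORD": {"ID": 1, "NAME": "ALICE"}}
print(apply_retention_policy(row, "keep_original"))
print(apply_retention_policy(row, "collapse_root"))
```

Code that selects columns through the root group (for example `RECORD.ID`) would need the `keep_original` policy, or would need to be updated to use the collapsed names.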