-
Notifications
You must be signed in to change notification settings - Fork 6
feat: Add support for List data types #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
28f225d
to
0253f70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extends the Arrow integration to support List data types while refactoring nested type handling. Key changes include renaming ArrowNestedType to ArrowTypeStruct, adding a new ArrowTypeList and corresponding builders/arrays, and updating relevant tests to cover both primitive and nested lists.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
Arrow/Tests/ArrowTests/TableTests.swift | Updated closure signature and return type for struct builders |
Arrow/Tests/ArrowTests/IPCTests.swift | Updated schema construction to use the new ArrowTypeStruct |
Arrow/Tests/ArrowTests/ArrayTests.swift | Added tests for ListArray functionality with both primitive and nested lists |
Arrow/Sources/Arrow/ProtoUtil.swift | Refactored type creation for struct and added support for list fields |
Arrow/Sources/Arrow/ArrowWriter.swift | Updated list handling by replacing ArrowNestedType with ArrowTypeStruct |
Arrow/Sources/Arrow/ArrowType.swift | Renamed ArrowNestedType → ArrowTypeStruct and added ArrowTypeList |
Arrow/Sources/Arrow/ArrowReaderHelper.swift & ArrowReader.swift | Added list data loading functions |
Arrow/Sources/Arrow/ArrowBufferBuilder.swift | Replaced nested type usage and added ListBufferBuilder |
Arrow/Sources/Arrow/ArrowArrayBuilder.swift | Introduced ListArrayBuilder and updated builder resolution |
Arrow/Sources/Arrow/ArrowArray.swift | Added ListArray implementation with subscript and asString |
let innerListType = ArrowTypeList(ArrowType(ArrowType.ArrowInt32)) | ||
let outerListBuilder = try ListArrayBuilder(innerListType) | ||
|
||
let innerListBuilder = outerListBuilder.valueBuilder as! ListArrayBuilder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Consider replacing the force-cast with a safe cast or adding a runtime type check to avoid a potential crash if the type does not match.
let innerListBuilder = outerListBuilder.valueBuilder as! ListArrayBuilder | |
guard let innerListBuilder = outerListBuilder.valueBuilder as? ListArrayBuilder else { | |
XCTFail("Failed to cast valueBuilder to ListArrayBuilder") | |
return | |
} |
Copilot uses AI. Check for mistakes.
@abandy You may want to review this. |
Will do, thanks! |
case .list: | ||
guard let listType = arrowType as? ArrowTypeList else { | ||
throw ArrowError.invalid("Expected ArrowTypeList for \(arrowType.id)") | ||
} | ||
return try ListArrayBuilder(listType.elementType) | ||
case .strct: | ||
guard let structType = arrowType as? ArrowTypeStruct else { | ||
throw ArrowError.invalid("Expected ArrowStructType for \(arrowType.id)") | ||
} | ||
return try StructArrayBuilder(structType.fields) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use struct
-> list
order like others?
case .list: | |
guard let listType = arrowType as? ArrowTypeList else { | |
throw ArrowError.invalid("Expected ArrowTypeList for \(arrowType.id)") | |
} | |
return try ListArrayBuilder(listType.elementType) | |
case .strct: | |
guard let structType = arrowType as? ArrowTypeStruct else { | |
throw ArrowError.invalid("Expected ArrowStructType for \(arrowType.id)") | |
} | |
return try StructArrayBuilder(structType.fields) | |
case .strct: | |
guard let structType = arrowType as? ArrowTypeStruct else { | |
throw ArrowError.invalid("Expected ArrowStructType for \(arrowType.id)") | |
} | |
return try StructArrayBuilder(structType.fields) | |
case .list: | |
guard let listType = arrowType as? ArrowTypeList else { | |
throw ArrowError.invalid("Expected ArrowTypeList for \(arrowType.id)") | |
} | |
return try ListArrayBuilder(listType.elementType) |
if length >= self.offsets.length { | ||
self.resize(length + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use self.length
here?
if length >= self.offsets.length { | |
self.resize(length + 1) | |
if self.length >= self.offsets.length { | |
self.resize(self.length + 1) |
XCTAssertEqual(fourthList![3] as? Int32, 9) | ||
} | ||
|
||
func testNestedListArray() throws { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func testNestedListArray() throws { | |
func testListArrayNested() throws { |
outerListBuilder.bufferBuilder.append([nil, nil]) | ||
innerListBuilder.append([Int32(1), Int32(2)]) | ||
innerListBuilder.append([Int32(3), Int32(4), Int32(5)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. This interface may not be good. Users may think about "what does [nil, nil]
mean?".
How about the following interface?
outerListBuilder.append(2) -- The number of elements in this list
innerListBuilder.append([Int32(1), Int32(2)])
innerListBuilder.append([Int32(3), Int32(4), Int32(5)])
@@ -338,14 +338,14 @@ public class Date64BufferBuilder: AbstractWrapperBufferBuilder<Date, Int64> { | |||
|
|||
public final class StructBufferBuilder: BaseBufferBuilder, ArrowBufferBuilder { | |||
public typealias ItemType = [Any?] | |||
var info: ArrowNestedType? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nested type should be able to handle a list without making an individual type for list and struct. A list should be able to be expressed as nested type with only a single field. Is there a reason you are choosing to move away from Nested types?
@@ -405,3 +407,69 @@ public class StructArray: ArrowArray<[Any?]> { | |||
return output | |||
} | |||
} | |||
|
|||
public class ListArray: ArrowArray<[Any?]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a generic here: like ListArray<T>: ArrowArray<[T?]>
?
Rationale within the changes
This PR refactors and extends support for nested types in the Arrow integration. The current implementation of
ArrowNestedType
is tailored primarily for data structs, as seen inStructBufferBuilder
. However, it lacks broader support and certain expected functionalities, such asloadStructArrayBuilder
.To address this, the following improvements have been made:
ArrowNestedType
toArrowTypeStruct
to align with naming conventions used elsewhere in the codebase.ArrowTypeList
, including nested lists.For simplicity, instead of introducing a dedicated subtype for lists, this PR uses an interface of
[Any?]?
. If this approach proves insufficient, there are more explicit alternatives that can be explored.NOTE: Work on
ArrowCExporter
andArrowCImporter
has been intentionally deferred. These components require a deeper understanding of memory ownership and child parsing, and I believe it's better to be addressed in a future PR, unless it's strict necessary.What's Changed
ArrowNestedType -> ArrowTypeStruct
.ArrowTypeList
, including nested lists.ListArray
with basic.asString
formatting.ListArrayBuilder
.ArrowArrayBuilder
to support the.list
type.loadStructArrayBuilder
andloadListArrayBuilder
.ListBufferBuilder
.ArrowReader.loadListData
.makeListHolder
.Are these changes tested?
Tests are included in
ArrayTests.swift
. It's also working on internal applications, including integration withArrowFlight
.Closes #16.