Skip to content

feat: Add support for List data types #39

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

mgrazianoc
Copy link
Contributor

@mgrazianoc mgrazianoc commented Jun 16, 2025

Rationale within the changes

This PR refactors and extends support for nested types in the Arrow integration. The current implementation of ArrowNestedType is tailored primarily for data structs, as seen in StructBufferBuilder. However, it lacks broader support and certain expected functionalities, such as loadStructArrayBuilder.

To address this, the following improvements have been made:

  • Renamed ArrowNestedType to ArrowTypeStruct to align with naming conventions used elsewhere in the codebase.
  • Introduced initial support for ArrowTypeList, including nested lists.

For simplicity, instead of introducing a dedicated subtype for lists, this PR uses an interface of [Any?]?. If this approach proves insufficient, there are more explicit alternatives that can be explored.

NOTE: Work on ArrowCExporter and ArrowCImporter has been intentionally deferred. These components require a deeper understanding of memory ownership and child parsing, and I believe it's better to be addressed in a future PR, unless it's strict necessary.

What's Changed

  1. Renamed ArrowNestedType -> ArrowTypeStruct.
  2. Added support for ArrowTypeList, including nested lists.
  3. Implemented ListArray with basic .asString formatting.
  4. Added ListArrayBuilder.
  5. Extended ArrowArrayBuilder to support the .list type.
  6. Implemented loadStructArrayBuilder and loadListArrayBuilder.
  7. Introduced ListBufferBuilder.
  8. Added ArrowReader.loadListData.
  9. Added makeListHolder.

Are these changes tested?

Tests are included in ArrayTests.swift. It's also working on internal applications, including integration with ArrowFlight.

Closes #16.

@mgrazianoc mgrazianoc force-pushed the mgraziano/list-support branch from 28f225d to 0253f70 Compare June 17, 2025 17:35
@kou kou requested a review from Copilot June 18, 2025 08:12
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extends the Arrow integration to support List data types while refactoring nested type handling. Key changes include renaming ArrowNestedType to ArrowTypeStruct, adding a new ArrowTypeList and corresponding builders/arrays, and updating relevant tests to cover both primitive and nested lists.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
Arrow/Tests/ArrowTests/TableTests.swift Updated closure signature and return type for struct builders
Arrow/Tests/ArrowTests/IPCTests.swift Updated schema construction to use the new ArrowTypeStruct
Arrow/Tests/ArrowTests/ArrayTests.swift Added tests for ListArray functionality with both primitive and nested lists
Arrow/Sources/Arrow/ProtoUtil.swift Refactored type creation for struct and added support for list fields
Arrow/Sources/Arrow/ArrowWriter.swift Updated list handling by replacing ArrowNestedType with ArrowTypeStruct
Arrow/Sources/Arrow/ArrowType.swift Renamed ArrowNestedType → ArrowTypeStruct and added ArrowTypeList
Arrow/Sources/Arrow/ArrowReaderHelper.swift & ArrowReader.swift Added list data loading functions
Arrow/Sources/Arrow/ArrowBufferBuilder.swift Replaced nested type usage and added ListBufferBuilder
Arrow/Sources/Arrow/ArrowArrayBuilder.swift Introduced ListArrayBuilder and updated builder resolution
Arrow/Sources/Arrow/ArrowArray.swift Added ListArray implementation with subscript and asString

let innerListType = ArrowTypeList(ArrowType(ArrowType.ArrowInt32))
let outerListBuilder = try ListArrayBuilder(innerListType)

let innerListBuilder = outerListBuilder.valueBuilder as! ListArrayBuilder
Copy link
Preview

Copilot AI Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] Consider replacing the force-cast with a safe cast or adding a runtime type check to avoid a potential crash if the type does not match.

Suggested change
let innerListBuilder = outerListBuilder.valueBuilder as! ListArrayBuilder
guard let innerListBuilder = outerListBuilder.valueBuilder as? ListArrayBuilder else {
XCTFail("Failed to cast valueBuilder to ListArrayBuilder")
return
}

Copilot uses AI. Check for mistakes.

@kou kou changed the title GH-16: Add support for List data types feat: Add support for List data types Jun 18, 2025
@kou
Copy link
Member

kou commented Jun 18, 2025

@abandy You may want to review this.

@abandy
Copy link
Contributor

abandy commented Jun 18, 2025

@abandy You may want to review this.

Will do, thanks!

Comment on lines +318 to +327
case .list:
guard let listType = arrowType as? ArrowTypeList else {
throw ArrowError.invalid("Expected ArrowTypeList for \(arrowType.id)")
}
return try ListArrayBuilder(listType.elementType)
case .strct:
guard let structType = arrowType as? ArrowTypeStruct else {
throw ArrowError.invalid("Expected ArrowStructType for \(arrowType.id)")
}
return try StructArrayBuilder(structType.fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use struct -> list order like others?

Suggested change
case .list:
guard let listType = arrowType as? ArrowTypeList else {
throw ArrowError.invalid("Expected ArrowTypeList for \(arrowType.id)")
}
return try ListArrayBuilder(listType.elementType)
case .strct:
guard let structType = arrowType as? ArrowTypeStruct else {
throw ArrowError.invalid("Expected ArrowStructType for \(arrowType.id)")
}
return try StructArrayBuilder(structType.fields)
case .strct:
guard let structType = arrowType as? ArrowTypeStruct else {
throw ArrowError.invalid("Expected ArrowStructType for \(arrowType.id)")
}
return try StructArrayBuilder(structType.fields)
case .list:
guard let listType = arrowType as? ArrowTypeList else {
throw ArrowError.invalid("Expected ArrowTypeList for \(arrowType.id)")
}
return try ListArrayBuilder(listType.elementType)

Comment on lines +398 to +399
if length >= self.offsets.length {
self.resize(length + 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use self.length here?

Suggested change
if length >= self.offsets.length {
self.resize(length + 1)
if self.length >= self.offsets.length {
self.resize(self.length + 1)

XCTAssertEqual(fourthList![3] as? Int32, 9)
}

func testNestedListArray() throws {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func testNestedListArray() throws {
func testListArrayNested() throws {

Comment on lines +482 to +484
outerListBuilder.bufferBuilder.append([nil, nil])
innerListBuilder.append([Int32(1), Int32(2)])
innerListBuilder.append([Int32(3), Int32(4), Int32(5)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. This interface may not be good. Users may think about "what does [nil, nil] mean?".

How about the following interface?

outerListBuilder.append(2) -- The number of elements in this list
innerListBuilder.append([Int32(1), Int32(2)])
innerListBuilder.append([Int32(3), Int32(4), Int32(5)])

@@ -338,14 +338,14 @@ public class Date64BufferBuilder: AbstractWrapperBufferBuilder<Date, Int64> {

public final class StructBufferBuilder: BaseBufferBuilder, ArrowBufferBuilder {
public typealias ItemType = [Any?]
var info: ArrowNestedType?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nested type should be able to handle a list without making an individual type for list and struct. A list should be able to be expressed as nested type with only a single field. Is there a reason you are choosing to move away from Nested types?

@@ -405,3 +407,69 @@ public class StructArray: ArrowArray<[Any?]> {
return output
}
}

public class ListArray: ArrowArray<[Any?]> {
Copy link
Contributor

@abandy abandy Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use a generic here: like ListArray<T>: ArrowArray<[T?]>?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Swift] Support list (Array) type
3 participants