Commit 700d645

yaooqinn authored and dongjoon-hyun committed
[SPARK-52666][SQL] Map User Defined Type to correct MutableValue in SpecificInternalRow
### What changes were proposed in this pull request?

Map User Defined Type to correct MutableValue in SpecificInternalRow to fix:

```java
Caused by: java.lang.IllegalArgumentException: Spark type: ... doesn't match the type: ... in column vector
	at org.apache.spark.sql.execution.datasources.parquet.ParquetColumnVector.<init>(ParquetColumnVector.java:80)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetColumnVector.<init>(ParquetColumnVector.java:139)
```

### Why are the changes needed?

Add UDT missing features

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

- New tests in UT
- Manual test

```
spark-sql (default)> select submissionTime from bbb;
25/07/03 09:55:57 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 4) (10.242.151.176 executor 0): java.lang.ClassCastException: class org.apache.spark.sql.catalyst.expressions.MutableAny cannot be cast to class org.apache.spark.sql.catalyst.expressions.MutableLong (org.apache.spark.sql.catalyst.expressions.MutableAny and org.apache.spark.sql.catalyst.expressions.MutableLong are in unnamed module of loader 'app')
	at org.apache.spark.sql.catalyst.expressions.SpecificInternalRow.setLong(SpecificInternalRow.scala:304)
	at org.apache.spark.sql.execution.datasources.orc.OrcDeserializer$RowUpdater.setLong(OrcDeserializer.scala:282)

spark-sql (default)> select submissionTime from bbb;
Thu Jul 03 11:18:39 CST 2025
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #51356 from yaooqinn/SPARK-52666.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
1 parent 4ec992c commit 700d645

2 files changed: +12 -1 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificInternalRow.scala

Lines changed: 4 additions & 0 deletions

@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.expressions
 
+import scala.annotation.tailrec
+
 import org.apache.spark.sql.types._
 
 /**
@@ -192,6 +194,7 @@ final class MutableAny extends MutableValue {
  */
 final class SpecificInternalRow(val values: Array[MutableValue]) extends BaseGenericInternalRow {
 
+  @tailrec
   private[this] def dataTypeToMutableValue(dataType: DataType): MutableValue = dataType match {
     // We use INT for DATE and YearMonthIntervalType internally
     case IntegerType | DateType | _: YearMonthIntervalType => new MutableInt
@@ -203,6 +206,7 @@ final class SpecificInternalRow(val values: Array[MutableValue]) extends BaseGen
     case BooleanType => new MutableBoolean
     case ByteType => new MutableByte
     case ShortType => new MutableShort
+    case udt: UserDefinedType[_] => dataTypeToMutableValue(udt.sqlType)
     case _ => new MutableAny
   }
 
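The effect of the added `case udt: UserDefinedType[_]` branch can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `ToyUDT`, `ToyRow`, `Mapping.before`, and `Mapping.after` are hypothetical names invented here, and the toy `DataType` and `MutableValue` hierarchies are simplified stand-ins, not Spark's real classes. It assumes, as the stack trace in the commit message suggests, that a typed setter such as `setLong` casts the slot to the matching holder class.

```scala
import scala.annotation.tailrec

// Toy stand-ins for Spark's DataType hierarchy (hypothetical, not the real classes).
sealed trait DataType
case object IntegerType extends DataType
case object LongType extends DataType
case object StringType extends DataType
// A user-defined type that is stored as some underlying SQL type.
final case class ToyUDT(sqlType: DataType) extends DataType

// Toy stand-ins for the typed value holders.
sealed trait MutableValue
final class MutableInt extends MutableValue { var value: Int = 0 }
final class MutableLong extends MutableValue { var value: Long = 0L }
final class MutableAny extends MutableValue { var value: Any = null }

object Mapping {
  // Behaviour before the change: a UDT falls through to the catch-all case.
  def before(dt: DataType): MutableValue = dt match {
    case IntegerType => new MutableInt
    case LongType    => new MutableLong
    case _           => new MutableAny
  }

  // Behaviour after the change: unwrap the UDT and dispatch on its SQL type.
  // The recursive call is in tail position, so @tailrec compiles.
  @tailrec
  def after(dt: DataType): MutableValue = dt match {
    case IntegerType => new MutableInt
    case LongType    => new MutableLong
    case ToyUDT(sql) => after(sql)
    case _           => new MutableAny
  }
}

// Minimal row whose typed accessors cast the slot to the expected holder.
final class ToyRow(val values: Array[MutableValue]) {
  def setLong(i: Int, v: Long): Unit = {
    val slot = values(i).asInstanceOf[MutableLong]
    slot.value = v
  }
  def getLong(i: Int): Long = values(i).asInstanceOf[MutableLong].value
}

object Demo {
  def main(args: Array[String]): Unit = {
    val udt = ToyUDT(LongType)

    // Old mapping: the slot is a MutableAny, so the cast in setLong fails.
    val oldRow = new ToyRow(Array(Mapping.before(udt)))
    val failed =
      try { oldRow.setLong(0, 42L); false }
      catch { case _: ClassCastException => true }
    assert(failed)

    // New mapping: the slot is a MutableLong, so the typed setter works.
    val newRow = new ToyRow(Array(Mapping.after(udt)))
    newRow.setLong(0, 42L)
    assert(newRow.getLong(0) == 42L)

    // A UDT wrapping another UDT is unwrapped until a concrete type is reached.
    assert(Mapping.after(ToyUDT(ToyUDT(IntegerType))).isInstanceOf[MutableInt])
  }
}
```

In this model the only difference between `before` and `after` is the extra unwrapping case, which mirrors the one-line change in `dataTypeToMutableValue`. Types with no dedicated holder, such as the toy `StringType`, still fall back to `MutableAny` in both versions.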

sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala

Lines changed: 8 additions & 1 deletion

@@ -22,7 +22,7 @@ import java.util.Arrays
 
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.CatalystTypeConverters
-import org.apache.spark.sql.catalyst.expressions.{Cast, CodegenObjectFactoryMode, ExpressionEvalHelper, Literal}
+import org.apache.spark.sql.catalyst.expressions.{Cast, CodegenObjectFactoryMode, ExpressionEvalHelper, Literal, SpecificInternalRow}
 import org.apache.spark.sql.execution.datasources.parquet.ParquetTest
 import org.apache.spark.sql.functions._
 import org.apache.spark.sql.internal.SQLConf
@@ -312,4 +312,11 @@ class UserDefinedTypeSuite extends QueryTest with SharedSparkSession with Parque
       }
     }
   }
+
+  test("SPARK-52666: Map UDT to correct MutableValue in SpecificInternalRow") {
+    val udt = new YearUDT()
+    val row = new SpecificInternalRow(Seq(udt))
+    row.setInt(0, udt.serialize(Year.of(2018)))
+    assert(row.getInt(0) == 2018)
+  }
 }
