Skip to content

UNKNOWN types like citext sometimes incorrectly treated as arrays #770

@bradnewman

Description

@bradnewman

Version

I've reproduced the problem with vertx-pg-client 3.9.1 and reactive-pg-client 0.11.4, but I believe this issue applies equally to newer versions including the master/4.x branch.

Context

I know that citext is not listed in as a "supported data type", but it is generally working quite well, and the documentation does say that unknown types are treated as Strings.

However, there appears to be a heuristic in DataTypeCodec.java to detect that the data of unknown type is an array, and then parse it as such.

My problem is that when this heuristic is wrong, an exception might be thrown or the data that is returned may be mangled such that the application code cannot determine the actual value of the data as it exists in the database.

Do you have a reproducer?

Before I encountered this issue, I assumed that this code could succeed regardless of the value passed as input:

void citextTest(String input) {
    var rows = client.preparedQuery("SELECT CAST($1 AS citext)")
                     .rxExecute(Tuple.of(input))
                     .blockingGet();
    assertEquals(input, rows.iterator().next().getValue(0));
}

However, values that start with { incorrectly trigger the heuristic:

citextTest("{one,\"two\"}");

In this case, the query yields an array of Strings equivalent to new String[] { "one", "two" }. Because DataTypeCodec removes the quotes, there is not any recourse for the application code to correctly "stitch" the elements of the array back into the original value.

Worse still, other values can actually result in a exceptions. For example:

citextTest("{one,\"two\"");

This throws:

java.lang.IndexOutOfBoundsException: index: 27, length: -3 (expected: range(0, 496))
	at io.netty.buffer.AbstractByteBuf.checkRangeBounds(AbstractByteBuf.java:1425)
	at io.netty.buffer.AbstractByteBuf.checkIndex0(AbstractByteBuf.java:1432)
	at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1419)
	at io.netty.buffer.UnsafeByteBufUtil.getBytes(UnsafeByteBufUtil.java:481)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:130)
	at io.netty.buffer.ByteBufUtil.decodeString(ByteBufUtil.java:834)
	at io.netty.buffer.AbstractByteBuf.toString(AbstractByteBuf.java:1247)
	at io.vertx.pgclient.impl.codec.DataTypeCodec.textDecodeArrayElement(DataTypeCodec.java:1473)
	at io.vertx.pgclient.impl.codec.DataTypeCodec.textDecodeArray(DataTypeCodec.java:1454)
	at io.vertx.pgclient.impl.codec.DataTypeCodec.defaultDecodeText(DataTypeCodec.java:640)
	at io.vertx.pgclient.impl.codec.DataTypeCodec.decodeText(DataTypeCodec.java:603)
	at io.vertx.pgclient.impl.codec.RowResultDecoder.decodeRow(RowResultDecoder.java:47)
	at io.vertx.sqlclient.impl.RowDecoder.handleRow(RowDecoder.java:52)
	at io.vertx.pgclient.impl.codec.PgDecoder.decodeDataRow(PgDecoder.java:188)
	at io.vertx.pgclient.impl.codec.PgDecoder.channelRead(PgDecoder.java:90)

Here is a gist with the full test class: https://gist.github.com/bradnewman/6f5dff3a157ee644fc95b8fc37b836cb

Note that since citext is an extension, you need to load it before PostgreSQL will recognize it as a type:

CREATE EXTENSION IF NOT EXISTS citext;

Workaround

For now, I've updated all queries that retrieve citext-valued data to cast that data to varchar. Since the oid for varchar is recognized by vertx-pg-client, it bypasses the flawed heuristic and the data is treated as a String, regardless of contents.

Resolution

Ideally, it would be great if vertx-pg-client could have full support for citext. But I realize that since it does not have a stable oid, this might require significant changes. Alternatively, simply removing the heuristic that attempts to handle arrays of unknown types would be satisfactory for my needs, though it might break other's usage.

Avoiding the possibility of an exception and/or improving the heuristic would not be sufficient for my needs, because a non-array citext value could always contain data that just happens to look exactly like PostgreSQL's representation of an array. Since the parsing as an array is potentially destructive, the application code cannot detect that it received a String array where a String was expected and reconstruct the correct value of the data from the array.

Maybe the parsing as an array could be deferred: the Row could be created with the raw String, and then the Row.getStringArray() methods (and others) could parse the data as an array on demand. Thus, an application that knows whether the column is textual or an array could call the appropriate strongly-typed getter method and retrieve the expected value.

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions