
Connection parameters for mTLS-enabled Milvus are missing; data mismatch between Spark DataFrame and Milvus #28

@rohitreddy1698

Description


Hello,

I am trying to connect to a Milvus instance that has mTLS enabled and is hosted on Kubernetes.
When trying to connect, we noticed that the configuration options for providing client certificates are missing.

So we extended the code by adding parameters for the client and CA certificates in MilvusOptions.scala, and by applying them when the connection is initiated in MilvusConnection.scala:

```scala
val secure: Boolean = config.getBoolean(MILVUS_SECURE, false)
val caCert: String = config.getOrDefault(MILVUS_CA_CERT, "")
val clientKey: String = config.getOrDefault(MILVUS_CLIENT_KEY, "")
val clientCert: String = config.getOrDefault(MILVUS_CLIENT_CERT, "")
```
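
The corresponding change in MilvusConnection.scala roughly passes these options through to the Java SDK's ConnectParam builder. The sketch below reflects how I wired it up locally (the field names on MilvusOptions such as host, port, secure, caCert come from my change, not upstream), so treat it as illustrative rather than final:

```scala
import io.milvus.client.MilvusServiceClient
import io.milvus.param.ConnectParam

// Sketch: build the client with the new TLS options when milvus.secure is set.
def acquire(options: MilvusOptions): MilvusServiceClient = {
  val builder = ConnectParam.newBuilder()
    .withHost(options.host)
    .withPort(options.port)

  if (options.secure) {
    builder
      .withSecure(true)
      .withCaPemPath(options.caCert)         // CA certificate (PEM)
      .withClientKeyPath(options.clientKey)  // client private key (PEM)
      .withClientPemPath(options.clientCert) // client certificate (PEM)
  }

  new MilvusServiceClient(builder.build())
}
```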

With these new configuration options the certificates were picked up correctly and the mTLS connection to the Milvus instance succeeded.

But once past this, I ran into another issue: a data mismatch.
For functional testing I ran the quickstart example, quickstart.py, where sample data from a Spark DataFrame is loaded into Milvus.

**Spark DataFrame**

```python
data = [(1, "a", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]),
        (2, "b", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]),
        (3, "c", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]),
        (4, "d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])]
```


**Data in the Milvus collection after ingestion**

```python
[(1, "a", [0.0, 1.875, 0.0, 2.0, 0.0, 2.125, 0.0, 2.25]),
 (2, "b", [0.0, 1.875, 0.0, 2.0, 0.0, 2.125, 0.0, 2.25]),
 (3, "c", [0.0, 1.875, 0.0, 2.0, 0.0, 2.125, 0.0, 2.25]),
 (4, "d", [0.0, 1.875, 0.0, 2.0, 0.0, 2.125, 0.0, 2.25])]
```

On deeper exploration I found that this happens because of a data-type mismatch between Spark and Milvus.
It arises when the Spark DataFrame field marked as the vector field contains an array of Double values instead of Float values; the conversion that happens in that case corrupts the values rather than casting them numerically.
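
The specific corrupted values are consistent with the 8-byte doubles being copied byte-for-byte and read back as pairs of little-endian 4-byte floats, rather than being converted numerically. A quick check (my own reasoning, not taken from the connector code) reproduces exactly the numbers seen in the collection:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Reinterpret the raw little-endian bytes of each double as two floats.
val reinterpreted = Seq(1.0, 2.0, 3.0, 4.0).flatMap { d =>
  val buf = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
  buf.putDouble(d)
  buf.flip()
  Seq(buf.getFloat(), buf.getFloat())
}

// Prints: 0.0, 1.875, 0.0, 2.0, 0.0, 2.125, 0.0, 2.25
println(reinterpreted.mkString(", "))
```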

This is happening because in MilvusCollection.scala arrays of both Double and Float are mapped to a Float Vector in Milvus, since Milvus does not have a Double vector type of its own.
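
The mapping I am referring to is roughly of the following shape (paraphrased, not a verbatim copy of MilvusCollection.scala):

```scala
import org.apache.spark.sql.types.{ArrayType, DataType => SparkDataType, DoubleType, FloatType}
import io.milvus.grpc.{DataType => MilvusDataType}

// Both float and double arrays end up as a Milvus FloatVector,
// since Milvus has no double-precision vector type.
def toMilvusVectorType(dt: SparkDataType): MilvusDataType = dt match {
  case ArrayType(FloatType, _)  => MilvusDataType.FloatVector
  case ArrayType(DoubleType, _) => MilvusDataType.FloatVector
  case other => throw new IllegalArgumentException(s"Unsupported vector field type: $other")
}
```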

To work around this, I created a UDF that is applied when the Spark DataFrame column marked as the vector field contains values of type Double; it explicitly casts the column from an array of Double to an array of Float. With this in place, ingestion into Milvus succeeds.
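
The workaround looks roughly like this (vec is a placeholder column name; a plain col(vectorField).cast("array<float>") would probably also work, but the UDF route is what I tested):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.{ArrayType, DoubleType}

// Explicitly convert a double-array vector column to a float array
// before handing the DataFrame to the connector.
val toFloatArray = udf((xs: Seq[Double]) => xs.map(_.toFloat))

def castVectorField(df: DataFrame, vectorField: String): DataFrame =
  df.schema(vectorField).dataType match {
    case ArrayType(DoubleType, _) => df.withColumn(vectorField, toFloatArray(col(vectorField)))
    case _                        => df
  }

val fixedDf = castVectorField(df, "vec")
```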
After this change, the data in Milvus:

    (1, "a", [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0]),
    (2, "b", [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0]),
    (3, "c", [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0]),
    (4, "d", [1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0])

Can you please help me understand if my approach is right?

Thank you
