
Commit b808e4c

Schema Registry 6: Docs (#472)
* Correct serialization.md
* Reorder Serializers API table of contents. Put Quix serializers at the bottom as it is headed toward deprecation.
* Add Schema Registry to Serializers API table of contents
* Add dedicated Schema Registry documentation
1 parent 0cd5b7f commit b808e4c

File tree

4 files changed: +134, −26 lines changed


docs/advanced/schema-registry.md

Lines changed: 106 additions & 0 deletions
# Schema Registry

Serializers and deserializers for JSON Schema, Avro, and Protobuf support integration with a Schema Registry.

The current implementation wraps Confluent's serializers and deserializers, which are tightly coupled with the Schema Registry.

To integrate your existing Schema Registry, pass `SchemaRegistryClientConfig` to your serializers and deserializers. Additional optional configuration can be provided via `SchemaRegistrySerializationConfig`.

```python
from quixstreams.models import (
    SchemaRegistryClientConfig,
    SchemaRegistrySerializationConfig,
)

schema_registry_client_config = SchemaRegistryClientConfig(
    url='localhost:8081',
    basic_auth_user_info='username:password',
)

# optional
schema_registry_serialization_config = SchemaRegistrySerializationConfig(
    auto_register_schemas=False,
)
```

**Note:** For the full list of available options, refer to the [Serializers API](./api-reference/serialization.md).

## JSON Schema

For both the serializer and deserializer, a `schema` must be provided.

```python
from quixstreams.models import JSONDeserializer, JSONSerializer

MY_SCHEMA = {
    "title": "MyObject",
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "id": {"type": "number"},
    },
    "required": ["id"],
}

deserializer = JSONDeserializer(
    schema=MY_SCHEMA,
    schema_registry_client_config=schema_registry_client_config,
    schema_registry_serialization_config=schema_registry_serialization_config,
)
serializer = JSONSerializer(
    schema=MY_SCHEMA,
    schema_registry_client_config=schema_registry_client_config,
    schema_registry_serialization_config=schema_registry_serialization_config,
)
```

## Avro

The serializer requires a `schema`, but the deserializer can automatically fetch the required schema from the Schema Registry.

```python
from quixstreams.models.serializers.avro import AvroDeserializer, AvroSerializer

MY_SCHEMA = {
    "type": "record",
    "name": "testschema",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "id", "type": "int", "default": 0},
    ],
}

deserializer = AvroDeserializer(
    schema_registry_client_config=schema_registry_client_config,
    schema_registry_serialization_config=schema_registry_serialization_config,
)
serializer = AvroSerializer(
    schema=MY_SCHEMA,
    schema_registry_client_config=schema_registry_client_config,
    schema_registry_serialization_config=schema_registry_serialization_config,
)
```

## Protobuf

For both the serializer and deserializer, `msg_type` must be provided.

```python
from quixstreams.models.serializers.protobuf import ProtobufDeserializer, ProtobufSerializer

from my_input_models_pb2 import InputProto
from my_output_models_pb2 import OutputProto

deserializer = ProtobufDeserializer(
    msg_type=InputProto,
    schema_registry_client_config=schema_registry_client_config,
    schema_registry_serialization_config=schema_registry_serialization_config,
)
serializer = ProtobufSerializer(
    msg_type=OutputProto,
    schema_registry_client_config=schema_registry_client_config,
    schema_registry_serialization_config=schema_registry_serialization_config,
)
```

See the [Serialization and Deserialization](./serialization.md) page to learn more about how to integrate the serializer and deserializer with your application.
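As a rough sketch of that integration (the broker address, registry URL, and topic names below are illustrative assumptions, not part of this commit), the Avro pair might be attached to topics like this:

```python
# Hedged sketch: wiring Schema Registry-aware Avro (de)serializers into topics.
# Broker address, registry URL, and topic names are illustrative assumptions.
from quixstreams import Application
from quixstreams.models import SchemaRegistryClientConfig
from quixstreams.models.serializers.avro import AvroDeserializer, AvroSerializer

MY_SCHEMA = {
    "type": "record",
    "name": "testschema",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "id", "type": "int", "default": 0},
    ],
}

client_config = SchemaRegistryClientConfig(url="localhost:8081")

app = Application(broker_address="localhost:9092")

# The deserializer needs no schema: it fetches the writer's schema from the registry.
input_topic = app.topic(
    "input",
    value_deserializer=AvroDeserializer(schema_registry_client_config=client_config),
)
output_topic = app.topic(
    "output",
    value_serializer=AvroSerializer(
        schema=MY_SCHEMA,
        schema_registry_client_config=client_config,
    ),
)
```

This is a configuration fragment; it requires a reachable Kafka broker and Schema Registry to actually produce or consume messages.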

docs/advanced/serialization.md

Lines changed: 22 additions & 25 deletions
@@ -1,6 +1,6 @@
 # Serialization and Deserialization
 
-Quix Streams supports multiple serialization formats to exchange data between Kafka topics:
+Quix Streams supports multiple serialization formats for exchanging data between Kafka topics:
 
 - `bytes`
 - `string`
@@ -10,20 +10,22 @@ Quix Streams supports multiple serialization formats to exchange data between Ka
 - `avro`
 - `protobuf`
 
-The serialization settings are defined per-topic using these parameters of `Application.topic()` function:
+The serialization settings are defined per topic using these parameters of the `Application.topic()` function:
 
 - `key_serializer`
 - `value_serializer`
 - `key_deserializer`
 - `value_deserializer`
 
-By default, message values are serialized with `json` and message keys are serialized with `bytes` (i.e. passed as they are received from Kafka).
+By default, message values are serialized with `json`, and message keys are serialized with `bytes` (i.e., passed as they are received from Kafka).
 
-Note: The legacy `quix` serializer and legacy `quix_events` and `quix_timeseries` deserializers are still supported but may be deprecated in future. New stream processing applications should avoid using these 3 formats.
+**Note:** JSON Schema, Avro, and Protobuf serialization formats support integration with a Schema Registry. See the [Schema Registry](./schema-registry.md) page to learn more.
+
+**Note:** The legacy `quix` serializer and legacy `quix_events` and `quix_timeseries` deserializers are still supported but may be deprecated in the future. New stream processing applications should avoid using these three formats.
 
 ## Configuring Serialization
-To set a serializer, you may either pass a string shorthand for it, or an instance of `quixstreams.models.serializers.Serializer` and `quixstreams.models.serializers.Deserializer` directly
-to the `Application.topic()`.
+
+To set a serializer, you can either pass a string shorthand for it or an instance of `quixstreams.models.serializers.Serializer` and `quixstreams.models.serializers.Deserializer` directly to the `Application.topic()` function.
 
 **Example:**
 
@@ -37,7 +39,6 @@ input_topic = app.topic('input', value_deserializer='json', key_deserializer='st
 output_topic = app.topic('output', value_serializer='json', key_deserializer='bytes')
 ```
 
-
 Passing `Serializer` and `Deserializer` instances directly:
 
 ```python
@@ -49,15 +50,13 @@ input_topic = app.topic('input', value_deserializer=JSONDeserializer())
 output_topic = app.topic('output', value_serializer=JSONSerializer())
 ```
 
-You can find all available serializers in `quixstreams.models.serializers` module.
+You can find all available serializers in the `quixstreams.models.serializers` module.
 
-## Jsonschema support
+## JSON Schema Support
 
-The json serializer and deserializer support validation of the data against a jsonschema.
+The JSON serializer and deserializer support data validation against a JSON Schema.
 
 ```python
-from jsonschema import Draft202012Validator
-
 from quixstreams import Application
 from quixstreams.models import JSONDeserializer, JSONSerializer
 
@@ -76,15 +75,14 @@ output_topic = app.topic('output', value_serializer=JSONSerializer(schema=MY_SCH
 ```
 
 ## Avro
-Apache Avro is a row-based binary serialization format data. Avro stores the schema in JSON format alongside the data, enabling efficient processing and schema evolution.
 
-You can learn more the Apache Avro format [here](https://avro.apache.org/docs/).
-The Avro serializer and deserializer need to be passed explicitly.
+Apache Avro is a row-based binary serialization format. Avro stores the schema in JSON format alongside the data, enabling efficient processing and schema evolution.
 
-In the current version, the schema must be provided manually.
+You can learn more about the Apache Avro format [here](https://avro.apache.org/docs/).
+The Avro serializer and deserializer need to be passed explicitly and must include the schema.
 
-> ***WARNING***: Avro serializer and deserializer require the `fastavro` library.
-> You can install quixstreams with the necessary dependencies using
+> **WARNING**: The Avro serializer and deserializer require the `fastavro` library.
+> You can install Quix Streams with the necessary dependencies using:
 > `pip install quixstreams[avro]`
 
 ```python
@@ -106,15 +104,14 @@ output_topic = app.topic('output', value_serializer=AvroSerializer(schema=MY_SCH
 ```
 
 ## Protobuf
-Protocol Buffers are language-neutral, platform-neutral extensible mechanisms for serializing structured data.
 
-You can learn more about the Protocol buffers format [here](https://protobuf.dev/)
-The Protobuf serializer and deserializer need to be passed explicitly.
+Protocol Buffers are language-neutral, platform-neutral extensible mechanisms for serializing structured data.
 
-In the current version, the schema must be provided manually.
+You can learn more about the Protocol Buffers format [here](https://protobuf.dev/).
+The Protobuf serializer and deserializer need to be passed explicitly and must include the schema.
 
-> ***WARNING***: The protobuf serializer and deserializer requires the protobuf library.
-> You can install quixstreams with the necessary dependencies using
+> **WARNING**: The Protobuf serializer and deserializer require the `protobuf` library.
+> You can install Quix Streams with the necessary dependencies using:
 > `pip install quixstreams[protobuf]`
 
 ```python
@@ -129,4 +126,4 @@ input_topic = app.topic('input', value_deserializer=ProtobufDeserializer(msg_typ
 output_topic = app.topic('output', value_serializer=ProtobufSerializer(msg_type=OutputProto))
 ```
 
-By default the protobuf deserializer will deserialize the message to a python dictionary. Doing it has a big performance impact. You can disable this behavior by initializing the deserializer with `to_dict` set to `False`. The protobuf message object will then be used as the message value limiting the available StreamingDataframe API.
+By default, the Protobuf deserializer will deserialize the message to a Python dictionary. Doing this has a big performance impact. You can disable this behavior by initializing the deserializer with `to_dict` set to `False`. The Protobuf message object will then be used as the message value, limiting the available StreamingDataframe API.
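The `to_dict` trade-off in that last paragraph might be configured as sketched below (the broker address, topic name, and generated Protobuf module are illustrative assumptions, not part of this commit):

```python
# Hedged sketch: disabling dict conversion in the Protobuf deserializer.
# Broker address, topic name, and the generated *_pb2 module are assumptions.
from quixstreams import Application
from quixstreams.models.serializers.protobuf import ProtobufDeserializer

from my_input_models_pb2 import InputProto  # hypothetical generated module

app = Application(broker_address="localhost:9092")

# to_dict=False skips the dict conversion for performance; message values are
# then raw InputProto instances, which limits the StreamingDataframe API.
input_topic = app.topic(
    "input",
    value_deserializer=ProtobufDeserializer(msg_type=InputProto, to_dict=False),
)
```

This is a configuration fragment; it assumes `my_input_models_pb2` has been generated with `protoc` and a Kafka broker is reachable.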

docs/build/build.py

Lines changed: 5 additions & 1 deletion
@@ -80,8 +80,12 @@
         "serialization.md": {
             k: None
             for k in [
-                "quixstreams.models.serializers.quix",
                 "quixstreams.models.serializers.simple_types",
+                "quixstreams.models.serializers.json",
+                "quixstreams.models.serializers.avro",
+                "quixstreams.models.serializers.protobuf",
+                "quixstreams.models.serializers.schema_registry",
+                "quixstreams.models.serializers.quix",
             ]
         },
         "kafka.md": {
"kafka.md": {

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ nav:
   - Advanced Usage:
     - Checkpointing: advanced/checkpointing.md
     - Serialization Formats: advanced/serialization.md
+    - Schema Registry: advanced/schema-registry.md
     - Stateful Processing: advanced/stateful-processing.md
     - Managing Kafka Topics: advanced/topics.md
     - Using Producer & Consumer: advanced/producer-consumer-lowlevel.md
