docs/connectors/sources/README.md
79 additions & 15 deletions
@@ -9,41 +9,42 @@ from quixstreams import Application
 from quixstreams.sources import CSVSource

 def main():
-    app = Application()
-    source = CSVSource(path="input.csv")
-
-    sdf = app.dataframe(source=source)
-    sdf.print(metadata=True)
-
-    app.run()
-
+    app = Application()
+    source = CSVSource(path="input.csv")
+
+    sdf = app.dataframe(source=source)
+    sdf.print(metadata=True)
+
+    app.run()
+
 if __name__ == "__main__":
-    main()
+    main()
 ```

 ## Supported sources

-Quix streams provide a source out of the box.
+Quix Streams provides the following sources out of the box:

 * [CSVSource](./csv-source.md): A source that reads data from a single CSV file.
 * [KafkaReplicatorSource](./kafka-source.md): A source that replicates a topic from a Kafka broker to your application broker.
 * [QuixEnvironmentSource](./quix-source.md): A source that replicates a topic from a Quix Cloud environment to your application broker.

-You can also implement your own, have a look at [Creating a Custom Source](custom-sources.md) for documentation on how to do that.
+To create a custom source, read [Creating a Custom Source](custom-sources.md).

 ## Multiprocessing

 For good performance, each source runs in a subprocess. Quix Streams automatically manages the subprocess's setting up, monitoring, and tearing down.

 For multiplatform support, Quix Streams starts the source process using the [spawn](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) approach. As a side effect, each Source instance must be pickleable. If a source needs to handle unpickleable objects, it's best to initialize those in the source subprocess (in the `BaseSource.start` or `Source.run` methods).

-## Topics
+## Customize Topic Configuration

-Sources work by sending data to Kafka topics. Then StreamingDataFrames consume these topics.
+Sources work by sending data to intermediate Kafka topics, which StreamingDataFrames then consume and process.

-Each source provides a default topic based on its configuration. You can override the default topic by specifying a topic using the `app.dataframe()` method.
+By default, each Source provides a default topic based on its configuration.
+To customize the topic config, pass a new `Topic` object to the `app.dataframe()` method together with the Source instance.

-**Example**
+**Example:**

 Provide a custom topic with four partitions to the source.

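The pickling requirement described under Multiprocessing in the hunk above can be illustrated with the standard library alone. The sketch below is a minimal illustration, not Quix Streams code: `EagerSource`, `LazySource`, and `is_pickleable` are hypothetical names, and `LazySource.run` only mimics the idea of creating an unpickleable object (here a `threading.Lock`) inside the subprocess instead of in `__init__`.

```python
import pickle
import threading


class EagerSource:
    """Hypothetical source that creates an unpickleable object in __init__."""

    def __init__(self, path):
        self.path = path
        self.lock = threading.Lock()  # lock objects cannot be pickled


class LazySource:
    """Hypothetical source that defers unpickleable state until run()."""

    def __init__(self, path):
        self.path = path
        self.lock = None  # placeholder; safe to pickle

    def run(self):
        # Assumption: this is called in the subprocess, after unpickling.
        self.lock = threading.Lock()
        with self.lock:
            return f"reading {self.path}"


def is_pickleable(obj):
    """Return True if pickle.dumps succeeds on obj, as the spawn start method needs."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


if __name__ == "__main__":
    print(is_pickleable(EagerSource("input.csv")))
    print(is_pickleable(LazySource("input.csv")))
```

Run as a script, this prints `False` for `EagerSource` and `True` for `LazySource`: the instance that holds a lock from construction cannot be sent to a spawned process, while the one that creates the lock inside `run` can.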
@@ -54,9 +55,14 @@ from quixstreams.models.topics import TopicConfig

 def main():
     app = Application()
+    # Create a CSVSource
     source = CSVSource(path="input.csv")
+
+    # Define a topic for the CSVSource with a custom config