Commit 118512d

New README (#401)
1 parent 23a5882 commit 118512d

2 files changed

+64
-125
lines changed

README.md

Lines changed: 64 additions & 125 deletions
Original file line number | Diff line number | Diff line change
@@ -1,33 +1,28 @@
1-
![Quix - React to data, fast](https://github.com/quixio/quix-streams/blob/main/images/quixstreams-banner.jpg)
1+
![Quix - React to data, fast](./images/quixstreams-banner.png)
22

3-
[![Docs](https://img.shields.io/badge/-Docs-red?logo=read-the-docs)](https://quix.io/docs/quix-streams/introduction.html)
3+
[![GitHub Version](https://img.shields.io/github/tag-pre/quixio/quix-streams.svg?label=Version&color=008dff)](https://github.com/quixio/quix-streams/releases)
4+
![PyPI License](https://img.shields.io/pypi/l/quixstreams?label=Licence&color=008dff)
5+
[![Docs](https://img.shields.io/badge/docs-quix.io-0345b2?label=Docs&color=008dff)](https://quix.io/docs/quix-streams/introduction.html) \
46
[![Community Slack](https://img.shields.io/badge/Community%20Slack-blueviolet?logo=slack)](https://quix.io/slack-invite)
5-
[![Linkedin](https://img.shields.io/badge/LinkedIn-0A66C2.svg?logo=linkedin)](https://www.linkedin.com/company/70925173/)
6-
[![Quix on Twitter](https://img.shields.io/twitter/url?label=Twitter&style=social&url=https%3A%2F%2Ftwitter.com%2Fquix_io)](https://twitter.com/quix_io)
7+
[![YouTube](https://img.shields.io/badge/-YouTube-FF0000?logo=youtube)](https://www.youtube.com/@QuixStreams)
8+
[![LinkedIn](https://img.shields.io/badge/LinkedIn-0A66C2.svg?logo=linkedin)](https://www.linkedin.com/company/70925173/)
9+
[![X](https://img.shields.io/twitter/url?label=X&style=social&url=https%3A%2F%2Ftwitter.com%2Fquix_io)](https://twitter.com/quix_io)
710

8-
# Quix Streams
11+
# 100% Python Stream Processing for Kafka
912

10-
Quix Streams is a cloud native library for processing data in Kafka using pure Python. It’s designed to give you the power of a distributed system in a lightweight library by combining the low-level scalability and resiliency features of Kafka with an easy to use Python interface (to ease newcomers to stream processing).
13+
Quix Streams is a cloud-native library for processing data in Kafka using pure Python. It’s designed to give you the power of a distributed system in a lightweight library by combining Kafka's low-level scalability and resiliency features with an easy-to-use Python interface that eases newcomers into stream processing.
1114

12-
Quix Streams has the following benefits:
13-
14-
- Pure Python (no JVM, no wrappers, no cross-language debugging).
15-
- No orchestrator, no server-side engine.
15+
It has the following benefits:
1616
- Streaming DataFrame API (similar to pandas DataFrame) for tabular data transformations.
17-
- Easily integrates with the entire Python ecosystem (pandas, scikit-learn, TensorFlow, PyTorch etc).
18-
- Support for many serialization formats, including JSON (and Quix-specific).
19-
- Support for stateful operations using RocksDB.
20-
- Support for aggregations over tumbling and hopping time windows.
21-
- "At-least-once" and "exactly-once" Kafka processing guarantees.
22-
- Designed to run and scale resiliently via container orchestration (like Kubernetes).
23-
- Easily runs locally and in Jupyter Notebook for convenient development and debugging.
24-
- Seamless integration with the fully managed [Quix Cloud](https://quix.io/product) platform.
25-
26-
Use Quix Streams to build event-driven, machine learning/AI or physics-based applications that depend on real-time data from Kafka.
17+
- Custom stateful operations via a state object.
18+
- Custom reducing and aggregating over tumbling and hopping time windows.
19+
- Exactly-once processing semantics via Kafka transactions.
20+
- Pure Python with no need for a server-side engine.
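To make the tumbling and hopping window terms above concrete, here is a minimal pure-Python sketch of how a timestamp maps to window start times. It does not use the Quix Streams API: the function names `tumbling_window_start`, `hopping_window_starts` and `sum_per_tumbling_window` are hypothetical, and windows are assumed to be aligned to timestamp 0 and to cover `[start, start + duration)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tumbling_window_start(timestamp_ms: int, duration_ms: int) -> int:
    """Each timestamp belongs to exactly one non-overlapping window."""
    return timestamp_ms - (timestamp_ms % duration_ms)


def hopping_window_starts(timestamp_ms: int, duration_ms: int, step_ms: int) -> List[int]:
    """Windows overlap: a new one starts every step_ms and lasts duration_ms."""
    start = timestamp_ms - (timestamp_ms % step_ms)  # latest window containing the timestamp
    starts = []
    while start > timestamp_ms - duration_ms and start >= 0:
        starts.append(start)
        start -= step_ms
    return sorted(starts)


def sum_per_tumbling_window(
    events: Iterable[Tuple[int, float]], duration_ms: int
) -> Dict[int, float]:
    """Aggregate (timestamp_ms, value) pairs into a sum per window start."""
    totals: Dict[int, float] = defaultdict(float)
    for timestamp_ms, value in events:
        totals[tumbling_window_start(timestamp_ms, duration_ms)] += value
    return dict(totals)


events = [(1_000, 1.0), (4_000, 2.0), (11_000, 5.0)]
print(sum_per_tumbling_window(events, duration_ms=10_000))
print(hopping_window_starts(25, duration_ms=20, step_ms=10))
```

In this sketch a hopping window with `step_ms` equal to `duration_ms` degenerates into a tumbling window, which is a quick way to sanity-check the two functions against each other.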
2721

22+
Use Quix Streams to build simple Kafka producer/consumer applications or leverage stream processing to build complex event-driven systems, real-time data pipelines and AI/ML products.
2823

29-
## Getting started 🏄
3024

25+
## Getting Started 🏄
3126

3227
### Install Quix Streams
3328

@@ -40,90 +35,44 @@ Python 3.8+, Apache Kafka 0.10+
4035

4136
See [requirements.txt](https://github.com/quixio/quix-streams/blob/main/requirements.txt) for the full list of requirements
4237

43-
## Documentation
38+
### Documentation
4439
[Quix Streams Docs](https://quix.io/docs/quix-streams/introduction.html)
4540

46-
### Example Application
41+
### Example
4742

4843
Here's an example of how to <b>process</b> data from a Kafka Topic with Quix Streams:
4944

5045
```python
51-
from quixstreams import Application, State
46+
from quixstreams import Application
5247

53-
# Define an application
48+
# A minimal application reading temperature data in Celsius from the Kafka topic,
49+
# converting it to Fahrenheit and producing alerts to another topic.
50+
51+
# Define an application that will connect to Kafka
5452
app = Application(
5553
broker_address="localhost:9092", # Kafka broker address
56-
consumer_group="consumer-group-name", # Kafka consumer group
5754
)
5855

59-
# Define the input and output topics. By default, "json" serialization will be used
60-
input_topic = app.topic("my_input_topic")
61-
output_topic = app.topic("my_output_topic")
62-
63-
64-
def count(data: dict, state: State):
65-
# Get a value from state for the current Kafka message key
66-
total = state.get('total', default=0)
67-
total += 1
68-
# Set a value back to the state
69-
state.set('total', total)
70-
# Update your message data with a value from the state
71-
data['total'] = total
72-
73-
74-
# Create a StreamingDataFrame instance
75-
# StreamingDataFrame is a primary interface to define the message processing pipeline
76-
sdf = app.dataframe(topic=input_topic)
77-
78-
# Print the incoming messages
79-
sdf = sdf.update(lambda value: print('Received a message:', value))
56+
# Define the Kafka topics
57+
temperature_topic = app.topic("temperature-celsius", value_deserializer="json")
58+
alerts_topic = app.topic("temperature-alerts", value_serializer="json")
8059

81-
# Select fields from incoming messages
82-
sdf = sdf[["field_1", "field_2", "field_3"]]
60+
# Create a Streaming DataFrame connected to the input Kafka topic
61+
sdf = app.dataframe(topic=temperature_topic)
8362

84-
# Filter only messages with "field_0" > 10 and "field_2" != "test"
85-
sdf = sdf[(sdf["field_1"] > 10) & (sdf["field_2"] != "test")]
63+
# Convert temperature to Fahrenheit by transforming the input message (with an anonymous or user-defined function)
64+
sdf = sdf.apply(lambda value: {"temperature_F": (value["temperature"] * 9/5) + 32})
8665

87-
# Filter messages using custom functions
88-
sdf = sdf[sdf.apply(lambda value: 0 < (value['field_1'] + value['field_3']) < 1000)]
66+
# Filter values above the threshold
67+
sdf = sdf[sdf["temperature_F"] > 150]
8968

90-
# Generate a new value based on the current one
91-
sdf = sdf.apply(lambda value: {**value, 'new_field': 'new_value'})
92-
93-
# Update a value based on the entire message content
94-
sdf['field_4'] = sdf.apply(lambda value: value['field_1'] + value['field_3'])
95-
96-
# Use a stateful function to persist data to the state store and update the value in place
97-
sdf = sdf.update(count, stateful=True)
98-
99-
# Print the result before producing it
100-
sdf = sdf.update(lambda value, ctx: print('Producing a message:', value))
101-
102-
# Produce the result to the output topic
103-
sdf = sdf.to_topic(output_topic)
104-
105-
if __name__ == "__main__":
106-
# Run the streaming application
107-
app.run(sdf)
69+
# Produce alerts to the output topic
70+
sdf = sdf.to_topic(alerts_topic)
10871

72+
# Run the streaming application
73+
app.run(sdf)
10974
```
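The conversion and filter steps in the example can be checked without a running Kafka broker by writing them as plain functions. The sketch below is only an illustration: `to_fahrenheit` and `is_alert` are hypothetical names that mirror the lambda passed to `sdf.apply()` and the `sdf["temperature_F"] > 150` filter, and the sample readings are made up.

```python
def to_fahrenheit(value: dict) -> dict:
    """Same arithmetic as the lambda passed to sdf.apply() in the example."""
    return {"temperature_F": (value["temperature"] * 9 / 5) + 32}


def is_alert(value: dict, threshold_f: float = 150) -> bool:
    """Same condition as the sdf[sdf["temperature_F"] > 150] filter."""
    return value["temperature_F"] > threshold_f


# Made-up Celsius readings standing in for messages from the input topic
readings = [{"temperature": 20}, {"temperature": 70}, {"temperature": 100}]

alerts = [converted for converted in map(to_fahrenheit, readings) if is_alert(converted)]
print(alerts)  # [{'temperature_F': 158.0}, {'temperature_F': 212.0}]
```

Named functions like these can also be passed to `sdf.apply()` in place of the lambda, which keeps the pipeline logic unit-testable.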
11075

111-
112-
### How It Works
113-
There are two primary components:
114-
- `StreamingDataFrame` - a predefined declarative pipeline to process and transform incoming messages.
115-
- `Application` - to manage the Kafka-related setup & teardown and message lifecycle (consuming, committing). It processes each message with the dataframe you provide it.
116-
117-
Under the hood, the `Application` will:
118-
- Consume a message.
119-
- Deserialize it.
120-
- Process it with your `StreamingDataFrame`.
121-
- Produce it to the output topic.
122-
- Automatically commit the topic offset and state updates after the message is processed.
123-
- React to Kafka rebalancing updates and manage the topic partitions.
124-
- Manage the State store.
125-
- Handle OS signals and gracefully exit the application.
126-
12776
### Tutorials
12877

12978
To see Quix Streams in action, check out the Quickstart and Tutorials in the docs:
@@ -134,64 +83,54 @@ To see Quix Streams in action, check out the Quickstart and Tutorials in the doc
13483
- [**Tutorial - Purchase Filtering**](https://quix.io/docs/quix-streams/tutorials/purchase-filtering/tutorial.html)
13584

13685

137-
### Using the [Quix Cloud](https://quix.io/)
86+
### Key Concepts
87+
There are two primary objects:
88+
- `StreamingDataFrame` - a predefined declarative pipeline to process and transform incoming messages.
89+
- `Application` - manages the Kafka-related setup, teardown and message lifecycle (consuming, committing). It processes each message with the dataframe you provide.
90+
91+
Under the hood, the `Application` will:
92+
- Consume and deserialize messages.
93+
- Process them with your `StreamingDataFrame`.
94+
- Produce the results to the output topic.
95+
- Automatically checkpoint processed messages and state for resiliency.
96+
- Scale using Kafka's built-in consumer groups mechanism.
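The steps listed above can be pictured as a simple loop. The following is a conceptual sketch only, not the actual Quix Streams internals: `run_pipeline`, `pipeline` and `produce` are hypothetical names, messages are modelled as an in-memory list of JSON bytes, and a checkpoint is reduced to remembering the last processed offset (consumer groups and state stores are left out).

```python
import json
from typing import Callable, Iterable, Optional


def run_pipeline(
    raw_messages: Iterable[bytes],
    pipeline: Callable[[dict], Optional[dict]],
    produce: Callable[[bytes], None],
) -> int:
    """Consume, deserialize, process, produce and checkpoint each message.

    Returns the last checkpointed offset, or -1 if nothing was processed.
    """
    committed_offset = -1
    for offset, raw in enumerate(raw_messages):      # consume
        value = json.loads(raw)                      # deserialize
        result = pipeline(value)                     # process
        if result is not None:                       # None means "filtered out"
            produce(json.dumps(result).encode())     # serialize and produce
        committed_offset = offset                    # checkpoint (simplified)
    return committed_offset


def celsius_alerts(value: dict) -> Optional[dict]:
    """Hypothetical pipeline: convert to Fahrenheit, keep readings above 150."""
    fahrenheit = (value["temperature"] * 9 / 5) + 32
    return {"temperature_F": fahrenheit} if fahrenheit > 150 else None


output_topic = []  # stands in for the output Kafka topic
last_offset = run_pipeline(
    [b'{"temperature": 100}', b'{"temperature": 20}'],
    celsius_alerts,
    output_topic.append,
)
print(last_offset, output_topic)
```

A real application would not commit after every message; the per-message checkpoint here is a simplification to keep the loop readable.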
13897

139-
This library doesn't have any dependency on any commercial products, but if you use it together with Quix Cloud you will get some advantages out of the box during your development process such as:
140-
- Auto-configuration.
141-
- Monitoring.
142-
- Data explorer.
143-
- Data persistence.
144-
- Pipeline visualization.
145-
- Metrics.
14698

147-
and more.
99+
### Deployment
100+
You can run Quix Streams pipelines anywhere Python is installed.
148101

149-
Quix Streams provides a seamless integration with Quix Cloud, and it can automatically configure the `Application` using Quix SDK Token.
102+
Deploy to your own infrastructure or to [Quix Cloud](https://quix.io/product) on AWS, Azure, GCP or on-premise for a fully managed platform.
103+
You'll get self-service DevOps, CI/CD and monitoring, all built with best-in-class engineering practices learned from Formula 1 Racing.
150104

151105
Please see the [**Connecting to Quix Cloud**](https://quix.io/docs/quix-streams/quix-platform.html) page
152106
to learn how to use Quix Streams and Quix Cloud together.
153107

154-
### What's Next
108+
## Roadmap 📍
155109

156-
This library is being actively developed.
110+
This library is being actively developed by a full-time team.
157111

158112
Here are some of the planned improvements:
159113

160114
- [x] [Windowed aggregations over Tumbling & Hopping windows](https://quix.io/docs/quix-streams/v2-0-latest/windowing.html)
161-
- [x] [State recovery based on Kafka changelog topics](https://quix.io/docs/quix-streams/advanced/stateful-processing.html#fault-tolerance-recovery)
115+
- [x] [Stateful operations and recovery based on Kafka changelog topics](https://quix.io/docs/quix-streams/advanced/stateful-processing.html)
162116
- [x] [Group-by operation](https://quix.io/docs/quix-streams/groupby.html)
163-
- [X] ["Exactly Once" delivery guarantees for Kafka message processing (AKA transactions)](https://quix.io/docs/quix-streams/configuration.html#processing-guarantees)
117+
- [x] ["Exactly Once" delivery guarantees for Kafka message processing (AKA transactions)](https://quix.io/docs/quix-streams/configuration.html#processing-guarantees)
164118
- [ ] Joins
165119
- [ ] Windowed aggregations over Sliding windows
166120
- [ ] Support for Avro and Protobuf formats
167121
- [ ] Schema Registry support
168122

169123

170-
To find out when the next version is ready, make sure you watch this repo
171-
and join our [Quix Community on Slack](https://quix.io/slack-invite)!
172-
173-
## Contribution Guide
174-
175-
Contributing is a great way to learn and we especially welcome those who haven't contributed to an OSS project before.
176-
<br>
177-
We're very open to any feedback or code contributions to this OSS project ❤️.
178-
179-
Before contributing, please read our [Contributing](https://github.com/quixio/quix-streams/blob/main/CONTRIBUTING.md) file for how you can best give feedback and contribute.
180-
181-
## Need help?
182-
183-
If you run into any problems, please create an [issue](https://github.com/quixio/quix-streams/issues) or ask in #quix-help in our [Quix Community on Slack](https://quix.io/slack-invite).
184-
185-
## Community 👭
186-
187-
Join the [Quix Community on Slack](https://quix.io/slack-invite), a vibrant group of Python developers, data enthusiasts and newcomers to Apache Kafka, who are learning and leveraging Quix Streams for real-time data processing.
188-
189-
## License
124+
## Get Involved 🤝
190125

191-
Quix Streams is licensed under the Apache 2.0 license. View a copy of the License file [here](https://github.com/quixio/quix-streams/blob/main/LICENSE).
126+
- Please use [GitHub issues](https://github.com/quixio/quix-streams/issues) to report bugs and suggest new features.
127+
- Join the [Quix Community on Slack](https://quix.io/slack-invite), a vibrant group of Kafka Python developers, data engineers and newcomers to Apache Kafka, who are learning and leveraging Quix Streams for real-time data processing.
128+
- Watch and subscribe to [@QuixStreams on YouTube](https://www.youtube.com/@QuixStreams) for code-along tutorials from scratch and interesting community highlights.
129+
- Follow us on [X](https://x.com/Quix_io) and [LinkedIn](https://www.linkedin.com/company/70925173) where we share our latest tutorials, forthcoming community events and the occasional meme.
130+
- If you have any questions or feedback - write to us at support@quix.io!
192131

193-
## Stay in touch 👋
194132

195-
You can follow us on [Twitter](https://twitter.com/quix_io) and [Linkedin](https://www.linkedin.com/company/70925173) where we share our latest tutorials, forthcoming community events and the occasional meme.
133+
## License 📗
196134

197-
If you have any questions or feedback - write to us at support@quix.io!
135+
Quix Streams is licensed under the Apache 2.0 license.
136+
View a copy of the License file [here](https://github.com/quixio/quix-streams/blob/main/LICENSE).

images/quixstreams-banner.png

245 KB