
Workload Configuration

Srujun Thanmay Gupta edited this page Mar 15, 2018 · 15 revisions

Source Streams Configuration

Use this section to define the properties of each source stream. Each message is generated by sampling the configured distributions: one for the message key and one for the message length.

  • name: used as reference in transformations
  • num_keys: number of keys used in this stream
  • key_dist: distribution of key usage, either UniformIntegerDistribution or ZipfDistribution
  • key_dist_params: params supplied to the distribution class
  • msg_dist: distribution of message length, currently only UniformIntegerDistribution
  • msg_dist_params: params supplied to the distribution class
  • rate: rate of messages produced per second
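As a rough illustration of the sampling described above, the following Python sketch draws a key from a Zipf distribution and a message length from a uniform integer distribution. It is not the generator's actual code: the helper names (`make_key_sampler`, `generate_message`), the key range 1..num_keys, the inclusive length bounds, and the filler payload are all assumptions made for the example.

```python
import random

def make_key_sampler(num_keys, exponent, rng):
    """Return a sampler over keys 1..num_keys with Zipf weights 1 / k**exponent.

    Assumption: keys are the integers 1..num_keys, as in a standard Zipf law.
    """
    keys = list(range(1, num_keys + 1))
    weights = [1.0 / (k ** exponent) for k in keys]

    def sample():
        return rng.choices(keys, weights=weights, k=1)[0]

    return sample

def generate_message(sample_key, lower, upper, rng):
    """Build one (key, value) pair; the value length is uniform in [lower, upper].

    Assumption: both bounds are inclusive and the payload is filler bytes.
    """
    length = rng.randint(lower, upper)
    return sample_key(), b"x" * length

# Parameters mirror the "s1" source in the sample configuration.
rng = random.Random(42)
sample_key = make_key_sampler(num_keys=10, exponent=1.2, rng=rng)
messages = [generate_message(sample_key, 100, 1000, rng) for _ in range(1000)]
```

With a Zipf exponent above 1, low-numbered keys dominate the stream, which is useful for simulating skewed (hot-key) workloads.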

Transformations

Each transformation specifies the following information:

  • input: for a 1-to-n transformation, the name of the source or transformation whose output this transformation consumes
  • inputs: for an m-to-n transformation, the list of sources or transformations whose outputs this transformation consumes
  • operator: the type of transformation to run, either a short name string (such as "filter") or the fully qualified class name of the operator
  • params: parameters used by the operator function, described in the section below
  • cpu_load: a double value between 0.0 and 1.0 representing how much load to put on the CPU (default 0.0)
  • mem_load: need to define... (default 0.0)
  • disk_load: need to define... (default 0.0)
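The fields above lend themselves to a simple sanity check before a workload is run. The sketch below is a hypothetical validator, not part of the tool: `validate_transformation` is an invented name, and the rule that exactly one of `input` or `inputs` must be present is an assumption inferred from the field descriptions. `mem_load` and `disk_load` are not checked because their semantics are not yet defined.

```python
def validate_transformation(name, spec):
    """Return a list of problems found in one transformation entry (empty if none)."""
    problems = []

    # Assumption: a transformation uses either 'input' or 'inputs', never both.
    if ("input" in spec) == ("inputs" in spec):
        problems.append(f"{name}: specify exactly one of 'input' or 'inputs'")
    if "inputs" in spec and not isinstance(spec["inputs"], list):
        problems.append(f"{name}: 'inputs' must be a list")

    if "operator" not in spec:
        problems.append(f"{name}: missing 'operator'")

    # cpu_load is documented as a double between 0.0 and 1.0 (default 0.0).
    cpu_load = spec.get("cpu_load", 0.0)
    if not 0.0 <= cpu_load <= 1.0:
        problems.append(f"{name}: 'cpu_load' must be between 0.0 and 1.0")

    return problems

ok = validate_transformation(
    "t1", {"operator": "filter", "input": "s1", "params": {"p": 0.5}}
)
bad = validate_transformation(
    "t9", {"operator": "filter", "input": "s1", "cpu_load": 1.5}
)
```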

Stateless Transformations

Filter

1-to-1 transformation that removes messages from the incoming stream.

Parameters:

  • p: probability of dropping the message (0 <= p < 1)
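A minimal Python sketch of this behaviour, assuming each message is dropped independently with probability p; `filter_stream` is a hypothetical name used only for illustration.

```python
import random

def filter_stream(messages, p, rng):
    """Yield each message with probability 1 - p (drop it with probability p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    for message in messages:
        # rng.random() is uniform in [0.0, 1.0), so this keeps everything when p == 0.
        if rng.random() >= p:
            yield message

kept_all = list(filter_stream(range(100), 0.0, random.Random(0)))
kept_half = list(filter_stream(range(10_000), 0.5, random.Random(0)))
```

With p = 0.5, roughly half of the input messages should survive, so a downstream operator sees about half the input rate.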

Split

1-to-n transformation that sends the input stream to n output streams. If ratio is 1 or unspecified, the input stream messages are copied unchanged to the respective output streams.

Parameters:

  • n: number of output streams
  • ratio: ratio of output message size to input message size
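The following Python sketch shows one possible reading of these parameters: every input message is sent to all n outputs, and each copy is resized by `ratio`. How the real operator produces a resized payload is not described here, so the truncate-or-pad strategy in the hypothetical `resize` helper is purely an assumption.

```python
def resize(value, ratio):
    """Return a payload whose length is int(len(value) * ratio).

    Assumption: shrink by truncating, grow by padding with zero bytes.
    """
    target = int(len(value) * ratio)
    if target <= len(value):
        return value[:target]
    return value + b"\0" * (target - len(value))

def split(message, n, ratio=1.0):
    """Return n output messages derived from one (key, value) input message."""
    key, value = message
    return [(key, resize(value, ratio)) for _ in range(n)]

halved = split(("k", b"a" * 100), n=2, ratio=0.5)
copied = split(("k", b"abc"), n=3)
```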

Modify

1-to-1 transformation to change the size of individual messages or the rate of messages passing in a stream.

Parameters:

  • size_ratio: ratio of output message size to input message size
  • rate_ratio: ratio of messages output per input message (1 implies that the rate is the same)
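One way to picture the two ratios is the Python sketch below. It is an illustration under assumptions, not the operator's implementation: `modify` and `_resize` are hypothetical names, the payload is resized by truncating or padding, and fractional rate ratios are handled with a running credit so that, for example, a rate_ratio of 0.5 emits one message for every two inputs.

```python
def _resize(value, ratio):
    """Assumed resizing strategy: truncate to shrink, pad with zero bytes to grow."""
    target = int(len(value) * ratio)
    if target <= len(value):
        return value[:target]
    return value + b"\0" * (target - len(value))

def modify(messages, size_ratio=1.0, rate_ratio=1.0):
    """Yield resized messages, emitting rate_ratio outputs per input on average."""
    credit = 0.0
    for key, value in messages:
        credit += rate_ratio
        # Emit one output for every whole unit of accumulated credit.
        while credit >= 1.0:
            credit -= 1.0
            yield key, _resize(value, size_ratio)

inputs = [(i, b"x" * 100) for i in range(10)]
slower = list(modify(inputs, rate_ratio=0.5))
faster = list(modify(inputs, rate_ratio=2.0))
smaller = list(modify(inputs, size_ratio=0.5))
```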

Merge

n-to-1 transformation that takes messages from all the input streams and sends them to a single output stream.

Parameters: none

Partition/GroupBy

need to define...

Stateful Transformations

These transformations use either a local (in-memory) store or a remote store to persist information across messages.

Joins

Joins two streams into a single output stream based on key matches. The output message value is the concatenation of the two matching messages' values.

Parameters:

  • ttl: time-to-live for received messages that have not yet joined with another message. Value should be a string formatted as "<integer><unit>" where unit is one of [ms, s, m]. Examples: "2s", "350ms", etc.
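The "<integer><unit>" format described above can be parsed with a short regular expression. The Python sketch below converts such a string to milliseconds; `parse_duration_ms` is a hypothetical helper, and rejecting anything outside the three listed units is an assumption about how strict the real parser is.

```python
import re

# Milliseconds per supported unit: ms, s, m.
_UNIT_TO_MS = {"ms": 1, "s": 1000, "m": 60_000}
_DURATION = re.compile(r"(\d+)(ms|s|m)")

def parse_duration_ms(text):
    """Convert a string such as "2s" or "350ms" to a number of milliseconds."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_TO_MS[unit]

two_seconds = parse_duration_ms("2s")
short = parse_duration_ms("350ms")
```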

Windowed operations

Various types of window-based operations.

Window types:

  • tumbling: non-overlapping, fixed size and contiguous intervals
  • session: messages are grouped into a session defined by a sessionGap. A session is closed and its results are emitted when no new messages arrive for that window within the gap duration.
  • sliding: currently unsupported

Parameters:

  • type: either tumbling or session
  • duration: the window duration. Value should be a string formatted as "<integer><unit>" where unit is one of [ms, s, m]. Examples: "2s", "350ms", etc.
  • is_keyed: boolean, currently defaults to true
  • trigger: currently unsupported
  • accumulation: currently unsupported
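To make the two supported window types concrete, the Python sketch below assigns timestamps to tumbling windows and groups timestamps into sessions. The function names are hypothetical, timestamps are assumed to be integer milliseconds arriving in order, and the choice that a message arriving exactly one gap after the previous one starts a new session is an assumption.

```python
def tumbling_window_start(timestamp_ms, duration_ms):
    """Return the start of the fixed-size, non-overlapping window containing the timestamp."""
    return timestamp_ms - (timestamp_ms % duration_ms)

def sessionize(timestamps_ms, gap_ms):
    """Group in-order timestamps into sessions separated by at least gap_ms of silence."""
    sessions = []
    for ts in timestamps_ms:
        if sessions and ts - sessions[-1][-1] < gap_ms:
            sessions[-1].append(ts)   # still within the gap: extend the open session
        else:
            sessions.append([ts])     # gap elapsed (or first message): start a new session
    return sessions

window = tumbling_window_start(2350, 1000)
sessions = sessionize([0, 100, 200, 1500, 1600], gap_ms=1000)
```

Here the timestamp 2350 falls in the tumbling window starting at 2000, and the 1300 ms pause between 200 and 1500 closes the first session.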

Sample Configuration

{
    "sources": {
        "s1": {
            "num_keys": 10,
            "key_dist": "org.apache.commons.math4.distribution.ZipfDistribution",
            "key_dist_params": {
                "exponent": 1.2
            },
            "msg_dist": "org.apache.commons.math4.distribution.UniformIntegerDistribution",
            "msg_dist_params": {
                "lower": 100,
                "upper": 1000
            },
            "rate": 10
        }
    },
    "transformations": {
        "t1": {
            "operator": "filter",
            "input": "s1",
            "params": {
                "p": 0.5
            }
        },
        "t2": {
            "operator": "split",
            "input": "t1",
            "params": {
                "n": 2,
                "ratio": 0.5
            },
            "outputs": ["t2__1", "t2__2"]
        },
        "t3": {
            "operator": "filter",
            "input": "t2__1",
            "params": {
                "p": 0.75
            }
        },
        "t4": {
            "operator": "modify",
            "input": "t2__2",
            "params": {
                "rate_ratio": 0.5
            }
        }
    },
    "sinks": [
        "t3", "t4"
    ]
}
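A configuration like the one above forms a small dataflow graph, so it can be worth checking that every input and sink refers to something that is actually defined. The Python sketch below is a hypothetical checker, not part of the tool; it assumes that a transformation can be referenced either by its own name or by any name in its `outputs` list, and it uses a trimmed copy of the sample topology.

```python
def undefined_references(config):
    """Return (referrer, missing_name) pairs for inputs and sinks that are not defined."""
    transformations = config.get("transformations", {})

    # Assumption: valid names are sources, transformation names, and declared outputs.
    known = set(config.get("sources", {}))
    for name, spec in transformations.items():
        known.add(name)
        known.update(spec.get("outputs", []))

    missing = []
    for name, spec in transformations.items():
        refs = list(spec.get("inputs", []))
        if "input" in spec:
            refs.append(spec["input"])
        missing.extend((name, ref) for ref in refs if ref not in known)
    missing.extend(("sinks", s) for s in config.get("sinks", []) if s not in known)
    return missing

# Trimmed version of the sample configuration (same topology, params omitted).
sample = {
    "sources": {"s1": {"rate": 10}},
    "transformations": {
        "t1": {"operator": "filter", "input": "s1"},
        "t2": {"operator": "split", "input": "t1", "outputs": ["t2__1", "t2__2"]},
        "t3": {"operator": "filter", "input": "t2__1"},
        "t4": {"operator": "modify", "input": "t2__2"},
    },
    "sinks": ["t3", "t4"],
}
problems = undefined_references(sample)
```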

Reference
