High-Order Constructs #16
dmatveev started this conversation in Architecture
Currently (in v4.9), G-API lacks high-order constructs, which have become essential in modern AI pipelines such as Stable Diffusion and Whisper. The goal of this discussion is to document a possible API, with examples and pros/cons.
Example Flows
Stable Diffusion
The classic Stable Diffusion 1.5 pipeline employs a text encoder, a UNet, and a VAE decoder network. The problem here is the UNet, which is executed in a loop; there is also a Scheduler which refines the UNet results after every iteration. In pseudocode, the flow looks like this:
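To make the control flow concrete, here is a minimal C++ sketch of such a loop. text_encoder, unet, scheduler_step and vae_decoder are hypothetical stubs doing toy arithmetic, not G-API or OpenVINO calls. The point is only the data dependency: every iteration consumes the state produced by the previous one, and the number of steps is a run-time parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a tensor: a flat vector of floats.
using Tensor = std::vector<float>;

// Hypothetical stubs for the real networks and the scheduler. They do
// trivial arithmetic so that the control flow can be executed and checked.
Tensor text_encoder(const Tensor& prompt) { return prompt; }

// Predicts "noise" as half of the current latents; the text embedding is
// accepted (as the real UNet would) but ignored by this stub.
Tensor unet(const Tensor& latents, const Tensor& /*text_emb*/) {
    Tensor noise(latents.size());
    for (std::size_t i = 0; i < latents.size(); ++i) noise[i] = 0.5f * latents[i];
    return noise;
}

// Refines the state using the UNet output: new_state = state - noise.
Tensor scheduler_step(const Tensor& state, const Tensor& noise) {
    assert(state.size() == noise.size());
    Tensor next(state.size());
    for (std::size_t i = 0; i < state.size(); ++i) next[i] = state[i] - noise[i];
    return next;
}

// Maps latents back to "pixels"; this stub simply doubles every value.
Tensor vae_decoder(const Tensor& latents) {
    Tensor image(latents.size());
    for (std::size_t i = 0; i < latents.size(); ++i) image[i] = 2.0f * latents[i];
    return image;
}

// The construct missing in G-API: the state produced by one iteration is
// the input of the next, and the iteration count is known only at run time.
Tensor stable_diffusion(const Tensor& prompt, Tensor state, int num_steps) {
    const Tensor text_emb = text_encoder(prompt);
    for (int step = 0; step < num_steps; ++step) {
        const Tensor noise = unet(state, text_emb);
        state = scheduler_step(state, noise);  // new_state: a single tensor
    }
    // Assumption: the decoder consumes the scheduler's final state.
    return vae_decoder(state);
}
```

Unrolling is not an option in general, since num_steps comes from the user at run time rather than from the graph definition.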
(NB: Is vae_decoder applied to the output of unet but not of scheduler at the final iteration?)
Here new_state is a single tensor.
Whisper
The OpenVINO-based implementation of the OpenAI Whisper pipeline employs a kv-cache for the Whisper decoder: 1, 2. With this approach, the Whisper pipeline for a 30s audio input may look like this:
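A minimal C++ sketch of that flow, under stated assumptions: whisper_encoder, whisper_decoder, DecoderOut and kEot are hypothetical names, greedy decoding is assumed, and the kv-cache is collapsed into a single toy tensor (in a real transformer decoder it is a set of per-layer key/value tensors). Unlike the Stable Diffusion loop, the number of iterations here depends on the data, since decoding stops at an end-of-text token.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Tensor = std::vector<float>;

// Hypothetical end-of-text token id.
constexpr int kEot = -1;

// What one decoder call returns: the next token and the updated kv-cache.
struct DecoderOut {
    int next_token;
    Tensor kv_cache;
};

// Hypothetical encoder stub: runs once per 30 s chunk.
Tensor whisper_encoder(const Tensor& mel) { return mel; }

// Hypothetical decoder stub: appends the processed token to the cache and,
// as a toy rule, emits one token per encoder frame before end-of-text.
DecoderOut whisper_decoder(const Tensor& enc, int token, const Tensor& kv_cache) {
    Tensor cache = kv_cache;
    cache.push_back(static_cast<float>(token));
    const int next = cache.size() < enc.size() ? token + 1 : kEot;
    return DecoderOut{next, std::move(cache)};
}

std::vector<int> transcribe(const Tensor& mel, int sot_token, std::size_t max_tokens) {
    assert(!mel.empty());
    const Tensor enc = whisper_encoder(mel);  // computed once, reused every step
    std::vector<int> tokens;
    Tensor kv_cache;                          // empty before the first step
    int token = sot_token;
    while (tokens.size() < max_tokens) {
        DecoderOut out = whisper_decoder(enc, token, kv_cache);
        kv_cache = std::move(out.kv_cache);   // fed back into the next step
        if (out.next_token == kEot) break;    // data-dependent termination
        tokens.push_back(out.next_token);
        token = out.next_token;
    }
    return tokens;
}
```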
BTW, this pattern may be seen in many modern encoder-decoder scenarios.
G-API Operators Definition
Technically, G-API is a very high-level form of SSA (static single assignment): every new Operation, when applied to Data, produces new Data which is treated as immutable.
With the exception of Stateful kernels, the G-API graph form can be seen as a pure functional programming language.
In purely functional languages, there are no /statements/ as in imperative languages like C; there are only /expressions/, which always have a value. This property matters for G-API, as it tracks the graph structure purely through value/producer_of(value) relationships (and a computation is, in fact, defined on vectors of its input and output values).
The following subchapters cover the minimum required basis to support the above scenarios.
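Why value/producer_of(value) tracking favors expressions can be shown with a toy C++ model. Value, Node, apply, select and count_ops are hypothetical names, not G-API's internal classes: every operation yields a new immutable value pointing to its producer, and the graph is whatever is reachable backwards from the outputs. A conditional written as an expression (select) becomes an ordinary node, whereas an imperative if statement that only assigns to a variable would leave no node behind to track.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

struct Node;

// An immutable value: it only remembers which operation produced it.
// A null producer marks an input of the computation.
struct Value {
    std::shared_ptr<const Node> producer;
};

// An operation node: the operation name plus the values it consumed.
struct Node {
    std::string op;
    std::vector<Value> inputs;
};

// Applying an operation never modifies its inputs; it creates a new value.
Value apply(const std::string& op, std::vector<Value> inputs) {
    assert(!op.empty());
    std::shared_ptr<const Node> node = std::make_shared<Node>(Node{op, std::move(inputs)});
    return Value{node};
}

// A conditional as an expression: it has a value, so it is a regular node
// whose inputs are the condition and both alternatives.
Value select(const Value& cond, const Value& then_v, const Value& else_v) {
    return apply("select", {cond, then_v, else_v});
}

// Recovers the graph by following producer links backwards from the
// outputs; operations not reachable this way are not part of the graph.
std::size_t count_ops(const std::vector<Value>& outputs) {
    std::set<const Node*> seen;
    std::vector<const Node*> stack;
    for (const Value& v : outputs)
        if (v.producer) stack.push_back(v.producer.get());
    while (!stack.empty()) {
        const Node* n = stack.back();
        stack.pop_back();
        if (!seen.insert(n).second) continue;  // already visited
        for (const Value& in : n->inputs)
            if (in.producer) stack.push_back(in.producer.get());
    }
    return seen.size();
}
```

Both alternatives of select are plain inputs of the node here, so nothing in this model says that only the taken branch runs; lazy branch execution is what a real IF construct would have to add on top.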
IF: https://github.com/orgs/g-api-org/discussions/17 (NOT seen in the above examples, but quite common).
MAP: TBD
FOLD: TBD
Challenges
infer<Generic>: it may be problematic to pass inputs/outputs around. We do distinguish input and output tensor dictionaries; the output one is in fact used to populate the information about the network's outputs.
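The dictionary asymmetry can be modeled in a few lines of C++. This is a toy, not the G-API implementation: Value, InferInputs, InferOutputs and next_step_inputs are hypothetical, as are the layer names input_ids, past_kv and present_kv. Inputs are bound by the caller, while each output is created (and registered as a network output) on first request, so in a loop the outputs of one step have to be explicitly re-bound as inputs of the next.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy handle for a graph value.
struct Value {
    std::string name;
};

// Hypothetical input dictionary: the caller binds layer names to values.
using InferInputs = std::map<std::string, Value>;

// Hypothetical output dictionary: requesting a layer both returns a handle
// and records that layer as an output the network must produce.
class InferOutputs {
public:
    Value at(const std::string& layer) {
        assert(!layer.empty());
        requested_.insert(layer);
        return Value{"out:" + layer};
    }
    const std::set<std::string>& requested() const { return requested_; }

private:
    std::set<std::string> requested_;
};

// Plumbing for one kv-cached decoder step: the cache output of the previous
// step is re-bound as the cache input of the next one, under another name.
InferInputs next_step_inputs(InferOutputs& prev_outs, const Value& token) {
    InferInputs ins;
    ins["input_ids"] = token;
    ins["past_kv"] = prev_outs.at("present_kv");
    return ins;
}
```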