
Commit 8d8cbb8

Reisen and kbowers-jump authored
Merge v2 Aggregation Logic (#162)
* Code generators for efficient sorting algorithms usable on and off chain with off-chain standalone tests
* New pricing model usable on and off chain with off-chain standalone tests
* Draft hookup of the new pricing model
* Updated to reflect new aggregation logic
* Docker doesn't understand the restrict keyword
* Solana uses its own custom replacement for stdint
* Solana uses its own custom replacement for stdint
* Hack to work around docker linkage limitation
* Solana's replacement for stdint isn't complete
* Minor ops tweak
* More minor op count tweaks
* Use STYLE macro semantics like other includes
* Renamed to be consistent with other files in this module
* Updated for file name change
* Generic utilities for efficient code usable on and off chain with standalone off chain test suite
* oracle: update test_oracle with new expected values
* Corrected path
* tests: update qset results with new test values
* oracle: remove unused sort implementation
* oracle: add comment to sort explaining network principle
* oracle: remove old aggregation logic

Co-authored-by: Kevin J Bowers <kbowers@jumptrading.com>
1 parent d86f94b commit 8d8cbb8


64 files changed: +2292 / -371 lines changed

CMakeLists.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -39,7 +39,7 @@ set( PC_SRC
 pc/request.cpp;
 pc/rpc_client.cpp;
 pc/user.cpp;
-program/src/oracle/sort.c
+program/src/oracle/model/price_model.c
 )
 
 set( PC_HDR
```

program/src/oracle/model/clean

Lines changed: 3 additions & 0 deletions
```sh
#!/bin/sh
rm -rfv bin
```
Lines changed: 113 additions & 0 deletions
```c
#include "price_model.h"
#include "../util/avg.h" /* For avg_2_int64 */

#define SORT_NAME  int64_sort_ascending
#define SORT_KEY_T int64_t
#include "../sort/tmpl/sort_stable.c"

int64_t *
price_model_core( uint64_t  cnt,
                  int64_t * quote,
                  int64_t * _p25,
                  int64_t * _p50,
                  int64_t * _p75,
                  void    * scratch ) {

  /* Sort the quotes.  The sorting implementation used here is a highly
     optimized mergesort (merging with unrolled insertion-sorting-network
     base cases for small n).  The best case is ~0.5 n lg n compares and
     the average and worst cases are ~n lg n compares.

     While not completely data oblivious, this has quite low variance in
     operation count in practice and this is _better_ than quicksort's
     average case.  Quicksort's worst case is O(n^2), which is vulnerable
     to computational denial-of-service and timing attacks.  Unlike
     quicksort, this is also stable (this stability does not currently
     matter ... it might be a factor in future models).

     A data oblivious sorting network approach might be viable here and
     would have a completely deterministic operation count.  It currently
     isn't used as the best known practical approaches for general n have
     a worse algorithmic cost (O( n (lg n)^2 )) and, while the
     application probably doesn't need perfect obliviousness, mergesort
     is still moderately oblivious and the application can benefit from
     mergesort's lower operation cost.  (The main drawback of mergesort
     relative to quicksort is that it isn't in place, but memory
     footprint isn't an issue here.)

     Given the operation cost model (e.g. cache friendliness is not
     incorporated), a radix sort might be viable here (O(n) in the best /
     average / worst cases).  It currently isn't used as we expect
     invocations with small-ish n to be common and radix sort would have
     large coefficients on the O(n) and additional fixed overheads that
     would make it more expensive than mergesort in this regime.

     Note: price_model_cnt_valid( cnt ) currently implies
     int64_sort_ascending_cnt_valid( cnt ).

     Note: consider filtering out "NaN" quotes (i.e. INT64_MIN)? */

  int64_t * sort_quote = int64_sort_ascending_stable( quote, cnt, scratch );

  /* Extract the p25

     There are many variants with subtle tradeoffs here.  One option is
     to interpolate when the ideal p25 is bracketed by two samples (akin
     to the p50 interpolation below when the number of quotes is even).
     That is, for p25, interpolate between quotes floor((cnt-2)/4) and
     ceil((cnt-2)/4) with the weights determined by cnt mod 4.  The
     current preference is not to do that as it is slightly more
     complex, doesn't always exactly minimize the current loss function
     and is more exposed to the confidence intervals getting skewed by
     bum quotes when the number of quotes is small.

     Another option is to use the inside quote of the above pair.  That
     is, for p25, use quote ceil((cnt-2)/4) == floor((cnt+1)/4) ==
     (cnt+1)>>2.  The current preference is not to do this as, though
     this has stronger bum quote robustness, it results in p25==p50==p75
     when cnt==3.  (In this case, the above wants to interpolate between
     quotes 0 and 1 for the p25 and between quotes 1 and 2 for the p75.
     But limiting to just the inside quote results in p25/p50/p75 all
     using the median quote.)

     A tweak to this option, for p25, is to use floor(cnt/4) == cnt>>2.
     This is simple, has the same asymptotic behavior for large cnt, has
     good behavior in the cnt==3 case and practically as good bum quote
     rejection in the moderate cnt case. */

  uint64_t p25_idx = cnt >> 2;

  *_p25 = sort_quote[p25_idx];

  /* Extract the p50 */

  if( (cnt & (uint64_t)1) ) { /* Odd number of quotes */

    uint64_t p50_idx = cnt >> 1; /* == ceil((cnt-1)/2) */

    *_p50 = sort_quote[p50_idx];

  } else { /* Even number of quotes (at least 2) */

    uint64_t p50_idx_right = cnt >> 1;                    /* == ceil((cnt-1)/2) > 0 */
    uint64_t p50_idx_left  = p50_idx_right - (uint64_t)1; /* == floor((cnt-1)/2) >= 0 (no overflow/underflow) */

    int64_t vl = sort_quote[p50_idx_left ];
    int64_t vr = sort_quote[p50_idx_right];

    /* Compute the average of vl and vr (with floor / round toward
       negative infinity rounding and without possibility of
       intermediate overflow). */

    *_p50 = avg_2_int64( vl, vr );
  }

  /* Extract the p75 (this is the mirror image of the p25 case) */

  uint64_t p75_idx = cnt - ((uint64_t)1) - p25_idx;

  *_p75 = sort_quote[p75_idx];

  return sort_quote;
}
```
Lines changed: 101 additions & 0 deletions
```c
#ifndef _pyth_oracle_model_model_h_
#define _pyth_oracle_model_model_h_

#include "../util/compat_stdint.h"
#include <stdalign.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Returns the minimum and maximum number of quotes the implementation
   can handle */

static inline uint64_t
price_model_quote_min( void ) {
  return (uint64_t)1;
}

static inline uint64_t
price_model_quote_max( void ) {
  return (UINT64_MAX-(uint64_t)alignof(int64_t)+(uint64_t)1) / (uint64_t)sizeof(int64_t);
}

/* price_model_cnt_valid returns non-zero if cnt is a valid value or
   zero if not. */

static inline int
price_model_cnt_valid( uint64_t cnt ) {
  return price_model_quote_min()<=cnt && cnt<=price_model_quote_max();
}

/* price_model_scratch_footprint returns the number of bytes of scratch
   space needed for an arbitrarily aligned scratch region required by
   price_model to handle price_model_quote_min() to cnt quotes
   inclusive. */

static inline uint64_t
price_model_scratch_footprint( uint64_t cnt ) { /* Assumes price_model_cnt_valid( cnt ) is true */
  /* cnt int64_t's plus worst case alignment padding, no overflow
     possible as cnt is valid at this point */
  return cnt*(uint64_t)sizeof(int64_t)+(uint64_t)alignof(int64_t)-(uint64_t)1;
}

/* price_model_core minimizes (to quote precision in a floor / round
   toward negative infinity sense) the loss model of the given quotes.
   Assumes valid inputs (e.g. cnt is at least 1 and not unreasonably
   large ... typically a multiple of 3 but this is not required,
   quote[i] for i in [0,cnt) are the quotes of interest on input, _p25,
   _p50, _p75 point to where to write model outputs, scratch points to a
   suitable footprint scratch region).

   Returns a pointer to the quotes sorted in ascending order.  As such,
   the min and max and any other rank statistic can be extracted easily
   on return.  This pointer will be either quote itself or a location in
   scratch.  Use price_model below for a variant that always replaces
   quote with the sorted quotes (potentially with extra ops for
   copying).  Further, on return, *_p25, *_p50, *_p75 will hold the loss
   model minimizing values for the input quotes and the scratch region
   will have been clobbered.

   scratch points to a memory region of arbitrary alignment with at
   least price_model_scratch_footprint( cnt ) bytes and it will be
   clobbered on output.  It is sufficient to use a normally aligned /
   normally allocated / normally declared array of cnt int64_t's.

   The cost of this function is a fast and low variance (but not
   completely data oblivious) O(cnt lg cnt) in the best / average /
   worst cases.  This function uses no heap / dynamic memory allocation.
   It is thread safe provided it is passed non-conflicting quote, output
   and scratch arrays.  It has a bounded call depth ~lg cnt <= ~64 (this
   could be reduced to O(1) by using a non-recursive sort/select
   implementation under the hood if desired). */

int64_t *                                   /* Returns pointer to sorted quotes (either quote or ALIGN_UP(scratch,int64_t)) */
price_model_core( uint64_t  cnt,            /* Assumes price_model_cnt_valid( cnt ) is true */
                  int64_t * quote,          /* Assumes quote[i] for i in [0,cnt) is the i-th quote on input */
                  int64_t * _p25,           /* Assumes *_p25 is safe to write to the p25 model output */
                  int64_t * _p50,           /* Assumes *_p50 " */
                  int64_t * _p75,           /* Assumes *_p75 " */
                  void    * scratch );      /* Assumes a suitable scratch region */

/* Same as the above but always returns quote and quote always holds the
   sorted quotes on return. */

static inline int64_t *
price_model( uint64_t  cnt,
             int64_t * quote,
             int64_t * _p25,
             int64_t * _p50,
             int64_t * _p75,
             void    * scratch ) {
  int64_t * tmp = price_model_core( cnt, quote, _p25, _p50, _p75, scratch );
  if( tmp!=quote ) for( uint64_t idx=(uint64_t)0; idx<cnt; idx++ ) quote[ idx ] = tmp[ idx ];
  return quote;
}

#ifdef __cplusplus
}
#endif

#endif /* _pyth_oracle_model_model_h_ */
```

program/src/oracle/model/run_tests

Lines changed: 18 additions & 0 deletions
```sh
#!/bin/sh

module purge || exit 1
module load gcc-9.3.0 || exit 1

./clean || exit 1
mkdir -pv bin || exit 1

CC="gcc -g -Wall -Werror -Wextra -Wconversion -Wstrict-aliasing=2 -Wimplicit-fallthrough=2 -pedantic -D_XOPEN_SOURCE=600 -O2 -march=native -std=c17"

set -x

$CC test_price_model.c price_model.c -o bin/test_price_model || exit 1

bin/test_price_model || exit 1

echo all tests passed
```
Lines changed: 69 additions & 0 deletions
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "../util/util.h"
#include "price_model.h"

int
qcmp( void const * _p,
      void const * _q ) {
  int64_t p = *(int64_t const *)_p;
  int64_t q = *(int64_t const *)_q;
  if( p < q ) return -1;
  if( p > q ) return 1;
  return 0;
}

int
main( int     argc,
      char ** argv ) {
  (void)argc; (void)argv;

  prng_t _prng[1];
  prng_t * prng = prng_join( prng_new( _prng, (uint32_t)0, (uint64_t)0 ) );

# define N 96

  int64_t quote0 [N];
  int64_t quote  [N];
  int64_t val    [3];
  int64_t scratch[N];

  int ctr = 0;
  for( int iter=0; iter<10000000; iter++ ) {
    if( !ctr ) { printf( "Completed %d iterations\n", iter ); ctr = 100000; }
    ctr--;

    /* Generate a random test */

    uint64_t cnt = (uint64_t)1 + (uint64_t)(prng_uint32( prng ) % (uint32_t)N); /* In [1,N], approx uniform IID */
    for( uint64_t idx=(uint64_t)0; idx<cnt; idx++ ) quote0[ idx ] = (int64_t)prng_uint64( prng );

    /* Apply the model */

    memcpy( quote, quote0, sizeof(int64_t)*(size_t)cnt );
    if( price_model( cnt, quote, val+0, val+1, val+2, scratch )!=quote ) { printf( "FAIL (compose)\n" ); return 1; }

    /* Validate the results */

    qsort( quote0, (size_t)cnt, sizeof(int64_t), qcmp );
    if( memcmp( quote, quote0, sizeof(int64_t)*(size_t)cnt ) ) { printf( "FAIL (sort)\n" ); return 1; }

    uint64_t p25_idx = cnt>>2;
    uint64_t p50_idx = cnt>>1;
    uint64_t p75_idx = cnt - (uint64_t)1 - p25_idx;
    uint64_t is_even = (uint64_t)!(cnt & (uint64_t)1);

    if( val[0]!=quote[ p25_idx ] ) { printf( "FAIL (p25)\n" ); return 1; }
    if( val[1]!=avg_2_int64( quote[ p50_idx-is_even ], quote[ p50_idx ] ) ) { printf( "FAIL (p50)\n" ); return 1; }
    if( val[2]!=quote[ p75_idx ] ) { printf( "FAIL (p75)\n" ); return 1; }
  }

# undef N

  prng_delete( prng_leave( prng ) );

  printf( "pass\n" );
  return 0;
}
```
