Digging into price data issue #664

bug-or-feature (Collaborator) · 2022-06-28T13:50:58Z

Hi All

I recently spent some time chasing down a data issue. This post describes the issue and the solution, in case it helps anyone else. It also serves as an apology, as I caused the problem and it may affect others.

I was experimenting with the new DO code on vanilla futures instruments, trying to run some backtests with newly imported price data from a new source, Norgate. The backtests ran extremely slowly. The script looked something like:

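# assumed import location for arg_not_supplied; newer versions may keep it in syscore.constants
from syscore.objects import arg_not_supplied
from sysdata.sim.db_futures_sim_data import dbFuturesSimData
from sysdata.config.configdata import Config
from sysproduction.strategy_code.run_dynamic_optimised_system import futures_system as do_system

# load the custom config and leave the risk overlay unset
my_config = Config("my_config.yaml")
my_config.risk_overlay = arg_not_supplied

# build the system on database prices; a separate name avoids shadowing the imported do_system
system = do_system(config=my_config, data=dbFuturesSimData())
opt_portfolio = system.accounts.optimised_portfolio()
print(opt_portfolio.percent.stats())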

The script would hang at around 40% on 'Optimising positions'. At first I thought it had crashed, but I left it running one day and it did eventually finish, after two hours or so. The stats were awful, something like a 0.16 Sharpe, and the curve was not pretty:

I started digging: tweaking the config, simplifying rules, removing instruments, removing weightings and so on. It became clear the problem must be data related, because the system worked fine, whatever the config, if I swapped my database data for the supplied CSV prices. I created a script that compares my imported Norgate prices from the database with the supplied CSV data:

https://gist.github.com/bug-or-feature/a3b2b80156cb0d75b3b49d451d8bcefc

For each instrument, the script calculates the correlation between the DB and CSV sourced adjusted prices, and between their returns. It also reports the start date for each source and the difference between them. The output for my data looked like:

    Instrument  PriceCorr  ReturnsCorr    StartDB   StartCSV       Diff
30         MXP   0.939286     0.294692 1995-06-14 1995-09-15    93 days
8          CHF   0.986097     0.397954 2001-03-14 1972-09-14 10408 days
40     RUSSELL   0.998960     0.444966 2001-12-12 2015-03-11  4837 days
52        US30   0.998998     0.461573 2010-02-24 2010-03-03     7 days
46       SP400   0.999798     0.487061 2002-03-12 2002-03-12     0 days
13         DOW   0.999872     0.490015 2002-06-12 2002-06-12     0 days
7          CAD   0.945643     0.492778 1984-12-14 1972-09-14  4474 days
49       US10U   0.998915     0.509326 2016-02-25 2016-03-02     6 days
19   GASOILINE   0.996419     0.575736 2005-12-19 1985-01-15  7643 days
32      NIKKEI   0.998765     0.580952 1990-12-11 2011-06-10  7486 days
1      BITCOIN   0.998844     0.584972 2017-12-26 2017-12-22     4 days
44     SOYMEAL   0.998840     0.609545 1978-12-18 1970-05-05  3149 days
38    REDWHEAT   0.979976     0.630392 1980-02-19 1995-09-06  5678 days
39        RICE   0.964487     0.639884 1990-02-15 1988-08-03   561 days
12         DAX   0.999789     0.656440 1999-03-12 2000-03-13   367 days
54         VIX   0.999530     0.739878 2007-01-22 2006-01-26   361 days
45      SOYOIL   0.985347     0.765036 1978-12-18 1970-02-03  3240 days
47       SP500   0.999935     0.767525 1997-12-11 1982-09-14  5567 days
35      OATIES   0.938569     0.779829 1979-02-15 1970-04-03  3240 days
24     HEATOIL   0.999268     0.788367 1979-12-18 1980-02-19    63 days
0          AUD   0.996631     0.797860 1987-03-16 1987-06-16    92 days
4         BUND   0.999970     0.802014 1999-03-03 2006-12-01  2830 days
42         SMI   0.999774     0.802045 1999-03-11 2014-03-12  5480 days
29        MILK   0.996883     0.809922 1997-07-15 1996-04-17   454 days
26     LEANHOG   0.082388     0.844249 1980-02-15 1974-07-22  2034 days
6          CAC   0.998443     0.845401 1988-09-19 2009-04-16  7514 days
17     EUROSTX   0.999617     0.849202 1999-06-14 2014-03-13  5386 days
16     EURIBOR   0.999903     0.851188 1999-01-06 1989-04-21  3547 days
36      PALLAD   0.999124     0.855016 1982-11-29 1977-06-01  2007 days
18     FEEDCOW   0.954058     0.870019 1979-12-18 1977-02-01  1050 days
33         NZD   0.998365     0.874975 2007-03-09 2003-03-11  1459 days
31      NASDAQ   0.999970     0.878933 1999-09-13 1999-12-14    92 days
15         EUR   0.997766     0.887887 1999-03-15 1999-06-15    92 days
27     LIVECOW   0.464129     0.890083 1980-02-21 1971-10-21  3045 days
23  GOLD_micro   0.999739     0.898077 1984-06-14 1975-04-01  3362 days
2         BOBL   0.999967     0.909170 1999-03-03 2008-03-05  3290 days
22        GOLD   0.999755     0.911508 1979-12-31 1975-04-01  1735 days
9       COPPER   0.994135     0.926077 1979-12-31 1995-09-01  5723 days
11     CRUDE_W   0.999378     0.929270 1989-10-12 1990-10-16   369 days
55       WHEAT   0.999101     0.929322 1979-10-16 1973-12-19  2127 days
3          BTP   0.999973     0.931525 2009-12-03 2010-03-04    91 days
51        US20   0.998799     0.932930 1980-02-26 1978-05-31   636 days
34         OAT   0.999977     0.934905 2012-06-04 2012-09-05    93 days
37        PLAT   0.999420     0.941429 1980-01-02 1970-06-30  3473 days
28      LUMBER   0.991868     0.944928 1979-11-28 1970-02-03  3585 days
21         GBP   0.997857     0.945789 1980-03-14 1975-12-16  1550 days
50         US2   0.999959     0.954279 1990-08-28 2000-03-02  3474 days
41       SHATZ   0.999939     0.954614 1999-03-03 2007-12-05  3199 days
10        CORN   0.992853     0.962984 1982-09-22 1972-10-18  3626 days
14     EDOLLAR   0.999940     0.971155 1989-06-21 1984-03-23  1916 days
48        US10   0.999647     0.971388 1982-05-27 1982-08-30    95 days
53         US5   0.999904     0.971663 1988-05-27 1989-05-31   369 days
43     SOYBEAN   0.999377     0.973708 1979-09-21 1985-09-20  2191 days
5         BUXL   0.999947     0.976355 2005-12-05 2015-12-03  3650 days
25         JPY   0.999381     0.976779 2001-09-14 1977-06-14  8858 days
20      GAS_US   0.998417     0.977910 1990-05-22 1990-07-26    65 days
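The per-instrument comparison described above can be sketched roughly as follows. This is not the gist itself: the function name `compare_sources` and the synthetic input series are made up for illustration, and it assumes "returns" means first differences of the adjusted prices.

```python
import numpy as np
import pandas as pd


def compare_sources(db_prices: pd.Series, csv_prices: pd.Series) -> dict:
    """Compare two adjusted price series for one instrument (hypothetical helper)."""
    # Align the two sources on their common dates only
    both = pd.concat([db_prices, csv_prices], axis=1, keys=["DB", "CSV"]).dropna()
    # Assumption: returns are simple price differences
    returns = both.diff().dropna()
    return {
        "PriceCorr": both["DB"].corr(both["CSV"]),
        "ReturnsCorr": returns["DB"].corr(returns["CSV"]),
        "StartDB": db_prices.index[0],
        "StartCSV": csv_prices.index[0],
        "Diff": abs(db_prices.index[0] - csv_prices.index[0]),
    }


# Synthetic data standing in for the two sources: same prices, later DB start
index = pd.bdate_range("2020-01-01", periods=500)
rng = np.random.default_rng(0)
csv_series = pd.Series(100 + rng.normal(0, 1, len(index)).cumsum(), index=index)
db_series = csv_series.iloc[5:]

result = compare_sources(db_series, csv_series)
print(result)
```

With identical underlying prices both correlations come out at essentially 1.0, so anything well below that on real data points to a difference between the sources.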

To dig deeper, you can also plot the DB and CSV series on a graph for an individual instrument. The worst one from the table above (MXP) looks like:

and for CAD

The orange line in the MXP price graph reminded me of another graph I had seen, showing notional positions against buffered rounded positions... and then the penny dropped. Rounding! Issue #629 was the culprit: I must have imported those prices while that rounding issue was still active. Re-importing the data without rounding gives:

Much better. If you imported any new instruments while #629 was active (between 16 April and 10 June 2022), you may want to check your data, especially for instruments with low price numbers (below 1, say), like some FX.
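To illustrate why rounding shows up as a high price correlation but a poor returns correlation, here is a small self-contained sketch on synthetic data. The price level (0.5), the daily move size and the rounding to two decimal places are all assumptions chosen for the demonstration; the thread does not state what precision #629 rounded to.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def diffs(xs):
    """First differences, used here as 'returns'."""
    return [b - a for a, b in zip(xs, xs[1:])]


random.seed(42)

# Synthetic low-priced instrument: starts at 0.5, small daily moves
true_prices = [0.5]
for _ in range(5000):
    true_prices.append(true_prices[-1] + random.gauss(0, 0.002))

# Assumed rounding precision: two decimal places
rounded_prices = [round(p, 2) for p in true_prices]

price_corr = pearson(true_prices, rounded_prices)
returns_corr = pearson(diffs(true_prices), diffs(rounded_prices))
print(f"price corr: {price_corr:.4f}, returns corr: {returns_corr:.4f}")
```

The price levels still track each other closely, but most rounded daily changes collapse to zero with occasional one-tick jumps, so the returns correlation drops sharply. The effect shrinks as the price level grows relative to the rounding step.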

TobiasAntiGravity · 2022-06-30T09:23:13Z

Hi Andy

Thank you for pointing this out. Your gist was an easy and great tool for looking through the data!

It does not look like my data was affected by the rounding issue. But I did find an unexpected spike in several of the returns series. I have looked into it, but I am unable to confirm that the spike is a consequence of the Panama stitching. It might in fact be an artifact of the stitching, but I thought I should post here in case it is not.

I understand that the repo data will not be equal to the current data, since rolling will have shifted the levels of earlier contracts' price series. But I am unable to see how some of the jumps in the price series were calculated by the _panama_stiching function in sysobjects.adjusted_prices. (I use the following function for building the adjusted price series after multiple prices have been updated: from sysinit.futures.adjustedprices_from_mongo_multiple_to_mongo import process_adjusted_prices_single_instrument)
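For readers unfamiliar with why a roll shifts the levels of earlier prices, here is a deliberately simplified sketch of backward Panama adjustment. It is not the implementation in sysobjects.adjusted_prices; the row layout, the `panama_stitch` name and the toy numbers are all invented for illustration.

```python
from typing import List, Optional, Tuple

# (priced contract id, price of priced contract, price of forward contract)
Row = Tuple[str, float, Optional[float]]


def panama_stitch(rows: List[Row]) -> List[float]:
    """Back-adjust a price series so that rolls leave no gap (toy version)."""
    adjusted: List[float] = []
    previous: Optional[Row] = None
    for contract, price, forward in rows:
        if previous is not None and contract != previous[0]:
            _, prev_price, prev_forward = previous
            if prev_forward is None:
                raise ValueError("need a forward price on the last row before a roll")
            # Gap between the new and the old contract at the roll point
            roll_differential = prev_forward - prev_price
            # Shift every earlier value so the series joins up at the roll
            adjusted = [value + roll_differential for value in adjusted]
        adjusted.append(price)
        previous = (contract, price, forward)
    return adjusted


# Toy data: contract "A" rolls into "B", which trades 3.0 higher at the roll
rows = [
    ("A", 100.0, 103.0),
    ("A", 101.0, 104.0),
    ("B", 105.0, 108.0),
    ("B", 106.0, None),
]
stitched = panama_stitch(rows)
print(stitched)  # [103.0, 104.0, 105.0, 106.0]
```

Every new roll moves all earlier values by the same amount, so two adjusted series built at different times can differ by a constant offset while still having identical day-to-day changes.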

Here is the spike graphed:

and here are the sliced_return and sliced_prices dfs from your code, for the time interval in question:

sliced_return:

index            DB      CSV
2014-09-04    0.200    0.200
2014-09-05   32.300    1.175
2014-09-08  -27.650    3.475
2014-09-09    5.000    5.000
2014-09-10   -5.475   -5.475

sliced_prices:

index            DB      CSV
2014-09-04  222.875  226.600
2014-09-05  255.175  227.775
2014-09-08  227.525  231.250
2014-09-09  232.525  236.250
2014-09-10  227.050  230.775

I do not see why the spike should occur at this position. The spike comes from the difference between the 4th and the 5th, while a roll occurs on the 7th, as seen in the multiple prices BUTTER.csv repo data:

2014-09-04 10:00:00,,20140900,271.95,20141000,,20141100
2014-09-04 12:00:00,,20140900,270.0,20141000,240.5,20141100
2014-09-04 13:00:00,,20140900,,20141000,240.5,20141100
2014-09-04 14:00:00,,20140900,271.5,20141000,239.0,20141100
2014-09-04 15:00:00,,20140900,271.5,20141000,,20141100
2014-09-04 16:00:00,279.475,20140900,271.95,20141000,240.475,20141100
2014-09-05 15:00:00,,20140900,271.95,20141000,240.475,20141100
2014-09-05 17:00:00,278.975,20140900,273.975,20141000,,20141100
2014-09-05 18:00:00,276.875,20140900,273.125,20141000,242.0,20141100
2014-09-07 22:00:00,275.0,20141000,,20141100,,20141200
2014-09-08 01:00:00,275.225,20141000,,20141100,,20141200

I am wondering if recalculation of the adjusted prices somehow introduces a skew of the prices relative to the index. I have looked closely at the _panama_stiching and _roll_in_panama functions in sysobjects.adjusted_prices, but cannot see how such a skew could occur. Nevertheless, I thought the observation might be of interest to others, and so I am posting it here.
