
Problem with csv ingest on ubuntu 19.04, zipline 1.3.0 (interaction with pandas problem?) #2540


Open
mstaniak opened this issue Sep 12, 2019 · 2 comments


mstaniak commented Sep 12, 2019

Dear Zipline Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

  • Operating System: Linux janusz 5.0.0-27-generic #28-Ubuntu SMP Tue Aug 20 19:53:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Python Version: 3.5.4
  • Python Bitness: 64
  • How did you install Zipline: pip
  • Python packages:

alembic==1.1.0
bcolz==0.12.1
Bottleneck==1.2.1
certifi==2019.6.16
chardet==3.0.4
Click==7.0
contextlib2==0.5.5
cyordereddict==1.0.0
Cython==0.29.13
decorator==4.4.0
docutils==0.15.2
empyrical==0.5.3
idna==2.8
intervaltree==3.0.2
Logbook==1.5.2
lru-dict==1.1.6
lxml==4.4.1
Mako==1.1.0
MarkupSafe==1.1.1
mock==3.0.5
multipledispatch==0.6.0
networkx==1.11
numexpr==2.7.0
numpy==1.17.2
pandas==0.22.0
pandas-datareader==0.7.4
patsy==0.5.1
pymongo==3.9.0
python-dateutil==2.8.0
python-editor==1.0.4
pytz==2019.2
requests==2.22.0
requests-file==1.4.3
scipy==1.3.1
six==1.12.0
sortedcontainers==2.1.0
SQLAlchemy==1.3.8
statsmodels==0.10.1
tables==3.5.2
toolz==0.10.0
trading-calendars==1.8.1
urllib3==1.25.3
wrapt==1.11.2
zipline==1.3.0

Now that you know a little about me, let me tell you about the issue I am
having:

Description of Issue

  • I tried to ingest data from a csv file (columns: date, open, high, low, close, volume, dividend, split) and got:

TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

Here is how you can reproduce this issue on your machine:

Reproduction Steps

  1. Take my configuration of packages (these are the versions pip installed after running pip install zipline in a clean Python environment)
  2. Use any csv file
  3. The error happens within the winsorise_uint32 function, when it tries to do
df[mask] = 0

(Btw, this line would be better placed inside the indented if mv.any(): block.)
The error happens because pandas won't allow this kind of in-place replacement. I guess this code worked with some other version of pandas? The code fails both when mv.any() is False and when it is True. Any help will be appreciated.
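A minimal sketch of a column-by-column alternative, assuming the intent is what df[mask] = 0 expresses: zero out values above UINT32_MAX in the selected columns only. zero_out_of_range_uint32 is a hypothetical helper, not zipline's API, and it also applies the if mv.any(): placement mentioned above. One plausible cause of the error, not confirmed in this thread, is that the frame still contains a non-numeric column such as date. The error text mentions "a non np.nan value", which fits with df[mask] = np.nan getting past this line. Assigning one column at a time with .loc is expected to avoid the whole-frame boolean setting path, but this was not tested against pandas 0.22.0 here.

```python
import numpy as np
import pandas as pd

# Largest value a uint32 can hold (4294967295).
UINT32_MAX = np.iinfo(np.uint32).max


def zero_out_of_range_uint32(df, columns):
    """Hypothetical helper: set values above UINT32_MAX to 0, in `columns` only.

    Unlike `df[mask] = 0` on the whole frame, this assigns one column at a
    time with `.loc`, so non-numeric columns (e.g. a string 'date' column)
    never take part in the assignment. Mutates `df` and returns it.
    """
    mask = df[columns] > UINT32_MAX
    # Only touch the frame when something is actually out of range.
    if mask.values.any():
        for col in columns:
            df.loc[mask[col], col] = 0
    return df


df = pd.DataFrame({
    "date": ["2019-09-10", "2019-09-11", "2019-09-12"],
    "close": [10.5, 11.0, 10.8],
    "volume": [100, 5000000000, 300],
})
df = zero_out_of_range_uint32(df, ["volume"])
print(df["volume"].tolist())  # [100, 0, 300]
```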

What steps have you taken to resolve this already?

I ran the code in winsorise_uint32 by hand to diagnose the problem.

I tried to change it to something like:

df[mask] = np.nan
return df.fillna(0)

but it fails later on, when other code expects an integer column and gets float NaN values instead.
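The dtype effect can be seen in isolation. This is a minimal sketch that assumes the later failure comes from the dtype, since the thread does not show the downstream traceback. It uses Series.where in place of df[mask] = np.nan to produce the same NaN masking. Masking an integer column with NaN upcasts it to float64, fillna(0) keeps it float64, and an explicit astype is needed to get integers back. The thread does not test whether zipline's later code accepts a column re-cast this way.

```python
import numpy as np
import pandas as pd

UINT32_MAX = np.iinfo(np.uint32).max

volume = pd.Series([100, 5000000000, 300], dtype="int64")

# Replacing out-of-range values with NaN forces a float dtype,
# because int64 has no NaN representation.
with_nan = volume.where(volume <= UINT32_MAX)
print(with_nan.dtype)  # float64

# fillna(0) fills the gaps but does not convert back to integers.
filled = with_nan.fillna(0)
print(filled.dtype)  # float64

# An explicit cast restores an integer dtype that fits in uint32.
restored = filled.astype(np.uint32)
print(restored.dtype, restored.tolist())  # uint32 [100, 0, 300]
```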

Anything else?

Sincerely,
mstaniak

mstaniak changed the title from "Problem with csv ingest on ubuntu 19.04, zipline 1.3.0" to "Problem with csv ingest on ubuntu 19.04, zipline 1.3.0 (interaction with pandas problem?)" on Sep 12, 2019

mstaniak (Author) commented:

Btw changing the code to

    df['volume'] = np.where(df['volume'] > UINT32_MAX, UINT32_MAX, df['volume'])
    return df

fixes the problem in my environment.
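This change caps large volumes at UINT32_MAX, whereas the original df[mask] = 0 sets them to 0, so the two behave differently. The sketch below assumes capping is the desired behaviour. It shows that Series.clip(upper=...) produces the same capped values as the np.where expression while keeping the result a pandas Series.

```python
import numpy as np
import pandas as pd

UINT32_MAX = np.iinfo(np.uint32).max

df = pd.DataFrame({
    "date": ["2019-09-10", "2019-09-11"],
    "volume": [100, 5000000000],
})

# The fix from this comment: cap values above UINT32_MAX.
capped_where = np.where(df["volume"] > UINT32_MAX, UINT32_MAX, df["volume"])

# Equivalent pandas spelling that keeps the result a Series.
capped_clip = df["volume"].clip(upper=UINT32_MAX)

print(capped_where.tolist())  # [100, 4294967295]
print(capped_clip.tolist())   # [100, 4294967295]
```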

taliarhodes (Contributor) commented:

Hi @mstaniak,

Our maximum supported versions at this time for NumPy and pandas are 1.14.1 and 0.22.0, respectively. I see you are using NumPy 1.17.2. Do you encounter the issue if you downgrade NumPy to 1.14.1?

We are looking to support newer versions of pandas, NumPy, and Python in the future, though; see #2616.

For now, though, I'd make sure you're within the bounds of the supported versions, or use the pinned versions defined in the requirements files.
Do let us know if you still experience errors after the downgrade.
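The sketch below checks an environment against the maximum versions quoted here (NumPy 1.14.1, pandas 0.22.0). The helper names version_tuple and unsupported_packages are hypothetical. The parser only reads the leading numeric major.minor.patch parts, so suffixes such as rc1 are ignored.

```python
import re

# Maximum supported versions quoted in the maintainer's reply.
MAX_SUPPORTED = {"numpy": "1.14.1", "pandas": "0.22.0"}


def version_tuple(version):
    """Turn '1.17.2' (or '1.17.2rc1') into (1, 17, 2); missing parts count as 0."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return tuple(int(part or 0) for part in match.groups())


def unsupported_packages(installed):
    """Return the names whose installed version is newer than the quoted maximum."""
    return sorted(
        name for name, version in installed.items()
        if name in MAX_SUPPORTED
        and version_tuple(version) > version_tuple(MAX_SUPPORTED[name])
    )


# Versions from the environment described in this issue.
print(unsupported_packages({"numpy": "1.17.2", "pandas": "0.22.0"}))  # ['numpy']
```

In a live environment, the installed dictionary could be built from numpy.__version__ and pandas.__version__.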
