Description
In follow up to #486, it'd sure be nice to be able to move away from our current multiprocessing.shared_memory approach for real-time quote/tick ingest and possibly leverage an Apache standard format such as Arrow and Parquet.
As part of improving the .parquet file based tsdb IO from #486, it'd obviously be ideal to support df appends instead of only full overwrites 😂.
ToDo content from #486 pertaining to StorageClient.write_ohlcv():
write on backfills and rt ingest. rn the write is masked out, mostly bc there are some details to work out on when/how frequently the writes to parquet files should happen, particularly whether to "append" to parquet files. Turns out there are options for appending to parquet (faster than overwriting, presumably?), particularly using fastparquet; see the below resources:
- for python we can likely use: https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write
- also note the `times` option with the int96 format, which embeds nanoseconds B)
- the `custom_metadata: dict` can only be used on overwrite 👀
- can use https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.update_file_custom_metadata to update metadata if needed?
- https://stackoverflow.com/questions/39234391/how-to-append-data-to-an-existing-parquet-file
- other langs and spark related: