
Some things to work on #1

@Kevin-Prichard

2023 Aug 16

  • CSVScanner needs:
    • provide delimiter parameters that are passed through to csv.reader so it can parse the input correctly
      • expose those parameters as argparse CLI options
    • should know nothing about any DB or DDL dialect: a function or method reference should be passed in to handle the result of CSVScanner.scan()
    • break out the progress indicator by shoving it into a progress_fn: Callable callback parameter [8/20]
    • provide a progress_interval: int parameter to control the frequency of the progress indicator [8/20]
    • row iteration: provide a sample_pct: number which specifies the percentage of rows to check for type or length
      • maybe use self._csv_fh.seek(n) to skip to the next apparent sample row; this might necessitate reinstantiating csv.reader to begin after the next newline
      • or, wrap the file handle (e.g. in a generator or an io.TextIOBase subclass) to skip rows behind the scenes so that the csv.reader instance isn't affected
    • if possible, abstract away the fact that this is about CSV or TSV and make scanning any data source feasible: pass in a class that handles opening, iterating, and breaking down the data source, which CSVScanner invokes to produce a row or a block of rows to be processed in .scan(). That may be too Java-like, though; alternatively, make CSVScanner a subclass of an ABC DataScanner.
  • csv2db.py
    • zip_walker should become a class ZipCollection, with a base class called CSVCollection or similar
      • it's the top-level interface for instantiating CSVScanner and outputting CSVScanner.result(), so the csv.reader parameters need to go here
    • create_import_sqlite does a lot of heavy lifting: it interfaces with the given DBMS, issues CREATE TABLE xyz, and then inserts rows. Abstracting some of this would be healthy:
      • SQL dialect
      • split the create and insert steps into at least separate methods, and probably separate classes
      • provide the same kind of progress_fn: Callable callback and progress_interval parameter that CSVScanner provides. progress_interval could be a percentage or an every-N-rows criterion [8/20]
  • filter file pathnames and extensions by regex from the CLI [8/20]
  • logging: offer a log level setting via argparse
  • all stdout output should be routed to a callback: a caller using only the library should be responsible for any console or GUI output [8/20]
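
A minimal sketch of the CSVScanner items above (delimiter parameters forwarded to csv.reader, progress_fn, progress_interval), assuming the scanner's job is to infer a coarse type and a maximum length per column. The constructor signature, the int/float/str type ladder, and the result dict shape are all hypothetical, not the project's current code.

```python
import csv
import io
from typing import Callable, Dict, Optional, TextIO


class CSVScanner:
    """Hypothetical sketch: infer a coarse type and max length per column.

    Knows nothing about databases; csv.reader format parameters are
    forwarded untouched, and progress is reported through a callback.
    """

    RANK = {"int": 0, "float": 1, "str": 2}

    def __init__(
        self,
        fh: TextIO,
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
        **fmtparams,
    ):
        # fmtparams (delimiter, quotechar, ...) go straight to csv.reader
        self._reader = csv.reader(fh, **fmtparams)
        self._progress_fn = progress_fn
        self._progress_interval = progress_interval

    @staticmethod
    def _classify(value: str) -> str:
        for cast, name in ((int, "int"), (float, "float")):
            try:
                cast(value)
                return name
            except ValueError:
                pass
        return "str"

    def scan(self) -> Dict[str, dict]:
        header = next(self._reader)
        cols = {name: {"type": "int", "max_len": 0} for name in header}
        for n, row in enumerate(self._reader, start=1):
            for name, value in zip(header, row):
                col = cols[name]
                col["max_len"] = max(col["max_len"], len(value))
                if value == "":
                    continue  # empty cells don't widen the type
                kind = self._classify(value)
                if self.RANK[kind] > self.RANK[col["type"]]:
                    col["type"] = kind
            # report every progress_interval rows; the caller decides what to display
            if self._progress_fn and n % self._progress_interval == 0:
                self._progress_fn(n)
        return cols


ticks = []
data = io.StringIO("id;name\n1;ann\n2;bob\n3;x\n")
result = CSVScanner(data, progress_fn=ticks.append, progress_interval=2, delimiter=";").scan()
print(result)  # {'id': {'type': 'int', 'max_len': 1}, 'name': {'type': 'str', 'max_len': 3}}
print(ticks)   # [2]
```

Because the scanner only returns a plain dict and calls progress_fn, it never prints and never touches a database.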
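
For the sample_pct idea, one option that sidesteps the seek() and reinstantiation problem is to sample after csv.reader has parsed each record: every row is still parsed (so quoted fields with embedded newlines stay intact), but the per-value type and length checks only run on the sample. The sampled_rows helper below is hypothetical and uses a fixed stride rather than random sampling. One caveat: a max length computed from a sample is only a lower bound on the true maximum.

```python
import csv
import io
from typing import Iterable, Iterator, List


def sampled_rows(rows: Iterable[List[str]], sample_pct: float) -> Iterator[List[str]]:
    """Yield roughly sample_pct percent of rows by taking every k-th row.

    Parsing still happens upstream for every row; only the yielded rows
    should be fed to the (comparatively expensive) type/length checks.
    """
    if not 0 < sample_pct <= 100:
        raise ValueError("sample_pct must be in (0, 100]")
    stride = max(1, round(100 / sample_pct))
    for i, row in enumerate(rows):
        if i % stride == 0:
            yield row


reader = csv.reader(io.StringIO("\n".join(str(i) for i in range(10))))
print(list(sampled_rows(reader, 25)))  # [['0'], ['4'], ['8']]
```

A stride can miss periodic patterns in the data; swapping the modulo test for a random draw (e.g. random.random() < sample_pct / 100) avoids that at the cost of reproducibility.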
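
The "subclass of an ABC DataScanner" alternative could look roughly like this: the base class owns the source-agnostic scan and hands its result to a caller-supplied result_fn, while subclasses only know how to produce a header and rows. DataScanner, ListScanner, and the max-length-only result are hypothetical simplifications for illustration.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterator, List, Sequence, TextIO


class DataScanner(ABC):
    """Hypothetical base: subclasses only know how to open and iterate a source."""

    @abstractmethod
    def header(self) -> List[str]:
        """Column names for the source."""

    @abstractmethod
    def rows(self) -> Iterator[Sequence[str]]:
        """Data rows, excluding the header."""

    def scan(self, result_fn: Callable[[Dict[str, int]], None]) -> None:
        # Source-agnostic pass; the result goes to a caller-supplied handler,
        # so the scanner never touches a database or DDL dialect.
        names = self.header()
        max_len = {name: 0 for name in names}
        for row in self.rows():
            for name, value in zip(names, row):
                max_len[name] = max(max_len[name], len(value))
        result_fn(max_len)


class CSVScanner(DataScanner):
    def __init__(self, fh: TextIO, **fmtparams):
        self._reader = csv.reader(fh, **fmtparams)
        self._header = next(self._reader)

    def header(self) -> List[str]:
        return self._header

    def rows(self) -> Iterator[Sequence[str]]:
        return self._reader


class ListScanner(DataScanner):
    """Any other source, e.g. rows already in memory, plugs in the same way."""

    def __init__(self, header: List[str], rows: List[List[str]]):
        self._header, self._rows = header, rows

    def header(self) -> List[str]:
        return self._header

    def rows(self) -> Iterator[Sequence[str]]:
        return iter(self._rows)


CSVScanner(io.StringIO("a\tb\nxyz\t1\n"), delimiter="\t").scan(print)  # {'a': 3, 'b': 1}
```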
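
A possible shape for turning zip_walker into a ZipCollection with a CSVCollection base, so the csv.reader parameters are accepted once at the top level. zip_walker's real signature isn't shown here, so every name below (members, readers, suffixes) is hypothetical; the zipfile and io calls are standard library.

```python
import csv
import io
import zipfile
from typing import IO, Iterator, Tuple, Union


class CSVCollection:
    """Hypothetical base: yields (name, text stream) pairs and owns the csv.reader params."""

    def __init__(self, **fmtparams):
        self.fmtparams = fmtparams  # forwarded to csv.reader for every member

    def members(self) -> Iterator[Tuple[str, IO[str]]]:
        raise NotImplementedError

    def readers(self) -> Iterator[Tuple[str, Iterator[list]]]:
        # Each reader must be fully consumed before asking for the next member,
        # because the underlying stream is closed when the generator advances.
        for name, fh in self.members():
            yield name, csv.reader(fh, **self.fmtparams)


class ZipCollection(CSVCollection):
    def __init__(self, source: Union[str, IO[bytes]], suffixes=(".csv", ".tsv"),
                 encoding: str = "utf-8", **fmtparams):
        super().__init__(**fmtparams)
        self.source = source
        self.suffixes = suffixes
        self.encoding = encoding

    def members(self) -> Iterator[Tuple[str, IO[str]]]:
        with zipfile.ZipFile(self.source) as zf:
            for info in zf.infolist():
                if info.is_dir() or not info.filename.lower().endswith(self.suffixes):
                    continue
                with zf.open(info) as raw:
                    # newline="" is what the csv module expects for its input files
                    yield info.filename, io.TextIOWrapper(raw, encoding=self.encoding, newline="")
```

A directory-walking sibling (say, a hypothetical DirCollection) would only need its own members(); the reader parameters and the handoff to CSVScanner stay in the base class.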
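
One way to split create_import_sqlite along the lines listed above: a dialect object for type mapping and identifier quoting, a TableCreator that only emits DDL, and a RowInserter that only inserts. These class names and the type mapping are hypothetical; the "?" placeholder is sqlite3's qmark parameter style, and the column dict matches the shape a scanner like the one sketched earlier might return.

```python
import sqlite3
from typing import Dict, Iterable, Sequence


class SQLDialect:
    """Hypothetical dialect: maps scanned types to column types and quotes identifiers."""

    type_map = {"int": "INTEGER", "float": "REAL", "str": "TEXT"}
    placeholder = "?"  # sqlite3's qmark parameter style

    def quote(self, ident: str) -> str:
        return '"' + ident.replace('"', '""') + '"'

    def column_type(self, scanned_type: str, max_len: int) -> str:
        return self.type_map[scanned_type]


class VarcharDialect(SQLDialect):
    """Hypothetical variant that sizes text columns from the scanned max length."""

    def column_type(self, scanned_type: str, max_len: int) -> str:
        if scanned_type == "str":
            return f"VARCHAR({max(1, max_len)})"
        return super().column_type(scanned_type, max_len)


class TableCreator:
    def __init__(self, dialect: SQLDialect):
        self.dialect = dialect

    def ddl(self, table: str, columns: Dict[str, dict]) -> str:
        q = self.dialect.quote
        cols = ", ".join(
            f"{q(name)} {self.dialect.column_type(c['type'], c['max_len'])}"
            for name, c in columns.items()
        )
        return f"CREATE TABLE {q(table)} ({cols})"


class RowInserter:
    def __init__(self, dialect: SQLDialect):
        self.dialect = dialect

    def insert(self, conn: sqlite3.Connection, table: str,
               names: Sequence[str], rows: Iterable[Sequence]) -> int:
        q = self.dialect.quote
        marks = ", ".join(self.dialect.placeholder for _ in names)
        sql = f"INSERT INTO {q(table)} ({', '.join(q(n) for n in names)}) VALUES ({marks})"
        # executemany streams rows; committing is left to the caller
        return conn.executemany(sql, rows).rowcount
```

Keeping DDL generation as a pure string-producing step also makes it testable without a live DBMS.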
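
The progress_interval notes (every N rows, or a percentage) and the "route all output through a callback" item could share one small wrapper: the library iterates through it and only ever calls progress_fn, while the CLI layer decides whether to print. with_progress and its parameters are hypothetical names; a percentage interval needs a known row total.

```python
import math
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")
ProgressFn = Callable[[int, Optional[int]], None]


def with_progress(
    items: Iterable[T],
    progress_fn: Optional[ProgressFn] = None,
    every_n: Optional[int] = None,
    pct: Optional[float] = None,
    total: Optional[int] = None,
) -> Iterator[T]:
    """Yield items unchanged, calling progress_fn(done, total) at the chosen interval.

    Pass every_n for an every-N-rows criterion, or pct plus total for a percentage.
    """
    if progress_fn is None:
        yield from items  # silent by default: the library never prints
        return
    if pct is not None:
        if not total:
            raise ValueError("pct requires a positive total")
        every_n = max(1, math.ceil(total * pct / 100))
    if not every_n or every_n < 1:
        raise ValueError("provide every_n >= 1, or pct with total")
    done = 0
    for item in items:
        yield item
        done += 1
        if done % every_n == 0:
            progress_fn(done, total)


# CLI side: only here is output actually written to the console.
for _ in with_progress(range(10), lambda d, t: print(f"{d}/{t} rows"), pct=25, total=10):
    pass  # prints 3/10 rows, 6/10 rows, 9/10 rows
```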
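
The CLI items (csv.reader parameters as argparse options, a regex pathname filter, a log level setting) might be wired up as follows. The flag names and helper functions are hypothetical; the argparse, re, and logging calls are standard library. Note that argparse only turns TypeError, ValueError, and ArgumentTypeError from a type= function into a usage error, which is why re.error is converted.

```python
import argparse
import logging
import re
from typing import Dict, List, Optional, Sequence, Tuple


def delimiter(value: str) -> str:
    # Let users type \t or "tab" instead of a literal tab character.
    value = {"\\t": "\t", "tab": "\t"}.get(value, value)
    if len(value) != 1:
        raise argparse.ArgumentTypeError("delimiter must be a single character")
    return value


def regex(value: str) -> "re.Pattern[str]":
    # re.error isn't caught by argparse, so convert it into a clean usage error.
    try:
        return re.compile(value)
    except re.error as exc:
        raise argparse.ArgumentTypeError(f"invalid regex: {exc}") from exc


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="csv2db.py")
    p.add_argument("paths", nargs="+", help="CSV/TSV files or zip archives")
    p.add_argument("--delimiter", type=delimiter, default=",")
    p.add_argument("--quotechar", default='"')
    p.add_argument("--include", type=regex, default=None,
                   help=r"only process pathnames matching this regex, e.g. '\.tsv$'")
    p.add_argument("--log-level", default="WARNING",
                   choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    return p


def parse_cli(argv: Optional[Sequence[str]] = None) -> Tuple[Dict[str, str], List[str]]:
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level))
    # These fmtparams would be handed to the collection/scanner, then to csv.reader.
    fmtparams = {"delimiter": args.delimiter, "quotechar": args.quotechar}
    selected = [p for p in args.paths if args.include is None or args.include.search(p)]
    return fmtparams, selected
```

A single --include regex covers both the pathname and the extension filter, since the extension is just the tail of the path.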

Labels: enhancement (New feature or request)
