Open
Labels
enhancement: New feature or request
Description
2023 Aug 16
- CSVScanner has needs:
  - provide delimiter params so that `csv.reader` can do its job correctly
    - extend those parameters to the argparse cli options
  - should know nothing about any db or DDL dialect: a function or method reference should be passed in for handling the result of `CSVScanner.scan()`
  - break out the progress indicator by shoving it into a `progress_fn: Callable` callback parameter [8/20]
  - provide a `progress_interval: int` parameter to control the frequency of the progress indicator [8/20]
  - row iteration: provide a `sample_pct: number` which specifies the percentage of rows to check for type or length
    - maybe use `self._csv_fh.seek(n)` to skip to the next apparent sample row; this might necessitate reinstantiating `csv.reader` to begin after the next newline
    - or, use a `io.TextBuffer` to skip rows behind the scenes so that the `reader` instance doesn't get affected
  - if possible, abstract out that this is about CSV or TSV and make scanning any data source feasible, by passing in a class that handles opening, iterating, and breaking down the data source, and that CSVScanner invokes to produce a row or a block of rows to be processed in `.scan()`. Maybe too Java-like tho, maybe make CSVScanner a subclass of an ABC DataScanner.
- csv2db.py
  - `zip_walker` needs to be a class ZipCollection with a base called CSVCollection or something
    - it's the head interface for instantiating CSVScanner, and outputting `CSVScanner.result()`, so `csv.reader` parameters need to go here
  - `create_import_sqlite` does a lot of heavy lifting by interfacing with the given DBMS, issuing `create table xyz` and then inserting rows. Abstractifying some of this would be healthy:
    - sql dialect
    - separate out the create and insert into at least separate methods, but probably separate classes
    - provide the same type of `progress_fn: Callable` callback and `progress_interval: int` that `csv2db.py` provides. `progress_interval: number` could be a percent, or an every-n-rows sort of event criteria [8/20]
- regex filter file pathname & extension from cli [8/20]
- logging: offer a log level setting via argparse
- all stdout should be routed to a callback: a caller using only the lib should be responsible for any console or gui output [8/20]
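
A rough sketch of how the CSVScanner items above could fit together, not the project's actual code. It assumes the scan result is a mapping of column name to maximum field length, that `progress_fn` receives the number of rows read so far, and that sampling is done by checking every n-th row rather than by seeking in the file. The `on_result` parameter name and the stride-based sampling are hypothetical choices made for illustration only.

```python
import csv
import io
from typing import Callable, Dict, Optional, TextIO


class CSVScanner:
    """Illustrative sketch only: knows nothing about any DB or DDL dialect."""

    def __init__(
        self,
        fh: TextIO,
        *,
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
        sample_pct: float = 100.0,
        on_result: Optional[Callable[[Dict[str, int]], None]] = None,  # hypothetical name
        **fmtparams,  # forwarded to csv.reader, e.g. delimiter="\t"
    ) -> None:
        if not 0 < sample_pct <= 100:
            raise ValueError("sample_pct must be in (0, 100]")
        if progress_interval < 1:
            raise ValueError("progress_interval must be >= 1")
        self._reader = csv.reader(fh, **fmtparams)
        self._progress_fn = progress_fn
        self._progress_interval = progress_interval
        self._on_result = on_result
        # Assumption: approximate the percentage by checking every n-th row.
        self._stride = max(1, round(100 / sample_pct))

    def scan(self) -> Dict[str, int]:
        """Return {column name: longest sampled field length}."""
        header = next(self._reader, None)
        if header is None:
            header = []
        max_len = [0] * len(header)
        rows_seen = 0
        for row in self._reader:
            rows_seen += 1
            # Only sampled rows are inspected for length.
            if (rows_seen - 1) % self._stride == 0:
                for i, value in enumerate(row[: len(header)]):
                    max_len[i] = max(max_len[i], len(value))
            # Report progress every progress_interval rows; no printing here.
            if self._progress_fn and rows_seen % self._progress_interval == 0:
                self._progress_fn(rows_seen)
        result = dict(zip(header, max_len))
        if self._on_result:
            self._on_result(result)  # caller decides what to do (DDL, logging, ...)
        return result


# Usage: the caller owns all console output and result handling.
scanner = CSVScanner(
    io.StringIO("name\tcity\nAda\tLondon\nGrace\tNew York\n"),
    delimiter="\t",
    progress_fn=lambda n: print(f"{n} rows scanned"),
    progress_interval=1,
    on_result=print,
)
scanner.scan()
```

Passing the format parameters through as keyword arguments keeps the scanner's signature in step with `csv.reader`, so the same names could be exposed as argparse options without translation.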