scripts to import bank statement PDFs into hledger files
Supported banks:
- ABN AMRO
- BNP Paribas
- ING.de
- ING.fr
- Mercedes-Benz Bank
- VTB Direktbank
Not actual banks, but bank statement-like files:
- PayPal CSV reports
Payslips:
- PayFit
- Bouygues Télécom
- Thermo Fisher Scientific
To parse a single bank statement PDF you can use the parse-bank-statement.py
script. For bulk imports you can use the import-bank-statements.py script
that tries to parse all bank statements found in
<incoming_dir>/<name_of_bank>. For each bank statement file it creates a
corresponding hledger file in one of multiple configurable ledger directories.
Those directories can be changed in the configuration file
$XDG_CONFIG_HOME/bank-statement-parser/import.cfg (where $XDG_CONFIG_HOME
defaults to $HOME/.config if unset). The default <incoming_dir> is
~/accounting/incoming and can be set via the incoming_dir key in the
common section. Ledger directories can be configured by adding a new section
with a ledger_dir entry.
Here's an example configuration file with one ledger dir at
~/accounting/bank_statements:
    [common]
    incoming_dir = ~/accounting/incoming

    [my bank statements]
    ledger_dir = ~/accounting/bank_statements
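As a rough illustration of the lookup described above, the following sketch resolves the configuration path and extracts the incoming directory and all ledger directories with Python's standard configparser. The helper names config_path and read_dirs are made up for this example and are not taken from the scripts themselves; the real implementation may differ.

```python
import configparser
import os
from pathlib import Path

EXAMPLE = """
[common]
incoming_dir = ~/accounting/incoming

[my bank statements]
ledger_dir = ~/accounting/bank_statements
"""


def config_path() -> Path:
    """Locate import.cfg; fall back to ~/.config if XDG_CONFIG_HOME is unset."""
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / "bank-statement-parser" / "import.cfg"


def read_dirs(text: str):
    """Return (incoming_dir, {section name: ledger_dir}) from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Use the documented default if the common section or key is missing.
    incoming = parser.get("common", "incoming_dir",
                          fallback="~/accounting/incoming")
    # Every other section with a ledger_dir entry is a ledger config section.
    ledgers = {
        name: parser[name]["ledger_dir"]
        for name in parser.sections()
        if name != "common" and "ledger_dir" in parser[name]
    }
    return incoming, ledgers


incoming, ledgers = read_dirs(EXAMPLE)
print(incoming)
print(ledgers)
```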
If you have more than one ledger config section in your import.cfg, you must
provide a file $XDG_CONFIG_HOME/bank-statement-parser/select_ledger.py that
contains a function

    def select_ledger(metadata: BankStatementMetadata) -> str:
        ...

that maps the metadata of a bank statement (normally containing IBAN, account
owner and other identifying data) to a section name from your import.cfg.
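A minimal sketch of such a select_ledger.py could dispatch on the IBAN. The BankStatementMetadata class below is only a stand-in so that the example runs on its own, and its attribute names iban and owner are assumptions; check the real class in the project for the actual field names. The IBANs and the second section name are made up as well.

```python
from dataclasses import dataclass


# Stand-in for the project's BankStatementMetadata (assumed attribute names).
@dataclass
class BankStatementMetadata:
    iban: str
    owner: str


# Maps each known IBAN to a section name from import.cfg (hypothetical values).
LEDGER_BY_IBAN = {
    "DE00123456780000000001": "my bank statements",
    "FR0012345678900000000000002": "joint bank statements",
}


def select_ledger(metadata: BankStatementMetadata) -> str:
    """Return the import.cfg section name for the given statement metadata."""
    try:
        return LEDGER_BY_IBAN[metadata.iban]
    except KeyError:
        # Fail loudly rather than filing a statement in the wrong ledger.
        raise ValueError(
            f"no ledger configured for IBAN {metadata.iban}") from None
```

Raising an error for unknown accounts is a design choice of this sketch; returning a default section name would work just as well if one ledger should catch everything else.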
The above-mentioned scripts are compatible with Python 3.9 or later.
To parse PDF files the bank statement parser uses pdftotext, which in Debian
is part of the poppler-utils package.
The transactions on bank statements parsed by the scripts
parse-bank-statement.py and import-bank-statements.py can be automatically
cleaned up (e.g. to prettify the subject line) and assigned to the right
accounts.
To this end, the scripts optionally read two Python files and a config file
containing account names from the directory <ledger_dir>/rules/<name_of_bank>.
Each parser contains a dict of account names, e.g. the
name of the account corresponding to the bank account of a bank statement
or certain recurring external accounts such as banking fees or interest.
The default accounts can be found in the DEFAULT_ACCOUNTS attribute of each
parser's config type. They can be customized by specifying alternative
account names in the config file
<ledger_dir>/rules/<name_of_bank>/accounts.cfg. This file is an ini file
that has to contain an [accounts] section which contains the customized
account names. Any unspecified account names fall back to the ones defined in
DEFAULT_ACCOUNTS.
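The fallback behaviour can be pictured as a simple dict merge. In the sketch below, the keys and account names in DEFAULT_ACCOUNTS are invented for illustration (each parser defines its own), and load_accounts is a hypothetical helper, not a function from the scripts.

```python
import configparser

# Hypothetical defaults; the real keys live in each parser's config type.
DEFAULT_ACCOUNTS = {
    "this_account": "assets:bank:checking",
    "fees": "expenses:banking:fees",
    "interest": "income:bank:interest",
}

# Example content of <ledger_dir>/rules/<name_of_bank>/accounts.cfg
ACCOUNTS_CFG = """
[accounts]
this_account = assets:bank:ing:giro
"""


def load_accounts(text: str, defaults: dict) -> dict:
    """Overlay the [accounts] section on top of the default account names."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    overrides = dict(parser["accounts"]) if parser.has_section("accounts") else {}
    # Later entries win, so unspecified names keep their default value.
    return {**defaults, **overrides}


accounts = load_accounts(ACCOUNTS_CFG, DEFAULT_ACCOUNTS)
print(accounts["this_account"])
print(accounts["fees"])
```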
The first Python file is cleaning_rules.py, which has to contain a variable
rules holding a list of Rules that are applied one after another to each
transaction on the bank statement. Rule, here, is a class that is implicitly
imported into cleaning_rules.py. You can create a Rule as
Rule(predicate, clean) where predicate is a function taking a
Transaction as its argument and returning a bool that is True if the
clean function should be applied to the transaction. clean, on the other
hand, also takes a Transaction as its argument, but is expected to return a
modified transaction description. For more complicated cleaning rules, Rule
accepts an optional argument field with which you can specify that the
clean function returns a field other than the Transaction's description.
See parsers/banks/cleaning_rules/ing_fr.py for some examples of already
built-in cleaning rules.
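To make the mechanics concrete, here is a self-contained sketch of two cleaning rules. The Transaction and Rule classes below are simplified stand-ins written for this example (in a real cleaning_rules.py, Rule is imported implicitly), and apply_rules is a hypothetical helper that only approximates how the scripts might apply the list. The statement text is made up.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Transaction:  # simplified stand-in for the project's Transaction
    description: str


@dataclass
class Rule:  # simplified stand-in for the implicitly imported Rule class
    predicate: Callable[[Transaction], bool]
    clean: Callable[[Transaction], object]
    field: str = "description"  # which field clean's return value replaces


def apply_rules(rules, transaction):
    """Apply every matching rule in order (assumed behaviour)."""
    for rule in rules:
        if rule.predicate(transaction):
            setattr(transaction, rule.field, rule.clean(transaction))
    return transaction


PREFIX = "CARD PAYMENT "

rules = [
    # Strip a noisy prefix that the bank puts in front of card payments.
    Rule(lambda t: t.description.startswith(PREFIX),
         lambda t: t.description[len(PREFIX):]),
    # Collapse runs of whitespace left over from the PDF layout.
    Rule(lambda t: "  " in t.description,
         lambda t: re.sub(r"\s+", " ", t.description).strip()),
]

t = apply_rules(rules, Transaction("CARD PAYMENT  SUPERMARKET   PARIS"))
print(t.description)
```

Because the rules run in sequence, the second rule sees the description as already modified by the first one.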
The second Python file to transform a bank statement's Transactions is the
file account_mappings.py. As the name suggests, its purpose is to assign
accounts to the Transactions. To this end, it contains a variable rules
that contains a list of functions that take a Transaction as argument and
return a str specifying the account to apply to the transaction or None
if the rule doesn't know which account to assign. The rules are then applied
in the given order to each Transaction until the first non-None result
is encountered, which is then assigned as the external account of the
Transaction.
As a very simplistic example, you could have the following
account_mappings.py with a single rule that applies the account
income:salary to transactions with the word salary in their
description:
    def salary(t):
        if 'salary' in t.description:
            return 'income:salary'

    rules = [salary]

Here, we've used the fact that Python functions implicitly return None
when reaching their end without encountering a return statement. Normally
this is considered bad style, as it may be unclear whether None is really the
expected return value or whether a return statement was simply forgotten at
the end of the function. For account mapping rules, however, returning None
is always the expected outcome when the rule does not match, so that concern
does not really apply here. The implicit return of None can therefore be used
to keep the mapping rules short.
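The first-match-wins evaluation of mapping rules can be sketched as follows. The Transaction class is a reduced stand-in, the rent rule and its account name are invented for the example, and assign_account is a hypothetical helper that mirrors the described behaviour rather than the scripts' actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:  # reduced stand-in for the project's Transaction
    description: str
    external_account: Optional[str] = None


def salary(t):
    if 'salary' in t.description:
        return 'income:salary'


def rent(t):
    if 'rent' in t.description:
        return 'expenses:rent'


rules = [salary, rent]


def assign_account(rules, t):
    """Try each rule in order and keep the first non-None result."""
    for rule in rules:
        account = rule(t)
        if account is not None:
            t.external_account = account
            break
    return t


print(assign_account(rules, Transaction("monthly rent")).external_account)
print(assign_account(rules, Transaction("groceries")).external_account)
```

Since evaluation stops at the first match, more specific rules should come before more general ones in the list.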
To make use of all properties of a Transaction object in your cleaning or
mapping rules, you can take a look at its definition in transaction.py.
The most important properties are:
- description: the subject line of the transaction
- account: the bank account to which the bank statement belongs
- external_account: the other side of the transaction; this is what is filled by your account mapping rules
- amount
- currency
- operation_date: date of the transaction
- value_date: date when the transaction changes the balance of your account
- external_value_date: value date of the external account
Additionally, each Transaction contains a dict named metadata which
can be filled by your cleaning rules with arbitrary metadata that can then
be used in your account mappings.
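As an illustration of the mapping side, the rule below reads a value from metadata. It assumes that an earlier cleaning rule has stored a key named category there; that key, the resulting account names, and the reduced Transaction stand-in are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:  # reduced stand-in for the project's Transaction
    description: str
    metadata: dict = field(default_factory=dict)


def by_category(t):
    # 'category' is a made-up key that a cleaning rule might have set.
    category = t.metadata.get('category')
    if category is not None:
        return f'expenses:{category}'


rules = [by_category]

print(by_category(Transaction("SUPERMARKET PARIS", {"category": "groceries"})))
```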
During development or debugging of cleaning and mapping rules it might be
useful to see the internal representation of Transactions. To this end,
parse-bank-statement.py has the option --raw which prints the internal
representation of the Transaction objects instead of formatting them for
hledger.
These programs are licensed under the GPL version 3 or (at your option) any later version.
The text of the GPL version 3 can be found in the LICENSES directory.