Skip to content
This repository was archived by the owner on Sep 3, 2022. It is now read-only.

SQL Modules and Variable Expansion

Graham Wheeler edited this page Jun 3, 2016 · 6 revisions

The %sql cell magic serves two purposes:

  • executing a SQL query statement immediately (if the -m/--module argument is omitted);
  • defining a SQL module for later execution (if the -m/--module argument is used).

Variable substitution is done immediately in the first case using the IPython notebook execution environment.

In the second case a Python module is created and immediately imported, using the name supplied in the -m/--module argument. This module will have several variables defined in its namespace:

  • _sql_module_arg_parser is a Python argparse argument parser that is created from any Python variable definitions at the start of the cell;
  • _sql_module_main is the main query statement (datalab.data.SqlStatement object) to be executed for the cell; i.e. the one with no DEFINE prefix;
  • _sql_module_last is the last query statement defined in the cell; this is frequently the same as _sql_module_main
  • for each SQL query that starts with a DEFINE <name> prefix, `<name>' will be bound to that query statement.

The argparser is built by finding the start of the first query statement, and treating all the code preceding it as Python code. This code is executed in new environment that has a few helper functions predefined, and then the resulting environment is introspected to create an argument parser. For example:

x = 3

will result in a -x argument being defined with a default value of 3.

The special predefined helper functions are source and datestring.

TODO: variable/UDF expansion

Clone this wiki locally