|
115 | 115 | "source": [
|
116 | 116 | "## Getting started\n",
|
117 | 117 | "\n",
|
118 |
| - "First, install the required dependencies. \n", |
119 |
| - "\n", |
120 |
| - "```bash\n", |
121 |
| - "pip install -r requirements.txt\n", |
122 |
| - "```\n", |
| 118 | + "First, install the required dependencies. " |
| 119 | + ] |
| 120 | + }, |
| 121 | + { |
| 122 | + "cell_type": "code", |
| 123 | + "execution_count": 1, |
| 124 | + "metadata": {}, |
| 125 | + "outputs": [], |
| 126 | + "source": [ |
| 127 | + "#!pip install -r requirements.txt\n", |
123 | 128 | "\n",
|
| 129 | + "# In an environment like Google Colab, please use the absolute URL to the requirements.txt file.\n", |
| 130 | + "# Note: pip might report some dependency conflicts. They can usually be ignored.\n", |
| 131 | + "# Restart the runtime if Colab asks you to.\n", |
| 132 | + "#!pip install -r https://raw.githubusercontent.com/crate/cratedb-examples/main/topic/machine-learning/automl/requirements.txt" |
| 133 | + ] |
| 134 | + }, |
| 135 | + { |
| 136 | + "cell_type": "markdown", |
| 137 | + "metadata": {}, |
| 138 | + "source": [ |
124 | 139 | "**Note:** At the time of writing, PyCaret requires Python 3.8, 3.9 or 3.10.\n",
|
125 | 140 | "\n",
|
126 | 141 | "Second, you will need a CrateDB instance to store and serve the data. The easiest\n",
|
|
131 | 146 | "create an `.env` file with the following content:\n",
|
132 | 147 | "\n",
|
133 | 148 | "```env\n",
|
134 |
| - "CRATE_HOST=<your-crate-host> # set this to localhost if you're running crate locally\n", |
135 |
| - "CRATE_USER=<your-crate-user> # set this to crate if you're running crate locally\n", |
136 |
| - "CRATE_PASSWORD=<your-crate-password> # set this to \"\" if you're running crate locally\n", |
137 |
| - "CRATE_SSL=true # set this to false if you're running crate locally\n", |
| 149 | + "# use this string for a connection to CrateDB Cloud\n", |
| 150 | + "CRATEDB_CONNECTION_STRING=crate://username:password@hostname/?ssl=true\n", |
| 151 | + "\n", |
| 152 | + "# use this string for a local connection to CrateDB\n", |
| 153 | + "# CRATEDB_CONNECTION_STRING=crate://crate@localhost/?ssl=false\n", |
138 | 154 | "```\n",
|
139 | 155 | "\n",
|
140 | 156 | "You can find your CrateDB credentials in the [CrateDB Cloud Console].\n",
|
141 | 157 | "\n",
|
142 | 158 | "[CrateDB Cloud Console]: https://cratedb.com/docs/cloud/en/latest/reference/overview.html#cluster\n",
|
143 |
| - "[deploy a cluster]: https://cratedb.com/docs/cloud/en/latest/tutorials/deploy/stripe.html#deploy-cluster\n", |
144 |
| - "\n", |
145 |
| - "### Creating demo data\n", |
| 159 | + "[deploy a cluster]: https://cratedb.com/docs/cloud/en/latest/tutorials/deploy/stripe.html#deploy-cluster" |
| 160 | + ] |
| 161 | + }, |
| 162 | + { |
| 163 | + "cell_type": "code", |
| 164 | + "execution_count": 2, |
| 165 | + "metadata": {}, |
| 166 | + "outputs": [], |
| 167 | + "source": [ |
| 168 | + "import os\n", |
146 | 169 | "\n",
|
147 |
| - "For convenience, this notebook comes with an accompanying CSV dataset which you\n", |
148 |
| - "can quickly import into the database. Upload the CSV file to your CrateDB cloud\n", |
149 |
| - "cluster, as described [here](https://cratedb.com/docs/cloud/en/latest/reference/overview.html#import).\n", |
150 |
| - "To follow this notebook, choose `pycaret_churn` for your table name.\n", |
| 170 | + "# For CrateDB Cloud, use:\n", |
| 171 | + "CONNECTION_STRING = os.environ.get(\n", |
| 172 | + " \"CRATEDB_CONNECTION_STRING\",\n", |
| 173 | + " \"crate://username:password@hostname/?ssl=true\",\n", |
| 174 | + ")\n", |
151 | 175 | "\n",
|
152 |
| - "This will automatically create a new database table and import the data." |
| 176 | + "# For a self-deployed CrateDB, e.g. via Docker, please use:\n", |
| 177 | + "# CONNECTION_STRING = os.environ.get(\n", |
| 178 | + "# \"CRATEDB_CONNECTION_STRING\",\n", |
| 179 | + "# \"crate://crate@localhost/?ssl=false\",\n", |
| 180 | + "# )" |
153 | 181 | ]
|
154 | 182 | },
|
155 | 183 | {
|
156 | 184 | "cell_type": "markdown",
|
157 | 185 | "metadata": {},
|
158 | 186 | "source": [
|
| 187 | + "### Creating demo data\n", |
| 188 | + "\n", |
| 189 | + "For convenience, this notebook comes with an accompanying CSV dataset which you\n", |
| 190 | + "can quickly import into the database. Upload the CSV file to your CrateDB Cloud\n", |
| 191 | + "cluster, as described in the [import section of the CrateDB Cloud documentation](https://cratedb.com/docs/cloud/en/latest/reference/overview.html#import).\n", |
| 192 | + "To follow this notebook, choose `pycaret_churn` for your table name.\n", |
| 193 | + "\n", |
| 194 | + "This will automatically create a new database table and import the data.\n", |
| 195 | + "\n", |
159 | 196 | "### Alternative data import using code\n",
|
160 | 197 | "\n",
|
161 | 198 | "If you prefer to use code to import your data, please execute the following lines, which read the CSV\n",
|
|
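The new `CONNECTION_STRING` replaces the four variables that the removed code stitched together (`CRATE_USER`, `CRATE_PASSWORD`, `CRATE_HOST`, `CRATE_SSL`). If you are migrating an existing `.env` file, a minimal sketch like the following could build the equivalent URL. The helper name `build_connection_string` is hypothetical. It keeps the explicit `:4200` port that the old f-string used, and it percent-encodes the credentials so that characters such as `@` or `:` in a password do not break URL parsing.

```python
from urllib.parse import quote


def build_connection_string(user: str, password: str, host: str,
                            ssl: bool, port: int = 4200) -> str:
    """Hypothetical helper: assemble a crate:// URL from the separate
    settings the notebook previously read from CRATE_USER, CRATE_PASSWORD,
    CRATE_HOST and CRATE_SSL."""
    # Percent-encode user and password so reserved URL characters
    # (e.g. '@', ':', '/') survive inside the netloc.
    credentials = quote(user, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    ssl_flag = "true" if ssl else "false"
    # Keep the explicit port, as the removed f-string did (":4200").
    return f"crate://{credentials}@{host}:{port}/?ssl={ssl_flag}"


# Local CrateDB with the default "crate" user and no password.
print(build_connection_string("crate", "", "localhost", ssl=False))
```

The resulting string can then be stored as `CRATEDB_CONNECTION_STRING` in your environment or `.env` file.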
175 | 212 | "if os.path.exists(\".env\"):\n",
|
176 | 213 | " dotenv.load_dotenv(\".env\", override=True)\n",
|
177 | 214 | "\n",
|
178 |
| - "dburi = f\"crate://{os.environ['CRATE_USER']}:{os.environ['CRATE_PASSWORD']}@{os.environ['CRATE_HOST']}:4200?ssl={os.environ['CRATE_SSL']}\"\n", |
179 |
| - "engine = sa.create_engine(dburi, echo=os.environ.get('DEBUG'))\n", |
| 215 | + "engine = sa.create_engine(CONNECTION_STRING, echo=os.environ.get('DEBUG'))\n", |
180 | 216 | "df = pd.read_csv(\"https://github.com/crate/cratedb-datasets/raw/main/machine-learning/automl/churn-dataset.csv\")\n",
|
181 | 217 | "\n",
|
182 | 218 | "with engine.connect() as conn:\n",
|
|
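One ordering detail worth checking: as far as this diff shows, `CONNECTION_STRING` is read from `CRATEDB_CONNECTION_STRING` in an earlier code cell, while `dotenv.load_dotenv(".env", override=True)` only runs in this cell, so values from `.env` are not picked up unless the variable is read again afterwards. The sketch below illustrates the intended order: load the file first, then read the variable. It uses a tiny hypothetical stdlib parser (`load_env_file`) as a stand-in for python-dotenv, without support for quoting or inline comments, and it assumes the `.env` file defines `CRATEDB_CONNECTION_STRING`, the name the code cell reads. The fallback is the local URL from the notebook's commented-out alternative.

```python
import os
from pathlib import Path

# Assumed fallback: the local URL from the notebook's commented-out option.
DEFAULT_URL = "crate://crate@localhost/?ssl=false"


def load_env_file(path: str = ".env") -> None:
    """Hypothetical stand-in for dotenv.load_dotenv(path, override=True):
    reads KEY=VALUE lines, skipping blank lines and '#' comments."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for raw_line in env_file.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only; URLs contain '=' in the query string.
        key, value = line.split("=", 1)
        # Overwrite existing variables, like override=True.
        os.environ[key.strip()] = value.strip()


def resolve_connection_string(path: str = ".env") -> str:
    # Load the file before reading the variable, so .env values take effect.
    load_env_file(path)
    return os.environ.get("CRATEDB_CONNECTION_STRING", DEFAULT_URL)
```

In the notebook itself, the equivalent fix would be to read `CRATEDB_CONNECTION_STRING` only after `dotenv.load_dotenv` has run.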
214 | 250 | "if os.path.exists(\".env\"):\n",
|
215 | 251 | " dotenv.load_dotenv(\".env\", override=True)\n",
|
216 | 252 | "\n",
|
217 |
| - "dburi = f\"crate://{os.environ['CRATE_USER']}:{os.environ['CRATE_PASSWORD']}@{os.environ['CRATE_HOST']}:4200?ssl={os.environ['CRATE_SSL']}\"\n", |
218 |
| - "engine = sa.create_engine(dburi, echo=os.environ.get('DEBUG'))\n", |
| 253 | + "engine = sa.create_engine(CONNECTION_STRING, echo=os.environ.get('DEBUG'))\n", |
219 | 254 | "\n",
|
220 | 255 | "with engine.connect() as conn:\n",
|
221 | 256 | " with conn.execute(sa.text(\"SELECT * FROM pycaret_churn\")) as cursor:\n",
|
|
224 | 259 | "# We point MLFLOW_TRACKING_URI at our CrateDB instance; we'll see why later.\n",
|
225 | 260 | "os.environ[\n",
|
226 | 261 | " \"MLFLOW_TRACKING_URI\"\n",
|
227 |
| - "] = f\"{dburi}&schema=mlflow\"" |
| 262 | + "] = f\"{CONNECTION_STRING}&schema=mlflow\"" |
228 | 263 | ]
|
229 | 264 | },
|
230 | 265 | {
|
|
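The MLflow tracking URI is built by appending `&schema=mlflow` to `CONNECTION_STRING`. That only works if the string already carries a query part such as `?ssl=true`, which both example strings do. A slightly more defensive sketch, using only `urllib.parse`, sets the parameter whether or not a query string is present. `with_query_param` is a hypothetical helper, not part of MLflow or the CrateDB driver.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return `url` with `key=value` set in its query string.

    Plain '&' concatenation assumes a '?' is already present; this version
    works either way. Note: repeated keys are collapsed to the last value.
    """
    parts = urlsplit(url)
    # Decode the existing query into an ordered dict, keeping empty values.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query[key] = value  # add or overwrite the parameter
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Usage in the notebook would then be `os.environ["MLFLOW_TRACKING_URI"] = with_query_param(CONNECTION_STRING, "schema", "mlflow")`.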
966 | 1001 | "# - \"n_select\" defines how many models are selected.\n",
|
967 | 1002 | "# - \"exclude\" defines which models are excluded from the comparison.\n",
|
968 | 1003 | "\n",
|
| 1004 | + "# Note: This is only relevant if we are executing automated tests\n", |
969 | 1005 | "if \"PYTEST_CURRENT_TEST\" in os.environ:\n",
|
970 | 1006 | " best_models = compare_models(sort=\"AUC\", include=[\"lr\", \"knn\"], n_select=3)\n",
|
| 1007 | + "# If we are not in an automated test, compare the available models\n", |
971 | 1008 | "else:\n",
|
972 | 1009 | " # For production scenarios, it might be worth including \"lightgbm\" again.\n",
|
973 | 1010 | " best_models = compare_models(sort=\"AUC\", exclude=[\"lightgbm\"], n_select=3)"
|
|
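The test/production switch relies on `PYTEST_CURRENT_TEST`, an environment variable that pytest sets while a test is running. If you want to check that decision without running PyCaret, one option is to factor it into a small function that returns the keyword arguments for `compare_models`. `compare_models_kwargs` is a hypothetical name for such a helper; the argument values are copied from the cell above.

```python
import os
from typing import Mapping, Optional


def compare_models_kwargs(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper mirroring the notebook's branch.

    Returns keyword arguments for PyCaret's compare_models instead of
    calling it, so the selection logic can be checked in isolation.
    """
    env = os.environ if environ is None else environ
    # pytest sets PYTEST_CURRENT_TEST while a test is executing.
    if "PYTEST_CURRENT_TEST" in env:
        # Automated tests: only two quick models.
        return {"sort": "AUC", "include": ["lr", "knn"], "n_select": 3}
    # Otherwise: all available models except LightGBM.
    return {"sort": "AUC", "exclude": ["lightgbm"], "n_select": 3}
```

In the notebook this would read `best_models = compare_models(**compare_models_kwargs())`.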
3406 | 3443 | "source": [
|
3407 | 3444 | "os.environ[\n",
|
3408 | 3445 | " \"MLFLOW_TRACKING_URI\"\n",
|
3409 |
| - "] = f\"crate://{os.environ['CRATE_USER']}:{os.environ['CRATE_PASSWORD']}@{os.environ['CRATE_HOST']}:4200?ssl={os.environ['CRATE_SSL']}&schema=mlflow\"" |
| 3446 | + "] = f\"{CONNECTION_STRING}&schema=mlflow\"" |
3410 | 3447 | ]
|
3411 | 3448 | },
|
3412 | 3449 | {
|
|
3484 | 3521 | ],
|
3485 | 3522 | "metadata": {
|
3486 | 3523 | "kernelspec": {
|
3487 |
| - "display_name": "crate", |
| 3524 | + "display_name": "Python 3 (ipykernel)", |
3488 | 3525 | "language": "python",
|
3489 | 3526 | "name": "python3"
|
3490 | 3527 | },
|
|
3498 | 3535 | "name": "python",
|
3499 | 3536 | "nbconvert_exporter": "python",
|
3500 | 3537 | "pygments_lexer": "ipython3",
|
3501 |
| - "version": "3.10.0" |
| 3538 | + "version": "3.11.4" |
3502 | 3539 | }
|
3503 | 3540 | },
|
3504 | 3541 | "nbformat": 4,
|
|