From 1066c070a22a2cd4901b4202f4caa2ab4255d69a Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Wed, 6 Nov 2024 22:45:54 +0000 Subject: [PATCH 01/10] Curriculum Docs Update --- docs/analytics_onboarding/overview.md | 6 ++--- docs/analytics_tools/jupyterhub.md | 33 +++++++++++++++++------ docs/analytics_tools/knowledge_sharing.md | 8 ++++++ docs/analytics_tools/saving_code.md | 16 +++++++++++ docs/analytics_tools/tools_quick_links.md | 2 +- docs/analytics_welcome/how_we_work.md | 8 ------ docs/publishing/sections/6_metabase.md | 4 +++ 7 files changed, 57 insertions(+), 20 deletions(-) diff --git a/docs/analytics_onboarding/overview.md b/docs/analytics_onboarding/overview.md index 7c797b3b8e..70bef4e7b9 100644 --- a/docs/analytics_onboarding/overview.md +++ b/docs/analytics_onboarding/overview.md @@ -32,9 +32,9 @@ **Python Libraries:** -- [ ] **calitp-data-analysis** - Cal-ITP's internal Python library for analysis | ([Docs](calitp-data-analysis)) -- [ ] **siuba** - Recommended data analysis library | ([Docs](siuba)) -- [ ] [**shared_utils**](https://github.com/cal-itp/data-analyses/tree/main/_shared_utils) and [**here**](https://github.com/cal-itp/data-infra/tree/main/packages/calitp-data-analysis/calitp_data_analysis) - A shared utilities library for the analytics team | ([Docs](shared-utils)) +- [ ] [**calitp-data-analysis**](https://github.com/cal-itp/data-infra/tree/main/packages/calitp-data-analysis/calitp_data_analysis) - Cal-ITP's internal Python library for analysis | ([Docs](calitp-data-analysis)) +- [ ] [**siuba**](https://siuba.org/) - Recommended data analysis library | ([Docs](siuba)) +- [ ] [**shared_utils**](https://github.com/cal-itp/data-analyses/tree/main/_shared_utils) - A shared utilities library for the analytics team | ([Docs](shared-utils)) **Caltrans Employee Resources:** diff --git a/docs/analytics_tools/jupyterhub.md b/docs/analytics_tools/jupyterhub.md index 4160fb3bbb..c444028756 100644 --- a/docs/analytics_tools/jupyterhub.md +++ 
b/docs/analytics_tools/jupyterhub.md @@ -14,14 +14,15 @@ Analyses on JupyterHub are accomplished using notebooks, which allow users to mi 01. [Using JupyterHub](#using-jupyterhub) 02. [Logging in to JupyterHub](#logging-in-to-jupyterhub) -03. [Connecting to the Warehouse](#connecting-to-the-warehouse) -04. [Increasing the Query Limit](#increasing-the-query-limit) -05. [Increase the User Storage Limit](#increasing-the-storage-limit) -06. [Querying with SQL in JupyterHub](querying-sql-jupyterhub) -07. [Saving Code to Github](saving-code-jupyter) -08. [Environment Variables](#environment-variables) -09. [Jupyter Notebook Best Practices](notebook-shortcuts) -10. [Developing warehouse models in Jupyter](jupyterhub-warehouse) +03. [Default vs Power User](#default-user-vs-power-user) +04. [Connecting to the Warehouse](#connecting-to-the-warehouse) +05. [Increasing the Query Limit](#increasing-the-query-limit) +06. [Increase the User Storage Limit](#increasing-the-storage-limit) +07. [Querying with SQL in JupyterHub](querying-sql-jupyterhub) +08. [Saving Code to Github](saving-code-jupyter) +09. [Environment Variables](#environment-variables) +10. [Jupyter Notebook Best Practices](notebook-shortcuts) +11. [Developing warehouse models in Jupyter](jupyterhub-warehouse) (using-jupyterhub)= @@ -39,6 +40,22 @@ JupyterHub currently lives at [notebooks.calitp.org](https://notebooks.calitp.or Note: you will need to have been added to the Cal-ITP organization on GitHub to obtain access. If you have yet to be added to the organization and need to be, ask in the `#services-team` channel in Slack. +(default-user-vs-power-user)= + +### Default User vs Power User + +#### Default User + +Designed for general use and is ideal for less resource-intensive tasks. It's a good starting point for most users who don't expect to run very large, memory-hungry jobs. 
+
+The Default User profile offers quick availability since it uses less memory and can run on a smaller node, allowing you to start tasks faster. However, if your task grows in memory usage over time, it may exceed the node's capacity, potentially causing the system to terminate your job. This makes the Default profile best for small to medium-sized tasks that don't require a lot of memory; workloads that exceed these limits may become unstable or crash.
+
+#### Power User
+
+Intended for more demanding, memory-intensive tasks that require more resources upfront. This profile is suitable for workloads that have higher memory requirements or are expected to grow during execution.
+
+The Power User profile allocates a full node or a significant portion of one to ensure your job has enough memory and computational power, avoiding crashes or delays. However, this comes with a longer wait time, since the system needs to provision a new node for you. Once it's ready, you'll have all the resources necessary for memory-intensive work such as large datasets or simulations. The Power User profile is ideal for jobs that might be unstable or crash on the Default profile due to higher resource demands. It also offers scalability: if your task requires more resources than the initial node can provide, the system will automatically spin up additional nodes to meet the demand.
+ (connecting-to-the-warehouse)= ### Connecting to the Warehouse diff --git a/docs/analytics_tools/knowledge_sharing.md b/docs/analytics_tools/knowledge_sharing.md index e8fd7fc9f2..cc33f0b91a 100644 --- a/docs/analytics_tools/knowledge_sharing.md +++ b/docs/analytics_tools/knowledge_sharing.md @@ -17,6 +17,7 @@ Here are some resources data analysts have collected and referenced, that will h - [DataFrames](#dataframes) - [Ipywidgets](#ipywidgets) - [Markdown](#markdown) + - [ReviewNB](#reviewNB) (data-analysis)= @@ -188,3 +189,10 @@ def add_tooltip(chart, tooltip1, tooltip2): - [Add a table of content that links to headers throughout a markdown file.](https://stackoverflow.com/questions/2822089/how-to-link-to-part-of-the-same-document-in-markdown) - [Add links to local files.](https://stackoverflow.com/questions/32563078/how-link-to-any-local-file-with-markdown-syntax?rq=1) - [Direct embed an image.](https://datascienceparichay.com/article/insert-image-in-a-jupyter-notebook/) + +(reviewNB)= + +### ReviewNB on GitHub + +- [Tool designed to facilitate reviewing Jupyter Notebooks in a collaborative setting on GitHub](https://www.reviewnb.com/) +- [Shows side-by-side diffs of Jupyter Notebooks, including changes to both code cells and markdown cells and allows reviewers to comment on specific cells](https://www.reviewnb.com/#faq) diff --git a/docs/analytics_tools/saving_code.md b/docs/analytics_tools/saving_code.md index dd663e30e7..195dc9b9ed 100644 --- a/docs/analytics_tools/saving_code.md +++ b/docs/analytics_tools/saving_code.md @@ -9,11 +9,16 @@ Doing work locally and pushing directly from the command line is a similar workf ## Table of Contents 1. What's a typical [project workflow](#project-workflow)? + 2. Someone is collaborating on my branch, how do we [stay in sync](#pulling-and-pushing-changes)? 
+ - The `main` branch is ahead, and I want to [sync my branch with `main`](#rebase-and-merge) - [Rebase](#rebase) or [merge](#merge) - Options to [Resolve Merge Conflicts](#resolve-merge-conflicts)
+ - [Other Common Issues](#other-common-github-issues-encountered-during-saving-codes)
+
3. [Other Common GitHub Commands](#other-common-github-commands)
+
- [External Git Resources](#external-git-resources) - [Committing in the Github User Interface](#pushing-drag-drop)
@@ -111,6 +116,17 @@ If you discover merge conflicts and they are within a single notebook that only
`git checkout --theirs path/to/notebook.ipynb`
- From here, just add the file and commit with a message as you normally would and the conflict should be fixed in your Pull Request.
+(other-common-github-issues-encountered-during-saving-codes)=
+
+### Other Common Issues
+
+- Untracked Files:
+ Sometimes, files are created or modified locally but are not added to Git before committing, so they are not tracked or pushed to GitHub. Use `git add ` to track files before committing.
+- Incorrect Branches:
+ Committing to the wrong branch (e.g., main instead of a feature branch) can cause problems, especially if changes are not meant to be merged into the main codebase. Always ensure you're on the correct branch using `git branch` and switch branches with `git checkout ` before committing.
+- Merge Conflicts from Overlapping Work:
+ When multiple analysts work on the same files or sections of code, merge conflicts can occur. Creating feature branches and pulling regularly to stay updated with main can help avoid these conflicts.
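The three issues above reduce to a few habitual commands. Here is a minimal sketch that demonstrates them in a disposable throwaway repository; all file and branch names are hypothetical:

```shell
# Hedged demonstration in a disposable repo -- names are hypothetical.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .
git config user.email "analyst@example.com"
git config user.name "Example Analyst"

# Incorrect branches: create and move to a feature branch before committing.
git switch -c my-feature-branch 2>/dev/null || git checkout -b my-feature-branch

# Untracked files: stage new files so they are included in the commit and push.
echo "scratch work" > analysis_notes.md
git add analysis_notes.md
git commit -q -m "Add analysis notes"

# Check which branch you are on, and that nothing is left untracked,
# before committing more work.
git rev-parse --abbrev-ref HEAD
git status --short
```

Pulling `main` regularly (`git pull origin main`) before starting new work is the simplest guard against overlapping-work conflicts.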
+ (other-common-github-commands)= ## Other Common GitHub Commands diff --git a/docs/analytics_tools/tools_quick_links.md b/docs/analytics_tools/tools_quick_links.md index 56f59f153f..ac16852f54 100644 --- a/docs/analytics_tools/tools_quick_links.md +++ b/docs/analytics_tools/tools_quick_links.md @@ -7,7 +7,7 @@ | Tool | Purpose | | -------------------------------------------------------------------------------------------------- | --------------------------------------- | | [**Analytics Repo**](https://github.com/cal-itp/data-analyses) | Analytics team code repository. | -| [**Analytics Project Board**](https://github.com/cal-itp/data-analyses/projects/1) | Analytics team work management. | +| [**Analytics Project Board**](https://github.com/cal-itp/data-analyses/projects/1) | Analytics team list of active issues. | | [**notebooks.calitp.org**](https://notebooks.calitp.org/) | JupyterHub cloud-based notebooks | | [**dashboards.calitp.org**](https://dashboards.calitp.org/) | Metabase dashboards & Business Insights | | [**dbt-docs.calitp.org**](https://dbt-docs.calitp.org/) | DBT warehouse documentation | diff --git a/docs/analytics_welcome/how_we_work.md b/docs/analytics_welcome/how_we_work.md index 4e2ad7a028..3f9e9875f9 100644 --- a/docs/analytics_welcome/how_we_work.md +++ b/docs/analytics_welcome/how_we_work.md @@ -27,14 +27,6 @@ The section below outlines our team's primary meetings and their purposes, as we | #**data-office-hours** | Discussion | A place to bring questions, issues, and observations for team discussion. | | #**data-warehouse-devs** | Discussion | For people building dbt models - focused on data warehouse performance considerations, etc. | -## Collaboration Tools - -(analytics-project-board)= - -### GitHub Analytics Project Board - -**You can access The Analytics Project Board [using this link](https://github.com/cal-itp/data-analyses/projects/1)**. 
- #### How We Track Work ##### Screencast - Navigating the Board diff --git a/docs/publishing/sections/6_metabase.md b/docs/publishing/sections/6_metabase.md index 840680c243..06b781b9f0 100644 --- a/docs/publishing/sections/6_metabase.md +++ b/docs/publishing/sections/6_metabase.md @@ -9,3 +9,7 @@ An [Airflow DAG](https://github.com/cal-itp/data-infra/tree/main/airflow/dags) n Any tweaks to the data processing steps are easily done in scripts and notebooks, and it ensures that the visualizations in the dashboard remain updated with little friction. Ex: [Payments Dashboard](https://dashboards.calitp.org/dashboard/3-payments-performance-dashboard?transit_provider=mst) + +## Metabase Training Guide 2024 + +Please see the [Cal-ITP Metabase Training Guide](https://docs.google.com/document/d/1ag9qmSDWF9d30lGyKcvAAjILt1sCIJhK7wuUYkfAals/edit?tab=t.0#heading=h.xdjzmfck1e7) to see how to utilize the data warehouse to create meaningful and effective visuals and analyses. From c3521e8b8a79f9fd4c66ca19d03a54d3a81e95bf Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Tue, 19 Nov 2024 00:27:12 +0000 Subject: [PATCH 02/10] curriculum_docs_update --- docs/analytics_onboarding/overview.md | 3 +++ docs/analytics_tools/knowledge_sharing.md | 3 +-- docs/analytics_tools/tools_quick_links.md | 6 ------ docs/analytics_welcome/how_we_work.md | 12 ------------ docs/analytics_welcome/overview.md | 4 ++++ 5 files changed, 8 insertions(+), 20 deletions(-) diff --git a/docs/analytics_onboarding/overview.md b/docs/analytics_onboarding/overview.md index 70bef4e7b9..07f98e8306 100644 --- a/docs/analytics_onboarding/overview.md +++ b/docs/analytics_onboarding/overview.md @@ -38,10 +38,13 @@ **Caltrans Employee Resources:** +- [ ] [**Organizational Chart**](https://pmp.onramp.dot.ca.gov/downloads/pmp/files/Splash%20Page/org-charts-10-2024/DDS_OrgChart_October2024-signed.pdf) - Data and Digital Services Organizational Chart - [ ] [**OnRamp**](https://onramp.dot.ca.gov/) - Caltrans employee 
intranet - [ ] [**Service Now (SNOW)**](https://cdotprod.service-now.com/sp) - Caltrans IT Service Management Portal for IT issues and requesting specific software - [ ] [**Cal Employee Connect**](https://connect.sco.ca.gov/) - State Controller's Office site for paystubs and tax information - [ ] [**Geospatial Enterprise Engagement Platform - GIS Account Request Form**](https://sv03tmcpo.ct.dot.ca.gov/portal/apps/sites/#/geep/pages/account-request) (optional) - User request form for ArcGIS Online and ArcGIS Portal accounts +- [ ] [**Planning Handbook**](https://transportationplanning.onramp.dot.ca.gov/caltrans-transportation-planning-handbook) - Caltrans Transportation Planning Handbook +- [ ] [**California Public Employees Retirement System**](https://www.calpers.ca.gov/) - System that manages pension and health benefits   (get-help)= diff --git a/docs/analytics_tools/knowledge_sharing.md b/docs/analytics_tools/knowledge_sharing.md index cc33f0b91a..ef0840e67c 100644 --- a/docs/analytics_tools/knowledge_sharing.md +++ b/docs/analytics_tools/knowledge_sharing.md @@ -2,7 +2,7 @@ # Helpful Links -Here are some resources data analysts have collected and referenced, that will hopefully help you out in your work. Have something you want to share? Create a new markdown file, add it [to the example report folder](https://github.com/cal-itp/data-analyses/tree/main/example_report), and [message Amanda.](https://app.slack.com/client/T014965JTHA/C013N8GELLF/user_profile/U02PCTPSZ8A) +Here are some resources data analysts have collected and referenced, that will hopefully help you out in your work. 
- [Data Analysis](#data-analysis) - [Python](#python) @@ -160,7 +160,6 @@ def add_tooltip(chart, tooltip1, tooltip2): ### Maps -- [Examples of folium, branca, and color maps.](https://nbviewer.org/github/python-visualization/folium/blob/v0.2.0/examples/Colormaps.ipynb) - [Quick interactive maps with Geopandas.gdf.explore()](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.explore.html) (dataframes)= diff --git a/docs/analytics_tools/tools_quick_links.md b/docs/analytics_tools/tools_quick_links.md index ac16852f54..ac84f46745 100644 --- a/docs/analytics_tools/tools_quick_links.md +++ b/docs/analytics_tools/tools_quick_links.md @@ -14,9 +14,3 @@ | [**analysis.calitp.org**](https://analysis.calitp.org/) | Analytics portfolio landing page | | [**Google BigQuery**](https://console.cloud.google.com/bigquery) | Our warehouse and SQL Querying | | [**Google Cloud Storage**](https://console.cloud.google.com/storage/browser/calitp-analytics-data) | Cloud file storage | - -  - -```{admonition} Still need access to a tool on this page? -Ask in the `#services-team` channel in the Cal-ITP Slack. -``` diff --git a/docs/analytics_welcome/how_we_work.md b/docs/analytics_welcome/how_we_work.md index 3f9e9875f9..acccbafaa8 100644 --- a/docs/analytics_welcome/how_we_work.md +++ b/docs/analytics_welcome/how_we_work.md @@ -27,18 +27,6 @@ The section below outlines our team's primary meetings and their purposes, as we | #**data-office-hours** | Discussion | A place to bring questions, issues, and observations for team discussion. | | #**data-warehouse-devs** | Discussion | For people building dbt models - focused on data warehouse performance considerations, etc. | -#### How We Track Work - -##### Screencast - Navigating the Board - -The screencast below introduces: - -- Creating new GitHub issues to track your work -- Adding your issues to our analytics project board -- Viewing all of your issues on the board (e.g. clicking your avatar to filter) - -
- (analytics-repo)= ### GitHub Analytics Repo diff --git a/docs/analytics_welcome/overview.md b/docs/analytics_welcome/overview.md index 7091cc083b..83388c56eb 100644 --- a/docs/analytics_welcome/overview.md +++ b/docs/analytics_welcome/overview.md @@ -15,6 +15,10 @@ After you've read through this section, continue reading through the remaining s ______________________________________________________________________ +- [Data and Digital Services Organizational Chart](https://pmp.onramp.dot.ca.gov/downloads/pmp/files/Splash%20Page/org-charts-10-2024/DDS_OrgChart_October2024-signed.pdf) + +______________________________________________________________________ + **Other Analytics Sections**: - [Technical Onboarding](technical-onboarding) From 7088deb63cac6b21e3e075d5d2356c7875143aea Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Thu, 21 Nov 2024 22:59:09 +0000 Subject: [PATCH 03/10] curriculum docs update --- docs/analytics_tools/knowledge_sharing.md | 15 +++++++++++++++ docs/publishing/sections/7_gcs.md | 14 ++++++++++---- 2 files changed, 25 insertions(+), 4 deletions(-) diff --git a/docs/analytics_tools/knowledge_sharing.md b/docs/analytics_tools/knowledge_sharing.md index ef0840e67c..f45c547af1 100644 --- a/docs/analytics_tools/knowledge_sharing.md +++ b/docs/analytics_tools/knowledge_sharing.md @@ -11,6 +11,7 @@ Here are some resources data analysts have collected and referenced, that will h - [Merging](#merging) - [Dates](#dates) - [Monetary Values](#monetary-values) + - [Tidy Data](#tidy-data) - [Visualizations](#visualization) - [Charts](#charts) - [Maps](#maps) @@ -129,6 +130,20 @@ def adjust_prices(df): return df ``` +(tidy-data)= + +### Tidy Data + +Tidy Data follows a set of principles that ensure the data is easy to work with, especially when using tools like pandas and matplotlib. Primary rules of tidy data are: + +- Each variable must have its own column. +- Each observation must have its own row. +- Each value must have its own cell. 
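A small, hypothetical pandas example of the rules above: a "wide" ridership table spreads the year variable across column headers, and `melt()` reshapes it into tidy form (the column and value names below are made up for illustration):

```python
import pandas as pd

# Hypothetical wide table: the "year" variable is spread across column
# headers, so it violates the tidy rules above.
wide = pd.DataFrame({
    "route": ["A", "B"],
    "riders_2023": [100, 250],
    "riders_2024": [120, 240],
})

# melt() gives each variable its own column and each observation its own row.
tidy = wide.melt(id_vars="route", var_name="year", value_name="riders")
tidy["year"] = tidy["year"].str.replace("riders_", "", regex=False).astype(int)

print(tidy)
# In tidy form, groupby() now works directly on the "year" column.
print(tidy.groupby("year")["riders"].sum())
```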
+ +Tidy data ensures consistency, making it easier to work with tools like pandas, matplotlib, or seaborn. It also simplifies data manipulation, as functions like `groupby()`, `pivot()`, and `melt()` work more intuitively when the data is structured properly. Additionally, tidy data enables vectorized operations in pandas, allowing for efficient analysis on entire columns or rows at once. + +Learn more about Tidy Data [here.](https://vita.had.co.nz/papers/tidy-data.pdf) + (visualization)= ## Visualization diff --git a/docs/publishing/sections/7_gcs.md b/docs/publishing/sections/7_gcs.md index 20ca914d4b..afcef5d155 100644 --- a/docs/publishing/sections/7_gcs.md +++ b/docs/publishing/sections/7_gcs.md @@ -2,8 +2,14 @@ # GCS -NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are -using the dbt exposure publishing framework, your data will already be saved in -GCS as part of the upload process. +### Public Data Access in GCS -TBD. +Some data stored in Cloud Storage is configured to be publicly accessible, meaning anyone on the internet can read it at any time. In Google Cloud Storage, you can make data publicly accessible either at the bucket level or the object level. At the bucket level, you can grant public access to all objects within the bucket by modifying the bucket policy. Alternatively, you can provide public access to specific objects. + +Notes: + +- Always ensure that sensitive information is not exposed when configuring public access in Google Cloud Storage. Publicly accessible data should be carefully reviewed to prevent the accidental sharing of confidential or private information. +- External users can't browse the public bucket on the web, only download individual files. 
If you have many files to share, it's best to use the [Command Line Interface.](https://cloud.google.com/storage/docs/access-public-data#command-line) +- There is a [function](https://github.com/cal-itp/data-analyses/blob/f62b150768fb1547c6b604cb53d122531104d099/_shared_utils/shared_utils/publish_utils.py#L16) in shared_utils that handles writing files to the public bucket, regardless of the file type (e.g., Parquet, GeoJSON, etc.) + +NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are using the dbt exposure publishing framework, your data will already be saved in GCS as part of the upload process. From bbc2408ae1a9a9fcbebfde583aa39781f82657bd Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Thu, 21 Nov 2024 23:26:28 +0000 Subject: [PATCH 04/10] curriculum docs update --- docs/publishing/sections/7_gcs.md | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/docs/publishing/sections/7_gcs.md b/docs/publishing/sections/7_gcs.md index afcef5d155..20ca914d4b 100644 --- a/docs/publishing/sections/7_gcs.md +++ b/docs/publishing/sections/7_gcs.md @@ -2,14 +2,8 @@ # GCS -### Public Data Access in GCS +NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are +using the dbt exposure publishing framework, your data will already be saved in +GCS as part of the upload process. -Some data stored in Cloud Storage is configured to be publicly accessible, meaning anyone on the internet can read it at any time. In Google Cloud Storage, you can make data publicly accessible either at the bucket level or the object level. At the bucket level, you can grant public access to all objects within the bucket by modifying the bucket policy. Alternatively, you can provide public access to specific objects. - -Notes: - -- Always ensure that sensitive information is not exposed when configuring public access in Google Cloud Storage. 
Publicly accessible data should be carefully reviewed to prevent the accidental sharing of confidential or private information. -- External users can't browse the public bucket on the web, only download individual files. If you have many files to share, it's best to use the [Command Line Interface.](https://cloud.google.com/storage/docs/access-public-data#command-line) -- There is a [function](https://github.com/cal-itp/data-analyses/blob/f62b150768fb1547c6b604cb53d122531104d099/_shared_utils/shared_utils/publish_utils.py#L16) in shared_utils that handles writing files to the public bucket, regardless of the file type (e.g., Parquet, GeoJSON, etc.) - -NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are using the dbt exposure publishing framework, your data will already be saved in GCS as part of the upload process. +TBD. From cbd00f22c17d8c02cb06c1c3da8a0ca5cac633eb Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Thu, 21 Nov 2024 23:35:09 +0000 Subject: [PATCH 05/10] curriculum docs update --- docs/publishing/sections/7_gcs.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/docs/publishing/sections/7_gcs.md b/docs/publishing/sections/7_gcs.md index 20ca914d4b..e8bc1e2230 100644 --- a/docs/publishing/sections/7_gcs.md +++ b/docs/publishing/sections/7_gcs.md @@ -2,8 +2,16 @@ # GCS -NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are -using the dbt exposure publishing framework, your data will already be saved in -GCS as part of the upload process. +### Public Data Access in GCS + +Some data stored in Cloud Storage is configured to be publicly accessible, meaning anyone on the internet can read it at any time. In Google Cloud Storage, you can make data publicly accessible either at the bucket level or the object level. At the bucket level, you can grant public access to all objects within the bucket by modifying the bucket policy. 
Alternatively, you can provide public access to specific objects. + +Notes: + +- Always ensure that sensitive information is not exposed when configuring public access in Google Cloud Storage. Publicly accessible data should be carefully reviewed to prevent the accidental sharing of confidential or private information. +- External users can't browse the public bucket on the web, only download individual files. If you have many files to share, it's best to use the [Command Line Interface.](https://cloud.google.com/storage/docs/access-public-data#command-line) +- There is a [function](https://github.com/cal-itp/data-analyses/blob/f62b150768fb1547c6b604cb53d122531104d099/_shared_utils/shared_utils/publish_utils.py#L16) in shared_utils that handles writing files to the public bucket, regardless of the file type (e.g., Parquet, GeoJSON, etc.) + +NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are using the dbt exposure publishing framework, your data will already be saved in GCS as part of the upload process. TBD. 
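As a sketch of what public access means in practice: a public object is readable over plain HTTPS at a predictable URL, so external users can download it without a Google account. The bucket and object names below are hypothetical:

```python
# Public GCS objects are served at https://storage.googleapis.com/<bucket>/<object>.
# The bucket and object names here are hypothetical, for illustration only.
from urllib.parse import quote

def public_gcs_url(bucket: str, blob_name: str) -> str:
    """Build the anonymous-download URL for a publicly readable GCS object."""
    return f"https://storage.googleapis.com/{bucket}/{quote(blob_name)}"

url = public_gcs_url("calitp-example-public", "reports/2024/my_report.parquet")
print(url)
# -> https://storage.googleapis.com/calitp-example-public/reports/2024/my_report.parquet
```

An external user could then fetch that URL with `curl` or `urllib.request.urlopen`; for many files, the Command Line Interface linked above is more convenient.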
From 4dd5c07f650c71321d4c90f3a70b0e8765dc8a8d Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Wed, 27 Nov 2024 00:20:26 +0000 Subject: [PATCH 06/10] Org Chart Link Updated --- docs/analytics_onboarding/overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/analytics_onboarding/overview.md b/docs/analytics_onboarding/overview.md index 07f98e8306..222d8760e6 100644 --- a/docs/analytics_onboarding/overview.md +++ b/docs/analytics_onboarding/overview.md @@ -38,7 +38,7 @@ **Caltrans Employee Resources:** -- [ ] [**Organizational Chart**](https://pmp.onramp.dot.ca.gov/downloads/pmp/files/Splash%20Page/org-charts-10-2024/DDS_OrgChart_October2024-signed.pdf) - Data and Digital Services Organizational Chart +- [ ] [**Organizational Chart**](https://pmp.onramp.dot.ca.gov/organizational-chart) - Data and Digital Services Organizational Chart - [ ] [**OnRamp**](https://onramp.dot.ca.gov/) - Caltrans employee intranet - [ ] [**Service Now (SNOW)**](https://cdotprod.service-now.com/sp) - Caltrans IT Service Management Portal for IT issues and requesting specific software - [ ] [**Cal Employee Connect**](https://connect.sco.ca.gov/) - State Controller's Office site for paystubs and tax information From c27b79ab493b93d1c1e1e5502eaea7d11a169b44 Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Wed, 27 Nov 2024 00:51:55 +0000 Subject: [PATCH 07/10] Changes made to Portfolio page in docs --- docs/analytics_tools/saving_code.md | 2 +- .../publishing/sections/5_analytics_portfolio_site.md | 11 ++++++++++- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/docs/analytics_tools/saving_code.md b/docs/analytics_tools/saving_code.md index 195dc9b9ed..642ed43030 100644 --- a/docs/analytics_tools/saving_code.md +++ b/docs/analytics_tools/saving_code.md @@ -123,7 +123,7 @@ If you discover merge conflicts and they are within a single notebook that only - Untracked Files: Sometimes, files are created or modified locally but are not added to Git 
before committing, so they are not tracked or pushed to GitHub. Use `git add ` to track files before committing.
- Incorrect Branches:
- Committing to the wrong branch (e.g., main instead of a feature branch) can cause problems, especially if changes are not meant to be merged into the main codebase. Always ensure you're on the correct branch using `git branch` and switch branches with `git checkout ` before committing.
+ Committing to the wrong branch (e.g., main instead of a feature branch) can cause problems, especially if changes are not meant to be merged into the main codebase. Always ensure you're on the correct branch using `git branch` and switch branches with `git switch -c ` before committing.
- Merge Conflicts from Overlapping Work:
When multiple analysts work on the same files or sections of code, merge conflicts can occur. Creating feature branches and pulling regularly to stay updated with main can help avoid these conflicts.
diff --git a/docs/publishing/sections/5_analytics_portfolio_site.md b/docs/publishing/sections/5_analytics_portfolio_site.md
index 12eacdf3cf..3e2e1e5ca8 100644
--- a/docs/publishing/sections/5_analytics_portfolio_site.md
+++ b/docs/publishing/sections/5_analytics_portfolio_site.md
@@ -11,6 +11,7 @@ Netlify is the platform turns our Jupyter Notebooks uploaded to GitHub into a fu
To setup your netlify key:
- Ask in Slack/Teams for a Netlify key if you don't have one yet.
+- If you already have your Netlify key set up, find it by running `cat ~/.bash_profile` in a terminal.
- Install netlify: `npm install -g netlify-cli`
- Navigate to your main directory
- Edit your bash profile using Nano:
@@ -47,7 +48,7 @@ Create a `README.md` file in the repo where your work lies. This also forms the
Each `.yml` file creates a new site on the [Portfolio's Index Page](https://analysis.calitp.org/), so every project needs its own file.
DLA Grant Analysis, SB125 Route Illustrations, and Active Transportation Program all have their own `.yml` file.
-All the `.yml` files live here at [data-analyses/portfolio/sites](https://github.com/cal-itp/data-analyses/tree/main/portfolio/sites).
+All the `.yml` files live here at [data-analyses/portfolio/sites](https://github.com/cal-itp/data-analyses/tree/main/portfolio/sites). Navigate to this folder to create the `.yml` file.
Here's how to create a `yml` file:
- Name your `.yml` file. For now we will use `my_report.yml` as an example.
+- The `.yml` file should contain the title, directory, `README.md` path, and notebook path.
+
- The structure of your `.yml` file depends on the type of your analysis:
- If you have one parameterized notebook with **one parameter**:
@@ -206,3 +209,9 @@ build_my_reports:
git add portfolio/my_report/district_*/ portfolio/my_report/*.yml portfolio/my_report/*.md
git add portfolio/sites/my_report.yml
```
+
+### Delete Portfolio/ Refresh Index Page
+
+When creating a new portfolio and there’s an old version with existing files or content on your portfolio site or in your local environment, it’s important to clean up the old files before adding new content.
+
+Run `python portfolio/portfolio.py clean my_report` before deploying your report.
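For reference, the `.yml` contents described above (title, directory, `README.md` path, notebook path) might look like the minimal sketch below. The key names are illustrative only; the exact schema is defined by the portfolio tooling, so copy an existing file in `portfolio/sites/` as your starting point:

```yaml
# Illustrative sketch only -- check an existing file in portfolio/sites/
# for the exact keys expected by portfolio/portfolio.py.
title: My Report
directory: ./my_report/
readme: ./my_report/README.md
notebook: ./my_report/my_report.ipynb
```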
From 306bedffc11f219cc5039525d99d7d7a34ffa385 Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Wed, 27 Nov 2024 00:55:34 +0000 Subject: [PATCH 08/10] TBD removed --- docs/publishing/sections/7_gcs.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/publishing/sections/7_gcs.md b/docs/publishing/sections/7_gcs.md index e8bc1e2230..afcef5d155 100644 --- a/docs/publishing/sections/7_gcs.md +++ b/docs/publishing/sections/7_gcs.md @@ -13,5 +13,3 @@ Notes: - There is a [function](https://github.com/cal-itp/data-analyses/blob/f62b150768fb1547c6b604cb53d122531104d099/_shared_utils/shared_utils/publish_utils.py#L16) in shared_utils that handles writing files to the public bucket, regardless of the file type (e.g., Parquet, GeoJSON, etc.) NOTE: If you are planning on publishing to [CKAN](publishing-ckan) and you are using the dbt exposure publishing framework, your data will already be saved in GCS as part of the upload process. - -TBD. From 7f834c2525d515bd1b3a1150c16da7aec14ec907 Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Wed, 27 Nov 2024 18:43:14 +0000 Subject: [PATCH 09/10] content changed portfolio site --- docs/analytics_tools/tools_quick_links.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/analytics_tools/tools_quick_links.md b/docs/analytics_tools/tools_quick_links.md index ac84f46745..ac16852f54 100644 --- a/docs/analytics_tools/tools_quick_links.md +++ b/docs/analytics_tools/tools_quick_links.md @@ -14,3 +14,9 @@ | [**analysis.calitp.org**](https://analysis.calitp.org/) | Analytics portfolio landing page | | [**Google BigQuery**](https://console.cloud.google.com/bigquery) | Our warehouse and SQL Querying | | [**Google Cloud Storage**](https://console.cloud.google.com/storage/browser/calitp-analytics-data) | Cloud file storage | + +  + +```{admonition} Still need access to a tool on this page? +Ask in the `#services-team` channel in the Cal-ITP Slack. 
+``` From 55a5a9f9a3471e04d1de51c3005b382255e86075 Mon Sep 17 00:00:00 2001 From: Shweta Adhikari Date: Wed, 27 Nov 2024 18:45:46 +0000 Subject: [PATCH 10/10] content changed portfolio site --- docs/publishing/sections/5_analytics_portfolio_site.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/publishing/sections/5_analytics_portfolio_site.md b/docs/publishing/sections/5_analytics_portfolio_site.md index 3e2e1e5ca8..cba63cbab6 100644 --- a/docs/publishing/sections/5_analytics_portfolio_site.md +++ b/docs/publishing/sections/5_analytics_portfolio_site.md @@ -212,6 +212,6 @@ build_my_reports: ### Delete Portfolio/ Refresh Index Page -When creating a new portfolio and there’s an old version with existing files or content on your portfolio site or in your local environment, it’s important to clean up the old files before adding new content. +When redeploying your portfolio with new content and there’s an old version with existing files or content on your portfolio site or in your local environment, it’s important to clean up the old files before adding new content. Use python `portfolio/portfolio.py clean my_report` before deploying your report.