
v1.0.0

KevinJBoyer released this on 04 Mar at 16:40 · commit e5977bc

Chat engines

This release contains the imagine-la chat engine for our initial pilot with Imagine LA in March 2025.

What's Changed

  • fix: Show only two nested headers by @KevinJBoyer in #117
  • perf: Use raw responses of chat context in chat history by @ccheng26 in #116
  • feat: Add Markdown tree support utilities for hierarchical chunking by @yoomlam in #118
  • feat: Add hierarchical chunking of markdown tree by @yoomlam in #119
  • feat: Add support for retrieval in other languages by @KevinJBoyer in #120
  • feat: Group web source citations by @ccheng26 in #122
  • feat: Support batch processing by @KevinJBoyer in #123
  • fix: Include all headers in batch CSV generation by @KevinJBoyer in #126
  • feat: Show document.dataset in UI; fix: citation markdown rendering by @yoomlam in #125
  • fix: Have superscript citation link to source URL by @yoomlam in #128
  • feat: Add tree-based chunking to EDD ingestion by @yoomlam in #121
  • refactor: Extract CA EDD system prompt into the chat engine by @yoomlam in #130
  • refactor: Use FormattingConfig in format.py by @yoomlam in #129
  • feat: Improve batch processing by @KevinJBoyer in #131
  • refactor: Add ChatHistory class by @yoomlam in #133
  • refactor: Add finalize_result() to remap and replace citation ids in response by @yoomlam in #134
  • fix: Add quotes around Makefile ingest arguments by @yoomlam in #135
  • fix: Handle blank sublist for EDD ingest by @yoomlam in #136
  • feat: Add minimal API v0 by @yoomlam in #132
  • refactor: More robust _fix_input_markdown() by @yoomlam in #138
  • Notebook: Add edd-chunking.ipynb to debug markdown chunking issues by @yoomlam in #137
  • refactor: Exclude user's latest message from chat_history by @yoomlam in #139
  • feat: Ingest ImagineLA content hub by @KevinJBoyer in #141
  • feat: Add conversational memory to v0 API by @yoomlam in #140
  • feat: Add chat_message DB table by @yoomlam in #142
  • API Spec by @KevinJBoyer in #124
  • feat: Add Imagine LA engine by @KevinJBoyer in #143
  • feat: Write document questions and answers to csv by @ccheng26 in #127
  • feat: add question generation for imagine la and bem dataset by @ccheng26 in #144
  • fix: Remove use of getopt for ingestion by @yoomlam in #145
  • feat: Investigation of multilingual LLMs by @KevinJBoyer in #109
  • fix: ImagineLA scraper: handle '/' suffix for root_url by @yoomlam in #147
  • feat: Analyze precision and recall for initial eval dataset by @KevinJBoyer in #150
  • feat: Enable resuming ingestion of the EDD website by @yoomlam in #146
  • fix: Assign subsections to their own markdown headings by @yoomlam in #148
  • feat: Tree-based chunk splitting into subsections by @yoomlam in #149
  • fix: Handle table markdown formatting in assertion by @yoomlam in #152
  • feat: Literal AI feedback by @ccheng26 in #151
  • feat: Whitelist Imagine LA's dev site by @KevinJBoyer in #158
  • feat: Citation footnotes open accordions by @ccheng26 in #155
  • feat: Scrape LA county policy manual website by @yoomlam in #156
  • refactor: remove BEM-specific code and generalize PDF processing by @fg-nava in #159
  • refactor: remove Guru card functionality by @fg-nava in #160
  • fix: Setup Terraform before AWS credentials by @KevinJBoyer in #163
  • fix: Update checkout actions to fix action linting by @KevinJBoyer in #164
  • feat: Ingest LA County Policy Manual by @yoomlam in #161
  • feat: Use separate Literal project for API by @KevinJBoyer in #166
  • feat: Provide direct link to text by @ccheng26 in #162
  • feat: Pull dataset values from DB by @KevinJBoyer in #167
  • fix: Switch to sequential processing in batch_process to resolve thread-safety issues by @fg-nava in #169
  • feat: Export markdown files during ingestion by @yoomlam in #171
  • chore: Rename dataset to LA "DPSS Policy" by @yoomlam in #174
  • feat: LA DPSS policy: add program name to document.name by @yoomlam in #175
  • fix: Use cl.make_async when calling synchronous batch_process by @fg-nava in #176
  • fix: Refine memory handling and timeout params for batch processing by @fg-nava in #177
  • feat: Add IRS family tax credit webpages as a new dataset by @yoomlam in #178
  • feat: Add scraper and ingest code for Public Charge dataset by @ccheng26 in #173
  • fix: Use regex pattern for CORS allowed origins by @fg-nava in #182
  • revert: Revert batch process and related files to d61791c by @fg-nava in #180
  • fix: Unblock Chainlit's websocket during batch processing by @yoomlam in #181
  • fix: Limit origins to localhost:5173 and remove pattern from regex by @fg-nava in #184
  • fix: revise headers in public charge content by @ccheng26 in #183
  • refactor: Simplify command line for scraping datasets by @yoomlam in #186
  • dev experience: Add app-shell target to Makefile by @yoomlam in #188
  • fix: Cleanup BOM characters from fieldnames in batch processing by @fg-nava in #187
  • refactor: Add general ingest_runner by @yoomlam in #189
  • feat: Ingest CA FTB dataset by @yoomlam in #190
  • refactor: Generalize scrapy-runner by @yoomlam in #192
  • feat: Ingest CA WIC dataset by @yoomlam in #191
  • Update system prompt for ImagineLA chat engine by @yoomlam in #194
  • refactor: Make ingest-runner consistent with scrapy-runner by @yoomlam in #193
  • fix: Remove unused ImagineLA chat engine settings from UI by @yoomlam in #195
  • feat: Add Covered CA dataset by @yoomlam in #197
  • fix: Update Imagine LA Content Hub scraper and ingester by @yoomlam in #198
  • feat: Distinguish two separate system prompts in the UI by @yoomlam in #199
  • feat: Add Step 2 logic and show Policy Updates in UI by @yoomlam in #200
  • feat: Add alert_message to API and Batch Processing by @yoomlam in #201
  • feat: Add MessageAttributes to Literal AI message metadata by @yoomlam in #203
  • feat: Keep intro sentence with subsequent list by @yoomlam in #202
  • refactor: Remove create_splits_using_headings() and unused code by @yoomlam in #172
  • feat: Use query as the LiteralAI thread name by @yoomlam in #204
  • feat: Add benefit_program to ImagineLA_MessageAttributes by @yoomlam in #205
  • feat: Ingest manual scraping of SSA pdf by @yoomlam in #207
  • fix: Update system prompt by @yoomlam in #208
  • feat: Ingest from multiple JSON scraping files by @yoomlam in #209
  • chore: Update Imagine LA's URL: benefitnavigator.web.app by @yoomlam in #210
  • refactor: Update postgres ssl logging to info level by @fg-nava in #211
  • fix: new CORS allow_origin_regex pattern by @fg-nava in #212
  • feat: API: Add source_dataset field to the Citation API response by @yoomlam in #213
  • feat: Simplify reingesting all datasources by @yoomlam in #214
  • feat: Automate stats collection to ingestion process and include in S… by @fg-nava in #215
  • feat: Setup QA evaluation infrastructure and logging system by @fg-nava in #206
  • chore: Rename Imagine LA to Benefits Information Hub by @yoomlam in #217
  • fix: Support multiple datasets and k values using nargs in evaluation CLI by @fg-nava in #218
  • chore: Set up prod environment by @KevinJBoyer in #221
  • fix: CA Public Charge dataset: fix absolute links by @yoomlam in #219
  • feat: Move '(citation-N)' after punctuation by @yoomlam in #216
  • feat: Subscribe monitoring email to CloudWatch alarms by @KevinJBoyer in #222
  • Update chat_engine.py by @KevinJBoyer in #227
  • refactor: Move metrics module into evaluation directory by @fg-nava in #223
  • refactor: Consolidate evaluation metrics directory by @fg-nava in #224
  • fix: Remove repeated text around alert_msg by @yoomlam in #228
  • fix: Enable multiple user messages per LiteralAI thread by @yoomlam in #229
  • feat: Add UserSession DB table by @yoomlam in #230
  • feat: API: Make user_id required by @yoomlam in #231
  • chore: API: return error on empty user_id by @yoomlam in #233
  • feat: DB: Add session_id foreign key constraint by @yoomlam in #232
  • fix: Temporarily allow empty user id by @KevinJBoyer in #236
  • fix: Remove manual fix for EDD datasource by @yoomlam in #237
  • chore: Add echo_cmds() to refresh-ingestion.sh by @yoomlam in #238
  • fix: Imagine LA ingest: switch region to "California" and pass DATASET_ID to echo_cmds by @fg-nava in #240
  • refactor: QA generation core functionality and evaluation system by @fg-nava in #225
  • feat: Facilitate 'all' datasets in refresh-ingestion.sh by @yoomlam in #241
  • chore: Add documents-to-confirm-eligibility to Covered CA datasource by @yoomlam in #234
  • feat: Update prompt with new canned responses by @KevinJBoyer in #242

Full Changelog: v0.0.0...v1.0.0