v1.0.0
Chat engines
This release contains the imagine-la chat engine for our initial pilot with Imagine LA in March 2025.
What's Changed
- fix: Show only two nested headers by @KevinJBoyer in #117
- perf: Use raw responses of chat context in chat history by @ccheng26 in #116
- feat: Add Markdown tree support utilities for hierarchical chunking by @yoomlam in #118
- feat: Add hierarchical chunking of markdown tree by @yoomlam in #119
- feat: Add support for retrieval in other languages by @KevinJBoyer in #120
- feat: Group web source citations by @ccheng26 in #122
- feat: Support batch processing by @KevinJBoyer in #123
- fix: Include all headers in batch CSV generation by @KevinJBoyer in #126
- feat: Show document.dataset in UI; fix: citation markdown rendering by @yoomlam in #125
- fix: Have superscript citation link to source URL by @yoomlam in #128
- feat: Add tree-based chunking to EDD ingestion by @yoomlam in #121
- refactor: Extract CA EDD system prompt into the chat engine by @yoomlam in #130
- refactor: Use FormattingConfig in format.py by @yoomlam in #129
- feat: Improve batch processing by @KevinJBoyer in #131
- refactor: Add ChatHistory class by @yoomlam in #133
- refactor: Add finalize_result() to remap and replace citation ids in response by @yoomlam in #134
- fix: Add quotes around Makefile ingest arguments by @yoomlam in #135
- fix: Handle blank sublist for EDD ingest by @yoomlam in #136
- feat: Add minimal API v0 by @yoomlam in #132
- refactor: More robust _fix_input_markdown() by @yoomlam in #138
- Notebook: Add edd-chunking.ipynb to debug markdown chunking issues by @yoomlam in #137
- refactor: Exclude user's latest message from chat_history by @yoomlam in #139
- feat: Ingest ImagineLA content hub by @KevinJBoyer in #141
- feat: Add conversational memory to v0 API by @yoomlam in #140
- feat: Add chat_message DB table by @yoomlam in #142
- API Spec by @KevinJBoyer in #124
- feat: Add Imagine LA engine by @KevinJBoyer in #143
- feat: Write document questions and answers to csv by @ccheng26 in #127
- feat: add question generation for Imagine LA and BEM datasets by @ccheng26 in #144
- fix: Remove use of getopt for ingestion by @yoomlam in #145
- feat: Investigation of multilingual LLMs by @KevinJBoyer in #109
- fix: ImagineLA scraper: handle '/' suffix for root_url by @yoomlam in #147
- feat: Analyze precision and recall for initial eval dataset by @KevinJBoyer in #150
- feat: Enable resuming ingestion of the EDD website by @yoomlam in #146
- fix: Assign subsections to their own markdown headings by @yoomlam in #148
- feat: Tree-based chunk splitting into subsections by @yoomlam in #149
- fix: Handle table markdown formatting in assertion by @yoomlam in #152
- feat: Literal AI feedback by @ccheng26 in #151
- feat: Whitelist Imagine LA's dev site by @KevinJBoyer in #158
- feat: Citation footnotes open accordions by @ccheng26 in #155
- feat: Scrape LA county policy manual website by @yoomlam in #156
- refactor: remove BEM-specific code and generalize PDF processing by @fg-nava in #159
- refactor: remove Guru card functionality by @fg-nava in #160
- fix: Set up Terraform before AWS credentials by @KevinJBoyer in #163
- fix: Update checkout actions to fix action linting by @KevinJBoyer in #164
- feat: Ingest LA County Policy Manual by @yoomlam in #161
- feat: Use separate Literal project for API by @KevinJBoyer in #166
- feat: Provide direct link to text by @ccheng26 in #162
- feat: Pull dataset values from DB by @KevinJBoyer in #167
- fix: Switch to sequential processing in batch_process to resolve thread-safety issues by @fg-nava in #169
- feat: Export markdown files during ingestion by @yoomlam in #171
- chore: Rename dataset to LA "DPSS Policy" by @yoomlam in #174
- feat: LA DPSS policy: add program name to document.name by @yoomlam in #175
- fix: Use cl.make_async when calling synchronous batch_process by @fg-nava in #176
- fix: Refine memory handling and timeout params for batch processing by @fg-nava in #177
- feat: Add IRS family tax credit webpages as a new dataset by @yoomlam in #178
- feat: Add scraper and ingest code for Public Charge dataset by @ccheng26 in #173
- fix: Use regex pattern for CORS allowed origins by @fg-nava in #182
- revert: Revert batch process and related files to d61791c by @fg-nava in #180
- fix: Unblock Chainlit's websocket during batch processing by @yoomlam in #181
- fix: Limit origins to localhost:5173 and remove pattern from regex by @fg-nava in #184
- fix: revise headers in public charge content by @ccheng26 in #183
- refactor: Simplify command line for scraping datasets by @yoomlam in #186
- dev experience: Add app-shell target to Makefile by @yoomlam in #188
- fix: Cleanup BOM characters from fieldnames in batch processing by @fg-nava in #187
- refactor: Add general ingest_runner by @yoomlam in #189
- feat: Ingest CA FTB dataset by @yoomlam in #190
- refactor: Generalize scrapy-runner by @yoomlam in #192
- feat: Ingest CA WIC dataset by @yoomlam in #191
- Update system prompt for ImagineLA chat engine by @yoomlam in #194
- refactor: Make ingest-runner consistent with scrapy-runner by @yoomlam in #193
- fix: Remove unused ImagineLA chat engine settings from UI by @yoomlam in #195
- feat: Add Covered CA dataset by @yoomlam in #197
- fix: Update Imagine LA Content Hub scraper and ingester by @yoomlam in #198
- feat: Distinguish two separate system prompts in the UI by @yoomlam in #199
- feat: Add Step 2 logic and show Policy Updates in UI by @yoomlam in #200
- feat: Add alert_message to API and Batch Processing by @yoomlam in #201
- feat: Add MessageAttributes to Literal AI message metadata by @yoomlam in #203
- feat: Keep intro sentence with subsequent list by @yoomlam in #202
- refactor: Remove create_splits_using_headings() and unused code by @yoomlam in #172
- feat: Use query as the LiteralAI thread name by @yoomlam in #204
- feat: Add benefit_program to ImagineLA_MessageAttributes by @yoomlam in #205
- feat: Ingest manual scraping of SSA pdf by @yoomlam in #207
- fix: Update system prompt by @yoomlam in #208
- feat: Ingest from multiple JSON scraping files by @yoomlam in #209
- chore: Update Imagine LA's URL: benefitnavigator.web.app by @yoomlam in #210
- refactor: Update postgres ssl logging to info level by @fg-nava in #211
- fix: new CORS allow_origin_regex pattern by @fg-nava in #212
- feat: API: Add source_dataset field to the Citation API response by @yoomlam in #213
- feat: Simplify reingesting all datasources by @yoomlam in #214
- feat: Automate stats collection to ingestion process and include in S… by @fg-nava in #215
- feat: Set up QA evaluation infrastructure and logging system by @fg-nava in #206
- chore: Rename Imagine LA to Benefits Information Hub by @yoomlam in #217
- fix: Support multiple datasets and k values using nargs in evaluation CLI by @fg-nava in #218
- chore: Set up prod environment by @KevinJBoyer in #221
- fix: CA Public Charge dataset: fix absolute links by @yoomlam in #219
- feat: Move '(citation-N)' after punctuation by @yoomlam in #216
- feat: Subscribe monitoring email to CloudWatch alarms by @KevinJBoyer in #222
- Update chat_engine.py by @KevinJBoyer in #227
- refactor: Move metrics module into evaluation directory by @fg-nava in #223
- refactor: Consolidate evaluation metrics directory by @fg-nava in #224
- fix: Remove repeated text around alert_msg by @yoomlam in #228
- fix: Enable multiple user messages per LiteralAI thread by @yoomlam in #229
- feat: Add UserSession DB table by @yoomlam in #230
- feat: API: Make user_id required by @yoomlam in #231
- chore: API: return error on empty user_id by @yoomlam in #233
- feat: DB: Add session_id foreign key constraint by @yoomlam in #232
- fix: Temporarily allow empty user id by @KevinJBoyer in #236
- fix: Remove manual fix for EDD datasource by @yoomlam in #237
- chore: Add echo_cmds() to refresh-ingestion.sh by @yoomlam in #238
- fix: Imagine LA ingest: switch region to "California" and pass DATASET_ID to echo_cmds by @fg-nava in #240
- refactor: QA generation core functionality and evaluation system by @fg-nava in #225
- feat: Facilitate 'all' datasets in refresh-ingestion.sh by @yoomlam in #241
- chore: Add documents-to-confirm-eligibility to Covered CA datasource by @yoomlam in #234
- feat: Update prompt with new canned responses by @KevinJBoyer in #242
Full Changelog: v0.0.0...v1.0.0