chore(eng-docs): add docs for knowledge gaps #3976
Merged
8 commits
b1faa09 chore(bakcend): add command to apply migrations (krisantrobus)
107c7c3 chore(eng-docs): wip for enmbeddings (krisantrobus)
face769 chore(end): added in new CHANGELOG.md to create:package (krisantrobus)
c9a1189 chore(eng-docs): update embeddings doc (krisantrobus)
2fd59ae chore(eng-docs): md highlights (krisantrobus)
495a519 chore(eng-docs): document NPM publish process (krisantrobus)
a725b3b chore(eng-docs): typo fixes (krisantrobus)
c9bd6e9 Merge branch 'main' into eng-docs/update (kodiakhq[bot])
42 changes: 42 additions & 0 deletions
internal-docs/engineering/doc-site/generating-embeddings.md
@@ -0,0 +1,42 @@
# Generating Embeddings

Embeddings power our [Doc Search](./docsearch.md) functionality. An OpenAI embedding is a machine-learning representation that converts unstructured data, such as text, into a numeric vector, so that semantically similar inputs end up close together in vector space.

In our use case, we embed plain text such as search queries, MDX headings, and GitHub discussion titles. We use the `text-embedding-ada-002` model, which outputs a vector similar to: `[-0.005330325,0.018767769,0.00020701668,-0.0011101937, ...]`
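
To make the role of these vectors concrete, the sketch below compares two embeddings with cosine similarity, a common way to rank results in vector search. It is illustrative only: this document does not state which similarity measure Doc Search uses, `cosineSimilarity` is a hypothetical helper, and the vectors are shortened toy values (real `text-embedding-ada-002` vectors have 1536 dimensions).

```typescript
// Hypothetical helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings (assumed values).
const query = [0.1, 0.3, -0.2];
const heading = [0.2, 0.6, -0.4]; // points in the same direction as `query`
const unrelated = [0.3, -0.1, 0]; // orthogonal to `query`

console.log(cosineSimilarity(query, heading));
console.log(cosineSimilarity(query, unrelated));
```

A score near 1 means the two texts are semantically close, while a score near 0 means they are unrelated, so search results can be ordered by descending score.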

## Local Development

To develop locally, you will need to start up a local instance of Supabase. The code for this is in `/apps/backend`. Follow the [backend README](../../../apps/backend/README.md) to get set up.

Once it is set up, you should be able to access Supabase at http://127.0.0.1:54323. If you see no tables, the migrations have not been applied. Run `yarn workspace @twilio-paste/backend db:reset` from the root of the project to apply them.

**Note**: if you see an error about vector packages, open [20230928013336_initial_schema](../../../apps/backend/supabase/migrations/20230928013336_initial_schema.sql) and make the following change **without committing** it:

```sql
-- Change this line:
create extension if not exists "vector" with schema "public" version '0.5.0';
-- to:
create extension if not exists "vector" with schema "public";
```

### Environment Variables

To work on any GitHub Action or assistant development for the site, you will need to set the following environment variables in `packages/paste-website/.env`:

```
OPENAI_API_KEY="" # use your personal token for local dev
SUPABASE_URL="http://127.0.0.1:54321" # printed to the console after starting the container
SUPABASE_KEY="" # printed to the console after starting the container
GH_SERVICE_ACC_DISCUSSIONS_TOKEN="" # in 1Password under the github.com entry
```
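
Because a missing variable tends to surface only as a runtime failure, a small check like the one below can help confirm the file is complete before running any scripts. This is a sketch, not part of the repository: `findMissingEnvVars` is a hypothetical helper, and it assumes the variables have already been loaded into a plain object such as `process.env`.

```typescript
// Variable names taken from the .env example above.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "GH_SERVICE_ACC_DISCUSSIONS_TOKEN",
] as const;

type Env = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with a partially filled environment (placeholder values).
const exampleEnv: Env = {
  OPENAI_API_KEY: "sk-example",
  SUPABASE_URL: "http://127.0.0.1:54321",
  SUPABASE_KEY: "",
};

const missing = findMissingEnvVars(exampleEnv);
console.log(missing.length === 0 ? "All variables set" : `Missing: ${missing.join(", ")}`);
```

In a real script, passing `process.env` instead of `exampleEnv` would check the variables actually loaded from the `.env` file.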

### Generating Data

The best way to generate data is to run the nightly embed script, `generate:embeddings`. This updates the `page` and `page_section` tables.

## Table Structure

While there are other tables, the only ones involved in creating embeddings are:
- **page**: stores the metadata of the entry. Key columns are `checksum` (used to determine whether to update the record), `path` (the URL of either the page or the GitHub discussion), and `type` (`github-discussion` or `markdown`).
- **page_section**: contains the search embeddings. Key columns are `content` (plain-text headings or titles), `embedding` (the vector returned by OpenAI), and `slug` (toString of content, or the discussion/answer in GitHub).

The two tables are related, with `page` as the parent. They are joined on `page.id = page_section.page_id`.
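
The relationship between the two tables can be sketched in TypeScript. The column names (`checksum`, `path`, `type`, `content`, `embedding`, `slug`, `page_id`) come from this document; the column types, the sample rows, and the `needsUpdate` and `sectionsForPage` helpers are assumptions for illustration, not the actual schema or script logic.

```typescript
// Assumed shapes; only the column names are taken from the description above.
interface Page {
  id: number;
  path: string; // URL of the page or the GitHub discussion
  type: "markdown" | "github-discussion";
  checksum: string; // compared to decide whether to update the record
}

interface PageSection {
  id: number;
  page_id: number; // references Page.id
  content: string; // plain-text heading or title
  slug: string;
  embedding: number[]; // vector returned by OpenAI
}

// Hypothetical helper: a record needs re-embedding when it is new or its checksum changed.
function needsUpdate(existing: Page | undefined, newChecksum: string): boolean {
  return existing === undefined || existing.checksum !== newChecksum;
}

// Hypothetical helper: in-memory equivalent of joining on page.id = page_section.page_id.
function sectionsForPage(page: Page, sections: PageSection[]): PageSection[] {
  return sections.filter((section) => section.page_id === page.id);
}

// Made-up sample rows.
const page: Page = { id: 1, path: "/components/button", type: "markdown", checksum: "abc" };
const sections: PageSection[] = [
  { id: 10, page_id: 1, content: "Button", slug: "button", embedding: [0.1, 0.2] },
  { id: 11, page_id: 2, content: "Modal", slug: "modal", embedding: [0.3, 0.4] },
];

console.log(sectionsForPage(page, sections).map((s) => s.content));
console.log(needsUpdate(page, "abc"), needsUpdate(page, "def"));
```

Keeping the checksum on the parent row means unchanged pages can be skipped without touching their sections.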
@@ -0,0 +1,13 @@
# Publishing NPM Package

Paste core uses [changesets](https://github.com/changesets/changesets) to manage versions and changelogs. It has strong support for monorepos and multi-package repositories, which makes it a good fit for `@twilio-paste/core`.

Changesets provides a [GitHub Action](https://github.com/changesets/action) that manages the release by creating a PR and periodically pulling in changes from main. No code is published to NPM until this PR is merged, which is controlled by the team.

The PR is always titled `Version Packages` and lists all the changes made since the last release. Its description is also updated with the changeset entries from merged PRs, making it easy to see what will be released.

The GitHub Action [on_merge_to_main](../../.github/workflows/on_merge_to_main.yml) has a step named `Create Pull Request or Publish to npm`. This step defines which commands from [package.json](../../package.json) to run for each operation:

- version: removes all of the temporary changeset files generated during development and aggregates them into a changelog entry.
- publish: publishes the package to NPM.
- commit: `chore(release): version packages`, the commit message used on squash and merge.
Empty file.
This is a mandatory file that is needed when developing the website section for new components. Good to add it in at gen level to stop getting caught out and manually having to update the components package later.