
Commit a10a385

Update docker startup scripts and readme. (#122)
* Update docker startup scripts and readme.
* Update sections.
1 parent ea408a8 commit a10a385

File tree: 7 files changed, +450 −106 lines changed


README.md

Lines changed: 60 additions & 97 deletions
@@ -14,22 +14,24 @@
 - [Use Google / FireCrawl as the default web retriever](#use-google--firecrawl-as-the-default-web-retriever)
 - [Usage Examples](#usage-examples)
 - [Build a local knowledge base using PDFs from the web](#build-a-local-knowledge-base-using-pdfs-from-the-web)
-- [Generate news list from updates in KB](#generate-news-list-from-updates-in-kb)
+- [Generate analytical research reports like OpenAI/Google's Deep Research](#generate-analytical-research-reports-like-openaigoogles-deep-research)
+- [Generate news list from web search results](#generate-news-list-from-web-search-results)
 - [Main Components](#main-components)
 - [Community](#community)


 # AI Search Assistant with Local Knowledge Bases

 LeetTools is an AI search assistant that can perform highly customizable search workflows
-and save the search results and generated outputs to local knowledge bases. With an
+and generate results in customized formats based on both web and local knowledge bases. With an
 automated document pipeline that handles data ingestion, indexing, and storage, we can
-easily run complex search workflows that query, extract and generate content from the
-web or local knowledge bases.
+focus on implementing the workflow without worrying about the underlying infrastructure.

 LeetTools can run with minimal resource requirements on the command line with a
-DuckDB-backend and configurable LLM settings. It can be easily integrated with other
-applications needing AI search and knowledge base support.
+DuckDB backend and configurable LLM settings. It can also use other dedicated
+databases for different functions, e.g., we can use MongoDB for document storage,
+Milvus for vector search, and Neo4j for graph search. We can configure different
+functions in the same workflow to use different LLM providers and models.

 Here is an illustration of the LeetTools **digest** flow where it can search the web
 (or local KB) and generate a digest article from the search results:
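The new intro describes a pluggable-backend design: each function (document storage, vector search, graph search) can be routed to a dedicated store. As a purely hypothetical sketch of what such routing could look like in an env file (these variable names are illustrative assumptions, not taken from this commit):

```bash
# hypothetical per-function backend routing in a .env file
EDS_DOCSTORE_PROVIDER=mongodb     # document storage
EDS_VECTORSTORE_PROVIDER=milvus   # vector search
EDS_GRAPHSTORE_PROVIDER=neo4j     # graph search
```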
@@ -50,9 +52,32 @@ Currently LeetTools provides the following workflows:

 # Quick Start

-We can use any OpenAI-compatible LLM endpoint, such as local Ollama service or public
-provider such as Gemini or DeepSeek. We can switch the service easily by [defining
-environment variables or switching .env files](#use-different-llm-endpoints).
+**Before you start**
+
+- .env file: We can use any OpenAI-compatible LLM endpoint, such as a local Ollama
+service or a public provider such as Gemini or DeepSeek. We can switch the service
+easily by [defining environment variables or switching .env files](#use-different-llm-endpoints).
+
+- LeetHome: By default the data is saved under ${HOME}/leettools; you can set the
+LEET_HOME environment variable to change the location:
+
+```bash
+% export LEET_HOME=<your_leet_home>
+% mkdir -p ${LEET_HOME}
+```
+
+**🚀 New: Run LeetTools Web UI with Docker 🚀**
+
+LeetTools now provides a Docker container that includes the web UI. You can start the
+container by running the following command:
+
+```bash
+docker/start.sh
+```
+
+This will start the LeetTools service and the web UI. You can access the web UI at
+[http://localhost:3000](http://localhost:3000). The web UI app is currently under
+development and not open sourced yet. We plan to open source it in the near future.

 **Run with pip**

@@ -71,14 +96,6 @@ The above `flow -t answer` command will run the `answer` flow with the query "How does
 GraphRAG work?" and save the scraped web pages to the knowledge base `graphrag`. The
 `-l info` option will show the essential log messages.

-By default the data is saved under ${HOME}/leettools, you can set a different LeetHome
-environment variable to change the location:
-
-```bash
-% export LEET_HOME=<your_leet_home>
-% mkdir -p ${LEET_HOME}
-```
-
 The default API endpoint is set to the OpenAI API endpoint, which you can modify by
 changing the `EDS_DEFAULT_LLM_BASE_URL` environment variable:

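For illustration, a minimal sketch of that override, assuming a local OpenAI-compatible server (the URL here is an assumption, not part of this commit):

```bash
# point the default LLM endpoint at a local OpenAI-compatible server
% export EDS_DEFAULT_LLM_BASE_URL=http://localhost:11434/v1
```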
@@ -98,44 +115,11 @@ changing the `EDS_DEFAULT_LLM_BASE_URL` environment variable:
 % pip install -e .
 # add the script path to the path
 % export PATH=`pwd`/scripts:${PATH}
-
 % export EDS_LLM_API_KEY=<your_api_key>

 % leet flow -t answer -q "How does GraphRAG work?" -k graphrag -l info
 ```

-**Sample Output**
-
-Here is an example output of the `answer` flow:
-
-```markdown
-# How Does Graphrag Work?
-GraphRAG operates by constructing a knowledge graph from a set of documents, which
-involves several key steps. Initially, it ingests textual data and utilizes a large
-language model (LLM) to extract entities (such as people, places, and concepts) and
-their relationships, mapping these as nodes and edges in a graph structure[1].
-
-The process begins with pre-processing and indexing, where the text is segmented into
-manageable units, and entities and relationships are identified. These entities are
-then organized into hierarchical "communities," which are clusters of related topics
-that allow for a more structured understanding of the data[2][3].
-
-When a query is made, GraphRAG employs two types of searches: Global Search, which
-looks across the entire knowledge graph for broad connections, and Local Search, which
-focuses on specific subgraphs for detailed information[3]. This dual approach enables
-GraphRAG to provide comprehensive answers that consider both high-level themes and
-specific details, allowing it to handle complex queries effectively[3][4].
-
-In summary, GraphRAG enhances traditional retrieval-augmented generation (RAG) by
-leveraging a structured knowledge graph, enabling it to provide nuanced responses that
-reflect the interconnected nature of the information it processes[1][2].
-## References
-[1] [https://www.falkordb.com/blog/what-is-graphrag/](https://www.falkordb.com/blog/what-is-graphrag/)
-[2] [https://medium.com/@zilliz_learn/graphrag-explained-enhancing-rag-with-knowledge-graphs-3312065f99e1](https://medium.com/@zilliz_learn/graphrag-explained-enhancing-rag-with-knowledge-graphs-3312065f99e1)
-[3] [https://medium.com/data-science-in-your-pocket/how-graphrag-works-8d89503b480d](https://medium.com/data-science-in-your-pocket/how-graphrag-works-8d89503b480d)
-[4] [https://github.com/microsoft/graphrag/discussions/511](https://github.com/microsoft/graphrag/discussions/511)
-```
-
 # Use Different LLM and Search Providers

 We can run LeetTools with different env files to use different LLM providers and other
@@ -250,59 +234,36 @@ We have a more [detailed example](docs/run_ollama_with_deepseek_r1.md) to show h
 use the local Ollama service with the DeepSeek-r1:1.5B model to build a local knowledge
 base.

-## Generate news list from updates in KB
+## Generate analytical research reports like OpenAI/Google's Deep Research

-We can create a knowledge base with a list of URLs or a search query, and then generate
-a list of news items from the KB. Here is an example:
+We can generate analytical research reports like OpenAI/Google's Deep Research by using
+the `digest` flow. Here is an example:

 ```bash
-# create a KB with a google search
-# -d 1 means to search for news from the last day
-# -m 30 means to scrape the top 30 search results
-% leet kb add-search -k genai -q "LLM GenAI Startups" -d 1 -m 30
-# you can add a single url to the KB
-% leet kb add-url -k genai -r "https://www.techcrunch.com"
-# you can also add a list of urls, example in [docs/sample_url_list.txt](docs/sample_url_list.txt)
-% leet kb add-url-list -k genai -f <file_with_list_of_urls>
-
-# generate a news list from the KB
-% leet flow -t news -q "LLM GenAI Startups" -k genai -l info -o llm_genai_news.md
-
-# Next time you want to refresh the KB and generate the news list,
-# this command will re-ingest all the docsources specified above
-% leet kb ingest -k genai
-
-# run the news flow again with the parameters you need
-% leet flow -t news --info
-====================================================================================================
-news: Generating a list of news items from the KB.
-
-This flow generates a list of news items from the updated items in the KB:
-1. check the KB for recently updated documents and find news items in them.
-2. combine all the similar items into one.
-3. remove items that have been reported before.
-4. rank the items by the number of sources.
-5. generate a list of news items with references.
-
-====================================================================================================
-Use -p name=value to specify options for news:
-
-article_style    : The style of the output article such as analytical research reports,
-                   humorous news articles, or technical blog posts.
-                   [default: analytical research reports] [FLOW: news]
-days_limit       : Number of days to limit the search results. 0 or empty means no limit.
-                   In local KB, filters by the import time. [FLOW: news]
-news_include_old : Include all news items in the result, even if they have been reported
-                   before. Default is False. [default: False] [FLOW: news]
-news_source_min  : Number of sources a news item has to have to be included in the
-                   result. Default is 2. Depends on the nature of the knowledge base.
-                   [default: 2] [FLOW: news]
-output_language  : Output the result in the specified language. [FLOW: news]
-word_count       : The number of words in the output section. Empty means automatic.
-                   [FLOW: news]
+% leet flow -e .env.fireworks -t digest -k aijob.fireworks \
+    -p search_max_results=30 -p days_limit=360 \
+    -q "How will agentic AI and generative AI affect our non-tech jobs?" \
+    -l info -o outputs/aijob.fireworks.md
+```
+
+An example of the output is available [here](docs/examples/deepseek/aijob.fireworks.md),
+and the tutorial to use the DeepSeek API from fireworks.ai for the above command is
+available [here](docs/run_deepsearch_with_firework_deepseek.md).
+
+## Generate news list from web search results

+We can create a knowledge base with a web search with a date limit, and then generate
+a list of news items from the KB. Here is an example:
+
+```bash
+leet flow -t news -q "LLM GenAI Startups" -k genai -l info \
+    -p days_limit=3 -p search_iteration=3 -p search_max_results=100 \
+    -o llm_genai_news.md
 ```

-Note: scheduler support and UI view are coming soon.
+The query retrieves the latest web pages from the past 3 days, up to 100 search result
+pages, and generates a list of news items from the search results. The output is saved
+to the `llm_genai_news.md` file. An example of the output is available [here](docs/examples/llm_genai_news.md).

 # Main Components

@@ -335,6 +296,8 @@ Right now we are using the following open source libraries and tools (not limited to)
 - [Ollama](https://github.com/ollama/ollama)
 - [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
 - [BS4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+- [FastAPI](https://github.com/fastapi/fastapi)
+- [Pydantic](https://github.com/pydantic/pydantic)

 We plan to add more plugins for different components to support different workloads.

docker/README.md

Lines changed: 28 additions & 2 deletions

@@ -4,6 +4,30 @@ We can use Docker Compose to start the LeetTools service and web UI.

 ## Usage

+First copy the [docker/env.template](docker/env.template) file to [docker/.env](docker/.env)
+and adjust the environment variables if needed.
+
+```bash
+cp docker/env.template docker/.env
+```
+
+The following environment variables are used by the docker-compose.yml file:
+
+```bash
+# env variables for docker-compose.yml
+
+COMPOSE_PROFILES=full
+
+# fix to a stable version if needed
+LEETTOOLS_VERSION=latest
+LEETTOOLS_WEB_VERSION=latest
+
+LEETTOOLS_ENV_FILE=../.env
+DEFAULT_LANGUAGE=en
+# specify the DOCUMENETS_HOME variable if you want to use a different directory
+DOCUMENETS_HOME=~/documents
+```
+
 To build the Docker images, you can run the following command:
 ```bash
 docker/build.sh
@@ -12,10 +36,12 @@ docker/build.sh
 If you do not want to build the Docker images, you can pull the latest images from the
 Docker Hub by running the following command:
 ```bash
-docker compose --profile full pull
+cd docker
+docker compose pull
+cd ..
 ```

-To start the LeetTools service
+To start the LeetTools service and web UI, you can run the following command:
 ```bash
 docker/start.sh
 ```
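With `COMPOSE_PROFILES=full` now supplied by `docker/.env`, the explicit `--profile full` flag on the pull command is no longer needed. An equivalent invocation that passes the profile inline (a sketch, assuming the compose file lives in `docker/`):

```bash
# same effect as the new pull flow, without relying on docker/.env
cd docker
COMPOSE_PROFILES=full docker compose pull
cd ..
```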

docker/env.template

Lines changed: 3 additions & 0 deletions

@@ -1,10 +1,13 @@
 # env variables for docker-compose.yml

+COMPOSE_PROFILES=full
+
 # fix to a stable version if needed
 LEETTOOLS_VERSION=latest
 LEETTOOLS_WEB_VERSION=latest

 LEETTOOLS_ENV_FILE=../.env
 DEFAULT_LANGUAGE=en
+
 # specify the DOCUMENETS_HOME variable if you want to use a different directory
 # DOCUMENETS_HOME=~/documents

docker/start.sh

Lines changed: 4 additions & 3 deletions

@@ -4,11 +4,12 @@ set -e -u

 DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

-# check if the .env file exists in the DIR
+# check if the .env file exists in the docker directory
 env_file="$DIR/.env"
 if [ ! -f "$env_file" ]; then
-    echo "Error: .env file not found in $DIR. Please create one from $DIR/.env.template"
-    exit 1
+    # copy the .env.template file to .env if it doesn't exist
+    cp "$env_file.template" "$env_file"
+    echo ".env file not found in $DIR for docker-compose.yml. Copied from $DIR/.env.template."
 fi

 # load the .env file into the environment by exporting the variables
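The hunk's final context line refers to exporting the `.env` variables into the environment; a common shell idiom for that step is the following (a sketch of the pattern, not necessarily the script's exact code):

```bash
# export every variable defined while sourcing the env file
set -a
source "$env_file"
set +a
```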
