- [Use Google / FireCrawl as the default web retriever](#use-google--firecrawl-as-the-default-web-retriever)
- [Usage Examples](#usage-examples)
- [Build a local knowledge base using PDFs from the web](#build-a-local-knowledge-base-using-pdfs-from-the-web)
- - [Generate news list from updates in KB](#generate-news-list-from-updates-in-kb)
+ - [Generate analytical research reports like OpenAI/Google's Deep Research](#generate-analytical-research-reports-like-openaigoogles-deep-research)
+ - [Generate news list from web search results](#generate-news-list-from-web-search-results)
- [Main Components](#main-components)
- [Community](#community)

# AI Search Assistant with Local Knowledge Bases

LeetTools is an AI search assistant that can perform highly customizable search workflows
- and save the search results and generated outputs to local knowledge bases. With an
+ and generate results in customized formats based on both web and local knowledge bases. With an
automated document pipeline that handles data ingestion, indexing, and storage, we can
- easily run complex search workflows that query, extract and generate content from the
- web or local knowledge bases.
+ focus on implementing the workflow without worrying about the underlying infrastructure.

LeetTools can run with minimal resource requirements on the command line with a
- DuckDB-backend and configurable LLM settings. It can be easily integrated with other
- applications needing AI search and knowledge base support.
+ DuckDB backend and configurable LLM settings. It can also use other dedicated
+ databases for different functions, e.g., we can use MongoDB for document storage,
+ Milvus for vector search, and Neo4j for graph search. We can configure different
+ functions in the same workflow to use different LLM providers and models.

Here is an illustration of the LeetTools **digest** flow where it can search the web
(or local KB) and generate a digest article from the search results:
@@ -50,9 +52,32 @@ Currently LeetTools provides the following workflows:
# Quick Start

- We can use any OpenAI-compatible LLM endpoint, such as local Ollama service or public
- provider such as Gemini or DeepSeek. We can switch the service easily by [defining
- environment variables or switching .env files](#use-different-llm-endpoints).
+ **Before you start**
+
+ - .env file: We can use any OpenAI-compatible LLM endpoint, such as a local Ollama service
+ or a public provider such as Gemini or DeepSeek. We can switch the service easily by
+ [defining environment variables or switching .env files](#use-different-llm-endpoints).
+ A minimal sketch of such a file is shown after this list.
+
+ - LeetHome: By default the data is saved under ${HOME}/leettools; you can set the
+ LEET_HOME environment variable to change the location:
+
+ ```bash
+ % export LEET_HOME=<your_leet_home>
+ % mkdir -p ${LEET_HOME}
+ ```
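
For reference, a minimal .env sketch might contain just the endpoint and the key. The variable names below are the `EDS_*` settings used later in this README; the DeepSeek URL is only an illustrative value for an OpenAI-compatible provider, so replace it with your own provider's base URL:

```bash
# Minimal .env sketch (illustrative values only, not a complete configuration).
# Any OpenAI-compatible endpoint works; DeepSeek's base URL is shown as an example.
EDS_DEFAULT_LLM_BASE_URL=https://api.deepseek.com/v1
EDS_LLM_API_KEY=<your_api_key>
```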
+
+ **🚀 New: Run LeetTools Web UI with Docker 🚀**
+
+ LeetTools now provides a Docker container that includes the web UI. You can start the
+ container by running the following command:
+
+ ```bash
+ docker/start.sh
+ ```
+
+ This will start the LeetTools service and the web UI. You can access the web UI at
+ [http://localhost:3000](http://localhost:3000). The web UI app is currently under development
+ and not open sourced yet. We plan to open source it in the near future.

**Run with pip**
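
The install command itself falls outside this diff hunk. Assuming the package is published on PyPI under the name `leettools` (an assumption, not confirmed by this diff), it would look roughly like:

```bash
# Assumed PyPI package name; check the project page if it differs.
% pip install leettools
```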
@@ -71,14 +96,6 @@ The above `flow -t answer` command will run the `answer` flow with the query "Ho
GraphRAG work?" and save the scraped web pages to the knowledge base `graphrag`. The
`-l info` option will show the essential log messages.

- By default the data is saved under ${HOME}/leettools, you can set a different LeetHome
- environment variable to change the location:
-
- ```bash
- % export LEET_HOME=<your_leet_home>
- % mkdir -p ${LEET_HOME}
- ```
-
The default API endpoint is set to the OpenAI API endpoint, which you can modify by
changing the `EDS_DEFAULT_LLM_BASE_URL` environment variable:
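
The concrete export is outside this diff hunk; as one sketch, pointing the default endpoint at a local Ollama service would look like the following (the URL is Ollama's usual OpenAI-compatible address and is only an example):

```bash
# Example only: use a local Ollama service as the OpenAI-compatible endpoint.
% export EDS_DEFAULT_LLM_BASE_URL=http://localhost:11434/v1
```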
@@ -98,44 +115,11 @@ changing the `EDS_DEFAULT_LLM_BASE_URL` environment variable:
% pip install -e .
# add the script path to the path
% export PATH=`pwd`/scripts:${PATH}
-
% export EDS_LLM_API_KEY=<your_api_key>

% leet flow -t answer -q "How does GraphRAG work?" -k graphrag -l info
```

- **Sample Output**
-
- Here is an example output of the `answer` flow:
-
- ```markdown
- # How Does Graphrag Work?
- GraphRAG operates by constructing a knowledge graph from a set of documents, which
- involves several key steps. Initially, it ingests textual data and utilizes a large
- language model (LLM) to extract entities (such as people, places, and concepts) and
- their relationships, mapping these as nodes and edges in a graph structure[1].
-
- The process begins with pre-processing and indexing, where the text is segmented into
- manageable units, and entities and relationships are identified. These entities are
- then organized into hierarchical "communities," which are clusters of related topics
- that allow for a more structured understanding of the data[2][3].
-
- When a query is made, GraphRAG employs two types of searches: Global Search, which
- looks across the entire knowledge graph for broad connections, and Local Search, which
- focuses on specific subgraphs for detailed information[3]. This dual approach enables
- GraphRAG to provide comprehensive answers that consider both high-level themes and
- specific details, allowing it to handle complex queries effectively[3][4].
-
- In summary, GraphRAG enhances traditional retrieval-augmented generation (RAG) by
- leveraging a structured knowledge graph, enabling it to provide nuanced responses that
- reflect the interconnected nature of the information it processes[1][2].
- ## References
- [1] [https://www.falkordb.com/blog/what-is-graphrag/](https://www.falkordb.com/blog/what-is-graphrag/)
- [2] [https://medium.com/@zilliz_learn/graphrag-explained-enhancing-rag-with-knowledge-graphs-3312065f99e1](https://medium.com/@zilliz_learn/graphrag-explained-enhancing-rag-with-knowledge-graphs-3312065f99e1)
- [3] [https://medium.com/data-science-in-your-pocket/how-graphrag-works-8d89503b480d](https://medium.com/data-science-in-your-pocket/how-graphrag-works-8d89503b480d)
- [4] [https://github.com/microsoft/graphrag/discussions/511](https://github.com/microsoft/graphrag/discussions/511)
- ```
-
# Use Different LLM and Search Providers
We can run LeetTools with different env files to use different LLM providers and other
@@ -250,59 +234,36 @@ We have a more [detailed example](docs/run_ollama_with_deepseek_r1.md) to show h
use the local Ollama service with the DeepSeek-r1:1.5B model to build a local knowledge
base.
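
The linked walkthrough has the actual settings; as a rough sketch of the pattern, a dedicated env file can be selected per run with the `-e` option (the `.env.ollama` file name here is just an assumed example):

```bash
# Assumes an .env.ollama file that points EDS_DEFAULT_LLM_BASE_URL at the local Ollama service.
% leet flow -e .env.ollama -t answer -q "How does GraphRAG work?" -k graphrag -l info
```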

- ## Generate news list from updates in KB
+ ## Generate analytical research reports like OpenAI/Google's Deep Research

- We can create a knowledge base with a list of URLs or a search query, and then generate
- a list of news items from the KB. Here is an example:
+ We can generate analytical research reports like OpenAI/Google's Deep Research by using
+ the `digest` flow. Here is an example:

```bash
- # create a KB with a google search
- # -d 1 means to search for news from the last day
- # -m 30 means to scrape the top 30 search results
- % leet kb add-search -k genai -q "LLM GenAI Startups" -d 1 -m 30
- # you can add single url to the KB
- % leet kb add-url -k genai -r "https://www.techcrunch.com"
- # you can also add a list of urls, example in [docs/sample_url_list.txt](docs/sample_url_list.txt)
- % leet kb add-url-list -k genai -f <file_with_list_of_urls>
-
- # generate a news list from the KB
- % leet flow -t news -q "LLM GenAI Startups" -k genai -l info -o llm_genai_news.md
-
- # Next time you want to refresh the KB and generate the news list
- # this command will re-ingest all the docsources specified above
- % leet kb ingest -k genai
-
- # run the news flow again with parameter you need
- % leet flow -t news --info
- ====================================================================================================
- news: Generating a list of news items from the KB.
-
- This flow generates a list of news items from the updated items in the KB:
- 1. check the KB for recently updated documents and find news items in them.
- 2. combine all the similar items into one.
- 3. remove items that have been reported before.
- 4. rank the items by the number of sources.
- 5. generate a list of news items with references.
-
- ====================================================================================================
- Use -p name=value to specify options for news:
-
- article_style : The style of the output article such as analytical research reports, humorous
- news articles, or technical blog posts. [default: analytical research reports]
- [FLOW: news]
- days_limit : Number of days to limit the search results. 0 or empty means no limit. In
- local KB, filters by the import time. [FLOW: news]
- news_include_old : Include all news items in the result, even if it has been reported
- before.Default is False. [default: False] [FLOW: news]
- news_source_min : Number of sources a news item has to have to be included in the result.Default
- is 2. Depends on the nature of the knowledge base. [default: 2] [FLOW: news]
- output_language : Output the result in the language. [FLOW: news]
- word_count : The number of words in the output section. Empty means automatics.
- [FLOW: news]
+ % leet flow -e .env.fireworks -t digest -k aijob.fireworks \
+     -p search_max_results=30 -p days_limit=360 \
+     -q "How will agentic AI and generative AI affect our non-tech jobs?" \
+     -l info -o outputs/aijob.fireworks.md
+ ```
+
+ An example of the output is available [here](docs/examples/deepseek/aijob.fireworks.md),
+ and the tutorial to use the DeepSeek API from fireworks.ai for the above command is
+ available [here](docs/run_deepsearch_with_firework_deepseek.md).
+
+ ## Generate news list from web search results
+
+ We can create a knowledge base from a web search with a date limit, and then generate
+ a list of news items from the KB. Here is an example:
+
+ ```bash
+ leet flow -t news -q "LLM GenAI Startups" -k genai -l info \
+     -p days_limit=3 -p search_iteration=3 -p search_max_results=100 \
+     -o llm_genai_news.md
```

- Note: scheduler support and UI view are coming soon.
+ The query retrieves the latest web pages from the past 3 days, up to 100 search result pages,
+ and generates a list of news items from the search results. The output is saved to
+ the `llm_genai_news.md` file. An example of the output is available [here](docs/examples/llm_genai_news.md).
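
The `--info` option shown in the removed example above lists the tunable `-p` options for a flow; for instance:

```bash
# Prints the flow description and its -p options (days_limit, word_count, etc.).
% leet flow -t news --info
```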
# Main Components
@@ -335,6 +296,8 @@ Right now we are using the following open source libraries and tools (not limite
- [Ollama](https://github.com/ollama/ollama)
- [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)
- [BS4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
+ - [FastAPI](https://github.com/fastapi/fastapi)
+ - [Pydantic](https://github.com/pydantic/pydantic)

We plan to add more plugins for different components to support different workloads.