Commit 4bc9866

Add documentation for ClarifaiRM retriever module (#1697)
* added the documentation for clarifai ai retriver
* minor edits
* add the changes with mentioned changes
* added the changes
* added the changes

---------

Co-authored-by: arnavsinghvi11 <54859892+arnavsinghvi11@users.noreply.github.com>
1 parent 0d6b4c2 commit 4bc9866

2 files changed

+180
-0
lines changed

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
# ClarifaiRM

[Clarifai](https://clarifai.com/) is an AI platform that provides vector search through its search API. DSPy integrates with it through the `ClarifaiRM` retriever module, which retrieves text passages from documents indexed in a Clarifai application.

To support passage retrieval, ClarifaiRM assumes that documents have been ingested into a Clarifai application with the following:
- Text data indexed and stored
- Appropriate search configurations set up in the Clarifai platform
- Valid authentication credentials (a Personal Access Token, or PAT) with appropriate permissions

The ClarifaiRM module requires the `clarifai` Python package. If it is not already installed, install it with:

```bash
pip install clarifai
```

**Note:**

Before using ClarifaiRM, ensure you have:

1. Created a Clarifai account and application
2. Ingested your documents into the application
3. Obtained your User ID, App ID, and Personal Access Token (PAT)

## Setting up the ClarifaiRM Client

The constructor initializes an instance of the `ClarifaiRM` class, which requires authentication credentials and configuration to connect to your Clarifai application.

- `clarifai_user_id` (_str_): Your unique Clarifai user identifier.
- `clarifai_app_id` (_str_): The ID of your Clarifai application where documents are stored.
- `clarifai_pat` (_Optional[str]_): Your Clarifai Personal Access Token (PAT). If not provided, ClarifaiRM looks for `CLARIFAI_PAT` in the environment variables.
- `k` (_int_, _optional_): The number of top passages to retrieve. Defaults to 3.

The ClarifaiRM constructor signature:

```python
ClarifaiRM(
    clarifai_user_id: str,
    clarifai_app_id: str,
    clarifai_pat: Optional[str] = None,
    k: int = 3,
)
```

**Note:**

The PAT can be provided either directly to the constructor or through the `CLARIFAI_PAT` environment variable. Using the environment variable is recommended, since it keeps the token out of source code.
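
As a minimal sketch of the lookup order described above (explicit argument first, then the `CLARIFAI_PAT` environment variable), the helper below resolves the token before the retriever is constructed. `resolve_pat` is a hypothetical helper written for this illustration; it is not part of DSPy or the `clarifai` package, and the module's internal handling may differ.

```python
import os
from typing import Optional


def resolve_pat(explicit_pat: Optional[str] = None) -> str:
    """Return the PAT to use: the explicit value if given, else CLARIFAI_PAT.

    Hypothetical helper for illustration only; not part of DSPy.
    """
    pat = explicit_pat or os.environ.get("CLARIFAI_PAT")
    if not pat:
        raise ValueError(
            "No PAT found: pass one explicitly or set the CLARIFAI_PAT environment variable."
        )
    return pat
```

The result could then be passed as `clarifai_pat=resolve_pat()`, which fails early with a clear message when no token is configured.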

## Under the Hood

### `retrieve_hits(self, hits)`

**Parameters:**
- `hits` (_ClarifaiHit_): A hit object from Clarifai's search response.

**Returns:**
- `str`: The retrieved text content.

Internal method that retrieves text content from the hit's URL using authenticated requests.
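
The following sketch illustrates what an authenticated fetch of a hit's text might look like, using only the standard library. The `Authorization: Key <PAT>` header follows Clarifai's API key convention; the helper names `build_text_request` and `fetch_text` and the example URL are assumptions made for this illustration, not the module's actual code. The example only builds the request, so it runs without network access.

```python
import urllib.request


def build_text_request(text_url: str, pat: str) -> urllib.request.Request:
    """Build a GET request for a hit's hosted text, authenticated with the PAT."""
    return urllib.request.Request(text_url, headers={"Authorization": f"Key {pat}"})


def fetch_text(text_url: str, pat: str, timeout: float = 10.0) -> str:
    """Download the text behind the URL and decode it as UTF-8."""
    with urllib.request.urlopen(build_text_request(text_url, pat), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without sending it (placeholder URL and token).
request = build_text_request("https://example.com/hosted/text", "your_pat_key")
print(request.get_header("Authorization"))
```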

### `forward(self, query_or_queries: Union[str, List[str]], k: Optional[int] = None, **kwargs) -> dspy.Prediction`

**Parameters:**
- `query_or_queries` (_Union[str, List[str]]_): The query or list of queries to search for.
- `k` (_Optional[int]_, _optional_): The number of results to retrieve. If not specified, defaults to the value set during initialization.
- `**kwargs`: Additional keyword arguments passed to Clarifai's search function.

**Returns:**
- `dspy.Prediction`: Contains the retrieved passages, each represented as a `dotdict` with a `long_text` attribute.

Searches the Clarifai application for the top `k` passages matching the given query or queries. It uses a `ThreadPoolExecutor` to fetch the text of multiple hits in parallel.
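
The parallel retrieval pattern can be sketched as follows. This is an illustration under stated assumptions rather than the module's implementation: `retrieve_texts`, its `search` and `fetch` callables, and the `max_workers` default are hypothetical, and stand-in functions replace the Clarifai search call and the HTTP fetch so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Union


def retrieve_texts(
    query_or_queries: Union[str, List[str]],
    search: Callable[[str], List[str]],
    fetch: Callable[[str], str],
    k: int = 3,
    max_workers: int = 10,
) -> List[str]:
    """Collect up to k passage texts per query, fetching hit contents in parallel."""
    queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries
    queries = [q for q in queries if q]  # drop empty queries

    passages: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for query in queries:
            hit_urls = search(query)[:k]
            # executor.map returns results in the same order as hit_urls.
            passages.extend(executor.map(fetch, hit_urls))
    return passages


# Stand-in search and fetch functions, so the sketch runs without a Clarifai app.
fake_index = {"ml": ["u1", "u2"], "dl": ["u3"]}
texts = retrieve_texts(
    ["ml", "dl"],
    search=lambda q: fake_index.get(q, []),
    fetch=lambda url: f"text of {url}",
    k=2,
)
print(texts)
```

Because `executor.map` preserves input order, the passages come back in ranking order even though the fetches run concurrently.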

## Examples

### Basic Usage
```python
import os

import dspy
from dspy.retrieve.clarifai_rm import ClarifaiRM

os.environ["CLARIFAI_PAT"] = "your_pat_key"

retriever_model = ClarifaiRM(
    clarifai_user_id="your_user_id",
    clarifai_app_id="your_app_id",
    k=5,
)

turbo = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=turbo, rm=retriever_model)

results = retriever_model("Explore the significance of quantum computing")
```

### Multiple Queries
```python
queries = [
    "What is machine learning?",
    "How does deep learning work?",
    "Explain neural networks",
]

results = retriever_model(queries, k=3)
```

### Using with DSPy Retrieve Module
```python
import dspy
from dspy import Retrieve

# Retrieve uses the retriever configured via dspy.settings.configure(rm=...).
retrieve = Retrieve(k=5)

class RAG(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = Retrieve(k=3)

    def forward(self, query):
        passages = self.retrieve(query)
        return passages

rag = RAG()
result = rag("What are the latest developments in AI?")
```

### Handling Results
```python
results = retriever_model("quantum computing advances", k=5)

# forward returns a dspy.Prediction; the retrieved passages are assumed
# to be available under its `passages` field.
passages = results.passages

for i, passage in enumerate(passages, 1):
    print(f"Result {i}:")
    print(passage.long_text)
    print("-" * 50)

first_passage = passages[0].long_text

num_results = len(passages)
```
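
When retrieved passages feed a prompt, it can help to reduce them to plain strings first. The sketch below assumes a list of passage objects that expose a `long_text` attribute, as described for `forward`; `passages_to_context` and its `max_chars` limit are hypothetical names chosen for this illustration, and `SimpleNamespace` objects stand in for real results.

```python
from types import SimpleNamespace
from typing import Iterable, List


def passages_to_context(passages: Iterable, max_chars: int = 500) -> List[str]:
    """Return each passage's `long_text`, stripped and truncated to max_chars."""
    context = []
    for passage in passages:
        text = passage.long_text.strip()
        if text:  # skip empty passages
            context.append(text[:max_chars])
    return context


# Stand-ins for retrieved passages (objects exposing a `long_text` attribute).
sample = [
    SimpleNamespace(long_text="  Qubits can be in superposition.  "),
    SimpleNamespace(long_text=""),
    SimpleNamespace(long_text="Error correction is an active research area."),
]
print(passages_to_context(sample, max_chars=20))
```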

### Integration with Other DSPy Components
```python
import dspy
from dspy import ChainOfThought, Retrieve

# Create a simple QA chain
class QAChain(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = Retrieve(k=3)
        self.generate_answer = ChainOfThought("question, context -> answer")

    def forward(self, question):
        # Pass the list of retrieved passages, not the Prediction object, as context.
        context = self.retrieve(question).passages
        answer = self.generate_answer(question=question, context=context)
        return answer

qa = QAChain()
answer = qa("What are the main applications of quantum computing?")
```

### Error Handling Example
```python
from dspy.retrieve.clarifai_rm import ClarifaiRM

try:
    retriever_model = ClarifaiRM(
        clarifai_user_id="your_user_id",
        clarifai_app_id="your_app_id",
        clarifai_pat="invalid_pat",
    )
    results = retriever_model("test query")
except Exception as e:
    # Any failure during setup or search (for example, a rejected PAT) is reported here.
    print(f"Error occurred: {e}")
```

**Note:**

These examples assume you have:

- A properly configured Clarifai application
- Valid authentication credentials
- Documents already ingested into your Clarifai app
- The necessary environment variables set up

docs/mkdocs.yml

Lines changed: 2 additions & 0 deletions
@@ -66,6 +66,7 @@ nav:
  - Retrieval Model Clients:
  - Azure: deep-dive/retrieval_models_clients/Azure.md
  - ChromadbRM: deep-dive/retrieval_models_clients/ChromadbRM.md
+ - ClarifaiRM: deep-dive/retrieval_models_clients/ClarifaiRM.md
  - ColBERTv2: deep-dive/retrieval_models_clients/ColBERTv2.md
  - Custom RM Client: deep-dive/retrieval_models_clients/custom-rm-client.md
  - DatabricksRM: deep-dive/retrieval_models_clients/DatabricksRM.md
@@ -189,6 +190,7 @@ plugins:
  # Retrieval Model Clients
  'docs/deep-dive/retrieval_models_clients/Azure.md': 'deep-dive/retrieval_models_clients/Azure.md'
  'docs/deep-dive/retrieval_models_clients/ChromadbRM.md': 'deep-dive/retrieval_models_clients/ChromadbRM.md'
+ 'docs/deep-dive/retrieval_models_clients/ClarifaiRM.md': 'deep-dive/retrieval_models_clients/ClarifaiRM.md'
  'docs/deep-dive/retrieval_models_clients/ColBERTv2.md': 'deep-dive/retrieval_models_clients/ColBERTv2.md'
  'docs/deep-dive/retrieval_models_clients/custom-rm-client.md': 'deep-dive/retrieval_models_clients/custom-rm-client.md'
  'docs/deep-dive/retrieval_models_clients/DatabricksRM.md': 'deep-dive/retrieval_models_clients/DatabricksRM.md'
