Commit 3fa1aba

Updated Cheatsheet (#8499)
* Removed deprecated content from the cheatsheet.
* Added a SIMBA example to cheatsheet.md.
1 parent c0506d1 · commit 3fa1aba

File tree

1 file changed: +18 −147 lines


docs/docs/cheatsheet.md

Lines changed: 18 additions & 147 deletions
@@ -2,127 +2,10 @@
sidebar_position: 999
---

-
-!!! warning "This page is outdated and may not be fully accurate in DSPy 2.5 and 2.6"
-
-
# DSPy Cheatsheet

This page will contain snippets for frequent usage patterns.

-## DSPy DataLoaders
-
-Importing and initializing a `DataLoader` object:
-
-```python
-import dspy
-from dspy.datasets import DataLoader
-
-dl = DataLoader()
-```
-
-### Loading from HuggingFace Datasets
-
-```python
-code_alpaca = dl.from_huggingface("HuggingFaceH4/CodeAlpaca_20K")
-```
-
-You can access each split's dataset by indexing with the corresponding split key:
-
-```python
-train_dataset = code_alpaca['train']
-test_dataset = code_alpaca['test']
-```
-
-### Loading specific splits from HuggingFace
-
-You can also manually specify the splits you want to include as a parameter, and the loader will return a dictionary whose keys are the splits you specified:
-
-```python
-code_alpaca = dl.from_huggingface(
-    "HuggingFaceH4/CodeAlpaca_20K",
-    split = ["train", "test"],
-)
-
-print(f"Splits in dataset: {code_alpaca.keys()}")
-```
-
-If you specify a single split, the dataloader will return a list of `dspy.Example` instead of a dictionary:
-
-```python
-code_alpaca = dl.from_huggingface(
-    "HuggingFaceH4/CodeAlpaca_20K",
-    split = "train",
-)
-
-print(f"Number of examples in split: {len(code_alpaca)}")
-```
-
-You can also slice the split just like you do with a HuggingFace `Dataset`:
-
-```python
-code_alpaca_80 = dl.from_huggingface(
-    "HuggingFaceH4/CodeAlpaca_20K",
-    split = "train[:80%]",
-)
-
-print(f"Number of examples in split: {len(code_alpaca_80)}")
-
-code_alpaca_20_80 = dl.from_huggingface(
-    "HuggingFaceH4/CodeAlpaca_20K",
-    split = "train[20%:80%]",
-)
-
-print(f"Number of examples in split: {len(code_alpaca_20_80)}")
-```
-
-### Loading specific subset from HuggingFace
-
-If a dataset has a subset, you can pass it as an argument just like you do with `load_dataset` in HuggingFace:
-
-```python
-gms8k = dl.from_huggingface(
-    "gsm8k",
-    "main",
-    input_keys = ("question",),
-)
-
-print(f"Keys present in the returned dict: {list(gms8k.keys())}")
-
-print(f"Number of examples in train set: {len(gms8k['train'])}")
-print(f"Number of examples in test set: {len(gms8k['test'])}")
-```
-
-### Loading from CSV
-
-```python
-dolly_100_dataset = dl.from_csv("dolly_subset_100_rows.csv")
-```
-
-You can select only specific columns from the CSV by specifying them in the arguments:
-
-```python
-dolly_100_dataset = dl.from_csv(
-    "dolly_subset_100_rows.csv",
-    fields=("instruction", "context", "response"),
-    input_keys=("instruction", "context")
-)
-```
-
-### Splitting a List of `dspy.Example`
-
-```python
-splits = dl.train_test_split(dataset, train_size=0.8)  # `dataset` is a List of dspy.Example
-train_dataset = splits['train']
-test_dataset = splits['test']
-```
-
-### Sampling from List of `dspy.Example`
-
-```python
-sampled_example = dl.sample(dataset, n=5)  # `dataset` is a List of dspy.Example
-```
-
## DSPy Programs

### dspy.Signature
@@ -131,8 +14,8 @@ sampled_example = dl.sample(dataset, n=5) # `dataset` is a List of dspy.Example
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

-    question = dspy.InputField()
-    answer = dspy.OutputField(desc="often between 1 and 5 words")
+    question: str = dspy.InputField()
+    answer: str = dspy.OutputField(desc="often between 1 and 5 words")
```

### dspy.ChainOfThought
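
For context on the typed-field change above, here is a minimal sketch (not part of the diff) of how the updated `BasicQA` signature could be driven with `dspy.Predict`; the model name and question are placeholder assumptions, not taken from the commit.

```python
import dspy

# Assumption: any LM supported by dspy.LM works here; the model name is a placeholder.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="often between 1 and 5 words")

# dspy.Predict builds a predictor from the signature; with the typed output
# field, `prediction.answer` comes back as a plain str.
qa = dspy.Predict(BasicQA)
prediction = qa(question="What is the capital of France?")
print(prediction.answer)
```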
@@ -225,13 +108,13 @@ class FactJudge(dspy.Signature):
    context = dspy.InputField(desc="Context for the prediction")
    question = dspy.InputField(desc="Question to be answered")
    answer = dspy.InputField(desc="Answer for the question")
-    factually_correct = dspy.OutputField(desc="Is the answer factually correct based on the context?", prefix="Factual[Yes/No]:")
+    factually_correct: bool = dspy.OutputField(desc="Is the answer factually correct based on the context?")

judge = dspy.ChainOfThought(FactJudge)

def factuality_metric(example, pred):
    factual = judge(context=example.context, question=example.question, answer=pred.answer)
-    return int(factual=="Yes")
+    return factual.factually_correct
```

## DSPy Evaluation
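
Because the judge now returns a typed boolean, the metric can be passed straight to DSPy's evaluator. The following is a hedged sketch only: `devset`, `your_dspy_program`, and the thread count are placeholders, and it assumes `Evaluate` invokes the metric with `(example, prediction)` as in the cheatsheet's evaluation section.

```python
from dspy.evaluate import Evaluate

# Placeholders: `devset` is a list of dspy.Example with `context` and `question`
# inputs plus a gold `answer`; `your_dspy_program` is the program under test.
evaluator = Evaluate(
    devset=devset,
    metric=factuality_metric,
    num_threads=8,
    display_progress=True,
)

# The returned score aggregates the boolean judgments from factuality_metric.
score = evaluator(your_dspy_program)
print(score)
```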
@@ -367,18 +250,6 @@ copro_teleprompter = COPRO(prompt_model=model_to_generate_prompts, metric=your_d
compiled_program_optimized_signature = copro_teleprompter.compile(your_dspy_program, trainset=trainset, eval_kwargs=eval_kwargs)
```

-### MIPRO
-
-```python
-from dspy.teleprompt import MIPRO
-
-teleprompter = MIPRO(prompt_model=model_to_generate_prompts, task_model=model_that_solves_task, metric=your_defined_metric, num_candidates=num_new_prompts_generated, init_temperature=prompt_generation_temperature)
-
-kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)
-
-compiled_program_optimized_bayesian_signature = teleprompter.compile(your_dspy_program, trainset=trainset, num_trials=100, max_bootstrapped_demos=3, max_labeled_demos=5, eval_kwargs=kwargs)
-```
-
### MIPROv2

Note: detailed documentation can be found [here](api/optimizers/MIPROv2.md). `MIPROv2` is the latest extension of `MIPRO` which includes updates such as (1) improvements to instruction proposal and (2) more efficient search with minibatching.
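
As a rough, hedged companion to the note above (the cheatsheet's full `MIPROv2` snippet continues below this hunk), a light-budget run might look like the sketch below; the metric, program, and trainset are the same placeholders used elsewhere in the cheatsheet.

```python
from dspy.teleprompt import MIPROv2

# Placeholders: your_defined_metric, your_dspy_program, and trainset are
# assumed to be defined as in the surrounding cheatsheet examples.
teleprompter = MIPROv2(metric=your_defined_metric, auto="light")

optimized_program = teleprompter.compile(
    your_dspy_program,
    trainset=trainset,
    max_bootstrapped_demos=3,
    max_labeled_demos=4,
)
```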
@@ -445,20 +316,6 @@ print(f"Evaluate optimized program...")
evaluate(optimized_program, devset=devset[:])
```

-### Signature Optimizer with Types
-
-```python
-from dspy.teleprompt.signature_opt_typed import optimize_signature
-from dspy.evaluate.metrics import answer_exact_match
-from dspy.functional import TypedChainOfThought
-
-compiled_program = optimize_signature(
-    student=TypedChainOfThought("question -> answer"),
-    evaluator=Evaluate(devset=devset, metric=answer_exact_match, num_threads=10, display_progress=True),
-    n_iterations=50,
-).program
-```
-
### KNNFewShot

```python
@@ -484,6 +341,20 @@ your_dspy_program_compiled = fewshot_optuna_optimizer.compile(student=your_dspy_

Other custom configurations are similar to customizing the `dspy.BootstrapFewShot` optimizer.

+
+### SIMBA
+
+SIMBA, which stands for Stochastic Introspective Mini-Batch Ascent, is a prompt optimizer that accepts arbitrary DSPy programs and proceeds in a sequence of mini-batches, seeking to make incremental improvements to the prompt instructions or few-shot examples.
+
+```python
+from dspy.teleprompt import SIMBA
+
+simba = SIMBA(metric=your_defined_metric, max_steps=12, max_demos=10)
+
+optimized_program = simba.compile(student=your_dspy_program, trainset=trainset)
+```
+
+
## DSPy `Refine` and `BestofN`

>`dspy.Suggest` and `dspy.Assert` are replaced by `dspy.Refine` and `dspy.BestofN` in DSPy 2.6.
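
As a follow-up to the new SIMBA snippet, here is a small hedged sketch of persisting and reloading the optimized program; the file name is arbitrary and `YourProgramClass` is a hypothetical stand-in for however `your_dspy_program` is constructed.

```python
# Save the SIMBA-optimized program's state (the path is a placeholder).
optimized_program.save("simba_optimized_program.json")

# Later: rebuild the same module and load the optimized state back in.
reloaded_program = YourProgramClass()  # hypothetical constructor for your program
reloaded_program.load("simba_optimized_program.json")
```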
