---
sidebar_position: 999
---

!!! warning "This page is outdated and may not be fully accurate in DSPy 2.5 and 2.6"

# DSPy Cheatsheet
This page contains snippets for frequent usage patterns.

## DSPy DataLoaders

Import and initialize a `DataLoader` object:

```python
import dspy
from dspy.datasets import DataLoader

dl = DataLoader()
```

### Loading from HuggingFace Datasets

```python
code_alpaca = dl.from_huggingface("HuggingFaceH4/CodeAlpaca_20K")
```

You can access each split of the dataset by indexing with the corresponding key:

```python
train_dataset = code_alpaca['train']
test_dataset = code_alpaca['test']
```

### Loading specific splits from HuggingFace

You can also specify the splits you want to include as a parameter, and the loader will return a dictionary whose keys are the splits you specified:

```python
code_alpaca = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split=["train", "test"],
)

print(f"Splits in dataset: {code_alpaca.keys()}")
```

If you specify a single split, the loader will return a `List[dspy.Example]` instead of a dictionary:

```python
code_alpaca = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train",
)

print(f"Number of examples in split: {len(code_alpaca)}")
```

You can also slice a split, just as you would with a HuggingFace `Dataset`:

```python
code_alpaca_80 = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train[:80%]",
)

print(f"Number of examples in split: {len(code_alpaca_80)}")

code_alpaca_20_80 = dl.from_huggingface(
    "HuggingFaceH4/CodeAlpaca_20K",
    split="train[20%:80%]",
)

print(f"Number of examples in split: {len(code_alpaca_20_80)}")
```

### Loading a specific subset from HuggingFace

If a dataset has a subset, you can pass it as an argument, just as you would with `load_dataset` in HuggingFace:

```python
gms8k = dl.from_huggingface(
    "gsm8k",
    "main",
    input_keys=("question",),
)

print(f"Keys present in the returned dict: {list(gms8k.keys())}")

print(f"Number of examples in train set: {len(gms8k['train'])}")
print(f"Number of examples in test set: {len(gms8k['test'])}")
```

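`input_keys` marks which fields of each returned `dspy.Example` count as inputs; the remaining fields are treated as labels. A quick check (the index is illustrative):

```python
example = gms8k['train'][0]

print(example.inputs())  # view containing only the `question` field
print(example.labels())  # the remaining fields, e.g. `answer`
```
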
### Loading from CSV

```python
dolly_100_dataset = dl.from_csv("dolly_subset_100_rows.csv")
```

You can select specific columns from the CSV by specifying them in the arguments:

```python
dolly_100_dataset = dl.from_csv(
    "dolly_subset_100_rows.csv",
    fields=("instruction", "context", "response"),
    input_keys=("instruction", "context")
)
```

### Splitting a List of `dspy.Example`

```python
splits = dl.train_test_split(dataset, train_size=0.8)  # `dataset` is a List of dspy.Example
train_dataset = splits['train']
test_dataset = splits['test']
```

### Sampling from a List of `dspy.Example`

```python
sampled_example = dl.sample(dataset, n=5)  # `dataset` is a List of dspy.Example
```

## DSPy Programs
### dspy.Signature
```python
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="often between 1 and 5 words")
```

### dspy.ChainOfThought
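A minimal usage sketch, reusing the `BasicQA` signature defined above (the question text is illustrative):

```python
generate_answer = dspy.ChainOfThought(BasicQA)

# The module produces intermediate reasoning before filling the `answer` field.
response = generate_answer(question="What is the capital of France?")
print(response.answer)
```
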
## DSPy Metrics

### LLM as Judge

```python
class FactJudge(dspy.Signature):
    """Judge if the answer is factually correct based on the context."""

    context = dspy.InputField(desc="Context for the prediction")
    question = dspy.InputField(desc="Question to be answered")
    answer = dspy.InputField(desc="Answer for the question")
    factually_correct: bool = dspy.OutputField(desc="Is the answer factually correct based on the context?")

judge = dspy.ChainOfThought(FactJudge)

def factuality_metric(example, pred):
    factual = judge(context=example.context, question=example.question, answer=pred.answer)
    return factual.factually_correct
```

## DSPy Evaluation
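A minimal sketch of the `Evaluate` utility, using the same placeholders (`your_dspy_program`, `your_defined_metric`, `devset`) as the rest of this page:

```python
from dspy.evaluate import Evaluate

# Set up the evaluator; `display_table` controls how many result rows are rendered.
evaluate_program = Evaluate(devset=devset, metric=your_defined_metric, num_threads=8, display_progress=True, display_table=5)

evaluate_program(your_dspy_program)
```
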
## DSPy Optimizers

### COPRO

```python
from dspy.teleprompt import COPRO

eval_kwargs = dict(num_threads=16, display_progress=True, display_table=0)

copro_teleprompter = COPRO(prompt_model=model_to_generate_prompts, metric=your_defined_metric, breadth=num_new_prompts_generated, depth=times_to_generate_prompts, init_temperature=prompt_generation_temperature, verbose=False)

compiled_program_optimized_signature = copro_teleprompter.compile(your_dspy_program, trainset=trainset, eval_kwargs=eval_kwargs)
```

### MIPRO

```python
from dspy.teleprompt import MIPRO

teleprompter = MIPRO(prompt_model=model_to_generate_prompts, task_model=model_that_solves_task, metric=your_defined_metric, num_candidates=num_new_prompts_generated, init_temperature=prompt_generation_temperature)

kwargs = dict(num_threads=NUM_THREADS, display_progress=True, display_table=0)

compiled_program_optimized_bayesian_signature = teleprompter.compile(your_dspy_program, trainset=trainset, num_trials=100, max_bootstrapped_demos=3, max_labeled_demos=5, eval_kwargs=kwargs)
```

### MIPROv2

Note: detailed documentation can be found [here](api/optimizers/MIPROv2.md). `MIPROv2` is the latest extension of `MIPRO`, which includes updates such as (1) improvements to instruction proposal and (2) more efficient search with minibatching.

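A minimal sketch of the usual flow; the `auto="light"` preset and the demo counts are assumptions, while the other placeholders match the rest of this page:

```python
from dspy.teleprompt import MIPROv2

# Initialize the optimizer; `auto` selects a preset search intensity.
teleprompter = MIPROv2(metric=your_defined_metric, auto="light")

optimized_program = teleprompter.compile(your_dspy_program, trainset=trainset, max_bootstrapped_demos=3, max_labeled_demos=4)

print("Evaluate optimized program...")
evaluate_program(optimized_program, devset=devset[:])
```
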
### Signature Optimizer with Types

```python
from dspy.teleprompt.signature_opt_typed import optimize_signature
from dspy.evaluate.metrics import answer_exact_match
from dspy.functional import TypedChainOfThought

compiled_program = optimize_signature(
    student=TypedChainOfThought("question -> answer"),
    evaluator=Evaluate(devset=devset, metric=answer_exact_match, num_threads=10, display_progress=True),
    n_iterations=50,
).program
```

### KNNFewShot
```python
from sentence_transformers import SentenceTransformer
from dspy import Embedder
from dspy.teleprompt import KNNFewShot

# The vectorizer can be any callable that embeds a batch of strings;
# a sentence-transformers model is assumed here.
knn_optimizer = KNNFewShot(k=3, trainset=trainset, vectorizer=Embedder(SentenceTransformer("all-MiniLM-L6-v2").encode))

your_dspy_program_compiled = knn_optimizer.compile(student=your_dspy_program)
```

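### BootstrapFewShotWithOptuna

A sketch of bootstrapping few-shot demos with Optuna-driven candidate selection; the constructor arguments are assumed to mirror `dspy.BootstrapFewShot`:

```python
from dspy.teleprompt import BootstrapFewShotWithOptuna

fewshot_optuna_optimizer = BootstrapFewShotWithOptuna(metric=your_defined_metric, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=NUM_THREADS)

your_dspy_program_compiled = fewshot_optuna_optimizer.compile(student=your_dspy_program, trainset=trainset, valset=devset)
```
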
Other custom configurations are similar to customizing the `dspy.BootstrapFewShot` optimizer.

### SIMBA

SIMBA, which stands for Stochastic Introspective Mini-Batch Ascent, is a prompt optimizer that accepts arbitrary DSPy programs and proceeds in a sequence of mini-batches, seeking to make incremental improvements to the prompt instructions or few-shot examples.

```python
from dspy.teleprompt import SIMBA

simba = SIMBA(metric=your_defined_metric, max_steps=12, max_demos=10)

optimized_program = simba.compile(student=your_dspy_program, trainset=trainset)
```

## DSPy `Refine` and `BestofN`

> `dspy.Suggest` and `dspy.Assert` are replaced by `dspy.Refine` and `dspy.BestofN` in DSPy 2.6.

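A minimal sketch of the replacement modules; the reward function and threshold here are illustrative:

```python
import dspy

qa = dspy.ChainOfThought("question -> answer")

# Reward: 1.0 when the answer is a single word, else 0.0.
def one_word_answer(args, pred):
    return 1.0 if len(pred.answer.split()) == 1 else 0.0

# BestOfN samples the module up to N times and keeps the best-scoring prediction.
best_of_3 = dspy.BestOfN(module=qa, N=3, reward_fn=one_word_answer, threshold=1.0)

# Refine additionally feeds feedback from failed attempts into the next attempt.
refine = dspy.Refine(module=qa, N=3, reward_fn=one_word_answer, threshold=1.0)

print(best_of_3(question="What is the capital of Belgium?").answer)
```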