Commit cbabf82

Merge pull request #48 from codelion/codelion-patch-1
Update README.md
2 parents 0ae37c5 + 31c8f06 commit cbabf82

1 file changed: +47 -55 lines changed

README.md

Lines changed: 47 additions & 55 deletions
Original file line number | Diff line number | Diff line change
@@ -176,13 +176,54 @@ The learning process includes:
176176
- **Strategic Enhancement**: Develop robustness against manipulation
177177
- **Production Deployment**: Full capability with ongoing adaptation
178178

179-
## Requirements
179+
## Order Dependency in Online Learning
180+
181+
When using the adaptive classifier for true online learning (adding examples incrementally), be aware that the order in which examples are added can affect predictions. This is inherent to incremental neural network training.
182+
183+
### The Challenge
184+
185+
```python
186+
# These two scenarios may produce slightly different models:
187+
188+
# Scenario 1
189+
classifier.add_examples(["fish example"], ["aquatic"])
190+
classifier.add_examples(["bird example"], ["aerial"])
191+
192+
# Scenario 2
193+
classifier.add_examples(["bird example"], ["aerial"])
194+
classifier.add_examples(["fish example"], ["aquatic"])
195+
```
196+
197+
While we've implemented sorted label ID assignment to minimize this effect, the neural network component still learns incrementally, which can lead to order-dependent behavior.
198+
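As a rough illustration of why incremental training is order-sensitive, the sketch below updates a one-weight logistic model one example at a time. It does not use the adaptive classifier's internals: the `sgd_pass` helper, the learning rate, and the toy feature/label pairs are all assumptions made for this example.

```python
import math

def sgd_pass(examples, lr=0.5):
    """Run one online pass of logistic-regression SGD over (x, y) pairs."""
    w = 0.0
    for x, y in examples:
        pred = 1.0 / (1.0 + math.exp(-w * x))  # sigmoid(w * x)
        w += lr * (y - pred) * x               # step along the log-likelihood gradient
    return w

fish = (1.0, 1)  # hypothetical feature/label pair
bird = (2.0, 0)  # hypothetical feature/label pair

w_fish_first = sgd_pass([fish, bird])
w_bird_first = sgd_pass([bird, fish])

# Same data, different order: the final weights differ.
print(w_fish_first, w_bird_first)
```

Each update depends on the weight left behind by the previous one, so swapping the two examples changes the result even though the data is identical.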
199+
### Solution: Prototype-Only Predictions
200+
201+
For applications requiring strict order independence, you can configure the classifier to rely solely on prototype-based predictions:
202+
203+
```python
204+
# Configure to use only prototypes (order-independent)
205+
config = {
206+
'prototype_weight': 1.0, # Use only prototypes
207+
'neural_weight': 0.0 # Disable neural network contribution
208+
}
209+
210+
classifier = AdaptiveClassifier("bert-base-uncased", config=config)
211+
```
212+
213+
With this configuration:
214+
- Predictions are based solely on similarity to class prototypes (mean embeddings)
215+
- Results are completely order-independent
216+
- Trade-off: May have slightly lower accuracy than the hybrid approach
217+
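The order independence follows from the prototype being a mean: the average of a set of embeddings does not depend on the order in which they were added, up to floating-point rounding. A minimal sketch in plain Python, assuming made-up 3-dimensional embeddings and hypothetical helpers `class_prototype` and `cosine` (the library may store and compare prototypes differently):

```python
import math
import random

def class_prototype(embeddings):
    """Mean of a list of equal-length embedding vectors."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy "embeddings" for one class (illustrative values only).
aquatic = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.2]]

shuffled = aquatic[:]
random.Random(0).shuffle(shuffled)

proto_a = class_prototype(aquatic)
proto_b = class_prototype(shuffled)

# The two prototypes agree up to floating-point rounding.
same = all(
    math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12)
    for x, y in zip(proto_a, proto_b)
)
similarity = cosine(proto_a, [1.0, 0.0, 0.0])
print(same, similarity)
```

Because a prediction in this mode only compares a query embedding against such prototypes, any permutation of the training examples leads to the same decision.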
218+
### Best Practices
180219

181-
- Python ≥ 3.8
182-
- PyTorch ≥ 2.0
183-
- transformers ≥ 4.30.0
184-
- safetensors ≥ 0.3.1
185-
- faiss-cpu ≥ 1.7.4 (or faiss-gpu for GPU support)
220+
1. **For maximum consistency**: Use prototype-only configuration
221+
2. **For maximum accuracy**: Accept some order dependency with the default hybrid approach
222+
3. **For production systems**: Consider batching updates and retraining periodically if strict consistency is required
223+
4. **Model selection matters**: Some models (e.g., `google-bert/bert-large-cased`) may produce poor embeddings for single words. For better results with short inputs, consider:
224+
- `bert-base-uncased`
225+
- `sentence-transformers/all-MiniLM-L6-v2`
226+
- Or any model specifically trained for semantic similarity
186227

187228
## Adaptive Classification with LLMs
188229

@@ -388,55 +429,6 @@ This real-world evaluation demonstrates that adaptive classification can signifi
388429
- [RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models](https://arxiv.org/abs/2401.00396)
389430
- [LettuceDetect: A Hallucination Detection Framework for RAG Applications](https://arxiv.org/abs/2502.17125)
390431

391-
## Order Dependency in Online Learning
392-
393-
When using the adaptive classifier for true online learning (adding examples incrementally), be aware that the order in which examples are added can affect predictions. This is inherent to incremental neural network training.
394-
395-
### The Challenge
396-
397-
```python
398-
# These two scenarios may produce slightly different models:
399-
400-
# Scenario 1
401-
classifier.add_examples(["fish example"], ["aquatic"])
402-
classifier.add_examples(["bird example"], ["aerial"])
403-
404-
# Scenario 2
405-
classifier.add_examples(["bird example"], ["aerial"])
406-
classifier.add_examples(["fish example"], ["aquatic"])
407-
```
408-
409-
While we've implemented sorted label ID assignment to minimize this effect, the neural network component still learns incrementally, which can lead to order-dependent behavior.
410-
411-
### Solution: Prototype-Only Predictions
412-
413-
For applications requiring strict order independence, you can configure the classifier to rely solely on prototype-based predictions:
414-
415-
```python
416-
# Configure to use only prototypes (order-independent)
417-
config = {
418-
'prototype_weight': 1.0, # Use only prototypes
419-
'neural_weight': 0.0 # Disable neural network contribution
420-
}
421-
422-
classifier = AdaptiveClassifier("bert-base-uncased", config=config)
423-
```
424-
425-
With this configuration:
426-
- Predictions are based solely on similarity to class prototypes (mean embeddings)
427-
- Results are completely order-independent
428-
- Trade-off: May have slightly lower accuracy than the hybrid approach
429-
430-
### Best Practices
431-
432-
1. **For maximum consistency**: Use prototype-only configuration
433-
2. **For maximum accuracy**: Accept some order dependency with the default hybrid approach
434-
3. **For production systems**: Consider batching updates and retraining periodically if strict consistency is required
435-
4. **Model selection matters**: Some models (e.g., `google-bert/bert-large-cased`) may produce poor embeddings for single words. For better results with short inputs, consider:
436-
- `bert-base-uncased`
437-
- `sentence-transformers/all-MiniLM-L6-v2`
438-
- Or any model specifically trained for semantic similarity
439-
440432
## Citation
441433

442434
If you use this library in your research, please cite:
