Issue Description
The ScientificHypothesisEvaluator in information_density.py is producing plateaued information density scores that don't scale properly with increasing complexity levels. This undermines the core percolation point detection functionality.
Problem Details
- Expected Behavior: Information density should show clear variation across complexity levels (1-10), potentially increasing at first and then declining after the percolation point
- Actual Behavior: Information density scores plateau within a narrow range (25%-45%) regardless of complexity level
- Test Case: Self-attention mechanism hypotheses across complexity levels 1-10 show minimal density variation
- Affects: issue #1 (Complexity Score plateauing at high complexity slider values causing visualization problems)
Root Cause Analysis
After reviewing information_density.py, several algorithmic issues cause the plateauing:
1. Restrictive Score Capping
```python
# Multiple locations cap scores at 1.0
return min(specificity, 1), metrics
return min(falsifiability, 1.0), metrics
return min(grounding, 1.0), metrics
```
This prevents scores from scaling beyond 100%, limiting differentiation.
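As a minimal illustration of the capping effect (the raw score values below are invented for the example, not taken from the evaluator):

```python
# Once raw scores exceed the cap, min() erases all differentiation
# between them, which is exactly the plateau described above.
raw_scores = [0.8, 1.4, 2.9, 5.0]           # hypothetical raw specificity values
capped = [min(s, 1.0) for s in raw_scores]  # mirrors the capping pattern above
print(capped)  # the last three hypotheses become indistinguishable
```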
2. Word Count Normalization Flattening
```python
# Example from evaluate_specificity()
specificity = (
    (metrics['quantitative_terms'] / word_count) * 0.4 +
    (metrics['measurement_references'] / word_count) * 0.4 +
    (metrics['domain_terms'] / word_count) * 0.2
) * 10
```
Dividing by word count flattens differences as hypothesis length increases with complexity.
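A small sketch of the flattening (the term counts are hypothetical; the function simply reproduces the formula above in isolation):

```python
# A longer, more complex hypothesis has more quantitative terms, but
# its word count grows roughly in proportion, so the per-word ratios
# (and hence the score) barely move.
def specificity(quant, meas, domain, word_count):
    return ((quant / word_count) * 0.4 +
            (meas / word_count) * 0.4 +
            (domain / word_count) * 0.2) * 10

low = specificity(quant=2, meas=1, domain=1, word_count=20)    # complexity ~1
high = specificity(quant=8, meas=4, domain=4, word_count=80)   # complexity ~10
print(low, high)  # length growth cancels out the extra terms
```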
3. Static Domain Knowledge
```python
def load_domain_knowledge(self, literature_texts: List[str]):
    # Domain terms are loaded once and remain static
    # No scaling mechanism for different complexity levels
    ...
```
Domain terms don't adapt to complexity requirements.
4. Conservative Weighting
```python
overall_quality = (
    specificity * 0.2 +
    falsifiability * 0.25 +
    conceptual_density * 0.2 +
    empirical_grounding * 0.2 +
    predictive_content * 0.15
)
```
Fixed linear combination may not capture non-linear complexity relationships.
5. Limited Scaling Multipliers
Many calculations use small multipliers (×5, ×10) that don't provide sufficient range for differentiation.
Impact on System
- Percolation Point Detection: Cannot detect the critical threshold where complexity overwhelms comprehensibility
- User Experience: Charts show flat lines instead of meaningful complexity-density relationships
- Scientific Validity: Fails to model the theoretical percolation phenomenon
- Visualization: Tightly packed density values (25-45%) provide poor visual feedback
Reproduction Steps
- Generate hypotheses for same topic (e.g., "self-attention mechanisms") at complexity levels 1, 3, 5, 7, 10
- Observe information density scores remain within 25-45% range
- Expected: Should see variation, potentially peaking around complexity 5-7, then declining
- Actual: Minimal variation across all complexity levels
Proposed Solutions
Solution 1: Dynamic Score Scaling
- Remove hard caps on individual metrics
- Implement adaptive scaling based on complexity level
- Use logarithmic or exponential scaling for high complexity scenarios
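One possible sketch of such scaling (soft_cap is a hypothetical helper, not existing code; the tanh/log1p combination is illustrative, assuming complexity_level runs 1-10):

```python
import math

# A soft cap keeps the ordering information that min(score, 1.0)
# discards, and the divisor grows with complexity so high-complexity
# hypotheses are graded on a wider scale.
def soft_cap(raw_score, complexity_level):
    scale = 1.0 + math.log1p(complexity_level)  # adaptive scale
    return math.tanh(raw_score / scale)         # monotone, bounded in [0, 1)

# Unlike min(), distinct raw scores stay distinct after squashing.
a = soft_cap(1.4, complexity_level=5)
b = soft_cap(2.9, complexity_level=5)
```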
Solution 2: Complexity-Aware Evaluation
- Modify algorithms to expect different patterns at different complexity levels
- Implement complexity-specific weighting schemes
- Add complexity-penalty factors for over-complex hypotheses
Solution 3: Enhanced Domain Knowledge Modeling
- Implement dynamic domain term weighting based on complexity
- Add complexity-specific vocabulary expectations
- Scale domain knowledge requirements with complexity level
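One hypothetical scheme for dynamic term weighting (the rarity-based weights and the complexity-dependent exponent are illustrative assumptions, not existing code):

```python
from collections import Counter

# Weight domain terms by rarity so that complexity-appropriate
# vocabulary counts for more, instead of using a static, unweighted
# term set. The exponent grows with complexity, sharpening the
# preference for specialised vocabulary at higher levels.
def domain_term_weights(literature_texts, complexity_level):
    counts = Counter(w.lower() for t in literature_texts for w in t.split())
    total = sum(counts.values())
    alpha = 0.5 + complexity_level / 10.0
    return {w: (total / c) ** alpha for w, c in counts.items()}
```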
Solution 4: Non-Linear Combination Functions
- Replace linear weighted averages with non-linear functions
- Implement complexity-dependent interaction terms
- Add diminishing returns for excessive complexity
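A hedged sketch of such a combination (the Gaussian-style penalty and the peak parameter are illustrative assumptions, not the proposed final form; the weights mirror the current linear combination):

```python
import math

# Keep the weighted average but multiply by a complexity penalty so
# density rises toward a peak and declines past the percolation point.
def combined_density(metric_scores, weights, complexity_level, peak=6.0):
    base = sum(s * w for s, w in zip(metric_scores, weights))
    # Penalty is 1.0 at the peak and falls off on either side,
    # giving diminishing returns for excessive complexity.
    penalty = math.exp(-((complexity_level - peak) ** 2) / (2 * 3.0 ** 2))
    return base * penalty

scores = [0.7, 0.6, 0.8, 0.5, 0.6]          # hypothetical metric scores
weights = [0.2, 0.25, 0.2, 0.2, 0.15]       # same weights as overall_quality
densities = [combined_density(scores, weights, c) for c in range(1, 11)]
# Density now peaks near complexity 6 and declines afterwards.
```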
Technical Priority
High - This is a core algorithmic issue that affects the primary functionality of percolation point detection and scientific hypothesis evaluation.
Additional Context
The information density calculation is central to the percolation theory implementation. Without proper scaling across complexity levels, the system cannot model the fundamental hypothesis that information density peaks at moderate complexity and declines beyond the percolation threshold.
Files Affected
- information_density.py - Primary evaluation algorithms
- Frontend chart visualization - Depends on meaningful density variations