Smart Condensing

Smart Condensing is how Folder2MD4LLMs handles codebases that exceed a token limit. Instead of truncating the output, it uses AST analysis and priority-based compression to keep the most important code within the limit you specify.

How Smart Condensing Works

1. Priority Analysis

The system automatically categorizes files by importance:

CRITICAL: Core configuration, main entry points
HIGH: Primary source files, important modules
MEDIUM: Supporting files, utilities
LOW: Tests, documentation, examples

2. Token Budget Allocation

The token budget is split across the priority levels according to the selected strategy:

Conservative: 50% Critical, 30% High, 15% Medium, 5% Low
Balanced:     40% Critical, 35% High, 20% Medium, 5% Low
Aggressive:   30% Critical, 40% High, 25% Medium, 5% Low
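
To make the percentages concrete, here is a minimal sketch of how a token limit could be divided among the four priority levels. The names `BUDGET_SPLITS` and `allocate_budget` are hypothetical and are not part of Folder2MD4LLMs. Only the percentages come from the table above, and giving the rounding remainder to the critical tier is an assumption.

```python
# Illustrative only: percentages mirror the table above; the names are hypothetical.
BUDGET_SPLITS = {
    "conservative": {"critical": 50, "high": 30, "medium": 15, "low": 5},
    "balanced": {"critical": 40, "high": 35, "medium": 20, "low": 5},
    "aggressive": {"critical": 30, "high": 40, "medium": 25, "low": 5},
}


def allocate_budget(token_limit: int, strategy: str = "balanced") -> dict[str, int]:
    """Split token_limit across priority levels using integer percentages."""
    split = BUDGET_SPLITS[strategy]
    budget = {level: token_limit * pct // 100 for level, pct in split.items()}
    # Assumption: any rounding remainder goes to the critical tier,
    # so the parts always sum to the limit.
    budget["critical"] += token_limit - sum(budget.values())
    return budget


print(allocate_budget(80000, "balanced"))
# {'critical': 32000, 'high': 28000, 'medium': 16000, 'low': 4000}
```

With an 80,000-token limit, the balanced split gives critical files 32,000 tokens and low-priority files 4,000.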

3. Progressive Condensing

Each file is condensed at one of five levels, from lightest to heaviest:

None: Full content preserved
Light: Remove comments and docstrings
Moderate: Simplify function bodies
Heavy: Keep signatures and key logic
Maximum: Keep only class/function signatures
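
One way to read "progressive" is that a file is tried at each level in turn until its output fits the budget. The sketch below shows that escalation loop under stated assumptions: `condense_to_fit`, the whitespace-based `count_tokens`, and the toy condensers are all hypothetical stand-ins, not the tool's actual implementation.

```python
from typing import Callable

# Level names in escalation order, as listed above.
LEVELS = ["none", "light", "moderate", "heavy", "maximum"]


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def condense_to_fit(
    source: str, budget: int, condensers: dict[str, Callable[[str], str]]
) -> tuple[str, str]:
    """Return (level, text) for the lightest level whose output fits the budget.

    If no level fits, the result of the heaviest level is returned.
    """
    level, text = LEVELS[0], source
    for level in LEVELS:
        text = condensers[level](source)
        if count_tokens(text) <= budget:
            break
    return level, text


def drop_comment_lines(s: str) -> str:
    return "\n".join(l for l in s.splitlines() if not l.strip().startswith("#"))


def first_line_only(s: str) -> str:
    return s.splitlines()[0] + " ..."


# Toy condensers; "moderate" and "heavy" are placeholders for this demo.
toy = {
    "none": lambda s: s,
    "light": drop_comment_lines,
    "moderate": drop_comment_lines,
    "heavy": first_line_only,
    "maximum": first_line_only,
}

src = "def add(a, b):\n    # add two numbers\n    return a + b\n"
print(condense_to_fit(src, 8, toy)[0])  # light
print(condense_to_fit(src, 5, toy)[0])  # heavy
```

Starting from the lightest level means a file only loses as much detail as its budget requires.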

Configuration

Basic Usage

# Enable smart condensing with default settings
folder2md --smart-condensing --token-limit 80000 .

# Choose condensing strategy
folder2md --smart-condensing --condense-strategy aggressive .

YAML Configuration

# Enable smart condensing
smart_condensing: true

# Token limit to target
token_limit: 80000

# Condensing strategy: conservative, balanced, aggressive
condense_strategy: "balanced"

# Token budget strategy
token_budget_strategy: "balanced"

# Languages to condense (empty = all supported)
condense_languages:
  - "python"
  - "javascript"
  - "typescript"
  - "java"

# Files to never condense
preserve_patterns:
  - "**/main.py"
  - "**/config.py"
  - "**/__init__.py"
  - "**/README.md"

Supported Languages

Smart condensing currently supports:

Python

Function and class signature preservation
Import statement analysis
Docstring and comment removal
Logic simplification while preserving control flow

JavaScript/TypeScript

Function and class declarations
Import/export analysis
Comment removal
Arrow function simplification

Java

Method and class signatures
Package and import statements
Annotation preservation
Access modifier retention

More Languages Coming

C/C++
Go
Rust
C#

Condensing Strategies

Conservative

Best for: Code review, maintaining maximum context

Minimal condensing
Preserves most implementation details
Higher token usage
Better for understanding complex logic

condense_strategy: "conservative"

Balanced (Default)

Best for: General use, good balance of content and size

Moderate condensing
Preserves key logic and structure
Good compromise between detail and size
Suitable for most LLM interactions

condense_strategy: "balanced"

Aggressive

Best for: Large codebases, overview understanding

Maximum condensing
Focuses on structure and interfaces
Lowest token usage
Good for architectural understanding

condense_strategy: "aggressive"

File Priority Classification

How Files Are Classified

The system uses several signals to determine file importance:

File naming patterns:
  main.py, index.js, app.py → CRITICAL
  config.*, settings.* → CRITICAL
  utils.*, helpers.* → MEDIUM
  test_*, *_test.* → LOW
Directory structure:
  Root level files → Higher priority
  src/, lib/ → HIGH
  tests/, docs/ → LOW
Import analysis:
  Frequently imported → Higher priority
  Leaf modules → Lower priority
File size and complexity:
  Balanced consideration of size vs importance
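
The naming and directory signals above can be sketched as a small rule table. This is an illustration, not the tool's classifier: the `classify` function, the "file name rules first, then directory rules" ordering, and the `MEDIUM` default are assumptions. Import analysis and size weighting are left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# (pattern, priority) pairs taken from the naming rules above; first match wins.
NAME_RULES = [
    ("main.py", "CRITICAL"),
    ("index.js", "CRITICAL"),
    ("app.py", "CRITICAL"),
    ("config.*", "CRITICAL"),
    ("settings.*", "CRITICAL"),
    ("utils.*", "MEDIUM"),
    ("helpers.*", "MEDIUM"),
    ("test_*", "LOW"),
    ("*_test.*", "LOW"),
]
DIR_RULES = {"src": "HIGH", "lib": "HIGH", "tests": "LOW", "docs": "LOW"}


def classify(path: str, default: str = "MEDIUM") -> str:
    """Classify a POSIX-style relative path by file name, then by directory."""
    p = PurePosixPath(path)
    for pattern, priority in NAME_RULES:
        if fnmatchcase(p.name, pattern):
            return priority
    # Assumption: the outermost matching directory decides.
    for part in p.parts[:-1]:
        if part in DIR_RULES:
            return DIR_RULES[part]
    return default


for path in ["src/main.py", "src/parser.py", "tests/test_parser.py", "scripts/run.py"]:
    print(path, classify(path))
```

A real classifier would presumably combine these signals with import frequency rather than stopping at the first match.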

Custom Priority Patterns

Override automatic classification:

# Force high priority for specific patterns
high_priority_patterns:
  - "**/models/*.py"
  - "**/core/*.js"

# Force low priority for specific patterns
low_priority_patterns:
  - "**/examples/**"
  - "**/demos/**"

Condensing Levels Explained

Level 2: Light Condensing

Removes non-essential elements while preserving logic:

# Original
def calculate_total(items: List[Item]) -> float:
    """Calculate the total price of items.

    Args:
        items: List of items to calculate total for

    Returns:
        Total price as float
    """
    total = 0.0
    # Iterate through each item
    for item in items:
        # Add item price to running total
        total += item.price
    return total

# Light condensing
def calculate_total(items: List[Item]) -> float:
    total = 0.0
    for item in items:
        total += item.price
    return total
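
For Python sources, a transformation like this can be approximated with the standard library's `ast` module. The sketch below is not Folder2MD4LLMs' code, and `light_condense` is a hypothetical name. It removes docstrings explicitly, and comments disappear because the parser never stores them in the tree. `ast.unparse` requires Python 3.9+ and also normalizes formatting such as quoting and blank lines.

```python
import ast

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def light_condense(source: str) -> str:
    """Drop docstrings and comments from Python source (requires Python 3.9+)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, DOCSTRING_OWNERS) or not node.body:
            continue
        first = node.body[0]
        is_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        if is_docstring:
            # Keep the body non-empty so the result is still valid Python.
            node.body = node.body[1:] or [ast.Pass()]
    # Comments are not part of the tree, so unparsing omits them.
    return ast.unparse(tree)


example = '''
def calculate_total(items):
    """Calculate the total price of items."""
    total = 0.0
    # Iterate through each item
    for item in items:
        total += item.price
    return total
'''

print(light_condense(example))
```

Because the output is regenerated from the tree, it is syntactically valid by construction, but the original layout is lost.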

Level 3: Moderate Condensing

Simplifies function bodies while preserving structure:

# Moderate condensing
def calculate_total(items: List[Item]) -> float:
    # Implementation details condensed
    return sum(item.price for item in items)

Level 5: Maximum Condensing

Keeps only signatures and essential structure:

# Maximum condensing
def calculate_total(items: List[Item]) -> float: ...
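
A signature-only view of Python code can be sketched with the same `ast` approach: every function body is replaced by `...` while class and function headers are kept. `signatures_only` is a hypothetical helper, not the tool's implementation. Note that `ast.unparse` (Python 3.9+) places `...` on its own line rather than after the colon.

```python
import ast


def signatures_only(source: str) -> str:
    """Replace every function body with `...` (requires Python 3.9+)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # An Ellipsis expression is the conventional stub body.
            node.body = [ast.Expr(value=ast.Constant(value=...))]
    return ast.unparse(tree)


example = '''
class Cart:
    def total(self, items):
        return sum(item.price for item in items)

def calculate_total(items, tax=0.0):
    total = sum(item.price for item in items)
    return total * (1 + tax)
'''

print(signatures_only(example))
```

Parameter names, defaults, and type annotations survive because they belong to the function header rather than the body.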

Best Practices

Choose the Right Strategy

Start with balanced for most use cases
Use conservative when you need detailed implementation
Use aggressive for large codebases or architectural overview

Optimize File Selection

# Focus on important directories
include_patterns:
  - "src/**/*.py"
  - "lib/**/*.js"
  - "core/**/*.ts"

# Exclude less important files
exclude_patterns:
  - "tests/**"
  - "**/node_modules/**"
  - "**/__pycache__/**"

Preserve Critical Files

# Never condense these files
preserve_patterns:
  - "**/main.py"
  - "**/config.py"
  - "**/requirements.txt"
  - "package.json"

Monitor Token Usage

# Check token distribution
folder2md --smart-condensing --stats --dry-run .

# Adjust limits if needed
folder2md --smart-condensing --token-limit 100000 .

Troubleshooting

Smart Condensing Not Activating

# Check if languages are supported
folder2md --dry-run --verbose . | grep "condensing"

# Explicitly set the languages to condense
folder2md --condense-languages python,javascript .

Files Not Being Condensed

# Check priority classification
folder2md --dry-run --verbose . | grep "priority"

# Override with patterns
folder2md --include-pattern "src/**/*.py" --smart-condensing .

Output Still Too Large

# Use more aggressive strategy
folder2md --condense-strategy aggressive .

# Reduce token limit
folder2md --token-limit 60000 .

# Exclude more files
folder2md --exclude-pattern "tests/**" --exclude-pattern "docs/**" .

Advanced Configuration

Custom Token Allocation

# Override default budget allocation
token_budget_strategy: "custom"
token_budget_allocation:
  critical: 0.45
  high: 0.35
  medium: 0.15
  low: 0.05
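
Before using a custom allocation, it can help to check that every priority level is present and that the shares add up to 1.0, as they do in the example above. The checker below is a hypothetical helper. Whether Folder2MD4LLMs itself enforces these constraints is not stated here, so treat them as assumptions.

```python
import math

PRIORITY_LEVELS = ("critical", "high", "medium", "low")


def validate_allocation(allocation: dict[str, float]) -> None:
    """Raise ValueError if the allocation is incomplete or does not sum to 1.0."""
    missing = [level for level in PRIORITY_LEVELS if level not in allocation]
    if missing:
        raise ValueError(f"missing priority levels: {missing}")
    if any(allocation[level] < 0 for level in PRIORITY_LEVELS):
        raise ValueError("shares must be non-negative")
    total = sum(allocation[level] for level in PRIORITY_LEVELS)
    # Compare with a tolerance because the shares are floats.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"shares sum to {total}, expected 1.0")


# The values from the YAML example above pass the check.
validate_allocation({"critical": 0.45, "high": 0.35, "medium": 0.15, "low": 0.05})
print("allocation ok")
```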

Language-Specific Settings

# Different settings per language
condense_settings:
  python:
    preserve_docstrings: true
    preserve_type_hints: true
  javascript:
    preserve_jsdoc: false
    minify_functions: true

Performance Tuning

# Parallel condensing
parallel_condensing: true

# Cache AST parsing results
cache_ast: true

# Memory limit for large files
max_ast_size: 10485760  # 10 MiB (10 * 1024 * 1024 bytes)

Smart condensing ensures your codebase fits within LLM token limits while preserving the most important information for meaningful AI interactions.