Smart Condensing

Smart Condensing is Folder2MD4LLMs' intelligent approach to handling large codebases that exceed token limits. Instead of crude truncation, it uses AST analysis and priority-based compression to preserve the most important code while fitting within your specified limits.

How Smart Condensing Works

1. Priority Analysis

The system automatically categorizes files by importance:

  • CRITICAL: Core configuration, main entry points
  • HIGH: Primary source files, important modules
  • MEDIUM: Supporting files, utilities
  • LOW: Tests, documentation, examples

2. Token Budget Allocation

The system splits the token limit across the four priority levels according to the selected budget strategy:

Conservative: 50% Critical, 30% High, 15% Medium, 5% Low
Balanced:     40% Critical, 35% High, 20% Medium, 5% Low
Aggressive:   30% Critical, 40% High, 25% Medium, 5% Low
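
As a worked example of the split above, here is a minimal sketch that turns a token limit into per-priority budgets. The percentages come from the table; the names `BUDGET_SPLITS` and `allocate_budget` are hypothetical and are not part of the folder2md API.

```python
# Percentages copied from the table above. All names here are illustrative
# assumptions, not identifiers from the tool itself.
BUDGET_SPLITS = {
    "conservative": {"critical": 50, "high": 30, "medium": 15, "low": 5},
    "balanced": {"critical": 40, "high": 35, "medium": 20, "low": 5},
    "aggressive": {"critical": 30, "high": 40, "medium": 25, "low": 5},
}


def allocate_budget(token_limit: int, strategy: str = "balanced") -> dict[str, int]:
    """Split token_limit across priority levels using whole-number percentages."""
    split = BUDGET_SPLITS[strategy]
    return {level: token_limit * pct // 100 for level, pct in split.items()}


print(allocate_budget(80_000, "balanced"))
# {'critical': 32000, 'high': 28000, 'medium': 16000, 'low': 4000}
```

With an 80,000 token limit, the balanced split leaves only 4,000 tokens for LOW files, which is why tests and docs tend to be condensed hardest.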

3. Progressive Condensing

Each file is condensed at one of five levels, ordered from least to most aggressive:

  1. None: Full content preserved
  2. Light: Remove comments and docstrings
  3. Moderate: Simplify function bodies
  4. Heavy: Keep signatures and key logic
  5. Maximum: Keep only class/function signatures
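
One way to read "progressive" is: try the lightest level first and step up only while the file still exceeds its budget. The sketch below illustrates that reading. It assumes per-level token estimates are already available; `Level` and `pick_level` are hypothetical names, and the tool's actual selection logic may differ.

```python
from enum import IntEnum


class Level(IntEnum):
    """The five condensing levels listed above, numbered as in the list."""
    NONE = 1
    LIGHT = 2
    MODERATE = 3
    HEAVY = 4
    MAXIMUM = 5


def pick_level(tokens_by_level: dict[Level, int], budget: int) -> Level:
    """Return the lightest level whose estimated token count fits the budget.

    Falls back to MAXIMUM when no level fits.
    """
    for level in Level:  # iterates in definition order: NONE .. MAXIMUM
        if tokens_by_level[level] <= budget:
            return level
    return Level.MAXIMUM


# Made-up token estimates for a single file at each level.
estimates = {
    Level.NONE: 1200,
    Level.LIGHT: 900,
    Level.MODERATE: 600,
    Level.HEAVY: 350,
    Level.MAXIMUM: 120,
}
print(pick_level(estimates, 700).name)  # MODERATE
```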

Configuration

Basic Usage

# Enable smart condensing with default settings
folder2md --smart-condensing --token-limit 80000 .

# Choose condensing strategy
folder2md --smart-condensing --condense-strategy aggressive .

YAML Configuration

# Enable smart condensing
smart_condensing: true

# Token limit to target
token_limit: 80000

# Condensing strategy: conservative, balanced, aggressive
condense_strategy: "balanced"

# Token budget strategy
token_budget_strategy: "balanced"

# Languages to condense (empty = all supported)
condense_languages:
  - "python"
  - "javascript"
  - "typescript"
  - "java"

# Files to never condense
preserve_patterns:
  - "**/main.py"
  - "**/config.py"
  - "**/__init__.py"
  - "**/README.md"

Supported Languages

Smart condensing currently supports:

Python

  • Function and class signature preservation
  • Import statement analysis
  • Docstring and comment removal
  • Logic simplification while preserving control flow

JavaScript/TypeScript

  • Function and class declarations
  • Import/export analysis
  • Comment removal
  • Arrow function simplification

Java

  • Method and class signatures
  • Package and import statements
  • Annotation preservation
  • Access modifier retention

More Languages Coming

  • C/C++
  • Go
  • Rust
  • C#

Condensing Strategies

Conservative

Best for: Code review, maintaining maximum context

  • Minimal condensing
  • Preserves most implementation details
  • Higher token usage
  • Better for understanding complex logic

condense_strategy: "conservative"

Balanced (Default)

Best for: General use, good balance of content and size

  • Moderate condensing
  • Preserves key logic and structure
  • Good compromise between detail and size
  • Suitable for most LLM interactions

condense_strategy: "balanced"

Aggressive

Best for: Large codebases, overview understanding

  • Maximum condensing
  • Focuses on structure and interfaces
  • Lowest token usage
  • Good for architectural understanding

condense_strategy: "aggressive"

File Priority Classification

How Files Are Classified

The system uses several signals to determine file importance:

  1. File naming patterns:
       • main.py, index.js, app.py → CRITICAL
       • config.*, settings.* → CRITICAL
       • utils.*, helpers.* → MEDIUM
       • test_*, *_test.* → LOW

  2. Directory structure:
       • Root level files → Higher priority
       • src/, lib/ → HIGH
       • tests/, docs/ → LOW

  3. Import analysis:
       • Frequently imported → Higher priority
       • Leaf modules → Lower priority

  4. File size and complexity:
       • Balanced consideration of size vs importance
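
The first two signals can be sketched as a small rule table. This is an illustration only: the rule order (file name before directory), the MEDIUM default, and the names `NAME_RULES`, `DIR_RULES`, and `classify` are assumptions, and the root-level, import-analysis, and size signals are left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns taken from the list above; the first matching rule wins.
NAME_RULES = [
    (("main.py", "index.js", "app.py", "config.*", "settings.*"), "CRITICAL"),
    (("test_*", "*_test.*"), "LOW"),
    (("utils.*", "helpers.*"), "MEDIUM"),
]
DIR_RULES = {"src": "HIGH", "lib": "HIGH", "tests": "LOW", "docs": "LOW"}


def classify(path: str, default: str = "MEDIUM") -> str:
    """Classify a POSIX-style relative path by file name, then by directory."""
    p = PurePosixPath(path)
    for patterns, priority in NAME_RULES:
        if any(fnmatchcase(p.name, pattern) for pattern in patterns):
            return priority
    for part in p.parts[:-1]:  # directory components only
        if part in DIR_RULES:
            return DIR_RULES[part]
    return default


for sample in ("src/main.py", "src/parser.py", "src/utils.py", "tests/test_api.py"):
    print(sample, "->", classify(sample))
```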

Custom Priority Patterns

Override automatic classification:

# Force high priority for specific patterns
high_priority_patterns:
  - "**/models/*.py"
  - "**/core/*.js"

# Force low priority for specific patterns
low_priority_patterns:
  - "**/examples/**"
  - "**/demos/**"

Condensing Levels Explained

Level 2: Light Condensing

Removes non-essential elements while preserving logic:

# Original
def calculate_total(items: List[Item]) -> float:
    """Calculate the total price of items.

    Args:
        items: List of items to calculate total for

    Returns:
        Total price as float
    """
    total = 0.0
    # Iterate through each item
    for item in items:
        # Add item price to running total
        total += item.price
    return total

# Light condensing
def calculate_total(items: List[Item]) -> float:
    total = 0.0
    for item in items:
        total += item.price
    return total
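
A transformation like this can be approximated with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The sketch below is not the tool's implementation: `light_condense` is a hypothetical helper, and round-tripping through the AST also normalizes formatting, which a real condenser might prefer to avoid.

```python
import ast

DOCSTRING_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def light_condense(source: str) -> str:
    """Drop comments and docstrings; keep every executable statement."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, DOCSTRING_OWNERS):
            continue
        body = node.body
        if (body and isinstance(body[0], ast.Expr)
                and isinstance(body[0].value, ast.Constant)
                and isinstance(body[0].value.value, str)):
            # A body that held only a docstring becomes `pass` to stay valid.
            node.body = body[1:] or [ast.Pass()]
    # Comments are never part of the AST, so unparsing drops them.
    return ast.unparse(tree)


SAMPLE = '''
def calculate_total(items):
    """Calculate the total price of items."""
    total = 0.0
    # Iterate through each item
    for item in items:
        total += item.price
    return total
'''
print(light_condense(SAMPLE))
```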

Level 3: Moderate Condensing

Simplifies function bodies while preserving structure:

# Moderate condensing
def calculate_total(items: List[Item]) -> float:
    # Implementation details condensed
    return sum(item.price for item in items)

Level 5: Maximum Condensing

Keeps only signatures and essential structure:

# Maximum condensing
def calculate_total(items: List[Item]) -> float: ...
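
Signature-only output can likewise be sketched with the standard `ast` module. The version below keeps class and function headers (including decorators and type hints), replaces every function body with `...`, and drops all other statements. Whether the tool also drops imports and class attributes at this level is not stated, so treat that choice, and the name `signatures_only`, as assumptions.

```python
import ast

STUB = ast.Expr(value=ast.Constant(value=...))  # renders as `...`
DEFS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def signatures_only(source: str) -> str:
    """Keep class/function headers; replace every function body with `...`."""
    tree = ast.parse(source)

    def strip(node):
        if isinstance(node, ast.ClassDef):
            # Keep nested definitions so method signatures survive.
            node.body = [strip(n) for n in node.body if isinstance(n, DEFS)] or [STUB]
        else:
            node.body = [STUB]
        return node

    tree.body = [strip(n) for n in tree.body if isinstance(n, DEFS)]
    return ast.unparse(tree)


SAMPLE = '''
class Cart:
    tax = 0.2

    def total(self, items: list) -> float:
        return sum(i.price for i in items) * (1 + self.tax)


def helper(x):
    return x + 1
'''
print(signatures_only(SAMPLE))
```

Note that `ast.unparse` places the `...` on its own indented line rather than on the signature line as in the example above.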

Best Practices

Choose the Right Strategy

  • Start with balanced for most use cases
  • Use conservative when you need detailed implementation
  • Use aggressive for large codebases or architectural overview

Optimize File Selection

# Focus on important directories
include_patterns:
  - "src/**/*.py"
  - "lib/**/*.js"
  - "core/**/*.ts"

# Exclude less important files
exclude_patterns:
  - "tests/**"
  - "**/node_modules/**"
  - "**/__pycache__/**"

Preserve Critical Files

# Never condense these files
preserve_patterns:
  - "**/main.py"
  - "**/config.py"
  - "**/requirements.txt"
  - "package.json"

Monitor Token Usage

# Check token distribution
folder2md --smart-condensing --stats --dry-run .

# Adjust limits if needed
folder2md --smart-condensing --token-limit 100000 .

Troubleshooting

Smart Condensing Not Activating

# Check if languages are supported
folder2md --dry-run --verbose . | grep "condensing"

# Force language detection
folder2md --condense-languages python,javascript .

Files Not Being Condensed

# Check priority classification
folder2md --dry-run --verbose . | grep "priority"

# Override with patterns
folder2md --include-pattern "src/**/*.py" --smart-condensing .

Output Still Too Large

# Use more aggressive strategy
folder2md --condense-strategy aggressive .

# Reduce token limit
folder2md --token-limit 60000 .

# Exclude more files
folder2md --exclude-pattern "tests/**" --exclude-pattern "docs/**" .

Advanced Configuration

Custom Token Allocation

# Override default budget allocation
token_budget_strategy: "custom"
token_budget_allocation:
  critical: 0.45
  high: 0.35
  medium: 0.15
  low: 0.05
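
A custom allocation is easy to get wrong, so a sanity check before running can help. The sketch below assumes the four fractions are expected to cover every priority level and sum to 1.0; the documentation does not state what the tool does otherwise, and `validate_allocation` is a hypothetical helper.

```python
import math

LEVELS = ("critical", "high", "medium", "low")


def validate_allocation(allocation: dict[str, float]) -> None:
    """Raise ValueError unless all levels are present, in [0, 1], and sum to 1."""
    missing = [level for level in LEVELS if level not in allocation]
    if missing:
        raise ValueError(f"missing levels: {missing}")
    for level in LEVELS:
        if not 0.0 <= allocation[level] <= 1.0:
            raise ValueError(f"{level} must be between 0 and 1")
    total = sum(allocation[level] for level in LEVELS)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {total}, expected 1.0")


# The values from the YAML above pass the check.
validate_allocation({"critical": 0.45, "high": 0.35, "medium": 0.15, "low": 0.05})
```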

Language-Specific Settings

# Different settings per language
condense_settings:
  python:
    preserve_docstrings: true
    preserve_type_hints: true
  javascript:
    preserve_jsdoc: false
    minify_functions: true

Performance Tuning

# Parallel condensing
parallel_condensing: true

# Cache AST parsing results
cache_ast: true

# Memory limit for large files
max_ast_size: 10485760  # 10 MiB (10 * 1024 * 1024 bytes)

Smart condensing ensures your codebase fits within LLM token limits while preserving the most important information for meaningful AI interactions.