Smart Condensing¶
Smart Condensing is Folder2MD4LLMs' approach to handling large codebases that exceed a token limit. Instead of truncating the output, it uses AST analysis and priority-based compression to keep the most important code within the limit you specify.
How Smart Condensing Works¶
1. Priority Analysis¶
The system automatically categorizes files by importance:
- CRITICAL: Core configuration, main entry points
- HIGH: Primary source files, important modules
- MEDIUM: Supporting files, utilities
- LOW: Tests, documentation, examples
2. Token Budget Allocation¶
The token budget is then divided among the priority levels according to the selected budget strategy:
- Conservative: 50% Critical, 30% High, 15% Medium, 5% Low
- Balanced: 40% Critical, 35% High, 20% Medium, 5% Low
- Aggressive: 30% Critical, 40% High, 25% Medium, 5% Low
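To make the arithmetic concrete, here is a minimal Python sketch that turns the percentages above into per-priority token budgets. The names `BUDGET_SPLITS` and `allocate_budget` are illustrative and not part of folder2md, and giving the rounding remainder to the critical tier is an assumption.

```python
# Illustrative only: percentages are taken from the list above;
# the names and the rounding rule are assumptions, not folder2md's API.
BUDGET_SPLITS = {
    "conservative": {"critical": 50, "high": 30, "medium": 15, "low": 5},
    "balanced": {"critical": 40, "high": 35, "medium": 20, "low": 5},
    "aggressive": {"critical": 30, "high": 40, "medium": 25, "low": 5},
}


def allocate_budget(token_limit: int, strategy: str = "balanced") -> dict[str, int]:
    """Split token_limit across priority levels using integer percentages."""
    split = BUDGET_SPLITS[strategy]
    budget = {level: token_limit * pct // 100 for level, pct in split.items()}
    # Hand any rounding remainder to the critical tier so the parts sum to the limit.
    budget["critical"] += token_limit - sum(budget.values())
    return budget


print(allocate_budget(80000, "balanced"))
# {'critical': 32000, 'high': 28000, 'medium': 16000, 'low': 4000}
```

With an 80,000-token limit and the balanced split, LOW files (tests, documentation, examples) share only 4,000 tokens, which is why they tend to be condensed hardest.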
3. Progressive Condensing¶
Each file is condensed at one of five levels, listed here from least to most aggressive:
- None: Full content preserved
- Light: Remove comments and docstrings
- Moderate: Simplify function bodies
- Heavy: Keep signatures and key logic
- Maximum: Keep only class/function signatures
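One way to picture "progressive" condensing is as a search for the lightest level that still fits a file's share of the budget. The sketch below assumes per-level token estimates are already available. The function name and the numbers are made up for illustration and do not describe folder2md's internals.

```python
# Level names follow the list above; everything else is hypothetical.
LEVELS = ["none", "light", "moderate", "heavy", "maximum"]


def pick_level(tokens_at_level: dict[str, int], budget: int) -> str:
    """Return the lightest condensing level whose token estimate fits the budget."""
    for level in LEVELS:  # ordered from least to most aggressive
        if tokens_at_level[level] <= budget:
            return level
    # Nothing fits: fall back to the most compact representation.
    return LEVELS[-1]


# Made-up token estimates for a single file at each level.
estimates = {"none": 1200, "light": 900, "moderate": 600, "heavy": 350, "maximum": 120}
print(pick_level(estimates, 700))  # moderate
```

Trying levels in order keeps as much detail as the budget allows, which matches the intent of preserving full content whenever it fits.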
Configuration¶
Basic Usage¶
# Enable smart condensing with default settings
folder2md --smart-condensing --token-limit 80000 .
# Choose condensing strategy
folder2md --smart-condensing --condense-strategy aggressive .
YAML Configuration¶
# Enable smart condensing
smart_condensing: true
# Token limit to target
token_limit: 80000
# Condensing strategy: conservative, balanced, aggressive
condense_strategy: "balanced"
# Token budget strategy
token_budget_strategy: "balanced"
# Languages to condense (empty = all supported)
condense_languages:
- "python"
- "javascript"
- "typescript"
- "java"
# Files to never condense
preserve_patterns:
- "**/main.py"
- "**/config.py"
- "**/__init__.py"
- "**/README.md"
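The `preserve_patterns` entries are glob patterns. As a rough illustration of how such patterns could be matched against relative paths, the sketch below translates them into regular expressions. The helper names are hypothetical, and the assumption that a leading `**/` also matches files at the repository root may not hold for folder2md itself.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with `**` support into a compiled regex (illustrative)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")  # anything, including path separators
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")  # a single non-separator character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_preserved(path: str, patterns: list[str]) -> bool:
    """True if the POSIX-style relative path fully matches any pattern."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


PRESERVE = ["**/main.py", "**/config.py", "**/__init__.py", "**/README.md"]
print(is_preserved("pkg/sub/__init__.py", PRESERVE))  # True
print(is_preserved("src/domain.py", PRESERVE))        # False
```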
Supported Languages¶
Smart condensing currently supports:
Python¶
- Function and class signature preservation
- Import statement analysis
- Docstring and comment removal
- Logic simplification while preserving control flow
JavaScript/TypeScript¶
- Function and class declarations
- Import/export analysis
- Comment removal
- Arrow function simplification
Java¶
- Method and class signatures
- Package and import statements
- Annotation preservation
- Access modifier retention
More Languages Coming¶
- C/C++
- Go
- Rust
- C#
Condensing Strategies¶
Conservative¶
Best for: Code review, maintaining maximum context
- Minimal condensing
- Preserves most implementation details
- Higher token usage
- Better for understanding complex logic
Balanced (Default)¶
Best for: General use, good balance of content and size
- Moderate condensing
- Preserves key logic and structure
- Good compromise between detail and size
- Suitable for most LLM interactions
Aggressive¶
Best for: Large codebases, overview understanding
- Maximum condensing
- Focuses on structure and interfaces
- Lowest token usage
- Good for architectural understanding
File Priority Classification¶
How Files Are Classified¶
The system uses several signals to determine file importance:
- File naming patterns:
  - main.py, index.js, app.py → CRITICAL
  - config.*, settings.* → CRITICAL
  - utils.*, helpers.* → MEDIUM
  - test_*, *_test.* → LOW
- Directory structure:
  - Root level files → Higher priority
  - src/, lib/ → HIGH
  - tests/, docs/ → LOW
- Import analysis:
  - Frequently imported → Higher priority
  - Leaf modules → Lower priority
- File size and complexity:
  - Balanced consideration of size vs importance
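The naming and directory signals above can be sketched as a small rule table. This covers only those two signals; import analysis, root-level boosts, and size or complexity are left out. The rule order (file name before directory) and the MEDIUM default are assumptions, and `classify` is a hypothetical helper rather than folder2md's classifier.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name rules, checked first (assumed precedence).
NAME_RULES = [
    (("test_*", "*_test.*"), "LOW"),
    (("main.py", "index.js", "app.py", "config.*", "settings.*"), "CRITICAL"),
    (("utils.*", "helpers.*"), "MEDIUM"),
]
# Directory rules, checked when no file-name rule matches.
DIR_RULES = {"src": "HIGH", "lib": "HIGH", "tests": "LOW", "docs": "LOW"}


def classify(path: str, default: str = "MEDIUM") -> str:
    """Classify a relative POSIX path by file name, then by enclosing directory."""
    p = PurePosixPath(path)
    for patterns, priority in NAME_RULES:
        if any(fnmatchcase(p.name, pat) for pat in patterns):
            return priority
    for part in p.parts[:-1]:  # directories only, outermost first
        if part in DIR_RULES:
            return DIR_RULES[part]
    return default


for example in ["src/main.py", "tests/test_api.py", "src/billing/invoice.py", "lib/utils.js"]:
    print(example, "->", classify(example))
# src/main.py -> CRITICAL
# tests/test_api.py -> LOW
# src/billing/invoice.py -> HIGH
# lib/utils.js -> MEDIUM
```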
Custom Priority Patterns¶
Override automatic classification:
# Force high priority for specific patterns
high_priority_patterns:
- "**/models/*.py"
- "**/core/*.js"
# Force low priority for specific patterns
low_priority_patterns:
- "**/examples/**"
- "**/demos/**"
Condensing Levels Explained¶
Level 2: Light Condensing¶
Removes non-essential elements while preserving logic:
# Original
def calculate_total(items: List[Item]) -> float:
    """Calculate the total price of items.

    Args:
        items: List of items to calculate total for

    Returns:
        Total price as float
    """
    total = 0.0
    # Iterate through each item
    for item in items:
        # Add item price to running total
        total += item.price
    return total

# Light condensing
def calculate_total(items: List[Item]) -> float:
    total = 0.0
    for item in items:
        total += item.price
    return total
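For Python sources, a transformation like this can be approximated with the standard `ast` module: comments never reach the syntax tree, and docstrings can be dropped from each body before the tree is written back out. This is a sketch of the general technique, not folder2md's implementation. `light_condense` is a hypothetical name, and `ast.unparse` (Python 3.9+) also normalizes formatting such as quoting and spacing.

```python
import ast

ORIGINAL = '''
def calculate_total(items: List[Item]) -> float:
    """Calculate the total price of items."""
    total = 0.0
    # Iterate through each item
    for item in items:
        total += item.price  # running total
    return total
'''


def light_condense(source: str) -> str:
    """Drop comments and docstrings while keeping the executable statements."""
    tree = ast.parse(source)  # comments are discarded by the parser
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node, clean=False) is not None:
                # The docstring is always the first statement; keep the body non-empty.
                node.body = node.body[1:] or [ast.Pass()]
    return ast.unparse(tree)


condensed = light_condense(ORIGINAL)
print(condensed)
```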
Level 3: Moderate Condensing¶
Simplifies function bodies while preserving structure:
# Moderate condensing
def calculate_total(items: List[Item]) -> float:
    # Implementation details condensed
    return sum(item.price for item in items)
Level 5: Maximum Condensing¶
Keeps only signatures and essential structure:
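As an illustration of what signature-only output can look like for Python, the sketch below walks a module with the standard `ast` module and emits class and function headers with their bodies replaced by `...`. The `signatures_only` helper and the sample source are hypothetical. The actual output format of folder2md may differ, and decorators are ignored here for brevity.

```python
import ast

SOURCE = '''
class Cart:
    """Shopping cart."""

    def add(self, item: Item) -> None:
        self.items.append(item)


def calculate_total(items: List[Item]) -> float:
    total = 0.0
    for item in items:
        total += item.price
    return total
'''


def signatures_only(source: str) -> str:
    """Return class and function headers with bodies replaced by `...`."""
    out: list[str] = []

    def emit(body: list, depth: int) -> None:
        pad = "    " * depth
        for node in body:
            if isinstance(node, ast.ClassDef):
                bases = ", ".join(ast.unparse(b) for b in node.bases)
                out.append(f"{pad}class {node.name}({bases}):" if bases else f"{pad}class {node.name}:")
                before = len(out)
                emit(node.body, depth + 1)
                if len(out) == before:  # no methods: keep the class syntactically valid
                    out.append(f"{pad}    ...")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                out.append(f"{pad}{keyword} {node.name}({ast.unparse(node.args)}){returns}: ...")

    emit(ast.parse(source).body, 0)
    return "\n".join(out)


print(signatures_only(SOURCE))
# class Cart:
#     def add(self, item: Item) -> None: ...
# def calculate_total(items: List[Item]) -> float: ...
```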
Best Practices¶
Choose the Right Strategy¶
- Start with balanced for most use cases
- Use conservative when you need implementation details
- Use aggressive for large codebases or an architectural overview
Optimize File Selection¶
# Focus on important directories
include_patterns:
- "src/**/*.py"
- "lib/**/*.js"
- "core/**/*.ts"
# Exclude less important files
exclude_patterns:
- "tests/**"
- "**/node_modules/**"
- "**/__pycache__/**"
Preserve Critical Files¶
# Never condense these files
preserve_patterns:
- "**/main.py"
- "**/config.py"
- "**/requirements.txt"
- "package.json"
Monitor Token Usage¶
# Check token distribution
folder2md --smart-condensing --stats --dry-run .
# Adjust limits if needed
folder2md --smart-condensing --token-limit 100000 .
Troubleshooting¶
Smart Condensing Not Activating¶
# Check if languages are supported
folder2md --dry-run --verbose . | grep "condensing"
# Force language detection
folder2md --condense-languages python,javascript .
Files Not Being Condensed¶
# Check priority classification
folder2md --dry-run --verbose . | grep "priority"
# Override with patterns
folder2md --include-pattern "src/**/*.py" --smart-condensing .
Output Still Too Large¶
# Use more aggressive strategy
folder2md --condense-strategy aggressive .
# Reduce token limit
folder2md --token-limit 60000 .
# Exclude more files
folder2md --exclude-pattern "tests/**" --exclude-pattern "docs/**" .
Advanced Configuration¶
Custom Token Allocation¶
# Override default budget allocation
token_budget_strategy: "custom"
token_budget_allocation:
  critical: 0.45
  high: 0.35
  medium: 0.15
  low: 0.05
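The example fractions sum to 1.0. Assuming a custom allocation is expected to cover the whole budget (the tool may or may not enforce this), a quick sanity check before saving the configuration could look like the following. `validate_allocation` is a hypothetical helper.

```python
import math

EXPECTED_LEVELS = {"critical", "high", "medium", "low"}


def validate_allocation(allocation: dict[str, float]) -> None:
    """Raise ValueError unless the shares cover all four levels and sum to 1.0."""
    if set(allocation) != EXPECTED_LEVELS:
        raise ValueError(f"expected keys {sorted(EXPECTED_LEVELS)}, got {sorted(allocation)}")
    if any(share < 0 for share in allocation.values()):
        raise ValueError("shares must be non-negative")
    total = sum(allocation.values())
    # Compare with a tolerance: binary floats rarely sum to exactly 1.0.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"shares sum to {total}, expected 1.0")


validate_allocation({"critical": 0.45, "high": 0.35, "medium": 0.15, "low": 0.05})
```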
Language-Specific Settings¶
# Different settings per language
condense_settings:
  python:
    preserve_docstrings: true
    preserve_type_hints: true
  javascript:
    preserve_jsdoc: false
    minify_functions: true
Performance Tuning¶
# Parallel condensing
parallel_condensing: true
# Cache AST parsing results
cache_ast: true
# Memory limit for large files
max_ast_size: 10485760  # 10 MiB (10 * 1024 * 1024 bytes)
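To show the idea behind `cache_ast` and `max_ast_size`, here is a minimal Python sketch of a content-addressed parse cache with a size cap. It assumes `max_ast_size` is a byte limit on the source being parsed, which the settings above suggest but do not state. `parse_cached` and its cache are hypothetical and not folder2md code.

```python
import ast
import hashlib
from typing import Optional

MAX_AST_SIZE = 10 * 1024 * 1024  # mirrors max_ast_size above (assumed to be bytes)
_ast_cache: dict[str, ast.Module] = {}


def parse_cached(source: str, max_size: int = MAX_AST_SIZE) -> Optional[ast.Module]:
    """Parse Python source once per unique content; skip sources over the size cap."""
    data = source.encode("utf-8")
    if len(data) > max_size:
        return None  # too large: a caller would fall back to a cheaper strategy
    key = hashlib.sha256(data).hexdigest()
    if key not in _ast_cache:
        _ast_cache[key] = ast.parse(source)
    return _ast_cache[key]


tree = parse_cached("x = 1")
print(tree is parse_cached("x = 1"))  # True: the second call is served from the cache
```

Keying the cache on a content hash rather than a file path means identical files are parsed once and edited files are re-parsed automatically.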
Smart condensing helps your codebase fit within LLM token limits while preserving the information that matters most for meaningful AI interactions.