Configuration
Folder2MD4LLMs uses YAML configuration files to provide flexible control over the conversion process. This guide covers all configuration options and how to use them effectively.
Configuration File Location
Folder2MD4LLMs looks for configuration files in this order:
- File specified with the --config flag
- folder2md.yaml in the current directory
- folder2md.yml in the current directory
- .folder2md4llms/config.yaml in the home directory
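The lookup order above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name find_config is hypothetical, and the sketch assumes that a --config path that does not exist falls through to the next candidate, which the tool may instead treat as an error.

```python
from pathlib import Path
from typing import List, Optional


def find_config(cli_path: Optional[str] = None,
                cwd: Optional[Path] = None,
                home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file in the documented order, or None."""
    cwd = Path.cwd() if cwd is None else cwd
    home = Path.home() if home is None else home

    candidates: List[Path] = []
    if cli_path:
        # 1. File given with the --config flag
        candidates.append(Path(cli_path))
    candidates += [
        cwd / "folder2md.yaml",                    # 2. current directory (.yaml)
        cwd / "folder2md.yml",                     # 3. current directory (.yml)
        home / ".folder2md4llms" / "config.yaml",  # 4. home directory
    ]

    # Assumption: a missing candidate is skipped rather than reported as an error.
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

The cwd and home parameters exist only so the sketch can be tried against temporary directories.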
Basic Configuration
Create a folder2md.yaml file in your project root:
# Basic output settings
output: project-summary.md
token_limit: 80000
# File filtering
include_patterns:
- "src/**/*.py"
- "**/*.md"
- "requirements.txt"
exclude_patterns:
- "tests/**"
- "**/__pycache__/**"
- "*.log"
Complete Configuration Reference
Output Control
# Output file path (relative or absolute)
output: "output.md"
# Output format (currently only markdown supported)
format: "markdown"
# Include file tree in output
include_tree: true
# Include statistics in output
include_stats: true
Token and Size Limits
# Maximum tokens in output
token_limit: 80000
# Maximum characters in output (alternative to token_limit)
char_limit: 400000
# Maximum size for individual files (bytes)
max_file_size: 1048576 # 1MB
# Maximum tokens per file chunk
max_tokens_per_chunk: 4000
Smart Condensing
# Enable intelligent code condensing
smart_condensing: true
# Condensing strategy: conservative, balanced, aggressive
condense_strategy: "balanced"
# Languages to condense (empty = all supported)
condense_languages:
- "python"
- "javascript"
- "typescript"
# Token budget strategy: conservative, balanced, aggressive
token_budget_strategy: "balanced"
# Files to never condense
preserve_patterns:
- "**/main.py"
- "**/config.py"
- "**/__init__.py"
File Filtering
# Patterns for files to include
include_patterns:
- "src/**/*.py"
- "lib/**/*.js"
- "**/*.md"
- "package.json"
- "requirements.txt"
# Patterns for files to exclude
exclude_patterns:
- "**/__pycache__/**"
- "**/node_modules/**"
- "**/.git/**"
- "**/dist/**"
- "**/build/**"
- "*.log"
- "*.tmp"
# Respect .gitignore files
use_gitignore: true
# Maximum depth for directory traversal
max_depth: 10
Document Processing
# Enable document conversion (PDF, DOCX, etc.)
include_docs: true
# Enable binary file analysis
binary_analysis: true
# Document formats to process
doc_formats:
- "pdf"
- "docx"
- "xlsx"
- "pptx"
- "ipynb"
Performance Options
# Enable parallel processing
parallel: true
# Number of worker threads (0 = auto)
max_workers: 4
# Memory limit warning threshold (MB)
memory_limit: 1000
# Enable progress display
show_progress: true
# Verbose output
verbose: false
Advanced Options
# Custom templates directory
templates_dir: ".folder2md4llms/templates"
# Skip update checks
skip_update_check: true
# Custom ignore file names
ignore_files:
- ".folder2md_ignore"
- ".folder2mdignore"
# Follow symbolic links
follow_symlinks: false
# Include hidden files
include_hidden: false
Configuration Validation
Folder2MD4LLMs validates configuration files and reports an error message for each invalid setting.
Common validation errors:
- Token limits: Must be between 1,000 and 1,000,000
- Worker count: Must be between 1 and 32
- File size limits: Must be positive integers
- Strategy values: Must be one of: conservative, balanced, aggressive
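To make the rules above concrete, here is a minimal Python sketch that checks an already-parsed configuration dictionary against them. It is not the tool's validator: the function name validate_config and the error wording are hypothetical, and the choice of which keys each rule applies to (token_limit, max_workers, max_file_size, condense_strategy, token_budget_strategy) is an assumption based on the reference sections. The sketch follows the listed worker range of 1 to 32 literally, although the reference comment also mentions 0 = auto.

```python
from typing import Any, Dict, List

STRATEGIES = {"conservative", "balanced", "aggressive"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means the config passed."""
    errors: List[str] = []

    token_limit = config.get("token_limit")
    if token_limit is not None and not (
        _is_int(token_limit) and 1_000 <= token_limit <= 1_000_000
    ):
        errors.append("token_limit must be between 1,000 and 1,000,000")

    max_workers = config.get("max_workers")
    if max_workers is not None and not (_is_int(max_workers) and 1 <= max_workers <= 32):
        errors.append("max_workers must be between 1 and 32")

    max_file_size = config.get("max_file_size")
    if max_file_size is not None and not (_is_int(max_file_size) and max_file_size > 0):
        errors.append("max_file_size must be a positive integer")

    # Assumption: both strategy keys share the same set of allowed values.
    for key in ("condense_strategy", "token_budget_strategy"):
        value = config.get(key)
        if value is not None and value not in STRATEGIES:
            errors.append(f"{key} must be one of: conservative, balanced, aggressive")

    return errors
```

Keys that are absent are skipped, on the assumption that the tool falls back to its defaults for them.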
Environment Variables
Override configuration with environment variables:
# Custom config file location
export FOLDER2MD_CONFIG=/path/to/config.yaml
# Disable update checks
export FOLDER2MD_UPDATE_CHECK=false
# Set log level
export FOLDER2MD_LOG_LEVEL=DEBUG
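When folder2md is launched from a script rather than an interactive shell, the same variables can be set on the child process environment. The helper below is a hypothetical sketch: only the three variable names come from this guide, and the value "true" for FOLDER2MD_UPDATE_CHECK is an assumption, since only "false" is shown above.

```python
import os
from typing import Dict, Optional


def folder2md_env(config_path: Optional[str] = None,
                  update_check: Optional[bool] = None,
                  log_level: Optional[str] = None,
                  base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build an environment mapping with the FOLDER2MD_* overrides applied."""
    # Copy so the caller's (or the process's) environment is never mutated.
    env = dict(os.environ if base is None else base)
    if config_path is not None:
        env["FOLDER2MD_CONFIG"] = config_path
    if update_check is not None:
        # Assumption: "true" is accepted as the counterpart of "false".
        env["FOLDER2MD_UPDATE_CHECK"] = "true" if update_check else "false"
    if log_level is not None:
        env["FOLDER2MD_LOG_LEVEL"] = log_level.upper()
    return env


# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(["folder2md", "--dry-run", "--verbose"],
#                  env=folder2md_env(log_level="debug"), check=True)
```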
Configuration Examples
Large Codebase
token_limit: 150000
smart_condensing: true
condense_strategy: aggressive
parallel: true
max_workers: 8
include_patterns:
- "src/**/*.{py,js,ts}"
- "lib/**/*.{py,js,ts}"
- "*.md"
exclude_patterns:
- "**/node_modules/**"
- "**/__pycache__/**"
- "**/dist/**"
- "**/build/**"
- "tests/**"
Documentation Project
include_docs: true
token_limit: 200000
include_patterns:
- "docs/**/*.{md,rst}"
- "*.md"
- "**/*.{pdf,docx}"
exclude_patterns:
- "docs/build/**"
- "**/.git/**"
Research Repository
include_docs: true
binary_analysis: true
token_limit: 100000
include_patterns:
- "**/*.{py,r,md,tex,pdf,docx}"
- "data/**/*.csv"
- "notebooks/**/*.ipynb"
- "scripts/**"
exclude_patterns:
- "**/.git/**"
- "**/output/**"
- "**/__pycache__/**"
Tips and Best Practices
Pattern Matching
- Use ** for recursive directory matching
- Use * for single-level wildcards
- Use {} for multiple extensions: **/*.{py,js,ts}
- Patterns are case-sensitive on Linux/macOS
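The {} shorthand is easiest to read as an abbreviation for several plain patterns. The Python sketch below spells out that expansion. It illustrates the notation only and makes no claim about how Folder2MD4LLMs parses patterns internally; the function name expand_braces is hypothetical, and nested braces are not handled.

```python
import re
from typing import List

# Matches the first brace group that contains no further braces.
_BRACE_GROUP = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand each {a,b,c} group into one plain pattern per alternative."""
    match = _BRACE_GROUP.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        # Recurse so that later brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


print(expand_braces("**/*.{py,js,ts}"))
# ['**/*.py', '**/*.js', '**/*.ts']
```

Read this way, "src/**/*.{py,js,ts}" in the Large Codebase example stands for three separate include patterns.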
Performance Optimization
- Use specific include patterns rather than broad exclusions
- Set appropriate max_file_size limits
- Enable parallel processing for large projects
- Use condense_strategy: aggressive for very large codebases
Debugging Configuration
# Test patterns without processing
folder2md --dry-run --verbose
# Show what files would be included
folder2md --dry-run --include-pattern "src/**/*.py"
# Validate configuration syntax
folder2md --config-check folder2md.yaml
Common Pitfalls
- Remember to escape special characters in patterns
- Be careful with very broad include patterns
- Test with --dry-run before processing large directories
- Check that token limits match your LLM's context window
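For the last pitfall, one way to pick a token_limit is to start from the model's context window and subtract room for your own prompt and the model's reply. The sketch below is plain arithmetic under stated assumptions: the function name and the default reserves are hypothetical, the 1,000 to 1,000,000 bounds are the validation range given earlier, and the tool's token count may not match your model's tokenizer exactly, so some extra headroom is sensible.

```python
def suggest_token_limit(context_window: int,
                        reserved_for_prompt: int = 2_000,
                        reserved_for_reply: int = 4_000) -> int:
    """Suggest a token_limit that leaves space for the prompt and the reply."""
    available = context_window - reserved_for_prompt - reserved_for_reply
    if available < 1_000:
        # Below the documented minimum for token limits.
        raise ValueError("context window too small for the requested reserves")
    # Cap at the documented maximum for token limits.
    return min(available, 1_000_000)


print(suggest_token_limit(128_000))
# 122000
```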
For more advanced configuration scenarios, see the Examples section.