Configuration¶

Folder2MD4LLMs reads its settings from YAML configuration files. This guide covers every configuration option and how to use it.

Configuration File Location¶

Folder2MD4LLMs looks for configuration files in this order:

1. File specified with the --config flag
2. folder2md.yaml in the current directory
3. folder2md.yml in the current directory
4. .folder2md4llms/config.yaml in the home directory
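
The lookup order above can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: `find_config` and its parameters are hypothetical names, and falling through to the next candidate when the `--config` path does not exist is an assumption (the real tool may report an error instead).

```python
from pathlib import Path
from typing import List, Optional


def find_config(explicit: Optional[str] = None,
                cwd: Optional[Path] = None,
                home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file in the documented order.

    Hypothetical helper for illustration; not part of Folder2MD4LLMs.
    """
    cwd = cwd or Path.cwd()
    home = home or Path.home()

    candidates: List[Path] = []
    if explicit:
        candidates.append(Path(explicit))           # 1. --config flag
    candidates += [
        cwd / "folder2md.yaml",                     # 2. current directory (.yaml)
        cwd / "folder2md.yml",                      # 3. current directory (.yml)
        home / ".folder2md4llms" / "config.yaml",   # 4. home directory
    ]

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no configuration file found
```

Calling `find_config()` with no arguments searches the real current and home directories; passing `cwd` and `home` explicitly makes the lookup easy to test.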

Basic Configuration¶

Create a folder2md.yaml file in your project root:

# Basic output settings
output: project-summary.md
token_limit: 80000

# File filtering
include_patterns:
  - "src/**/*.py"
  - "**/*.md"
  - "requirements.txt"

exclude_patterns:
  - "tests/**"
  - "**/__pycache__/**"
  - "*.log"

Complete Configuration Reference¶

Output Control¶

# Output file path (relative or absolute)
output: "output.md"

# Output format (currently only markdown is supported)
format: "markdown"

# Include file tree in output
include_tree: true

# Include statistics in output
include_stats: true

Token and Size Limits¶

# Maximum tokens in output
token_limit: 80000

# Maximum characters in output (alternative to token_limit)
char_limit: 400000

# Maximum size for individual files (bytes)
max_file_size: 1048576  # 1 MiB (1024 * 1024 bytes)

# Maximum tokens per file chunk
max_tokens_per_chunk: 4000
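
To make the sample numbers above concrete, here is a small arithmetic sketch in Python. The ratio it prints is only what the two sample limits work out to, not a documented conversion factor, and `chunks_needed` is a hypothetical helper that assumes a file is split into chunks of at most max_tokens_per_chunk tokens.

```python
import math

# Sample values copied from the reference settings above.
TOKEN_LIMIT = 80_000
CHAR_LIMIT = 400_000
MAX_FILE_SIZE = 1_048_576
MAX_TOKENS_PER_CHUNK = 4_000


def chunks_needed(file_tokens: int,
                  max_tokens_per_chunk: int = MAX_TOKENS_PER_CHUNK) -> int:
    """Chunks required for a file, assuming each chunk holds at most
    max_tokens_per_chunk tokens (hypothetical helper)."""
    return math.ceil(file_tokens / max_tokens_per_chunk)


print(MAX_FILE_SIZE == 1024 * 1024)   # True: the sample size is exactly 1 MiB
print(CHAR_LIMIT / TOKEN_LIMIT)       # 5.0 characters per token implied by the samples
print(chunks_needed(10_000))          # 3
```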

Smart Condensing¶

# Enable intelligent code condensing
smart_condensing: true

# Condensing strategy: conservative, balanced, aggressive
condense_strategy: "balanced"

# Languages to condense (empty = all supported)
condense_languages:
  - "python"
  - "javascript"
  - "typescript"

# Token budget strategy: conservative, balanced, aggressive
token_budget_strategy: "balanced"

# Files to never condense
preserve_patterns:
  - "**/main.py"
  - "**/config.py"
  - "**/__init__.py"

File Filtering¶

# Patterns for files to include
include_patterns:
  - "src/**/*.py"
  - "lib/**/*.js"
  - "**/*.md"
  - "package.json"
  - "requirements.txt"

# Patterns for files to exclude
exclude_patterns:
  - "**/__pycache__/**"
  - "**/node_modules/**"
  - "**/.git/**"
  - "**/dist/**"
  - "**/build/**"
  - "*.log"
  - "*.tmp"

# Respect .gitignore files
use_gitignore: true

# Maximum depth for directory traversal
max_depth: 10

Document Processing¶

# Enable document conversion (PDF, DOCX, etc.)
include_docs: true

# Enable binary file analysis
binary_analysis: true

# Document formats to process
doc_formats:
  - "pdf"
  - "docx"
  - "xlsx"
  - "pptx"
  - "ipynb"

Performance Options¶

# Enable parallel processing
parallel: true

# Number of worker threads (0 = auto)
max_workers: 4

# Memory limit warning threshold (MB)
memory_limit: 1000

# Enable progress display
show_progress: true

# Verbose output
verbose: false

Advanced Options¶

# Custom templates directory
templates_dir: ".folder2md4llms/templates"

# Skip update checks
skip_update_check: true

# Custom ignore file names
ignore_files:
  - ".folder2md_ignore"
  - ".folder2mdignore"

# Follow symbolic links
follow_symlinks: false

# Include hidden files
include_hidden: false

Configuration Validation¶

Folder2MD4LLMs validates configuration files and reports invalid settings with descriptive error messages:

# Test your configuration
folder2md --config folder2md.yaml --dry-run --verbose

Common validation errors:

Token limits: Must be between 1,000 and 1,000,000
Worker count: Must be between 1 and 32
File size limits: Must be positive integers
Strategy values: Must be one of: conservative, balanced, aggressive
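
The four rules above can be expressed as a short pre-flight check in Python. This is a sketch under stated assumptions, not the tool's validator: `validate_config` and its error strings are hypothetical, and the rules are mapped onto the `token_limit`, `max_workers`, `max_file_size`, `condense_strategy`, and `token_budget_strategy` keys from the reference above. It applies the rules exactly as listed, so it does not model the "0 = auto" worker setting mentioned in the Performance Options comment.

```python
from typing import Any, Dict, List

STRATEGIES = {"conservative", "balanced", "aggressive"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return error messages; an empty list means no listed rule was violated.

    Hypothetical helper for illustration; keys that are absent are skipped.
    """
    errors: List[str] = []

    token_limit = config.get("token_limit")
    if token_limit is not None and not (
        _is_int(token_limit) and 1_000 <= token_limit <= 1_000_000
    ):
        errors.append("token_limit must be between 1,000 and 1,000,000")

    max_workers = config.get("max_workers")
    if max_workers is not None and not (
        _is_int(max_workers) and 1 <= max_workers <= 32
    ):
        errors.append("max_workers must be between 1 and 32")

    max_file_size = config.get("max_file_size")
    if max_file_size is not None and not (
        _is_int(max_file_size) and max_file_size > 0
    ):
        errors.append("max_file_size must be a positive integer")

    for key in ("condense_strategy", "token_budget_strategy"):
        value = config.get(key)
        if value is not None and not (
            isinstance(value, str) and value in STRATEGIES
        ):
            errors.append(f"{key} must be one of: conservative, balanced, aggressive")

    return errors
```

For example, `validate_config({"token_limit": 500})` returns a single message about the token limit, while a configuration that satisfies every rule returns an empty list.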

Environment Variables¶

Override configuration with environment variables:

# Custom config file location
export FOLDER2MD_CONFIG=/path/to/config.yaml

# Disable update checks
export FOLDER2MD_UPDATE_CHECK=false

# Set log level
export FOLDER2MD_LOG_LEVEL=DEBUG
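
As a rough illustration of how these three variables could be read, the Python sketch below collects them into a dictionary. It is not the tool's implementation: `read_env_overrides`, the result keys (`config_path`, `update_check`, `log_level`), and the set of strings treated as false are all assumptions made for the example.

```python
import os
from typing import Dict, Mapping, Optional

# Strings treated as "false" for FOLDER2MD_UPDATE_CHECK (an assumption).
FALSE_VALUES = {"false", "0", "no", "off"}


def read_env_overrides(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Collect the documented FOLDER2MD_* variables that are set.

    Hypothetical helper; the result keys are illustrative names.
    """
    env = os.environ if env is None else env
    overrides: Dict[str, object] = {}

    if "FOLDER2MD_CONFIG" in env:
        overrides["config_path"] = env["FOLDER2MD_CONFIG"]
    if "FOLDER2MD_UPDATE_CHECK" in env:
        raw = env["FOLDER2MD_UPDATE_CHECK"].strip().lower()
        overrides["update_check"] = raw not in FALSE_VALUES
    if "FOLDER2MD_LOG_LEVEL" in env:
        overrides["log_level"] = env["FOLDER2MD_LOG_LEVEL"].strip().upper()

    return overrides
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in tests.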

Configuration Examples¶

Large Codebase¶

token_limit: 150000
smart_condensing: true
condense_strategy: aggressive
parallel: true
max_workers: 8

include_patterns:
  - "src/**/*.{py,js,ts}"
  - "lib/**/*.{py,js,ts}"
  - "*.md"

exclude_patterns:
  - "**/node_modules/**"
  - "**/__pycache__/**"
  - "**/dist/**"
  - "**/build/**"
  - "tests/**"

Documentation Project¶

include_docs: true
token_limit: 200000

include_patterns:
  - "docs/**/*.{md,rst}"
  - "*.md"
  - "**/*.{pdf,docx}"

exclude_patterns:
  - "docs/build/**"
  - "**/.git/**"

Research Repository¶

include_docs: true
binary_analysis: true
token_limit: 100000

include_patterns:
  - "**/*.{py,r,md,tex,pdf,docx}"
  - "data/**/*.csv"
  - "notebooks/**/*.ipynb"
  - "scripts/**"

exclude_patterns:
  - "**/.git/**"
  - "**/output/**"
  - "**/__pycache__/**"

Tips and Best Practices¶

Pattern Matching¶

Use ** for recursive directory matching
Use * for single-level wildcards
Use {} for multiple extensions: **/*.{py,js,ts}
Patterns are case-sensitive on Linux/macOS
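
The pattern rules above can be illustrated with a small Python matcher. It is a sketch of the described semantics, not the matcher the tool uses: `expand_braces`, `glob_to_regex`, and `matches` are hypothetical helpers, paths are assumed to be POSIX-style and relative to the project root, and `*` is assumed never to cross a `/`.

```python
import re
from typing import List


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b,c} groups into one pattern per alternative."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a brace-free glob into a case-sensitive regex."""
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(r".*")            # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append(r"[^/]*")         # single level: never crosses "/"
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def matches(path: str, pattern: str) -> bool:
    """True if the path matches any brace-expanded form of the pattern."""
    return any(glob_to_regex(p).match(path) for p in expand_braces(pattern))


print(expand_braces("**/*.{py,js,ts}"))            # ['**/*.py', '**/*.js', '**/*.ts']
print(matches("src/pkg/mod.py", "src/**/*.py"))    # True
print(matches("logs/app.log", "*.log"))            # False
print(matches("src/Mod.PY", "src/**/*.py"))        # False
```

Under these assumptions, `*.log` matches only files at the top level, while `**/*.log` matches at any depth.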

Performance Optimization¶

Use specific include patterns rather than broad exclusions
Set appropriate max_file_size limits
Enable parallel processing for large projects
Use condense_strategy: aggressive for very large codebases

Debugging Configuration¶

# Test patterns without processing
folder2md --dry-run --verbose

# Show what files would be included
folder2md --dry-run --include-pattern "src/**/*.py"

# Validate configuration syntax
folder2md --config-check folder2md.yaml

Common Pitfalls¶

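Quote patterns in YAML and escape special characters: an unquoted value that begins with * is read as a YAML alias, and one that begins with { as a flow mapping
Avoid very broad include patterns such as "**/*", which can pull in far more files than intended
Test with --dry-run before processing large directories
Check that token_limit fits within your LLM's context window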
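
For the last point, a quick budget check can be sketched in Python. The function name, the `reserved_tokens` parameter, and the 128,000-token window are illustrative assumptions; substitute the limits of the model you actually use.

```python
def fits_context_window(token_limit: int,
                        context_window: int,
                        reserved_tokens: int = 0) -> bool:
    """True if the output budget plus tokens reserved for the prompt and
    the model's reply fits in the context window (hypothetical helper)."""
    return token_limit + reserved_tokens <= context_window


# 128_000 is an example window size, not a value from this guide.
print(fits_context_window(80_000, 128_000, reserved_tokens=8_000))   # True
print(fits_context_window(150_000, 128_000))                         # False
```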

For more advanced configuration scenarios, see the Examples section.