
Configuration

Folder2MD4LLMs reads its settings from a YAML configuration file. This guide lists the available options and shows how to combine them.

Configuration File Location

Folder2MD4LLMs looks for configuration files in this order:

  1. File specified with --config flag
  2. folder2md.yaml in current directory
  3. folder2md.yml in current directory
  4. .folder2md4llms/config.yaml in home directory
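
The lookup order above can be sketched in Python. This is an illustration, not the tool's own code: the function name `find_config` and its parameters are hypothetical, and the sketch assumes that a `--config` path pointing at a missing file falls through to the next candidate, which the real CLI may handle differently (for example by reporting an error).

```python
from pathlib import Path
from typing import List, Optional


def find_config(cli_path: Optional[str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first existing config file, following the documented order.

    Hypothetical helper: it mirrors the list above, not the tool's internals.
    """
    candidates: List[Path] = []
    if cli_path:
        candidates.append(Path(cli_path))                        # 1. --config flag
    candidates.append(cwd / "folder2md.yaml")                    # 2. current directory (.yaml)
    candidates.append(cwd / "folder2md.yml")                     # 3. current directory (.yml)
    candidates.append(home / ".folder2md4llms" / "config.yaml")  # 4. home directory

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # no configuration file found
```

A call such as `find_config(None, Path.cwd(), Path.home())` returns the path that would win when no `--config` flag is given, or `None` if none of the candidates exist.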

Basic Configuration

Create a folder2md.yaml file in your project root:

# Basic output settings
output: project-summary.md
token_limit: 80000

# File filtering
include_patterns:
  - "src/**/*.py"
  - "**/*.md"
  - "requirements.txt"

exclude_patterns:
  - "tests/**"
  - "**/__pycache__/**"
  - "*.log"

Complete Configuration Reference

Output Control

# Output file path (relative or absolute)
output: "output.md"

# Output format (currently only markdown is supported)
format: "markdown"

# Include file tree in output
include_tree: true

# Include statistics in output
include_stats: true

Token and Size Limits

# Maximum tokens in output
token_limit: 80000

# Maximum characters in output (alternative to token_limit)
char_limit: 400000

# Maximum size for individual files (bytes)
max_file_size: 1048576  # 1 MiB (1024 * 1024 bytes)

# Maximum tokens per file chunk
max_tokens_per_chunk: 4000
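
The sample values above are related by simple arithmetic: 400000 characters against 80000 tokens is 5 characters per token, and an 80000-token output splits into 20 chunks of 4000 tokens. The sketch below works through that arithmetic. The 5-characters-per-token ratio is only an assumption read off these sample values. Real tokenizers vary by model and content, and the function names are hypothetical rather than part of the tool.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 5.0) -> int:
    """Rough token estimate from character count.

    The default ratio is an assumption taken from the sample limits
    (char_limit 400000 / token_limit 80000), not a measured value.
    """
    return math.ceil(len(text) / chars_per_token)


def chunks_needed(total_tokens: int, max_tokens_per_chunk: int = 4000) -> int:
    """Number of chunks required to hold total_tokens."""
    return math.ceil(total_tokens / max_tokens_per_chunk)


print(estimate_tokens("x" * 400_000))  # 80000
print(chunks_needed(80_000))           # 20
```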

Smart Condensing

# Enable intelligent code condensing
smart_condensing: true

# Condensing strategy: conservative, balanced, aggressive
condense_strategy: "balanced"

# Languages to condense (empty = all supported)
condense_languages:
  - "python"
  - "javascript"
  - "typescript"

# Token budget strategy: conservative, balanced, aggressive
token_budget_strategy: "balanced"

# Files to never condense
preserve_patterns:
  - "**/main.py"
  - "**/config.py"
  - "**/__init__.py"

File Filtering

# Patterns for files to include
include_patterns:
  - "src/**/*.py"
  - "lib/**/*.js"
  - "**/*.md"
  - "package.json"
  - "requirements.txt"

# Patterns for files to exclude
exclude_patterns:
  - "**/__pycache__/**"
  - "**/node_modules/**"
  - "**/.git/**"
  - "**/dist/**"
  - "**/build/**"
  - "*.log"
  - "*.tmp"

# Respect .gitignore files
use_gitignore: true

# Maximum depth for directory traversal
max_depth: 10
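
To reason about how include and exclude patterns interact, a small Python model can help. It rests on assumptions that this guide does not confirm: a file is selected when it matches at least one include pattern and no exclude pattern, patterns are matched against the whole relative path with `/` separators, `**` spans any number of directories, and `*` stays within one path segment. The helper names are hypothetical, and brace patterns such as `{py,js}` are not handled here.

```python
import re
from typing import Iterable


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a glob into a regex (assumed semantics, see the text above)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_selected(path: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    """True if path matches some include pattern and no exclude pattern."""
    included = any(glob_to_regex(p).fullmatch(path) for p in include)
    excluded = any(glob_to_regex(p).fullmatch(path) for p in exclude)
    return included and not excluded


include = ["src/**/*.py", "**/*.md", "requirements.txt"]
exclude = ["tests/**", "**/__pycache__/**", "*.log"]
print(is_selected("src/app/main.py", include, exclude))  # True
print(is_selected("tests/README.md", include, exclude))  # False
```

Under these assumptions, `tests/README.md` matches the include pattern `**/*.md` but is dropped because `tests/**` excludes it. Running with `--dry-run` remains the way to confirm what the tool itself selects.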

Document Processing

# Enable document conversion (PDF, DOCX, etc.)
include_docs: true

# Enable binary file analysis
binary_analysis: true

# Document formats to process
doc_formats:
  - "pdf"
  - "docx"
  - "xlsx"
  - "pptx"
  - "ipynb"

Performance Options

# Enable parallel processing
parallel: true

# Number of worker threads (0 = auto)
max_workers: 4

# Memory limit warning threshold (MB)
memory_limit: 1000

# Enable progress display
show_progress: true

# Verbose output
verbose: false

Advanced Options

# Custom templates directory
templates_dir: ".folder2md4llms/templates"

# Skip update checks
skip_update_check: true

# Custom ignore file names
ignore_files:
  - ".folder2md_ignore"
  - ".folder2mdignore"

# Follow symbolic links
follow_symlinks: false

# Include hidden files
include_hidden: false

Configuration Validation

Folder2MD4LLMs validates configuration files and reports an error message for each invalid setting:

# Test your configuration
folder2md --config folder2md.yaml --dry-run --verbose

Common validation errors come from breaking these constraints:

  • Token limits: Must be between 1,000 and 1,000,000
  • Worker count: Must be between 1 and 32
  • File size limits: Must be positive integers
  • Strategy values: Must be one of: conservative, balanced, aggressive
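
The constraints above can be checked before running the tool with a short pre-flight script. The sketch below is not the tool's validator. The function name and error wording are hypothetical, and the mapping of each rule to an option key (`token_limit`, `max_workers`, `max_file_size`, `condense_strategy`, `token_budget_strategy`) is an assumption based on the option names in this guide. It applies the listed worker range literally, so a value of 0 is reported as invalid here.

```python
from typing import Any, Dict, List

STRATEGIES = {"conservative", "balanced", "aggressive"}


def _is_int(value: Any) -> bool:
    """True for real integers; bool is excluded although it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of error messages; an empty list means no rule was broken."""
    errors: List[str] = []

    value = config.get("token_limit")
    if value is not None and not (_is_int(value) and 1_000 <= value <= 1_000_000):
        errors.append("token_limit must be between 1,000 and 1,000,000")

    value = config.get("max_workers")
    if value is not None and not (_is_int(value) and 1 <= value <= 32):
        errors.append("max_workers must be between 1 and 32")

    value = config.get("max_file_size")
    if value is not None and not (_is_int(value) and value > 0):
        errors.append("max_file_size must be a positive integer")

    for key in ("condense_strategy", "token_budget_strategy"):
        value = config.get(key)
        if value is not None and value not in STRATEGIES:
            errors.append(f"{key} must be one of: conservative, balanced, aggressive")

    return errors
```

The input is a plain dictionary, so it can come from any YAML loader. Options that are absent are skipped rather than reported.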

Environment Variables

Override configuration with environment variables:

# Custom config file location
export FOLDER2MD_CONFIG=/path/to/config.yaml

# Disable update checks
export FOLDER2MD_UPDATE_CHECK=false

# Set log level
export FOLDER2MD_LOG_LEVEL=DEBUG
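
When wrapping the tool in a script, it can be useful to read the same variables. The sketch below does so in Python. The variable names come from this guide. The returned keys (`config_path`, `update_check`, `log_level`) and the set of strings treated as false are assumptions made for illustration, and the tool's own parsing may differ.

```python
import os
from typing import Any, Dict, Mapping

# Strings treated as "false" in this sketch (an assumption, not documented behavior).
FALSE_VALUES = {"false", "0", "no", "off"}


def read_env_overrides(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect the FOLDER2MD_* variables that are set into a dictionary."""
    overrides: Dict[str, Any] = {}
    if "FOLDER2MD_CONFIG" in env:
        overrides["config_path"] = env["FOLDER2MD_CONFIG"]
    if "FOLDER2MD_UPDATE_CHECK" in env:
        raw = env["FOLDER2MD_UPDATE_CHECK"].strip().lower()
        overrides["update_check"] = raw not in FALSE_VALUES
    if "FOLDER2MD_LOG_LEVEL" in env:
        overrides["log_level"] = env["FOLDER2MD_LOG_LEVEL"].strip().upper()
    return overrides
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.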

Configuration Examples

Large Codebase

token_limit: 150000
smart_condensing: true
condense_strategy: aggressive
parallel: true
max_workers: 8

include_patterns:
  - "src/**/*.{py,js,ts}"
  - "lib/**/*.{py,js,ts}"
  - "*.md"

exclude_patterns:
  - "**/node_modules/**"
  - "**/__pycache__/**"
  - "**/dist/**"
  - "**/build/**"
  - "tests/**"

Documentation Project

include_docs: true
token_limit: 200000

include_patterns:
  - "docs/**/*.{md,rst}"
  - "*.md"
  - "**/*.{pdf,docx}"

exclude_patterns:
  - "docs/build/**"
  - "**/.git/**"

Research Repository

include_docs: true
binary_analysis: true
token_limit: 100000

include_patterns:
  - "**/*.{py,r,md,tex,pdf,docx}"
  - "data/**/*.csv"
  - "notebooks/**/*.ipynb"
  - "scripts/**"

exclude_patterns:
  - "**/.git/**"
  - "**/output/**"
  - "**/__pycache__/**"

Tips and Best Practices

Pattern Matching

  • Use ** to match any number of nested directories
  • Use * to match any characters within a single path segment
  • Use {} to list alternative extensions: **/*.{py,js,ts}
  • Patterns are case-sensitive on Linux/macOS
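
A brace pattern is shorthand for several plain patterns. The Python sketch below shows one way to expand it, assuming shell-style brace semantics. Whether the tool expands braces this way internally is not stated in this guide, and `expand_braces` is a hypothetical helper that does not handle nested braces.

```python
import re
from typing import List

# Matches one brace group that contains no further braces.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b,c} groups into separate patterns, left to right."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded: List[str] = []
    for option in match.group(1).split(","):
        # Recurse so that any later brace groups are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


print(expand_braces("**/*.{py,js,ts}"))  # ['**/*.py', '**/*.js', '**/*.ts']
```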

Performance Optimization

  • Use specific include patterns rather than broad exclusions
  • Set appropriate max_file_size limits
  • Enable parallel processing for large projects
  • Use condense_strategy: aggressive for very large codebases

Debugging Configuration

# Test patterns without processing
folder2md --dry-run --verbose

# Show what files would be included
folder2md --dry-run --include-pattern "src/**/*.py"

# Validate configuration syntax
folder2md --config-check folder2md.yaml

Common Pitfalls

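  • Quote patterns in YAML (an unquoted value that starts with * is parsed as an alias), and escape special characters that should match literally
  • Avoid very broad include patterns such as **/*, which can pull in far more files than intended
  • Test with --dry-run before processing large directories
  • Check that token_limit fits within your LLM's context window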

For more advanced configuration scenarios, see the Examples section.