Convert folder structures into LLM-friendly Markdown with smart condensing and document conversion.

Folder2MD4LLMs

Transform any codebase or document collection into a single, LLM-optimized Markdown file with intelligent content condensing, document conversion, and binary file analysis.

🚀 Quick Start

Basic workflow:

# Install folder2md4llms
pipx install folder2md4llms

# Convert your project
folder2md /path/to/your/project

# With smart condensing
folder2md /path/to/your/project --token-limit 80000

✨ Why Folder2MD4LLMs?

  • LLM-Optimized: Purpose-built for AI assistant consumption
  • Smart Condensing: Keeps important code intact when output must fit a token limit
  • Document Conversion: PDFs, DOCX, XLSX automatically converted
  • Binary Analysis: Intelligent descriptions for images and archives
  • Parallel Processing: Multi-threaded file processing
  • Token Management: Precise control over output size
  • Configuration-Driven: YAML configuration for complex setups
  • Format Support: Multiple document formats supported
  • Version Control Friendly: Git-aware file filtering
  • Hierarchical Patterns: Multiple ignore files supported
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Reproducible: Consistent output across environments

🔥 Key Features

🧠 Smart Condensing Engine

Intelligent approach to content condensing that preserves code semantics:

  • AST-Based Analysis: Understands code structure, not just text
  • Priority Preservation: Keeps important functions and classes intact
  • Token-Aware: Precise token counting for different LLMs
  • Incremental Processing: Condenses only when necessary

# folder2md.yaml
smart_condensing: true
token_limit: 80000
condense_languages: [python, javascript, typescript]
preserve_patterns: ["**/main.py", "**/config.py"]
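
To illustrate what AST-based condensing can mean in practice, here is a minimal sketch using Python's standard `ast` module. It is not the engine Folder2MD4LLMs ships; the function name `condense_source` and the exact output shape are assumptions made for this example. It keeps imports, class and function signatures, and the first docstring line, and replaces bodies with `...`. It requires Python 3.9+ for `ast.unparse`, and it drops decorators and return annotations for brevity.

```python
import ast


def condense_source(source: str) -> str:
    """Return a skeleton of the module: imports, signatures, docstring summaries."""
    lines: list[str] = []

    def visit(body: list[ast.stmt], indent: int) -> None:
        pad = "    " * indent
        for node in body:
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                lines.append(pad + ast.unparse(node))
            elif isinstance(node, ast.ClassDef):
                bases = ", ".join(ast.unparse(base) for base in node.bases)
                lines.append(f"{pad}class {node.name}({bases}):" if bases
                             else f"{pad}class {node.name}:")
                before = len(lines)
                visit(node.body, indent + 1)
                if len(lines) == before:
                    # Nothing worth keeping inside the class: leave a placeholder.
                    lines.append(pad + "    ...")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                lines.append(f"{pad}{keyword} {node.name}({ast.unparse(node.args)}):")
                doc = ast.get_docstring(node)
                if doc:
                    # Keep only the summary line of the docstring.
                    lines.append(f'{pad}    """{doc.splitlines()[0]}"""')
                lines.append(pad + "    ...")

    visit(ast.parse(source).body, 0)
    return "\n".join(lines)
```

Because the skeleton is built from the syntax tree rather than from text matching, a `def` inside a string or comment is never mistaken for a function.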

📚 Document Conversion

Seamlessly handle diverse file formats:

  • PDF Documents - Extract text and structure
  • Microsoft Office - DOCX, XLSX, PPTX support
  • Scientific Formats - LaTeX, Jupyter notebooks
  • Archive Files - ZIP, TAR content analysis

# Convert project with documents
folder2md ./research-project --include-docs

🔍 Binary File Analysis

Intelligent analysis of non-text files:

  • Image Recognition - Identify content and dimensions
  • Archive Inspection - List contents without extraction
  • Media Analysis - Duration, format, and metadata
  • Custom Handlers - Extensible analysis framework
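
As a rough illustration of archive inspection without extraction, the sketch below reads only a ZIP file's central directory using the standard `zipfile` module. The function name `describe_zip`, the `max_entries` parameter, and the output format are assumptions for this example, not the tool's actual handler.

```python
import io
import zipfile


def describe_zip(data: bytes, max_entries: int = 20) -> str:
    """Summarize a ZIP archive from its directory listing, without extracting files."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
    total = sum(info.file_size for info in infos)
    lines = [f"ZIP archive: {len(infos)} entries, {total} bytes uncompressed"]
    for info in infos[:max_entries]:
        lines.append(f"- {info.filename} ({info.file_size} bytes)")
    if len(infos) > max_entries:
        # Cap the listing so a huge archive cannot flood the output.
        lines.append(f"- ... {len(infos) - max_entries} more entries")
    return "\n".join(lines)
```

Capping the listing keeps the description small, which matters when the result is counted against a token budget.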

Advanced Filtering

Sophisticated file selection with multiple ignore patterns:

# Project structure
project/
├── .gitignore                 # Global patterns
├── docs/.folder2md_ignore     # Documentation-specific
└── tests/.folder2md_ignore    # Test-specific patterns

  • Hierarchical Ignore Files: Multiple .folder2md_ignore files
  • Git Integration: Respects .gitignore patterns
  • Custom Patterns: Flexible include/exclude rules
  • Size Limits: Skip large files automatically
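
One way to picture hierarchical ignore files is a directory walk in which each `.folder2md_ignore` applies to its own directory and everything beneath it. The sketch below assumes that scoping rule and uses simple `fnmatch` wildcards rather than full gitignore semantics; `read_patterns` and `collect_files` are hypothetical helper names, not part of the tool.

```python
import fnmatch
from pathlib import Path

IGNORE_NAME = ".folder2md_ignore"


def read_patterns(directory: Path) -> list[str]:
    """Return non-empty, non-comment lines from the directory's ignore file."""
    ignore_file = directory / IGNORE_NAME
    if not ignore_file.is_file():
        return []
    lines = (line.strip() for line in ignore_file.read_text(encoding="utf-8").splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def collect_files(root: Path) -> list[str]:
    """List files under root, honoring ignore files at every directory level."""
    kept: list[str] = []

    def walk(directory: Path, inherited: list[tuple[Path, str]]) -> None:
        # Patterns from parent directories still apply; local ones are added on top.
        scoped = inherited + [(directory, pat) for pat in read_patterns(directory)]
        for entry in sorted(directory.iterdir()):
            if entry.name == IGNORE_NAME:
                continue
            ignored = any(
                fnmatch.fnmatchcase(entry.name, pat)
                or fnmatch.fnmatchcase(entry.relative_to(base).as_posix(), pat)
                for base, pat in scoped
            )
            if ignored:
                continue
            if entry.is_dir():
                walk(entry, scoped)
            else:
                kept.append(entry.relative_to(root).as_posix())

    walk(root, [])
    return kept
```

With this scoping, a `*.tmp` rule in `docs/.folder2md_ignore` hides temporary files under `docs/` but leaves `src/` untouched.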

🛠️ Configuration Flexibility

Powerful configuration system for complex scenarios:

# Example: Large codebase configuration
token_limit: 100000
max_file_size: 1048576  # 1 MiB (1024 * 1024 bytes)
parallel_workers: 8

include_patterns:
  - "src/**/*.py"
  - "docs/**/*.md"
  - "config/**/*.yaml"

exclude_patterns:
  - "**/node_modules/**"
  - "**/__pycache__/**"
  - "**/dist/**"

smart_condensing:
  enabled: true
  strategy: "balanced"  # conservative, balanced, aggressive
  preserve_classes: true
  preserve_functions: true
  preserve_imports: true
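
The interaction between `include_patterns` and `exclude_patterns` can be sketched as follows. This is an illustration under assumed semantics (a path is selected when it matches at least one include pattern and no exclude pattern, and `**/` matches zero or more directories); the helpers `glob_to_regex` and `select_paths` are hypothetical and the tool's real matching rules may differ.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with '**' support into an anchored regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")      # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def select_paths(paths: list[str], include: list[str], exclude: list[str]) -> list[str]:
    """Keep paths matching any include pattern and no exclude pattern."""
    inc = [glob_to_regex(p) for p in include]
    exc = [glob_to_regex(p) for p in exclude]
    return [
        path for path in paths
        if any(rx.match(path) for rx in inc) and not any(rx.match(path) for rx in exc)
    ]
```

Under these assumptions, `src/__pycache__/gen.py` matches `src/**/*.py` but is still dropped, because exclusion takes precedence.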

📊 Token Management

Precise control over output size with multiple strategies:

  • Conservative: Minimal condensing, preserves most code
  • Balanced: Moderate condensing that trades some detail for a smaller output
  • Aggressive: Maximum condensing for large codebases
  • Custom: Define your own condensing rules
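
The strategies above can be pictured with a small sketch that condenses only when the text exceeds the limit. Everything here is an assumption for illustration: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the three toy condensers only stand in for whatever each strategy does in the actual tool.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about four characters per token (real tokenizers differ)."""
    return (len(text) + 3) // 4


def _conservative(text: str) -> str:
    # Drop blank lines only.
    return "\n".join(line for line in text.splitlines() if line.strip())


def _balanced(text: str) -> str:
    # Drop blank lines and full-line comments.
    return "\n".join(
        line for line in text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def _aggressive(text: str) -> str:
    # Keep only imports and definition headers.
    keep = ("import ", "from ", "def ", "class ")
    return "\n".join(
        line.rstrip() for line in text.splitlines() if line.strip().startswith(keep)
    )


STRATEGIES = {
    "conservative": _conservative,
    "balanced": _balanced,
    "aggressive": _aggressive,
}


def apply_strategy(text: str, token_limit: int, strategy: str = "balanced") -> str:
    """Return text unchanged if it fits; otherwise condense with the chosen strategy."""
    if estimate_tokens(text) <= token_limit:
        return text
    return STRATEGIES[strategy](text)
```

A custom strategy would be one more entry in the `STRATEGIES` mapping.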

📝 Use Cases

Folder2MD4LLMs is designed for modern AI-assisted development:

  • 🤖 Code Review: Prepare codebases for AI analysis
  • 📖 Documentation: Generate comprehensive project overviews
  • 🔍 Code Search: Create searchable project summaries
  • 📚 Knowledge Base: Convert documentation collections
  • 🎓 Learning: Analyze open-source projects efficiently

🎯 Real-World Examples

Large Python Project

# Django project with 1000+ files
folder2md ./my-django-app \
  --token-limit 80000 \
  --smart-condensing \
  --exclude-pattern "**/migrations/**" \
  --preserve-pattern "**/models.py"

Documentation Site

# Convert technical documentation
folder2md ./docs \
  --include-docs \
  --format markdown \
  --output docs-summary.md

Research Repository

# Academic project with papers and code
folder2md ./research \
  --include-pattern "**/*.{py,md,tex,pdf}" \
  --binary-analysis \
  --token-limit 100000

🔧 Advanced Features

📈 Performance Optimization

Built for speed and efficiency:

  • Parallel Processing: Multi-threaded file analysis
  • Streaming I/O: Memory-efficient file handling
  • Caching System: Avoid reprocessing unchanged files
  • Progress Tracking: Real-time processing updates
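
A minimal sketch of how parallel processing and caching might combine: files are handled by a thread pool, and results are cached under a hash of the file contents so unchanged files are not reprocessed. The names `summarize` and `process_files` are hypothetical, and the in-memory dictionary is a stand-in for whatever cache the tool uses.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def summarize(data: bytes) -> str:
    """Placeholder for real per-file work such as conversion or condensing."""
    line_count = data.count(b"\n")
    return f"{len(data)} bytes, {line_count} lines"


def process_files(files: dict[str, bytes], cache: dict[str, str],
                  workers: int = 4) -> dict[str, str]:
    """Process files in parallel, reusing cached results for identical content."""

    def process(item: tuple[str, bytes]) -> tuple[str, str]:
        path, data = item
        digest = hashlib.sha256(data).hexdigest()
        if digest not in cache:
            # Two threads may occasionally compute the same entry; the result
            # is identical, so the duplicate write is harmless in this sketch.
            cache[digest] = summarize(data)
        return path, cache[digest]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process, files.items()))
```

Keying the cache on content rather than on path or modification time means renamed or copied files also hit the cache.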

🔒 Security & Privacy

Safe analysis without execution:

  • Static Analysis: No code execution required
  • Sandboxed Processing: Isolated file analysis
  • Privacy Aware: Local processing only
  • Safe Patterns: Avoid sensitive file inclusion

🌐 Cross-Platform Support

Consistent behavior across platforms:

  • Windows: Native binary and pip installation
  • macOS: Universal binaries for Intel and Apple Silicon
  • Linux: Portable binaries and package managers
  • Docker: Container images available

🤝 Community & Support

  • GitHub: Report issues, request features, and contribute to development
  • Discussions: Ask questions, share tips, and connect with other users
  • Examples: Learn from real-world configuration examples
  • Documentation: Comprehensive guides and API reference

📄 Citation

If Folder2MD4LLMs helps your research or development, please cite:

@software{folder2md4llms_2025,
  title={Folder2MD4LLMs: Smart Repository-to-Markdown Conversion},
  author={HenriquesLab Team},
  year={2025},
  url={https://github.com/HenriquesLab/folder2md4llms},
  note={Version 0.4.x}
}