Convert folder structures into LLM-friendly Markdown with smart condensing and document conversion.

Transform any codebase or document collection into a single, LLM-optimized Markdown file with intelligent content condensing, document conversion, and binary file analysis.
🚀 Quick Start
Basic workflow:
# Install folder2md4llms
pipx install folder2md4llms
# Convert your project
folder2md /path/to/your/project
# With smart condensing
folder2md /path/to/your/project --token-limit 80000
✨ Why Folder2MD4LLMs?
- LLM-Optimized: Purpose-built for AI assistant consumption
- Smart Condensing: Keeps important code intact when the output must fit a token limit
- Document Conversion: PDFs, DOCX, XLSX automatically converted
- Binary Analysis: Intelligent descriptions for images and archives
- Parallel Processing: Multi-threaded file processing
- Token Management: Precise control over output size
- Configuration-Driven: YAML configuration for complex setups
- Format Support: Multiple document formats supported
- Version Control Friendly: Git-aware file filtering
- Hierarchical Patterns: Multiple ignore files supported
- Cross-Platform: Works on Windows, macOS, and Linux
- Reproducible: Consistent output across environments
🔥 Key Features
🧠 Smart Condensing Engine
Intelligent approach to content condensing that preserves code semantics:
- AST-Based Analysis: Understands code structure, not just text
- Priority Preservation: Keeps important functions and classes intact
- Token-Aware: Precise token counting for different LLM models
- Incremental Processing: Condenses only when necessary
# folder2md.yaml
smart_condensing: true
token_limit: 80000
condense_languages: [python, javascript, typescript]
preserve_patterns: ["**/main.py", "**/config.py"]
📚 Document Conversion
Seamlessly handle diverse file formats:
- PDF Documents - Extract text and structure
- Microsoft Office - DOCX, XLSX, PPTX support
- Scientific Formats - LaTeX, Jupyter notebooks
- Archive Files - ZIP, TAR content analysis
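
To make document conversion concrete, here is a minimal sketch for one of the formats listed above, Jupyter notebooks. A `.ipynb` file is JSON with a `cells` list, so the standard `json` module is enough to flatten it. `notebook_to_markdown` is a hypothetical helper, not part of the Folder2MD4LLMs API, and it assumes code cells contain Python.

```python
import json

FENCE = "`" * 3  # built programmatically to avoid nesting literal fences


def notebook_to_markdown(raw: str) -> str:
    """Flatten notebook JSON into Markdown text (hypothetical helper)."""
    notebook = json.loads(raw)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "markdown":
            parts.append(text)
        elif cell.get("cell_type") == "code":
            # Assumption: code cells are Python; outputs are dropped.
            parts.append(f"{FENCE}python\n{text}\n{FENCE}")
    return "\n\n".join(parts)


example = json.dumps({
    "cells": [
        {"cell_type": "markdown", "source": ["# Title\n", "Some notes."]},
        {"cell_type": "code", "source": "print(1)"},
    ]
})
print(notebook_to_markdown(example))
```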
🔍 Binary File Analysis
Intelligent analysis of non-text files:
- Image Recognition - Identify content and dimensions
- Archive Inspection - List contents without extraction
- Media Analysis - Duration, format, and metadata
- Custom Handlers - Extensible analysis framework
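
Archive inspection without extraction can be sketched with the standard `zipfile` module, which reads only the central directory to list entries. This illustrates the idea and is not the tool's own handler: `describe_zip` and its `max_entries` parameter are hypothetical.

```python
import io
import zipfile


def describe_zip(data: bytes, max_entries: int = 20) -> str:
    """Summarize a ZIP archive's contents without extracting any file."""
    lines = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
        lines.append(f"ZIP archive, entries: {len(infos)}")
        for info in infos[:max_entries]:
            # file_size is the uncompressed size recorded in the archive.
            lines.append(f"- {info.filename} ({info.file_size} bytes)")
        if len(infos) > max_entries:
            lines.append(f"- ... {len(infos) - max_entries} more")
    return "\n".join(lines)


# Build a small archive in memory to try it out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("README.md", "hello")
    archive.writestr("src/main.py", "print('hi')\n")
print(describe_zip(buffer.getvalue()))
```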
⚡ Advanced Filtering
Sophisticated file selection with multiple ignore patterns:
# Project structure
project/
├── .gitignore # Global patterns
├── docs/.folder2md_ignore # Documentation-specific
└── tests/.folder2md_ignore # Test-specific patterns
- Hierarchical Ignore Files: Multiple .folder2md_ignore files
- Git Integration: Respects .gitignore patterns
- Custom Patterns: Flexible include/exclude rules
- Size Limits: Skip large files automatically
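
One way to picture hierarchical ignore files: each ignore file governs only the paths beneath its own directory, and its patterns are matched relative to that directory. The sketch below assumes POSIX-style relative paths and uses `fnmatch`-style globbing, where `*` also matches `/`, so it does not reproduce full `.gitignore` semantics. `is_ignored` and the `ignore_files` mapping are hypothetical names for illustration.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, ignore_files: dict[str, list[str]]) -> bool:
    """Check `path` against every ignore file that governs it.

    `ignore_files` maps a directory (relative to the project root, "" for
    the root itself) to the patterns found in that directory's ignore file.
    """
    for directory, patterns in ignore_files.items():
        prefix = directory.rstrip("/") + "/" if directory else ""
        if not path.startswith(prefix):
            continue  # this ignore file does not govern the path
        relative = path[len(prefix):]
        if any(fnmatchcase(relative, pattern) for pattern in patterns):
            return True
    return False


rules = {
    "": ["*.log"],            # root-level patterns
    "docs": ["drafts/*"],     # documentation-specific
    "tests": ["fixtures/*"],  # test-specific
}
print(is_ignored("docs/drafts/todo.md", rules))
print(is_ignored("src/drafts/todo.md", rules))
```

The same `drafts/*` pattern ignores a file under `docs/` but not under `src/`, because only the `docs` ignore file declares it.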
🛠️ Configuration Flexibility
Powerful configuration system for complex scenarios:
# Example: Large codebase configuration
token_limit: 100000
max_file_size: 1048576 # 1MB
parallel_workers: 8
include_patterns:
- "src/**/*.py"
- "docs/**/*.md"
- "config/**/*.yaml"
exclude_patterns:
- "**/node_modules/**"
- "**/__pycache__/**"
- "**/dist/**"
smart_condensing:
  enabled: true
  strategy: "balanced" # conservative, balanced, aggressive
  preserve_classes: true
  preserve_functions: true
  preserve_imports: true
📊 Token Management
Precise control over output size with multiple strategies:
- Conservative: Minimal condensing, preserves most code
- Balanced: Optimal trade-off between size and completeness
- Aggressive: Maximum condensing for large codebases
- Custom: Define your own condensing rules
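
The interplay between a token limit and a condensing strategy can be sketched as a simple budget plan: estimate tokens per file, and condense the largest files first only while the total exceeds the limit. Everything numeric here is an assumption for illustration: the four-characters-per-token estimate is a crude heuristic rather than a real tokenizer, and the `KEEP_RATIO` values are hypothetical, not the ratios Folder2MD4LLMs uses.

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer

# Hypothetical fraction of a file's tokens that survives condensing.
KEEP_RATIO = {"conservative": 0.8, "balanced": 0.5, "aggressive": 0.2}


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_condensing(files: dict[str, str], token_limit: int,
                    strategy: str = "balanced") -> list[str]:
    """Pick files to condense, largest first, until the estimate fits."""
    sizes = {name: estimate_tokens(text) for name, text in files.items()}
    total = sum(sizes.values())
    keep = KEEP_RATIO[strategy]
    chosen = []
    for name in sorted(sizes, key=sizes.get, reverse=True):
        if total <= token_limit:
            break  # already within budget: leave remaining files untouched
        total -= sizes[name] - math.ceil(sizes[name] * keep)
        chosen.append(name)
    return chosen


files = {"big.py": "x" * 4000, "small.py": "y" * 400}
print(plan_condensing(files, token_limit=2000))
print(plan_condensing(files, token_limit=700))
```

When the project already fits the limit, the plan is empty and nothing is condensed.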
📝 Use Cases
Folder2MD4LLMs is designed for modern AI-assisted development:
- 🤖 Code Review: Prepare codebases for AI analysis
- 📖 Documentation: Generate comprehensive project overviews
- 🔍 Code Search: Create searchable project summaries
- 📚 Knowledge Base: Convert documentation collections
- 🎓 Learning: Analyze open-source projects efficiently
🎯 Real-World Examples
Large Python Project
# Django project with 1000+ files
folder2md ./my-django-app \
--token-limit 80000 \
--smart-condensing \
--exclude-pattern "**/migrations/**" \
--preserve-pattern "**/models.py"
Documentation Site
# Convert technical documentation
folder2md ./docs \
--include-docs \
--format markdown \
--output docs-summary.md
Research Repository
# Academic project with papers and code
folder2md ./research \
--include-pattern "**/*.{py,md,tex,pdf}" \
--binary-analysis \
--token-limit 100000
🔧 Advanced Features
📈 Performance Optimization
Built for speed and efficiency:
- Parallel Processing: Multi-threaded file analysis
- Streaming I/O: Memory-efficient file handling
- Caching System: Avoid reprocessing unchanged files
- Progress Tracking: Real-time processing updates
🔒 Security & Privacy
Safe analysis without execution:
- Static Analysis: No code execution required
- Sandboxed Processing: Isolated file analysis
- Privacy Aware: Local processing only
- Safe Patterns: Avoid sensitive file inclusion
🌐 Cross-Platform Support
Consistent behavior across platforms:
- Windows: Native binary and pip installation
- macOS: Universal binaries for Intel and Apple Silicon
- Linux: Portable binaries and package managers
- Docker: Container images available
🤝 Community & Support
- GitHub: Report issues, request features, and contribute to development
- Discussions: Ask questions, share tips, and connect with other users
- Examples: Learn from real-world configuration examples
- Documentation: Comprehensive guides and API reference
📄 Citation
If Folder2MD4LLMs helps your research or development, please cite:
@software{folder2md4llms_2025,
title={Folder2MD4LLMs: Smart Repository-to-Markdown Conversion},
author={HenriquesLab Team},
year={2025},
url={https://github.com/HenriquesLab/folder2md4llms},
note={Version 0.4.x}
}