Bulk Document Synthesizer
Convert large collections of PDFs and other documents into markdown, analyze them against a relevance matrix, and synthesize findings into a cohesive narrative report with citations.
You have 50 policy documents, research papers, or reports to synthesize into one coherent summary — but reading them all would take weeks, and cherry-picking means you'll miss critical connections between documents.
Who it's for: policy analysts synthesizing government reports, researchers doing literature reviews, consultants producing client deliverables from source documents, executives needing briefings from large document sets, and graduate students writing dissertations.
Example
"Synthesize these 51 policy documents into a vision report" → Each document summarized and scored against your relevance matrix, then synthesized into a cohesive 8-page narrative with footnotes citing specific source documents
Already set up? Copy the template below.
# Document Synthesizer
## Goal
Process a collection of documents into a synthesized report. Convert source documents to markdown, analyze each against a relevance framework, and produce a final synthesis with citations that trace every claim back to its source.
## Directory Structure
- `source/` — Original documents (PDF, DOCX, TXT)
- `markdown/` — Converted plain-text versions of each document
- `summaries/` — Per-document summary and relevance analysis
- `framework/` — Relevance matrix and synthesis criteria
- `output/` — Final synthesized report(s)
## Processing Pipeline
### Phase 1: Convert
Convert all documents in `source/` to markdown in `markdown/`.
- Preserve structure (headings, lists, tables)
- Name files: `01-original-filename.md`, `02-original-filename.md` (numbered)
- Log any conversion issues to `output/conversion-log.md`
### Phase 2: Analyze
For each converted document, generate a summary in `summaries/`:
- 2-3 paragraph summary of key content
- Relevance score (1-5) against each criterion in the framework
- Key quotes or data points worth citing
- Cross-references to other documents covering similar topics
### Phase 3: Synthesize
Produce the final report in `output/`:
- Cohesive narrative that ties all documents together
- Organized by theme, not by source document
- Every factual claim includes a footnote citing the source document
- Executive summary at the top
- Appendix with full document list and relevance scores
## Relevance Framework Format (framework/criteria.md)
```
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Strategic alignment | High | Does it support the stated goals? |
| Data quality | Medium | Are claims backed by evidence? |
| Recency | Medium | How current is the information? |
| Actionability | High | Does it suggest concrete next steps? |
```
## Rules
1. Every claim in the synthesis must cite a source document by number
2. Use footnotes in the format: [^1], [^2], etc. with references at the bottom
3. Do not invent information — only synthesize what's in the documents
4. Flag contradictions between documents explicitly
5. Process documents in batches of 5-10 to manage context
6. The final report should be self-contained — readable without the source docs
## Commands
- "/convert" — Run Phase 1: convert all source documents to markdown
- "/analyze" — Run Phase 2: generate summaries and relevance scores
- "/synthesize" — Run Phase 3: produce the final report
- "/status" — Show processing progress across all phases
- "/search [query]" — Search across all converted documents for a term
- "/contradictions" — List all identified contradictions between documents
What This Does
This playbook processes large collections of documents (PDFs, Word files, text files) into a synthesized report with proper citations. It converts documents to markdown, generates per-document summaries scored against a relevance matrix, then synthesizes everything into a cohesive narrative. It was inspired by a Reddit user who turned 51 policy documents into a coherent 8-page vision document with footnotes in 2-3 hours, a task that would have taken weeks by hand.
Prerequisites
- Claude Code installed and configured
- Documents to analyze (PDFs, DOCX, or text files)
- A clear synthesis goal or research question
Step-by-Step Setup
Step 1: Create the project structure
mkdir -p ~/doc-synthesis/{source,markdown,summaries,framework,output}
cd ~/doc-synthesis
Step 2: Add your source documents
Copy all your PDFs, Word docs, or text files into the source/ folder.
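Phase 1 later writes each converted file into markdown/ with a numeric prefix (`01-original-filename.md`, `02-original-filename.md`, and so on). If you want to preview which number each source file is likely to get, a minimal Python sketch follows. The function name `numbered_markdown_names`, the `SUPPORTED` extension set, and the case-insensitive sort order are assumptions made for illustration; Claude does the actual conversion and may order files differently.

```python
from pathlib import Path

# Assumed extension set, based on the formats this playbook lists.
SUPPORTED = {".pdf", ".docx", ".txt"}


def numbered_markdown_names(source_dir: str) -> dict[str, str]:
    """Map each supported source file name to a numbered markdown file name."""
    # Sort case-insensitively so the numbering is stable between runs
    # (an assumption; the playbook only asks for sequential numbers).
    files = sorted(
        (
            p
            for p in Path(source_dir).iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED
        ),
        key=lambda p: p.name.lower(),
    )
    # Use at least two digits (01..99) and widen for larger collections.
    width = max(2, len(str(len(files))))
    return {
        p.name: f"{i:0{width}d}-{p.stem}.md"
        for i, p in enumerate(files, start=1)
    }
```

Calling `numbered_markdown_names("source")` from the project root returns a dictionary you can compare against the contents of markdown/ after Phase 1 to spot files that were skipped.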
Step 3: Define your relevance framework
Create framework/criteria.md tailored to your synthesis goal:
# Relevance Framework
## Synthesis Goal
Create a unified strategic vision from departmental policy documents.
## Criteria
| Criterion | Weight | Description |
|-----------|--------|-------------|
| Strategic alignment | High | Supports stated organizational goals |
| Evidence quality | Medium | Claims backed by data or case studies |
| Implementation feasibility | High | Practical and actionable recommendations |
| Stakeholder impact | Medium | Affects key stakeholder groups |
| Recency | Low | Published within last 3 years |
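The playbook scores each document 1-5 per criterion but does not say how to combine those scores. If you want a single number per document for the appendix, one option is a weighted mean. The Python sketch below assumes a numeric mapping of High = 3, Medium = 2, Low = 1; that mapping and the helper names `parse_criteria` and `weighted_score` are hypothetical, not part of the playbook.

```python
# Assumed numeric values for the textual weights in framework/criteria.md.
WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}


def parse_criteria(table_md: str) -> dict[str, int]:
    """Read criterion -> numeric weight from a markdown table."""
    criteria: dict[str, int] = {}
    for line in table_md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row, the |---| separator, and malformed rows:
        # none of them has a recognized weight in the second column.
        if len(cells) < 2 or cells[1] not in WEIGHTS:
            continue
        criteria[cells[0]] = WEIGHTS[cells[1]]
    return criteria


def weighted_score(scores: dict[str, int], criteria: dict[str, int]) -> float:
    """Weighted mean of per-criterion scores; stays on the 1-5 scale."""
    total_weight = sum(criteria.values())
    if total_weight == 0:
        raise ValueError("no criteria with a recognized weight")
    return sum(scores[name] * w for name, w in criteria.items()) / total_weight
```

Because the result is a weighted mean, it stays on the same 1-5 scale as the individual scores, which keeps the appendix easy to read.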
Step 4: Save CLAUDE.md and start processing
Save the template above as CLAUDE.md in ~/doc-synthesis, then start Claude Code from that folder:
cd ~/doc-synthesis
claude
Try: "/convert" to start Phase 1.
Example Usage
Process a collection of policy documents:
"/convert — Convert all 30 PDFs in the source folder to markdown. Number them sequentially and log any conversion issues."
Analyze against your framework:
"/analyze — Generate a summary and relevance score for each document. Score against the criteria in framework/criteria.md."
Produce the final synthesis:
"/synthesize — Write an 8-page synthesis report organized by theme. Include footnotes for every claim, an executive summary, and an appendix with the full document list."
Search for specific topics:
"/search budget allocation — Show me every mention of budget allocation across all documents with context."
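To spot-check what "/search" reports, you can run a plain-text search over the converted files yourself. The Python sketch below is one way to do that, assuming the converted files sit in markdown/ as `.md` files; the function name `search_markdown` and the `context` parameter are hypothetical, and this is not how Claude implements the command.

```python
from pathlib import Path


def search_markdown(
    markdown_dir: str, query: str, context: int = 1
) -> list[tuple[str, int, str]]:
    """Case-insensitive substring search over *.md files.

    Returns (file name, 1-based line number, snippet) for each matching line,
    where the snippet includes `context` lines before and after the match.
    """
    hits: list[tuple[str, int, str]] = []
    needle = query.lower()
    for path in sorted(Path(markdown_dir).glob("*.md")):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        for i, line in enumerate(lines):
            if needle in line.lower():
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                hits.append((path.name, i + 1, "\n".join(lines[start:end])))
    return hits
```

Because the file names carry the document number, each hit maps directly to the number used in footnotes.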
Find contradictions:
"/contradictions — Compare documents on overlapping topics and flag any conflicting claims or recommendations."
Export to Word:
"Convert the final synthesis report to a Word document with working hyperlinks in the footnotes."
Tips
- Batch processing: For large collections (50+ documents), process in batches of 5-10 to avoid context limits. Claude can track where it left off.
- Design the approach first: The Reddit user who processed 51 documents spent most of their time designing the approach and writing prompts. The actual processing was mostly waiting and spot-checking.
- Relevance matrix from context: If your documents relate to a specific initiative, have Claude distill the relevance criteria from meeting minutes, mission statements, or project briefs.
- Footnotes are essential: The synthesis is only useful if every claim can be traced back to a source. Insist on footnotes with document numbers.
- Iterative refinement: Run the synthesis once, review the output, then ask Claude to improve specific sections. The first pass gives you structure; subsequent passes add polish.
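Since traceable footnotes are the point of the report, a quick mechanical check is worth running when you spot-check the output. The Python sketch below assumes the report uses the `[^1]` reference style with `[^1]: ...` definitions at the start of a line, as the template's rules describe; the function name `check_footnotes` is hypothetical. It only checks that references and definitions match up, not that a cited document actually supports the claim.

```python
import re

# Inline reference such as [^3], and a definition such as "[^3]: Doc 03, p. 2"
# that starts a line.
REF = re.compile(r"\[\^([^\]\s]+)\]")
DEF = re.compile(r"^\[\^([^\]\s]+)\]:", re.MULTILINE)


def check_footnotes(report_md: str) -> tuple[set[str], set[str]]:
    """Return (references with no definition, definitions never referenced)."""
    defined = set(DEF.findall(report_md))
    # Remove the definition markers first so they are not counted as references.
    body = DEF.sub("", report_md)
    referenced = set(REF.findall(body))
    return referenced - defined, defined - referenced
```

An empty first set means every citation resolves to an entry in the reference list. A non-empty second set usually points to a source that was listed but never used in the narrative.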
Troubleshooting
Problem: PDF conversion loses formatting or content
Solution: Some PDFs (especially scanned documents) may not convert well. For scanned PDFs, you may need OCR preprocessing. For well-structured PDFs, Claude can usually extract text directly.
Problem: The synthesis is too long or unfocused
Solution: Tighten your relevance framework. Add a word count target to the synthesis prompt: "Write a 3,000-word synthesis" rather than leaving it open-ended.
Problem: Context window is too small for all documents
Solution: The pipeline is designed for batch processing. The summaries act as compressed representations. In Phase 3, Claude reads the summaries (not full documents) to write the synthesis, keeping context manageable.
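If you prefer to hand Claude explicit batches instead of letting it choose, the split is simple to compute. A minimal Python sketch, assuming a default batch size of 8 (any value in the 5-10 range from the rules works) and a hypothetical helper name `batches`:

```python
from collections.abc import Iterator, Sequence


def batches(items: Sequence[str], size: int = 8) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For example, `list(batches(file_names, 10))` on 51 file names gives five batches of ten and a final batch of one, which you can paste into successive "/analyze" prompts.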