Roberto runs a payroll management firm. Every month he receives dozens of payslips and invoices from his clients in PDF format. Some come from modern accounting software. Others arrive scanned, crooked, with stamps over the text and handwritten notes in the margins. For years his routine was always the same: print, highlight, annotate by hand, then transfer the important numbers to Excel. Six hours a month gone just moving data from one place to another.
When he heard that Claude «read PDFs,» he thought he could finally offload that work. He uploaded an invoice and asked: «Summarize this invoice.» The result was accurate. And completely useless for what he needed. It told him what the invoice was for, who issued it, and the total amount — everything Roberto could already see at a glance. He tried again with several more. Same result.
The problem wasn’t Claude. The problem was the question. Roberto didn’t need anyone to understand the invoice. He needed someone to extract specific data points without making anything up. When he changed that, the six hours started disappearing.
Reading is not the same as analyzing
When you upload a document to Claude, three things happen in sequence. Claude reads whatever text it can detect. Then it interprets that text in light of what you ask. And, if you’re not careful, it infers what’s missing to give you a «nice» answer.
Here’s the fine line. For serious work, you want the first two. Never the third. That’s why asking for «summaries» is usually a bad idea with business documents. A summary invites gap-filling. A well-defined extraction doesn’t.
Whether Claude works on your documents as a summarizer or as an analyst doesn’t depend on the technology. It depends on how you ask for what you need.
Which file formats Claude can read (and with what limits)
Claude can work with several common file types: searchable PDFs, scanned PDFs, images (JPG, PNG), Word documents, and spreadsheets. That doesn’t mean they all read equally well.
A PDF generated by accounting software usually reads almost perfectly. A scanned PDF depends on quality: tilt, resolution, stains, stamps. Images with small or low-contrast text produce more errors. Claude does what it can, but it doesn’t see the way you do.
That’s why it’s crucial to ask it to tell you what it couldn’t read. That single request transforms an extraction into a reliable tool: if something is missing, it shows up as missing — not as a guess.
The analyst’s rule: don’t ask for summaries, ask for extractions
The change that saved Roberto time was this: he stopped saying «summarize the invoice» and started saying:
«Extract these fields, and if any aren’t legible, say so explicitly.»
That transforms Claude from writer to analyst. A good extraction has three characteristics:
- Specific fields: not «the important data,» but «invoice number, taxable base, VAT, total, issue date, issuer tax ID.»
- Fixed format: table, structured list, JSON — whatever lets you copy and paste without editing.
- Explicit prohibition on making things up: if a data point isn’t clear, it should say so. Not guess.
The difference between «not legible» and an invented number can be the difference between a properly recorded invoice and an accounting error.
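To see what a fixed format and an explicit «not legible» marker make possible, here is a minimal Python sketch. It assumes Claude was asked to return each invoice as a JSON object. The field names, the marker text, and the `check_extraction` helper are illustrative assumptions, not part of any official tool.

```python
import json

# Hypothetical field names, based on the example list above.
REQUIRED_FIELDS = [
    "invoice_number", "issue_date", "taxable_base",
    "vat", "total", "issuer_tax_id",
]
NOT_LEGIBLE = "not legible"  # assumed marker agreed on in the prompt


def check_extraction(raw_json):
    """Report which fields are absent, unreadable, or unexpected."""
    record = json.loads(raw_json)
    missing = [f for f in REQUIRED_FIELDS if f not in record]
    unreadable = [
        f for f in REQUIRED_FIELDS
        if str(record.get(f, "")).strip().lower() == NOT_LEGIBLE
    ]
    unexpected = [k for k in record if k not in REQUIRED_FIELDS]
    return {"missing": missing, "unreadable": unreadable, "unexpected": unexpected}


# Made-up response text for illustration only.
sample = """
{
  "invoice_number": "2024-017",
  "issue_date": "2024-03-05",
  "taxable_base": "not legible",
  "vat": "210.00",
  "total": "1210.00"
}
"""
report = check_extraction(sample)
print(report)
```

Because every field has a known name and a known marker for «I couldn’t read this,» a gap shows up in the report instead of hiding inside a paragraph of prose.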
Verification is also part of the work
When working with documents, a pretty table isn’t enough. You need to know how reliable it is. A good habit is to always request a final line:
«List the fields or pages you couldn’t read clearly.»
If Claude tells you a taxable base isn’t legible, you know you need to check that file. If it doesn’t flag anything, you can proceed with more confidence, though a quick spot check of the amounts against the original is still good practice. Verifying isn’t about distrusting AI. It’s about using it professionally. A reliable employee doesn’t just deliver results; they also tell you where they’re uncertain.
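The same habit can be applied on your side once the table is in hand. The sketch below is one possible way to build a review queue: it flags any row containing the «not legible» marker and, as an extra check not described above, verifies that taxable base plus VAT equals the total. That arithmetic rule is an assumption that only holds for simple invoices with no withholdings or other charges, and the row data is invented for illustration.

```python
from decimal import Decimal, InvalidOperation

NOT_LEGIBLE = "not legible"  # assumed marker agreed on in the prompt


def needs_review(row):
    """Return the reasons a row should be checked against the original file."""
    reasons = [
        f"{field} not legible"
        for field, value in row.items()
        if str(value).strip().lower() == NOT_LEGIBLE
    ]
    try:
        base = Decimal(row["taxable_base"])
        vat = Decimal(row["vat"])
        total = Decimal(row["total"])
    except (KeyError, TypeError, InvalidOperation):
        # An amount is absent or unreadable, so there is nothing to cross-check.
        return reasons
    # Assumption: total = base + VAT (no withholdings or surcharges).
    if base + vat != total:
        reasons.append("taxable_base + vat does not equal total")
    return reasons


# Invented rows standing in for an extracted table.
rows = [
    {"invoice_number": "A-101", "taxable_base": "1000.00", "vat": "210.00", "total": "1210.00"},
    {"invoice_number": "A-102", "taxable_base": "not legible", "vat": "42.00", "total": "242.00"},
    {"invoice_number": "A-103", "taxable_base": "500.00", "vat": "105.00", "total": "650.00"},
]
review_queue = {r["invoice_number"]: needs_review(r) for r in rows if needs_review(r)}
for invoice, reasons in review_queue.items():
    print(invoice, "->", "; ".join(reasons))
```

Only the invoices in the queue need a second look at the original file. The rest can move on to the spreadsheet.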
The extraction prompt that works
An effective prompt for analyzing documents with Claude follows the Role · Context · Task · Format structure:
- Role: accounting analyst specialized in invoicing.
- Context: invoices from small clients, some scanned at low quality.
- Task: extract specific fields (number, date, base, VAT, total, tax ID, supplier).
- Format: table with one row per invoice. If a field isn’t legible, write «not legible» — don’t invent.
This prompt turns an hour of manual work into three minutes of supervised extraction. More importantly: it produces results you can audit.
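If you reuse this prompt every month, it can help to assemble it from its four parts so only the field list changes. The following is a minimal sketch. The function name `build_extraction_prompt` and its parameters are hypothetical, and the output is plain text you would paste into Claude or send alongside the document.

```python
def build_extraction_prompt(role, context, fields, not_legible="not legible"):
    """Assemble a Role / Context / Task / Format prompt as plain text."""
    field_list = ", ".join(fields)
    return "\n".join([
        f"Role: {role}",
        f"Context: {context}",
        f"Task: Extract these fields from each invoice: {field_list}.",
        "Format: A table with one row per invoice and one column per field. "
        f'If a field is not legible, write "{not_legible}". Do not guess or infer values.',
        "Finally, list the fields or pages you could not read clearly.",
    ])


prompt = build_extraction_prompt(
    role="Accounting analyst specialized in invoicing",
    context="Invoices from small clients, some scanned at low quality",
    fields=[
        "invoice number", "issue date", "taxable base",
        "VAT", "total", "issuer tax ID", "supplier",
    ],
)
print(prompt)
```

Keeping the «not legible» rule and the final verification line inside the template means they can’t be forgotten on a busy day.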
When you should NOT upload a document
There are documents that, while technically uploadable, you shouldn’t upload without thinking. Confidential contracts. Health records. Sensitive tax information from third parties. In those cases, the answer isn’t «don’t use AI.» It’s «choose where.» Not every reading should happen in the cloud. Later we’ll cover local model alternatives for sensitive material. For now, keep one idea: every document you upload is a decision. Make it consciously.
This is just a taste. The full book shows you how to turn AI into your most productive team member.
📖 Your Digital Employee
Claude and AI as your best collaborator
