Question 1

Why are vendor accuracy claims usually inflated?

Accepted Answer

Four reasons. First, selection bias in the test corpus: vendors test on the statements their parser was tuned for, which over-represents clean native PDFs from private banks and under-represents the degraded PSU and co-operative bank statements that actually fail in production. Second, the accuracy metric is rarely defined the same way: field-level, transaction-level, statement-level, and end-to-end accuracy can differ by 8 to 15 percentage points on the same corpus. Third, the public benchmark (when one is cited) is usually a single-distribution dataset that does not reflect a lender's portfolio mix. Fourth, the cited corpus is sometimes too small to be statistically meaningful: a claim based on 200 statements carries wide confidence intervals.
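To make the second point concrete, here is a minimal Python sketch that scores the same extraction run at three levels. The data layout (a statement as a list of field dictionaries), the field names, and the function name `accuracy_levels` are illustrative assumptions, not any vendor's actual schema, and transactions are matched by position for simplicity.

```python
from typing import Dict, List

Transaction = Dict[str, str]   # field name -> value (hypothetical schema)
Statement = List[Transaction]  # transactions in statement order


def accuracy_levels(truth: List[Statement], extracted: List[Statement]) -> Dict[str, float]:
    """Score one extraction run at field, transaction, and statement level.

    Simplifying assumption: transactions are aligned by position, so a
    missing row is scored as an all-wrong transaction at that index.
    """
    fields_ok = fields_total = 0
    txns_ok = txns_total = 0
    stmts_ok = 0
    for true_stmt, ext_stmt in zip(truth, extracted):
        # A statement with extra or missing rows can never be fully correct.
        stmt_ok = len(true_stmt) == len(ext_stmt)
        for i, true_txn in enumerate(true_stmt):
            ext_txn = ext_stmt[i] if i < len(ext_stmt) else {}
            matches = [ext_txn.get(name) == value for name, value in true_txn.items()]
            fields_ok += sum(matches)
            fields_total += len(matches)
            txn_ok = all(matches)  # every field must match
            txns_ok += int(txn_ok)
            txns_total += 1
            stmt_ok = stmt_ok and txn_ok
        stmts_ok += int(stmt_ok)
    return {
        "field": fields_ok / fields_total,
        "transaction": txns_ok / txns_total,
        "statement": stmts_ok / len(truth),
    }


# Toy corpus: two statements, two transactions each, three fields each.
truth = [
    [{"date": "2024-01-02", "amount": "1500.00", "narration": "UPI/RENT"},
     {"date": "2024-01-05", "amount": "250.00", "narration": "ATM WDL"}],
    [{"date": "2024-02-01", "amount": "42000.00", "narration": "SALARY"},
     {"date": "2024-02-03", "amount": "999.00", "narration": "EMI"}],
]
# The extractor gets exactly one amount wrong in the first statement.
extracted = [
    [dict(truth[0][0]), {**truth[0][1], "amount": "260.00"}],
    [dict(txn) for txn in truth[1]],
]
scores = accuracy_levels(truth, extracted)
```

With a single wrong field out of twelve, the same run scores 11/12 at field level, 3/4 at transaction level, and 1/2 at statement level, which is why a headline number means little until the metric is named.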

Question 2

What is a realistic accuracy on PSU dot-matrix statements?

Accepted Answer

On legacy PSU dot-matrix statements (printed from pre-2015 core banking systems and scanned to PDF), production-grade extractors typically achieve 72 to 85% field-level accuracy, with substantial variance across SBI, Bank of Baroda, PNB, and the smaller PSU banks. The bottleneck is OCR character recognition on degraded scans where ink density varies from line to line. Claims above 90% on this segment should trigger an immediate test on your own corpus: they are achievable on a curated sub-segment but very rarely on the full PSU dot-matrix population.

Question 3

How do I run an apples-to-apples bake-off?

Accepted Answer

Assemble a corpus from your own production statements (not vendor-supplied) that mirrors your portfolio distribution: the same share of private, PSU, co-operative, and Account Aggregator (AA) payloads, the same share of scanned vs native, the same share of password-protected, and the same share of multi-account aggregation. Mask PII centrally, so every vendor receives the identical masked corpus. Define a single accuracy metric ahead of time and write down what counts as a correct extraction. Run all vendors on the same corpus. Measure end-to-end accuracy (extraction + categorisation + signal output), not just transaction extraction. Track latency at concurrent-call volume, not at single-statement volume. Score on the segments you actually lend to, not the aggregate.
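Two of these steps, mirroring the portfolio mix and scoring per segment, can be sketched in a few lines of Python. The segment names, the integer weights, and the function names `allocate_quotas` and `segment_accuracy` are hypothetical; the quota split uses the largest-remainder method so the quotas always sum to the corpus size.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def allocate_quotas(portfolio_weights: Dict[str, int], corpus_size: int) -> Dict[str, int]:
    """Split corpus_size across segments in proportion to integer weights."""
    total = sum(portfolio_weights.values())
    # Integer arithmetic avoids floating-point rounding surprises.
    quotas = {seg: corpus_size * w // total for seg, w in portfolio_weights.items()}
    remainders = {seg: corpus_size * w % total for seg, w in portfolio_weights.items()}
    leftover = corpus_size - sum(quotas.values())
    # Hand the leftover statements to the segments with the largest remainders.
    for seg in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[seg] += 1
    return quotas


def segment_accuracy(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Accuracy per segment from (segment, is_correct) pairs."""
    correct: Dict[str, int] = defaultdict(int)
    seen: Dict[str, int] = defaultdict(int)
    for segment, is_correct in results:
        seen[segment] += 1
        correct[segment] += int(is_correct)
    return {seg: correct[seg] / seen[seg] for seg in seen}


# Hypothetical portfolio mix, in percent of applications.
weights = {"private": 45, "psu": 30, "co-op": 10, "aa": 15}
quotas = allocate_quotas(weights, 1000)

# Hypothetical vendor results: strong on private, weaker on PSU.
results = ([("private", True)] * 9 + [("private", False)]
           + [("psu", True)] * 7 + [("psu", False)] * 3)
by_segment = segment_accuracy(results)
aggregate = sum(ok for _, ok in results) / len(results)
```

In this toy run the aggregate is 80%, but the PSU segment sits at 70%. A lender whose book is mostly PSU would be misled by the aggregate figure.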

Question 4

What is the minimum corpus size for a real benchmark?

Accepted Answer

Five hundred statements is the floor for a defensible per-segment accuracy claim with reasonable confidence intervals. One thousand to two thousand is the working standard for vendor diligence at a serious lender. Below 200, the confidence intervals are wide enough that two vendors with apparently different accuracy may not be statistically distinguishable. The corpus should also be stratified: at least 100 statements per major bank category (private, PSU, co-op, RRB) and at least 50 each on edge formats (password-protected, dot-matrix, multi-account).
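The effect of corpus size can be checked with a standard binomial confidence interval. The sketch below uses the Wilson score interval at 95% confidence (z = 1.96), treating each statement as one pass/fail trial; that choice of interval, the function names, and the vendor counts are assumptions for illustration. The overlap check is a rough screen, not a formal significance test.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def intervals_overlap(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True when two intervals share any point (a conservative, informal check)."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical bake-off: vendor A scores 90%, vendor B scores 86%.
a_small = wilson_interval(180, 200)    # 200-statement corpus
b_small = wilson_interval(172, 200)
a_large = wilson_interval(1800, 2000)  # 2000-statement corpus
b_large = wilson_interval(1720, 2000)

print(intervals_overlap(a_small, b_small))
print(intervals_overlap(a_large, b_large))
```

At 200 statements, a 90% score carries an interval of roughly 85% to 93%, which overlaps the interval for an 86% score. At 2,000 statements the same four-point gap separates cleanly.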

Question 5

Why does accuracy differ between AA and PDF?

Accepted Answer

Account Aggregator payloads arrive as structured JSON with normalised transaction records, so the extraction problem is essentially solved at the source: accuracy on AA is usually above 99% on the extraction step. The accuracy claim that matters on AA is normalisation parity: does the analyzer produce the same categorisation, signal output, and credit score on a borrower's AA payload as it does on the same borrower's PDF statement? If a borrower comes through AA on one application and PDF on the next, the underwriting decision should not change because the channel changed. Many analyzers show a 4 to 8 percentage point accuracy gap between PDF and AA on the same borrower.
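A parity check of this kind can be sketched as a simple comparison of analyzer outputs for borrowers who appear in both channels. The output keys (`salary_detected`, `score_band`), the borrower IDs, and the function name `channel_parity` are hypothetical placeholders for whatever signals your analyzer actually emits.

```python
from typing import Dict, List, Sequence, Tuple

Output = Dict[str, object]  # analyzer output for one borrower (hypothetical keys)


def channel_parity(
    pdf_runs: Dict[str, Output],
    aa_runs: Dict[str, Output],
    keys: Sequence[str],
) -> Tuple[Dict[str, float], List[str]]:
    """Per-key agreement rate between PDF and AA outputs, plus mismatched borrowers."""
    shared = sorted(set(pdf_runs) & set(aa_runs))
    if not shared:
        raise ValueError("no borrower appears in both channels")
    agreement = {}
    for key in keys:
        same = sum(pdf_runs[b].get(key) == aa_runs[b].get(key) for b in shared)
        agreement[key] = same / len(shared)
    # Borrowers where at least one compared output differs across channels.
    mismatched = [
        b for b in shared
        if any(pdf_runs[b].get(k) != aa_runs[b].get(k) for k in keys)
    ]
    return agreement, mismatched


# Toy data: four borrowers processed through both channels.
pdf_runs = {
    "b1": {"salary_detected": True, "score_band": "A"},
    "b2": {"salary_detected": True, "score_band": "B"},
    "b3": {"salary_detected": False, "score_band": "C"},
    "b4": {"salary_detected": True, "score_band": "B"},
}
aa_runs = {
    "b1": {"salary_detected": True, "score_band": "A"},
    "b2": {"salary_detected": True, "score_band": "B"},
    "b3": {"salary_detected": False, "score_band": "B"},  # band shifts on AA
    "b4": {"salary_detected": True, "score_band": "B"},
}
agreement, mismatched = channel_parity(pdf_runs, aa_runs, ["salary_detected", "score_band"])
```

Here the salary signal agrees for every borrower, but the score band agrees for only three of four, and `mismatched` names the borrower to investigate.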

Bank Statement Analyzer Accuracy Benchmark: A Bake-Off Framework for Lenders
