February 9, 2026
8 min read
Tutorials


Convert bank statement PDF to JSON format for APIs, databases, and automation. 4 tested methods from automated tools to Python scripts.

ConvertBankToExcel Team



Bank Statement to JSON: 4 Methods That Work [2026]

I needed transaction data from a bank PDF in a format my Python script could actually read. CSV felt clunky. Excel was overkill. JSON was the obvious answer—structured, nested, and ready for any API or database I'd throw it at.

The problem? Banks don't export JSON. At least, none that I've found. So I had to figure out how to bridge that gap myself.

Here are four methods I tested, from the quickest automated option to full manual control.

Why JSON for Bank Statements?

Before jumping into methods, here's why JSON makes sense for financial data—especially if you're a developer, data analyst, or fintech builder.

JSON handles nested structures naturally. A single transaction can carry a date, amount, description, category, and metadata without flattening everything into columns. That matters when you're feeding data into MongoDB, building a REST API, or running analytics with pandas.

[Diagram: PDF bank statement converting to structured JSON format]

Compare these two formats for the same transaction:

CSV:

2026-02-01,Amazon Purchase,-49.99,Debit,Shopping

JSON:

{
  "date": "2026-02-01",
  "description": "Amazon Purchase",
  "amount": -49.99,
  "type": "debit",
  "category": "shopping",
  "metadata": {
    "merchant": "Amazon",
    "reference": "AMZ-29481"
  }
}

The JSON version carries more context without column-header gymnastics. And it's immediately parseable in JavaScript, Python, Go, Ruby—any language with a JSON library (which is all of them).
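For instance, the record above drops straight into a Python dict, nested metadata and all (the payload here is copied from the sample transaction):

```python
import json

# The sample transaction from above, as a JSON string
payload = '''{
  "date": "2026-02-01",
  "description": "Amazon Purchase",
  "amount": -49.99,
  "type": "debit",
  "category": "shopping",
  "metadata": {"merchant": "Amazon", "reference": "AMZ-29481"}
}'''

txn = json.loads(payload)
print(txn["amount"])                # -49.99 (already a float, not a string)
print(txn["metadata"]["merchant"])  # Amazon
```

No schema, no column mapping: the types and nesting come through for free.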

Common use cases I've seen:

  • Fintech apps pulling transaction data into a database
  • Personal finance dashboards built with React or Vue
  • Accounting automation where you pipe bank data through a processing script
  • Expense tracking APIs that accept JSON payloads
  • Data analysis with Python or R, where JSON loads into dictionaries or dataframes

Method 1: Automated PDF-to-JSON Converter (Recommended)

The fastest route. Upload your PDF bank statement, get structured data back. No coding required.

  1. Go to ConvertBankToExcel.com
  2. Upload your bank statement PDF
  3. The OCR engine reads and structures your transactions
  4. Export as CSV, then convert to JSON (explained below)

The tool handles the hard part—reading messy PDF layouts, multi-column formats, and varying date styles. It exports to CSV, which you can convert to JSON in seconds using one of the techniques in Method 3.

[Screenshot: converter interface with upload area and export options]

Why I recommend starting here: manual PDF parsing is a nightmare. Bank statements have no standard layout. Column positions shift between banks. Headers change. Some PDFs use text layers, others are scanned images. An automated tool with OCR handles all of that.

Upload your statement now and get clean, structured data in under a minute.

Method 2: Python Script With Tabula or Camelot

If you want full control and you're comfortable with Python, you can extract tables directly from PDF bank statements and convert them to JSON.

Using Tabula:

import tabula
import json

# Extract tables from PDF
tables = tabula.read_pdf("statement.pdf", pages="all")

# Convert first table to list of dictionaries
transactions = tables[0].to_dict(orient="records")

# Clean up the data (blank cells come through as NaN, which json.dump
# would emit as an invalid "NaN" token, so normalize them to 0.0)
import math

def to_float(value):
    try:
        n = float(str(value).replace(",", ""))
    except ValueError:
        return 0.0
    return 0.0 if math.isnan(n) else n

cleaned = []
for t in transactions:
    cleaned.append({
        "date": str(t.get("Date", "")).strip(),
        "description": str(t.get("Description", "")).strip(),
        "amount": to_float(t.get("Amount", "0")),
        "balance": to_float(t.get("Balance", "0"))
    })

# Write JSON file
with open("transactions.json", "w") as f:
    json.dump(cleaned, f, indent=2)

print(f"Exported {len(cleaned)} transactions")

Using Camelot (better for complex tables):

import camelot
import json

tables = camelot.read_pdf("statement.pdf", flavor="stream", pages="all")

for i, table in enumerate(tables):
    # Note: Camelot DataFrames use numeric column labels, and the first
    # row is usually the header row—you may want to promote it yourself
    records = table.df.to_dict(orient="records")
    with open(f"table_{i}.json", "w") as f:
        json.dump(records, f, indent=2)

The catch? Both libraries struggle with scanned PDFs (images). They work on text-based PDFs only. If your bank statement is a scanned document, you'll need OCR first—which brings us back to Method 1, or you'll need to add Tesseract to the pipeline.

Also, column detection isn't perfect. I spent about 2 hours tweaking Tabula parameters for a Chase statement before getting clean output. HSBC statements worked on the first try. Your mileage varies by bank.

When this method works well:

  • You process the same bank format repeatedly
  • You need a fully automated pipeline (cron job, CI/CD)
  • Your PDFs have selectable text (not scanned images)

When it doesn't:

  • Scanned or image-based PDFs
  • You process statements from many different banks
  • You need it done in 2 minutes, not 2 hours

Method 3: CSV-to-JSON Conversion

Already have your bank data in CSV? Converting to JSON is the easiest step in this entire process. Here are three quick ways.

Command line with Python:

cat transactions.csv | python3 -c "
import csv, json, sys
reader = csv.DictReader(sys.stdin)
print(json.dumps(list(reader), indent=2))
" > transactions.json

Node.js one-liner:

const fs = require('fs');
const csv = fs.readFileSync('transactions.csv', 'utf8');
const lines = csv.trim().split('\n');
const headers = lines[0].split(',');
const data = lines.slice(1).map(line => {
  const values = line.split(','); // naive split: breaks on quoted commas in descriptions
  return headers.reduce((obj, h, i) => (
    { ...obj, [h.trim()]: values[i]?.trim() }
  ), {});
});
fs.writeFileSync('transactions.json', JSON.stringify(data, null, 2));

Python pandas (most flexible):

import pandas as pd

df = pd.read_csv("transactions.csv")
df.to_json("transactions.json", orient="records", indent=2)

The pandas approach is my go-to. It handles edge cases like quoted commas, mixed data types, and encoding issues. Plus you can clean the data before exporting:

import pandas as pd

df = pd.read_csv("transactions.csv")

# Clean amount column - remove currency symbols
df["Amount"] = df["Amount"].replace(r'[\$,]', '', regex=True).astype(float)

# Parse dates properly
df["Date"] = pd.to_datetime(df["Date"]).dt.strftime("%Y-%m-%d")

# Export clean JSON
df.to_json("clean_transactions.json", orient="records", indent=2)

[Terminal: Python script converting CSV bank data to JSON output]

If you don't have a CSV yet, get one from your PDF statement here—then use any of the methods above to convert it to JSON.

Method 4: Manual Copy-Paste With Online Tools

No coding. No installation. Just slow.

  1. Open your bank statement PDF
  2. Select and copy the transaction table
  3. Paste into Google Sheets or Excel
  4. Clean up rows and columns manually
  5. Export as CSV
  6. Use an online CSV-to-JSON converter (search "csv to json converter"—dozens exist)

I won't pretend this is efficient. For a 3-page statement with 40 transactions, expect 15-20 minutes of cleanup. The copy-paste step rarely captures table formatting correctly, so you'll spend most of your time fixing misaligned columns.

But if it's a one-time job and you don't want to install anything, it works.

Structuring Your JSON Output

Whatever method you use, think about your JSON structure before you export. A flat array of objects is fine for simple use cases:

[
  {"date": "2026-02-01", "description": "Grocery Store", "amount": -85.43},
  {"date": "2026-02-02", "description": "Salary Deposit", "amount": 3200.00}
]

But for more serious applications, wrap it with metadata:

{
  "account": {
    "bank": "Chase",
    "accountNumber": "****4521",
    "statementPeriod": {
      "from": "2026-01-01",
      "to": "2026-01-31"
    }
  },
  "summary": {
    "openingBalance": 4521.30,
    "closingBalance": 5102.87,
    "totalDebits": -2418.43,
    "totalCredits": 5000.00,
    "transactionCount": 47
  },
  "transactions": [
    {
      "date": "2026-01-03",
      "description": "Netflix Subscription",
      "amount": -15.99,
      "type": "debit",
      "category": "entertainment"
    }
  ]
}

This structure is immediately useful for:

  • Importing into MongoDB (the transactions array maps to a collection)
  • Building a REST API response format
  • Feeding into data visualization tools
  • Archiving with full context
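A minimal sketch of assembling that wrapper from a flat transaction list. The account fields are hardcoded here as an assumption—in practice they come from the statement header, not the transaction rows—but the summary totals can be computed directly:

```python
import json

# Flat transaction list, as produced by any of the methods above
transactions = [
    {"date": "2026-01-03", "description": "Netflix Subscription", "amount": -15.99},
    {"date": "2026-01-05", "description": "Salary Deposit", "amount": 5000.00},
]

# Totals derive from the rows; negative amounts are debits, positive are credits
debits = sum(t["amount"] for t in transactions if t["amount"] < 0)
credits = sum(t["amount"] for t in transactions if t["amount"] > 0)

statement = {
    "account": {"bank": "Chase", "accountNumber": "****4521"},  # from statement header
    "summary": {
        "totalDebits": round(debits, 2),
        "totalCredits": round(credits, 2),
        "transactionCount": len(transactions),
    },
    "transactions": transactions,
}

print(json.dumps(statement, indent=2))
```

Opening and closing balances can't be derived from the rows alone, so pull those from the statement header too.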

Validating Your JSON

Bad JSON breaks everything downstream. Before using your exported file, validate it.

Quick check in terminal:

python3 -m json.tool transactions.json > /dev/null && echo "Valid" || echo "Invalid"

Common issues I've hit:

  • Trailing commas—JSON doesn't allow them (unlike JavaScript)
  • Unescaped quotes in descriptions like "Joe's Pizza"
  • NaN or Infinity values from pandas—use df.fillna(0) before exporting
  • Encoding issues—special characters in merchant names; always use UTF-8
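The NaN issue in particular is sneaky: Python's json module will happily serialize NaN as a bare NaN token, which is valid JavaScript but invalid JSON. Passing allow_nan=False makes the serializer fail loudly instead of producing a broken file:

```python
import json
import math

# Default behavior: emits the non-standard token NaN (invalid JSON)
print(json.dumps({"amount": math.nan}))  # {"amount": NaN}

# allow_nan=False refuses to serialize it, so you catch the bug early
try:
    json.dumps({"amount": math.nan}, allow_nan=False)
except ValueError as e:
    print("refused:", e)
```

If you export with pandas, df.fillna(0) before to_json sidesteps the problem entirely.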

Automated validation with Python:

import json

with open("transactions.json") as f:
    try:
        data = json.load(f)
        # len() assumes the top-level value is an array of records
        print(f"Valid JSON: {len(data)} records")
    except json.JSONDecodeError as e:
        print(f"Invalid JSON at line {e.lineno}: {e.msg}")

Which Method Should You Pick?

Method                     Speed              Skill Needed          Best For
Automated converter        1 min              None                  Anyone, any bank format
Python (Tabula/Camelot)    10-60 min setup    Intermediate Python   Recurring automation
CSV-to-JSON scripts        2 min              Basic coding          Already have CSV data
Manual copy-paste          15-30 min          None                  One-time, small statements

My recommendation: start with the automated converter to get clean CSV output, then use a simple Python or Node.js script to convert that CSV to JSON. This two-step approach gives you the best balance of speed and flexibility.
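The second step of that flow really is just a few lines. A self-contained sketch with the CSV inlined for illustration—your real file would come from the converter, and the column names here are assumptions:

```python
import csv
import io
import json

# Stand-in for the CSV the converter exports (column names are assumptions)
csv_text = (
    "Date,Description,Amount\n"
    "2026-02-01,Amazon Purchase,-49.99\n"
    "2026-02-02,Salary Deposit,3200.00\n"
)

# CSV rows -> list of dicts -> JSON; note DictReader keeps every
# value as a string, so cast amounts with float() if you need numbers
rows = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Swap io.StringIO for open("transactions.csv", newline="") to run it against a real export.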

If you're building a pipeline that runs weekly or monthly, invest the time in Method 2. Once it's configured for your bank's format, it runs hands-free.

Wrapping Up

Bank statement to JSON isn't complicated once you know your options. The hardest part is actually extracting clean data from the PDF—the JSON conversion itself takes seconds.

For most people, the fastest path is: PDF to CSV with ConvertBankToExcel.com, then CSV to JSON with a 3-line Python script.

Try it free today—upload your statement and get structured data in under a minute. No signup, no credit card.