When Databases Made Sense: The 1995 Checkout Problem
There's a moment in web development history that everyone forgets: the exact reason we all started using MySQL and PostgreSQL. It wasn't because databases were cool or because they were "best practice." It was because flat files had a very specific, very real problem with concurrent writes.
The Problem: Double-Selling Inventory
Picture this: It's 1995. You're running an e-commerce site with CGI scripts and flat files. You have one widget left in stock. Two users click "Buy Now" at the exact same moment.
The Race Condition
# User A's CGI script
qty=$(grep '^widget_42' inventory.txt | cut -f2)   # Reads: 1
# User B's CGI script runs here!
qty=$(grep '^widget_42' inventory.txt | cut -f2)   # Also reads: 1
# User A writes the decremented count back (whole file rewritten for brevity)
printf 'widget_42\t0\n' > inventory.txt
# User B writes the same thing
printf 'widget_42\t0\n' > inventory.txt
# Result: you just sold 2 widgets but only had 1!
This wasn't a theoretical problem. This was happening in production. People were overselling inventory, double-charging credit cards, corrupting order data. The web was getting real traffic, and flat files couldn't handle it.
Why MySQL Won
MySQL (and PostgreSQL, and others) solved this with transactions and row-level locking. The solution was elegant:
The Database Solution
-- User A's transaction
BEGIN;
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 42
AND quantity > 0;
-- Returns: 1 row affected
COMMIT;
-- User B's transaction (starts at the same moment)
BEGIN;
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 42
AND quantity > 0;
-- Blocks on User A's row lock, then re-checks the WHERE clause
-- Returns: 0 rows affected (no inventory left!)
ROLLBACK;
The database handled the locking, the atomicity, the consistency. It was free (MySQL/PostgreSQL were open source), relatively easy to set up, and it just worked. The trade-off was obvious: add a 100MB daemon and some connection pooling overhead, but never worry about race conditions again.
The Trade-Off That Made Sense
In 1995-2005, using a database was clearly the right choice:
- Real Problem: Race conditions were causing actual business problems
- Easy Solution: Install MySQL, change a few lines of code
- Low Cost: 100MB RAM and setup time < lost revenue from race conditions
- Free: Open source databases were available and mature
The Cargo Cult Begins
But then something interesting happened. By 2005, using a database had become "best practice." Everyone was doing it. Tutorials assumed it. Frameworks were built around it. And we forgot why we started doing it in the first place.
The conversation shifted from:
1995: Making a Trade-off
- Question: "Do I have concurrent write problems?"
- If yes: "Use a database"
- If no: "Flat files are fine"
To:
2005: Following Best Practices
- Assumption: "Professional web apps use databases"
- Result: Everyone uses PostgreSQL
- Cost: Setup complexity, RAM overhead, connection pooling
- Benefit: Solving a problem you might not have
Do You Actually Have The Problem?
Let's be honest about what most web applications actually are:
Most Web Apps in 2025
reality_check = {
    "Blog": {
        "Concurrent writers": 1,  # Just you
        "Concurrent readers": "Doesn't matter, reads don't conflict",
        "Need database?": "No",
    },
    "Internal tool": {
        "Concurrent users": 5,
        "Writes per second": 0.1,
        "Need database?": "Probably not",
    },
    "SaaS app": {
        "Concurrent users": 100,
        "Writes per user": "Their own data",
        "Conflicts": "Rare",
        "Need database?": "Maybe?",
    },
    "E-commerce checkout": {
        "Shared inventory": True,
        "Concurrent writes": "Common",
        "Need database?": "YES - this is the original problem!",
    },
}
What Changed: File Locking Actually Works
Here's the thing nobody talks about: Unix had file locking in 1995. We just didn't use it properly in CGI scripts. Modern approaches can handle the original problem just fine:
Modern Flat File Solution
#!/bin/bash
# Atomic inventory update with file locking.
# Lock a dedicated lock file (not inventory.txt itself, since the rename
# below would replace the locked inode) so only one process runs the
# read-modify-write at a time.
(
  flock -x 200
  qty=$(grep '^widget_42' inventory.txt | cut -f2)
  if [ "$qty" -gt 0 ]; then
    # Decrement with awk, write to a temp file, then rename atomically
    awk -F'\t' -v OFS='\t' '$1 == "widget_42" { $2-- } 1' inventory.txt > inventory.tmp
    mv inventory.tmp inventory.txt   # Atomic replace
    echo "Success"
  else
    echo "Out of stock"
  fi
) 200>inventory.lock
# This actually solves the 1995 problem!
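The same idea ports straight to Python's standard library. Here is a minimal sketch using fcntl.flock, assuming the same one-product-per-line TSV layout as above; the buy_widget function and the inventory.lock file name are illustrative, not part of any library:

import fcntl
import os

def buy_widget(product_id: str, inventory_path: str = "inventory.txt") -> bool:
    # Hold an exclusive lock on a dedicated lock file for the whole read-modify-write
    with open("inventory.lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        with open(inventory_path) as f:
            rows = [line.rstrip("\n").split("\t") for line in f if line.strip()]
        for row in rows:
            if row[0] == product_id and int(row[1]) > 0:
                row[1] = str(int(row[1]) - 1)
                break
        else:
            return False  # unknown product or out of stock
        # Write a temp file, then atomically rename it over the original
        tmp_path = inventory_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.writelines("\t".join(row) + "\n" for row in rows)
        os.replace(tmp_path, inventory_path)
        return True
        # Lock is released when the "with open" block closes the lock file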
Alternative Approaches
Even if you can't or don't want to use file locking, there are other patterns that avoid the problem entirely:
Append-Only Logs
# No concurrent update conflicts - every write is an append!
printf '%s\tsale\twidget_42\tuser_123\n' "$(date +%s)" >> sales.log
printf '%s\tsale\twidget_42\tuser_456\n' "$(date +%s)" >> sales.log
# Derive inventory from the log: current stock = starting stock minus units sold
grep -c widget_42 sales.log   # Units sold so far
Optimistic Locking
# Add a version column to your TSV:
# widget_42 5 v123
def update_inventory(product_id, expected_version):
    # Read current state (read_inventory/write_inventory are your own TSV helpers)
    qty, version = read_inventory(product_id)
    # Check the version hasn't changed since the caller read it
    if version != expected_version:
        return False  # Someone else updated it first
    # Write the decremented quantity with a new version
    write_inventory(product_id, qty - 1, generate_new_version())
    return True
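The caller pairs this with a read-and-retry loop. A minimal sketch, reusing the assumed read_inventory helper and the update_inventory function from above:

def buy_with_retry(product_id, max_attempts=5):
    # Read, attempt a conditional write, and retry if another writer got there first
    for _ in range(max_attempts):
        qty, version = read_inventory(product_id)
        if qty <= 0:
            return False  # out of stock
        if update_inventory(product_id, expected_version=version):
            return True   # our write won
        # Version changed under us: loop and re-read the fresh state
    return False  # gave up after repeated conflicts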
Single Writer Pattern
# Producers queue writes by appending; appends never conflict
echo "decrement widget_42" >> write_queue.txt
# A single consumer process applies writes sequentially
while true; do
  if [ -s write_queue.txt ]; then
    cmd=$(head -1 write_queue.txt)
    execute_write "$cmd"   # your function that applies one queued command
    # Drop the processed line. In practice, guard this rotation (and the
    # producers' appends) with flock so an append can't land between
    # the tail and the mv and get lost.
    tail -n +2 write_queue.txt > write_queue.tmp
    mv write_queue.tmp write_queue.txt
  fi
  sleep 0.1
done
When You Actually Need a Database
Let's be clear: databases aren't wrong. They're just solving a specific problem. You actually need PostgreSQL/MySQL when you have:
- High-frequency concurrent writes: Multiple users modifying the same data >100 times per second
- Complex multi-table transactions: Updates that must span multiple tables atomically (see the sketch after this list)
- E-commerce checkout: The original 1995 problem - shared inventory with concurrent purchases
- Financial transactions: Where correctness is critical and conflicts are common
- Complex relational queries: JOINs across many tables with complex conditions
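To make the multi-table case concrete, here's a minimal sketch using Python's built-in sqlite3 module (chosen only because it ships with Python; the table names and columns are invented for illustration). The point is what a transaction buys you: the inventory decrement and the order insert either both happen or neither does.

import sqlite3

conn = sqlite3.connect("shop.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS inventory (product_id INTEGER PRIMARY KEY, quantity INTEGER);
    CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY AUTOINCREMENT,
                                       product_id INTEGER, user_id TEXT);
    INSERT OR IGNORE INTO inventory VALUES (42, 1);
""")

def checkout(product_id: int, user_id: str) -> bool:
    # "with conn" wraps the block in a transaction: commit on success, rollback on error
    with conn:
        cur = conn.execute(
            "UPDATE inventory SET quantity = quantity - 1 "
            "WHERE product_id = ? AND quantity > 0",
            (product_id,),
        )
        if cur.rowcount == 0:
            return False  # out of stock; nothing to roll back
        conn.execute(
            "INSERT INTO orders (product_id, user_id) VALUES (?, ?)",
            (product_id, user_id),
        )
    return True

print(checkout(42, "user_123"))  # True
print(checkout(42, "user_456"))  # False - the decrement and the order stay consistent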
When Flat Files Are Fine
For most applications, you don't have the 1995 problem:
- Blogs and content sites: One writer (you), many readers (no conflicts)
- Analytics and logs: Append-only data (no concurrent updates)
- Configuration management: Rare updates, version control is more important than transactions
- Personal tools: Single user or very low concurrency
- Prototypes and MVPs: Optimize for speed of development, not theoretical scale
The Absurdity of Modern Defaults
Here's what we've normalized:
For a 46-Article Blog
"enterprise" = {
"Data size": "500KB",
"Writers": 1,
"Concurrent writes": "Never",
"Solution": {
"PostgreSQL daemon": "100MB RAM idle",
"Connection pool": "10+ connections",
"Query parser": "Complex optimization",
"Network sockets": "Even for localhost",
"ORM layer": "Additional abstraction",
},
"Alternative": {
"TSV files": "500KB on disk",
"grep": "Native C code",
"Total overhead": "~0MB",
},
"Ratio": "200:1 complexity for zero benefit"
}
It's like renting a semi-truck to transport a sandwich. Sure, the truck works, and it would be necessary if you were transporting 10,000 sandwiches. But you're not.
What We're Building
This realization led me to create dbbasic-tsv: a TSV-based database that's honest about what it's solving:
- File locking for atomic operations
- Optimistic locking for update conflicts
- Append-only patterns where possible
- Single-writer queues when needed
- Zero setup because there's no daemon
- grep-friendly because text files are debuggable
Performance Reality
Pure Python TSV operations achieve 163,000 inserts/second - more than enough for any blog, most internal tools, and many production applications. When you actually need more speed, optional Rust acceleration provides 600,000+ ops/sec. For comparison, your blog probably receives fewer than 100 writes per day.
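If you want a rough sanity check of numbers like these on your own machine, timing raw TSV appends takes a few lines. This is an illustrative micro-benchmark of plain file writes (the bench.tsv name is arbitrary), not dbbasic-tsv's own benchmark, and your figures will vary:

import time

ROWS = 100_000

# Time raw TSV appends - the operation a flat-file insert ultimately boils down to
start = time.perf_counter()
with open("bench.tsv", "w") as f:
    for i in range(ROWS):
        f.write(f"{i}\tarticle\ttitle-{i}\t2025-01-01\n")
elapsed = time.perf_counter() - start

print(f"{ROWS / elapsed:,.0f} rows/sec")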
Conclusion: Remember Why
Databases made sense in 1995 because of a specific problem: concurrent writes to shared data causing race conditions in e-commerce checkouts. The trade-off was clearly worth it.
But somewhere along the way, we forgot to ask: Do I have the 1995 problem?
Most of us don't. We're running PostgreSQL for blogs with one writer. We're using connection pools for internal tools with five users. We're optimizing for problems we don't have.
The good news: you can go back. Flat files work fine when you don't have concurrent write conflicts. And when you do? Modern file locking, append-only logs, and optimistic concurrency control can handle far more than you think.
The Question to Ask
Before you install PostgreSQL, ask yourself:
- Do I have multiple users updating the same data simultaneously?
- Is this actually happening, or am I just planning for it?
- Could I solve this with file locking or append-only logs?
- Am I optimizing for a problem I don't have yet?
Simple > Complex. Even in 1995, we knew that. We just forgot.