
Dogfooding dbbasic-tsv: Migrating This Blog From JSON to TSV

There's a special kind of satisfaction that comes from using your own tools in production. Today I want to share the story of how dbbasic-tsv went from concept to PyPI package to powering this very blog you're reading - all in the same day.

The Launch: dbbasic-tsv 1.0.1

First, let me introduce the tool. dbbasic-tsv is a Python library that treats TSV (Tab-Separated Values) files as a database. It provides a simple API for inserts, queries, updates, and deletes - while keeping all your data in human-readable text files.

Why TSV Files as a Database?

  • Zero setup: No server to configure, no migrations to run
  • Human-readable: You can grep, cat, or edit files directly
  • Git-friendly: Text files mean meaningful diffs and version control
  • Fast enough: 163K inserts/sec in pure Python, 600K/sec with Rust
  • Simple: The whole library is under 1000 lines of code

The core philosophy is simple: most applications don't need PostgreSQL. They need something between "47 individual JSON files" and "enterprise database cluster."

The Dogfooding Moment

After publishing to PyPI, I looked at my own blog's architecture. This site was storing 47 articles as individual JSON files in content/articles/. Each article had metadata (title, date, author, tags) and content blocks. The Flask app would open and parse these JSON files on every request.

It worked fine. But there was a certain irony in having just published a tool for managing structured data... and not using it myself.

The Decision

"Could this blog run on dbbasic-tsv?"

More importantly: Should it?

The answer to both was yes. If the tool couldn't handle 47 blog articles, how could I recommend it for real applications?

The Migration: JSON → TSV

I wrote a migration script that would read all 47 JSON files and convert them to TSV format. The schema was straightforward:

TSV Schema Design

articles_db = TSV(
    "articles",
    columns=[
        "slug",          # URL-friendly identifier
        "title",         # Article title
        "date",          # Publication date
        "author",        # Author name
        "category",      # Primary category
        "description",   # Meta description
        "tags",          # Comma-separated tags
        "content_json"   # Blocks as JSON string
    ],
    data_dir=Path("data")
)

The clever bit: storing the content blocks (paragraphs, headings, cards, lists) as a JSON string in the content_json column. This preserved the complex nested structure while keeping the rest of the data queryable as flat fields.
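
The migration loop itself was short. Here's a minimal sketch of what it might have looked like, assuming each source JSON file carries fields matching the schema above plus a blocks list (the JSON field names are illustrative, not the actual script):

Migration Script (Sketch)

import json
from pathlib import Path

def migrate(articles_db, source_dir=Path("content/articles")):
    """Read every JSON article and insert it as one TSV row."""
    for json_path in sorted(source_dir.glob("*.json")):
        article = json.loads(json_path.read_text())
        articles_db.insert({
            "slug": json_path.stem,
            "title": article["title"],
            "date": article["date"],
            "author": article["author"],
            "category": article["category"],
            "description": article["description"],
            # Flatten the tag list to a comma-separated string
            "tags": ",".join(article.get("tags", [])),
            # Nested content blocks survive as a JSON string
            "content_json": json.dumps(article["blocks"]),
        })
        print(f"✓ Migrated: {json_path.stem}")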

Running the migration:

Migration Results

$ python migrate_to_tsv.py --execute
✓ TSV database initialized at: data/articles.tsv
✓ Found 47 JSON articles

✓ Migrated: evolution-internet-clients
✓ Migrated: when-databases-made-sense
✓ Migrated: unix-foundation-web-dev
...

============================================================
Migration complete:
  Migrated: 47
  Skipped: 0
  Errors: 0

Database stats:
  Total articles: 47
  File size: 950 KB
  Location: data/articles.tsv

47 separate JSON files → 1 TSV file. 950KB total. Human-readable. Greppable.

The Safe Deployment: Fallback Strategy

My first instinct was defensive programming. I updated the Flask app to try TSV first, but fall back to JSON files if anything went wrong:

Initial Implementation (With Safety Net)

def load_article(slug):
    # Try TSV first
    if TSV_ENABLED and articles_db:
        try:
            row = articles_db.query_one(slug=slug)
            if row:
                blocks = json.loads(row['content_json'])
                return {
                    'slug': row['slug'],
                    'title': row['title'],
                    'meta': {...},
                    'blocks': blocks
                }
        except Exception as e:
            print(f"TSV load failed: {e}, falling back to JSON")
    
    # Fallback to JSON files
    json_path = f'content/articles/{slug}.json'
    if os.path.exists(json_path):
        with open(json_path, 'r') as f:
            return json.load(f)

This felt responsible. It added 64 lines of code: TSV loading logic, error handling, a JSON fallback path, and TSV_ENABLED flag checks.

I deployed it at midnight. Checked the logs:

Production Logs

✓ TSV DATABASE ACTIVE: 47 articles loaded
[TSV] Loaded article: unix-foundation-web-dev
[TSV] Loaded article: evolution-internet-clients
[TSV] Loaded article: when-databases-made-sense

Every single article was loading from TSV. The JSON fallback code never executed. Not once.

The Complexity Question

Looking at the diff, I realized something: I had increased complexity by 64 lines. The goal was simplification, but defensive programming had made the codebase more complex, not less.

The Problem with Safety Nets

  • 64 lines of fallback code that never runs
  • Dual-system complexity (TSV and JSON)
  • "What happens if TSV fails?" mental overhead
  • Two sources of truth to maintain
  • More branches in control flow

The irony was thick. I had just written an article about questioning whether you need a database. Now I was questioning whether I needed fallback code.

Going All In: Removing the Safety Net

After verifying TSV worked perfectly in production, I made the decision: remove all JSON fallback code.

Final Simplified Version

# Initialize (no error handling)
articles_db = TSV(
    "articles",
    columns=["slug", "title", "date", "author", 
             "category", "description", "tags", "content_json"],
    data_dir=Path("data")
)

# Load article (no fallback)
def load_article(slug):
    row = articles_db.query_one(slug=slug)
    if not row:
        return None
    
    blocks = json.loads(row['content_json'])
    return {
        'slug': row['slug'],
        'title': row['title'],
        'meta': {...},
        'blocks': blocks
    }

# RSS feed (no fallback)
def generate_rss_posts():
    posts = []
    for article in articles_db.all():
        if article.get('date'):
            year, month, day = article['date'].split('-')
            posts.append({...})
    return sorted(posts, key=lambda x: x['date'], reverse=True)

The results:

Code Reduction

  • Removed: 105 lines
  • Added: 41 lines
  • Net reduction: 64 lines (1836 → 1772 lines)
  • load_article(): 40 lines → 20 lines
  • generate_rss_posts(): 61 lines → 31 lines
  • Cyclomatic complexity: ~8 → ~2

More importantly:

  • Single source of truth (TSV only)
  • No "what if TSV fails" mental overhead
  • Simpler control flow (no try-catch, no if-enabled checks)
  • Easier to understand for future maintainers
  • Actually practicing the "Simple > Complex" philosophy

What We Learned

1. Dogfooding reveals truth

You can claim your tool is simple, but using it in production forces you to confront reality. If I weren't willing to run my own blog on dbbasic-tsv, why would anyone else trust it?

2. Safety nets can be complexity nets

Defensive programming adds code. Fallback logic adds branches. Error handling adds mental overhead. Sometimes the simplest code is code that just works - with no backup plan.

3. Production verification before simplification

The two-step deployment was actually smart: first add TSV with a fallback, verify it works in production, then remove the fallback. This gave me the confidence to simplify without fear.

4. Text files are legitimately fast

The site "feels very fast, as if it's static." That's because TSV loads the entire file into memory on startup and queries are just dictionary lookups. For 47 articles (950KB), this is instant.
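
The pattern behind that speed is easy to illustrate. This isn't dbbasic-tsv's actual internals, just a sketch of why "read once at startup, index in a dict" makes every lookup effectively free:

In-Memory Lookup (Illustration)

import csv
from pathlib import Path

# Read the whole TSV once at startup...
with Path("data/articles.tsv").open(newline="") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

# ...then key rows by slug. Every lookup after this point is a
# plain dictionary access: no disk I/O, no parsing, no query planner.
by_slug = {row["slug"]: row for row in rows}

article = by_slug.get("when-databases-made-sense")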

5. The 1995 problem still matters

As I wrote in When Databases Made Sense, the question is: "Do I have the 1995 checkout race condition problem?" For a blog with one author and no concurrent writes, the answer is no. TSV is perfect.

Try It Yourself

Want to try dbbasic-tsv? It's on PyPI:

Installation

pip install dbbasic-tsv

Basic usage:

Quick Start

from dbbasic.tsv import TSV
from pathlib import Path

# Create a table
users = TSV("users", ["id", "name", "email"], data_dir=Path("data"))

# Insert data
users.insert({"id": "1", "name": "Alice", "email": "alice@example.com"})

# Query data
user = users.query_one(email="alice@example.com")
print(user)  # {'id': '1', 'name': 'Alice', 'email': 'alice@example.com'}

# Your data is just a text file!
# $ cat data/users.tsv
# id  name   email
# 1   Alice  alice@example.com
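
And iterating a whole table uses all(), the same call this blog's RSS generator relies on:

Iterating All Rows

# Walk every row in the table
for user in users.all():
    print(user["name"], user["email"])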

Check out the GitHub repo for documentation, benchmarks, and examples.

The Bottom Line

This blog now runs on a "toy" TSV database. The entire article database is a single 950KB text file. You can grep it, diff it, version control it, or edit it in vim.

It's simpler than the JSON file approach (one file vs 47). It's simpler than PostgreSQL (zero setup). It's simpler than the code I wrote yesterday (64 fewer lines).

Most importantly: it works. You're reading proof right now.

The Dogfooding Test

If you wouldn't use your own tool in production, why should anyone else?

Now we can confidently say: dbbasic-tsv powers real websites. Including this one.

Simple > Complex. Proven in production. At midnight. On a Tuesday.
