
imessage-history


Open-source Python CLI · stdlib only

macOS keeps your iMessage history, sent and received, in a local SQLite database (chat.db), but the raw schema is awkward: outgoing rows point at the recipient instead of the sender, long messages are stuffed into an Apple typedstream blob that truncates at 255 characters if you parse the length prefix wrong, and tapbacks live as cross-referenced rows. imessage-history smooths all of that out and emits one window of one conversation as a clean, speaker-attributed dataset ready for analysis or LLM prompting.

It is Python 3.10+ with zero required runtime dependencies. It opens chat.db read-only with both URI flags and PRAGMA query_only, backed by a regression test that proves every write statement raises and the file is byte-for-byte unchanged after close. It also ships an opt-in pseudonymizer for handles, names, phone numbers, emails, and URLs, for the moments when the only reasonable way to think about a long thread is to drop it into a hosted model.

An optional Textual TUI ships under the [tui] extra for a guided picker-and-window flow; the headless CLI still works exactly the same. Free and open source.

A read-only, stdlib-only Python exporter that pulls one iMessage conversation off macOS chat.db and turns it into AI-ready CSV / JSON / Markdown / TXT. Optional pseudonymization for hosted-LLM use. Free and open source.

What it is, in one paragraph

imessage-history is a single-purpose CLI for a single use case: take one conversation out of your local Messages database, in one time window, and produce clean files you can analyze, archive, or feed to an LLM. macOS keeps every iMessage in a local SQLite database (~/Library/Messages/chat.db), but the raw schema is awkward and easy to misread. This tool smooths it out and emits a speaker-attributed dataset with a clear audit trail.

Why it exists

The chat.db schema has subtle traps: outgoing rows point at the recipient (not the sender); long plain-text messages live inside an Apple NSAttributedString typedstream blob whose length prefix truncates at 255 chars if you parse it wrong; tapbacks live as cross-referenced rows with prefixed GUIDs; edits, retractions, and app-payload rows can have NULL text AND NULL attributedBody.
Most existing exporters either dump everything (great for archival, noisy for AI) or run in the cloud, which is a problem for private message data.
An AI-ready single-conversation export with explicit speaker labels turns out to be a really good unit for analysis, summarization, and reflection, and it is trivially redactable.
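
Two of the traps above are easy to show in code. The sketch below is illustrative only: the function names, their signatures, and the "Unknown" fallback are assumptions made for this example, not the tool's actual API. It assumes a row exposes is_from_me, a resolved handle string, and an associated_message_guid value.

```python
import re

# Tapback rows reference their target with a prefixed GUID: "p:N/<guid>" or "bp:<guid>".
_ASSOCIATED_PREFIX = re.compile(r"^(?:p:(\d+)/|bp:)")


def author_label(is_from_me, handle, me_name, contacts):
    """Pick a speaker label for one message row (hypothetical helper).

    On outgoing rows the handle is the recipient, so it is ignored.
    """
    if is_from_me:
        return me_name
    if handle is None:
        return "Unknown"
    # Fall back to the raw handle when the contact is not known.
    return contacts.get(handle, handle)


def split_associated_guid(value):
    """Return (part_index, bare_guid) for a tapback reference (hypothetical helper)."""
    match = _ASSOCIATED_PREFIX.match(value)
    if match is None:
        return None, value
    part = int(match.group(1)) if match.group(1) is not None else None
    return part, value[match.end():]
```

With this shape, an outgoing row is labeled with the --me-name value no matter which handle it carries, and a tapback can be joined back to its target message by the bare GUID.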

What it does, concretely

Plain Python. An imessage_export.py entry script plus a small package under imessage_export/. Python 3.10+. Zero required runtime dependencies.
Read-only by construction. Opens chat.db with mode=ro&immutable=1, sets PRAGMA query_only=ON, asserts the read-only PRAGMA actually took, and ships a regression test proving every write statement (DELETE / UPDATE / INSERT / CREATE / DROP / ALTER / REPLACE) raises and the file is byte-for-byte unchanged after close.
Speaker attribution everywhere. author_label is the source of truth — outgoing rows are relabeled with --me-name; incoming rows resolve via your contacts.csv (phone + email normalization).
Local time windows, Apple-epoch math. --date / --start-time / --end-time / --start-datetime / --end-datetime all interpret bounds in the system's local timezone, convert to Apple's 2001-epoch nanoseconds, and write the resolved window (local + UTC + Apple-ns + detected unit) into the JSON metadata block and the AI-ready header.
Five output formats per export. conversation.csv, conversation.json, conversation.txt, conversation.md, and conversation_ai_ready.txt (header + speaker-attributed body + attribution footer), plus an analysis_prompt.txt template.
Optional pseudonymizer. --redact / --redact-only swap handles, names, phone numbers, emails, and URLs for Person A / Person B / [PHONE] / [EMAIL] / [URL]. A pseudonym_map.json is written alongside for de-redaction. --suggest-names scans for likely third-party names not in your contacts.
Optional Textual TUI. pip install 'imessage-history[tui]' gives you a guided picker-and-window flow inside the terminal. The headless CLI (imessage-export --chat-id N --date YYYY-MM-DD) still works exactly the same.
No network. Zero outbound calls, no telemetry, no auto-update. The Messages DB requires Full Disk Access for the running process — if FDA is missing, the tool fails fast with a clear message.
Privacy hygiene by default. umask(0o077) so new files are 600 and new dirs 700. exports/, contacts.csv, and pseudonym_map.json are gitignored.
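
The read-only guard described in this list can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not the project's code: open_read_only is a hypothetical name, and raising RuntimeError when the PRAGMA did not take is a choice made for the example.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path):
    """Open a SQLite file with layered read-only protection (hypothetical helper)."""
    # Layer 1: URI flags. mode=ro refuses writes; immutable=1 tells SQLite
    # the file will not change, so it skips locking and change detection.
    uri = f"{Path(db_path).resolve().as_uri()}?mode=ro&immutable=1"
    conn = sqlite3.connect(uri, uri=True)

    # Layer 2: runtime PRAGMA that rejects any statement that would write.
    conn.execute("PRAGMA query_only = ON")

    # Layer 3: assert the PRAGMA actually took before handing the connection out.
    (flag,) = conn.execute("PRAGMA query_only").fetchone()
    if flag != 1:
        conn.close()
        raise RuntimeError("PRAGMA query_only did not take effect")
    return conn
```

A regression test in the same spirit would hash the file, run each write statement through the connection and expect sqlite3.OperationalError, then close the connection and compare hashes.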

Tech and architecture

Language: Python 3.10+, stdlib only for the core; [tui] extra adds Rich + Questionary + Textual.
Layout: Core under imessage_export/ (timestamps, decoder, models, db, contacts, window, export, writers, redactor, cli). TUI under imessage_export/tui/ (Phase 1 linear questionary wizard + Phase 2 Textual app). Core modules never import TUI at module top-level.
Decoder: Hand-written decode_attributed_body that walks Apple's typedstream, correctly parses the 0x81 two-byte little-endian length prefix (the truncate-at-255 gotcha), and strips the U+FFFC object replacement character that stands in for attachments.
Writers: write_csv, write_json, write_txt, write_markdown, write_ai_ready each take (path, messages, metadata), share no state, and are unaware of redaction — the Redactor runs as an optional second pass before the writers see the data.
TUI theming: Two-palette scheme (dawnfox / terafox) inspired by EdenEast/nightfox.nvim, auto-detected from macOS appearance, overridable via flag / env / settings. No hardcoded colors — everything routes through tui/theme.py semantic roles.
Tests: stdlib unittest, runs with no external deps. Includes the read-only regression test, decoder regression tests against the typedstream edge cases, and timestamp math against known macOS DB samples.
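
The length-prefix gotcha from the Decoder entry can be shown in isolation. The sketch below covers only the two cases named on this page, a single-byte length and the 0x81 marker followed by a two-byte little-endian length. read_length and read_string are hypothetical names, and a real typedstream walker has to handle more of the format than this.

```python
import struct

OBJECT_REPLACEMENT = "\ufffc"  # placeholder character used where an attachment sits


def read_length(buf, offset):
    """Return (length, next_offset) for a string length prefix (hypothetical helper)."""
    tag = buf[offset]
    if tag == 0x81:
        # Marker byte: the real length is in the next two bytes, little-endian.
        (length,) = struct.unpack_from("<H", buf, offset + 1)
        return length, offset + 3
    # Otherwise the byte itself is the length. Other marker bytes are
    # deliberately not handled in this sketch.
    return tag, offset + 1


def read_string(buf, offset=0):
    """Decode one length-prefixed UTF-8 string and strip attachment placeholders."""
    length, start = read_length(buf, offset)
    raw = buf[start:start + length]
    return raw.decode("utf-8", errors="replace").replace(OBJECT_REPLACEMENT, "")
```

Treating the 0x81 marker as if it were the length itself is what produces silently shortened text for long messages.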

Use cases

Personal reflection. Pull a day's worth of a conversation, paste the AI-ready file into a local LLM (Ollama / LM Studio recommended for privacy), and ask for themes, action items, or a Voice-Memo-style summary.
Relationship archives. Generate clean per-day Markdown for Obsidian or Notion — conversation.md is structured for it.
Hosted-model use. Use --redact-only first so names, phone numbers, emails, and URLs are swapped for Person A / [PHONE] before anything leaves the machine. The pseudonym_map.json stays local for de-redaction.
Investigations and data requests. A single chat-id + window is exactly the unit a lawyer or HR investigator typically wants, in a format they can read.
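
For the hosted-model case, the pseudonymization pass can be pictured roughly as follows. This is a simplified sketch, not the tool's Redactor: the regular expressions are deliberately loose, the redact function and its return shape are assumptions, and the letter labels only cover the first 26 names.

```python
import re

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text, names):
    """Return (redacted_text, pseudonym_map) for one message body (hypothetical helper)."""
    # URLs go first so an address embedded in a link is not half-replaced.
    text = URL_RE.sub("[URL]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)

    pseudonym_map = {}
    for index, name in enumerate(names):
        label = f"Person {chr(ord('A') + index)}"
        pseudonym_map[label] = name  # kept locally so the text can be de-redacted
        text = re.sub(rf"\b{re.escape(name)}\b", label, text)
    return text, pseudonym_map
```

Only the redacted text should leave the machine; the returned map plays the role of pseudonym_map.json and stays local.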

What I'd talk about in an interview

The read-only guard is layered, not a single PRAGMA. URI flags + runtime PRAGMA + pre-flight assertion + a regression test that checks every write statement raises and the file hash is unchanged after close. Treat the database like the immutable source of truth it is.
Schema gotchas as first-class invariants. Each one (handle_id is the OTHER party on outgoing rows, attributedBody length prefix is 2 bytes, tapback associated_message_guid has a p:N/ or bp: prefix, NULL text doesn't mean empty) has a comment in the code and a test case. Future-me can't accidentally re-introduce them.
Local time vs Apple-epoch. Every bound the user types is local; every comparison against the DB is Apple-epoch-nanoseconds. The resolved window is serialized in both so the metadata is reviewable and self-describing.
Privacy as the product surface, not the README. umask(0o077), gitignored outputs, no network calls, redactor as an opt-in pre-writer pass, pseudonym map treated like a password. The README leads with privacy and .gitignore already excludes everything sensitive.
TUI as an extra, not a dependency. The default install is stdlib only, so pip install imessage-history keeps working on machines where adding dependencies is friction.
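
The local-time versus Apple-epoch point can be sketched with the standard library. The function names and the dictionary keys below are assumptions made for illustration; the sketch treats naive datetimes as system-local time and always returns nanoseconds, leaving unit detection out.

```python
from datetime import datetime, timezone

# Apple's reference date: 2001-01-01 00:00:00 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def to_apple_ns(moment):
    """Convert a datetime to nanoseconds since the Apple epoch (hypothetical helper)."""
    if moment.tzinfo is None:
        # astimezone() on a naive datetime interprets it as system-local time.
        moment = moment.astimezone()
    delta = moment - APPLE_EPOCH
    # Integer arithmetic avoids float rounding at nanosecond scale.
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


def resolve_window(start, end):
    """Describe a window in local time, UTC, and Apple-epoch nanoseconds."""
    resolved = {}
    for key, moment in (("start", start), ("end", end)):
        aware = moment.astimezone()
        resolved[f"{key}_local"] = aware.isoformat()
        resolved[f"{key}_utc"] = aware.astimezone(timezone.utc).isoformat()
        resolved[f"{key}_apple_ns"] = to_apple_ns(aware)
    return resolved
```

Serializing all three representations together is what makes the window reviewable: a reader can check the local bounds they typed against the integers that were compared with the database.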

Repo, install, links

Install (default): pip install imessage-history
Install (with TUI): pip install 'imessage-history[tui]'
License: MIT
Privacy: read-only access to chat.db, zero network calls, optional pseudonymization
