Rajesh Goli

I run end-to-end engineering and science teams
built entirely on AI agents.

Former Amazon AGI. Now running AI agent teams that build the app, generate the data, train the models, interpret the results, and feed it all back — a recursive loop that improves itself. The tools below? I built those because I had to.

0 messages to AI agents
in 186 days
0 sessions
across providers
0 tool calls
executed
0 active days
since Dec 2025
Session Manager

Agents that talk to each other,
ship code, and report to your phone.

I needed a way to run 6 agents in parallel across engineering and science workstreams. Session Manager (sm) runs native Claude Code and a forked Codex CLI on tmux. The fork adds the telemetry hooks that Claude Code ships natively, so sm tracks every Codex tool call and sends messages without hijacking the terminal. No wrappers, no third-party abstractions. Agents message each other, create PRs, review code, and merge while you watch from Telegram.
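As a rough illustration of messaging an agent without taking over its terminal, the sketch below builds the `tmux send-keys` calls that would type a message into a pane and submit it. This is an assumption about the mechanism, not Session Manager's actual code: the helper names `tmux_send_commands` and `deliver` are hypothetical, and only `tmux send-keys` with `-t` and `-l` is standard tmux.

```python
import subprocess


def tmux_send_commands(target: str, message: str) -> list[list[str]]:
    """Build the tmux invocations that type `message` into `target` and submit it.

    Hypothetical helper for illustration; not Session Manager's real API.
    """
    return [
        # -l sends the text literally, so a word such as "Enter" inside the
        # message is typed as text instead of being treated as a key name.
        ["tmux", "send-keys", "-t", target, "-l", message],
        # A second call presses Enter to submit the typed line.
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def deliver(target: str, message: str, dry_run: bool = True) -> list[list[str]]:
    """Run the commands, or only return them when dry_run is True."""
    commands = tmux_send_commands(target, message)
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print what would be sent to the pane named engineer-7a2.
    for argv in deliver("engineer-7a2", "Review PR #2646"):
        print(" ".join(argv))
```

Because the message is injected as keystrokes into an existing pane, the agent keeps its own terminal session and a human can still attach to it at any time.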

EM claude
$ sm dispatch engineer-7a2 --issue 2644
→ Delivered to engineer-7a2
Going idle. Waiting for results...
 
...
engineer-7a2 PR #2646 merged to dev
$ sm dispatch engineer-7a2 --issue 2645
→ Next ticket dispatched
Engineer claude
em-main Implement issue #2644
$ read docs/working/2644_spec.md
Reading spec... implementing...
$ git checkout -b feature/2644-auth-refresh
Editing src/auth/token_manager.py
Editing tests/test_auth.py
$ git push && gh pr create --title "Fix token refresh #2644"
✓ PR #2646 created
$ sm send reviewer-3b1 "Review PR #2646"
Waiting for review...
reviewer-3b1 Fix 2 findings on PR #2646
Fixing: missing edge case test, stale import
$ git push && gh pr merge 2646 --merge
✓ PR #2646 merged to dev
$ sm send em-main "Done: PR #2646 merged"
Reviewer codex
Idle — $0/token
 
engineer-7a2 Review PR #2646
$ gh pr diff 2646
Reviewing 3 files, 47 additions...
$ gh pr comment 2646 --body "2 findings"
$ sm send engineer-7a2 "Fix 2 findings"
Waiting for fixes...
PR merged. Going idle.
EM → Engineer
Engineer → Reviewer
Reviewer → Engineer
Engineer → EM
sm watch
Last sync 2s ago · rajeshgoli
em-main 1749a2fe · em
2h
working running claude
engineer-7a2 a3f812cb · engineer
47m
working running claude
Implementing #2644 — editing src/auth/token_manager.py
reviewer-3b1 d7e291ff · architect
12m
thinking running codex
Reviewing PR #2646 — 3 files, 47 additions
architect-9c4 82bc40a1 · architect
1h
idle stopped claude
Watch
Analytics

Built-in mobile app. Every agent, every message, every status — at a glance from your phone.

Native Claude Code + forked Codex CLI | Tool-call telemetry on both providers | No wrappers or 3P abstractions | 984 tests | $0 while agents sleep
View on GitHub →
DeskBar

macOS shows apps.
You work in windows.

macOS's Dock tells you Safari is running — not which of three Safari windows holds the GitHub PR. DeskBar replaces it with a real Windows-style taskbar: one button per window, Option+Tab that cycles windows LRU (not apps) with live thumbnails, a built-in system resource widget, a keyboard launcher, and a native plugin system — including live session-manager integration. A productivity powerhouse for macOS, with no more flow-breaking trips to Mission Control.

macOS Dock
C
S
V
F

“Safari is open.”

But which of three windows? How many sessions of Claude? Can't tell.

→ Cmd+Tab. Mission Control. Click around. Flow broken.

DeskBar
C Spec for #2644 — agent-os
C PR #2646 review
S GitHub — rajeshgoli/deskbar
S ScreenCaptureKit docs
V TaskbarPanel.swift
F Finder — Projects (min)

Every window. Named. One click away.

PR #2646 · live preview
S
V
C
C Spec #2644
C PR #2646
S GitHub
V auth.py
MEM
11/18G
CPU
38%
sm
em eng rev
9:41
launcher · pinned + kbd | one button per window | system widget · plugins | tray
Option + Tab: cycle windows, LRU, not apps
C PR #2646 review
S GitHub
V auth.py
C Spec #2644

Hold Option, tap Tab to advance, release to switch. Browsers, terminals, editors — jump directly to the window you actually want.
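The hold, tap, release behavior can be modeled with a recency-ordered list of windows. The sketch below is a hypothetical Python model for illustration only (DeskBar itself is Swift/AppKit). It assumes index 0 is the currently focused window and that the first Tab press highlights the window focused just before it; the class and method names are invented.

```python
from __future__ import annotations


class WindowSwitcher:
    """Recency-ordered window list with hold/tap/release switching.

    Hypothetical model; not DeskBar's implementation.
    """

    def __init__(self, windows: list[str]) -> None:
        # Index 0 is the focused window, index 1 the one focused before it, etc.
        self.order = list(windows)
        self.cursor: int | None = None  # None means Option is not held

    def tap(self) -> str:
        """Tab pressed while Option is held: advance the highlight by one."""
        if not self.order:
            raise ValueError("no windows to switch between")
        start = 0 if self.cursor is None else self.cursor
        self.cursor = (start + 1) % len(self.order)
        return self.order[self.cursor]

    def release(self) -> str | None:
        """Option released: focus the highlighted window and move it to the front."""
        if self.cursor is None:
            return self.order[0] if self.order else None
        chosen = self.order.pop(self.cursor)
        self.order.insert(0, chosen)
        self.cursor = None
        return chosen


if __name__ == "__main__":
    switcher = WindowSwitcher(["PR #2646 review", "GitHub", "auth.py", "Spec #2644"])
    switcher.tap()             # highlights "GitHub"
    switcher.tap()             # highlights "auth.py"
    print(switcher.release())  # auth.py
    print(switcher.order)      # ['auth.py', 'PR #2646 review', 'GitHub', 'Spec #2644']
```

Moving the chosen window to the front on release is what makes a single Option+Tab toggle between the two most recently used windows.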

One button per window | Option+Tab: windows, not apps | Live thumbnails (ScreenCaptureKit) | MEM / CPU / GPU widget | Plugins, incl. session-manager | Multi-monitor | Pure Swift/AppKit, zero deps
View on GitHub →
Backup Manager

Encrypted backups.
Sleep through the next disk failure.

Agent configs, API keys, project data, db dumps — sensitive local state I'd be dead without. Backup Manager has two faces: a Rust core that runs forever via daily LaunchAgent (low memory, robust, no babysitting), and a SwiftUI macOS app for when you'd rather click than type. tar + bzip2 + GPG-encrypted deltas land on Google Drive; restore from any date with one command — or one button.
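To make the scan, tar + bzip2, and GPG steps concrete, here is a minimal Python sketch of one delta run. It is not the Rust core's code. It assumes changes are detected by file modification time, and the function names are hypothetical. The `gpg` options used (`--symmetric`, `--cipher-algo AES256`, `--passphrase-fd`, `--pinentry-mode loopback`) are standard GnuPG flags; the upload to Google Drive is left out.

```python
from __future__ import annotations

import tarfile
from pathlib import Path


def changed_since(root: Path, last_run: float) -> list[Path]:
    """Scan + diff: files under `root` modified after `last_run` (epoch seconds).

    Assumption: mtime-based detection; the real tool may diff differently.
    """
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.stat().st_mtime > last_run
    )


def write_delta(root: Path, files: list[Path], archive: Path) -> Path:
    """tar + bzip2: pack the changed files, stored relative to `root`."""
    with tarfile.open(archive, "w:bz2") as tar:
        for path in files:
            tar.add(path, arcname=path.relative_to(root).as_posix())
    return archive


def gpg_encrypt_command(archive: Path) -> list[str]:
    """GPG encrypt: argv for symmetric AES-256, passphrase read from stdin."""
    return [
        "gpg", "--batch", "--yes",
        "--pinentry-mode", "loopback",
        "--passphrase-fd", "0",
        "--symmetric", "--cipher-algo", "AES256",
        "--output", f"{archive}.gpg",
        str(archive),
    ]
```

The command list could be run with `subprocess.run(cmd, input=passphrase_bytes, check=True)`, with the passphrase fetched from wherever it is stored, so that it never appears on the command line.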

Rust daemon · CLI
backup-manager status
$ backup-manager status
Latest checkpoint 2026-05-14 · 5d ago
Latest delta 2026-05-19 03:17
Total on disk 1.2 GB compressed
Encryption GPG · AES-256
Destination Google Drive ✓ synced
Recent runs
2026-05-19 03:17 delta 12 files 2.4 MB
2026-05-18 03:14 delta 8 files 1.1 MB
2026-05-17 03:11 delta 23 files 4.7 MB
2026-05-14 03:09 full 312 files 1.2 GB
Swift SwiftUI · macOS app
Backup Manager
Overview
Sources
Storage
Logs
▰ menu bar extra
All protected
312 files · last checkpoint 5d ago
Next run 03:00 · daily
Storage 1.2 GB · Google Drive ✓
Sources 6 configured
📁 scan + diff
📦 tar + bzip2
🔒 GPG encrypt
Google Drive
Rust core + SwiftUI app | tar + bzip2 + GPG | Encrypted deltas + checkpoints | macOS Keychain passphrase | Daily LaunchAgent + menu bar extra | One-command (or one-button) restore
View on GitHub →
Agent-OS

Agents need org charts,
not bigger context windows.

When you're running engineering and science teams on AI agents, you need the same thing human orgs figured out — roles, review protocols, handoff rules, accountability. This is the operating system my agent teams run on.

Product
EM
Architect
Engineer
Engineer
Scout

"7–17 spec review rounds before a line of code is written. Those specs need 2–3 fix rounds. Specs that skip? 4+, burning 50% more tokens and time."

View on GitHub →
Office Automate

Two questions.
One signal.

Two questions every workday: is the air I'm breathing any good, and did I actually work? Office Automate answers both from a single signal — the presence state machine that decides ERV speed is the same one that decides whether a git commit counted as office time. Runs on a Mac Mini in the office. No Kubernetes, no Home Assistant. Three Launch Agents and a SQLite file.
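A minimal sketch of the single-signal idea: one presence state machine, two consumers that read the same state. This is an illustrative assumption, not Office Automate's code. The 300-second settle window, the class name, and the method names are hypothetical, and the door-open mode is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

PRESENT, AWAY = "PRESENT", "AWAY"


@dataclass
class PresenceMachine:
    """Two-state presence tracker with a settle window (hypothetical sketch)."""

    settle_s: float = 300.0            # quiet time required before flipping to AWAY
    state: str = AWAY
    last_activity: float | None = None

    def on_activity(self, now: float) -> str:
        """Motion or Mac activity: become PRESENT immediately."""
        self.last_activity = now
        self.state = PRESENT
        return self.state

    def tick(self, now: float) -> str:
        """Periodic check: fall back to AWAY only after the settle window."""
        if (
            self.state == PRESENT
            and self.last_activity is not None
            and now - self.last_activity >= self.settle_s
        ):
            self.state = AWAY
        return self.state

    # Both consumers read the same state, so they cannot disagree.
    def erv_should_ventilate(self) -> bool:
        return self.state == PRESENT

    def commit_counts_as_office_time(self) -> bool:
        return self.state == PRESENT
```

The settle window keeps a short pause, such as stepping away from the keyboard for a minute, from flipping the state and dropping a commit out of the in-seat ledger.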

Office Mini · Live PRESENT · 4h 12m in seat
721 ppm CO2
72.4 °F
48 % humidity
0.32 ppm tVOC
5 μg/m³ PM2.5
41 dB noise
ERV Auto · quiet
HVAC Eco heat · 71°F
Door Closed · 4h 18m
This week · project leverage 28h 12m verified in seat
session-manager 34 commits 142 Claude · 87 Codex
deskbar 18 commits 23 Claude
office-automate 12 commits 18 Claude · 12 Codex
agent-os 9 commits 5 reviews
sm dispatches · 412 | engram folds · 23 | persona reads · 38

Sensors

Door / window | Motion | Mac activity | CO2 / tVOC

State machine

PRESENT / AWAY | door-open mode | settle windows

Outputs

ERV + HVAC control | Productivity ledger

Same signal. Two jobs.

Single source of truth for both halves | Hysteresis-tuned ERV control | Per-project input ledger | PWA dashboard · Google OAuth | Mac Mini · Python + React + SQLite
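Hysteresis in this context means using two thresholds instead of one, so the ERV does not flip speeds every time CO2 hovers around a single cutoff. The sketch below is a generic illustration: the 900 and 700 ppm thresholds, the "boost" speed name, and the function name are assumptions, not the project's tuning.

```python
def next_erv_speed(current: str, co2_ppm: float,
                   on_above: float = 900.0, off_below: float = 700.0) -> str:
    """Return the next ERV speed given the current speed and a CO2 reading.

    Switches to 'boost' at or above on_above and back to 'quiet' at or below
    off_below. Between the two thresholds the current speed is kept.
    Threshold values are illustrative assumptions.
    """
    if off_below >= on_above:
        raise ValueError("off_below must be lower than on_above")
    if co2_ppm >= on_above:
        return "boost"
    if co2_ppm <= off_below:
        return "quiet"
    return current


if __name__ == "__main__":
    speed = "quiet"
    for ppm in [650, 850, 910, 850, 721, 690]:
        speed = next_erv_speed(speed, ppm)
        print(ppm, speed)
```

With these example thresholds, a reading of 721 ppm leaves the ERV at whatever speed it was already running, which is the point of the dead band.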
View on GitHub →
0 contributions
0 public repos
0 messages / day
to AI agents
0 total agent
messages

Other things I've built along the way.

Writing.

Essays, fiction, and analysis from the blog — spanning 2005 to present.

All posts on the blog →

The backstory.

It started in 2003 — training a neural network to recognize faces on a Pentium, writing TCP/IP stacks on 8051 microcontrollers, building paint clones in raw DOS graphics. A computer science degree at BMS College of Engineering, then an MBA at IIM Bangalore.

At Alcatel-Lucent, I went from writing routing protocols to running technical support strategy for a 1,000-person IP division. I then co-founded a data visualization startup. When that didn't work out, I joined Microsoft as the founding PM for Azure Backup: I built the vision from scratch, growth-hacked it 10x in three years, and saw it land as a Leader in Gartner's Magic Quadrant.

At AWS, I spent seven years across two roles — Principal PM and then Senior Manager — launching Honeycode, creating its pricing (approved by Andy Jassy), and eventually championing a GenAI product for knowledge workers that became Q-apps, with $100MM+ revenue potential. Four patents. 300+ customer interviews across 45 enterprise accounts. Then Amazon AGI — where I led the "memory pattern" processing 5 billion data items a week across 5 lines of business.

In March 2023, I wrote a memo on the disruption potential of generative AI. I argued that AI "workers" operating under human supervision were tractable in the near term, and that chat was the wrong modality: complex tasks need iterative, asynchronous architectures with built-in validation. I estimated five years. It happened in eighteen months.

In January 2026, I left to build independently. I now run end-to-end engineering and science teams built entirely on AI agents. They build the application, generate the data, train the models, run mechanical interpretation on the results, and feed everything back into the loop — recursively improving the whole system. Claude and Codex work as teammates: Claude writes specs, Codex reviews them. Codex writes code (0.3 fix rounds per ticket), Claude reviews it (catches architectural drift). The tools on this page? I built them because I needed them to run those teams.

"Using AI agents is an empirical science. I didn't design this workflow. I tried every combination, tracked the data, and this is where it converged."

Let's talk.

Building in AI agent infrastructure, multi-agent systems, or thinking about what AI-native organizations look like? I'd like to hear from you.