Background and Motivation
The decisions that matter most to a business are often the hardest to analyze. Acquiring a competitor, entering a new market, betting on a risky product line: these are all strategic choices.¹ They commit large resources, shape the firm’s trajectory for years, and are not the sort of thing you can A/B test and optimize easily.
Even in tech-savvy, data-driven companies, strategic decisions often fall outside the domain of the data science organization. Most data science teams focus on small optimizations and incremental profits at scale, not one-off, high-stakes decisions. It is striking that firms spend millions building data infrastructure and tooling, yet often ignore them when facing their most consequential decisions.
In principle, strategic decisions should be supported by scenario planning, simulation, and judgment informed by both quantitative and qualitative analysis, even when data are sparse. In practice, they are frequently shaped by bounded rationality, politics, and chance (Eisenhardt and Zbaracki 1992). Hence, the gap between how we ought to make strategic decisions and how we actually make them is pretty wide.
There are, however, techniques designed precisely for environments with little data and much uncertainty. One such tool is Fermi estimation:
Fermi estimation is a method for making rough, back-of-the-envelope calculations by breaking a problem into smaller, estimable parts and making educated guesses for each component. The goal is to arrive at an answer that is accurate to within an order of magnitude rather than a precise number. The method is named after physicist Enrico Fermi, who was known for his ability to make accurate estimates with limited data.
Source: Gemini (accessed 2025-12-05)
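To make this concrete, here is the classic “How many piano tuners are there in Chicago?” decomposition, written out as plain Python. Every input is a deliberately rough guess; the point is the order of magnitude, not the digits.

```python
# Classic Fermi problem: how many piano tuners work in Chicago?
# Every number below is an order-of-magnitude guess, not a researched fact.

population = 3_000_000            # people in Chicago, roughly
people_per_household = 2          # average household size, roughly
households_with_piano = 1 / 20    # guess: 1 in 20 households owns a piano
tunings_per_piano_per_year = 1    # a piano is tuned about once a year
tunings_per_tuner_per_day = 4     # a tuner services ~4 pianos a day
working_days_per_year = 250

pianos = population / people_per_household * households_with_piano
tunings_demanded = pianos * tunings_per_piano_per_year
tunings_supplied_per_tuner = tunings_per_tuner_per_day * working_days_per_year

tuners = tunings_demanded / tunings_supplied_per_tuner
print(f"~{tuners:.0f} piano tuners")  # ~75: the right order of magnitude
```

The true answer hardly matters; what matters is that each sub-estimate is something you can sanity-check independently.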
Fermi estimation is used by scientists (Oliveira-Filho, Campos-Silva, and Hanson 2024), decision makers, product managers, superforecasters (Tetlock and Gardner 2016), and participants in prediction markets to reason quantitatively under uncertainty whenever traditional data and models are unavailable or unreliable.
In this blog post, I describe how I used AI to build FermiApp, a terminal-based calculator for Fermi estimation and back-of-the-envelope decision making. The project sits at the intersection of several of my interests: superforecasting and calibration, text-based productivity tools, and AI-assisted development. I wanted an interactive tool where I could define models step by step, revise them quickly, and separate model building from execution. The goal was to keep it simple, fast, and usable entirely from the terminal using text, the language of AI.
The design draws inspiration from Nuño Sempere’s elegant Fermi command line app, and from my work at Microsoft prioritizing Windows features across teams by USD ROI. I also wanted something lighter-weight than tools like Squiggle, Guesstimate, or Causal, a tool powerful enough for realistic decision problems yet quick enough to yield estimates in a few minutes.
I built FermiApp almost entirely with AI assistance, not by delegating everything to an autonomous agent, but by treating the AI as an engineering partner. We worked one step at a time mostly using Chat and Plan mode in VS Code. This post describes that process, what worked, what did not, and what I learned about collaborating with AI on technical projects. A companion post will cover the science of Fermi estimation itself, including the methods, the evidence for its superiority over intuition, best practices, and applications in forecasting and decision making.
Process
I started the project with a spec. Instead of jumping straight into code, I treated Nuño Sempere’s Fermi as a reference implementation and wrote down what I wanted to change. For example, I wanted to define models interactively and to separate model definition from execution. Nuño’s approach is elegant and efficient, but because it executes as you type, it assumes you know your full decomposition up front; editing often means starting over. Using his example problems as test cases helped me anchor the spec in concrete use cases.
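To make the distinction concrete, here is a hypothetical sketch of what separating definition from execution means. This is not FermiApp’s actual internals; the names and the lazy-evaluation scheme are my own illustration.

```python
# Hypothetical sketch (not FermiApp's real code): a model is a dictionary
# of named expressions, built up interactively and only evaluated when
# the user asks for a result.
import ast

model: dict[str, str] = {}

def define(name: str, expr: str) -> None:
    """Record or revise one step of the decomposition; nothing runs yet."""
    model[name] = expr

def run(target: str, _stack: tuple = ()) -> float:
    """Evaluate `target`, lazily resolving any model names it references."""
    if target in _stack:
        raise ValueError(f"circular definition: {target}")
    tree = ast.parse(model[target], mode="eval")
    names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    env = {n: run(n, _stack + (target,)) for n in names if n in model}
    return eval(compile(tree, "<model>", "eval"), {"__builtins__": {}}, env)

# Define first, edit freely, execute later:
define("market", "50e6")
define("share", "0.02")
define("price", "30")
define("revenue", "market * share * price")
print(run("revenue"))  # 30000000.0
```

Because definitions are just stored text, any step can be revised without re-entering the rest of the decomposition, which is exactly what the execute-as-you-type approach makes hard.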
Early on I decided the app would be a text-based user interface (TUI), built with the Textual framework. The goal was not to be cute or retro, but to strip away as much UI overhead as possible and keep the tool simple, functional, and fast. A text-based interface is easy for me to work with and easy for AIs to ingest, reason about, and modify. TUIs are a sweet spot for personal tools: fast to prototype, easy to run from the terminal, and amenable to ASCII sketches. It was striking how naturally the AI could describe and refine layouts in plain text.
The AI was most helpful in turning that initial idea into a real technical design. Working in chat felt like partnering with a seasoned engineer who made really useful suggestions on technical requirements. Through several rounds of back-and-forth, we translated the product vision into concrete requirements and constraints. By the time we were done, we had a shared spec that was specific enough to build from and narrow enough to be feasible.
At that point I could have handed the spec to an autonomous agent in VS Code and asked it to build the whole thing. I chose not to. I wanted to stay close to the stack and verify each change myself. So I used chat for almost everything and reserved Plan mode for sprint planning. Taking an agile approach, the AI and I agreed on a thin, end-to-end first slice to de-risk the project: core engine, scalars only, a basic Textual UI, and no randomness or simulation. When I asked for Sprint 1 goals, the AI suggested starting with a simple CLI. I overruled it. I wanted the Textual UI in place from the start so I could test the end-to-end experience, not just the core logic.
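For reference, the Sprint 1 slice had roughly this shape. This is a minimal sketch under my own naming, not the shipped code: one input line feeding a scalar-only evaluator, rendered in a Textual app.

```python
# Minimal sketch of the Sprint 1 slice: scalars only, no simulation.
# Not the actual FermiApp code; just the shape of the first end-to-end cut.
from textual.app import App, ComposeResult
from textual.widgets import Footer, Header, Input, Static

class FermiSketch(App):
    """One expression in, one evaluated scalar out."""

    def compose(self) -> ComposeResult:
        yield Header()
        yield Input(placeholder="e.g. 50e6 * 0.02 * 30")
        yield Static("", id="result")
        yield Footer()

    def on_input_submitted(self, event: Input.Submitted) -> None:
        try:
            value = eval(event.value, {"__builtins__": {}}, {})  # scalars only
            self.query_one("#result", Static).update(f"= {value:g}")
        except Exception as exc:
            self.query_one("#result", Static).update(f"error: {exc}")

if __name__ == "__main__":
    FermiSketch().run()
```

Even a slice this thin exercises the whole pipeline, from keystroke to evaluated result on screen, which is what made it useful for de-risking.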
Plan mode then produced a concrete Sprint 1 plan: goals, steps, definition of done, and deliverables. I then used the AI to implement the plan one step at a time, interactively in chat. For each step it proposed code and tests that I copied into files. Before running the tests with VS Code’s test utilities, I opened a terminal in VS Code, launched a Python REPL, imported the new module, and exercised it by hand. That manual pass often revealed edge cases not covered by the initial tests. If I did not understand a function, I asked the AI to walk me through it. Only once I had a decent grasp of the code did I run the automated tests.
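A typical manual check looked something like this; the module and function names below are stand-ins, not the project’s real API.

```python
# Hand-testing a freshly added module in the REPL; names are illustrative.
>>> from fermi_app.engine import evaluate
>>> evaluate("3_000_000 / 2 * 0.05")   # happy path
75000.0
>>> evaluate("revenue")                 # undefined name: what happens?
Traceback (most recent call last):
  ...
NameError: name 'revenue' is not defined
```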
About half the time everything passed on the first run, which is more than I expected. I attribute this success to the detailed docs and to executing one step at a time with rich context. The rest of the time, I pasted test error messages back into chat and the AI produced fixes, usually in a single iteration. I repeated this step-by-step loop until the sprint was complete.
I did not write automated tests for the Textual layer at this stage. The overhead felt high relative to the payoff. Instead, the AI proposed manual test scenarios and I ran them, which also forced me to “eat my own dog food.” Because Textual is relatively niche, the AI stumbled, writing standard web CSS instead of Textual’s simplified TCSS. I had to point it at the Textual docs and insist on TCSS syntax. That became a useful pattern: in domains that are niche or dominated by a different default, you have to steer the AI explicitly and give it the right references.
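As an illustration of the difference (my own minimal example, not project code): TCSS replaces web layout concepts like flexbox with its own small vocabulary of layout, dock, and fr units, which is where the AI kept reaching for the web defaults.

```python
# Sketch of Textual's TCSS, embedded via the App.CSS class variable.
from textual.app import App, ComposeResult
from textual.widgets import Static

class TwoPane(App):
    # TCSS, not web CSS: no display:flex here; layout and fr units instead.
    CSS = """
    Screen {
        layout: horizontal;   /* web CSS would use display: flex */
    }
    #left {
        width: 1fr;
        border: solid green;
    }
    #right {
        width: 2fr;
        border: solid blue;
    }
    """

    def compose(self) -> ComposeResult:
        yield Static("model", id="left")
        yield Static("results", id="right")

if __name__ == "__main__":
    TwoPane().run()
```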
After finishing Sprint 1, I returned to Plan mode with the spec and the previous plan as context, asked for options for Sprint 2, and chose to focus on the next big risk: adding simple uncertainty via uniform distributions. The AI produced a new Sprint 2 plan in raw Markdown, which I saved into the docs folder as both roadmap and grounding for the next implementation cycle. The end product is in the FermiApp repo. Take a look.
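The core idea of that sprint is easy to sketch. What follows is the concept, not the shipped implementation: each uncertain input becomes a uniform range, and uncertainty propagates through the model by sampling.

```python
# Concept sketch for Sprint 2: uncertain inputs as uniform ranges,
# propagated through the model with simple Monte Carlo sampling.
import random
import statistics

def uniform(low: float, high: float, n: int = 10_000) -> list[float]:
    """Draw n samples from a uniform range standing in for an uncertain input."""
    return [random.uniform(low, high) for _ in range(n)]

# revenue = market * share * price, with each input a range, not a point
market = uniform(30e6, 70e6)
share = uniform(0.01, 0.03)
price = uniform(20, 40)
revenue = [m * s * p for m, s, p in zip(market, share, price)]

deciles = statistics.quantiles(revenue, n=10)  # deciles[0]=P10, deciles[-1]=P90
print(f"median ~ {statistics.median(revenue):,.0f}")
print(f"P10-P90 ~ {deciles[0]:,.0f} to {deciles[-1]:,.0f}")
```

Reporting a range instead of a point estimate is the whole payoff: it makes the model say how wrong it might be.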
Learnings
Spending time on the spec and sprint planning paid for itself many times over. I do not always know what I want at the start of a project. Working with the AI to generate options, list pros and cons, and explore the space of possibilities helped surface tradeoffs before any code was written. Once we had that landscape, choosing a thin end-to-end slice that delivered real value and de-risked the project felt natural. In my work I always try to step on any landmines as early as possible. It is much better to discover an insurmountable obstacle on day two than on day twenty. The idea is to fail early and learn fast.
In practice I have found it much easier to delegate to AI when I know the domain and can verify the work. Working step by step, rather than handing off the entire spec to an autonomous agent, kept me engaged with the technical details. This one-step-at-a-time approach made it straightforward to supervise the AI, validate outputs through tests and manual inspection, and course-correct as needed. Specs, sprint plans, and especially automated and manual tests provided clear checkpoints that kept both me and the AI grounded and aligned.
This mirrors my heuristic for delegating to AIs: automation works best when you understand the problem and can verify the solution. Because Scrum is fundamentally an empirical process, the core practices (specs, sprint planning, demos, retrospectives) remain valuable as a way to understand the domain and verify the output. What changes is the clock speed. With AI as your engineering partner, you can run the entire loop much faster.
The main bottleneck in this setup was not engineering speed but my own span of control, so sprint length had to come down. With a human team, weekly or biweekly sprints are common. Here, even dedicating only a few hours a day, I could complete meaningful sprints in a day or two. The experience felt like being a technical PM paired with a very fast, very capable engineer. In that world, it is better to compress the work of a weekly sprint into a day than to try to do the work of seven weekly sprints in a week.
In hindsight, the counterfactual is instructive. Without AI, I doubt I would have built this app at all, let alone as quickly, reliably, and with as much learning. A rough guess is that AI made me about ten times more productive. At the same time, I am glad I did not use full agent mode for everything. An agent might have produced a working app faster, perhaps with similar quality, but the internals would have remained a black box. That is a problem if you are ultimately the one in charge. You cannot manage an AI, or supervise human collaborators, if you are completely removed from the stack. Sometimes using chat and copy-paste (or better yet, typing the code yourself) slows you down in the moment but pays off in understanding and future progress.
More broadly, this project changed how I think about what is being automated. From a PM’s perspective, it is tempting to see the AI as “the engineer.” From an engineer’s perspective, it can feel like the AI is automating the PM: the spec generation, documentation, and sprint planning. In both cases, the crucial point is that the AI does not choose goals or tradeoffs on its own. Left to itself, it tends to follow the most well-trodden paths, producing something generic. You have to steer it.
Finally, AI has not automated away the customer. In this project I was the customer, and the AI could not second-guess what I wanted. For commercial products, AI can help you build faster and explore more ideas, but it will not make customers flock to your app. Product–market fit still lives in the hands of users. What AI changes is the speed at which you can iterate toward it.
References
Child, John. 1972. “Organizational Structure, Environment and Performance: The Role of Strategic Choice.” Sociology 6 (1): 1–22. https://doi.org/10.1177/003803857200600101.
Dutton, Jane E., Liam Fahey, and V. K. Narayanan. 1983. “Toward Understanding Strategic Issue Diagnosis.” Strategic Management Journal 4 (4): 307–23. https://doi.org/10.1002/smj.4250040403.
Eisenhardt, Kathleen M., and Mark J. Zbaracki. 1992. “Strategic Decision Making.” Strategic Management Journal 13 (S2): 17–37. https://doi.org/10.1002/smj.4250130904.
Mintzberg, Henry, Duru Raisinghani, and Andre Theoret. 1976. “The Structure of ‘Unstructured’ Decision Processes.” Administrative Science Quarterly 21 (2): 246. https://doi.org/10.2307/2392045.
Narayanan, V. K., and Liam Fahey. 1982. “The Micro-Politics of Strategy Formulation.” Academy of Management Review 7 (1): 25–34. https://doi.org/10.5465/amr.1982.4285432.
Oliveira-Filho, Edmar R., Rodrigo Campos-Silva, and Andrew D. Hanson. 2024. “Running Fermi Calculations as a Superpower to Gauge Reality.” Plant Physiology 198 (3). https://doi.org/10.1093/plphys/kiae347.
Shivakumar, Ram. 2014. “How to Tell Which Decisions Are Strategic.” California Management Review 56 (3): 78–97. https://doi.org/10.1525/cmr.2014.56.3.78.
Tetlock, Philip E., and Dan Gardner. 2016. Superforecasting: The Art and Science of Prediction. Random House.
¹ A strategic decision is a non-routine, top-management choice that significantly and often irreversibly commits organizational resources and changes the firm’s overall scope and long-term direction, thereby shaping its relationship with the environment and structuring subsequent lower-level decisions (Mintzberg, Raisinghani, and Theoret 1976; Child 1972; Narayanan and Fahey 1982; Dutton, Fahey, and Narayanan 1983; Shivakumar 2014).