Engineering approach: Startup Mode vs. Big Tech Mode
Last week, I delivered a talk at PyDelhi discussing strategies that leverage how large language models work to improve the performance of your LLM applications. Here is the talk as a PDF.
Some techniques I discussed were:
- Strategies for faster token throughput.
- Strategies for quick time to first token (see the streaming sketch after this list).
- Effective context window management.
- Model routing strategies.
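To make the second item concrete, here is a minimal sketch of streaming responses, assuming the OpenAI Python SDK; any provider with a streaming endpoint works the same way, and the model name and prompt are placeholders:

```python
# Stream tokens so the user sees output as soon as the first token arrives,
# instead of waiting for the full reply to be generated.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize SLOs in one paragraph."}],
    stream=True,  # tokens arrive incrementally, cutting time to first token
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```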
But here's the uncomfortable truth for founders: if you're just starting your LLM startup, you should completely ignore this advice.
Let me explain why — and when application performance actually matters.
How It Works in the Ideal World: Big Tech's Playbook
Imagine the typical trajectory at a company like Google or Stripe. You see a problem in the market. It's well-defined. Your user base is established. You build a team to solve it.
Your first step isn't writing code—it's understanding your performance requirements.
You study incumbent competitors. You conduct user research. You measure what your users actually tolerate. For e-commerce, that's Amazon's 5-second response time threshold. For payments, that's Stripe's sub-100ms latency requirement. For real-time LLM interfaces, that might be streaming tokens within 200ms.
These user expectations become Service Level Objectives (SLOs)—formal performance, reliability, and usability targets your application must meet to remain competitive.
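In code, SLOs can be as simple as explicit, checkable numbers rather than folklore. A hedged sketch, with hypothetical values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceLevelObjectives:
    p95_latency_ms: float           # 95th-percentile end-to-end latency
    availability_pct: float         # e.g. 99.9 means "three nines"
    time_to_first_token_ms: float   # for streaming LLM interfaces

# Hypothetical targets; the 200 ms figure matches the streaming example above.
CHAT_APP_SLOS = ServiceLevelObjectives(
    p95_latency_ms=2000.0,
    availability_pct=99.9,
    time_to_first_token_ms=200.0,
)

def meets_latency_slo(observed_p95_ms: float, slos: ServiceLevelObjectives) -> bool:
    """Check a measured p95 latency against the objective."""
    return observed_p95_ms <= slos.p95_latency_ms
```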
Once you have SLOs, someone (usually a principal engineer or architect) translates them into a system architecture. This involves:
- Weighing architectural tradeoffs (monolith vs. microservices, synchronous vs. asynchronous)
- Selecting technology stacks for different components
- Deciding on execution environments (web app vs. IDE plugin vs. CLI tool)
- Planning for scale from day one
This approach works beautifully for mature companies with stable product-market fit. You have reliable data about what your users need, so you can build the right system the first time.
The Cost of Performance Engineering
Performance optimization requires trade-offs—all of them expensive.
At Google and VMware, my teams answered questions like:
- How much does adopting AVX-512 improve RAID-6 computational throughput?
- What's the performance gain if we convert random disk I/O into sequential reads?
- How much latency can we save by building local caches with remote diffs?
- Can we prefetch data and pipeline operations based on dependency graphs?
These questions have answers, and the answers are valuable. But solving them has a cost: optimized code is complex, harder to understand, and harder to debug.
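As a toy illustration of the second question, here is a sketch that times sequential versus shuffled reads of the same file. The path and sizes are arbitrary, and real results depend heavily on the OS page cache and the underlying disk; this is a sketch, not a rigorous benchmark:

```python
import os
import random
import time

PATH = "testfile.bin"  # hypothetical scratch file
BLOCK = 4096
SIZE = 64 * 1024 * 1024

with open(PATH, "wb") as f:  # create a 64 MiB test file
    f.write(os.urandom(SIZE))

offsets = list(range(0, SIZE, BLOCK))

def timed_read(order: list[int]) -> float:
    """Read the file block by block in the given offset order."""
    start = time.perf_counter()
    with open(PATH, "rb") as f:
        for off in order:
            f.seek(off)
            f.read(BLOCK)
    return time.perf_counter() - start

sequential = timed_read(offsets)
shuffled = offsets[:]
random.shuffle(shuffled)
scattered = timed_read(shuffled)

print(f"sequential: {sequential:.3f}s  random: {scattered:.3f}s")
```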
Consider a simple workflow with a few network calls and database queries. Now transform it for performance: add Redis in front of slow queries, rewrite the flow as async with continuations, pool TCP connections with keepalives, consider UDP over TCP for specific data patterns, distribute read load across multiple backend instances, trim logging overhead, say no to heap allocations... you get the point.
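Even the first of those changes illustrates the cost. A hedged sketch of fronting a slow query with Redis, assuming the `redis` package and a local server; `run_slow_query` is a hypothetical stand-in:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def run_slow_query(user_id: str) -> dict:
    # Hypothetical stand-in for an expensive database query.
    return {"user_id": user_id, "total_spend": 42.0}

def get_user_report(user_id: str) -> dict:
    key = f"report:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)          # cache hit: skip the database
    report = run_slow_query(user_id)
    r.setex(key, 300, json.dumps(report))  # expire after 5 minutes
    return report
```

The speedup comes bundled with a second system to operate, a serialization format, a TTL policy, and a cache-invalidation question the original query never had.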
Each optimization adds complexity. Each line becomes harder for the next engineer to reason about.
Performance engineering also locks you into early technical decisions. Refactoring code can mean rolling back optimizations.
How It Actually Works: The Startup Reality
Here's where the Big Tech playbook breaks down.
At a startup, almost nothing is stable. Your SLOs don't exist yet because you don't know who your customers are. Your product architecture will change—not once, but repeatedly.
The sources of uncertainty are constant:
- Product pivot: Your initial idea evolves. Instagram started as Burbn, a cluttered check-in app with photos as a side feature. When the founders realized users were ignoring the check-in functionality and engaging only with photo sharing, they stripped everything away and rebuilt the architecture around that single use case.
- Customer pivot: You discover your ideal customer profile is different from what you assumed. That financial services firm won't buy your product, but the open-source community will, and they have completely different scalability requirements.
- The landscape is evolving: New models, new APIs, and better caching strategies emerge monthly. Locking into early architectural decisions is especially costly.
- Your use cases will change: You might start with synchronous inference, then need streaming. You might start with single-turn interactions, then add multi-turn conversations. Each shift requires rearchitecting.
As Gergely Orosz noted after years at Uber: the biggest constraint isn't computing resources, it's coordination overhead. At big tech companies, you wait days for approvals on simple plumbing changes. At startups, you can't afford that: you need to move fast and change direction constantly.
The Counterargument: When Performance Matters Early
I need to be clear: there are exceptions.
If your business model directly depends on latency—say, you're selling real-time trading alerts and charge per-millisecond-saved—then performance optimization matters from day one.
If your unit economics fundamentally depend on throughput (you make money per inference, and your margins vanish if you're inefficient), then measure and optimize.
But ask yourself honestly: is performance actually your constraint, or is it a distraction?
Most startups discover their real constraints are customer acquisition, product-market fit, and unit economics—not milliseconds.
What to Do Instead
Here's your startup engineering philosophy:
Use third-party solutions liberally. Use managed databases instead of self-hosting Postgres. Use cloud APIs instead of building infrastructure. Use open-source libraries even if they're slower or have some overhead. The velocity gain from not building custom infrastructure outweighs the performance cost—until you reach scale.
As Paul Graham noted in his essay on startup strategies: founders often resist early customer work because they'd "rather sit at home writing code." The same applies here. You'd rather optimize your codebase than talk to customers. Both are mistakes.
Optimize for changeability, not performance. Write simple code that's easy to refactor. Clear, straightforward solutions beat clever optimizations.
This means:
- Choose simple data structures over complex ones
- Write tests that give you confidence to refactor
- Measure, but don't optimize based on those measurements yet
Think of it this way: if you were solving this problem in a language like Python or JavaScript (where raw speed is rarely the limiting factor), what would you do? Do that. Build it carefully, but don't overthink it.
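A small sketch of what "optimize for changeability" looks like in practice: an obvious implementation plus a test that pins down behavior, so the internals can be swapped later without fear. The function and scores are hypothetical:

```python
def top_k_documents(scores: dict[str, float], k: int) -> list[str]:
    """Return the ids of the k highest-scoring documents.

    Deliberately obvious: a full sort is O(n log n), a heap would be
    faster, and right now nobody cares.
    """
    return sorted(scores, key=scores.get, reverse=True)[:k]

def test_top_k_documents():
    scores = {"a": 0.2, "b": 0.9, "c": 0.5}
    assert top_k_documents(scores, 2) == ["b", "c"]
```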
Build the metrics foundation, but not the optimizations yet. Set up basic monitoring from day one. Understand where time is spent. Just don't act on it yet—collect data for when it matters.
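A minimal sketch of that foundation, using only the standard library: record per-function latencies now, export them to a real system (Prometheus, Datadog, or similar) later. `handle_request` is a hypothetical handler:

```python
import functools
import time
from collections import defaultdict

latencies_s: dict[str, list[float]] = defaultdict(list)

def timed(fn):
    """Record the wall-clock latency of every call, keyed by function name."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            latencies_s[fn.__name__].append(time.perf_counter() - start)
    return wrapper

@timed
def handle_request(prompt: str) -> str:
    return prompt.upper()  # hypothetical stand-in for the real handler
```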
The Inflection Point: When Everything Changes
Here's the transition: when your product stabilizes and you have real users, everything changes.
Once you've validated that customers actually want what you built, and you understand your unit economics, then you switch modes. At this point:
- Define your actual SLOs based on user behavior and business requirements
- Profile your application to find real bottlenecks (a sketch follows this list)
- Invest in performance engineering
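Profiling at this stage needs nothing exotic. A sketch using the standard library's cProfile, with a stand-in entry point:

```python
import cProfile
import pstats

def handle_request(prompt: str) -> str:
    return prompt.upper()  # stand-in for your real entry point

# Profile one representative call and print the ten biggest hotspots.
cProfile.run("handle_request('hello world')", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```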
Notice what happens at this stage: you have a product that works, customers who are paying, and clear visibility into what's slow. You're no longer gambling on architecture decisions.
The Real Lesson
The difference between Big Tech and startups isn't that Big Tech engineers are smarter. It's that Big Tech has certainty about its problem space, while startups operate under radical uncertainty about everything.
The engineering approach must match reality.
The best startup engineers I've known—including those who came from Big Tech—learned to shift modes. They brought discipline and architectural thinking from their Big Tech experience, but they abandoned the assumption that everything needs to be perfect from day one.
Your job as a startup founder isn't to build the most performant system. It's to build something that works, that users want, and that you can change when you learn something new.
Performance optimization will still be there when you need it. For now, focus on moving fast and learning what actually matters.
Key Takeaways
| Big Tech Engineering | Startup Engineering |
|---|---|
| Problem space is known; optimize for scale | Problem space is uncertain; optimize for learning |
| SLOs defined upfront based on market research | SLOs emerge from customer feedback |
| Complex architecture justified by requirements | Simple architecture enables rapid pivots |
| Performance optimization adds value | Performance optimization is often wasted work |
| Code should be optimized and reliable | Code should be clear and changeable |
Your job early on is to prove the hypothesis, not to implement it perfectly.