How to refactor for startups in 2022: reasons and tradeoffs

(Cross-posted from the Klotho website)

This blog post is a high-level overview of the reasons to refactor code and systems in a startup setting. We cover the risks, approaches, and tradeoffs to consider in 2022.

How to judge when the cost is worth the gains

Be honest with yourself

The temptation will always be to refactor: real-world code is messy, and engineers don’t like messy code. Make sure there’s a business case for refactoring by measuring how much time the team is spending on directly customer-visible features.

Our research shows that mid-sized companies and fast-growing startups spend 39% of engineering capacity on undifferentiated work, like infrastructure. Split stories into sub-tasks like infrastructure and refactoring so you can measure where your time is going.
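If you track those sub-tasks with time estimates, the measurement itself is simple arithmetic. Here's a minimal sketch of that tally; the categories and data shape are hypothetical, not tied to any particular tracker:

```typescript
// Minimal sketch: tally tracked sub-tasks by category to see what share of
// engineering time goes to non-feature work. Categories are hypothetical.
type Category = "feature" | "infrastructure" | "refactoring";

interface SubTask {
  category: Category;
  hours: number;
}

function nonFeatureShare(tasks: SubTask[]): number {
  const total = tasks.reduce((sum, t) => sum + t.hours, 0);
  const nonFeature = tasks
    .filter((t) => t.category !== "feature")
    .reduce((sum, t) => sum + t.hours, 0);
  return total === 0 ? 0 : nonFeature / total;
}

// Example sprint: 39 of 100 tracked hours went to non-feature work.
const sprint: SubTask[] = [
  { category: "feature", hours: 61 },
  { category: "infrastructure", hours: 25 },
  { category: "refactoring", hours: 14 },
];
console.log(nonFeatureShare(sprint)); // 0.39
```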

Well-structured is well-maintained

Clear boundaries between modules make them easier to test, deploy, and monitor. Keep an eye on customer-reported bugs, service latency, and how often you have to revert code. On the other end of the software lifecycle, there are multiple signs that you're trending towards a bottleneck: how long it takes to go from design to shipped, the degree of engineering satisfaction, and rising infrastructure costs can all be leading indicators that you've built up debt.

Inflection points in the business

Code bases tend to organize themselves along three dimensions: team size, pace of new features, and number of customers. When these numbers change significantly, it may be time to look at splitting up components.

If you’re growing the team or increasing the rate of feature development, the limiting factor will be the code’s readability. Start with targeted, opportunistic refactoring. If your customer base is growing, you may need more scalable technologies or workflows.

Control what you can, plan for the rest

Choosing right won’t prevent re-architecting

Refactoring and re-architecting don't mean you made a bad choice earlier. More often, the driving forces behind a re-architecture are tied to changing requirements or external factors. There are at least four significant dimensions that will force a re-architecture over the lifetime of a product: the rate of new feature development, engineering team size, the amount of time spent on undifferentiated work, and customer growth. Each of these is progressively harder to directly control.

Start with the easy dials…

Of the four dimensions, the two that are easiest to control are the rate of new features and the engineering team's size. You can control the rate of new features by being stricter about planning and prioritization. Scaling the team is a slower process, but it's usually one you can at least plan for.

If both the team and the rate of new features are small, refactoring is unlikely to have a significant impact on the business. At the other end of the spectrum, a large team working on many features may benefit from reorganizing into smaller teams, and you should consider refactoring or re-architecting the code to match. An architecture that enables cleaner organizational and code boundaries will allow the product and company to scale.

…and then move onto the harder ones

The amount of time your team spends on undifferentiated work can be hard to rein in, and customer growth is the hardest measure of all to affect. If these were easy, everyone would minimize undifferentiated work and maximize customer growth! Still, you can get ahead of problems with a careful and proactive approach to refactoring.

The first step is knowing when not to refactor. If your customer growth and the amount of time spent on undifferentiated work are both low, don't spend time on refactoring: focus instead on impactful, customer-visible features. Similarly, if you have good customer growth and little undifferentiated work, your team is doing well. Consider tactical refactoring to keep the amount of undifferentiated work from growing, but don't spend too much time on it.

If your team is spending too much time on undifferentiated work, it's time to revisit the architecture and move to one that scales better to where your company is today.

If your customer adoption is low, your priority should be a cheaper architecture that will give you more runway.

If both your customer adoption and the amount of time your team spends on undifferentiated work are high, it may be time to focus on a centralized, optimized solution. This typically takes the form of a dedicated operations team that can efficiently execute on infrastructure tasks. This is a great problem to have, so take a moment to congratulate yourself and your team for getting here!
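To make those four quadrants easier to scan, here's a hedged sketch that encodes them as a decision table. The high/low judgments are yours to make, and the recommendations are illustrative wording, not a formula:

```typescript
// The four quadrants above, expressed as a simple decision table.
// "low"/"high" are judgment calls, not precise thresholds.
type Level = "low" | "high";

function refactoringPosture(customerGrowth: Level, undiffWork: Level): string {
  if (customerGrowth === "low" && undiffWork === "low") {
    return "Skip refactoring; focus on impactful, customer-visible features.";
  }
  if (customerGrowth === "high" && undiffWork === "low") {
    return "Tactical refactoring only, to keep undifferentiated work low.";
  }
  if (customerGrowth === "low" && undiffWork === "high") {
    return "Prioritize a cheaper architecture to extend your runway.";
  }
  return "Centralize and optimize, e.g. with a dedicated operations team.";
}

console.log(refactoringPosture("high", "high"));
// "Centralize and optimize, e.g. with a dedicated operations team."
```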

Have a target, then find shortcuts to get there

Have a plan, even if it’s not perfect

Once you've committed to a re-architecture, don't be afraid to think big. Lean on your engineers to come up with an end-state they'd love, and then pare it down as needed. Chances are, the opportunity for a major re-architecture will only come once or twice in a product's lifecycle, so be prepared to live with any compromise you make. But by the same token, know that even the best-laid plans will go awry as you start implementing.

Make big plans, take little steps

Once you know where you want the code to be, be tactical about how to get it there. Work on one component at a time, or pick components that are as far apart from each other as possible. If you haven't already invested in solid testing, both at the unit and system level, now's the time. Tests will give you confidence that your changes won't break existing customer experiences, but they can also help your team come up with its definition of done. When the tests pass, the component is ready!
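As a concrete illustration, a component-level test can serve as that definition of done. Here's a minimal sketch using Node's built-in test runner; the component and its contract are hypothetical:

```typescript
// Minimal sketch: a component-level test as the "definition of done".
// The component (parseOrderId) and its contract are hypothetical.
import { test } from "node:test";
import assert from "node:assert/strict";

function parseOrderId(raw: string): number {
  const id = Number.parseInt(raw, 10);
  if (Number.isNaN(id) || id <= 0) {
    throw new Error(`invalid order id: ${raw}`);
  }
  return id;
}

test("accepts well-formed order ids", () => {
  assert.equal(parseOrderId("42"), 42);
});

test("rejects malformed order ids", () => {
  assert.throws(() => parseOrderId("not-an-id"));
});
```

When refactoring moves a component like this, the tests travel with it; once they pass against the new structure, that component's migration is done.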

The best technology is the one you can adapt

The key to reducing the impact of refactoring and re-architecting on startups in particular is to use technology that is adaptable. 

Historically, companies have chosen specific technologies like VMs, serverless, or containers to host their applications. The problem is that switching from one technology to another is prohibitively expensive, and what you need today may not be what you need tomorrow.

An adaptive architecture is one that lets you host your application on any technology equally easily. This lets you adjust the hosting environment on the fly to match your current needs. Specific technology choices like AWS Lambda, Fargate, Kubernetes, gRPC, Linkerd, or Azure/GCP become interchangeable.

By reusing existing programming language constructs like functions and event handlers, as well as interfaces that are idiomatic to each language, adaptive architectures make cloud services easier to use.

Look for abstractions and tools that are lightweight, but flexible enough to let you switch technologies. We think Klotho annotations fit the bill, since they let you separate your architecture's semantic meaning from the deployment configuration — but with enough investment in runtime libraries and infrastructure automation, you can build out a similar solution yourself.
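As a sketch of what that separation can look like in practice (the annotation name and syntax below illustrate the annotation-in-comment idea, and are not a verbatim Klotho example), the application stays a plain web handler while the annotation carries the architectural intent:

```typescript
// Illustrative sketch: the handler is plain application code; the comment
// annotation declares deployment intent. The annotation syntax here is
// illustrative, not a verbatim Klotho example.
import express from "express";

const app = express();

app.get("/orders/:id", (req, res) => {
  res.json({ id: req.params.id });
});

/* @klotho::expose {
 *   target = "public"
 * }
 */
app.listen(3000);
```

The point of the pattern is that retargeting the compiled output, say from Lambda plus API Gateway to a container on Fargate, changes the generated infrastructure rather than this source file.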

Serverless vs. Microservices: Two Sides of the Same Coin

I just wrote up a piece about the confusion the Internet creates around Serverless and Microservices. The “Serverless vs. Microservices” debate presents the two as supposedly incompatible strategies that must be fundamentally at odds with each other. In reality, they are as similar as two flavors of ice cream – you might prefer chocolate chip, but strawberry will work just as well.

Check it out at the official Klotho blog.

Cloud computing architecture for the next ten years – Part 2

Cloud development has become prohibitively complex, and the current generation of solutions has low-level interfaces that require extensive investment from developers and operators to configure, learn, assemble, and scale properly. For a new architectural shift to occur, we need approaches that absorb the cognitive load, not just streamline it.

Maintain benefits from existing architectures

There has been a continuous discussion among backend and service developers about whether things should be built using one strategy (monoliths) or the other (microservices). There are no one-size-fits-all solutions, because there’s always a trade-off involved.

Monolithic development offers high productivity, ease of deployment, and a straightforward observability story. Microservices offer flexibility in fault isolation, resource tuning, and team autonomy. Unfortunately, microservice-based architectures usually involve piecing things back together – back into the monolith's basic architecture, but with duct tape. As a result, the benefits of neither solution are fully realized.

When building Klotho, we zoomed out and asked, “What aspects of computer engineering can we apply to bridge that gap?” We concluded that a key characteristic of the new architecture must be the convenience of monolithic development, coupled with an adaptive system that leverages the benefits found in microservice architectures. Most importantly, it has to reduce the cognitive load for developers, while maintaining configurability and control for operators.

By focusing on developer and operator intent, we created a solution based on ease of use through separation of concerns. Using three different programming constructs, Capabilities, Requirements, and Directives, developers and operators can specify what parts of the application should be cloud-aware, what additional tradeoffs Klotho should consider for your application, and what specific overrides are required.

Solution: Developers should write code the way they know best. We leverage their intent early on to determine what backend wiring and analysis is done behind the scenes to properly meet their needs. Requirements and Directives allow developers and operators to provide more fine tuning and controls without developers needing to change the code.
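As a hedged illustration of that separation of concerns (the syntax below is a sketch invented for this post, not Klotho's exact grammar), a developer declares a capability inline, while requirements and directives can be layered on by operators without touching the code:

```typescript
// Illustrative sketch of a developer-declared capability. The annotation
// syntax is a sketch, not Klotho's exact grammar; requirements and
// directives would be supplied separately by operators.

/* @klotho::persist */
const sessionCache = new Map<string, string>();

export function remember(userId: string, token: string): void {
  // With a persist capability, this idiomatic in-memory Map could be
  // compiled to a durable cloud store; the application code stays the same.
  sessionCache.set(userId, token);
}
```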

Read the rest of the post on the official Klotho blog.

Cloud computing architecture for the next ten years

In computing, bigger and more ambitious dreams have always been realized by pushing the limits. Cloud computing is no exception; parallel computing, cluster computing, grid computing, and edge computing are all continuously expanding what we consider to be possible. But they also make development more difficult.

Cloud computing is now in the phase of streamlining complexity. There are several examples of integrated solutions that are optimized for certain workloads or development models: Google's Anthos, Amazon's Outposts, Microsoft's Azure Stack Hub, and HashiCorp's HashiStack.

These solutions bundle together building blocks necessary for larger-scale applications and systems, but they present complicated low-level interfaces that require developers and operators to configure, learn, assemble, and scale appropriately.

It's similar to the complexity-reduction evolution that happened in programming languages: punch cards, assembly, C, C++, Java, and so on.

Continuous improvement keeps happening, but at some point, an architecture shift emerges that addresses the accumulation of complexity.

In our first blog post on Klo.Dev, we take a look at a few principles that we view as critical for this architectural shift to emerge, and what we need from products to effectively take us into the new world of cloud computing:

Conferences vs unConferences

Welcome to Vegas

Most people I know have attended or are familiar with entrepreneurship- and startup-related conferences. Conferences gather people, seat them, and then lecture to them on a certain subject. The quality of the conference predominantly depends on the speakers' ability to inspire the audience.

However, nowadays you can hear some of the best talks on TED and YouTube, and the value of sitting in a room with tons of people isn’t as high as it used to be.

Conferences aren't the only model for gatherings. My favorite alternative is the 'Unconference' model. Lucky for you, I've been to both, and I'm here to tell you all about it.