Scale is a feature

#architecture #legacy #refactoring

Written by Anders Marzi Tornblad

Published on dev.to

This is part 9 of the Getting into software architecture series. If you haven't read the first part, here it is: A primer for emerging software architects

Scale in software is a feature in itself. Consider a simple game where a ping-pong ball navigates an obstacle course. With just a few objects, you'd model basic physics, forces, collisions and interactions, at the level of individual objects. Add a hundred balls, or ten thousand, and the physics still works. But introduce billions of objects, an entire ocean of things to simulate, and you might have to switch your model to fluid dynamics. The same transition happens in business systems, logistics, databases, and social media when scaling up.
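To make the per-object approach concrete, here is a minimal sketch of what that kind of modelling looks like. The names and the physics are deliberately simplified and not taken from any real engine; the point is that every ball is handled individually, and the naive pairwise collision check alone is O(n²), which is why the whole model has to change long before you reach billions of objects.

```python
from dataclasses import dataclass

GRAVITY = -9.81
RADIUS = 0.02  # hypothetical ping-pong ball radius in metres

@dataclass
class Ball:
    x: float
    y: float
    vx: float
    vy: float

def step(balls: list[Ball], dt: float) -> None:
    for b in balls:                      # integrate motion per object
        b.vy += GRAVITY * dt
        b.x += b.vx * dt
        b.y += b.vy * dt
    for i, a in enumerate(balls):        # naive pairwise collision check: O(n^2)
        for b in balls[i + 1:]:
            if (a.x - b.x) ** 2 + (a.y - b.y) ** 2 < (2 * RADIUS) ** 2:
                a.vx, b.vx = b.vx, a.vx  # crude elastic swap of velocities
                a.vy, b.vy = b.vy, a.vy
```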

At petabyte scale, downtime isn't just inconvenient, it's potentially catastrophic. Redundancy becomes essential, not a luxury. This immense scale demands a complete re-evaluation of architecture, data distribution, and fault tolerance. As requests grow to millions per second, milliseconds of latency matter. Databases need fine-tuning, queries require optimization, and you start looking into edge computing to spread the load.
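A rough back-of-the-envelope calculation shows why those milliseconds matter at that volume. Little's law says the number of requests in flight equals throughput times latency; the numbers below are made up purely for illustration.

```python
# Little's law: concurrency = throughput * latency (hypothetical numbers).
throughput = 2_000_000           # requests per second
latency_fast = 0.005             # 5 ms per request
latency_slow = 0.025             # 25 ms per request

print(throughput * latency_fast)   # 10,000 requests in flight at any instant
print(throughput * latency_slow)   # 50,000 requests in flight at any instant

# Five times the latency means five times the connections, threads or
# coroutines that every server has to keep open at the same moment.
```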

You change the way you think about data integrity. Strict ACID guarantees might no longer hold, so you need to embrace eventual consistency and take the CAP theorem seriously. With such vast amounts of data, caching isn't just an optimization; it's vital. Truly distributed solutions like Redis, Couchbase or Elasticsearch can help save your business.
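As a sketch of what "caching is vital" looks like in practice, here is the classic cache-aside pattern against Redis. The host name, key format, TTL and the `load_user_from_db` helper are all invented for illustration, not part of any particular system.

```python
import json
import redis  # assumes the redis-py client is installed

r = redis.Redis(host="cache.internal", port=6379)  # hypothetical cache host

def load_user_from_db(user_id: str) -> dict:
    # Stand-in for a real database query.
    return {"id": user_id, "name": "example"}

def get_user(user_id: str) -> dict:
    """Cache-aside: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    user = load_user_from_db(user_id)
    r.set(key, json.dumps(user), ex=300)  # cache for 5 minutes
    return user
```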

Petabyte-scale solutions are also where microservices really shine, but remember to keep your services "micro". If you're implementing your microservices as full-fledged dotnet MVC applications with lots of layers, you're probably doing it wrong!
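For contrast, a genuinely "micro" service can be very small indeed. Here is a hedged sketch using Flask; the endpoint, data and port are invented for illustration, and a real service would of course back this with proper storage.

```python
from flask import Flask, jsonify  # assumes Flask is installed

app = Flask(__name__)

# One narrow responsibility: answer stock levels for a single SKU.
# No stacks of controllers, repositories and view models in sight.
STOCK = {"PING-PONG-BALL": 1_000_000_000}  # hypothetical in-memory data

@app.route("/stock/<sku>")
def stock(sku: str):
    return jsonify({"sku": sku, "quantity": STOCK.get(sku, 0)})

if __name__ == "__main__":
    app.run(port=8080)
```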

The sheer volume of logs and metrics becomes a challenge in itself. Monitoring and alerting systems transition from being supportive tools to critical lifelines. In such scenarios, even the slightest inefficiency can bottleneck an entire system. Performance testing and profiling are no longer periodic checks; they're continuous essentials, while cybersecurity evolves from mere breach prevention to real-time threat detection, isolation, and recovery.
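One small illustration of "continuous" rather than "periodic": time the hot code paths on every request, but sample what you emit so the measurements themselves don't drown you in logs. This is a minimal sketch; the sample rate, logger name and operation label are arbitrary.

```python
import logging
import random
import time
from contextlib import contextmanager

logger = logging.getLogger("metrics")
SAMPLE_RATE = 0.01  # emit roughly 1% of measurements

@contextmanager
def timed(operation: str):
    """Measure a code path on every call, but only log a sample of them."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        if random.random() < SAMPLE_RATE:
            logger.info("op=%s elapsed_ms=%.2f", operation, elapsed_ms)

# Usage:
# with timed("order-lookup"):
#     handle_request()
```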

Curiously, at this point, you might need to circle back to the basics: complexity theory and Big-O notation. Anything with a complexity worse than O(n) will be a problem. And sometimes, the component struggling to scale might not even be within your control. It could be an OS thread scheduler, a hardware driver, or even a network protocol.
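As a reminder of why anything worse than O(n) hurts, compare two ways of finding duplicate IDs. This is a toy example invented for illustration, but the difference between the two shapes is exactly what bites at billions of records.

```python
def has_duplicates_quadratic(ids: list[str]) -> bool:
    # O(n^2): compares every pair. At a billion IDs this never finishes.
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if a == b:
                return True
    return False

def has_duplicates_linear(ids: list[str]) -> bool:
    # O(n): one pass with a hash set. Same answer, linear cost.
    seen = set()
    for item in ids:
        if item in seen:
            return True
        seen.add(item)
    return False
```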

When scaling to world-wide proportions, every layer, every component, and every line of code must come under scrutiny. It's demanding, but an exciting challenge!

Articles in this series: