Alex Mitelman Personal website

System Design Weekly 019: August 2021

Highlights HashiCorp State of Cloud Strategy Survey HashiCorp surveyed 3200+ decision-makers from their contact database. Here are the key takeaways: 76% of the companies are already multi-cloud. 86% plan to be multi-cloud in two years. 90% of large corporations have adopted multi-cloud solutions, while only 60% of startups have moved in this direction. Digital transformation is the main multi-cloud driver. Other reasons are avoiding single-vendor lock-in, cost reduction, and scaling. Service mesh adoption is expected to grow 2.

System Design Weekly 018: August 2021

Highlights Logging at Twitter: Updated Centralized logging at Twitter was limited by low ingestion capacity and query capabilities which resulted in poor adoption. The previous solution ingested around 600K events per second per data center. However, only around 10% of the logs were submitted, with the remaining 90% discarded by the rate limiter. To address this, Twitter adopted Splunk Enterprise and migrated centralized logging to it. Now it ingests 4 times more logging data and has a better query engine and better user adoption.

System Design Weekly 017: July - August 2021

Highlights How WhatsApp enables multi-device capability The WhatsApp phone client was previously the source of truth: the smartphone kept the data. If someone wanted to use WhatsApp on another device, the messages were relayed through the smartphone app. If the smartphone battery was drained, the companion app could not work. WhatsApp now allows connecting 4 additional devices that are independent of the smartphone. Each device gets an identity key.

System Design Weekly 016: July 2021

Highlights DoorDash: Building Faster Indexing with Apache Kafka and Elasticsearch The DoorDash team found that updating the search index took a very long time. They’ve built a search system relying on open source technologies. It uses Kafka as a message queue and for data storage, and Flink for transforming data and sending it to Elasticsearch. A reliable indexing system would ensure that changes in stores and items are reflected in the search index in real time.

System Design Weekly 015: July 2021

Highlights Managing Asynchronous Workflows with a REST API When building a REST API, there is sometimes a need to run complicated logic that takes a while. In these cases, the REST call starts an asynchronous job. For example, a call to generate a PDF report: POST /api/v1/report. In response, the REST API answers with the status HTTP/1.1 201 Created and a Location header pointing to the result: Location: /api/v1/report/123. What are the options to fetch the result of this asynchronous job?
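One of the simplest options is for the client to poll the URL from the Location header until the job finishes. The sketch below illustrates that idea only; it is not taken from the linked article. FakeReportApi is a hypothetical in-memory stand-in for the server, and the "status" and "result" fields in the response body, as well as the number of polls needed, are assumptions made for the example.

```python
import itertools


class FakeReportApi:
    """Hypothetical in-memory stand-in for the REST API (illustration only)."""

    def __init__(self, polls_until_ready=3):
        self.polls_until_ready = polls_until_ready
        self._ids = itertools.count(123)
        self._remaining = {}  # location -> polls left before the job is "done"

    def post_report(self):
        """Simulates POST /api/v1/report: starts a job, returns status and headers."""
        location = f"/api/v1/report/{next(self._ids)}"
        self._remaining[location] = self.polls_until_ready
        return 201, {"Location": location}

    def get(self, location):
        """Simulates GET on the Location URL: job state, plus the result once done."""
        if location not in self._remaining:
            return 404, {}
        if self._remaining[location] > 0:
            self._remaining[location] -= 1
            return 200, {"status": "pending"}
        return 200, {"status": "done", "result": "report.pdf"}


def poll_for_result(api, location, max_attempts=10):
    """Polls the job resource until it reports completion or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, body = api.get(location)
        if status == 200 and body.get("status") == "done":
            return body["result"], attempt
        # A real client would sleep here, ideally with a backoff, before retrying.
    raise TimeoutError(f"job at {location} not finished after {max_attempts} attempts")


api = FakeReportApi()
status, headers = api.post_report()
result, attempts = poll_for_result(api, headers["Location"])
print(status, headers["Location"], result, attempts)
```

Polling is easy to implement but wastes requests while the job is still running; callbacks or push-based notifications avoid that at the cost of more moving parts.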

System Design Weekly 014: July 2021

I came across the word “exabyte” three times in just one day. Previously I didn’t even know this word existed. So 1 exabyte is 1,000 petabytes, or 1 exabyte is 1,000,000 terabytes. Companies operate at a scale of millions of terabytes now. “Apple is apparently Google’s largest customer now, followed by ByteDance (parent company of the TikTok app). Apple holds 8 exabytes of data with Google Cloud, ByteDance is in the region of 500 petabytes — 16x less.
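The conversions and the 16x figure above can be checked with a few lines of arithmetic. This is a minimal sketch using the decimal units stated in the text (1 exabyte = 1,000 petabytes); the variable names are illustrative.

```python
# Decimal (SI) storage units, as stated in the text.
PB_PER_EB = 1_000
TB_PER_PB = 1_000

apple_eb = 8        # exabytes, per the quote
bytedance_pb = 500  # petabytes, per the quote

tb_per_eb = PB_PER_EB * TB_PER_PB  # terabytes in one exabyte
apple_pb = apple_eb * PB_PER_EB    # Apple's footprint in petabytes
ratio = apple_pb / bytedance_pb    # how many times larger Apple's footprint is

print(tb_per_eb, apple_pb, ratio)
```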

System Design Weekly 013: June 2021

Highlights Learn how Dream11, the World’s largest fantasy sports platform, scale their social network with Amazon Neptune and Amazon ElastiCache Dream11 is a fantasy sports platform that has social network features. The team evaluated different graph database solutions for the social network service and chose Amazon Neptune after a load/stress PoC. Dream11 already operates within AWS infrastructure, so being able to add a fully managed graph DB to the VPC was one of the factors.

System Design Weekly 012: June 2021

Highlights Uber: Handling Flaky Unit Tests in Java While the headline mentions Java, this experience is language-agnostic and can be helpful with any other programming language. The Uber team has moved all their repositories to a single monolithic repository. This move helps to better manage dependencies, testing infrastructure, build systems, and static analysis tooling. Although the individual repos had stable tests, after merging into a monorepo there were lots of flaky tests. Why did it happen?

Follow up on Serverless Frameworks

I’ve received incredible feedback on last week’s blog post, Amazon API Gateway with Lambda is the Next Generation of Web Frameworks. As Adrian Mace pointed out, there is a new Infrastructure as Code (IaC) tool from Amazon, called AWS CDK (Cloud Development Kit). I missed this awesome tool. To quote Adrian: “CDK allows you to define your infrastructure using imperative languages like TypeScript, Python, or Golang with the full powers that those languages can provide, and then ‘compiles down’ into Cloudformation templates upon deploy/synth.

How I Migrated this Site to Cloudflare

I’ve migrated my website from GitHub Pages to Cloudflare Pages. I’ve also moved my domain from Namecheap to Cloudflare. I was already using Cloudflare Web Analytics instead of Google Analytics. The reasoning behind this is as follows: Cloudflare Web Analytics. I didn’t like the idea of Google tracking my website visitors. Google Analytics sets a cookie. This probably means that such sites need to show that annoying cookie disclaimer that the EU came up with (out of good intentions, of course).