Simplifying Facebook Marketplace System Design and Operations with AWS Cloud

Designing a traditional classified-ads e-commerce system, similar to Facebook Marketplace, presents challenges in server procurement, maintenance, and scaling to handle high traffic and growing business complexity. These issues drive up costs and reduce efficiency as the business grows. To optimize costs, enhance system performance, and ensure security, migrating workloads to AWS is a viable solution.

This article explores the strategy with a simple example, similar to Facebook Marketplace and Gumtree.

System Requirement Clarification

For simplicity, we focus on the following core requirements of the system:

Functional Requirements (FR):

  1. Sellers must be able to publish, edit, and delete products.
  2. Users must be able to search for products using keywords.
  3. The system must enable buyers to send inquiries and notify sellers via email.

Non-functional Requirements (NFR):

  1. The service needs to be highly available to ensure consistent and reliable access for users.
  2. The service needs to be highly scalable to effectively accommodate the growing number of users and interactions.
  3. The service needs to be highly secure, encompassing features such as data encryption in transit and at rest, along with robust user authentication and authorization mechanisms.
  4. The service needs to be highly extendable, supporting the integration of emerging features and technologies, such as machine learning for enhancing recommendation systems.

Design Considerations

  • The system is read-heavy, with an estimated read-write ratio of 100:1.
  • All product posts, including up to 9 photos each, require reliable and near real-time retrieval.

Capacity Estimation

The storage requirements are primarily determined by four factors: the number of posts, the volume of user data, storage needs for Elasticsearch indexing, and memory requirements for the distributed caching system.

  1. Assuming a total user base of 200 million, with 1 million daily active users.
  2. Given the 100:1 read-write ratio, we assume approximately 10,000 new posts are created daily.
  3. Assuming an average of 50 words per post and 5 bytes per word.
  4. Assuming each post contains an average of 5 photos, each photo using 2MB of block storage to cover different compression levels.
  5. Assuming the business has operated at this scale for a continuous period of 5 years.

For simplicity, we’ll assume that the storage used before reaching the current scale is equivalent to one year’s storage at the current scale.

This estimation serves as a simplified representation of the core concept for illustrative purposes. In practice, more complex scenarios and requirements, such as logging and data for machine learning, need to be considered.

The storage required for posts will be:

 10,000 × 365 × (5+1) × (50 × 5) + 10,000 × 365 × (5+1) × 5 × (2 × 1024 × 1024) ≈ 209 TB

Daily new posts alone require around 97.66 GB of storage.
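
These figures can be sanity-checked with a throwaway calculation; the constants below simply mirror the assumptions above:

public class PostStorageEstimate {
    public static void main(String[] args) {
        long postsPerDay = 10_000;
        long days = 365;
        long years = 5 + 1;                              // 5 years plus 1 "pre-scale" year
        long textBytesPerPost = 50 * 5;                  // 50 words x 5 bytes
        long photoBytesPerPost = 5 * 2L * 1024 * 1024;   // 5 photos x 2 MB

        long bytesPerPost = textBytesPerPost + photoBytesPerPost;
        double dailyGB = postsPerDay * bytesPerPost / Math.pow(1024, 3);
        double totalTB = postsPerDay * days * years * bytesPerPost / Math.pow(1024, 4);

        // Prints roughly: Daily: 97.66 GB, Total: 209 TB
        System.out.printf("Daily: %.2f GB, Total: %.0f TB%n", dailyGB, totalTB);
    }
}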

User storage requirements:

  1. User ID (8 Bytes): This should be sufficient to support 200 million users and provides ample scalability.
  2. Username (32 bytes): A 32-byte username length should be sufficient for most cases, allowing for relatively long usernames.
  3. Email (64 bytes): A 64-byte email address length should be sufficient to support various email address formats.
  4. Password (16 bytes): A 16-byte password length is typically enough, but ensure a secure password storage scheme, such as hashing with salt.
  5. Language (2 bytes): A 2-byte language code is a standard way to represent language and is usually sufficient.
  6. Avatar URL (200 bytes): A 200-byte avatar URL length allows for relatively long URLs for storing avatar links.

Storage Space (bytes) = Bytes per single user record × Number of users

Storage Space (bytes) = 322 bytes × 200,000,000 users = 64,400,000,000 bytes
Storage Space (GB) = 64,400,000,000 bytes / 1,073,741,824 ≈ 59.98 GB

Elasticsearch and Redis capacity estimation

For this classified platform, where user interest predominantly lies in the latest ads, a cost-effective approach involves Elasticsearch indexing only the content from the most recent three months. Concurrently, Redis will cache the top 20% of the most accessed content from the same period, employing a Least Recently Used (LRU) strategy for efficient data management. Additionally, Redis caches the top 10 pages of each category, as users often browse these top pages in their areas of interest. While caching other popular local pages is a consideration, it is currently beyond the scope of this discussion.

Daily Text Storage (Bytes) = Daily Posts × Words per Post × Bytes per Word
Given: Daily Posts = 10,000, Words per Post = 50, Bytes per Word = 5
So, Daily Text Storage = 10,000 × 50 × 5 = 2,500,000 Bytes
Three-Month Storage = 2,500,000 × 30 × 3 = 225,000,000 Bytes

Elasticsearch Storage Requirement (3x Original Data Size): ~0.63 GB

Elasticsearch might require up to 3 times the original data size for indexing.
Max Index Storage (GB) = Three-Month Storage × 3
Max Index Storage (GB) = 225,000,000 × 3 / 1024³ ≈ 0.63 GB

Redis Storage Requirement (assuming 108 categories):

Redis Cache Storage = Three-Month Storage × Percentage for Caching + (108 categories × 10 pages per category × 32 items per page, at roughly 250 bytes of text per item ≈ 0.00864 GB)
Redis Cache Storage = 0.21 GB × 0.20 + 0.00864 GB = 0.05064 GB

High-Level Design

Search Requests:

  • Incoming search requests are directed through the Node.js Gateway.
  • The Gateway forwards requests to Elasticsearch for search operations.
  • Elasticsearch processes the request and sends back results, optimizing search functionality.

Category and Product Page Retrieval:

  • When category page requests arrive:
    • The Node.js Gateway first checks the Redis cluster for cached results, utilizing a Redis sorted set to fulfill this requirement.
    • Cached results are immediately served if available.
    • In the absence of a cache, Node.js Gateway sends the request to the Application Service Cluster.
    • Results from the Application Service Cluster are stored in Redis for future use.
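
As an illustration of the sorted-set approach, here is a minimal sketch assuming the Jedis 4.x client; the key layout and page size are illustrative choices, not part of the original design:

import redis.clients.jedis.Jedis;
import java.util.List;

public class CategoryPageCache {
    private static final int PAGE_SIZE = 32;
    private final Jedis jedis = new Jedis("localhost", 6379);

    // Cache a post under its category, scored by creation time so that
    // ZREVRANGE returns the newest posts first.
    public void cachePost(String categoryId, String postJson, long createdAtMillis) {
        jedis.zadd("category:" + categoryId, createdAtMillis, postJson);
    }

    // Read one page (1-based) straight from the sorted set. An empty result
    // signals a cache miss: the caller falls back to the Application Service
    // Cluster and writes the result back into the sorted set for future reads.
    public List<String> getPage(String categoryId, int page) {
        long start = (long) (page - 1) * PAGE_SIZE;
        return jedis.zrevrange("category:" + categoryId, start, start + PAGE_SIZE - 1);
    }
}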

The React app is hosted on an Nginx cluster.

HDFS for Static Files Access:

  • Static files, such as images and videos, are served from the HDFS static-file store, fronted by the Node.js Gateway.
  • Centralized static file services enhance manageability and scalability.

Advertisement Updates:

  • For add/update/delete requests regarding advertisements:
    • The Application Service Cluster processes requests and sends relevant data to a message queue (MQ).
    • A dedicated “Advertisement Update Service” monitors the MQ.
    • This service handles advertisement changes, ensuring synchronization with Elasticsearch and Redis Cluster.
    • Periodic or on-demand operation guarantees data consistency and updates.

Sharding

To evenly distribute Product Post storage and traffic across different shards, we’ll employ Consistent Hashing based on Product IDs. We’ve addressed high-frequency access to category pages and filtered searches with Redis and Elasticsearch.

For range queries, we query all MySQL shards, and then a Consistent Hash Load Balancer aggregates and filters Top N Items from all shards before returning the Top 10 Items to App Services or relevant calling services.

This design efficiently balances data and handles performance requirements, although it’s essential to monitor query complexity as data scales for optimal system performance.
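
To illustrate, a minimal consistent-hash ring over the MySQL shards can be built on a TreeMap. This is a sketch: the MD5-based hash, shard names, and virtual-node count are illustrative choices, not prescribed by the design above.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private static final int VIRTUAL_NODES = 128; // virtual nodes smooth the key distribution
    private final TreeMap<Long, String> ring = new TreeMap<>();

    public void addShard(String shard) {
        for (int i = 0; i < VIRTUAL_NODES; i++) {
            ring.put(hash(shard + "#" + i), shard);
        }
    }

    // Route a product ID to the first shard clockwise on the ring.
    public String shardFor(String productId) {
        SortedMap<Long, String> tail = ring.tailMap(hash(productId));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            // Fold the first 4 digest bytes into an unsigned 32-bit ring position.
            return ((long) (d[3] & 0xFF) << 24) | ((d[2] & 0xFF) << 16)
                    | ((d[1] & 0xFF) << 8) | (d[0] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}

Adding or removing a shard then remaps only the keys whose hashes fall on that shard's ring positions, which is what keeps rebalancing traffic low as the cluster grows.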

Challenges

As evident from the previous discussions, meeting system requirements involves considering various scenarios from the outset. This entails procuring an adequate number of servers to accommodate business and user growth needs. Common challenges we’ll encounter include:

  1. Server Procurement and Deployment: How to dynamically scale server resources based on demand? There may be idle servers during off-peak hours, leading to resource wastage, yet they’re essential during peak periods.
  2. Daily Operations and Maintenance: Responsibilities encompass Gateway maintenance, cluster scaling strategies, regular backups, security patch management, monitoring, and performance adjustments.
  3. Security and Compliance: Ensuring regular credential updates to meet compliance requirements while managing data encryption, identity verification, access control (possibly through customized Redis API rate limiters), DDoS protection, IP restrictions, and more.
  4. Disaster Recovery

Although practices like automated operations and continuous integration/continuous delivery (CI/CD) can mitigate some of these challenges to a certain extent, it’s crucial to acknowledge the associated human and resource costs.

Questions

The challenges outlined earlier raise a fundamental question: how can we reduce costs, particularly operational expenses, while simultaneously enhancing system performance and availability, ensuring data security and compliance, and meeting the demands of business growth and user expansion?

Solutions

Now, let’s delve into how AWS provides solutions to tackle these challenges.

Design Details

The AWS cloud architecture enables us to tackle server procurement and deployment, operational costs, security, and disaster recovery more efficiently and effectively.

Route 53

By configuring Route 53, we meet our disaster recovery (DR) requirements, implementing a multi-location DR strategy. Additionally, Route 53 helps us deliver lower-latency services by intelligently routing traffic to geographically closer locations, enhancing user experience and providing more reliable services.

AWS WAF

WAF, integrated with CloudFront, effectively enhances the security of our services. It monitors traffic through configured rules and takes actions, such as IP blocking, upon detecting abnormal traffic patterns, effectively mitigating Distributed Denial of Service (DDoS) attacks. It can also be used for IP restrictions, like limiting access from specific geographic regions.

Cost Optimization

While capacity estimation still gives us controlled cost management and prepares us for scenarios like traffic surges, moving to the AWS cloud greatly simplifies server procurement and deployment. Additionally, the elastic scaling of cloud computing ensures more efficient resource utilization.

Static and Dynamic Segregation

S3 will be used to host front-end web app files and other static files like images, effectively reducing operational costs for HDFS and Nginx clusters. Additionally, as a highly available service, S3 supports cross-region data redundancy through Cross Region Replication (CRR) to optimize latency and aid disaster recovery under critical conditions.

Optimizing Application Service Clusters

API Gateway serves as the main gateway between front-end and back-end services, offering numerous out-of-the-box features such as rate limiting, security controls, and integration with serverless services like Lambda, greatly reducing our development and operational costs. As a regional service, each API Gateway can achieve 99.9% availability. Deploying it across multiple regions can further decrease latency and improve availability.

App Services instances deployed in an Auto Scaling group behind an ALB, with memory-based scaling rules, allow for cost-effective operations. Using Reserved Instances for regular traffic and On-Demand Instances for peak traffic efficiently reduces operational costs, improves resource utilization, and minimizes the waste of idle server resources seen in traditional on-premises models.

Replacing existing services with AWS Application Load Balancer, Amazon Elasticsearch Service, and Amazon ElastiCache for Redis significantly lowers operational costs while enhancing availability and security.

These services can be easily configured, scaled, and monitored by teams. Configuring deployments across multiple availability zones ensures failover capabilities. Elastic scaling further enhances efficient resource utilization.

AWS Aurora

AWS Aurora will replace MySQL, capitalizing on its compatibility with MySQL and performance advantages in high-load and large-scale data processing scenarios. AWS Aurora offers not just improved performance but also superior scalability.

Moreover, the use of AWS Aurora Global Database will further enhance system availability and reduce data access latency. It automatically replicates data across multiple regions, ensuring high data availability and durability. By providing data in regions closer to users, it effectively decreases access latency.

To manage loads more efficiently, we will incorporate Lambda services at the forefront of our architecture to implement consistent hashing. Lambda’s serverless and on-demand computing capabilities are ideal for handling irregular or burst workloads, thus further boosting the overall architecture’s flexibility and scalability.

Overall, combining AWS Aurora’s high-performance database solution with the flexibility of Lambda services, this architecture is designed to improve performance, enhance availability, and optimize user experience.

Migration

Considering the challenge of migrating hundreds of terabytes of data, AWS Snowball is an ideal choice. Snowball is designed for efficient and secure large-scale data transfers. Our plan involves initially using Snowball to migrate the bulk of the data, followed by utilizing AWS Database Migration Service (DMS) for synchronizing new data updates. Once all data synchronization is complete, we will carefully execute the database switchover to ensure a seamless business transition. While this discussion does not detail the migration of other services, it’s worth mentioning that tools provided by AWS, such as Service Discovery, play a crucial role in service orchestration and server performance assessment during the migration process.

Unveiling the Ultimate Secret of High Concurrency: How Java Thrives in High-Frequency Trading?

The Characteristics of High-Frequency Trading Systems

In the current financial markets, there is a significant need for high-frequency trading (HFT) systems. Java, renowned for its cross-platform compatibility, efficient development capabilities, and maintainable features, is extensively employed within the financial industry. High-frequency trading revolves around the rapid execution of buy and sell transactions, leveraging swift algorithms and advanced computer technology, all while prioritizing the attainment of low-latency targets.

The Challenges: High Concurrency Processing and Low Latency Requirements

High-frequency trading systems necessitate managing many trade requests within extremely tight timeframes, creating substantial demands for robust concurrency processing capabilities and exceptional low-latency performance.

How can the challenges of high concurrency and low latency be tackled?

Conventional data structures and algorithms may be inadequate to meet high-frequency trading systems’ high concurrency and low latency demands. As a result, exploring more efficient concurrent data structures becomes imperative.

Limitations of Traditional Queues in High-Frequency Trading Systems

Traditional queue data structures, such as LinkedList, ArrayBlockingQueue, and ConcurrentLinkedQueue, face limitations in high-frequency trading systems. These queues may encounter issues like lock contention, thread scheduling overhead, and increased latency, preventing them from meeting high-frequency trading requirements. The lock and synchronization mechanisms in traditional queues introduce additional overhead, restricting their ability to efficiently handle high concurrency and achieve low-latency performance.

The limitations of traditional queues in high-frequency trading systems can be summarized as follows:

  1. Lock contention: Traditional queues use locks to ensure thread safety during concurrent access. However, when many threads compete for locks simultaneously, performance can degrade, and latency can increase. Locks introduce extra overhead, potentially leading to thread blocking and extended system response times.
  2. Thread scheduling overhead: In traditional queues, frequent thread scheduling operations occur when threads wait or are awakened within the queue. This overhead adds to system costs and introduces additional latency. Frequent thread context switching in high-concurrency scenarios can waste system resources and degrade performance.
  3. Increased latency: Due to issues like lock contention and thread scheduling, traditional queues may fail to meet the low-latency requirements of high-frequency trading systems in high-concurrency environments. Trade requests waiting in the queue experience increased processing time, resulting in longer trade execution times and affecting real-time responsiveness and system performance.

These limitations prevent the standard queues provided by the JDK from meeting the high concurrency processing and low-latency performance requirements of high-frequency trading systems. To address these issues, it is necessary to explore more efficient concurrent data structures, such as the Disruptor, to overcome the challenges posed by high-frequency trading systems.

Introduction to the Disruptor Framework

Disruptor is a high-performance concurrent data structure developed by LMAX Exchange. It aims to provide a solution with high throughput, low latency, and scalability to address concurrency processing challenges in high-frequency trading systems.

Core Concepts and Features of Disruptor

Disruptor tackles concurrent challenges through core concepts such as the ring buffer, producer-consumer model, and lock-free design. Its notable features include:

  1. High Performance: Disruptor’s lock-free design minimizes competition and overhead among threads, resulting in exceptional performance.
  2. Low Latency: Leveraging the ring buffer and batch processing mechanism, Disruptor achieves extremely low latency, meeting the stringent requirements of high-frequency trading systems.
  3. Scalability: Disruptor’s design allows effortless expansion to accommodate multiple producers and consumers while maintaining excellent scalability.

Ring Buffer

The Ring Buffer is a core component of the Disruptor framework. It is a circular buffer for efficient storage and transfer of events or messages, enabling high-performance and low-latency data exchange between producers and consumers.

Key Features of Disruptor’s Ring Buffer:

Cache Line Optimization

Disruptor is designed to maximize cache line utilization for improved access efficiency. By padding hot fields so that they do not share a cache line with other frequently written data, unnecessary cache contention (false sharing) and cache-line invalidation are minimized, resulting in enhanced read and write performance.


The following code demonstrates how padding placement improves performance.

import java.util.concurrent.CountDownLatch;

public class CacheLinePadding {
    public static volatile long COUNT = 1_000_000_000L;

    private static abstract class TestObject {
        public long x; // the hot field both writer threads update
    }

    private static class Padded extends TestObject {
        // These fields exist purely as padding. The writes below go to the
        // inherited TestObject.x (field access is statically bound), and the
        // padding guarantees the two hot fields of adjacent array elements
        // never share a 64-byte cache line.
        public volatile long p1, p2, p3, p4, p5, p6, p7;
        public long x; // shadows TestObject.x; never written, acts as extra padding
        public volatile long p9, p10, p11, p12, p13, p14, p15;
    }

    private static class Unpadded extends TestObject {
        // No padding: the two objects' x fields may land on the same cache line,
        // so the writer threads keep invalidating each other's cache (false sharing).
    }

    public static void runTest(boolean usePadding) throws InterruptedException {
        TestObject[] arr = new TestObject[2];
        CountDownLatch latch = new CountDownLatch(2);

        if (usePadding) {
            arr[0] = new Padded();
            arr[1] = new Padded();
        } else {
            arr[0] = new Unpadded();
            arr[1] = new Unpadded();
        }

        Thread t1 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[0].x = i;
            }
            latch.countDown();
        });

        Thread t2 = new Thread(() -> {
            for (long i = 0; i < COUNT; i++) {
                arr[1].x = i;
            }
            latch.countDown();
        });

        long start = System.nanoTime();
        t1.start();
        t2.start();
        latch.await();
        System.out.println((usePadding ? "Test with padding: " : "Test without padding: ") +
                (System.nanoTime() - start) / 1_000_000); // nanoseconds -> milliseconds
    }

    public static void main(String[] args) throws Exception {
        runTest(true);  // Test with padding
        runTest(false); // Test without padding
    }
}

Here’s the result:

Test with padding: 738
Test without padding: 2276

Similar Implementation in the Disruptor Framework
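
For reference, the LMAX Disruptor source applies the same idiom around its ring buffer state. The following is a simplified paraphrase of that padding layout, not a verbatim copy of the source:

// Leading padding: dummy long fields keep the hot state off shared cache lines.
abstract class RingBufferPad {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

// The frequently accessed fields sit between the leading padding above and
// trailing padding declared by the concrete ring buffer class.
abstract class RingBufferFields<E> extends RingBufferPad {
    protected long indexMask;   // hot field
    protected Object[] entries; // hot field
}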

Circular Buffer Structure

Disruptor utilizes a circular buffer as its primary data structure, providing efficient event storage and delivery. Using a circular buffer eliminates the need for dynamic memory allocation and resizing, optimizing overall performance.

Strict Ordering

Disruptor ensures events are consumed in the same order they were produced, maintaining data flow integrity and consistency.

Lock-Free Design and CAS Operations

Disruptor employs a lock-free design and atomic operations, such as Compare-and-Swap (CAS), to achieve high concurrency. This reduces thread contention and enhances concurrent processing capabilities.

For a detailed understanding of how CAS (Compare and Swap) operates and its application in addressing high concurrency challenges, I recommend referring to my previous article available at: https://masteranyfield.com/2023/07/03/unleashing-the-potential-of-synchronized-and-cas-addressing-high-concurrency-challenges-in-e-commerce-and-hft/.

High Performance and Low Latency

By leveraging the circular buffer and batch processing mechanisms, Disruptor achieves exceptional performance and ultra-low latency. It meets the stringent low-latency requirements of high-frequency trading systems.

Scalability

Disruptor’s design allows for easy scalability to accommodate multiple producers and consumers, providing flexibility to handle increasing thread and processing demands.

Disruptor vs. Traditional Queues

Disruptor offers significant advantages over standard JDK queues in terms of low latency, scalability, and high throughput:

  1. Low Latency: Disruptor is designed to achieve low-latency data processing. By minimizing lock contention, reducing thread context switching, and leveraging the ring buffer structure, Disruptor provides extremely low event processing latency. This makes it highly suitable for applications with strict real-time requirements, such as high-frequency trading systems.
  2. Scalability: Disruptor employs a lock-free design and efficient concurrent algorithms, allowing it to effortlessly scale to accommodate multiple producers and consumers. This enables Disruptor to handle large-scale concurrent operations while maintaining high performance and scalability.
  3. High Throughput: Disruptor achieves exceptional throughput through optimized design and lock-free mechanisms. Multiple producers and consumers can operate in parallel by utilizing the ring buffer and implementing batch processing, maximizing system throughput.

Disruptor is well-suited for scenarios with high concurrency and demanding low latency, while traditional queues are suitable for general concurrency requirements.
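
Before comparing performance, here is a minimal, self-contained sketch of the Disruptor API (assuming Disruptor 3.3+; the event type and handler are illustrative):

import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class DisruptorHelloWorld {
    // The event is a mutable carrier that lives in the pre-allocated ring buffer.
    static class LongEvent {
        private long value;
        void set(long value) { this.value = value; }
        long get() { return value; }
    }

    public static void main(String[] args) throws InterruptedException {
        int bufferSize = 1024; // must be a power of two

        Disruptor<LongEvent> disruptor =
                new Disruptor<>(LongEvent::new, bufferSize, DaemonThreadFactory.INSTANCE);

        // Consumer: invoked for each published event, in publication order.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("Consumed: " + event.get()));
        disruptor.start();

        // Producer: claim a slot, fill it, publish — no locks involved.
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();
        for (long i = 0; i < 5; i++) {
            ringBuffer.publishEvent((event, seq, value) -> event.set(value), i);
        }

        Thread.sleep(100); // let the daemon consumer thread drain the buffer
        disruptor.shutdown();
    }
}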

Comparing Disruptor Performance with LinkedBlockingQueue and ArrayBlockingQueue

import com.lmax.disruptor.*;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

import java.util.concurrent.*;

/**
 * Author: Adrian LIU
 * Date: 2020-07-18
 * Desc: Compare the performance of Disruptor, LinkedBlockingQueue, and ArrayBlockingQueue
 * with a single producer and a single consumer.
 * Helper classes (LongEvent, LongEventFactory, LongEventHandler, LongEventTranslator,
 * SimpleTimeCounter, BlockingQueueProducer/Consumer, and the production/consumption
 * strategies) are defined elsewhere in the project.
 * Disruptor took 137ms to handle 10000000 tasks
 * ArrayBlockingQueue took 2113ms to handle 10000000 tasks
 * LinkedBlockingQueue took 1312ms to handle 10000000 tasks
 */
public class DisruptorVSBlockingQueueWithSingleProducerConsumer {
    private static final int NUM_OF_PRODUCERS = 1;
    private static final int NUM_OF_CONSUMERS = 1;
    private static final int NUM_OF_TASKS = 10000000;
    private static final int BUFFER_SIZE = 1024 * 1024;

    public static void main(String[] args) {
        LongEventFactory factory = new LongEventFactory();
        ExecutorService service1 = Executors.newCachedThreadPool();
        SimpleTimeCounter disruptorTimeCount = new SimpleTimeCounter("Disruptor", NUM_OF_PRODUCERS + 1, NUM_OF_TASKS);

        Disruptor<LongEvent> disruptor = new Disruptor<LongEvent>(factory, BUFFER_SIZE, service1, ProducerType.SINGLE, new SleepingWaitStrategy());
        LongEventHandler handler = new LongEventHandler(NUM_OF_TASKS, disruptorTimeCount);

        disruptor.handleEventsWith(handler);

        disruptor.start();
        RingBuffer<LongEvent> ringBuffer = disruptor.getRingBuffer();

        LongEventTranslator translator = new LongEventTranslator();

        disruptorTimeCount.start();
        new Thread(() -> {
            for (long i = 0; i < NUM_OF_TASKS; i++) {
                ringBuffer.publishEvent(translator, i);
            }

            disruptorTimeCount.count();
        }).start();

        disruptorTimeCount.waitForTasksCompletion();
        disruptorTimeCount.timeConsumption();
        disruptor.shutdown();
        service1.shutdown();

        SimpleTimeCounter arrayQueueTimeCount = new SimpleTimeCounter("ArrayBlockingQueue", NUM_OF_CONSUMERS, NUM_OF_TASKS);
        BlockingQueue<LongEvent> arrayBlockingQueue = new ArrayBlockingQueue<>(BUFFER_SIZE);

        LongEvent poisonPill = new LongEvent(-1L);
        LongEventProductionStrategy productionStrategy = new LongEventProductionStrategy(0, NUM_OF_TASKS, poisonPill, NUM_OF_CONSUMERS);
        LongEventConsumptionStrategy consumptionStrategy = new LongEventConsumptionStrategy(poisonPill, arrayQueueTimeCount);

        Thread t1 = new Thread(new BlockingQueueProducer<>(arrayBlockingQueue, productionStrategy));
        Thread t2 = new Thread(new BlockingQueueConsumer<>(arrayBlockingQueue, consumptionStrategy));
        t2.start();

        arrayQueueTimeCount.start();
        t1.start();

        arrayQueueTimeCount.waitForTasksCompletion();
        arrayQueueTimeCount.timeConsumption();

        SimpleTimeCounter linkedQueueTimeCount = new SimpleTimeCounter("LinkedBlockingQueue", NUM_OF_CONSUMERS, NUM_OF_TASKS);
        BlockingQueue<LongEvent> linkedBlockingQueue = new LinkedBlockingQueue<>();

        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        consumptionStrategy = new LongEventConsumptionStrategy(poisonPill, linkedQueueTimeCount);
        t1 = new Thread(new BlockingQueueProducer<>(linkedBlockingQueue, productionStrategy));
        t2 = new Thread(new BlockingQueueConsumer<>(linkedBlockingQueue, consumptionStrategy));
        t2.start();

        linkedQueueTimeCount.start();
        t1.start();

        linkedQueueTimeCount.waitForTasksCompletion();
        linkedQueueTimeCount.timeConsumption();
    }
}
Disruptor took 137ms to handle 10000000 tasks
ArrayBlockingQueue took 2113ms to handle 10000000 tasks
LinkedBlockingQueue took 1312ms to handle 10000000 tasks

Based on our experiments and explanations above, it is evident that Disruptor exhibits significant advantages over the standard JDK Queue in certain scenarios. Its features, such as low latency, high throughput, concurrency, and scalability, effectively address challenges encountered in high-frequency trading (HFT) and other demanding environments.

Here are some of its applications in Fintech and trading systems:

  1. Trade Data Aggregation: Disruptor’s high throughput and low latency make it ideal for rapidly aggregating and processing large volumes of trade data. It efficiently handles numerous trade requests and ensures real-time and accurate processing.
  2. Event-Driven Trade Execution: With an event-driven model, Disruptor enables efficient concurrent processing by allowing producers to publish trade events and consumers to execute the corresponding trade logic. This eliminates performance bottlenecks associated with traditional lock mechanisms, enabling swift trade execution and prompt response to market changes.
  3. Parallel Computing and Risk Management: Disruptor’s scalability and concurrent processing capabilities facilitate efficient handling of large-scale computing tasks and effective risk management in high-frequency trading. By leveraging Disruptor’s parallel computing capabilities, accurate risk assessment and trade decision execution can be achieved.
  4. Lock-Free Design and Low Latency: Disruptor’s lock-free design minimizes lock contention and thread scheduling overhead, resulting in low-latency data exchange and processing. This meets the high-concurrency and low-latency requirements of HFT, enabling rapid response to market changes and improved trade execution efficiency.

Summary

Disruptor’s exceptional features, including low latency, high throughput, concurrency, and scalability, empower HFT systems to overcome challenges and achieve faster and more efficient trade execution. Its applications in trade data aggregation, event-driven trade execution, parallel computing, risk management, and lock-free design contribute to its effectiveness in high-frequency trading systems.

Unleashing the Java Concurrent Potential: Conquering High-Concurrency Challenges with Optimal Adoption of Synchronized and CAS

Synchronized and Compare and Swap (CAS) are two major concurrent mechanisms in Java used to solve thread safety issues. Choosing the appropriate mechanism can significantly improve the performance and resource utilization of your Java application.

In terms of features, synchronized is a keyword provided by Java, while CAS underpins a set of library implementations, such as AtomicLong, as well as several classes built on AbstractQueuedSynchronizer (AQS), including ReentrantLock and ReentrantReadWriteLock.

This article will delve into their respective advantages and disadvantages through the following structure to help us make informed choices.

Basic features and usage

Synchronized mainly controls the granularity of locks in three ways:

public class SynchronizedVSLock {
    private Object lock = new Object();

    public void objectLock() {
        // The lock is scoped to the code block below, ensuring exclusive access within that block.
        synchronized (lock) {

        }
    }

    public void classLock() {
        // The code block is protected by a global lock associated with the SynchronizedVSLock.class, ensuring that concurrent access by any instances is synchronized.
        synchronized (SynchronizedVSLock.class) {

        }
    }

    // The lock is scoped to the method, ensuring exclusive access within its execution.
    public synchronized void methodLock() {

    }
}

In contrast, Lock controls the lock granularity through the lock() and unlock() methods, and its scope depends on the lifecycle of the lock instance:

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class SynchronizedVSLock {
    private Lock reentrantLock = new ReentrantLock();

    public void raceConditionDemo() {
        reentrantLock.lock();
        try {
            // critical section
        } finally {
            // always release the lock, even if the critical section throws
            reentrantLock.unlock();
        }
    }
}

Synchronized releases the lock passively only after executing the synchronized code block or when an exception occurs. It cannot provide non-blocking lock acquisition methods.

On the other hand, Lock is more flexible than synchronized. It allows autonomous decisions on when to lock and unlock. The tryLock() method can be used to acquire a boolean value, indicating whether the lock is already held by another thread. Additionally, Lock provides both fair and unfair locks, while synchronized only provides unfair locks.
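
To illustrate that flexibility, here is a small sketch using tryLock() (the class and method names are illustrative):

import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    private final ReentrantLock lock = new ReentrantLock(true); // true = fair lock

    public boolean doWork() {
        // Non-blocking attempt: returns false immediately if another thread holds the lock.
        if (lock.tryLock()) {
            try {
                // critical section
                return true;
            } finally {
                lock.unlock();
            }
        }
        return false; // the caller can retry, back off, or take another code path
    }
}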

Performance comparison

Synchronized escalates through different lock states in its implementation, including no lock, biased lock, lightweight lock, and heavyweight lock. When it is upgraded to a heavyweight lock, system calls and context switches occur, resulting in significant performance overhead.

Java threads are mapped to native operating system threads, and blocking or waking up a thread involves a context switch between user mode and kernel mode. This switch consumes system resources and requires overhead for passing variables and parameters between modes. The kernel preserves register values and variables during the mode transition.

Frequent thread state transitions can consume a significant amount of CPU processing time. In cases where the time spent on context switching due to a heavyweight lock exceeds the time spent on executing user code, this synchronization strategy becomes highly inefficient.

When is synchronized applicable?

In contrast to the previous point, the synchronized keyword is suitable when the time taken to acquire the heavyweight lock is shorter than the time spent on executing user code. This is particularly true in scenarios where multiple threads spin and compete for resources that require a significant duration of execution. After all, spinning also occupies CPU time slices.

CAS principle

CAS (Compare and Swap) is a commonly used atomic operation in concurrent programming to address race conditions in a multi-threaded environment. On x86, the CPU primitive for CAS is the lock cmpxchg instruction. The lock prefix ensures atomicity during the comparison and write phase, preventing concurrency issues: it locks the target memory location (typically its cache line) during the CAS operation, disallowing concurrent modification by other CPUs.

The CAS principle involves the following key steps:

  • Comparison: CAS compares the value in memory with the expected value. If they match, it proceeds to the next step. Otherwise, another thread has modified the value, resulting in a failed CAS operation.
  • Exchange: If the comparison succeeds, CAS attempts to write the new value into memory. This operation is atomic, ensuring it is not interfered with by other threads.
  • Check result: CAS returns a boolean indicating the success or failure of the operation.

CAS is based on the atomicity of comparison and exchange operations. By comparing the value in memory with the expected value, CAS ensures only one thread successfully executes the operation. In multi-threaded scenarios, CAS maintains data consistency and correctness.
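
To make those steps concrete, here is the classic CAS retry loop, essentially what AtomicLong.incrementAndGet() does internally (a simplified illustration, not the JDK source):

import java.util.concurrent.atomic.AtomicLong;

public class CasIncrement {
    private final AtomicLong counter = new AtomicLong();

    public long increment() {
        long expected;
        long next;
        do {
            expected = counter.get();  // comparison step: read the current value
            next = expected + 1;       // compute the intended new value
            // exchange + check: compareAndSet atomically writes `next` only if the
            // value is still `expected`; if another thread won, re-read and retry.
        } while (!counter.compareAndSet(expected, next));
        return next;
    }
}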

Performance comparison of synchronized and CAS implementations in different scenarios

Comparison of synchronized, AtomicLong, and LongAdder:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class AtomicVsSyncVsLongAdder {
    private static int NUM_OF_THREADS = 100;
    private static long COUNT = 0L;
    private static AtomicLong atomicLong = new AtomicLong();
    private static LongAdder longAdder = new LongAdder();

    public static void main(String args[]) throws InterruptedException {
        Thread[] threads = new Thread[NUM_OF_THREADS];

        // Phase 1: shared counter guarded by synchronized
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < 100000; k++) {
                    synchronized (AtomicVsSyncVsLongAdder.class) {
                        COUNT++;
                    }
                }
            });
        }

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < threads.length; i++) {
            threads[i].start();
        }

        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
        long expiredTime = System.currentTimeMillis();

        System.out.println("Synchronized took " + (expiredTime-startTime) + "ms.");

        // Phase 2: lock-free counter via a single AtomicLong (one CAS target)
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < 100000; k++) {
                    atomicLong.incrementAndGet();
                }
            });
        }

        startTime = System.currentTimeMillis();
        for (int i = 0; i < threads.length; i++) {
            threads[i].start();
        }

        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
        expiredTime = System.currentTimeMillis();

        System.out.println("AtomicLong took " + (expiredTime-startTime) + "ms.");

        // Phase 3: LongAdder spreads contention across multiple cells
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < 100000; k++) {
                    longAdder.increment();
                }
            });
        }

        startTime = System.currentTimeMillis();
        for (int i = 0; i < threads.length; i++) {
            threads[i].start();
        }

        for (int i = 0; i < threads.length; i++) {
            threads[i].join();
        }
        expiredTime = System.currentTimeMillis();

        System.out.println("LongAdder took " + (expiredTime-startTime) + "ms.");
    }
}

When NUM_OF_THREADS is set to 100, the test results are as follows:

Synchronized took 981ms.
AtomicLong took 680ms.
LongAdder took 74ms.

When NUM_OF_THREADS is set to 1000, the test results are as follows:

Synchronized took 4472ms.
AtomicLong took 6958ms.
LongAdder took 157ms.

The experiments demonstrate that as the number of competing threads increases, the AtomicLong implementation based on CAS takes longer than the synchronized mechanism. This confirms the considerations mentioned earlier:

  • CAS can be considered when critical sections are short, i.e., when the cost of spinning stays below the cost of suspending and resuming threads.
  • When there is high thread contention and the overhead of spinning exceeds suspension costs, using the synchronized keyword is advisable.

DDD(Domain Driven Design)-4 Find the software architecture to apply Domain Driven Design

In this article, I am going to discuss what architecture can ideally integrate with DDD and how it works. To achieve this goal, I am going to walk through the features, pros, and cons of the MVC architectural pattern, 3-tier architecture, 4-tier architecture, 5-tier architecture, and hexagonal architecture. You might want to read my previous post to better understand the concept of the coming sections.

Previous post

Three-tier architecture

It’s a simple layered architecture. Most projects can apply it. We can integrate it with the MVC Architectural Pattern.

Why do we need architecture layers?

Without layers, all logical implementations are tightly coupled in a single package, which undoubtedly brings heavy maintenance costs and poor scalability.

The layered architecture design aims to help us achieve a highly cohesive, loosely coupled, scalable, and reusable design. We can segregate responsibilities and limit the impact of change within each layer. However, it is a misunderstanding that more layers bring more benefits to a system. Fewer layers mean a workflow traverses less logic with less overhead. We need to balance the trade-off to arrive at a practical layer design.

There are two main types of layered architectures:

  1. Strictly layered architecture, which only allows interaction with the layer below.
    • e.g. an upper presentation layer is only allowed to interact with the business logic layer below it.
  2. Relaxed layered architecture, which allows a layer to interact with any layer below it.

What should we do as architects?

As architects, we need to define standards that other engineers follow. We need to document them and make all associated engineers aware of them so that they won't break the design during implementation. Documentation is essential here. If we only communicate standards verbally, people will forget them over time and take them less seriously.

Four-tier architecture

To better fit the business and implement our domain-driven design, a layered architecture can highlight the benefits of the domain model concept. However, the three-tier architecture takes the database as its starting point: engineers generally begin with database modeling. DDD's focus is not the database model but the domain model, which it takes as the basis for analyzing and designing the architecture hierarchy.

What is the DDD solution for this?

Eric Evans provided a traditional four-tier architecture as the DDD solution in his book, Domain-Driven Design: Tackling Complexity in the Heart of Software.

It is a Relaxed layered architecture.

We can have the following interaction:

  • User Interface –> Application, Domain, and Infrastructure
  • Application –> Domain and Infrastructure
  • Domain –> Infrastructure

Five-tier Architecture

Let’s take a look at the popular MVC Architectural Pattern.

MVC Architectural Pattern

Model–view–controller (MVC) is a software architectural pattern commonly used for developing user interfaces that divide the related program logic into three interconnected elements. 

Wikipedia

It’s highly recommended by the Spring ecosystem. However, it has certain inherent defects.

What are the problems of MVC Architectural Pattern?

  1. There’s implementation redundancy across various layers.
    • Humans tend to seek shortcuts. The MVC pattern itself doesn't stop developers from writing business logic in controllers, which brings redundancy across the controller layer and other layers. It is a prevalent issue during development.
    • As more logic is put into controllers, they become complicated over time.
  2. Controllers are heavily dependent on models. In terms of cross-layer interaction, Controllers are strongly dependent on services while services are heavily dependent on Daos. Models play the intermediate medium role during the interaction. This means that model prototypes need to be as simple as possible.
  3. Objects are coupled together. For example, when we send a query request to a controller, a user object may be passed into an order object.
  4. It complies with neither object-oriented programming principles nor domain-driven design principles. It's the Anemia Model: role objects (referred to as domain objects in DDD) only contain get and set methods. MVC is inherently in conflict with DDD in design, as DDD emphasizes the domain concept and behavioral interaction between roles/entities.
    • MVC is more of a structural design pattern, while DDD is more of a behavioral design pattern.

What can we do to improve the MVC pattern?

The DCI(Data, Context, and Interaction) Architecture Pattern

Trygve Reenskaug, the inventor of MVC

James O. Coplien, author of Multi-Paradigm Design for C++

Trygve and James together invented and refined the formulation of DCI architecture. It’s complementary to the MVC pattern. It perfectly fits the domain-driven design.

The DCI architecture physically separates models and behaviors. Data objects are static objects that don't possess any business logic and only contain get, set, and some basic behaviors, such as sorting and pagination. It's the Anemia Model. In addition, users' behaviors and actions are extracted into Role objects. Role objects and data objects have a two-way binding relationship. As both have their own boundaries, binding them together forms another boundary. As a result, we achieve high cohesion within the boundary.

The context layer is a simple and thin layer working as an interaction entrance. It is responsible for business scenario aggregation and carrying out interactions between contexts.

An e-commerce example of how DCI takes the benefits of both the anemia model and the congestion model.

Let’s take a look at a product order workflow.

In an SOA or Microservices architecture, we generally have the order, product, and user modules for the above actions. Each of them contains related classes with some associated methods.

We normally have the following two approaches to handle this workflow:

  1. We aggregate behaviors, across several subdomains, in the current domain or a specific service, e.g., a business middle platform. It's an anemia model. Domain objects only contain get, set, and basic methods. It's simple and suitable for fast development.
  2. Domain models contain all associated objects' behaviors needed to complete the business chain. It's a congestion model. Domain models carry many complicated behaviors, both innate and non-innate. They might also cover behaviors that belong to other domain models, making the abstract layer hard to define. As the business chain might involve interactions between several modules, data relations can become complicated, and it would be hard to define repositories and behavioral boundaries. It requires a dev team with robust skills and experience in both the domain and the technology.

Both the anemia and congestion models have their pros and cons in scenarios like this. We can apply the DCI architecture pattern to put behaviors into Role objects and data belonging to the same data boundaries into data objects.

Finally, we achieve the following benefits with the DCI pattern:

  1. Physical segregation and reference between Role objects and data objects to achieve high cohesion and loose coupling between data and behaviors;
  2. Resolve the issue of inconsistency between data and behaviors.
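
As a hedged sketch of the pattern (class names are illustrative, not from any framework), a role object binds to an anemic data object and carries the behavior, while a thin context wires them together for one business scenario:

// Data object: anemic, only state plus trivial accessors.
class OrderData {
    private final String orderId;
    private double total;
    OrderData(String orderId, double total) { this.orderId = orderId; this.total = total; }
    String getOrderId() { return orderId; }
    double getTotal() { return total; }
    void setTotal(double total) { this.total = total; }
}

// Role object: the behavior extracted out of the data object (the "Interaction" in DCI).
class DiscountApplierRole {
    private final OrderData data; // full DCI uses two-way binding; one-way here for brevity
    DiscountApplierRole(OrderData data) { this.data = data; }
    void applyDiscount(double rate) {
        data.setTotal(data.getTotal() * (1 - rate));
    }
}

// Context: a thin entry point that aggregates the scenario and wires roles to data.
class CheckoutContext {
    void checkout(OrderData order) {
        new DiscountApplierRole(order).applyDiscount(0.10); // scenario-specific interaction
        // ... payment, inventory deduction, and invoicing roles would follow
    }
}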

As we can see, the DCI architecture pattern is more flexible compared to the MVC pattern. It serves better for domain-driven design as it fits domain models better. We can practice domain-driven design by applying the DCI architecture design paradigm with Four-Color Modeling, an approach to analyzing requirements.

After we introduce the DCI architecture design pattern into 4-tier architecture, we have the following 5-tier architecture.

Five-tier Architecture — the derivative of the integration of the DCI architectural pattern and 4-tier architecture

It’s a Relaxed layered architecture. If business scenarios involve complicated logic and interactions, we can also divide the domain layer into two adjacent layers, e.g., a transactional management layer and a context layer. The transactional management layer is responsible for executing the business logic chain, while the context layer takes care of atomic actions, such as a single synchronization message or an interaction between roles. As a result, we get a six-tier architecture, which is a variant of the five-tier architecture.

Six-tier Architecture

It is also a Relaxed layered architecture.

However, an infrastructure failure can bring the system down regardless of whether our architecture is strictly or relaxedly layered. Moreover, in a strictly layered architecture, the upper layers break down if the layers below them malfunction.

How can we solve this problem?

We can apply the Dependency Inversion Principle to resolve this issue. The upper layers shouldn't depend on the concrete implementation of the lower layers. Instead, they should rely on the lower layers' abstractions or interfaces. Also, the concrete implementations should themselves depend on abstractions.

There’s a more elegant architectural style, i.e., Hexagonal Architecture.

Hexagonal Architecture

Alistair Cockburn, one of the initiators of the agile movement in software development, invented the Hexagonal Architecture.

We can also call it ports and adapters architecture. It is a strictly layered, highly cohesive, and flexible architecture.

Features:

  • Domain Model: Core business logic
  • Public APIs: The internal application exposes public APIs for outer layer usage.
  • Two types of Adapters:
    • Driving Adapters are responsible for clients’ inbound requests.
    • Driven Adapters are responsible for outbound interactions, e.g., interaction with a Redis cluster.
  • Each adapter is paired with specific usage. (Single Responsibility Principle)
  • Interactions go strictly through APIs rather than concrete code implementations, i.e., concrete application-layer implementation must not leak into the domain model.
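
A hedged Java sketch of the ports-and-adapters idea (all names are illustrative):

// Core domain object.
class Product {
    final String id;
    final String name;
    Product(String id, String name) { this.id = id; this.name = name; }
}

// Port: an API the domain defines and owns.
interface ProductRepository {
    Product findById(String id);
}

// Core service: depends only on the port, never on infrastructure.
class ProductService {
    private final ProductRepository repository; // a driven adapter is injected here
    ProductService(ProductRepository repository) { this.repository = repository; }
    Product lookup(String id) { return repository.findById(id); }
}

// Driven adapter: one concrete way to satisfy the port, e.g., backed by Redis.
class RedisProductRepository implements ProductRepository {
    @Override
    public Product findById(String id) {
        // real code would query the Redis cluster here
        return new Product(id, "cached-product");
    }
}

// Driving adapter: translates an inbound request into a call on the core's public API.
class HttpProductController {
    private final ProductService service = new ProductService(new RedisProductRepository());
    String handleGet(String id) { return service.lookup(id).name; }
}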

Conclusion

We’ve looked at features of various architectural styles.

Again, we use technology to empower business. Tech is supposed to be driven by business requirements, and so is architecture design. Having more tiers doesn't mean the architecture serves our software system and development better. We should design software architecture based on business scenarios, weighing the clear responsibilities of a multi-tier architecture against its cost, and avoid over-design.

If you are interested in DDD, I suggest you also take a look at four-color modeling and event storming.

If I have time, I might write more articles about other architectural styles and how we practice DDD with implementation and project structure.

If you have any career or business opportunities to share with me, please contact me through LinkedIn.

DDD(Domain Driven Design)-3 Key elements of DDD

In the previous post, I discussed the Domain and Subdomain concepts of DDD and what we need to avoid when we define them.

To better elaborate on my next post regarding various software architecture styles and how we can integrate DDD with some of them, I will explain the critical concepts of DDD in this article.

Ubiquitous Language

Ubiquitous Language is the Domain-driven design term that describes common, rigorous language that developers and stakeholders share within specific boundaries.

It helps teams develop standardized terms and language so that all stakeholders have a consensus. It aims to assist us as product managers/owners, QAs, developers, and architects in accurately expressing business and understanding requirements.

I used to work for a telecom company, and my project used the term MVP. MVP generally stands for minimum viable product, but within our project scope, it referred to the managed voice platform. Terms like this bring confusion if we don't define them clearly. That's why we need a Ubiquitous Language.

Bounded Context

Bounded Context is the boundary within a domain where a particular domain model and ubiquitous language apply. It is the central pattern of DDD that helps us avoid ambiguity and uncertainty in our software development.

Domain Model

Domain model is a structured visual representation of interconnected concepts or real-world objects that incorporates vocabulary, key concepts, behaviours, and relationships of all its entities. 

In a business logic implementation, a domain model is presented as a business object model, which describes the interconnected relationship between our business objects.

Business Objects(BO)

A BO model generally has 3 main components: business roles, business entities, and business use-case.

Business Roles: The roles at play in our projects/systems, possessing inherent properties and a series of responsibilities, e.g., e-commerce customer service staff responsible for answering sales inquiries, order management, refunds, product returns, etc.

Business Entities: The components that are necessary for business roles’ interaction, e.g., product and invoice in an E-commerce project.

Business use-case: The workflow that business roles and business entities work together to execute, e.g., the action chain of product searching, product browsing, ordering, payment, deduction from inventory, invoicing, and delivery.

These 3 components together form a business object model.

4 categories of domain models

Blood Loss Model

  1. Classes with only get and set without other logic, not even a simple sorting method.
  2. Pros: It has a simple structure.
  3. Cons:
    • It is hard to maintain once business logic inflates.
    • It is hard to satisfy frequently changing requirements, as it doesn't have a Dao layer. All logic, including JDBC invocation code, resides in the service layer.

Anemia Model

  1. Classes added atomic domain logic based on the Blood loss model, such as sorting and pagination methods.
  2. Classes have attributes and innate behaviors/methods (i.e., methods that belong to the domain model and need no persistence).
  3. Other logic and non-innate behaviors that require database interaction reside in the business logic layer.
  4. It is the model recommended by the Spring ecosystem.
  5. Pros:
    • It has a clear layer structure.
    • It has single-direction dependency across layers.
    • It is easy to understand and brings fast development benefits for applications with simple business logic.
  6. Cons:
    • It can’t gracefully cope with highly complicated logic and business scenarios.
    • It has low cohesion in projects’ core domains.
    • It doesn’t comply with some object-oriented programming principles.

Congestion Model

  1. It is more in line with object-oriented programming principles than the Anemia model.
  2. Domain objects contain both inherent and non-inherent logic. The business logic layer is only responsible for consolidation operations and transaction encapsulation, while domain objects take care of other operations.
  3. Pros:
    • It has a thinner business logic layer, which is more in line with the single responsibility principle.
    • The business logic layer doesn’t interact with the Dao layer.
  4. Cons:
    • It requires engineers to have robust implementation and design skills because it’s difficult to distinguish whether we should put specific business logic in the business logic layer or domain objects in particular scenarios.
      • We will need to write business logic into domain objects in some scenarios. Unfortunately, a bug introduced there could lead to cascading errors.
    • A higher cost to hire better engineers compared to using the Anemia model is one of the main reasons the Anemia model is more popular.
    • It has a higher instantiation cost, as many operations reside in the domain model.

Bloating Model

  1. It has a simple layer structure.
  2. There’s no service layer in the bloating model, only domain objects and Daos.
  3. Domain objects and their methods take care of transactional operations, authorization, etc.
  4. Pros:
    • It has a simple architecture layer.
    • It complies with object-oriented programming principles.
  5. Cons:
    • As domain objects contain business logic and many other responsibilities, the code maintainability is terrible.
    • As domain objects' domain logic takes care of transaction encapsulation and authorization, it contains some logic that does not necessarily belong to the domain objects, causing greater code complexity compared to other models.
    • Engineers might need to refactor and redesign the domain model over time as increased requirements changes would cause logical chaos and unstable domain models.

Decision-Making Overview

The Anemia model is the most popular model and the one the Spring ecosystem recommends. In contrast, the Congestion model places higher demands on the dev team, offers better system cohesion, and suits businesses with complicated business logic, such as the insurance industry.
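
To make the contrast concrete, here is a hedged sketch (the class names are illustrative): the anemic version keeps the business rule in a service, while the congestion (rich) version puts it on the domain object itself.

// Anemia model: the domain object is a bag of getters and setters ...
class PolicyData {
    private double premium;
    double getPremium() { return premium; }
    void setPremium(double premium) { this.premium = premium; }
}

// ... and the service layer owns all the business rules.
class PolicyService {
    void applyRiskLoading(PolicyData policy, double riskFactor) {
        policy.setPremium(policy.getPremium() * riskFactor);
    }
}

// Congestion (rich) model: the rule lives on the domain object; the service
// layer shrinks to orchestration and transaction encapsulation.
class Policy {
    private double premium;
    Policy(double premium) { this.premium = premium; }
    void applyRiskLoading(double riskFactor) { this.premium *= riskFactor; }
    double premium() { return premium; }
}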

Preview of next post

My next blog post will discuss the features, pros, and cons of the MVC architectural pattern, 3-tier architecture, 4-tier architecture, 5-tier architecture, hexagonal architecture, and more. After the discussion, I will elaborate on the architectural pattern solution that can gracefully integrate with the DDD approach.

See you in my next post.

DDD(Domain Driven Design)-2 Apply DDD to design microservice architectures that genuinely satisfy our business evolution needs

This is a follow-up to the previous post in this series.

How does DDD help us define business domain and model boundaries?

There are mainly two phases to achieve this. Here is a brief summary:

Strategy Design Phase:

  1. Architects, product managers/owners, and operations staff are responsible for this phase.
  2. They start from the business perspective, centering on the business boundaries to define the domain models, domain boundaries, and bounded contexts (i.e., the business boundaries of microservices).

Tactical Design Phase:

  1. Business logic developers are responsible for this phase.
  2. They start from the technical perspective and execute implementation according to outputs, such as domain models, from the strategy design phase.

Strategy Design

Domain — the scope of critical problems our system’s core business wants to solve

In DDD, we plan, execute, and resolve problems within the scope.

We can take a domain as a problem domain. Therefore, once we define the domain for the system, we already have the problem scope boundaries for our system’s core business.

It is the boundaries about what problems we solve, what we will do, to what extent, etc.

SubDomains

As a domain is a scope that wraps critical problems together, it is big, and we need to divide and conquer these problems to make things less complicated and more manageable. Hence, we have subdomains.

There are three types of subdomains:

  1. Core subdomain — the must-have components of our system
  2. Generic subdomain — the shared components of our system, similar to the common.jar of our Java project
  3. Supporting subdomain — the nice-to-have components of our system, e.g. coupon service of a traditional e-commerce website.

Core subdomain

They are the most critical and most competitive subdomains of the system: the components our business can't live without.

To define core subdomains, we must understand the core value of our business. This leads to the domain vision statement. Take Alibaba as an example: its mission is to make it easy to do business anywhere, and its core value is trade empowerment. Components associated with it, such as the merchant system and transaction system, are the core subdomains.

Generic subdomain

They are the subdomains that other subdomains within the domain would like to use. The IAM (Identity and Access Management) module is a typical example.

Supporting subdomain

They are neither core nor common features but essential in specific business scenarios. For example, coupon and last-minute trade modules are essential to particular workflows of a traditional e-commerce system, but the system can still work well without them.

Why is it a bad idea to copy domain models from similar businesses?

Let’s compare Alibaba Taobao and JD.com.

Both of them are e-commerce giants in China. However, Taobao is the premier C2C online marketplace in China while JD.com has a B2C business model. Their products have many common features.

However, copying domain models from each other would be fundamentally wrong.

Taobao’s focus is trade empowerment while JD.com focus on quality and brand reputation.

Alibaba’s mission statement is to make it easy to do business anywhere. It invests the majority of its resources in facilitating trading between buyers and sellers. It’s the core value for the business to survive and thrive. Double 11 festival and Taobao Live are typical examples to express their core value. Ordering and merchant systems are its core subdomains.

On the other hand, JD.com dedicates most of its resources to ensuring product quality and to protecting and promoting its brand. JD's delivery is super fast: it provides same-day delivery in some cities and next-day delivery in others. Not only is the delivery service fast and a better user experience, it also safeguards product quality until the last moment before handing goods to customers. Therefore, supply chain, procurement, warehousing, and delivery are all critical subdomains.

As they have different business models, their focuses differ, which leads to divergent software design and domain model design. Hence, copying models from other similar businesses wouldn't be wise. It is also the main reason we need architects with in-depth domain knowledge and the competence to design an ideal architecture.

Conclusion

We have looked at the Strategy Design and things to avoid when we define our domain models. In the following blog articles, I will explain more DDD concepts, such as domain models, various implementations, the pros and cons of different architecture models, and how we apply DDD to our architecture design.

DDD(Domain Driven Design)-0 architecture design with DDD

Technology empowers business. Most Internet companies are business-outcome-driven enterprises, where the business determines technology adoption, architecture adoption, and service boundaries. Yet most projects on the market fail to reflect professional software architecture design. We need a practical methodology to solve this problem, and DDD is an ideal formula.


DDD is a software architecture design methodology that helps engineers/architects understand the business better and design tech solutions that satisfy changing business requirements. I'm about to write a series of blog articles to elaborate on this methodology. These articles cover architectural decision dilemmas, common misunderstandings about DDD, how we address these problems, basic DDD concepts, the pros and cons of various architectures in detail, architecture design that complies with DDD practice, and more.

Directory

DDD(Domain Driven Design)-1 Apply DDD to design microservice architectures that genuinely satisfy our business evolution needs

Microservice boundary definition dilemma

Defining the boundary of services is inevitable regarding microservice architecture design. How micro does a service have to be to be considered a microservice? What are the ideal boundaries of services that can meet changing business requirements? What are the differences between SOA and microservices?

Many engineers claim that they work on microservices but actually apply the SOA approach.

Service-Oriented Architecture (SOA) vs Microservices

SOA focuses on service reusability and resolves the information silos issue.

Microservices emphasize higher service granularity, service decoupling, and scalability.

From the above definitions, it is evident that SOA has more coarse-grained service partitioning than microservices.

Oversized services inevitably introduce redundancy and relatively poor resource isolation, and lose some of the benefits of microservices, while over-decomposed services bring service governance and operational difficulties. Imagine the nightmare of managing thousands of microservices in the latter case.

Microservices structure at Amazon

Therefore, we need an approach to define the boundaries, modeling, etc.

What is the problem with starting with data modeling?

A physical data model is a database-specific model that represents relational data objects. It is tightly coupled to requirements: requirement changes cause data model changes, and the internet industry iterates rapidly. We can expect unthoughtfully defined physical data models to quickly stop fitting the business workflow. We need something more conceptual, such as context boundaries, architecture design, and specifications.

So, what approach can help us design microservice architectures?

Define the domain model, divide the domain boundaries, and then define the microservices according to the outcome. As a result, the business and technical boundaries are reasonable, achieving high cohesion and loose coupling, which is the ideal state of microservices.

DDD is the formula to achieve the above outcome.

What is Domain Driven Design(DDD)?

Domain Driven Design(DDD) is an effective software design approach focusing on modeling software based on business-expert-defined business domains.

While it brings great benefits to our software design, it is difficult to implement.

Before jumping into how to do it, the following sections will elaborate on what is required to apply the approach, why it is hard to implement, and common mistakes we need to avoid.

What is required to apply DDD?

The crucial components in DDD are business models. Defining ideal business models requires an in-depth understanding of our businesses, which makes DDD difficult to implement.

Why is DDD difficult to implement?

  1. It requires architects to have rich experience in business domains and related techs.
    • Architects from other industries might not be able to define proper domain models.
  2. It requires each role in the team to have a considerable degree of professionalism to implement it.
    • Tech experts must be able to assist architects in completing the logical design phase; otherwise, the project will stall.
    • To implement it, all parties, including technical managers, technical experts, and developers from various levels, must be able to understand well-defined diagrams, such as conceptual diagrams.

If the project doesn’t have a particular volume and the company’s business model doesn’t have a robust business model, practicing such a design would lengthen the entire development cycle. This is because completing the holistic design following the approach consumes an obnoxious amount of time. The early conceptual design and the logical design phases are time-consuming. Many companies, especially startups, do not resources(e.g. time, experts, etc) to follow a standardized approach like this. Even worse, some startups’ core business domain varies frequently over time, and it will cost more resources to make adjustments.

However, it is still valuable to integrate multiple software design methods with DDD to formulate a design approach suitable for your projects' scenarios and to construct solutions for complex business requirements. I am going to elaborate on how we do this later. Let's start with some basic concepts.

Three phases to practice DDD

  1. Conceptual design (scope definition)
    • In stages 1 to 3 of the above diagram, architects, product managers/owners, and operations staff work together to produce concept diagrams, concept classes, domain class diagrams, etc.
  2. Logical Design
    • Senior devs(e.g., tech lead, tech experts, senior developers, core developers, etc.) work with architects to do modeling according to the first step’s outputs. These outputs are business-modeling-related documentation and diagrams, such as use case diagrams, use case specifications, use case scenarios, use case views, business object models, etc.
  3. Physical Design
    • Developers work together to implement the requirements. They fix some emerging issues that they haven’t considered in previous steps. For example, some performance issues might require them to adjust models.
    • Developers and architects decide how to store data(e.g., indexing, read-write segregation, partitions, etc.).
    • Developers and architects define software architecture styles (e.g., microservices, RESTful architecture, etc.)
      • We won't know whether our models are suitable for microservices until this point.

Conclusion

We’ve looked at why we need DDD and three major phases in this blog article. In my next post, I will explain how DDD helps us to determine business domain and model boundaries, basic concepts of strategic design, and why copying models online or similar business is not a good idea.

Spring Boot + AWS Aurora MySQL Read Write Segregation

MySQL read-write segregation brings significant benefits to our applications in terms of performance and stress resistance. These include:

  1. Query-processing capacity increases as the number of physical machines increases;
  2. The reader and writer handle reads and writes respectively, which greatly alleviates exclusive-lock and shared-lock contention;
  3. We can independently improve and optimize the read path, e.g. using the MyISAM engine with specific optimization parameters;
  4. Synchronization happens via binlog;
  5. Availability improves.

AWS Aurora has made read write segregation really simple.

In the following sections, I will explain how to write a Spring Boot application that works with the Aurora reader and writer. We will first create the Aurora cluster, then write an annotation that lets classes or methods use the reader or the writer as their data source.

Step 1: create an Aurora cluster with reader and writer

Step 2: The Spring Boot application configuration

spring:
  datasource:
    reader:
      username: admin
      password: [your_password]
      driver-class-name: com.mysql.cj.jdbc.Driver
      jdbc-url: jdbc:mysql://your_reader_url:3306/commentdb?serverTimezone=UTC&useUnicode=true&characterEncoding=utf-8
      pattern: get*,find*
    writer:
      username: admin
      password: [your_password]
      driver-class-name: com.mysql.cj.jdbc.Driver
      jdbc-url: jdbc:mysql://your_writer_url:3306/commentdb?serverTimezone=UTC&useUnicode=true&characterEncoding=utf-8
      pattern: add*,update*
  jpa:
    hibernate:
      ddl-auto: update

server:
  port: 8080

Step 3: The Dynamic Data Source

We will extend the AbstractRoutingDataSource class to write our dynamic data source:

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

import javax.sql.DataSource;
import java.util.Map;

public class DynamicDataSource extends AbstractRoutingDataSource {
    public DynamicDataSource(DataSource defaultTargetDataSource, Map<Object, Object> targetDataSources) {
        super.setDefaultTargetDataSource(defaultTargetDataSource);
        super.setTargetDataSources(targetDataSources);
        super.afterPropertiesSet();
    }

    @Override
    protected Object determineCurrentLookupKey() {
        // The returned key selects the target data source for the current call.
        return DynamicDataSourceContextHolder.getDataSourceType();
    }
}

We will have a DynamicDataSourceContextHolder class that internally uses ThreadLocal to ensure every thread has its own data source context, which won't be changed by other threads:

public class DynamicDataSourceContextHolder {
    /**
     * Use ThreadLocal to make sure every thread has its own data source context
     * which won't be changed by other threads
     */
    private static final ThreadLocal<String> CONTEXT_HOLDER = new ThreadLocal<>();

    public static void setDataSourceType(String dataSourceType){
        CONTEXT_HOLDER.set(dataSourceType);
    }

    public static String getDataSourceType(){
        return CONTEXT_HOLDER.get();
    }

    public static void clearDataSourceType(){
        CONTEXT_HOLDER.remove();
    }
}
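
The post doesn't show how the two data sources from application.yml get wired into DynamicDataSource, so here is a minimal sketch of how that configuration class could look; the bean layout and the use of DataSourceBuilder are my assumptions (the actual wiring lives in the repository linked at the end):

import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.boot.jdbc.DataSourceBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;

import javax.sql.DataSource;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class DataSourceConfig {

    // Binds spring.datasource.reader.* (jdbc-url, username, password, ...);
    // extra keys such as "pattern" are ignored by relaxed binding.
    @Bean
    @ConfigurationProperties("spring.datasource.reader")
    public DataSource readerDataSource() {
        return DataSourceBuilder.create().build();
    }

    @Bean
    @ConfigurationProperties("spring.datasource.writer")
    public DataSource writerDataSource() {
        return DataSourceBuilder.create().build();
    }

    // The routing data source is @Primary so JPA uses it; it falls back
    // to the writer when no lookup key has been set for the thread.
    @Bean
    @Primary
    public DataSource dynamicDataSource() {
        Map<Object, Object> targets = new HashMap<>();
        // The keys must match the DataSourceType enum names used in Step 4.
        targets.put("READER", readerDataSource());
        targets.put("WRITER", writerDataSource());
        return new DynamicDataSource(writerDataSource(), targets);
    }
}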

Step 4: The DataSource annotation

We will have a DataSource annotation, which can annotate both classes and methods.

Here’s how it works:

  1. If a class is annotated with @DataSource, all its methods matching the regex rules we specify in our configuration (application.yml above) will switch to the related data source.
  2. If both a method and its class are annotated with @DataSource, only the method-level annotation takes effect.

The DataSourceType enum

public enum DataSourceType {
    READER,
    WRITER
}

The DataSource Annotation

import java.lang.annotation.*;

// Can annotate both classes and methods; defaults to the writer.
@Target({ ElementType.METHOD, ElementType.TYPE})
@Retention(RetentionPolicy.RUNTIME)
@Documented
public @interface DataSource {
    DataSourceType value() default DataSourceType.WRITER;
}

The DataSourceAspect

import com.osfocus.springbootreadwritesegregation.annotation.DataSource;
import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

import java.lang.reflect.Method;
import java.util.regex.Pattern;

@Aspect
@Order(1)
@Component
public class DataSourceAspect {
    private final Pattern READER_PATTERN;
    private final Pattern WRITER_PATTERN; // by default we use the writer

    public DataSourceAspect(@Value("${spring.datasource.reader.pattern}") String readerPattern,
                            @Value("${spring.datasource.writer.pattern}") String writerPattern) {
        READER_PATTERN = Pattern.compile(getRegex(readerPattern));
        WRITER_PATTERN = Pattern.compile(getRegex(writerPattern));
    }

    // Turns a pattern like "get*,find*" into the regex "get.*|find.*".
    private String getRegex(String str) {
        return str.replaceAll("\\*", ".*")
                  .replaceAll(" ", "")
                  .replaceAll(",", "|");
    }

    // Note: within(@...DataSource *) only matches classes annotated with
    // @DataSource at the type level, so the class-level annotation is
    // required; a method-level annotation then refines the choice.
    @Around("within(@com.osfocus.springbootreadwritesegregation.annotation.DataSource *)")
    public Object around(ProceedingJoinPoint point) throws Throwable {
        MethodSignature signature = (MethodSignature) point.getSignature();
        Method method = signature.getMethod();
        DataSource dataSource = method.getAnnotation(DataSource.class);
        if (dataSource != null) {
            // To allow higher granularity, the method-level annotation
            // has higher priority than the class-level one.
            DynamicDataSourceContextHolder.setDataSourceType(dataSource.value().name());
        } else {
            if (READER_PATTERN.matcher(method.getName()).matches()) {
                DynamicDataSourceContextHolder.setDataSourceType(DataSourceType.READER.name());
            } else {
                DynamicDataSourceContextHolder.setDataSourceType(DataSourceType.WRITER.name());
            }
        }

        try {
            return point.proceed();
        } finally {
            // Clear the data source context after the method's execution.
            DynamicDataSourceContextHolder.clearDataSourceType();
        }
    }
}

We will have a Comment-related Controller, Service, and Repository to simulate requests that read, update, and write comments on social media platforms.

We annotate the CommentServiceImpl service with @DataSource:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.util.List;
import java.util.Optional;

@Service
@DataSource
public class CommentServiceImpl implements CommentService {
    @Autowired
    private CommentRepository commentRepository;

    public List<Comment> findTop25() {
        return commentRepository.findTop25ByOrderByIdDesc();
    }

    public void addComment(CommentDTO commentDTO) {
        commentRepository.save(Comment.builder()
                                      .content(commentDTO.getContent())
                                      .build());
    }

    public void updateLastComment(CommentDTO commentDTO) {
        Optional<Comment> lastCommentOpt = commentRepository.findTopByOrderByIdDesc();
        if (lastCommentOpt.isPresent()) {
            commentRepository.updateLastComment(lastCommentOpt.get().getId(), commentDTO.getContent());
        }
    }
}
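
For completeness, here is a sketch of what CommentRepository could look like; the derived-query method names come from the service above, while the @Modifying query for updateLastComment is an assumption (see the linked repository for the real version):

import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;
import java.util.Optional;

public interface CommentRepository extends JpaRepository<Comment, Long> {

    // Derived queries; the calling service methods start with "find",
    // so the aspect routes them to the reader.
    List<Comment> findTop25ByOrderByIdDesc();

    Optional<Comment> findTopByOrderByIdDesc();

    // Bulk update; the calling service method starts with "update",
    // so the aspect routes it to the writer.
    @Modifying
    @Transactional
    @Query("update Comment c set c.content = :content where c.id = :id")
    void updateLastComment(@Param("id") Long id, @Param("content") String content);
}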

Last Step: The JMeter test

Without Read/Write segregation

With Read/Write Segregation enabled

As we can see from the above two screenshots of this simple experiment, there is an improvement in throughput.

In the experiment, the DB instances were still well below their capacity and used the default parameter groups. With an extra machine involved, the query-handling capability is certain to improve dramatically. We can also tune the parameter groups to improve performance further.

Here’s the code: https://github.com/adrianinthecloud/spring-aurora-read-write-segregation

Have fun.

Domain Driven Design(DDD) and Microservice decomposition

Mindset

There’s no silver bullet for software engineering.

Keeping what is good for the business from various methodologies, and avoiding or troubleshooting the bad, is my approach to tackling problems.

Why Microservices? Why decompose our monolithic system into microservices?

Technology is to enable business.

Here are the key factors and benefits that microservices bring to businesses:

  1. Ultimate goal: high cohesion and loose coupling; these are the foundation supporting the following benefits.
  2. Agile software development and testing with clear team responsibilities and isolated dependencies, which shortens time to market and gives the business a first-mover advantage.
  3. Locality benefit (principle of locality) and better resource (CPU, memory, etc.) usage, because the running application doesn't have to load unnecessary resources, such as caches, or execute useless scheduled tasks.
  4. Dynamic scaling with cloud and containerisation technologies to handle issues like heavy traffic in highly concurrent scenarios.
  5. Better communication through a ubiquitous language.

What are the potential problems regarding microservices?

  1. Performance issues: RPC (remote procedure call) adds extra latency to request processing.
  2. Microservice governance difficulty: data consistency, link tracing, and monitoring are tricky.
  3. Greater complexity compared to monolithic systems in terms of development. Engineers face distributed-system design challenges such as distributed transactions, distributed IDs, distributed locks, data consistency, etc.

How to optimise the advantages and minimise the disadvantages?

  1. According to the business requirements, unless you actually need the advantages mentioned above, don't decompose into microservices;
  2. Have clear bounded contexts;
  3. Benchmark performance
    1. Limit the processing of a request to 3 to 5 microservices;
    2. When facing a performance issue, figure out which service is the bottleneck of the processing chain, then apply caching or allow a certain level of data redundancy to reduce the latency or shorten the chain.
    3. If point 2 cannot solve the performance issue, consider merging certain microservices.

Sample approaches:

DDD issues

While DDD has many benefits for long-term software development, it has certain severe issues that hinder its application to software development.

Here are some of the typical examples:

  • There is still no mature, community-supported framework for implementing this methodology.
  • Decomposition of modules is hard.
  • The aggregate root could be cumbersome. Within a domain, there could be duplicate methods. It is hard to create an abstract parent class when some of the methods have no direct relation to certain child classes. We might end up needing experts with in-depth experience to extract those common methods into a common module.

Despite these problems, what can we take away from DDD?

The concept of Bounded Context

A bounded context is simply the boundary within a domain where a particular domain model applies.

(Domain analysis for microservices, Azure Architecture Center, 25 Feb 2019)

In simple language, it is the boundary within which a service/business can be self-sufficient in achieving a specific purpose, with its own rules, procedures, processes, etc.

We can apply the bounded context concept to decompose our monolithic system into different domains. A sample event storming approach is provided in a later section of this article.

Types of Domain

Core domain: the must-have things, the things all your strategic resources must be invested in.

Supporting subdomain: the could-have things, less important things that can be outsourced.

Generic subdomain: the things everybody wants, for example, an authentication service that most, if not all, services of your system will need.

Conventional strategies to divide into subdomains

  1. bounded by business responsibility (kinda like SOA);
  2. bounded by feature, more like microservices with higher granularity than the first one;
  3. bounded by organisational structure;
  4. the two-pizza rule

You can have a boundary as large as a department store. You can also make it as small as an exquisite boutique selling only a specific type of goods. It really depends on what works well for your business.

Again, technology is to enable business.

The difference between hallucination and vision here is just two words, business needs.

Find the boundary with event storming

Identify domain events with 3 key factors:

  1. having business benefits
  2. in the past tense
  3. ordered by time

The Process

Strangler Vine Approach

A strangler vine grows around a tree to reach the sunlight above the forest canopy. Eventually, the host tree dies, leaving a tree-shaped vine.

Strangler Fig - Rain Forest Reports

Basically, with the bounded context, decompose a certain part of the monolithic system into a microservice. At the beginning, the extracted microservice can share the DB with the original service, but eventually it will have its own DB and use RPC to the original service for some API invocations.

You might need to write glue code to assist the communication between the new microservice and the old service.
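
As a rough illustration (the class, endpoint, and DTO below are assumptions for this sketch, not a prescribed design), such glue code can take the shape of an adapter in the new microservice that still delegates some calls to the monolith:

import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

// A minimal DTO for the sketch.
class OrderDto {
    public Long id;
    public String status;
}

// An anti-corruption-layer style adapter inside the new microservice:
// callers depend on this class, not on the legacy API's shape.
@Component
public class LegacyOrderAdapter {

    // Base URL of the old monolith; an assumption for this sketch.
    private static final String LEGACY_BASE_URL = "http://legacy-monolith/api";

    private final RestTemplate restTemplate = new RestTemplate();

    public OrderDto fetchOrder(long orderId) {
        // Delegate to the monolith until the module is fully strangled out.
        return restTemplate.getForObject(LEGACY_BASE_URL + "/orders/" + orderId, OrderDto.class);
    }
}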

Where to start?

  1. Modules that are simple to extract.
  2. Modules that change frequently.
  3. Modules that bring the most business benefits.
  4. Modules implemented with heterogeneous technologies.
  5. Modules that are your highest priority.

What patterns can we apply?

Integration Patterns

  1. API Gateway
  2. Aggregator
  3. Proxy
  4. Chained
  5. Branch
  6. Client-side UI Composition Pattern

Cross-Cutting Patterns

  1. Service Discovery Pattern(Eureka, ZK)
  2. Circuit-Breaker(Hystrix)
  3. Client-side load balancing(Ribbon, Spring Cloud LB)
  4. External Configuration(Apache Apollo, Spring Cloud Config with RabbitMQ)

Database Patterns

  1. Database per service
  2. Shared Database per Service
  3. Saga

Decomposition Patterns

We’ve talked about these in previous sections.

  1. Strangler Pattern
  2. Decomposition by business capability
  3. Decompose by subdomain(DDD)

Observability Patterns

  1. Log Aggregation
  2. Distributed Tracing (Apache SkyWalking, Sleuth + Zipkin)
  3. Health Check(Actuator)
  4. Metrics(SpringBoot Admin)

Distributed System Design

Distributed Transaction

  1. 2PC, 3PC, TCC
  2. Transaction Outbox Pattern (it's a local transaction table + MQ); see the sketch after this list
    1. I've talked about this pattern at the end of this post: https://masteranyfield.com/2021/07/26/dealing-distributed-transactions-with-2pc-3pc-local-transaction-table-with-mqs/
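
To make the outbox idea concrete, here is a minimal sketch in Spring terms; the repository and MQ client interfaces are assumptions standing in for real infrastructure:

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;

// Hypothetical collaborators standing in for real infrastructure.
interface OutboxRepository {
    void save(OutboxEvent event);
    List<OutboxEvent> findByPublishedFalse();
}

interface MessageQueueClient {
    void send(OutboxEvent event);
}

class OutboxEvent {
    String type;
    boolean published;
    OutboxEvent(String type) { this.type = type; }
    void markPublished() { this.published = true; }
}

@Service
public class OutboxRelay {

    private final OutboxRepository outbox;
    private final MessageQueueClient mq;

    public OutboxRelay(OutboxRepository outbox, MessageQueueClient mq) {
        this.outbox = outbox;
        this.mq = mq;
    }

    // The business row and the event row commit in ONE local DB
    // transaction, so no 2PC between the database and the MQ is needed.
    @Transactional
    public void recordEvent(OutboxEvent event) {
        outbox.save(event);
    }

    // A poller (assumes @EnableScheduling) relays unpublished events to the
    // MQ; consumers must be idempotent since an event may be sent twice.
    @Scheduled(fixedDelay = 1000)
    public void relay() {
        for (OutboxEvent event : outbox.findByPublishedFalse()) {
            mq.send(event);
            event.markPublished();
            outbox.save(event);
        }
    }
}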

Distributed Lock

  1. Redis + MySQL Optimistic Lock
  2. RedLock
  3. ZK
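
For illustration, here is a minimal sketch of the Redis approach using Spring Data Redis; the key names and timeouts are arbitrary. The lock is acquired with an atomic SET ... NX PX, and released with a Lua script so a client only ever deletes its own lock:

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

import java.time.Duration;
import java.util.Collections;
import java.util.UUID;

public class SimpleRedisLock {

    // Lua makes check-and-delete atomic, so we never release a lock
    // that another client has acquired in the meantime.
    private static final String UNLOCK_LUA =
            "if redis.call('get', KEYS[1]) == ARGV[1] then " +
            "  return redis.call('del', KEYS[1]) " +
            "else return 0 end";

    private final StringRedisTemplate redis;

    public SimpleRedisLock(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // SET key token NX PX ttl: succeeds only if the key is absent.
    // Returns the token on success, or null if the lock is held.
    public String tryLock(String key, Duration ttl) {
        String token = UUID.randomUUID().toString();
        Boolean acquired = redis.opsForValue().setIfAbsent(key, token, ttl);
        return Boolean.TRUE.equals(acquired) ? token : null;
    }

    public void unlock(String key, String token) {
        redis.execute(new DefaultRedisScript<>(UNLOCK_LUA, Long.class),
                Collections.singletonList(key), token);
    }
}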

Distributed ID

  1. UUID (not storage friendly, unsuitable for many scenarios, security issues; not recommended)
  2. MySQL auto-increment ID (poor approach: low performance, single point of failure; not recommended)
  3. Redis (not reliable enough: AOF reduces performance, and master-slave asynchronous replication can lose data)
  4. ZK + Redis (recommended: ZK for consistency, decent performance with data consistency)
  5. ZK with Snowflake (recommended)
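
As a taste of the Snowflake approach (bit widths follow the classic Twitter layout; the epoch below is arbitrary), here is a stripped-down generator; in the ZK variant, ZooKeeper's job is simply to hand each node a unique worker ID:

// 41 bits of timestamp, 10 bits of worker ID (e.g. assigned via a ZK
// sequential node), and 12 bits of per-millisecond sequence.
public class SnowflakeIdGenerator {

    private static final long EPOCH = 1609459200000L; // 2021-01-01, arbitrary
    private static final long WORKER_BITS = 10L;
    private static final long SEQUENCE_BITS = 12L;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) { // sequence exhausted: spin to the next ms
                while ((now = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}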

I will go through how these techniques work and their implementation details when I have time in the future.