Simplifying Facebook Marketplace System Design and Operations with AWS Cloud

Designing a traditional classified ads e-commerce system, similar to Facebook Marketplace, presents challenges in server procurement, maintenance, and scalability to manage high traffic and growing business complexity. These issues result in elevated costs and reduced efficiency with business growth. To achieve cost optimization, enhance system performance, and ensure security, migrating workloads to AWS is a viable solution.

This article explores the strategy with a simple example, similar to Facebook Marketplace and Gumtree.

System Requirement Clarification

For simplicity, we focus on the following core requirements of the system:

Functional Requirements(FR):

  1. Sellers must be able to publish, edit, and delete products.
  2. Users must be able to search for products using keywords.
  3. The system must enable buyers to send inquiries and notify sellers via email.

Non-functional Requirements(NFR):

  1. The service needs to be highly available to ensure consistent and reliable access for users.
  2. The service needs to be highly scalable to effectively accommodate the growing number of users and interactions.
  3. The service needs to be highly secure, encompassing features such as data encryption in transit and at rest, along with robust user authentication and authorization mechanisms.
  4. The service needs to be highly extendable, supporting the integration of emerging features and technologies, such as machine learning for enhancing recommendation systems.

Design Consideration

  • The system is read-heavy with an estimated read-write ratio of 100:1
  • All product posts, including up to 9 photos each, require reliable and near real-time retrieval.

Capacity Estimation

The storage requirements are primarily determined by four factors: the number of posts, the volume of user data, storage needs for Elasticsearch indexing, and memory requirements for the distributed caching system.

  1. Assuming a total user base of 200 million, we have 1 million daily active users.
  2. Assuming a 100:1 read-write ratio, approximately 10,000 new posts are created daily.
  3. Assuming an average of 50 words per post and 5 bytes per word.
  4. Assuming each post contains an average of 5 photos, with each photo using 2 MB of storage in total across its compression levels.
  5. Assuming the business has operated at this scale for a continuous period of 5 years.

For simplicity, we’ll assume that the storage used before reaching the current scale is equivalent to one year’s storage at the current scale.

This estimation serves as a simplified representation of the core concept for illustrative purposes. In practice, more complex scenarios and requirements, such as logging and data for machine learning, need to be considered.

The storage required for posts will be

 10000*365*(5+1)*50*5 + 10000*365*(5+1)*(2048*1024)*5 ≈ 209TB

where daily new posts require around 97.66GB of storage.
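
The arithmetic above can be reproduced with a short script. This is a minimal sketch that uses only the assumptions listed in this section; the constant names are illustrative, and binary units (1 GB = 1024^3 bytes, 1 TB = 1024^4 bytes) are assumed because they match the quoted figures.

```python
# Back-of-the-envelope post storage, based on the assumptions listed above.
DAILY_POSTS = 10_000
YEARS = 5 + 1                             # 5 years at scale + 1 year-equivalent of earlier data
TEXT_BYTES_PER_POST = 50 * 5              # 50 words x 5 bytes per word
PHOTO_BYTES_PER_POST = 5 * 2048 * 1024    # 5 photos x 2 MB per photo

bytes_per_post = TEXT_BYTES_PER_POST + PHOTO_BYTES_PER_POST
daily_bytes = DAILY_POSTS * bytes_per_post
total_bytes = daily_bytes * 365 * YEARS

daily_gb = daily_bytes / 1024**3
total_tb = total_bytes / 1024**4

print(f"daily: {daily_gb:.2f} GB, total: {total_tb:.1f} TB")
```

Photos dominate the result: the text portion contributes only about 5.5 GB over the whole period.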

User storage requirements:

  1. User ID (8 Bytes): This should be sufficient to support 200 million users and provides ample scalability.
  2. Username (32 bytes): A 32-byte username length should be sufficient for most cases, allowing for relatively long usernames.
  3. Email (64 bytes): A 64-byte email address length should be sufficient to support various email address formats.
  4. Password (16 bytes): 16 bytes is used for this estimate. In practice, store a salted hash rather than the password itself; the hash usually needs more space than this (a bcrypt hash string, for example, is 60 characters).
  5. Language (2 bytes): A 2-byte language code is a standard way to represent language and is usually sufficient.
  6. Avatar URL (200 bytes): A 200-byte avatar URL length allows for relatively long URLs for storing avatar links.

Storage Space (bytes) = Bytes per single user record x Number of users

Storage Space (bytes) = 322 bytes x 200,000,000 users = 64,400,000,000 bytes
Storage Space (GB) = 64,400,000,000 bytes / 1,073,741,824 ≈ 59.98 GB
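
As a quick check of the user-storage figure, the field sizes can be summed programmatically. This is a sketch only; the dictionary keys are illustrative names for the fields listed above, not a real schema.

```python
# Per-user record size, using the field widths listed above.
USER_FIELD_BYTES = {
    "user_id": 8,
    "username": 32,
    "email": 64,
    "password": 16,
    "language": 2,
    "avatar_url": 200,
}
TOTAL_USERS = 200_000_000

record_bytes = sum(USER_FIELD_BYTES.values())
total_user_bytes = record_bytes * TOTAL_USERS
user_gb = total_user_bytes / 1024**3

print(f"{record_bytes} bytes per user, {user_gb:.2f} GB in total")
```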

Elasticsearch and Redis capacity estimation

For this classified platform, where user interest predominantly lies in the latest ads, a cost-effective approach involves Elasticsearch indexing only the content from the most recent three months. Concurrently, Redis will cache the top 20% of the most accessed content from the same period, employing a Least Recently Used (LRU) strategy for efficient data management. Additionally, Redis caches the top 10 pages of each category, as users often browse these top pages in their areas of interest. While caching other popular local pages is a consideration, it is currently beyond the scope of this discussion.

Daily Text Storage (Bytes)=Daily Posts×Words per Post×Bytes per Word
Given: Daily Posts = 10,000, Words per Post = 50, Bytes per Word = 5
So, Daily Text Storage=10,000×50×5=2,500,000 Bytes
Three-Month Storage=2,500,000×30×3=225,000,000 Bytes

Elasticsearch Storage Requirement (3x Original Data Size): ~0.63 GB

Elasticsearch might require up to 3 times the original data size for indexing.
Max Index Storage (GB) = Three-Month Storage×3
Max Index Storage (GB) = 225,000,000×3 / 1024^3 ≈ 0.63 GB

Redis Storage Requirement(assuming 108 categories):

Redis Cache Storage = Three-Month Storage × Percentage for Caching + (108 categories storage with 10 pages per category and 32 items per page)
Redis Cache Storage = 0.21GB × 0.20 + 0.00864GB = 0.05064GB
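
The Elasticsearch and Redis estimates can be sketched the same way. The script assumes, as the figures above imply, that each cached item is the 250-byte text of one post; the variable names are illustrative. Small differences from the numbers quoted above come from rounding and unit conversion.

```python
# Text-only storage for the three-month window that is indexed and cached.
DAILY_POSTS = 10_000
TEXT_BYTES_PER_POST = 50 * 5
RETENTION_DAYS = 30 * 3

three_month_bytes = DAILY_POSTS * TEXT_BYTES_PER_POST * RETENTION_DAYS

# Elasticsearch: assume up to 3x the original data size for the index.
es_index_bytes = three_month_bytes * 3
es_index_gb = es_index_bytes / 1024**3

# Redis: top 20% of the three-month content + 108 categories x 10 pages x 32 items.
hot_content_bytes = three_month_bytes * 20 // 100
category_page_bytes = 108 * 10 * 32 * TEXT_BYTES_PER_POST
redis_bytes = hot_content_bytes + category_page_bytes
redis_gb = redis_bytes / 1024**3

print(f"Elasticsearch: {es_index_gb:.2f} GB, Redis: {redis_gb:.3f} GB")
```

Both numbers are tiny compared with the photo storage, which is why indexing and caching only recent text is cheap.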

High-Level Design

Search Requests:

  • Incoming search requests are directed through the Node.js Gateway.
  • The Gateway forwards requests to Elasticsearch for search operations.
  • Elasticsearch processes the request and sends back results, optimizing search functionality.

Category and Product Page Retrieval:

  • When category page requests arrive:
    • The Node.js Gateway first checks the Redis cluster for cached results, utilizing a Redis sorted set to fulfill this requirement.
    • Cached results are immediately served if available.
    • In the absence of a cache, Node.js Gateway sends the request to the Application Service Cluster.
    • Results from the Application Service Cluster are stored in Redis for future use.
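
The category-page flow above is a cache-aside pattern. The sketch below illustrates it with an in-memory stand-in for a Redis sorted set so that it runs without a Redis server; `InMemorySortedSetCache`, `get_category_page`, and the loader callback are hypothetical names, not part of the original system.

```python
from typing import Callable, Dict, List, Optional, Tuple


class InMemorySortedSetCache:
    """Stand-in for one Redis sorted set per key (members ranked by score, highest first)."""

    def __init__(self) -> None:
        self._sets: Dict[str, Dict[str, float]] = {}

    def add(self, key: str, members: Dict[str, float]) -> None:
        self._sets.setdefault(key, {}).update(members)

    def top(self, key: str, count: int) -> Optional[List[str]]:
        if key not in self._sets:
            return None  # cache miss
        ranked = sorted(self._sets[key].items(), key=lambda kv: kv[1], reverse=True)
        return [member for member, _ in ranked[:count]]


def get_category_page(
    cache: InMemorySortedSetCache,
    category: str,
    page_size: int,
    load_from_app_service: Callable[[str], Dict[str, float]],
) -> Tuple[List[str], str]:
    """Return (post ids, source), where source is 'cache' or 'app-service'."""
    key = f"category:{category}"
    cached = cache.top(key, page_size)
    if cached is not None:
        return cached, "cache"
    # Cache miss: ask the application service, then store the result for later requests.
    posts = load_from_app_service(category)  # {post_id: publish_timestamp}
    cache.add(key, posts)
    return cache.top(key, page_size) or [], "app-service"
```

Using the publish timestamp as the score keeps the newest posts at the top of each category page, which fits a platform where users mostly care about recent ads.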

React App is hosted on an Nginx Cluster

HDFS for Static Files Access:

  • Static files, such as images and videos, are accessed through the HDFS for Static Files, which is managed by the Node.js Gateway.
  • Centralized static file services enhance manageability and scalability.

Advertisement Updates:

  • For add/update/delete requests regarding advertisements:
    • The Application Service Cluster processes requests and sends relevant data to a message queue (MQ).
    • A dedicated “Advertisement Update Service” monitors the MQ.
    • This service handles advertisement changes, ensuring synchronization with Elasticsearch and Redis Cluster.
    • Periodic or on-demand operation guarantees data consistency and updates.
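
The update path above can be sketched with Python's standard `queue` module standing in for the message queue and plain dictionaries standing in for Elasticsearch and Redis. All names here (`AdEvent`, `publish`, `drain_and_sync`) are hypothetical and only illustrate the flow.

```python
import queue
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AdEvent:
    action: str                 # "upsert" or "delete"
    ad_id: str
    title: Optional[str] = None


def publish(mq: "queue.Queue[AdEvent]", event: AdEvent) -> None:
    """Called by the application service after it has handled a write request."""
    mq.put(event)


def drain_and_sync(
    mq: "queue.Queue[AdEvent]",
    search_index: Dict[str, str],
    cache: Dict[str, str],
) -> int:
    """Apply all queued events to the search index and invalidate cached copies."""
    processed = 0
    while True:
        try:
            event = mq.get_nowait()
        except queue.Empty:
            return processed
        if event.action == "delete":
            search_index.pop(event.ad_id, None)
        else:
            search_index[event.ad_id] = event.title or ""
        cache.pop(event.ad_id, None)  # the next read repopulates the cache
        processed += 1
```

Invalidating the cache entry instead of rewriting it is one possible design choice; it keeps the update service simple at the cost of one extra cache miss per changed ad.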

Sharding

To evenly distribute Product Post storage and traffic across different shards, we’ll employ Consistent Hashing based on Product IDs. We’ve addressed high-frequency access to category pages and filtered searches with Redis and Elasticsearch.

For range queries, we query all MySQL shards, and then a Consistent Hash Load Balancer aggregates and filters Top N Items from all shards before returning the Top 10 Items to App Services or relevant calling services.

This design efficiently balances data and handles performance requirements, although it’s essential to monitor query complexity as data scales for optimal system performance.
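
To make the sharding approach concrete, here is a minimal sketch of a consistent hash ring keyed by Product ID, plus a scatter-gather merge for Top N queries. The class and function names, the use of SHA-256, and the 100 virtual nodes per shard are assumptions for illustration, not details from the original design.

```python
import bisect
import hashlib
import heapq
from typing import Dict, List, Tuple


class ConsistentHashRing:
    """Maps product IDs to shards; each shard owns several points on the ring."""

    def __init__(self, shards: List[str], virtual_nodes: int = 100) -> None:
        self._ring: List[Tuple[int, str]] = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(virtual_nodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, product_id: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, self._hash(product_id)) % len(self._ring)
        return self._ring[idx][1]


def top_n_across_shards(
    per_shard_rows: Dict[str, List[Tuple[int, str]]], n: int
) -> List[Tuple[int, str]]:
    """Merge each shard's (score, product_id) rows and keep the n highest scores."""
    all_rows = (row for rows in per_shard_rows.values() for row in rows)
    return heapq.nlargest(n, all_rows)
```

When a shard is added, only the keys that now fall on the new shard's ring points move; all other keys stay where they were, which is the main reason to prefer consistent hashing over a plain modulo scheme.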

Challenges

As evident from the previous discussions, meeting system requirements involves considering various scenarios from the outset. This entails procuring an adequate number of servers to accommodate business and user growth needs. Common challenges we’ll encounter include:

  1. Server Procurement and Deployment: How to dynamically scale server resources based on demand? There may be idle servers during off-peak hours, leading to resource wastage, yet they’re essential during peak periods.
  2. Daily Operations and Maintenance: Responsibilities encompass Gateway maintenance, cluster scaling strategies, regular backups, security patch management, monitoring, and performance adjustments.
  3. Security and Compliance: Ensuring regular credential updates to meet compliance requirements while managing data encryption, identity verification, access control (possibly through customized Redis API rate limiters), DDoS protection, IP restrictions, and more.
  4. Disaster Recovery

Although practices like automated operations and continuous integration/continuous delivery (CI/CD) can mitigate some of these challenges to a certain extent, it’s crucial to acknowledge the associated human and resource costs.

Questions

The challenges outlined earlier raise a fundamental question: How can we reduce costs, particularly operational expenses, while simultaneously enhancing system performance and availability, ensuring data security and compliance, and meeting the demands of business growth and user expansion?

Solutions

Now, let’s delve into how AWS provides solutions to tackle these challenges.

Design Details

The AWS cloud architecture has enabled us to tackle server procurement and deployment, operational costs, security, and disaster recovery more efficiently and effectively.

Route 53

By configuring Route 53, we meet our disaster recovery (DR) requirements, implementing a multi-location DR strategy. Additionally, Route 53 helps us deliver lower-latency services by intelligently routing traffic to geographically closer locations, enhancing user experience and providing more reliable services.

AWS WAF

WAF, integrated with CloudFront, enhances the security of our services. It monitors traffic through configured rules and takes actions, such as IP blocking, upon detecting abnormal traffic patterns, which helps mitigate application-layer Distributed Denial of Service (DDoS) attacks. It can also be used for access restrictions, such as blocking specific IP addresses or limiting access from specific geographic regions.

Cost Optimization

Although capacity estimation offers more controlled cost management and handles scenarios like traffic surges, using AWS cloud greatly simplifies server procurement and deployment. Additionally, the elastic scaling feature of cloud computing ensures more efficient resource utilization.

Static and Dynamic Segregation

S3 will be used to host front-end web app files and other static files like images, effectively reducing operational costs for HDFS and Nginx clusters. Additionally, as a highly available service, S3 supports cross-region data redundancy through Cross Region Replication (CRR) to optimize latency and aid disaster recovery under critical conditions.

Optimizing Application Service Clusters

API Gateway serves as the main gateway between front-end and back-end services, offering numerous out-of-the-box features such as rate limiting, security controls, and integration with serverless services like Lambda, greatly reducing our development and operational costs. As a regional service, each API Gateway can achieve 99.9% availability. Deploying it across multiple regions can further decrease latency and improve availability.

App Services instances are deployed in an Auto Scaling group behind an ALB, with memory-based scaling rules that keep operations cost-effective. Using Reserved Instances for regular traffic and On-Demand Instances for peak traffic reduces operational costs, improves resource utilization, and minimizes the waste of idle server resources seen in traditional on-premises models.

Replacing existing services with AWS Application Load Balancer, Amazon Elasticsearch Service (since renamed Amazon OpenSearch Service), and Amazon ElastiCache for Redis significantly lowers operational costs while enhancing availability and security.

These services can be easily configured, scaled, and monitored by teams. Configuring deployments across multiple availability zones ensures failover capabilities. Elastic scaling further enhances efficient resource utilization.

AWS Aurora

AWS Aurora will replace MySQL, capitalizing on its compatibility with MySQL and performance advantages in high-load and large-scale data processing scenarios. AWS Aurora offers not just improved performance but also superior scalability.

Moreover, the use of AWS Aurora Global Database will further enhance system availability and reduce data access latency. It automatically replicates data across multiple regions, ensuring high data availability and durability. By providing data in regions closer to users, it effectively decreases access latency.

To manage loads more efficiently, we will incorporate Lambda services at the forefront of our architecture to implement consistent hashing. Lambda’s serverless and on-demand computing capabilities are ideal for handling irregular or burst workloads, thus further boosting the overall architecture’s flexibility and scalability.

Overall, combining AWS Aurora’s high-performance database solution with the flexibility of Lambda services, this architecture is designed to improve performance, enhance availability, and optimize user experience.

Migration

Considering the challenge of migrating hundreds of terabytes of data, AWS Snowball is an ideal choice. Snowball is designed for efficient and secure large-scale data transfers. Our plan involves initially using Snowball to migrate the bulk of the data, followed by utilizing AWS Database Migration Service (DMS) for synchronizing new data updates. Once all data synchronization is complete, we will carefully execute the database switchover to ensure a seamless business transition. While this discussion does not detail the migration of other services, it’s worth mentioning that tools provided by AWS, such as Service Discovery, play a crucial role in service orchestration and server performance assessment during the migration process.

DDD(Domain Driven Design)-4 Find the software architecture to apply Domain Driven Design

In this article, I am going to discuss what architecture can ideally integrate with DDD and how it works. To achieve this goal, I am going to walk through the features, pros, and cons of the MVC architectural pattern, 3-tier architecture, 4-tier architecture, 5-tier architecture, and hexagonal architecture. You might want to read my previous post to better understand the concept of the coming sections.

Three-tier architecture

It’s a simple layered architecture. Most projects can apply it. We can integrate it with the MVC Architectural Pattern.

Why do we need architecture layers?

Without layers, all logic ends up tightly coupled in a single package, which brings a heavy maintenance cost and poor scalability.

The layered architecture design aims to help us achieve a highly cohesive, loosely coupled, scalable, and reusable design. We can segregate responsibilities and limit the impact of changes to a single layer. However, it is a misunderstanding that having more layers brings more benefits to a system. Fewer layers mean that a workflow traverses less logic and incurs less overhead. We need to balance the trade-off to have a practical layer design.

There are two main types of layered architectures:

  1. Strictly layered architecture, which only allows interaction with the layer below.
    • e.g. an upper presentation layer is only allowed to interact with the business logic layer below it.
  2. Relaxed layered architecture, which allows a layer to interact with any layer below it.
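
The difference between the two types can be expressed as a small dependency rule. This is an illustrative sketch; the layer names and the `may_depend_on` helper are assumptions, and a real project would typically enforce such a rule with an architecture-testing tool rather than by hand.

```python
# Layers ordered from top to bottom.
LAYERS = ["presentation", "business_logic", "data_access"]


def may_depend_on(upper: str, lower: str, strict: bool) -> bool:
    """Return True if `upper` is allowed to call `lower` under the chosen layering style."""
    distance = LAYERS.index(lower) - LAYERS.index(upper)
    if strict:
        return distance == 1   # only the layer directly below
    return distance >= 1       # any layer below


# Strict layering forbids skipping the business logic layer; relaxed layering allows it.
print(may_depend_on("presentation", "data_access", strict=True))
print(may_depend_on("presentation", "data_access", strict=False))
```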

What should we do as architects?

As architects, we need to define standards for other engineers to follow. We need to document them and make sure all engineers involved are aware of them so that they won’t break the design during implementation. Documentation is essential here. If we only notify engineers verbally, people forget over time and take the standards less seriously.

Four-tier architecture

Layered architecture can help us fit the business better and highlight the benefits of the domain model concept when we implement domain-driven design. However, as we can see from the three-tier architecture, the database is its starting point: engineers generally start with database modeling. DDD’s focus is not on the database model but on the domain model. It takes domain models as the basis for analysis when designing the architecture hierarchy.

What is the DDD solution for this?

Eric Evans provided a traditional four-tier architecture as the DDD solution in his book, Domain-Driven Design: Tackling Complexity in the Heart of Software.

It is a Relaxed layered architecture.

We can have the following interaction:

  • User Interface –> Application, Domain, and Infrastructure
  • Application –> Domain and Infrastructure
  • Domain –> Infrastructure

Five-tier Architecture

Let’s take a look at the popular MVC Architectural Pattern.

MVC Architectural Pattern

Model–view–controller (MVC) is a software architectural pattern commonly used for developing user interfaces that divide the related program logic into three interconnected elements. 

Wikipedia

It’s highly recommended by the Spring ecosystem. However, it has certain inherent defects.

What are the problems of MVC Architectural Pattern?

  1. There’s implementation redundancy across various layers.
    • People tend to take shortcuts. The MVC pattern itself doesn’t stop developers from writing business logic in controllers, which brings redundancy across the controller layer and other layers. It is a prevalent issue during development.
    • As more logic is put into controllers, they become complicated over time.
  2. Controllers are heavily dependent on models. In terms of cross-layer interaction, controllers are strongly dependent on services, while services are heavily dependent on DAOs. Models act as the intermediary during the interaction. This means that model prototypes need to be as simple as possible.
  3. Objects are coupled together. For example, when we send a query request to a controller, a user object is passed to an order object.
  4. It complies with neither object-oriented programming principles nor domain-driven design principles. It’s the Anemia Model. Role objects (referred to as domain objects in DDD) only contain get and set methods. MVC is inherently in conflict with DDD in design, as DDD emphasizes the domain concept and behavioral interaction between roles/entities.
    • MVC is more of a structural design pattern, while DDD is more of a behavioral design pattern.

What can we do to improve the MVC pattern?

The DCI(Data, Context, and Interaction) Architecture Pattern

Trygve Reenskaug, the inventor of MVC

James O. Coplien, author of Multi-Paradigm Design for C++

Trygve and James together invented and refined the formulation of DCI architecture. It’s complementary to the MVC pattern. It perfectly fits the domain-driven design.

The DCI architecture design in the above diagram has the physical separation of models and behaviors. Data objects are static objects which don’t possess any business logic and only contain the get, set, and some basic behaviors, such as sorting and pagination. It’s the Anemia Model. In addition, users’ behaviors and actions are extracted to those RoleObjs. They have a two-way binding relationship. As both a role object and a data object have their own boundaries, binding them together forms another boundary. As a result, we achieve high cohesion within the boundary.

The context layer is a simple and thin layer working as an interaction entrance. It is responsible for business scenario aggregation and carrying out interactions between contexts.
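
The separation of data, roles, and context can be sketched as follows. This is a simplified illustration rather than a full DCI implementation: the class names are hypothetical, and the roles reference their data objects in one direction only, whereas the description above mentions a two-way binding.

```python
from dataclasses import dataclass


@dataclass
class UserData:            # Data: state only, no business logic
    user_id: str
    balance: int


@dataclass
class ProductData:         # Data: state only, no business logic
    product_id: str
    price: int
    stock: int


class Buyer:               # Role: behavior a user takes on in the ordering scenario
    def __init__(self, data: UserData) -> None:
        self.data = data

    def pay(self, amount: int) -> None:
        if amount > self.data.balance:
            raise ValueError("insufficient balance")
        self.data.balance -= amount


class SellableItem:        # Role: behavior a product takes on in the ordering scenario
    def __init__(self, data: ProductData) -> None:
        self.data = data

    def reserve_one(self) -> None:
        if self.data.stock < 1:
            raise ValueError("out of stock")
        self.data.stock -= 1

    def release_one(self) -> None:
        self.data.stock += 1


class PlaceOrderContext:   # Context: thin entry point that binds roles and runs the interaction
    def __init__(self, user: UserData, product: ProductData) -> None:
        self.buyer = Buyer(user)
        self.item = SellableItem(product)

    def execute(self) -> None:
        self.item.reserve_one()
        try:
            self.buyer.pay(self.item.data.price)
        except ValueError:
            self.item.release_one()  # undo the reservation if payment fails
            raise
```

The data classes stay simple, while the scenario-specific behavior lives in the roles and is only assembled when the context runs.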

An e-commerce example of how DCI benefits from both the anemia model and the congestion model.

Let’s take a look at a product order workflow.

In an SOA or Microservices architecture, we generally have the order, product, and user modules for the above actions. Each of them contains related classes with some associated methods.

We normally have the following two approaches to handle this workflow:

  1. We aggregate behaviors, across several subdomains, in the current domain or specific service, e.g., a business middle platform. It’s an anemia model. Domain objects only contain some get, set, and basic methods. It’s simple and suitable for fast development.
  2. Domain models contain all associated objects’ behaviors to complete the business chain. It’s a congestion model. Domain models have many complicated behaviors, both innate and non-innate. They might also cover some behaviors that belong to other domain models. It’s hard to define the abstract layer. As the business chain might involve interactions between several modules, data relations could be complicated, and it would be hard for us to define repositories and behavioral boundaries. It requires a dev team with robust skills and experience in both the domain and the technology to manage it.

Both the anemia and congestion model have their pros and cons in scenarios like this. We can apply the DCI architecture pattern to put behaviors into those Role objects and data belonging to the same data boundaries into data objects.

Finally, we achieve the following benefits with the DCI pattern:

  1. Physical segregation and reference between Role objects and data objects to achieve high cohesion and loose coupling between data and behaviors;
  2. Resolve the issue of inconsistency between data and behaviors.

As we can see, the DCI architecture pattern is more flexible compared to the MVC pattern. It serves better for domain-driven design as it fits domain models better. We can practice domain-driven design by applying the DCI architecture design paradigm with Four-Color Modeling, an approach to analyzing requirements.

After we introduce the DCI architecture design pattern into 4-tier architecture, we have the following 5-tier architecture.

Five-tier Architecture — the derivative of the integration of the DCI architectural pattern and 4-tier architecture

It’s a Relaxed layered architecture. If business scenarios have complicated logic and interactions, we can also divide the domain layer into two adjacent layers, e.g., a transactional management layer and a context layer. The transactional management layer is responsible for the business logic chain execution while the context layer takes care of atomic action, such as a single synchronization message or roles’ interactions. As a result, we will have a six-tier architecture, which is also a variant form of five-tier architecture.

Six-tier Architecture

It is also a Relaxed layered architecture.

However, whether our architecture is strictly layered or relaxed, we might encounter a system breakdown if an infrastructure failure occurs. Also, in a strictly layered architecture, the upper layers break down if the layers below them malfunction.

How can we solve this problem?

We can apply the Dependency Inversion Principle to resolve this issue. The upper layers shouldn’t depend on the concrete implementation of the lower layers. Instead, they should rely on lower layers’ abstractions or interfaces. The concrete implementations should also depend on those abstractions.
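
A minimal sketch of the Dependency Inversion Principle, using `typing.Protocol` for the abstraction. The names `ProductRepository`, `ProductService`, and `InMemoryProductRepository` are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Protocol


class ProductRepository(Protocol):
    """Abstraction that the upper layer relies on."""

    def find_title(self, product_id: str) -> Optional[str]: ...


class ProductService:
    """Upper layer: knows only the abstraction, not any concrete storage."""

    def __init__(self, repository: ProductRepository) -> None:
        self._repository = repository

    def display_title(self, product_id: str) -> str:
        title = self._repository.find_title(product_id)
        return title if title is not None else "(listing unavailable)"


class InMemoryProductRepository:
    """Lower layer: one concrete implementation that satisfies the abstraction."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = rows

    def find_title(self, product_id: str) -> Optional[str]:
        return self._rows.get(product_id)


service = ProductService(InMemoryProductRepository({"p1": "Red bike"}))
print(service.display_title("p1"))
```

Because `ProductService` only sees the protocol, a database-backed repository or a fallback implementation can be swapped in without touching the upper layer.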

There’s a more elegant architectural style, i.e., Hexagonal Architecture.

Hexagonal Architecture

Alistair Cockburn, one of the initiators of the agile movement in software development, invented the Hexagonal Architecture.

We can also call it ports and adapters architecture. It is a strictly layered, highly cohesive, and flexible architecture.

Features:

  • Domain Model: Core business logic
  • Public APIs: The internal application exposes public APIs for outer layer usage.
  • Two types of Adapters:
    • Driving Adapters are responsible for clients’ inbound requests.
    • Driven Adapters are responsible for outbound interactions, e.g., interaction with a Redis cluster.
  • Each adapter is paired with specific usage. (Single Responsibility Principle)
  • Interactions are strictly through APIs but not code implementation, i.e., concrete application layer implementation must not occur in the domain model.
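
The features above can be sketched as one public API, one driven port with an adapter, and one driving adapter. Everything here is illustrative: the class names are hypothetical, and a dictionary-backed adapter stands in for a real Redis adapter so the example runs on its own.

```python
from typing import Dict, List, Optional, Protocol


class ListingCachePort(Protocol):
    """Driven (outbound) port: what the core needs from a cache."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class ListingApi:
    """Public API exposed by the application core to the outer layer."""

    def __init__(self, cache: ListingCachePort) -> None:
        self._cache = cache

    def describe(self, listing_id: str) -> str:
        cached = self._cache.get(listing_id)
        if cached is not None:
            return cached
        description = f"Listing {listing_id}"  # placeholder for real domain logic
        self._cache.put(listing_id, description)
        return description


class DictCacheAdapter:
    """Driven adapter: a Redis-backed adapter would implement the same port."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.store.get(key)

    def put(self, key: str, value: str) -> None:
        self.store[key] = value


class CliDrivingAdapter:
    """Driving adapter: translates an inbound request into a call on the public API."""

    def __init__(self, api: ListingApi) -> None:
        self._api = api

    def handle(self, argv: List[str]) -> str:
        return self._api.describe(argv[0])
```

The core never imports an adapter; adapters are passed in from outside, so each one can be replaced or tested in isolation.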

Conclusion

We’ve looked at features of various architectural styles.

Again, we use technology to empower business. Tech is supposed to be driven by business requirements, and so is architecture design. Having more tiers doesn’t mean an architecture serves our software system and development better. We should design software architecture based on business scenarios. We need to consider the trade-off between the clear responsibility of multi-tier architecture and its cost. We need to avoid over-design.

If you are interested in DDD, I suggest you also take a look at four-color modeling and event storming.

If I have time, I might write more articles about other architectural styles and how we practice DDD with implementation and project structure.

If you have any career or business opportunities to share with me, please contact me through LinkedIn.

DDD(Domain Driven Design)-3 Key elements of DDD

In the previous post, I discussed the Domain and Subdomain concepts of DDD and what we need to avoid when we define them.

To better elaborate on my next post regarding various software architecture styles and how we can integrate DDD with some of them, I will explain the critical concepts of DDD in this article.

Ubiquitous Language

Ubiquitous Language is the Domain-driven design term that describes common, rigorous language that developers and stakeholders share within specific boundaries.

It helps teams develop standardized terms and language so that all stakeholders have a consensus. It aims to assist us as product managers/owners, QAs, developers, and architects in accurately expressing business and understanding requirements.

I used to work for a telecom company, and my project used the term MVP. MVP generally stands for minimum viable product, but within our project scope, it referred to managed voice platform. It causes confusion if we don’t clearly define it. That’s why we need Ubiquitous Language.

Bounded Context

Bounded Context is the boundary within a domain where a particular domain model and ubiquitous language apply. It is the central pattern of DDD that helps us avoid ambiguity and uncertainty in our software development.

Domain Model

Domain model is a structured visual representation of interconnected concepts or real-world objects that incorporates vocabulary, key concepts, behaviours, and relationships of all its entities. 

In a business logic implementation, a domain model is presented as a business object model, which describes the interconnected relationship between our business objects.

Business Objects(BO)

A BO model generally has 3 main components: business roles, business entities, and business use-case.

Business Roles: The roles played in our projects/systems, possessing inherent properties and a series of responsibilities, e.g., E-commerce customer service staff responsible for answering sales inquiries, order management, refunds, product returns, etc.

Business Entities: The components that are necessary for business roles’ interaction, e.g., product and invoice in an E-commerce project.

Business use-case: The workflow that business roles and business entities work together to execute, e.g., the action chain of product searching, product browsing, ordering, payment, deduction from inventory, invoicing, and delivery.

These 3 components together form a business object model.

4 categories of domain models

Blood Loss Model

  1. Classes with only get and set without other logic, not even a simple sorting method.
  2. Pros: It has a simple structure.
  3. Cons:
    • It is hard to maintain with inflated business logic
    • It is hard to satisfy frequently changing requirements as it doesn’t have the DAO layer. All logic, including implementation code associated with JDBC invocation, resides in the service layer.

Anemia Model

  1. Classes add atomic domain logic, such as sorting and pagination methods, on top of the Blood Loss model.
  2. Classes have attributes and innate behaviors/methods (i.e., those that belong to the domain model and have no persistence needs).
  3. Other logic or non-innate behaviors that require database interaction reside in the business logic layer.
  4. It is the recommended model by the Spring ecosystem.
  5. Pros:
    • It has a clear layer structure.
    • It has single-direction dependency across layers.
    • It is easy to understand and brings fast development benefits for applications with simple business logic.
  6. Cons:
    • It can’t gracefully cope with highly complicated logic and business scenarios.
    • It has low cohesion in projects’ core domains.
    • It doesn’t comply with some object-oriented programming principles.

Congestion Model

  1. It is more in line with object-oriented programming principles than the Anemia model.
  2. Domain objects contain both inherent and non-inherent logic. The business logic layer is only responsible for consolidation operations and transaction encapsulation, while domain objects take care of other operations.
  3. Pros:
    • It has a thinner business logic layer, which is more in line with the single responsibility principle.
    • The business logic layer doesn’t interact with the Dao layer.
  4. Cons:
    • It requires engineers to have robust implementation and design skills because it’s difficult to distinguish whether we should put specific business logic in the business logic layer or domain objects in particular scenarios.
      • We will need to write business logic to domain objects in some scenarios. Unfortunately, it could lead to cascading errors if we create some bugs during the process.
    • A higher cost to hire better engineers compared to using the Anemia model is one of the main reasons the Anemia model is more popular.
    • It has a higher instantiation cost, as many operations reside in the domain model.
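
The contrast between the Anemia model and the Congestion model can be shown with the same cancellation rule written both ways. This is a hedged sketch with hypothetical class names; persistence and transactions are left out.

```python
from dataclasses import dataclass


@dataclass
class AnemicOrder:
    """Anemia model: attributes and trivial methods only."""
    quantity: int
    unit_price: int
    status: str = "NEW"


class OrderService:
    """In the Anemia model, the business rule lives in the service layer."""

    def cancel(self, order: AnemicOrder) -> None:
        if order.status == "SHIPPED":
            raise ValueError("shipped orders cannot be cancelled")
        order.status = "CANCELLED"


@dataclass
class RichOrder:
    """Congestion model: the domain object enforces its own rules."""
    quantity: int
    unit_price: int
    status: str = "NEW"

    def total(self) -> int:
        return self.quantity * self.unit_price

    def cancel(self) -> None:
        if self.status == "SHIPPED":
            raise ValueError("shipped orders cannot be cancelled")
        self.status = "CANCELLED"
```

With the rich version, any caller that holds an order gets the rule for free; with the anemic version, every caller has to remember to go through `OrderService`.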

Bloating Model

  1. It has a simple layer structure.
  2. There’s no service layer in the bloating model, only domain objects and Daos.
  3. Domain objects and their methods take care of transactional operations, authorization, etc.
  4. Pros:
    • It has a simple architecture layer.
    • It complies with object-oriented programming principles.
  5. Cons:
    • As domain objects contain business logic and many other responsibilities, the code maintainability is terrible.
    • As domain objects’ domain logic takes care of transaction encapsulation and authorization, it contains some logic that does not necessarily belong to the domain objects causing greater code complexity compared to other models.
    • Engineers might need to refactor and redesign the domain model over time as increased requirements changes would cause logical chaos and unstable domain models.

Decision-Making Overview

The Anemia model is the most popular model, and the one the Spring ecosystem recommends. In contrast, the Congestion model places higher demands on dev teams, offers better system cohesion, and is suitable for businesses with complex business logic, such as businesses in the insurance industry.

Preview of next post

My next blog post will discuss the features, pros, and cons of the MVC architectural pattern, 3-tier architecture, 4-tier architecture, 5-tier architecture, hexagonal architecture, and more. After the discussion, I will elaborate on the architectural pattern solution that can gracefully integrate with the DDD approach.

See you in my next post.

DDD(Domain Driven Design)-2 Apply DDD to design microservice architectures that genuinely satisfy our business evolution needs

This is a subsequent post of

How does DDD help us define business domain and model boundaries?

There are mainly two phases to achieve this. Here is a brief summary:

Strategy Design Phase:

  1. Architects, product managers/owners, and operation staff are responsible for this phase.
  2. They start from the business perspective and work around the business boundaries to define the domain models, domain boundaries, and bounded context (i.e., the business boundary of a microservice).

Tactical Design Phase:

  1. Business logic developers are responsible for this phase.
  2. They start from the technical perspective and implement the outputs of the strategy design phase, such as domain models.

Strategy Design

Domain — the scope of critical problems our system’s core business wants to solve

In DDD, we plan, execute, and resolve problems within the scope.

We can take a domain as a problem domain. Therefore, once we define the domain for the system, we already have the problem scope boundaries for our system’s core business.

It sets the boundaries of what problems we solve, what we will do, to what extent, etc.

SubDomains

Because a domain wraps many critical problems together, it is big, and we need to divide and conquer these problems to make them less complicated and more manageable. Subdomains are how we achieve that.

There are three types of subdomains:

  1. Core subdomain: the must-have components of our system
  2. Generic subdomain: the shared components of our system, similar to the common.jar of our Java project
  3. Supporting subdomain: the nice-to-have components of our system, e.g. the coupon service of a traditional e-commerce website.

Core subdomain

They are the most critical and most competitive subdomains of the system. They are the components that our business can’t live without.

To define core subdomains, we must understand the core value of our business. This leads to the domain vision statement. Take Alibaba as an example: its mission is to make it easy to do business anywhere, and its core value is trade empowerment. Components associated with it, such as the merchant system and transaction system, are the core subdomains.

Generic subdomain

They are the subdomains that other subdomains within the domain would like to use. The IAM(Identity and Access Management) module is one of the typical examples of it.

Supporting subdomain

They are neither core nor common features but essential in specific business scenarios. For example, coupon and last-minute trade modules are essential to particular workflows of a traditional e-commerce system, but the system can still work well without them.

Why is it a bad idea to copy domain models from similar businesses?

Let’s compare Alibaba Taobao and JD.com.

Both of them are e-commerce giants in China. However, Taobao is the premier C2C online marketplace in China while JD.com has a B2C business model. Their products have many common features.

However, copying domain models from each other would be fundamentally wrong.

Taobao’s focus is trade empowerment, while JD.com focuses on quality and brand reputation.

Alibaba’s mission statement is to make it easy to do business anywhere. It invests the majority of its resources in facilitating trading between buyers and sellers. It’s the core value for the business to survive and thrive. The Double 11 festival and Taobao Live are typical expressions of that core value. Ordering and merchant systems are its core subdomains.

On the other hand, JD.com dedicates most of its resources to ensuring product quality and protecting and promoting branding. JD’s delivery is super fast. It provides same-day delivery in some cities and next-day delivery in others. Not only is the delivery service fast, providing a better user experience, but it also ensures the product’s quality until the last moment before handing it to customers. Therefore, supply chain, procurement, warehousing, and delivery are all critical domains.

Because they have different business models, their focuses differ, which leads to divergent software design and domain model design. Hence, copying models from other similar businesses wouldn’t be wise. It is also the main reason architects need in-depth domain knowledge to design an ideal architecture.

Conclusion

We have looked at the Strategy Design and things to avoid when we define our domain models. In the following blog articles, I will explain more DDD concepts, such as domain models, various implementations, the pros and cons of different architecture models, and how we apply DDD to our architecture design.


DDD(Domain Driven Design)-0 architecture design with DDD

Technology empowers business. Most Internet companies are Business-Outcome-Driven Enterprises, where business determines the technology adoption, architecture adoption, and service boundaries. Most projects on the market cannot reflect the professionalism of software architecture design. We need a practical methodology to solve this problem, and DDD is an ideal formula.

DDD is a software architecture design methodology that helps engineers/architects understand the business better and design tech solutions that satisfy changing business requirements. I’m about to write a series of blog articles to elaborate on this methodology. These articles cover architectural decision dilemmas, common misunderstandings about DDD, how we address these problems, basic DDD concepts, the pros and cons of various architectures in detail, architecture design that complies with DDD practice, etc.

Directory

DDD(Domain Driven Design)-1 Apply DDD to design microservice architectures that genuinely satisfy our business evolution needs

Microservice boundary definition dilemma

Defining the boundary of services is inevitable regarding microservice architecture design. How micro does a service have to be to be considered a microservice? What are the ideal boundaries of services that can meet changing business requirements? What are the differences between SOA and microservices?

Many engineers claim that they work on microservices but apply the SOA approach.

Service-oriented Architecture(SOA) VS Microservice

SOA focuses on service reusability and resolves the information silos issue.

Microservices emphasize finer service granularity, service decoupling, and scalability.

From the above definitions, it is evident that SOA partitions services more coarsely than microservices do.

Oversized services inevitably introduce certain redundancy and relatively poorer resource isolation, and lose some of the benefits of microservices, while over-decomposed services bring service governance and operational difficulties. For the latter case, imagine the nightmare of managing thousands of microservices.

Microservices structure at Amazon

Therefore, we need an approach to define boundaries, models, and so on.

What is the problem with starting with data modeling?

A physical data model is a database-specific model that represents relational data objects. It is coupled to requirements: requirement changes cause data model changes, and the internet industry iterates rapidly. We can expect physical data models to stop fitting the business workflow quickly if they are defined without enough thought. We need something more conceptual, like context boundaries, architecture design, specifications, etc.

So, what approach can help us to design microservice architecture?

Define the domain model, divide the domain boundaries, and then define the microservices according to the outcome. As a result, the business and technical boundaries are reasonable, achieving high cohesion and loose coupling, which is the ideal state of microservices.

DDD is the formula to achieve the above outcome.

What is Domain Driven Design(DDD)?

Domain Driven Design(DDD) is an effective software design approach focusing on modeling software based on business-expert-defined business domains.

While it brings great benefits to our software design, it is difficult to implement.

Before jumping into how to do it, the following sections will elaborate on what is required to apply the approach, why it is hard to implement it, and common mistakes we need to avoid.

What is required to apply DDD?

The crucial components in DDD are business models. Defining ideal business models requires an in-depth understanding of our business, which makes DDD difficult to implement.

Why is DDD difficult to implement?

  1. It requires architects to have rich experience in business domains and related techs.
    • Architects from other industries might not be able to define proper domain models.
  2. It requires each role in the team to have a considerable degree of professionalism to implement it.
    • Tech experts must be able to assist architects in completing the logical design phase; otherwise, the project will stall.
    • To implement it, all parties, including technical managers, technical experts, and developers from various levels, must be able to understand well-defined diagrams, such as conceptual diagrams.

If the project is not large enough and the company does not have a robust business model, practicing such a design would lengthen the entire development cycle. This is because completing the holistic design following the approach consumes an enormous amount of time. The early conceptual design and the logical design phases are time-consuming. Many companies, especially startups, do not have the resources (e.g. time, experts, etc.) to follow a standardized approach like this. Even worse, some startups’ core business domains change frequently over time, and it will cost more resources to make adjustments.

However, it is still valuable to integrate multiple software design methods and DDD to formulate a design approach suitable for the scenarios of your projects and to construct solutions for complex business requirements. I am going to elaborate on how we do it later. Let’s start with some basic concepts.

Three phases to practice DDD

  1. Conceptual design (scope definition)
    • In stages 1 to 3 of the above diagram, architects, product managers/owners, and operations staff work together to produce concept diagrams, concept classes, domain class diagrams, etc.
  2. Logical Design
    • Senior devs(e.g., tech lead, tech experts, senior developers, core developers, etc.) work with architects to do modeling according to the first step’s outputs. These outputs are business-modeling-related documentation and diagrams, such as use case diagrams, use case specifications, use case scenarios, use case views, business object models, etc.
  3. Physical Design
    • Developers work together to implement the requirements. They fix some emerging issues that they haven’t considered in previous steps. For example, some performance issues might require them to adjust models.
    • Developers and architects decide how to store data(e.g., indexing, read-write segregation, partitions, etc.).
    • Developers and architects define the software architecture style (e.g., microservice, RESTful architecture, etc.)
      • We will not know whether our models are suitable for microservices until this point.

Conclusion

We’ve looked at why we need DDD and its three major phases in this blog article. In my next post, I will explain how DDD helps us to determine business domain and model boundaries, the basic concepts of strategic design, and why copying models found online or from similar businesses is not a good idea.

Domain Driven Design(DDD) and Microservice decomposition

Mindset

There’s no silver bullet for software engineering.

My approach to tackling problems is to keep what is good for the business from various methodologies and to avoid or troubleshoot the bad.

Why Microservices? Why decompose our monolithic system into microservices?

Technology is to enable business.

Here are the key factors and benefits that microservices bring to businesses:

  1. Ultimate goal: high cohesion and loose coupling, which are the foundation for the following benefits.
  2. Agile software development and testing with clear team responsibilities and isolated dependencies, which shortens the time to market and gives the business a first-mover advantage.
  3. Locality benefits (principle of locality) and better resource (CPU, memory, etc.) usage, because a running application doesn’t have to load unnecessary resources, such as caches, or execute irrelevant scheduled tasks.
  4. Dynamic scaling with cloud and containerisation technologies to handle issues like heavy traffic in high-concurrency scenarios.
  5. Better communication with ubiquitous language.

What are the potential problems regarding microservices?

  1. Performance issues: RPC (remote procedure call) adds extra latency to processing a request.
  2. Microservice governance is difficult: data consistency, link tracing, and monitoring are tricky.
  3. Greater development complexity compared to monolithic systems. Engineers will face distributed systems design challenges such as distributed transactions, distributed IDs, distributed locks, data consistency, etc.

How to optimise the advantages and minimise the disadvantages?

  1. Unless the business requirements call for advantages like those mentioned above, don’t decompose into microservices;
  2. Have clear bounded contexts;
  3. Benchmark performance
    1. Limit the number of microservices involved in processing a request to 3 to 5;
    2. When there is a performance issue, figure out which service is the bottleneck of the call chain, then apply caching or allow a certain level of data redundancy to reduce latency or shorten the chain.
    3. If point 2 cannot solve the performance issue, consider merging certain microservices.
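The benchmarking steps above can be sketched as a small check over one request's call chain. This is only an illustration: the function name, the service names, and the latency numbers are made up, and in practice the per-hop latencies would come from your tracing system.

```python
from typing import Dict, List, Tuple

MAX_HOPS = 5  # upper end of the "3 to 5 microservices per request" guideline


def analyse_call_chain(latencies_ms: List[Tuple[str, float]]) -> Dict[str, object]:
    """Summarise one request's call chain: hop count, total latency, bottleneck."""
    if not latencies_ms:
        raise ValueError("call chain is empty")
    # The bottleneck is the hop that contributes the most latency.
    bottleneck, worst = max(latencies_ms, key=lambda hop: hop[1])
    total = sum(ms for _, ms in latencies_ms)
    return {
        "hops": len(latencies_ms),
        "too_long": len(latencies_ms) > MAX_HOPS,
        "total_ms": total,
        "bottleneck": bottleneck,
        "bottleneck_share": worst / total if total else 0.0,
    }


# Hypothetical chain: (service name, latency in milliseconds) per hop.
report = analyse_call_chain(
    [("gateway", 5.0), ("order", 20.0), ("inventory", 120.0), ("payment", 35.0)]
)
```

Here `report["bottleneck"]` points at the service to cache or denormalise first; if that does not help, the same report shows which neighbouring hops are candidates for merging.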

Sample approaches:

DDD issues

While DDD has many benefits for long-term software development, it has certain severe issues that hinder its application.

Here are some of the typical examples:

  • There is still no mature framework with rich community support for implementing this methodology.
  • Decomposition of modules is hard.
  • The aggregate root can be cumbersome. Within the domain, there can be duplicate methods. It is hard to create an abstract parent class, as some methods have no direct relation to certain child classes. We might end up needing experts with in-depth experience to extract those common methods into a common module.

Despite these problems, what can we take away from DDD?

The concept of Bounded Context

A bounded context is simply the boundary within a domain where a particular domain model applies.

25 Feb 2019 Domain analysis for microservices – Azure Architecture

In simple language, it is the boundary within which a service/business can be self-sufficient in achieving a specific purpose with its own rules, procedures, processes, etc.

We can apply the bounded context concept to decompose our monolithic system into different domains. A sample event storming approach is provided in a later section of this article.
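As a rough illustration of a bounded context, the sketch below models the same product in two contexts, each with its own rules. The context names, the classes, and the fee formula are all hypothetical.

```python
from dataclasses import dataclass


# --- Catalog context: a product is something to describe and search for ---
@dataclass(frozen=True)
class CatalogProduct:
    sku: str
    title: str
    description: str


# --- Shipping context: the same SKU is just a parcel to move ---
@dataclass(frozen=True)
class ShippingItem:
    sku: str
    weight_grams: int

    def shipping_fee_cents(self) -> int:
        # Rule owned by the shipping context only (the rates are made up).
        return 500 + 2 * (self.weight_grams // 100)


def to_shipping_item(product: CatalogProduct, weight_grams: int) -> ShippingItem:
    """Translate at the boundary: only the SKU crosses between contexts."""
    return ShippingItem(sku=product.sku, weight_grams=weight_grams)
```

Neither model needs the other's fields, so each context can evolve its own rules without touching the other.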

Types of Domain

Core domain: must-have things, the things in which all your strategic resources must be invested.

Supporting Subdomain: could-have things, less important things that can be outsourced.

Generic Subdomain: things that everybody wants. For example, an authentication service that most, if not all, services of your system will need.

Conventional strategies to divide into subdomains

  1. bounded by business responsibility (somewhat like SOA);
  2. bounded by feature, which leans towards microservices with finer granularity than the first one;
  3. bounded by organisational structure;
  4. the two-pizza rule

You can have a boundary as large as a department store. You can also make it as small as an exquisite boutique that only sells a specific type of goods. It really depends on what works well for your business.

Again, technology is to enable business.

The difference between hallucination and vision here is just two words, business needs.

Find the boundary with event storming

Identify domain events with 3 key factors:

  1. having business benefits
  2. in the past tense
  3. ordered by time
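The three factors can be sketched as a small data structure. The event names and business reasons below are hypothetical, and the past-tense rule is only a naming convention here, not something the code enforces.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class DomainEvent:
    name: str             # past tense by convention, e.g. "OrderPlaced"
    occurred_at: datetime
    business_value: str   # why the business cares; empty means "it does not"


def build_timeline(events: List[DomainEvent]) -> List[str]:
    """Drop events without business value and order the rest by time."""
    relevant = [e for e in events if e.business_value.strip()]
    return [e.name for e in sorted(relevant, key=lambda e: e.occurred_at)]


timeline = build_timeline([
    DomainEvent("OrderShipped", datetime(2024, 1, 3), "customer receives goods"),
    DomainEvent("OrderPlaced", datetime(2024, 1, 1), "revenue is booked"),
    DomainEvent("CacheWarmed", datetime(2024, 1, 2), ""),  # technical, no business value
    DomainEvent("PaymentReceived", datetime(2024, 1, 2), "cash flow"),
])
```

Reading the resulting timeline from left to right is roughly what the sticky notes on an event storming wall look like before boundaries are drawn around them.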

The Process

Strangler Vine Approach

A strangler vine grows around a tree to reach the sunlight above the forest canopy. Eventually, the host tree dies, leaving a tree-shaped vine.

Strangler Fig - Rain Forest Reports

Basically, use the bounded context to decompose a certain part of the monolithic system into a microservice. At the beginning, the extracted microservice can share a database with the original service, but eventually it will have its own database and make RPC calls to the original service for some API invocations.

You might need to write glue code to assist the communication between the new microservice and the old service.
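One common form of such glue code is a routing facade in front of both systems. The sketch below is a minimal in-process version, assuming requests are identified by a path string; the `StranglerFacade` class, the handlers, and the paths are hypothetical, and a real setup would usually do this in an API gateway or reverse proxy.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class StranglerFacade:
    """Routes each request path to a new microservice if that path has been
    migrated, otherwise falls through to the monolith."""

    def __init__(self, monolith: Handler) -> None:
        self._monolith = monolith
        self._migrated: Dict[str, Handler] = {}

    def migrate(self, path_prefix: str, service: Handler) -> None:
        """Register a path prefix as served by the extracted microservice."""
        self._migrated[path_prefix] = service

    def handle(self, path: str) -> str:
        # Longest prefix wins, so nested routes can be migrated separately.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self._migrated[prefix](path)
        return self._monolith(path)


facade = StranglerFacade(monolith=lambda path: "monolith:" + path)
facade.migrate("/inquiries", lambda path: "inquiry-service:" + path)
```

Each call to `migrate` moves one more branch of the "tree" to a new service, and callers never need to know which side answered.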

Where to start?

  1. Modules that are simple to extract.
  2. Modules that change frequently.
  3. Modules that have the most business benefits.
  4. Modules that are implemented with heterogeneous technologies.
  5. Modules that are high on your priority list.

What patterns can we apply?

Integration Patterns

  1. API Gateway
  2. Aggregator
  3. Proxy
  4. Chained
  5. Branch
  6. Client-side UI Composition Pattern

Cross-Cutting Patterns

  1. Service Discovery Pattern(Eureka, ZK)
  2. Circuit-Breaker(Hystrix)
  3. Client-side load balancing(Ribbon, Spring Cloud LB)
  4. External Configuration(Apollo, Spring Cloud Config with RabbitMQ)

Database Patterns

  1. Database per service
  2. Shared Database per Service
  3. Saga

Decomposition Patterns

We’ve talked about these in previous sections.

  1. Strangler Pattern
  2. Decomposition by business capability
  3. Decompose by subdomain(DDD)

Observability Patterns

  1. Log Aggregation
  2. Distributed Tracing(Apache SkyWalking, Sleuth + Zipkin)
  3. Health Check(Actuator)
  4. Metrics(Spring Boot Admin)

Distributed System Design

Distributed Transaction

  1. 2PC, 3PC, TCC
  2. Transaction Outbox Pattern(It’s local transaction table + MQ)
    1. https://masteranyfield.com/2021/07/26/dealing-distributed-transactions-with-2pc-3pc-local-transaction-table-with-mqs/ I discuss this pattern at the end of that post.
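As a minimal sketch of the "local transaction table + MQ" idea, the code below uses an in-memory SQLite database as a stand-in for the service's database and a plain callback as a stand-in for the MQ producer. The table names, the `OrderPlaced` payload, and the function names are assumptions made for illustration.

```python
import json
import sqlite3
from typing import Callable


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)"
    )


def place_order(conn: sqlite3.Connection, order_id: int, item: str) -> None:
    """Write the business row and the outgoing message in ONE local transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"event": "OrderPlaced", "order_id": order_id}),),
        )


def relay_outbox(conn: sqlite3.Connection, publish: Callable[[str], None]) -> int:
    """Publish unsent messages; mark each as sent only after publish returns."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # may raise; the row then stays unsent and is retried later
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

Because a message is marked as sent only after it has been published, a crash between the two steps leads to a re-send on the next relay run, so consumers should be idempotent.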

Distributed Lock

  1. Redis + MySQL Optimistic Lock
  2. RedLock
  3. ZK
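The optimistic-lock half of the first option can be sketched with a version column. This is only an illustration: SQLite stands in for MySQL, the `inventory` table and its columns are hypothetical, and the Redis part of the combination is not shown.

```python
import sqlite3
from typing import Optional, Tuple


def read_stock(conn: sqlite3.Connection, product_id: int) -> Optional[Tuple[int, int]]:
    """Return (stock, version) for the product, or None if it does not exist."""
    return conn.execute(
        "SELECT stock, version FROM inventory WHERE id = ?", (product_id,)
    ).fetchone()


def try_write_stock(
    conn: sqlite3.Connection, product_id: int, new_stock: int, expected_version: int
) -> bool:
    """Compare-and-set: the update only applies if nobody changed the row
    since we read it, which is detected through the version column."""
    with conn:
        cursor = conn.execute(
            "UPDATE inventory SET stock = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_stock, product_id, expected_version),
        )
    return cursor.rowcount == 1
```

A caller that gets `False` back has lost the race and should re-read the row and retry, or give up.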

Distributed ID

  1. UUID (not storage-friendly, unsuitable for many scenarios, security issues; not recommended)
  2. MySQL auto-increment ID (poor approach: low performance, single point of failure; not recommended)
  3. Redis (not reliable enough: AOF reduces performance, and asynchronous master-slave replication can result in data loss)
  4. ZK + Redis (recommended: ZK for consistency, decent performance with data consistency)
  5. ZK with SnowFlake (recommended)
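For the last option, here is a minimal sketch of a Snowflake-style generator. It assumes the commonly used layout of a 41-bit millisecond timestamp, a 10-bit worker ID, and a 12-bit sequence; the custom epoch and the class name are arbitrary choices, and the `worker_id` is assumed to be handed out externally (for example through ZK), which is not shown here.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_577_836_800_000  # 2020-01-01T00:00:00Z, an arbitrary custom epoch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeGenerator:
    def __init__(self, worker_id: int, clock: Callable[[], int] = _now_ms) -> None:
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

IDs from one generator are strictly increasing, and two generators never collide as long as their worker IDs differ, which is why a coordinator has to hand out the worker IDs.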

I will go through how these technologies work and their implementation details when I have time in the future.

Keep your services healthy with circuit-breaker

The circuit-breaker is not that much different from Stop Loss in stock investment.

When your Titanic has leaking holes, the entire ship will sink if you don’t have another protection layer to isolate those leaking holes.

The essence of a circuit-breaker is to isolate unavailable dependent services to avoid cascading service unavailability.

Typical issues caused by bad dependent services

  1. Resource exhaustion because the dependent service responds late or does not respond for a long period.
    • Every request holds certain resources on a server. Its thread stays blocked until the request times out. In high-concurrency scenarios, this can exhaust the server’s resources.
    • If the microservice applications run in containers, this can lead to the container being killed because it runs out of memory.
  2. MQ unavailability. Suppose two services interact with each other through ActiveMQ, and Service A keeps sending messages to ActiveMQ while Service B is unavailable. ActiveMQ can then run out of storage, for example because too many messages pile up in the DLQ, and finally the MQ becomes unavailable.

The Circuit-Breaker Design
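As a starting point, here is a minimal sketch of the usual closed / open / half-open state machine. It is not how Hystrix or any other framework implements it; the class name, the thresholds, and the injected clock are assumptions made for illustration, and the sketch is not thread-safe.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency that is currently isolated."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._failure_threshold = failure_threshold
        self._reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout_s:
                # Open: fail fast without touching the dependency.
                raise CircuitOpenError("dependency isolated; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers get a `CircuitOpenError` immediately instead of holding a thread until a timeout, which addresses the resource exhaustion issue listed above.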

Other implementations, frameworks, designs, and approaches will be provided when I have time.