Architectural Models

The architecture of a system refers to its separate components and the relationships between them.

Major concerns: make the system
- reliable
- manageable
- adaptable
- cost-effective

Understanding the trade-offs inherent to choices similar to those identified in this section is arguably the key skill in distributed systems design.

Architectural Elements

Key questions:

What are the entities that are communicating in the distributed system?
How do they communicate? More specifically, what communication paradigm is used?
What roles and responsibilities do they have in the overall architecture?
- Will they change?
How are they mapped onto the physical distributed infrastructure?
- In other words, what is their placement?

The first two questions above are essential to an understanding of distributed systems. From a system perspective, the entities that communicate in a distributed system are typically processes. This leads to the view of a distributed system as processes coupled with interprocess communication paradigms.

From a systems level, this is fine! From a programming perspective, other abstractions have been proposed.

objects
- objects represent natural units of decomposition for some given problem domain
- accessed via interfaces
- interface definition language (IDL) provides a specification of methods on an object
- a number of problems!
components
- components specify not only interfaces, but also the assumptions they make in terms of other components.
  - all dependencies are explicit
  - contract is more complete
  - this approach promotes higher compositionality
web services
- closely related to objects and components
  - approach based on encapsulation and access through interfaces
  - represent and discover services through web standards
- expanded on in chapter 9
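
The difference between an object's interface and a component's explicit dependencies can be sketched in a few lines of Python. This is only an illustration: the names `Inventory`, `InMemoryInventory`, and `OrderComponent` are hypothetical, and a plain abstract base class stands in for a real IDL.

```python
from abc import ABC, abstractmethod


class Inventory(ABC):
    """Interface: the only view a caller has of an inventory object."""

    @abstractmethod
    def stock(self, item: str) -> int:
        """Return how many units of `item` are available."""


class InMemoryInventory(Inventory):
    """One implementation, hidden behind the interface."""

    def __init__(self, counts: dict[str, int]) -> None:
        self._counts = dict(counts)

    def stock(self, item: str) -> int:
        return self._counts.get(item, 0)


class OrderComponent:
    """Component: the interface it requires is named in the constructor,
    so its dependency on an Inventory is explicit rather than hidden."""

    def __init__(self, inventory: Inventory) -> None:
        self._inventory = inventory

    def can_fulfil(self, item: str, quantity: int) -> bool:
        return self._inventory.stock(item) >= quantity


orders = OrderComponent(InMemoryInventory({"widget": 3}))
print(orders.can_fulfil("widget", 2))
```

Because `OrderComponent` states what it needs, any other `Inventory` implementation (for instance, one that forwards calls to a remote host) could be substituted without changing it.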

Communication Paradigms

Three types:

Interprocess communication refers to the low-level support for communication between processes in distributed systems, including
- message-passing primitives
- direct access to the API offered by Internet protocols (socket programming)
- support for multicast communication
- expanded on in chapter 4
Remote invocation covers a range of techniques based on a two-way exchange between entities in a distributed system
- Remote procedure calls allow procedures in processes on remote computers to be called as if they were procedures in the local address space.
  - access and location transparency
- Remote method invocation strongly resembles remote procedure calls, but for distributed objects.
  - calling object can invoke a method in a remote object
  - underlying details are hidden from the user
Request-reply protocols are a pattern imposed on an underlying message-passing service to support client-server computing.
- relatively primitive; used directly mainly in embedded systems
- this is also the approach used by HTTP
- most distributed systems use remote procedure calls or remote method invocation instead
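
To show how these paradigms build on one another, here is a minimal sketch of a request-reply exchange over a socket, with a client-side stub that makes the remote procedure look like a local call. The message format (one JSON object per line) and the names `PROCEDURES`, `serve_one`, and `RemoteStub` are assumptions made for illustration; real RPC systems also handle failures, timeouts, and marshalling of richer types.

```python
import json
import socket
import threading


def add(a, b):
    return a + b


# Procedures the server is willing to run on a caller's behalf.
PROCEDURES = {"add": add}


def recv_line(sock):
    """Read bytes until a newline, which delimits one message in this sketch."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        data += chunk
    return data


def serve_one(conn):
    """Server side of request-reply: one request in, one reply out."""
    with conn:
        request = json.loads(recv_line(conn))
        result = PROCEDURES[request["proc"]](*request["args"])
        conn.sendall((json.dumps({"result": result}) + "\n").encode())


class RemoteStub:
    """Client-side stub: turns attribute access into a request message."""

    def __init__(self, conn):
        self._conn = conn

    def __getattr__(self, proc):
        def call(*args):
            message = json.dumps({"proc": proc, "args": args}) + "\n"
            self._conn.sendall(message.encode())
            return json.loads(recv_line(self._conn))["result"]

        return call


# A connected socket pair stands in for a network link between two hosts.
client_sock, server_sock = socket.socketpair()
server = threading.Thread(target=serve_one, args=(server_sock,))
server.start()

remote = RemoteStub(client_sock)
result = remote.add(2, 3)  # reads like a local call; runs in the server thread

server.join()
client_sock.close()
print(result)
```

The caller never builds or parses a message itself, which is the access transparency mentioned above.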

{:.note} All of these techniques have one thing in common: communication represents a two-way relationship between a sender and a receiver, with senders explicitly directing messages/invocations to the remote receivers.

Receivers are typically aware of the identity of senders and both typically exist at the same time. There are also indirect methods of communication through a third entity allowing for higher decoupling:

space uncoupling: senders don’t need to know who they’re sending to
time uncoupling: senders and receivers do not need to exist at the same time
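
Time uncoupling can be illustrated with a minimal in-memory queue standing in for the third entity. The `MessageQueue` class and the sensor readings are hypothetical; a real message queue would live in a separate process and persist messages.

```python
from collections import deque


class MessageQueue:
    """Intermediary that holds messages until some receiver asks for them."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Return None when the queue is empty instead of blocking.
        return self._messages.popleft() if self._messages else None


def producer(queue):
    """Sender: only knows the queue, not who will read from it."""
    for reading in (21.5, 22.0):
        queue.send({"sensor": "t1", "value": reading})


def consumer(queue):
    """Receiver: drains whatever was left in the queue earlier."""
    values = []
    while (message := queue.receive()) is not None:
        values.append(message["value"])
    return values


queue = MessageQueue()
producer(queue)           # the sender runs to completion first
values = consumer(queue)  # the receiver starts only afterwards
print(values)
```

The sender has finished before the receiver starts, yet no message is lost, and neither function refers to the other (space uncoupling).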

Key techniques for indirect communication include:

Group communication: delivery of messages to a set of recipients
- relies on the abstraction of a group represented by a group identifier
Publish-subscribe systems: a large number of producers distribute information items of interest to a similarly large number of consumers
Message queues: offer a point-to-point service where producer processes send messages to a specific queue, acting as the indirection between producers and consumers
Tuple spaces: processes can place arbitrary items of structured data (tuples) in a space that can be read or removed by other processes
Distributed shared memory: provides an abstraction for sharing data between processes that don’t share physical memory
- programmers gain the abstraction of reading or writing shared data structures as if they were in their local address space
- distribution transparency
- expanded on in chapter 6
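
As one concrete case from the list above, a tuple space can be sketched as a shared collection with pattern-matched reads. This is a single-process toy under stated assumptions: `None` acts as a wildcard in a pattern, and `read`/`take` return `None` when nothing matches, whereas real tuple spaces typically block until a match appears.

```python
class TupleSpace:
    """Shared space in which processes place, read, and remove tuples."""

    def __init__(self):
        self._tuples = []

    def write(self, *fields):
        """Place a tuple in the space."""
        self._tuples.append(tuple(fields))

    def _match(self, pattern):
        # A pattern matches a tuple of the same length whose fields are
        # equal wherever the pattern is not the wildcard None.
        for candidate in self._tuples:
            if len(candidate) == len(pattern) and all(
                want is None or want == got
                for want, got in zip(pattern, candidate)
            ):
                return candidate
        return None

    def read(self, *pattern):
        """Return a matching tuple and leave it in the space."""
        return self._match(pattern)

    def take(self, *pattern):
        """Return a matching tuple and remove it from the space."""
        found = self._match(pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.write("job", 1, "resize")
space.write("job", 2, "crop")

peeked = space.read("job", None, "crop")  # still in the space afterwards
claimed = space.take("job", 1, None)      # removed from the space
gone = space.take("job", 1, None)         # nothing left to match
```

Writers and readers share only the space and the shape of the tuples, never each other's identity.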

Roles and Responsibilities

Each process in a distributed system takes on certain roles which establish the architectural model.

Client-server: most often cited when distributed systems are discussed
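- scales poorly, because a service placed at a single host is limited by that host's capacity and network bandwidth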
Peer-to-peer: no distinction between client and server
- scales naturally with number of users
- applications are composed of large numbers of peer processes running on separate computers
- each computer holds only a small share of the responsibility for the service

Placement

Placement of a given client/server has few universal guidelines and needs to take into account

patterns of communication between entities
reliability of given machines and current load
quality of communication between different machines
etc.

To map a service onto multiple servers, it may be implemented as several server processes on separate hosts. The servers may either partition the set of objects on which the service is based and distribute those objects among themselves, or replicate copies of them on several hosts.
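
The two options, partitioning and replication, can be contrasted with a short sketch. The host names and the hash-modulo placement rule are assumptions chosen for brevity; production systems often use schemes such as consistent hashing so that adding a server moves fewer objects.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]  # hypothetical host names


def partition_owner(key: str) -> str:
    """Partitioning: each object lives on exactly one server."""
    digest = hashlib.sha256(key.encode()).digest()
    return SERVERS[int.from_bytes(digest[:8], "big") % len(SERVERS)]


def replica_set(key: str, copies: int = 2) -> list[str]:
    """Replication: the object is also copied to the next servers in the list."""
    start = SERVERS.index(partition_owner(key))
    return [SERVERS[(start + i) % len(SERVERS)] for i in range(copies)]


owner = partition_owner("user:42")
replicas = replica_set("user:42")
print(owner, replicas)
```

Partitioning spreads load and storage, while replication trades extra storage and update cost for availability if one host fails.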

An example of this architecture is the cluster.

{:.def term=“Cache”} A store of recently used data objects that is closer to one client or a particular set of clients than the objects themselves.

If a client needs an object, the caching service can check a local cache first to save retransmitting potentially large payloads.
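
A minimal sketch of that check-the-cache-first behavior follows. `OriginServer` and `CachingClient` are hypothetical names, eviction is least-recently-used with a small fixed capacity, and the sketch deliberately ignores staleness (it never checks whether the origin's copy has changed).

```python
from collections import OrderedDict


class OriginServer:
    """Stands in for the remote host that holds the objects themselves."""

    def __init__(self, objects):
        self._objects = objects
        self.fetches = 0  # counts how often a payload was actually sent

    def fetch(self, key):
        self.fetches += 1
        return self._objects[key]


class CachingClient:
    """Keeps recently used objects close to the client."""

    def __init__(self, origin, capacity=2):
        self._origin = origin
        self._capacity = capacity
        self._cache = OrderedDict()

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._origin.fetch(key)   # cache miss: go to the origin
        self._cache[key] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)  # evict least recently used
        return value


origin = OriginServer({"a": b"A" * 10, "b": b"B" * 10, "c": b"C" * 10})
client = CachingClient(origin)
client.get("a")
client.get("a")  # served locally, no second transfer
fetches_after_repeat = origin.fetches
```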

Some applications employ mobile code, which a client downloads and runs locally. The most common example is the web, e.g. old-school Java applets (this book’s favorite example).

Extending that example are mobile agents: running programs that travel from one computer to another in a network, carrying out a task on someone’s behalf and returning with the results.
- applicability may be limited

Architectural Patterns

Architectural patterns build on the elements discussed above to provide composite structures.

Layering

Layers in a complex system offer software abstractions - higher layers don’t need to be aware of the implementation details of lower layers. This equates to a vertical organization of services into service layers.

{:.def term=“Platform”} Consists of the lowest-level hardware and software layers.

{:.def term=“Middleware”} A layer of software whose purpose is to mask heterogeneity and to provide a convenient programming model to application programmers.

Middleware is primarily concerned with raising the level of communication through the support of abstractions such as RMI, event notifications, organization and replication of shared data objects, and transmission of data.

Tiered Architecture

Where layering deals with vertical organization of services, tiering organizes functionality of a given layer and places it into appropriate servers.

Consider the functional decomposition of an application:

presentation logic
- concerned with handling user interaction
application logic/business logic
data logic
- concerned with persistent storage of the application

In a two-tier solution, these aspects must be partitioned into client and server processes. This might be done by splitting the application logic, leaving some in the client and some in the server.

In a three-tier solution, each aspect has a separate logical or physical server. This allows each tier to have a well-defined role.
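
The three-way split can be sketched with one class per tier. In a real deployment each tier would run in its own process or on its own server; here they are plain objects in one program, and the banking example and all class names are hypothetical.

```python
class DataTier:
    """Data logic: persistent storage (a dict stands in for a database)."""

    def __init__(self):
        self._balances = {"alice": 120}

    def load(self, user):
        return self._balances[user]

    def save(self, user, balance):
        self._balances[user] = balance


class ApplicationTier:
    """Application/business logic: the rules, with no UI or storage details."""

    def __init__(self, data):
        self._data = data

    def withdraw(self, user, amount):
        balance = self._data.load(user)
        if amount > balance:
            raise ValueError("insufficient funds")
        self._data.save(user, balance - amount)
        return balance - amount


class PresentationTier:
    """Presentation logic: parses user input and formats the response."""

    def __init__(self, app):
        self._app = app

    def handle(self, user, text):
        try:
            remaining = self._app.withdraw(user, int(text))
        except ValueError as error:
            return f"Error: {error}"
        return f"New balance: {remaining}"


ui = PresentationTier(ApplicationTier(DataTier()))
ok_message = ui.handle("alice", "20")
error_message = ui.handle("alice", "500")
```

Each tier talks only to the one directly beneath it, which is what gives every tier a well-defined role.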

{:.def term=“Thin Client”} A software layer that supports a UI local to the user while executing applications or accessing services on a remote computer.

Other Patterns

proxy: supports location transparency in RPC or RMI
brokerage: supports interoperability in complex infrastructures
reflection: offers both of the following
- introspection: dynamic discovery of properties of the system
- intercession: ability to dynamically modify structure/behavior
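
Python's own run-time facilities give a small-scale picture of the two halves of reflection. The `Service` class and helper names below are hypothetical, and reflective middleware applies the same ideas to a whole system rather than a single object.

```python
class Service:
    def ping(self):
        return "pong"

    def status(self):
        return "ok"


def public_operations(obj):
    """Introspection: discover the object's callable operations at run time."""
    return sorted(
        name
        for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


def add_call_log(obj, name, log):
    """Intercession: replace an operation with a wrapper that records calls."""
    original = getattr(obj, name)

    def wrapper(*args, **kwargs):
        log.append(name)
        return original(*args, **kwargs)

    setattr(obj, name, wrapper)


service = Service()
operations = public_operations(service)

calls = []
add_call_log(service, "ping", calls)
reply = service.ping()  # behaves as before, but the call is now recorded
```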

CS490 - Distributed Systems