Scalability is the ability of a system, network, or process to handle a growing amount of load by adding more resources. Resources can be added in two ways:
Vertical scaling (scaling up): this involves adding more resources to the existing nodes, for example more RAM, storage, or processing power.
Horizontal scaling (scaling out): this involves adding more nodes to support more users.
Either approach can be used to scale an application up or out; however, the cost of adding resources (per user) may change as the volume increases. When we add resources to the system, the application's capacity to take load should increase in proportion to the resources added.
An ideal application would serve a high level of load with few resources. In practice, however, a linearly scalable system may be the best achievable option.
Poorly designed applications can be very expensive to scale up or out, because they require more resources per user as the load increases.
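The difference between a linearly scalable and a poorly scaling application can be illustrated with a small calculation. This is only a sketch: the function name and the measurement figures below are made up for illustration and do not come from any real system.

```python
def cost_per_user(resource_units: float, users_served: int) -> float:
    """Resource units spent per user at a given load level."""
    return resource_units / users_served


# Hypothetical measurements: (resource units, users served at that size).
linear_app = [(1, 1000), (2, 2000), (4, 4000)]
poor_app = [(1, 1000), (2, 1600), (4, 2400)]

linear_costs = [cost_per_user(r, u) for r, u in linear_app]
poor_costs = [cost_per_user(r, u) for r, u in poor_app]

# In the linear case the cost per user stays flat as resources are added.
# In the poorly scaling case each step costs more per user than the last.
print("linear:", linear_costs)
print("poor:  ", poor_costs)
```

A flat cost per user is what "proportional" means here; a rising cost per user is the signature of the poorly designed application described above.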
A cluster is a group of machines, each of which can run the same software. Clusters are typically used to achieve high availability for server software.
Clustering is used for many types of servers.
An app server cluster is a group of machines that run an application server so that it can be used reliably with minimal downtime.
A database server cluster is a group of machines that run a database server so that it can be used reliably with minimal downtime.
Why do you need Clustering?
Clustering is needed to achieve high availability for server software. Its main purpose is to get as close as possible to 100% availability, that is, zero downtime in service.
Typical server software running on a single machine can serve requests only as long as there is no hardware failure or other failure on that machine.
By creating a cluster of more than one machine, we reduce the chance of the service becoming unavailable when one of the machines fails.
Clustering does not guarantee that the service will be 100% available, since all the machines in a cluster can still fail at the same time. However, this is unlikely if you have many machines and they are located in different places or have their own independent resources.
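Why more machines help, and why availability never quite reaches 100%, can be seen with a rough calculation. The sketch below assumes that machines fail independently of each other and that each has the same probability of being down; both are simplifying assumptions, and the function name and the 1% figure are illustrative only.

```python
def cluster_availability(p_machine_down: float, machines: int) -> float:
    """Probability that at least one machine is up.

    Assumes independent failures: the whole cluster is down only when
    every machine is down at once, which happens with probability
    p_machine_down ** machines.
    """
    return 1.0 - p_machine_down ** machines


# Hypothetical figure: each machine is down 1% of the time.
for n in (1, 2, 3):
    print(n, "machine(s):", cluster_availability(0.01, n))
```

Machines that share a power supply, a network switch, or a data center do not fail independently, which is why placing them in different locations matters.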
What is Middle Tier Clustering?
Middle tier clustering is simply a cluster that serves the middle tier of an application. It is popular because many clients may depend on the middle tier, and the middle tier often handles heavy load, so it needs to be highly available.
Failure of the middle tier can cause multiple clients and systems to fail, which is why clustering is commonly applied at the middle tier of an application.
In the Java world, it is very common to have EJB server clusters that are used by many clients. In general, any application whose business logic can be shared across multiple clients can use a middle tier cluster for high availability.
Load balancing is a simple technique for distributing workloads across multiple machines or clusters.
The most common and simplest load balancing algorithm is round robin. Requests are handed out to the machines in circular order, so every machine receives an equal number of requests. This keeps any single machine from being overloaded or underloaded, as long as requests cost roughly the same to serve.
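The circular order described above can be sketched in a few lines. This is a minimal illustration, not how any particular load balancer is implemented; the class name and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed circular order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self):
        # Each call returns the next server, wrapping around at the end.
        return next(self._servers)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical names
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
print(Counter(assignments))  # every server receives the same number of requests
```

Note that round robin equalizes the number of requests, not the work: if some requests are much more expensive than others, the load can still end up uneven.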
The purpose of load balancing is to spread the workload so that no single machine is overloaded or underloaded.
The most common load balancing techniques in web-based applications are:
Session affinity or sticky session
IP Address affinity
What is Session replication?
Session replication is used in application server clusters to achieve session failover. A user session is replicated to the other machines in the cluster every time the session data changes. If a machine fails, the load balancer can simply send incoming requests to another server in the cluster. The user can be sent to any server, since every machine in the cluster has a copy of the session.
Session replication gives your application session failover, but at an extra cost in memory and network bandwidth.
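The idea can be sketched with an in-memory model. This is a toy illustration under simplifying assumptions: replication is synchronous, every node holds every session, and a node that comes back after a failure is not resynchronized. The class and node names are hypothetical and do not correspond to any application server's API.

```python
class Node:
    """One machine in the cluster, holding its own copy of all sessions."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.sessions = {}  # session id -> dict of session attributes


class ReplicatedCluster:
    def __init__(self, nodes):
        self.nodes = list(nodes)

    def write(self, session_id, key, value):
        # Every change is copied to every live node. This is where the
        # extra memory and network cost of replication comes from.
        for node in self.nodes:
            if node.alive:
                node.sessions.setdefault(session_id, {})[key] = value

    def read(self, session_id, key):
        # Any live node can answer, because all of them hold the session.
        for node in self.nodes:
            if node.alive:
                return node.sessions[session_id][key]
        raise RuntimeError("no live nodes in the cluster")


node_a, node_b = Node("node-a"), Node("node-b")
cluster = ReplicatedCluster([node_a, node_b])
cluster.write("session-1", "cart", ["book"])

node_a.alive = False  # simulate a machine failure
print(cluster.read("session-1", "cart"))  # still served, now by node-b
```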
What is Sticky Session (session Affinity) Load Balancing? What do you mean by 'session Affinity'?
Sticky session, or session affinity, is another popular load balancing technique. It requires a user session to always be served by the machine allocated to it.
In a load-balanced server application where user information is stored in the session, the session data would otherwise have to be kept available to all machines. This can be avoided by always serving the requests of a particular user session from one machine.
A machine is associated with a session as soon as the session is created. All requests in that session are then routed to the associated machine. This ensures the user data lives on only one machine while the load is still shared.
In the Java world, this is typically done using the JSESSIONID cookie. The cookie is sent to the client in the response to the first request, and every subsequent request from the client must contain the same cookie to identify the session.
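The routing rule can be sketched as follows. This is a simplified model, not the behavior of any specific load balancer: it assumes the balancer keeps an in-memory table from session id to server, and the class name, server names, and use of a random id as the cookie value are all illustrative.

```python
import uuid
from itertools import cycle


class StickyBalancer:
    """Routes every request of a session to the server that created it."""

    def __init__(self, servers):
        self._next_server = cycle(list(servers))
        self._owner = {}  # session id -> server that owns the session

    def route(self, session_cookie=None):
        """Return (server, session_id) for a request.

        A request with a known session cookie goes to the owning server.
        A request without one (or with an unknown one) starts a new
        session on the next server in round-robin order.
        """
        if session_cookie in self._owner:
            return self._owner[session_cookie], session_cookie
        session_id = uuid.uuid4().hex  # stands in for the cookie value
        server = next(self._next_server)
        self._owner[session_id] = server
        return server, session_id


balancer = StickyBalancer(["app-1", "app-2"])  # hypothetical server names
first_server, cookie = balancer.route()        # first request, no cookie yet
repeat_server, _ = balancer.route(cookie)      # later request, same cookie
other_server, _ = balancer.route()             # a different client
print(first_server, repeat_server, other_server)
```

New sessions are still spread across servers, which is how the load is shared even though each individual session stays on one machine.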
What Are The Issues With Sticky Session?
There are a few issues that you may face with this approach:
The client browser may not support cookies, in which case your load balancer cannot tell which session a request belongs to. This may cause strange behavior for users whose browsers have cookies disabled or unsupported.
If one of the machines fails or goes down, the session data held on that machine is lost and there is no way to recover the user's session.
What is IP Address Affinity technique for Load Balancing?
IP address affinity is another popular way to do load balancing. In this approach, the client IP address is associated with a server node, and all requests from that IP address are served by the same node.
This approach can be really easy to implement, since the client IP address is always available from the incoming connection and no additional settings are needed.
This type of load balancing can be useful if your clients are likely to have cookies disabled.
However, there is a downside to this approach. If many of your users are behind a NATed IP address, all of them will end up using the same server node. This may cause uneven load on your server nodes.
NATed IP addresses are really common. In fact, any time you browse from an office network, it is likely that you and all your coworkers share the same NATed IP address.
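One simple way to associate an IP address with a node is to hash the address and take the result modulo the number of servers. The sketch below shows that variant; it is an assumption chosen for illustration (a lookup table would work as well), and the function name, server names, and addresses are hypothetical.

```python
import hashlib
from typing import Sequence


def server_for_ip(client_ip: str, servers: Sequence[str]) -> str:
    """Map a client IP address to one server, deterministically."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer index.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]  # hypothetical server names

# The same address always lands on the same server.
print(server_for_ip("198.51.100.23", servers))
print(server_for_ip("198.51.100.23", servers))

# Three coworkers behind one NATed office address look like one client,
# so all of their requests go to a single server.
office_requests = [server_for_ip("203.0.113.7", servers) for _ in range(3)]
print(office_requests)
```

With this modulo scheme, changing the number of servers reassigns most addresses to different nodes, which is a further limitation worth keeping in mind.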
Failover means switching to another server when one of the servers fails.
Failover is an important technique for achieving high availability. Typically, a load balancer is configured to fail over to another machine when the main machine fails.
To minimize downtime, most load balancers support a heartbeat check, which verifies that the target machine is responding. As soon as a heartbeat check fails, the load balancer stops sending requests to that machine and redirects them to other machines or clusters.
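The heartbeat-driven failover described above can be sketched as follows. This is a simplified model: the heartbeat probe is passed in as a plain function rather than a real network check, and the class name, server names, and status table are hypothetical.

```python
class FailoverBalancer:
    """Sends traffic to the first server that passed its last heartbeat."""

    def __init__(self, servers, is_alive):
        self._servers = list(servers)   # ordered by preference, main first
        self._is_alive = is_alive       # heartbeat probe: server -> bool
        self._healthy = set(self._servers)

    def run_heartbeat(self):
        # In a real deployment this would run periodically; here it is
        # called by hand. Servers that fail the probe are taken out of
        # rotation until a later heartbeat succeeds again.
        self._healthy = {s for s in self._servers if self._is_alive(s)}

    def pick(self):
        for server in self._servers:
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


# Hypothetical status table standing in for real heartbeat responses.
status = {"main": True, "backup": True}
balancer = FailoverBalancer(["main", "backup"], lambda s: status[s])

before = balancer.pick()      # main is healthy, so it takes the traffic

status["main"] = False        # the main machine stops responding
balancer.run_heartbeat()
after = balancer.pick()       # traffic fails over to the backup

print(before, "->", after)
```

The time between a machine failing and the next heartbeat is the window in which requests can still be sent to a dead server, so a shorter heartbeat interval means less downtime at the cost of more probe traffic.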