Ravi Shankar's blog

Wednesday, September 1, 2021

Log shipping vs Mirroring vs Replication for Data Architects/ DBAs

Log Shipping:

It automatically sends transaction log backups from one database (Known as the primary database) to a database (Known as the Secondary database) on another server. An optional third server, known as the monitor server, records the history and status of backup and restore operations. The monitor server can raise alerts if these operations fail to occur as scheduled. 

Mirroring:

Database mirroring is a primarily software solution for increasing database availability.

It maintains two copies of a single database that must reside on different server instances of SQL Server Database Engine.

Replication:

It is a set of technologies for copying and distributing data and database objects from one database to another and then synchronizing between databases to maintain consistency. Using replication, you can distribute data to different locations and to remote or mobile users over local and wide area networks, dial-up connections, wireless connections, and the Internet.

Basic requirements for each of them

Log Shipping

The servers involved in log shipping should have the same logical design and collation setting.
The databases in a log shipping configuration must use the full recovery model or bulk-logged recovery model.
The SQL server agent should be configured to start up automatically.
You must have sysadmin privileges on each computer running SQL server to configure log shipping.

Mirroring

Verify that there are no differences in system collation settings between the principal and mirror servers.
Verify that the local windows groups and SQL Server logins definitions are the same on both servers.
Verify that external software components are installed on both the principal and the mirror servers.
Verify that the SQL Server software version is the same on both servers.
Verify that global assemblies are deployed on both the principal and mirror server.
Verify that for the certificates and keys used to access external resources, authentication and encryption match on the principal and mirror server.

Replication

Verify that there are no differences in system collation settings between the servers.
Verify that the local windows groups and SQL Server Login definitions are the same on both servers.
Verify that external software components are installed on both servers.
Verify that CLR assemblies deployed on the publisher are also deployed on the subscriber.
Verify that SQL agent jobs and alerts are present on the subscriber server, if these are required.
Verify that for the certificates and keys used to access external resources, authentication and encryption match on the publisher and subscriber server.

DB terms ABC - ACID, BASE and CAP

ACID is an acronym which is commonly used to define the properties of a relational database system, it stand for following terms

Atomicity - This property guarantees that if one part of the transaction fails, the entire transaction will fail, and the database state will be left unchanged.
Consistency - This property ensures that any transaction will bring the database from one valid state to another.
Isolation - This property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially.
Durable - means that once a transaction has been committed, it will remain so, even in the event of power loss.

BASE properties are the common properties of recently evolved NOSQL databases. According to CAP theorem, a BASE system does not guarantee consistency. This is a contrived acronym that is mapped to following property of a system in terms of the CAP theorem

Basically available indicates that the system is guaranteed to be available
Soft state indicates that the state of the system may change over time, even without input. This is mainly due to the eventually consistent model.
Eventual consistency indicates that the system will become consistent over time, given that the system doesn't receive input during that time

CAP theorem for distributed computing was published by Eric Brewer, This states that it is not possible for a distributed computer system to simultaneously provide all three of the following guarantees:

Consistency (all nodes see the same data even at the same time with concurrent updates )

Availability (a guarantee that every request receives a response about whether it was successful or failed)

Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

The CAP acronym corresponds to these 3 guarantees. This theorem has created the base for modern distributed computing approaches. World's most high volume traffic companies (e.g. Amazon, Google, Facebook) use this as basis for deciding their application architecture. It's important to understand that only two of these three conditions can be guaranteed to be met by a system.

Shared Nothing Architecture (SN Architecture) and Sharding database

SN Architecture: A shared nothing architecture (SN) is a distributed computing approach in which each node is independent and self-sufficient, and there is no single point of contention required across the system.

This means no resources are shared between nodes (No shared memory, No shared file storage)
The nodes are able to work independently without depending on each other for any work.

Failure on one node affects only the users of that node, however other nodes continue to work without any disruption.

This approach is highly scalable since it avoid the existence of single bottleneck in the system. Shared nothing is recently become popular for web development due to its linear scalability. Google has been using it for long time.

In theory, A shared nothing system can scale almost infinitely simply by adding nodes in the form of inexpensive machines.

Sharding: is an architectural approach that distributes a single logical database system into a cluster of machines.

Sharding is Horizontal partitioning design scheme. In this database design rows of a database table are stored separately, instead of splitting into columns (like in normalization and vertical partitioning). Each partition is called as a shard, which can be independently located on a separate database server or physical location.

Sharding makes a database system highly scalable. The total number of rows in each table in each database is reduced since the tables are divided and distributed into multiple servers. This reduces the index size, which generally means improved search performance.

The most common approach for creating shards is by the use of consistent hashing of a unique id in application (e.g. user id).

The downsides of sharding are,

It requires application to be aware of the data location.
Any addition or deletion of nodes from system will require some rebalance to be done in the system.
If you require lot of cross node join queries then your performance will be really bad. Therefore, knowing how the data will be used for querying becomes really important.
A wrong sharding logic may result in worse performance. Therefore make sure you shard based on the application need.

Ravi Shankar's blog

Wednesday, September 1, 2021

Log shipping vs Mirroring vs Replication for Data Architects/ DBAs

DB terms ABC - ACID, BASE and CAP

Shared Nothing Architecture (SN Architecture) and Sharding database

Generative AI: Paving the way for Performance-Driven Enterprise Architecture

LinkedIn

Blog Archive