2 Node Cluster

Overview

In this scenario, we are going to set up two Storware Backup & Recovery servers in High Availability, Active/Passive mode. This is achieved using Pacemaker and Corosync, so at least a basic understanding of these tools is highly desirable. This how-to is intended for RPM-based systems such as Red Hat / CentOS. If you run Storware Backup & Recovery on a different OS, you may need to refer to your distribution docs.

Our environment is built on the following assumptions:

  1. node1 - first Storware Backup & Recovery server + Storware Backup & Recovery node, IP: 10.41.0.4

  2. node2 - second Storware Backup & Recovery server + Storware Backup & Recovery node, IP: 10.41.0.5

  3. Cluster IP: 10.41.0.10 - We will use this IP to connect to our active Storware Backup & Recovery service. This IP will float between our servers and will always point to the active instance.

  4. MariaDB master <-> master replication

Make sure to run all of the commands with administrative privileges. For simplicity, the following commands will be executed as root.

HA cluster setup

Preparing the environment

  1. Stop and disable the Storware Backup & Recovery server, node and database as the cluster will manage these resources.
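
For example, assuming the default systemd unit names vprotect-server, vprotect-node and mariadb (adjust if your units are named differently):

```bash
# stop the services now and keep them from starting at boot -
# from now on the cluster will start them on the active node
systemctl stop vprotect-server vprotect-node mariadb
systemctl disable vprotect-server vprotect-node mariadb
```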

  2. Enable HA repo:
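
The repository name varies between distributions and releases (e.g. ha, HighAvailability or highavailability), so adjust the example below accordingly:

```bash
# CentOS / Rocky / AlmaLinux
dnf config-manager --set-enabled ha

# RHEL with an active subscription (repo id depends on your release and architecture)
subscription-manager repos --enable=rhel-8-for-x86_64-highavailability-rpms
```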

  3. Use yum to check for pending updates:
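
For example:

```bash
yum check-update
```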

  4. Check the hosts file /etc/hosts, as you might find an entry such as:
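
For instance, a line mapping the node's own hostname to the loopback address (the exact form may differ on your system):

```
127.0.0.1   node1 localhost localhost.localdomain
```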

    Delete it, as this prevents the cluster from functioning properly (your nodes will not "see" each other) and add entries of your two nodes:

    In this case, we will add:
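
With the example environment used in this guide, the entries are:

```
10.41.0.4    node1
10.41.0.5    node2
```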

Installation

Run these commands on both servers

  1. On both servers, install the required cluster packages:
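
A typical package set looks like this (exact package names may differ slightly between releases):

```bash
yum install -y pcs pacemaker corosync fence-agents-all
```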

  2. Add a firewall rule to allow HA traffic - TCP ports 2224, 3121, and 21064, and UDP port 5405 (both servers)
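
With firewalld you can either open the individual ports or enable the predefined high-availability service, which covers them all. For example:

```bash
firewall-cmd --permanent --add-port=2224/tcp --add-port=3121/tcp --add-port=21064/tcp --add-port=5405/udp
# or, alternatively:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
```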

  3. (Optional) While testing, depending on your environment, you may encounter problems related to network traffic, permissions, etc. While it might be a good idea to temporarily disable the firewall and SELinux, we do not recommend disabling these mechanisms in a production environment, as this creates significant security issues. If you choose to disable the firewall, bear in mind that Storware will no longer be available on ports 80/443. Instead, connect to ports 8080/8181 respectively.
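
If you decide to do this in a test environment only, the commands are:

```bash
# testing only - do not leave this in place in production
systemctl stop firewalld
systemctl disable firewalld
setenforce 0   # switches SELinux to permissive mode until the next reboot
```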

  4. Enable and start PCS daemon
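
On both servers:

```bash
systemctl enable pcsd
systemctl start pcsd
```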

Cluster configuration

Installation of the pcs package automatically creates a user hacluster with no password set. While this may be fine for running commands locally, you will need a password for this account to perform the rest of the configuration - configure the same password on both nodes:

  • Set password for hacluster
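
On both nodes:

```bash
passwd hacluster
```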

Corosync configuration

  1. On node 1, issue a command to authenticate as the hacluster user:
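
The exact subcommand depends on the pcs version; for example:

```bash
# pcs 0.10+ (RHEL/CentOS 8 and later)
pcs host auth node1 node2 -u hacluster -p <hacluster-password>

# pcs 0.9 (RHEL/CentOS 7)
pcs cluster auth node1 node2 -u hacluster -p <hacluster-password>
```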

  2. Generate and synchronise the corosync configuration
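
The cluster name used below (vprotect_cluster) is only an example - use any name you like:

```bash
# pcs 0.10+ (RHEL/CentOS 8 and later)
pcs cluster setup vprotect_cluster node1 node2

# pcs 0.9 (RHEL/CentOS 7)
pcs cluster setup --name vprotect_cluster node1 node2
```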

    Review the output of this command and make sure the cluster has been set up on both nodes without errors.

  3. Enable and start your new cluster
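
For example:

```bash
pcs cluster start --all
pcs cluster enable --all
```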

  4. OK! Your cluster is now enabled. You have not created any resources (such as the floating IP) yet, but before you proceed, there are still a few settings to modify. Because you are using only two nodes, you need to disable the default quorum policy (this command should not return any output):
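
For example:

```bash
pcs property set no-quorum-policy=ignore
```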

  5. You should also define the default failure settings. Combined, these two settings define how many failures can occur before a node is marked as ineligible to host a resource, and after what time this restriction is lifted. You define the defaults here, but it may be a good idea to also set these values at the resource level, depending on your experience. Run these commands:
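
The values below are only an example - tune them to your environment (newer pcs versions prefer pcs resource defaults update):

```bash
pcs resource defaults migration-threshold=5
pcs resource defaults failure-timeout=30s
```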

  6. As long as you are not using any fencing devices in your environment (here we are not), you need to disable STONITH. The second command verifies the running configuration. These commands normally do not return any output:
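
For example:

```bash
pcs property set stonith-enabled=false
crm_verify -L -V
```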

Resource creation

  1. First, you will create a resource that represents our floating IP 10.41.0.10. Adjust your IP and cidr_netmask, and you're good to go. From this moment on, you need to use this IP when connecting to your Storware Backup & Recovery server.
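
A minimal sketch using the IPaddr2 resource agent - the resource name ClusterIP and the /24 netmask are assumptions, so adjust them to your network:

```bash
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=10.41.0.10 cidr_netmask=24 op monitor interval=30s
```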

  2. Immediately, you should see that the IP is up and running on one of the nodes (most likely the one where you issued this command):
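
You can verify this with, for example:

```bash
pcs status
ip addr show ens160   # ens160 is the interface name used in this example
```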

  3. As you can see, our floating IP 10.41.0.10 has been successfully assigned as the second IP of interface ens160. We should also check if the Storware Backup & Recovery web interface is up and running. You can do this by opening the web browser and typing in https://10.41.0.10.

  4. The next step is to define a resource responsible for monitoring network connectivity. Note that you need to use your gateway IP in the host_list parameter:
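
A sketch using the ocf:pacemaker:ping agent - the gateway address 10.41.0.1 is only an example:

```bash
pcs resource create ping ocf:pacemaker:ping \
    dampen=5s multiplier=1000 host_list=10.41.0.1 \
    op monitor interval=15s clone
```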

  5. You have to define a set of cluster resources responsible for the other services crucial for the Storware node and the server itself. Here, we will logically link these services with our floating IP: whenever the floating IP disappears from a server, these services will be stopped there. You also have to define the proper order in which services start and stop, since, for example, starting the Storware server without a running database makes little sense.
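
A minimal sketch, assuming the systemd units are named mariadb, vprotect-server and vprotect-node (the cluster resource names themselves are arbitrary):

```bash
# resources wrapping the systemd services
pcs resource create vprotect-db systemd:mariadb op monitor interval=30s
pcs resource create vprotect-server-svc systemd:vprotect-server op monitor interval=60s
pcs resource create vprotect-node-svc systemd:vprotect-node op monitor interval=60s

# start order: floating IP -> database -> server -> node (stopping happens in reverse)
pcs constraint order ClusterIP then vprotect-db
pcs constraint order vprotect-db then vprotect-server-svc
pcs constraint order vprotect-server-svc then vprotect-node-svc
```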

    These commands do not return any output.

  6. Define resource colocation
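
For example, keep all service resources on the node that currently holds the floating IP (resource names as in the sketch above):

```bash
pcs constraint colocation add vprotect-db with ClusterIP INFINITY
pcs constraint colocation add vprotect-server-svc with ClusterIP INFINITY
pcs constraint colocation add vprotect-node-svc with ClusterIP INFINITY
```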

  7. Set node preference
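
For example, prefer node1 as the primary location of the floating IP (the score of 50 is arbitrary):

```bash
pcs constraint location ClusterIP prefers node1=50
```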

At this point, the Pacemaker HA cluster is functional.

However, there is still one thing we need to consider - creating DB replication.

MariaDB replication

In this section, we explain how to set up master<->master MariaDB replication.

  1. On both nodes, if you have the firewall enabled, allow communication via port 3306:
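
For example, with firewalld:

```bash
firewall-cmd --permanent --add-port=3306/tcp
firewall-cmd --reload
```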

Steps to run on the first node - in this case 10.41.0.4

This server will be the source of DB replication.

  1. Stop the Storware server, node and database

  2. Copy your license and node information from the first node to the second node:

  3. Edit the config file, enable binary logging, and start MariaDB again. Depending on your distribution, the config file location may vary. Most likely it is /etc/my.cnf or /etc/my.cnf.d/server.cnf

    In the [mysqld] section, add the lines:
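
A sketch of the relevant settings - the binary log base name is arbitrary, and binlog_do_db (an assumption here) limits replication to the vprotect database:

```ini
[mysqld]
server_id = 1
log_bin = mysql-bin
binlog_do_db = vprotect
```

Then start the database again:

```bash
systemctl start mariadb
```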

  4. Now log in to your MariaDB, create a user used for replication, and assign appropriate rights to it.

    For the purpose of this task, we will set the username to 'replicator' and the password to R3pLic4ti0N
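
Logged in to MariaDB (for example with mysql -u root -p), the statements look like this; the '%' host wildcard is used for simplicity and can be restricted to the other node's IP:

```sql
CREATE USER 'replicator'@'%' IDENTIFIED BY 'R3pLic4ti0N';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;
```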

    Don't log out just yet - we still need to check the master status:
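
Still in the same MariaDB session:

```sql
SHOW MASTER STATUS;
```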

  5. Write down the log file name and position, as it is required for proper slave configuration.

  6. Dump the vprotect database and copy it onto the second server (node2).
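
For example (the dump file location /tmp/vprotect.sql is arbitrary):

```bash
mysqldump -u root -p vprotect > /tmp/vprotect.sql
scp /tmp/vprotect.sql root@10.41.0.5:/tmp/
```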

Steps to run on the 2nd server, node2: 10.41.0.5

  1. Stop the vprotect server, node, and database

  2. Edit the MariaDB config file. Assign a different server id, for example: 2. Then start MariaDB.
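
The same settings as on node1, but with a different server_id, for example:

```ini
[mysqld]
server_id = 2
log_bin = mysql-bin
binlog_do_db = vprotect
```

Then start MariaDB:

```bash
systemctl start mariadb
```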

  3. Load the database dump copied from node1.
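
Assuming the dump was copied to /tmp/vprotect.sql:

```bash
mysql -u root -p vprotect < /tmp/vprotect.sql
```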

At this point, you have two identical databases on our two servers.

  1. Log in to the MariaDB instance and create a replication user with a password - use the same user and password as on node1. Grant the necessary permissions.

    Set the master host. You must use the master_log_file and master_log_pos values written down earlier. Change the IP of the master host to match your network configuration.
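
A sketch of the statements - replace the placeholders with the file name and position noted on node1:

```sql
CREATE USER 'replicator'@'%' IDENTIFIED BY 'R3pLic4ti0N';
GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%';
FLUSH PRIVILEGES;

CHANGE MASTER TO
    MASTER_HOST='10.41.0.4',
    MASTER_USER='replicator',
    MASTER_PASSWORD='R3pLic4ti0N',
    MASTER_LOG_FILE='<file from node1>',
    MASTER_LOG_POS=<position from node1>;
```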

  2. Start the slave, check the master status, and write down the file name and position.
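
For example:

```sql
START SLAVE;
SHOW MASTER STATUS;
-- you can also verify that replication is running with: SHOW SLAVE STATUS\G
```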

Go back to the first server (node1)

  1. Stop the slave, then change the master host using the parameters noted down in the previous step. Also, change the master host IP to match your network configuration.
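
A sketch, using the file name and position noted on node2 in the previous step:

```sql
STOP SLAVE;
CHANGE MASTER TO
    MASTER_HOST='10.41.0.5',
    MASTER_USER='replicator',
    MASTER_PASSWORD='R3pLic4ti0N',
    MASTER_LOG_FILE='<file from node2>',
    MASTER_LOG_POS=<position from node2>;
START SLAVE;
```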

At this point, you have successfully configured MariaDB master<->master replication.

Testing the setup

The fastest way to test our setup is to invoke:
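
The exact subcommand depends on your pcs version:

```bash
pcs node standby node1        # older pcs versions: pcs cluster standby node1
# when you are done testing, bring the node back with:
#   pcs node unstandby node1
```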

This puts node1 into standby mode, which prevents it from hosting any cluster resources.

After a while, you should see your resources up and running on node2.

Note that if you perform a normal OS shutdown (not a forced one), Pacemaker will wait a long time for the node to come back online, which in practice prevents the shutdown from completing. As a result, resources will not switch correctly to the other node.
