# Gentoo clustering questions (openMosix)

## lancealtar

Hi all,

I'm planning to deploy a couple of Gentoo servers in a new environment. I have been researching Microsoft's clustering technology as well as openMosix, and I was wondering whether anyone knows about some of openMosix's features. Specifically, can it replicate the failover, failback and other features of MS's clustering? For instance, if I'm running a two-node cluster with Apache and one node dies (God forbid), will the other node keep running the service and carry all the load until the failed node is back up and functioning? If openMosix isn't able to do this, is there another application that can?

I look forward to your responses. Thanks in advance.

----------

## snoopman

Hi there,

Just today I successfully set up a high-availability system with two Gentoo machines for a website with more than 100,000 visitors a month. I am using heartbeat to monitor the machines: one is the master, and the other takes over as soon as the first one stops responding. The setup is quite easy and well documented in /usr/share/doc/heartbeat... Go emerge heartbeat on both machines and follow the instructions.

Go to http://www.linux-ha.org/ConfiguringHeartbeat for more information.
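For reference, a minimal two-node heartbeat setup of this kind uses three files in /etc/ha.d: ha.cf, haresources and authkeys. The sketch below writes them through a hypothetical helper, `write_heartbeat_config`, so the same files can be put on both machines. The node names (which must match `uname -n` on each machine), the interface eth1, the service address 192.168.1.10 and the shared secret are placeholders, not values from this post.

```shell
#!/bin/sh
# write_heartbeat_config: hypothetical helper that writes a minimal
# two-node heartbeat configuration into the given directory
# (normally /etc/ha.d). All names, addresses and keys are placeholders.
write_heartbeat_config() {
  dir=$1
  mkdir -p "$dir"

  # ha.cf: heartbeat interval (s), time before a silent peer is
  # declared dead (s), startup grace period, link, and the two nodes.
  cat > "$dir/ha.cf" <<'EOF'
keepalive 2
deadtime 30
initdead 120
bcast eth1
auto_failback on
node node1
node node2
EOF

  # haresources: node1 normally owns the service IP and apache2.
  cat > "$dir/haresources" <<'EOF'
node1 192.168.1.10 apache2
EOF

  # authkeys: shared secret; heartbeat wants this readable by root only.
  cat > "$dir/authkeys" <<'EOF'
auth 1
1 sha1 ChangeMeSecret
EOF
  chmod 600 "$dir/authkeys"
}
```

Run as root with `write_heartbeat_config /etc/ha.d` on both machines, so that the configuration is identical on each.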

Since I am running a highly dynamic website, both filesystems need to be absolutely identical, and I am going to implement DRBD this week to achieve this. Without it, the data would be out of date as soon as the second blade takes over. DRBD provides a new block device that writes every change to both machines simultaneously.
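A DRBD resource of the kind described here can be sketched as below. `protocol C` is DRBD's synchronous mode, where a write only completes once it is on disk on both nodes, which matches "writes every change to both machines". The helper `write_drbd_config`, the host names, the backing disk /dev/sda3 and the addresses are placeholders assumed for illustration.

```shell
#!/bin/sh
# write_drbd_config: hypothetical helper that writes one mirrored
# resource to the given file (normally /etc/drbd.conf).
# Host names must match `uname -n`; disks and addresses are placeholders.
write_drbd_config() {
  cat > "$1" <<'EOF'
resource r0 {
  # Synchronous replication: a write completes on both nodes or not at all.
  protocol C;

  on node1 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sda3;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
EOF
}
```

The filesystem then lives on /dev/drbd0 and is mounted only on whichever node is currently primary; the commands for the initial sync differ between DRBD versions.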

Hope this helps, it looks like this is what you need.

----------

## lancealtar

Thanks snoopman. That looks like what I've been looking for. Let me know how your DRBD implementation works out for you.

----------

## snoopman

Hi lancealtar,

Sorry for responding so late. We have run into some problems with the configuration that we can't seem to solve: for some reason, heartbeat keeps rebooting the main server. It must be a wrong setting in the heartbeat configuration. I'll let you know as soon as we have solved the problem.

----------

## lancealtar

Awesome, thanks for keeping me up to date.

----------

## snoopman

lancealtar,

Sorry for being so late again, but my assistant and I were pulling our hair out trying to get heartbeat to work properly. At some point we gave up and switched to keepalived, which is pretty easy to configure and offers what we needed. The system is now online and stable: a failover setup with keepalived and DRBD. If the master blade has a hardware failure, the second blade takes over all services after two seconds, with an identical file system for both the web content and the database.

We pulled the master blade out of the rack and unplugged the network cables hundreds of times, cut the power and created other worst-case scenarios, and it all works perfectly now. We have 100,000+ visitors a month causing an awful lot of traffic, and any downtime would cost us customers.

You can see the site at www.kontron-emea.com

On http://www.kontron-emea.com/index.php?id=42 (near the bottom) there is even a picture of our hardware rack. You will see the red Ethernet cable that connects the two servers directly; keepalived uses this connection to check whether the other server is up. The second Ethernet port of each server is wired to the outside world.
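A keepalived setup along these lines can be sketched as a VRRP instance whose advertisements run over the crossover link while the public service address sits on the outside port. This is a minimal sketch under assumptions that do not come from the post: eth1 is the crossover cable, eth0 the outside port, 192.168.1.10 the service address, and become-master.sh a hypothetical script you would write yourself to promote DRBD, mount it and start the services. With `advert_int 1`, the backup declares the master dead after roughly three missed advertisements.

```shell
#!/bin/sh
# write_keepalived_config: hypothetical helper that writes a VRRP
# configuration for the master blade to the given file (normally
# /etc/keepalived/keepalived.conf). All values are placeholders.
write_keepalived_config() {
  cat > "$1" <<'EOF'
vrrp_instance VI_1 {
    # On the second blade use "state BACKUP" and a lower priority.
    state MASTER
    priority 150
    # VRRP advertisements travel over the crossover cable.
    interface eth1
    # Must be identical on both blades.
    virtual_router_id 51
    # Advertise once per second.
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ChangeMe
    }
    # Public service address, brought up on the outside port.
    virtual_ipaddress {
        192.168.1.10/24 dev eth0
    }
    # Hypothetical script: promote DRBD, mount it, start the services.
    notify_master /usr/local/sbin/become-master.sh
}
EOF
}
```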

I can highly recommend this software setup. I hope this post will help others to set up a high availability system as well.

Cheers, snoopman

----------

## lancealtar

Thanks very much for your assistance. I'm going to set up a test environment to try out the setup you recommended. Since you've already gone through the trouble, it sounds like this will work well. I'll let you know how it goes.

----------

## Akhouk

I am doing some research into failover solutions, and having read about the reboot problem in this thread, I wonder whether you have seen this:

From http://www.linux-ha.org/HeartbeatResourceAgent.....


> According to the LSB, stopping a resource which is already stopped is always permissible. Heartbeat will DEFINITELY stop resources it doesn't know is running. Stop failures can result in the machine being rebooted to clear up the error. Note that some Red Hat init scripts are not LSB-compliant and complain when trying to stop resources which are not running.

 

The Gentoo init scripts always fail when asked to stop a service that is already stopped:

```
# /etc/init.d/apache2 stop
 * ERROR:  "apache2" has not yet been started.
```

So I guess that to use heartbeat with Gentoo, we would need to write wrappers around the init scripts that treat stopping an already-stopped service as success instead of an error.
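A wrapper like that could look like the following minimal sketch. `lsb_wrap` is a hypothetical helper, not part of heartbeat or Gentoo, and it assumes (based only on the session above) that a Gentoo init script reports an already-stopped service by printing "has not yet been started" and exiting non-zero. Only `stop` is handled specially; every other action is passed straight through.

```shell
#!/bin/sh
# lsb_wrap: hypothetical helper (not part of heartbeat or Gentoo).
# Runs an init script action, but treats "stop" on a service that is
# already stopped as success, as the LSB (and heartbeat) expect.
# Usage: lsb_wrap /etc/init.d/apache2 {start|stop|status|...}
lsb_wrap() {
  script=$1
  action=$2

  # Anything other than "stop" is passed through unchanged.
  if [ "$action" != "stop" ]; then
    "$script" "$action"
    return $?
  fi

  # Capture the output so the error message can be inspected.
  out=$("$script" stop 2>&1)
  rc=$?
  if [ -n "$out" ]; then
    printf '%s\n' "$out"
  fi

  if [ "$rc" -ne 0 ]; then
    case $out in
      # Assumption: Gentoo prints this when the service is not running.
      *"has not yet been started"*) return 0 ;;
    esac
  fi
  return "$rc"
}
```

One untested way to hook this in is a small script in /etc/ha.d/resource.d/ (say, a hypothetical apache2-lsb) that defines this function and calls `lsb_wrap /etc/init.d/apache2 "$1"`, with apache2-lsb listed in haresources in place of apache2. Matching on the message text is fragile; checking the service's status first would be an alternative if the init script's status exit codes can be relied on.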

----------

