Spyke
selfhost · Self Hosted - Self-hosting your services · by Catsrules

[Question] Proxmox Cluster and Quorum

How do you handle Proxmox clusters when you only have 1 or 2 servers?

I technically have 3 servers, but I keep one offline because I don't need it 24/7, and there's no point wasting power on a server I don't need.

I believe I read somewhere that you can force Proxmox to a lower expected number, but that it isn't recommended. Has anyone done this, and if so, have you run into any issues with it?

My main issue is that I want my VM to start no matter what. For example, I had a power outage. When the servers came back online, instead of starting, they waited for the quorum number to reach 3 (it will never reach 3 because the third server wasn't turned on). So they just waited forever until I got home and ran:

pvecm expected 2
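For reference, the number passed to pvecm expected is the expected vote count, and a node will only start guests once the votes it can see reach a majority of that number. A rough Python sketch of that majority rule, assuming corosync votequorum's usual formula (half the expected votes, rounded down, plus one), one vote per node and no QDevice. The function names are made up for illustration:

```python
# Sketch of the majority rule, assuming corosync votequorum's usual formula,
# one vote per node and no QDevice. Function names are illustrative only.


def quorum(expected_votes: int) -> int:
    """Votes needed to be quorate: half the expected votes (rounded down) plus one."""
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    return present_votes >= quorum(expected_votes)


if __name__ == "__main__":
    # 3-node cluster where only one node has come back after the outage.
    print(is_quorate(1, expected_votes=3))  # False (needs 2 of 3)
    # Both always-on nodes are back.
    print(is_quorate(2, expected_votes=3))  # True
    # Expected votes lowered to 1: a single node is enough.
    print(is_quorate(1, expected_votes=1))  # True
```

Note that lowering the expected count only helps while it stays small enough for the nodes that are actually up to form a majority.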

View original on lemmy.ml

I would argue that the node shouldn’t be in the cluster if its availability doesn’t match the others. If you remove the part-time node, your pvecm concerns go away.

Now, if you have a failure such that the other 2 nodes get restarted, you can manage the VM startups with delays. If one node finishes booting 5 minutes before the other, have the VMs wait 5 minutes or longer before auto-starting. That way, you'll have quorum by the time the VMs start.
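A rough Python sketch of the "wait for quorum, then start" idea. Proxmox's built-in per-guest start order and delay settings are probably the simpler route in practice; this only illustrates the logic. How you check quorum and how you start guests are hypothetical placeholder callbacks here, not real Proxmox APIs:

```python
import time
from typing import Callable


def wait_then_start(
    is_quorate: Callable[[], bool],
    start_guests: Callable[[], None],
    timeout_s: float = 600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the node reports quorum, then start guests.

    Returns True if guests were started, False if the timeout expired first.
    The sleep function is injectable so the logic can be tested without waiting.
    """
    waited = 0.0
    while waited <= timeout_s:
        if is_quorate():
            start_guests()
            return True
        sleep(poll_s)
        waited += poll_s
    return False


if __name__ == "__main__":
    # Pretend quorum only shows up on the third check (second node still booting).
    answers = iter([False, False, True])
    ok = wait_then_start(lambda: next(answers), lambda: print("starting guests"), poll_s=0.01)
    print(ok)
```

The timeout matters: without it, a node whose partner never comes back would wait forever, which is the situation described in the original post.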


You can use a small device like a Raspberry Pi as a Qdevice to be the third vote in quorum. It doesn’t have to be a full Proxmox server.
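The vote math for two nodes plus a QDevice, as a small Python sketch. It assumes one vote per node and a single QDevice vote (the Proxmox wiki describes a single extra vote for even-sized clusters). The node names are hypothetical:

```python
# Assumptions: one vote per node, and the QDevice adds a single vote
# (the even-sized-cluster case). Node names are hypothetical.

NODE_VOTES = {"node1": 1, "node2": 1}
QDEVICE_VOTES = 1
EXPECTED_VOTES = sum(NODE_VOTES.values()) + QDEVICE_VOTES  # 3


def quorum(expected_votes: int) -> int:
    """Half the expected votes (rounded down) plus one."""
    return expected_votes // 2 + 1


def partition_votes(nodes_up: set, qdevice_reachable: bool) -> int:
    """Votes held by the nodes that are up, plus the QDevice vote if reachable."""
    votes = sum(NODE_VOTES[name] for name in nodes_up)
    return votes + (QDEVICE_VOTES if qdevice_reachable else 0)


def is_quorate(nodes_up: set, qdevice_reachable: bool) -> bool:
    """True if this group of nodes (plus the QDevice, if seen) holds a majority."""
    return partition_votes(nodes_up, qdevice_reachable) >= quorum(EXPECTED_VOTES)


if __name__ == "__main__":
    print(is_quorate({"node1", "node2"}, qdevice_reachable=False))  # True: 2 of 3
    print(is_quorate({"node1"}, qdevice_reachable=True))            # True: node + QDevice
    print(is_quorate({"node1"}, qdevice_reachable=False))           # False: 1 of 3
```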


If you are not using any HA features and only put the servers into the same cluster for ease of management, you could use the same command but with a value of 1.

Quorum exists to prevent any server from arbitrarily failing over VMs when it believes the other node(s) are down, which would create a split-brain situation.

But if that risk does not exist to begin with, neither does the need for quorum.
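To make the split-brain point concrete, a small Python sketch, assuming the usual majority rule (half the expected votes, rounded down, plus one). With the real expected vote count, two halves of a split cluster can never both be quorate; once expected votes are forced down to 1, both halves are, which is exactly the case HA must not run in:

```python
# Assumes the usual majority rule; each entry in a split is the number of
# votes one isolated group of nodes can still see.


def quorum(expected_votes: int) -> int:
    """Half the expected votes (rounded down) plus one."""
    return expected_votes // 2 + 1


def quorate_groups(group_votes: list, expected_votes: int) -> list:
    """For each isolated group, whether it would consider itself quorate."""
    threshold = quorum(expected_votes)
    return [votes >= threshold for votes in group_votes]


if __name__ == "__main__":
    split = [1, 1]  # two nodes that can no longer see each other
    print(quorate_groups(split, expected_votes=2))  # [False, False]: neither acts alone
    print(quorate_groups(split, expected_votes=1))  # [True, True]: both may act, split brain
```

Without HA nothing fails over on its own, so two "quorate" halves mostly just means both can still be managed, which is why forcing the number down is less risky in that setup.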

beehaw.org

I have 2 nodes and a Raspberry Pi as a QDevice.
I can still power off 1 node (so I have 1 node and an RPi) if I want to.
To avoid split brain, if a node can see the QDevice, then it is part of the cluster. If it can't, then the node is in a degraded state.
QDevices are only recommended in some scenarios, which I can't remember off the top of my head.

With 2 nodes, you can't set up a Ceph cluster (well, I don't think you can).
But you can set up High Availability and use ZFS snapshot replication on a 5-minute interval (so if your VM's host goes down, the other host can start it from a potentially outdated snapshot).
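Roughly how stale that snapshot can be, as a Python sketch. It assumes replication runs exactly on a fixed schedule and finishes instantly, which real jobs don't, so actual data loss can be somewhat larger. The function names are hypothetical:

```python
from datetime import datetime, timedelta


def last_replication(failure: datetime, first_run: datetime, interval: timedelta) -> datetime:
    """Most recent scheduled run at or before the failure (assumes failure >= first_run)."""
    intervals_elapsed = (failure - first_run) // interval
    return first_run + intervals_elapsed * interval


def data_loss_window(failure: datetime, first_run: datetime, interval: timedelta) -> timedelta:
    """Changes made after the last replicated snapshot are lost on failover."""
    return failure - last_replication(failure, first_run, interval)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    every_5_min = timedelta(minutes=5)
    crash = datetime(2024, 1, 1, 12, 13)
    print(last_replication(crash, start, every_5_min))  # 2024-01-01 12:10:00
    print(data_loss_window(crash, start, every_5_min))  # 0:03:00
```

So with a 5-minute schedule the worst case is just under 5 minutes of lost writes, which is why the Postgres VM below uses its own streaming replication instead.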

This worked for my project, as I had a few stateless services that could bounce between nodes, plus a Postgres VM with streaming replication (Postgres, not ZFS) and failover, which led to a decently fault-tolerant setup.


I will have to look into the QDevice. I do have an old Pi 3 set up as a software-defined radio. I might be able to also set it up as a QDevice.

https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support

Looking at the documentation, it isn't recommended to use a QDevice with an odd number of nodes, which I guess I technically have.

If the QNet daemon itself fails, no other node may fail or the cluster immediately loses quorum. For example, in a cluster with 15 nodes, 7 could fail before the cluster becomes inquorate. But, if a QDevice is configured here and it itself fails, no single node of the 15 may fail. The QDevice acts almost as a single point of failure in this case.

But it seems to be more of an issue in large clusters. In my situation, I don't think this is a big deal, because if the QDevice fails and my third server is offline, I am in the same situation I am in now.
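The wiki's 15-node numbers can be reproduced with a bit of vote arithmetic. As far as I understand the same wiki section, in an odd-sized cluster the QDevice is given N-1 votes (N being the node count), which is what makes it close to a single point of failure. A Python sketch under that assumption, with one vote per node:

```python
# Assumption (from the Proxmox wiki's QDevice section): in a cluster with an
# odd number N of nodes, the QDevice carries N - 1 votes; each node has 1 vote.


def quorum(expected_votes: int) -> int:
    """Half the expected votes (rounded down) plus one."""
    return expected_votes // 2 + 1


def tolerable_node_failures(nodes: int, qdevice_votes: int, qdevice_up: bool) -> int:
    """Nodes that may still fail before the cluster loses quorum.

    A negative result means the cluster is already inquorate.
    """
    expected = nodes + qdevice_votes
    present = nodes + (qdevice_votes if qdevice_up else 0)
    return present - quorum(expected)


if __name__ == "__main__":
    n = 15
    print(tolerable_node_failures(n, qdevice_votes=0, qdevice_up=False))      # 7
    print(tolerable_node_failures(n, qdevice_votes=n - 1, qdevice_up=True))   # 14
    print(tolerable_node_failures(n, qdevice_votes=n - 1, qdevice_up=False))  # 0
```

With the QDevice up, all nodes but one can fail; with it down, not a single node can, matching the quoted example.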

Just out of curiosity, do you back up your Pi at all? I'm not sure what the recovery process is if the QDevice fails, or how easy it is to replace and set up again.

lemmy.ml

Please do add a tag to your post as stated on the sublemmy sidebar! Thank you. :)


You'll need a QDevice to keep consensus. That wiki article will cover how to set it up and some drawbacks to QDevices. You should be able to run it on a low-power device like a Pi to keep the cluster going.


AFAIK forcing it to a lower number is fine if you're not doing HA. I remember reading something along those lines on a forum, but I could be remembering wrong.

If you're not using Ceph or HA, then I don't think there would be any negative effects from not having all the servers in the cluster ready.


Oh good, I am not using any of those at least not at the moment.


I haven't tested this at all, it just popped into my head, but could you create a VM on one of the nodes and join that to the cluster?

If it does work, I wouldn't recommend it. But I'd be curious to see if that would work.


That leads to a chicken-and-egg situation. The Proxmox cluster can't turn on the VM because the VM isn't on to be the third node in the cluster :)

