path: root/nixos/tests/consul.nix
author Niklas Hambüchen <mail@nh2.me> 2020-06-18 02:08:17 +0200
committer Niklas Hambüchen <mail@nh2.me> 2020-06-18 02:22:31 +0200
commit a59a972413cef886eb7b2f048aa8dc08a61bf1a2
tree 982450d36fe30694f3bb6684d5ebf8a34d29bc31 /nixos/tests/consul.nix
parent 25d665634a1bd38515320beabf85a6e23545bac7
consul.passthru.tests: Fix failure on current consul. Fixes #90613.
Done by setting `autopilot.min_quorum = 3`.

Technically, this would have been required to keep the test correct ever since Consul's "autopilot" "Dead Server Cleanup" was enabled by default (I believe that was in Consul 0.8). Practically, the issue only occurred in our NixOS test with releases >= `1.7.0-beta2` (see #90613). The setting itself has been available since Consul 1.6.2.

However, this setting was not documented clearly enough for anybody to notice, and only the upstream issue https://github.com/hashicorp/consul/issues/8118 that I filed brought it to light.

As explained there, the test could also have been made to pass by applying the more correct rolling reboot procedure

    -m.wait_until_succeeds("[ $(consul members | grep -o alive | wc -l) == 5 ]")
    +m.wait_until_succeeds(
    +    "[ $(consul operator raft list-peers | grep true | wc -l) == 3 ]"
    +)

but we also intend to test that Consul can regain consensus even if the quorum gets temporarily broken.
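
The alternative check above counts lines containing `true` in the output of `consul operator raft list-peers`. As a rough sketch only, the same count could be done in Python by reading the `Voter` column explicitly, which avoids matching `true` elsewhere on a line. The helper name `count_voters` and the sample output are hypothetical; the column layout is assumed from typical `list-peers` output and is not taken from this commit.

```python
def count_voters(list_peers_output: str) -> int:
    """Count rows whose 'Voter' column reads 'true'.

    Assumes whitespace-separated columns and a header row that
    contains a 'Voter' field (an assumption about the CLI output).
    """
    lines = [line for line in list_peers_output.splitlines() if line.strip()]
    if not lines:
        return 0
    header = lines[0].split()
    voter_col = header.index("Voter")
    count = 0
    for row in lines[1:]:
        fields = row.split()
        if len(fields) > voter_col and fields[voter_col] == "true":
            count += 1
    return count


# Illustrative sample, not captured from a real cluster.
SAMPLE = """\
Node     ID    Address        State     Voter  RaftProtocol
server1  id-1  10.0.0.1:8300  leader    true   3
server2  id-2  10.0.0.2:8300  follower  true   3
server3  id-3  10.0.0.3:8300  follower  true   3
"""

print(count_voters(SAMPLE))
```

In a test script, such a helper would be fed the captured command output and compared against the expected number of consensus servers.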
Diffstat (limited to 'nixos/tests/consul.nix')
-rw-r--r-- nixos/tests/consul.nix 4
1 files changed, 4 insertions, 0 deletions
diff --git a/nixos/tests/consul.nix b/nixos/tests/consul.nix
index eb7dd45923fc..ffbbd835885e 100644
--- a/nixos/tests/consul.nix
+++ b/nixos/tests/consul.nix
@@ -73,6 +73,10 @@ let
 extraConfig = defaultExtraConfig // {
   server = true;
   bootstrap_expect = numConsensusServers;
+  # Tell Consul that we never intend to drop below this many servers.
+  # Ensures to not permanently lose consensus after temporary loss.
+  # See https://github.com/hashicorp/consul/issues/8118#issuecomment-645330040
+  autopilot.min_quorum = numConsensusServers;
   retry_join =
     # If there's only 1 node in the network, we allow self-join;
     # otherwise, the node must not try to join itself, and join only the other servers.
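
For background on why the size of the peer set matters here, the following is a minimal sketch of the standard Raft majority rule, not code from Consul or from this test. The function names `quorum_size` and `has_quorum` are hypothetical. How autopilot's dead-server cleanup interacts with this rule is described in the upstream issue linked in the commit message; the sketch only shows the arithmetic.

```python
def quorum_size(num_peers: int) -> int:
    """Smallest majority of a Raft peer set with num_peers voting members."""
    if num_peers < 1:
        raise ValueError("a peer set needs at least one member")
    return num_peers // 2 + 1


def has_quorum(num_peers: int, num_alive: int) -> bool:
    """True if the alive members form a majority of the peer set."""
    return num_alive >= quorum_size(num_peers)


# Majority needed for a few peer-set sizes.
for n in (1, 2, 3, 5):
    print(n, quorum_size(n))
```

With three consensus servers, as in `autopilot.min_quorum = 3` above, two alive servers are enough for a majority and one is not.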