This release fixes a number of stability issues with 1.1.2. We recommend that all users of Chef Backend 1.1.2 and prior upgrade.

## Enhancements

- Fail the follower health check if the node falls too far behind the leader. This is configurable with the `leaderl.health_check.max_bytes_behind_leader` option.
- The following etcd configuration options are now available for customization in `/etc/chef-backend/chef-backend.rb`:
  - `etcd.election_timeout`
  - `etcd.heartbeat_interval`
  - `etcd.snapshot_count`
- Bump etcd to 2.3.7.
- Upgrade to PostgreSQL 9.5.5.
- Health checks can now fail up to a configurable number of times before triggering a failover. Users can configure the maximum failures for each service via the following configuration settings:
  - `leaderl.health_check.max_elasticsearch_failures`
  - `leaderl.health_check.max_etcd_failures`
  - `leaderl.health_check.max_pgsql_failures`

  If you have manually set these flags in `chef-backend.rb`, your overrides will still be respected. Please remove these options if you'd prefer to use the new defaults instead.
- Add a `--force-basebackup` option to the `pgsql-follow` command. This allows for a more straightforward recovery procedure in cases where human intervention is needed.
- Report 503 from `/leaderl` if a user-specified number of followers haven't initiated replication connections. This is tunable with the `leaderl.required_active_followers` configuration option (default: 0).
- No longer set shmmax sysctls, as they are no longer required by PostgreSQL 9.5. If you are upgrading from a previous installation, remember to remove `postgresql.shmmax` and `postgresql.shmall` from your `chef-backend.rb`.
- Update configuration defaults based on customer feedback. We've changed the following configuration defaults:
  - `leaderl.health_check.interval_seconds` from 2 to 5 seconds.
  - `leaderl.leaderl_ttl_seconds` from 10 to 30 seconds.
  - `etcd.heartbeat_interval` from 100 to 500 milliseconds.
  - `etcd.election_timeout` from 1000 to 5000 milliseconds.


  The new defaults have proven to reduce spurious failovers when deploying Chef Backend to various cloud providers. If you have manually set these flags in `chef-backend.rb`, your overrides will still be respected. Please remove these options if you'd prefer to use the new defaults instead.

## Bug Fixes

- `chef-server-ctl cluster-status` shouldn't fail because of a missing leader key.
- Force a checkpoint after promoting the local PostgreSQL instance, avoiding some cases of PostgreSQL followers failing to start synchronous replication after a failover.
- `chef-backend-ctl restore` shouldn't fail with a `NoMethodError`.
- Wait for PostgreSQL to be started via a test connection rather than a simple process check during the promotion and follow processes. This fixes a number of cases where PostgreSQL would fail to promote during a failover event.
- Ensure that the leader keeps the leader key as long as it is still in the `promoting` state. This fixes some cases where the cluster would become leaderless after a failover event because the elected follower was slow to promote.
- Don't erroneously report etcd as down in `chef-backend-ctl status` in cases where etcd is up but there is no leaderl leader.
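
As a rough illustration of the settings named under Enhancements, a `/etc/chef-backend/chef-backend.rb` might contain overrides like the sketch below. The `name = value` assignment form is assumed from the usual shape of this file. The failure thresholds and the `max_bytes_behind_leader` and `required_active_followers` values are illustrative placeholders, not defaults or recommendations. The interval and timeout values match the new defaults listed in these notes, so they only need to appear if you want to pin them explicitly.

```ruby
# /etc/chef-backend/chef-backend.rb (illustrative fragment, not a complete config)

# Timing settings; the values shown equal the new defaults from these notes.
leaderl.health_check.interval_seconds = 5   # previously 2
leaderl.leaderl_ttl_seconds = 30            # previously 10
etcd.heartbeat_interval = 500               # milliseconds, previously 100
etcd.election_timeout = 5000                # milliseconds, previously 1000

# Maximum health check failures per service before a failover is triggered.
# Placeholder values chosen for illustration only.
leaderl.health_check.max_elasticsearch_failures = 3
leaderl.health_check.max_etcd_failures = 3
leaderl.health_check.max_pgsql_failures = 3

# Fail the follower health check when it lags the leader by more than this
# many bytes. Placeholder value.
leaderl.health_check.max_bytes_behind_leader = 52_428_800

# Report 503 from /leaderl until this many followers have initiated
# replication connections (default: 0). Placeholder value.
leaderl.required_active_followers = 1

# When upgrading, delete any postgresql.shmmax and postgresql.shmall lines
# from this file; they are no longer needed with PostgreSQL 9.5.
```

Leaving a setting out of the file entirely is the way to pick up the shipped default, which is why the notes suggest removing old overrides rather than editing them.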