What: work around systemd's lack of a built-in health check
- systemd service "what-service.service"
- systemd timer "what-service-healthcheck.timer"
  - triggers a systemd service "what-service-healthcheck.service", which launches a script "service_health_check.sh"
- script that:
  - curls the health-check URL "HEALTH_CHECK_URL"
  - if KO, restarts the targeted service
what-service-healthcheck.timer
[Unit]
Description=Run health check every 15 seconds
[Timer]
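# Wait 1 minute after boot before the first check
OnBootSec=1min
# Run the check again 15 seconds after the health check service was last started
OnUnitActiveSec=15s
# Timers default to an accuracy of 1 minute; tighten it so the 15-second interval is honoured
AccuracySec=1s
[Install]
WantedBy=timers.target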
By default, a timer unit activates the service unit that has the same name (with ".service" instead of ".timer"), so there is no need to set Unit= explicitly.
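The naming rule can be illustrated with a small shell sketch. The function name `default_unit_for_timer` is hypothetical (it is not a systemd tool); it only mirrors the rule that a timer activates the service of the same name unless Unit= is set.

```bash
#!/bin/bash
# Hypothetical helper: print the service a timer activates by default,
# i.e. the same name with ".timer" replaced by ".service".
default_unit_for_timer() {
    local timer="$1"
    printf '%s.service\n' "${timer%.timer}"
}

# prints: what-service-healthcheck.service
default_unit_for_timer "what-service-healthcheck.timer"
```

On a live host, `systemctl list-timers` lists each timer together with the unit it activates, which is a quick way to confirm the pairing.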
what-service-healthcheck.service.j2
[Unit]
Description=Health Check for {{ what_service }}
# After= only orders the check; Requires= would make a restart of the
# monitored service also restart this check.
After={{ what_service }}.service
[Service]
Type=oneshot
# The script restarts the service itself: OnFailure= (a [Unit] option) only
# starts a unit, which does nothing while that unit is still active but unresponsive.
ExecStart=/usr/local/bin/service_health_check.sh
service_health_check.sh
#!/bin/bash
# The health check endpoint
HEALTH_CHECK_URL="http://localhost:{{ running_port }}/health_check"
# Use curl to check the endpoint.
# --fail: Makes curl exit with a non-zero status code on HTTP errors (4xx or 5xx).
# --silent: Hides the progress meter.
# --max-time 2: Gives up after 2 seconds.
# --output /dev/null: Discards the response body.
if ! curl --silent --fail --max-time 2 --output /dev/null "$HEALTH_CHECK_URL"; then
    echo "Health check failed for {{ service_name }}. Restarting..."
    systemctl restart "{{ service_name }}.service"
    # Exit non-zero so the failed check shows up in the unit status and the journal
    exit 1
fi
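A single failed request can be a transient blip, so one possible refinement is to retry a few times before treating the service as unhealthy. The sketch below is an assumption layered on the script above, not part of the original setup: `check_with_retries`, the attempt count, and the delay are hypothetical names and values.

```bash
#!/bin/bash
# Hypothetical helper: run a check command up to N times, pausing between
# attempts, and succeed as soon as one attempt succeeds.
check_with_retries() {
    local attempts="$1" delay="$2"
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example use with the same curl options as the script (the URL variable comes from it):
# check_with_retries 3 1 curl --silent --fail --max-time 2 --output /dev/null "$HEALTH_CHECK_URL"
```

With 3 attempts, a 2-second curl timeout, and a 1-second pause, the worst case is about 8 seconds, which stays under the 15-second timer interval.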
Deploying with Ansible
<role>/tasks/main.yml
---
# The script and the service unit contain Jinja2 variables, so they are rendered
# with ansible.builtin.template (source files go in <role>/templates/).
# running_port must also be defined, for example in the role defaults.
- name: Install the health check script
  ansible.builtin.template:
    src: service_health_check.sh
    dest: /usr/local/bin/service_health_check.sh
    owner: root
    group: root
    mode: '0755'
  vars:
    service_name: what-service
- name: Install the health check systemd service file
  ansible.builtin.template:
    src: what-service-healthcheck.service.j2
    dest: /etc/systemd/system/what-service-healthcheck.service
    owner: root
    group: root
    mode: '0644'
  vars:
    what_service: what-service
  notify: Reload systemd
- name: Copy the health check systemd timer file
  ansible.builtin.copy:
    src: what-service-healthcheck.timer
    dest: /etc/systemd/system/what-service-healthcheck.timer
    owner: root
    group: root
    mode: '0644'
  notify: Reload systemd
- name: Enable and start the health check timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: started
    enabled: true
    daemon_reload: true  # Ensures systemd is reloaded before starting
<role>/handlers/main.yml
---
# daemon_reload is an option of ansible.builtin.systemd, not ansible.builtin.service
- name: Restart what-service
  ansible.builtin.systemd:
    name: what-service
    state: restarted
    daemon_reload: true
- name: Reload systemd
  ansible.builtin.systemd:
    daemon_reload: true
- name: Restart what-service-healthcheck.timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: restarted
    daemon_reload: true