Thursday, 28 August 2025

Systemd healthcheck with a side service and monotonic timer, auto-healing

What: work around systemd's lack of a built-in HTTP health check

  • systemd service "what-service.service"
  • systemd timer "what-service-healthcheck.timer"
    • triggers a systemd service "what-service-healthcheck.service",
      which launches a script "service_health_check.sh"
    • the script:
      • curls the health-check URL "HEALTH_CHECK_URL"
      • if KO, restarts the targeted service



what-service-healthcheck.timer

[Unit]
Description=Run health check every 15 seconds
[Timer]
# Wait 1 minute after boot before the first check
OnBootSec=1min
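# Run the check again 15 seconds after the health check service was last activated
OnUnitActiveSec=15s
# Timers default to 1 minute of accuracy; tighten it so the 15s interval is honored
AccuracySec=1s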
[Install]
WantedBy=timers.target


By default, a timer unit activates the service unit with the same name (minus the suffix), so there is no need to specify it with Unit=.
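
To make the naming rule concrete, here is a small shell sketch that derives the service a timer activates when no Unit= line is set. The helper name default_timer_target is made up for illustration; systemd does this matching internally.

```shell
#!/bin/bash
# Hypothetical helper: mirrors systemd's default rule that foo.timer
# activates foo.service when the [Timer] section has no Unit= line.
default_timer_target() {
  local timer_name="$1"
  # Strip the ".timer" suffix and append ".service".
  printf '%s.service\n' "${timer_name%.timer}"
}

# Example (not run here): ask systemd what the installed timer really triggers.
#   systemctl show -p Unit --value what-service-healthcheck.timer
#   systemctl list-timers what-service-healthcheck.timer
```

Calling default_timer_target with "what-service-healthcheck.timer" yields "what-service-healthcheck.service", which is the unit defined next.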

what-service-healthcheck.service.j2
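[Unit]
Description=Health Check for {{ what_service }}
Requires={{ what_service }}.service
# OnFailure= belongs in [Unit]. It starts the listed unit once this one enters
# the "failed" state; it does not restart a unit that is still active.
OnFailure={{ what_service }}.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/service_health_check.sh
# Retry the check before giving up (older systemd releases refuse Restart= on Type=oneshot units)
Restart=on-failure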


service_health_check.sh 
#!/bin/bash
# The health check endpoint
HEALTH_CHECK_URL="http://localhost:{{ running_port }}/health_check"
# Use curl to check the endpoint.
# --fail: Makes curl exit with a non-zero status code on HTTP errors (status 400 or above).
# --silent: Hides the progress meter.
# --max-time 2: Gives up if the whole request takes longer than 2 seconds.
# --output /dev/null: Discards the response body.
if ! curl --silent --fail --max-time 2 --output /dev/null "$HEALTH_CHECK_URL"; then
    echo "Health check failed for {{ service_name }}. Restarting..."
    # A non-zero exit marks the health check unit as failed; its OnFailure= setting takes over from there
    exit 1
fi
fi
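
One limitation worth knowing: OnFailure= only starts the listed unit, which does nothing if the target is hung but still counted as active. A variant is to let the script call systemctl restart itself. Below is a minimal sketch of that variant, not the setup used in this post; the function names, the retry count, the delay and port 8080 are all assumptions for illustration.

```shell
#!/bin/bash
# Sketch of a self-healing variant: probe the endpoint, restart on failure.
# probe, check_and_heal, the retry count and the delay are illustrative.

# probe URL: succeeds only if the endpoint answers without an HTTP error.
probe() {
  curl --silent --fail --max-time 2 --output /dev/null "$1"
}

# check_and_heal URL SERVICE [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 if the service is healthy, 1 if a restart was requested.
check_and_heal() {
  local url="$1" service="$2" attempts="${3:-3}" delay="${4:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "Health check failed for ${service} after ${attempts} attempts. Restarting..."
  systemctl restart "${service}.service"
  return 1
}

# Example (not run here; port 8080 is an assumed value):
#   check_and_heal "http://localhost:8080/health_check" what-service
```

Retrying a few times before restarting avoids bouncing the service on a single slow response, at the cost of a slightly later reaction.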



Adding through Ansible

<role>/tasks/main.yml
---

- name: Generate what-service systemd file
  ansible.builtin.template:
    src: what-service.service.j2
    dest: /etc/systemd/system/what-service.service
    mode: "0644"
  notify: Restart what-service

# The script contains Jinja2 variables, so it has to be rendered with template, not copy
- name: Render the health check script
  ansible.builtin.template:
    src: service_health_check.sh
    dest: /usr/local/bin/service_health_check.sh
    owner: root
    group: root
    mode: "0755"
  vars:
    service_name: what-service

- name: Render the health check systemd service file
  ansible.builtin.template:
    src: what-service-healthcheck.service.j2
    dest: /etc/systemd/system/what-service-healthcheck.service
    owner: root
    group: root
    mode: "0644"
  vars:
    what_service: what-service
  notify: Reload systemd

- name: Copy the health check systemd timer file
  ansible.builtin.copy:
    src: what-service-healthcheck.timer
    dest: /etc/systemd/system/what-service-healthcheck.timer
    owner: root
    group: root
    mode: "0644"
  notify: Restart what-service-healthcheck.timer

- name: Enable and start the health check timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: started
    enabled: true
    daemon_reload: true  # Ensures systemd is reloaded before starting


<role>/handlers/main.yml
---
- name: Restart what-service
  ansible.builtin.systemd:
    name: what-service
    state: restarted
    daemon_reload: true

- name: Reload systemd
  ansible.builtin.systemd:
    daemon_reload: true

- name: Restart what-service-healthcheck.timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: restarted
    daemon_reload: true
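
For a one-off host without Ansible, the tasks above boil down to a handful of commands. The sketch below is a rough manual equivalent for the health check files only; it assumes the already-rendered script and unit files sit in the current directory, and the run and install_healthcheck helpers are hypothetical names. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/bash
# Rough manual equivalent of the health check tasks (assumes rendered files
# are in the current directory). run and install_healthcheck are made-up names.

# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# install_healthcheck NAME: NAME is the monitored service, e.g. what-service.
install_healthcheck() {
  local name="$1"
  run install -o root -g root -m 0755 service_health_check.sh /usr/local/bin/service_health_check.sh
  run install -o root -g root -m 0644 "${name}-healthcheck.service" "/etc/systemd/system/${name}-healthcheck.service"
  run install -o root -g root -m 0644 "${name}-healthcheck.timer" "/etc/systemd/system/${name}-healthcheck.timer"
  # Make systemd pick up the new unit files, then enable and start the timer.
  run systemctl daemon-reload
  run systemctl enable --now "${name}-healthcheck.timer"
}

# Example (not run here):
#   DRY_RUN=1 install_healthcheck what-service
```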
