Affichage des articles dont le libellé est timer. Afficher tous les articles
Affichage des articles dont le libellé est timer. Afficher tous les articles

jeudi 28 août 2025

Systemd healtcheck with side service and monotonic timer, auto-healing

What : bypass the lack of healthcheck of systemd

  • systemd service "what-service.service"
  • systemd timer  "what-service-healthcheck.timer"
    • triggers a systemd service "what-service-healthcheck.service"
       which lanches a script "
      service_health_check.sh"
    • script that :
      • curl's heal-tcheck URL "HEALTH_CHECK_URL"
      • if KO, restart the targetted service



what-service-healthcheck.timer

[Unit]
Description=Run health check every 15 seconds
[Timer]
# Wait 1 minute after boot before the first check
OnBootSec=1min
# Run the check 15 seconds after the last time it finished
OnUnitActiveSec=15s
[Install]
WantedBy=timers.target


By default the timer service will trigger the unit service with the same name, no need to specify it.

what-service-healthcheck.service.j2
[Unit]
Description=Health Check for {{ what_service }}
Requires={{ what_service }}.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/service_health_check.sh
Restart=on-failure
OnFailure={{ what_service }}


service_health_check.sh 
#!/bin/bash
# The health check endpoint
HEALTH_CHECK_URL="http://localhost:{{ running_port }}/health_check"
# Use curl to check the endpoint.
# --fail: Makes curl exit with a non-zero status code on server errors (4xx or 5xx).
# --silent: Hides the progress meter.
# --output /dev/null: Discards the response body.
if ! curl --silent --fail --max-time 2 --output /dev/null "$HEALTH_CHECK_URL"; then
echo "Health check failed for {{ service_name }}. Restarting..."
# Restart is performed on failure from healthcheck service
exit 1
fi



Adding through ansible (to do : fix indentation, blog isn't great for this)

<role>/tasks/main.yml
--- 

- name: Generate what-service systemd file
ansible.builtin.template:
src: what-service.service.j2
dest: /etc/systemd/system/what-service.service
mode: "0755"
notify: Restart what-service

 - name: Copy the health check script
ansible.builtin.copy:
src: service_health_check.sh
dest: /usr/local/bin/service_health_check.sh
owner: root
group: root
mode: '0755'
vars: 
  service_name: what-service

- name: Copy the health check systemd service file
ansible.builtin.copy:
src: what-service-healthcheck.service
dest: /etc/systemd/system/what-service-healthcheck.service
owner: root
group: root
mode: '0644'
notify: Reload systemd

- name: Copy the health check systemd timer file
ansible.builtin.copy:
src: what-service-healthcheck.timer
dest: /etc/systemd/system/what-service-healthcheck.timer
owner: root
group: root
mode: '0644'
notify: Reload systemd

- name: Enable and start the health check timer
ansible.builtin.systemd:
name: healthcheck.timer
state: started
enabled: yes
daemon_reload: yes # Ensures systemd is reloaded before starting


<role>/handlers/main.yml
---
- name: Restart what-service 
ansible.builtin.service:
name: what-service
state: restarted
daemon_reload: true

- name: Reload systemd
ansible.builtin.service:
daemon_reload: yes

- name: Restart what-service-healthcheck.timer
ansible.builtin.service:
name: what-service-healthcheck.timer
state: restarted
daemon_reload: true