Friday, April 17, 2026

Managing kubeconfig files + download from Rancher

Managing kubeconfig files across multiple Rancher clusters

When you manage several Kubernetes clusters through Rancher, you quickly end up with a pile of kubeconfig files. Downloading them by hand from the Rancher UI is tedious, keeping track of which ones are loaded is error-prone, and testing connectivity after a VPN change or token rotation is a chore.

Here is the setup I landed on: a download script, a directory convention, auto-discovery in the shell, and a parallel connectivity tester.

  1. Directory layout
  2. Downloading kubeconfigs from Rancher
  3. Auto-discovery in .bashrc
  4. Testing connectivity
  5. Putting it all together

Directory layout

~/.kube/
  config                        # default kubectl config (GKE, kind, etc.)
  clusters/
    rancher_kubeconfig_dl.sh    # download script
    test-kubeconfigs.sh         # connectivity tester
    local.yaml                  # kubeconfig files live here
    cluster-us-east-1.yaml
    cluster-us-west-2.yaml
    cluster-eu-west-1.yaml
    cluster-ap-south-1.yaml

The scripts and kubeconfig files all live in ~/.kube/clusters/. The test script filters by kind: Config so it ignores non-kubeconfig files like itself.
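The filter is a plain grep; a minimal sketch (using a hypothetical temp dir) shows why scripts sitting next to the kubeconfigs are ignored:

```shell
# Only files containing "kind: Config" pass the filter, so scripts
# in the same directory are skipped (demo in a throwaway temp dir).
demo_dir=$(mktemp -d)
printf 'apiVersion: v1\nkind: Config\nclusters: []\n' > "$demo_dir/local.yaml"
printf '#!/usr/bin/env bash\necho helper\n' > "$demo_dir/test-kubeconfigs.sh"

for f in "$demo_dir"/*; do
  if grep -q 'kind: Config' "$f"; then
    echo "kubeconfig: $(basename "$f")"
  fi
done
# prints only: kubeconfig: local.yaml
rm -rf "$demo_dir"
```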

Downloading kubeconfigs from Rancher

Rancher exposes a generateKubeconfig action on its v3 API. The following script lists all clusters your token has access to and downloads each kubeconfig into a directory.

First, create an API key in Rancher: top-right avatar, Account & API Keys, Create API Key. You get a token like token-xxxxx:yyyyyyy.
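If the script misbehaves, the jq extraction step can be checked in isolation against a mocked /v3/clusters response (the cluster IDs below are made up):

```shell
# Mocked Rancher /v3/clusters payload; the jq filter emits "id<TAB>name"
# lines, exactly what the download loop consumes.
clusters_json='{"data":[{"id":"c-abc12","name":"cluster-us-east-1"},{"id":"local","name":"local"}]}'
echo "$clusters_json" | jq -r '.data[] | .id + "\t" + .name'
```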

#!/usr/bin/env bash
# rancher_kubeconfig_dl.sh
# Downloads all kubeconfig YAML files from a Rancher instance.
#
# Usage:
#   RANCHER_TOKEN=token-xxxxx:yyyyyyy ./rancher_kubeconfig_dl.sh
#
# To get a token manually:
#   Rancher UI -> top-right avatar -> Account & API Keys -> Create API Key

set -euo pipefail

RANCHER_URL="${RANCHER_URL:-https://rancher.example.com}"
RANCHER_TOKEN="${RANCHER_TOKEN:-}"
OUTPUT_DIR="${OUTPUT_DIR:-.}"

# -- Auth --
if [[ -z "$RANCHER_TOKEN" ]]; then
  echo "ERROR: Set RANCHER_TOKEN (e.g. token-xxxxx:yyyyyyy)"
  echo "       Rancher UI -> Avatar -> Account & API Keys -> Create API Key"
  exit 1
fi

CURL_OPTS=(-sSf -H "Authorization: Bearer ${RANCHER_TOKEN}")

# -- Fetch cluster list --
echo "-> Fetching cluster list from ${RANCHER_URL} ..."
clusters_json=$(curl "${CURL_OPTS[@]}" "${RANCHER_URL}/v3/clusters")

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
echo "$clusters_json" | jq -r '.data[] | .id + "\t" + .name' > "$tmp"

count=$(wc -l < "$tmp" | tr -d ' ')
[[ "$count" -eq 0 ]] && { echo "No clusters found (check your token permissions)."; exit 1; }
echo "-> Found ${count} cluster(s)"
mkdir -p "$OUTPUT_DIR"

# -- Download kubeconfig per cluster --
while IFS=$'\t' read -r id name; do
  safe=$(echo "$name" | tr -cs '[:alnum:]-_.' '-' | sed 's/-$//')
  out="${OUTPUT_DIR}/${safe}.yaml"
  echo "  ${name} (${id}) -> ${out}"
  curl "${CURL_OPTS[@]}" \
    -X POST \
    "${RANCHER_URL}/v3/clusters/${id}?action=generateKubeconfig" \
    | jq -r '.config' \
    > "$out"
done < "$tmp"

echo ""
echo "Done. Kubeconfigs saved to: ${OUTPUT_DIR}/"
ls -lh "$OUTPUT_DIR"

Run it once, or whenever you add a new cluster to Rancher:

RANCHER_TOKEN=token-xxxxx:yyyyyyy ./rancher_kubeconfig_dl.sh

Each cluster gets its own file, named after the cluster. No manual copy-pasting from the Rancher UI.
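The tr/sed pipeline in the script turns an arbitrary cluster name into a safe filename; a quick sketch of its behavior (the names are hypothetical):

```shell
# Unsafe characters collapse to single dashes; a trailing dash is stripped.
sanitize() { echo "$1" | tr -cs '[:alnum:]-_.' '-' | sed 's/-$//'; }

sanitize "cluster-us-east-1"   # -> cluster-us-east-1
sanitize "prod cluster (eu)"   # -> prod-cluster-eu
```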

Auto-discovery in .bashrc

kubectl merges all files listed in the KUBECONFIG environment variable (colon-separated). Instead of maintaining a hardcoded list that goes stale every time you add or remove a cluster, use find to discover them at shell startup:

# In ~/.bashrc or ~/.zshrc
export KUBECONFIG=~/.kube/config:$(find ~/.kube/clusters -name '*.yaml' 2>/dev/null | tr '\n' ':' | sed 's/:$//')

This starts with the default ~/.kube/config (for GKE, kind, minikube, or anything else) and appends every YAML file found in the clusters directory. find, tr, and sed are all POSIX — this works on macOS, Linux, and BSDs. Avoid find -printf which is a GNU extension and won't work on macOS.

Drop a new file in, open a new terminal, and kubectx sees it immediately. Remove a file and it disappears. No editing required.
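The colon-joining itself is easy to see in isolation (hypothetical temp dir standing in for ~/.kube/clusters; sorted here only so the demo output is deterministic):

```shell
# Build the colon-separated list the way the export line does.
dir=$(mktemp -d)
touch "$dir/a.yaml" "$dir/b.yaml"
extra=$(find "$dir" -name '*.yaml' | sort | tr '\n' ':' | sed 's/:$//')
echo "$extra"   # -> <dir>/a.yaml:<dir>/b.yaml, no trailing colon
rm -rf "$dir"
```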

Testing connectivity

After a VPN reconnect, a token rotation, or just to check that everything is reachable, run the test script. It finds all kubeconfig files in a directory, extracts every context from each file, and tests them all in parallel:

#!/usr/bin/env bash
# test-kubeconfigs.sh
# Test connectivity for all kubeconfig YAML files in a directory.
# Usage: ./test-kubeconfigs.sh [directory]
#   directory: path containing kubeconfig YAML files (default: script directory)

set -euo pipefail
shopt -s nullglob

dir="${1:-$(dirname "$0")}"

if [[ ! -d "$dir" ]]; then
  echo "Error: '$dir' is not a directory" >&2
  exit 1
fi

test_context() {
  local f="$1" ctx="$2" name
  name="$(basename "$f")"
  if kubectl --kubeconfig="$f" --context="$ctx" cluster-info --request-timeout=5s &>/dev/null; then
    echo "OK    $name  context=$ctx"
  else
    echo "FAIL  $name  context=$ctx"
  fi
}

pids=()
found=0

for f in "$dir"/*.yaml "$dir"/*.yml; do
  [[ -f "$f" ]] || continue
  grep -q 'kind: Config' "$f" 2>/dev/null || continue
  found=$((found + 1))

  # "|| true" so a broken kubeconfig yields a SKIP instead of killing
  # the whole script via set -e
  contexts=$(kubectl --kubeconfig="$f" config get-contexts -o name 2>/dev/null || true)
  if [[ -z "$contexts" ]]; then
    echo "SKIP  $(basename "$f")  (no contexts found)"
    continue
  fi

  for ctx in $contexts; do
    test_context "$f" "$ctx" &
    pids+=($!)
  done
done

if [[ $found -eq 0 ]]; then
  echo "No kubeconfig files (kind: Config) found in $dir"
  exit 1
fi

fail=0
for pid in "${pids[@]}"; do
  wait "$pid" || fail=$((fail + 1))
done

pass=$(( ${#pids[@]} - fail ))
echo ""
echo "Results: $pass ok, $fail failed (from $found files)"
[[ $fail -eq 0 ]]

Each test runs as a background job, so checking six clusters takes as long as the slowest one (typically the 5-second timeout for an unreachable cluster), not six times that.

$ ./test-kubeconfigs.sh
OK    cluster-us-east-1.yaml   context=cluster-us-east-1
OK    cluster-us-west-2.yaml   context=cluster-us-west-2
OK    cluster-eu-west-1.yaml   context=cluster-eu-west-1
FAIL  cluster-ap-south-1.yaml  context=cluster-ap-south-1
OK    local.yaml               context=local

Results: 4 ok, 1 failed (from 5 files)
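The collect-and-wait bookkeeping the script relies on can be seen in isolation, with a mock check function standing in for kubectl:

```shell
# Run checks in the background, collect PIDs, then count failures
# from the exit codes that wait reports.
mock_check() { [ "$1" -eq 0 ]; }   # stand-in for the kubectl probe

pids=()
for rc in 0 1 0; do
  mock_check "$rc" &
  pids+=($!)
done

fail=0
for pid in "${pids[@]}"; do
  wait "$pid" || fail=$((fail + 1))
done
echo "ok=$(( ${#pids[@]} - fail )) fail=$fail"   # -> ok=2 fail=1
```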

Putting it all together

The workflow is:

  1. Run rancher_kubeconfig_dl.sh once (or after adding clusters in Rancher) to download all kubeconfigs into ~/.kube/clusters/.
  2. Open a new terminal. The find-based KUBECONFIG export picks up all files automatically. kubectx lists every context.
  3. Run test-kubeconfigs.sh to verify connectivity to all clusters in parallel.

No manual list to maintain, no UI clicking, and a quick way to verify everything is reachable.

Wednesday, April 1, 2026

Reverse-engineering Ropvacnic S1 vacuum for homeassistant integration

Integrating the Ropvacnic S1 into Home Assistant via LocalTuya

The Ropvacnic S1 is a Tuya-based robot vacuum. Getting it to work locally in Home Assistant requires some digging — the DP numbers aren't documented anywhere, and LocalTuya has a bug in its locate function. This guide documents everything I found.

Why local control? Faster response, no cloud dependency, works if Tuya servers go down. The device still sends data to Tuya's cloud, but Home Assistant commands go directly over your LAN.

Setup: Home Assistant OS 2026.3.4 · LocalTuya 5.2.3 · Protocol 3.4 · Region: Western America


Step 1 — Tuya IoT Platform Setup

Create an account at iot.tuya.com (not tuya.com — click "Developer Platform" at the bottom). Create a Cloud Project with these settings:

  • Industry: Smart Home
  • Development Method: Smart Home
  • Data Center: Western America (for Canada/USA accounts)

Important: The Data Center must match your Tuya Smart app region. A mismatch causes a "Data center inconsistency" error when scanning the QR code.

Under Devices → Link Tuya App Account, scan the QR code with your Tuya Smart app. Then note:

  • Access ID — from the Overview tab
  • Access Secret — from the Overview tab
  • UID — from the linked account (starts with az...)

Step 2 — Discover DPs with tinytuya

A DP (Data Point) is a numbered channel that controls one specific function of the device. Install tinytuya on your computer to identify them all:

pip3 install tinytuya
python3 -m tinytuya wizard
# Enter your Access ID, Secret, UID and region (us)
 
python3 -m tinytuya snapshot
# Poll: Y → shows all live DP values

Complete DP Map — Ropvacnic S1

DP   Code                  Type     Values
1    switch                Boolean  true / false
2    switch_go             Boolean  true / false
3    mode                  Enum     standby, random, wall_follow, spiral, chargego
4    direction_control     Enum     forward, backward, turn_left, turn_right, stop
5    status                Enum     standby, smart_clean, goto_charge, charging, charge_done, paused
6    residual_electricity  Integer  0–100 (%)
7    clean_time            Integer  minutes
9    clean_area            Integer
13   seek (locate)         Boolean  true / false
14   suction               Enum     gentle, normal, strong
17   edge_brush            Integer  0–100 (%)
18   filter                Integer  0–100 (%)
20   water_control         Enum     closed, low, middle, high

Note: DP 13 (seek/locate) is not visible in the standard tinytuya output. It was found by brute-force testing DPs 10–16 using d.set_value(dp, True) and listening for the vacuum's beep.


Step 3 — LocalTuya Entity Configuration

Install LocalTuya via HACS, restart HA, then go to Settings → Devices & Services → Add Integration → LocalTuya.

Set Manual DPS to: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25

Configure the entity with these values:

Field                  Value
ID (status DP)         5
Power DP (powergo_dp)  1
Mode DP                3
Battery DP             6
Fan Speed DP           14
Clean Time DP          7
Clean Area DP          9
Locate DP              13
Fault DP               (leave empty)
Idle Status            standby,sleep
Docked Status          charging,chargecompleted,charge_done
Returning Status       goto_charge
Modes list             standby,random,wall_follow,spiral,chargego
Return home mode       chargego
Fan speeds list        gentle,normal,strong
Pause state            paused
Stop status            standby

Step 4 — Fix vacuum.py (LocalTuya bug)

LocalTuya 5.2.3 has a bug in async_locate — it sends an empty string "" instead of True, so the locate button does nothing. Fix it via the Terminal add-on:

sed -i 's/set_dp("", self._config\[CONF_LOCATE_DP\])/set_dp(True, self._config[CONF_LOCATE_DP])/g' \
  /config/custom_components/localtuya/vacuum.py
 
# Verify
grep -A3 "async_locate" /config/custom_components/localtuya/vacuum.py

Step 5 — Fix config via Terminal (if GUI doesn't save)

The LocalTuya GUI sometimes doesn't persist certain changes (notably mode_dp and id). Edit the config file directly:

# Backup first!
cp /config/.storage/core.config_entries \
   /config/.storage/core.config_entries.backup
 
# Status DP = 5 (reads DP 5 for state)
sed -i 's/"id":1,"idle_status_value":"standby,sleep"/"id":5,"idle_status_value":"standby,sleep"/g' \
  /config/.storage/core.config_entries
 
# powergo_dp must be 1 (start/pause)
sed -i 's/"powergo_dp":5/"powergo_dp":1/g' \
  /config/.storage/core.config_entries
 
# mode_dp = 3 (stop + return to base)
sed -i 's/"mode_dp":0/"mode_dp":3/g' \
  /config/.storage/core.config_entries
 
# locate_dp = 13
sed -i 's/"locate_dp":0/"locate_dp":13/g' \
  /config/.storage/core.config_entries
 
# Remove fault_dp entirely (causes false errors)
sed -i 's/,"fault_dp":[0-9]*//g' \
  /config/.storage/core.config_entries
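Since a mangled core.config_entries can break Home Assistant, it's worth previewing the substitutions on a mock snippet first (the DP values mirror the edits above):

```shell
# Dry-run the sed edits on a fake config fragment instead of the real file.
cfg='{"id":1,"powergo_dp":5,"mode_dp":0,"locate_dp":0,"fault_dp":12}'
echo "$cfg" | sed -e 's/"powergo_dp":5/"powergo_dp":1/' \
                  -e 's/"mode_dp":0/"mode_dp":3/' \
                  -e 's/"locate_dp":0/"locate_dp":13/' \
                  -e 's/,"fault_dp":[0-9]*//'
# -> {"id":1,"powergo_dp":1,"mode_dp":3,"locate_dp":13}
```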

After edits, reload LocalTuya: Settings → Devices & Services → LocalTuya → 3 dots → Reload — no full restart needed.


Final Result

After completing all steps, the Ropvacnic S1 is fully controllable from Home Assistant with local-only communication:

  • State correctly shows Docked, Cleaning, Returning, and Paused
  • Start, Pause, Stop, and Return to Base all work
  • Locate (beep) works after the vacuum.py fix
  • Battery percentage displayed in real time
  • Fan speed (gentle / normal / strong) selectable from HA
  • Clean time and area tracked per session

Friday, November 21, 2025

UN / UNODC e-learning



 https://elearningunodc.org/local/pages/?id=3

Thursday, September 11, 2025

ImageMagick: reduce size of JPG files by 50%

 

# -resize 50% halves each pixel dimension; the output dir must exist
mkdir -p resized
for file in *.jpg; do magick "$file" -resize 50% "resized/$file"; done
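Note that -resize 50% halves each pixel dimension, which usually cuts the file size by much more than half. To target an actual byte size instead, ImageMagick's jpeg:extent define lowers the JPEG quality until the output fits; a sketch (the 250kb target is arbitrary):

```shell
# Cap the output *file size* instead of halving dimensions.
# Assumes ImageMagick 7's "magick"; with IM 6, use "convert" instead.
mkdir -p resized
for file in *.jpg; do
  magick "$file" -define jpeg:extent=250kb "resized/$file"
done
```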

Monday, September 8, 2025

Ansible : GCP Google Cloud (Compute) ansible dynamic inventory with cache

Inventory definition for GCP Compute. Note that the inventory file name must end in gcp.yml or gcp_compute.yml, otherwise the plugin refuses to parse it.

---
plugin: google.cloud.gcp_compute
# https://docs.ansible.com/ansible/latest/collections/google/cloud/gcp_compute_inventory.html
# https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/main/docs/environment_configure.md#google-cloud-platform-gcp

projects:
  - project_id

auth_kind: serviceaccount
# must match `ansible_user` below, cf. other article on how to set this up
service_account_file: ./gcp-sa.json

filters:
  # only return running instances, we won't be able to connect to stopped instances
  - status = RUNNING
  # for example, only return compute instances with label foo = foobar
  - labels.foo = foobar

keyed_groups:
  - key: labels
    prefix: label

hostnames:
  - name
  - public_ip
  - private_ip

compose:
  # <ansible variable to be set>: <data from gcp discovery>
  # Set an inventory parameter to use the Public IP address to connect to the host
  # ansible_host: public_ip
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP
  ansible_user: "'sa_115528571027174573787'"

  # GCP compute label "activate_this" value => ansible variable "run_this" value
  run_this: labels['activate_this']
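The title promises caching, which the inventory above doesn't configure yet. The plugin supports Ansible's standard inventory-cache options; a minimal block using the jsonfile cache plugin (path and timeout are arbitrary choices) would be:

```yaml
# Cache discovery results so repeated runs don't hit the GCP API every time
cache: true
cache_plugin: ansible.builtin.jsonfile
cache_connection: /tmp/ansible_gcp_inventory_cache
cache_timeout: 600   # seconds before a forced refresh
```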


Thursday, August 28, 2025

Systemd healthcheck with a side service and monotonic timer, auto-healing

What: work around systemd's lack of a built-in healthcheck

  • systemd service "what-service.service"
  • systemd timer "what-service-healthcheck.timer"
    • triggers a systemd service "what-service-healthcheck.service",
      which launches the script "service_health_check.sh"
    • the script:
      • curl's the health check URL "HEALTH_CHECK_URL"
      • if KO, exits non-zero so the targeted service is started again



what-service-healthcheck.timer

[Unit]
Description=Run health check every 15 seconds
[Timer]
# Wait 1 minute after boot before the first check
OnBootSec=1min
# Run the check 15 seconds after the last time it finished
OnUnitActiveSec=15s
[Install]
WantedBy=timers.target


By default a timer triggers the service unit with the same name, so there is no need to specify it.

what-service-healthcheck.service.j2

[Unit]
Description=Health Check for {{ what_service }}
Requires={{ what_service }}.service
# OnFailure= belongs in [Unit], not [Service]; when the check fails,
# systemd starts the target service again. Note that Restart= is not
# allowed for Type=oneshot units, so it must not be set here.
OnFailure={{ what_service }}.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/service_health_check.sh

service_health_check.sh

#!/bin/bash
# The health check endpoint
HEALTH_CHECK_URL="http://localhost:{{ running_port }}/health_check"
# Use curl to check the endpoint.
# --fail: Makes curl exit with a non-zero status code on server errors (4xx or 5xx).
# --silent: Hides the progress meter.
# --output /dev/null: Discards the response body.
if ! curl --silent --fail --max-time 2 --output /dev/null "$HEALTH_CHECK_URL"; then
  echo "Health check failed for {{ service_name }}. Restarting..."
  # Restart is performed on failure from healthcheck service
  exit 1
fi
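The curl exit-status logic can be exercised without a live endpoint by using file:// URLs as a stand-in (health_ok is a hypothetical wrapper, not part of the deployed script):

```shell
# An existing file:// URL makes curl exit 0 ("healthy"); a missing one
# makes it exit non-zero, like an unreachable HTTP health endpoint.
health_ok() { curl --silent --fail --max-time 2 --output /dev/null "$1"; }

probe=$(mktemp)
if health_ok "file://$probe"; then echo "healthy"; fi
if ! health_ok "file://$probe.missing"; then echo "unhealthy -> exit 1 -> OnFailure"; fi
rm -f "$probe"
```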



Deploying it with Ansible:

<role>/tasks/main.yml

---
- name: Generate what-service systemd file
  ansible.builtin.template:
    src: what-service.service.j2
    dest: /etc/systemd/system/what-service.service
    mode: "0644"
  notify: Restart what-service

# the script contains Jinja ({{ running_port }}, {{ service_name }}),
# so it must go through template, not copy
- name: Template the health check script
  ansible.builtin.template:
    src: service_health_check.sh
    dest: /usr/local/bin/service_health_check.sh
    owner: root
    group: root
    mode: "0755"
  vars:
    service_name: what-service

- name: Template the health check systemd service file
  ansible.builtin.template:
    src: what-service-healthcheck.service.j2
    dest: /etc/systemd/system/what-service-healthcheck.service
    owner: root
    group: root
    mode: "0644"
  notify: Reload systemd

- name: Copy the health check systemd timer file
  ansible.builtin.copy:
    src: what-service-healthcheck.timer
    dest: /etc/systemd/system/what-service-healthcheck.timer
    owner: root
    group: root
    mode: "0644"
  notify: Reload systemd

- name: Enable and start the health check timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: started
    enabled: yes
    daemon_reload: yes # Ensures systemd is reloaded before starting


<role>/handlers/main.yml

---
# ansible.builtin.service has no daemon_reload option; use the systemd module
- name: Restart what-service
  ansible.builtin.systemd:
    name: what-service
    state: restarted
    daemon_reload: true

- name: Reload systemd
  ansible.builtin.systemd:
    daemon_reload: true

- name: Restart what-service-healthcheck.timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: restarted
    daemon_reload: true