Thursday, September 11, 2025

ImageMagick: reduce the dimensions of JPG files by 50%

Resize every JPG in the current directory to half its width and height (the output directory must exist):

mkdir -p resized
for file in *.jpg; do magick "$file" -resize 50% "resized/$file"; done
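
Note that -resize 50% halves the dimensions, not the file size. If the goal is a target output size, ImageMagick's JPEG writer can cap it directly; a minimal sketch (the 300kb target is an arbitrary example):

# Recompress toward a maximum output file size (JPEG only)
magick input.jpg -define jpeg:extent=300kb output.jpg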

Monday, September 8, 2025

Ansible: GCP Google Cloud (Compute) dynamic inventory with cache

Inventory definition for GCP Compute:

---
plugin: google.cloud.gcp_compute
# https://docs.ansible.com/ansible/latest/collections/google/cloud/gcp_compute_inventory.html
# https://gitlab.com/gitlab-org/gitlab-environment-toolkit/-/blob/main/docs/environment_configure.md#google-cloud-platform-gcp

projects:
  - project_id

auth_kind: serviceaccount
# must match `ansible_user` below, cf. other article on how to set this up
service_account_file: ./gcp-sa.json

filters:
  # only return running instances, we won't be able to connect to stopped instances
  - status = RUNNING
  # for example, only return compute instances with label foo = foobar
  - labels.foo = foobar

keyed_groups:
  - key: labels
    prefix: label

hostnames:
  - name
  - public_ip
  - private_ip

compose:
  # <ansible variable to set>: <expression over the GCP discovery data>
  # Set an inventory parameter to use the public IP address to connect to the host
  # ansible_host: public_ip
  ansible_host: networkInterfaces[0].accessConfigs[0].natIP
  ansible_user: "'sa_115528571027174573787'"

  # GCP compute label "activate_this" value => ansible variable "run_this" value
  run_this: labels['activate_this']
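
The cache from the title comes from Ansible's generic inventory cache, which gcp_compute supports; a minimal sketch added to the same file, using the jsonfile cache plugin (timeout and path are arbitrary examples):

cache: true
cache_plugin: ansible.builtin.jsonfile
cache_timeout: 3600
cache_connection: /tmp/gcp_inventory_cache

The plugin only picks up inventory files whose name ends in gcp.yml / gcp.yaml (e.g. inventory.gcp.yml); `ansible-inventory -i inventory.gcp.yml --graph` is a quick way to inspect the generated groups.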


Thursday, August 28, 2025

Systemd healthcheck with a companion service and monotonic timer, auto-healing

What: work around systemd's lack of a built-in healthcheck.

  • systemd service "what-service.service"
  • systemd timer "what-service-healthcheck.timer"
    • triggers a systemd service "what-service-healthcheck.service",
      which launches a script "service_health_check.sh"
    • the script:
      • curls the healthcheck URL "HEALTH_CHECK_URL"
      • if KO, restarts the targeted service



what-service-healthcheck.timer

[Unit]
Description=Run health check every 15 seconds
[Timer]
# Wait 1 minute after boot before the first check
OnBootSec=1min
# Run the check 15 seconds after the last time it finished
OnUnitActiveSec=15s
[Install]
WantedBy=timers.target


By default a timer activates the service unit with the same name, so there is no need to specify it (otherwise, set Unit= in the [Timer] section).

what-service-healthcheck.service.j2
[Unit]
Description=Health Check for {{ what_service }}
Requires={{ what_service }}.service
# OnFailure= is a [Unit] option (not [Service]): when this check exits non-zero,
# systemd activates the listed unit. No Restart= here: the timer re-runs the check.
OnFailure={{ what_service }}.service
[Service]
Type=oneshot
ExecStart=/usr/local/bin/service_health_check.sh
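
One caveat: OnFailure= starts the listed unit, and starting an already-active unit is a no-op, so a wedged-but-still-running {{ what_service }} would not actually be restarted. A safer target is a dedicated restart unit; a hypothetical helper sketch:

what-service-restart.service
[Unit]
Description=Restart {{ what_service }} after a failed health check
[Service]
Type=oneshot
ExecStart=/bin/systemctl restart {{ what_service }}.service

and then OnFailure={{ what_service }}-restart.service in the healthcheck unit.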


service_health_check.sh.j2 (templated, see the Ansible task below)
#!/bin/bash
# The health check endpoint
HEALTH_CHECK_URL="http://localhost:{{ running_port }}/health_check"
# Use curl to check the endpoint.
# --fail: makes curl exit with a non-zero status code on server errors (4xx or 5xx).
# --silent: hides the progress meter.
# --output /dev/null: discards the response body.
if ! curl --silent --fail --max-time 2 --output /dev/null "$HEALTH_CHECK_URL"; then
  echo "Health check failed for {{ service_name }}. Restarting..."
  # Exiting non-zero fails the healthcheck unit, which triggers its OnFailure= hook
  exit 1
fi
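

Quick manual check once the units are installed (standard systemctl commands):

systemctl enable --now what-service-healthcheck.timer
systemctl list-timers what-service-healthcheck.timer
systemctl status what-service-healthcheck.service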



Deploying the units through Ansible:

<role>/tasks/main.yml
---
- name: Generate what-service systemd file
  ansible.builtin.template:
    src: what-service.service.j2
    dest: /etc/systemd/system/what-service.service
    mode: "0644"
  notify: Restart what-service

# template (not copy), so the Jinja variables in the script are rendered
- name: Generate the health check script
  ansible.builtin.template:
    src: service_health_check.sh.j2
    dest: /usr/local/bin/service_health_check.sh
    owner: root
    group: root
    mode: '0755'
  vars:
    service_name: what-service

- name: Generate the health check systemd service file
  ansible.builtin.template:
    src: what-service-healthcheck.service.j2
    dest: /etc/systemd/system/what-service-healthcheck.service
    owner: root
    group: root
    mode: '0644'
  notify: Reload systemd

- name: Copy the health check systemd timer file
  ansible.builtin.copy:
    src: what-service-healthcheck.timer
    dest: /etc/systemd/system/what-service-healthcheck.timer
    owner: root
    group: root
    mode: '0644'
  notify: Reload systemd

- name: Enable and start the health check timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: started
    enabled: true
    daemon_reload: true # ensures systemd is reloaded before starting


<role>/handlers/main.yml
---
# the systemd module is used here because ansible.builtin.service
# has no daemon_reload option
- name: Restart what-service
  ansible.builtin.systemd:
    name: what-service
    state: restarted
    daemon_reload: true

- name: Reload systemd
  ansible.builtin.systemd:
    daemon_reload: true

- name: Restart what-service-healthcheck.timer
  ansible.builtin.systemd:
    name: what-service-healthcheck.timer
    state: restarted
    daemon_reload: true

Thursday, July 24, 2025

apt info - Ansible tasks + roles to install apt_info.py automatically alongside node-exporter + Grafana dashboard

Create a file with OpenMetrics values, so that it is exported along with the node-exporter metrics.

=> a script runs every 12h, reports the status of apt packages to upgrade, and writes the result to /var/lib/node_exporter/apt_info.prom,

which is ingested by Prometheus when it scrapes node-exporter.

The metrics are used by a Grafana dashboard available here: https://grafana.com/grafana/dashboards/23777-apt-ugrades/


```
---
- name: Monitoring probes - setup exporters running on each server
  hosts: all
  become: true

  roles:
    # https://github.com/prometheus-community/ansible/tree/main/roles/node_exporter
    # NB: roles always run before tasks, wherever they are declared in the play
    - role: prometheus.prometheus.node_exporter
      # node_exporter_textfile_dir: "/var/lib/node_exporter" # default

  tasks:
    # https://github.com/ncabatoff/process-exporter
    - name: Install .deb package of process-exporter
      ansible.builtin.apt:
        deb: https://github.com/ncabatoff/process-exporter/releases/download/v0.8.3/process-exporter_0.8.3_linux_amd64.deb

    - name: Download and install apt_info.py
      ansible.builtin.get_url:
        url: https://raw.githubusercontent.com/prometheus-community/node-exporter-textfile-collector-scripts/refs/heads/master/apt_info.py
        dest: /usr/local/bin/apt_info.py
        mode: '0755'

    - name: Install apt_info.py dependencies via apt
      ansible.builtin.apt:
        name:
          - python3-prometheus-client
          - python3-apt
          - cron
        state: present
        update_cache: true

    - name: Add a cron job to run apt_info.py every 12 hours
      ansible.builtin.cron:
        name: "Run apt_info.py every 12 hours"
        minute: "0"
        hour: "*/12"
        job: "/usr/local/bin/apt_info.py > /var/lib/node_exporter/apt_info.prom"
      ignore_errors: "{{ ansible_check_mode }}"

    - name: Ensure APT auto update is enabled
      ansible.builtin.copy:
        dest: /etc/apt/apt.conf.d/99_auto_apt_update.conf
        content: 'APT::Periodic::Update-Package-Lists "1";'
        owner: root
        group: root
        mode: '0644'
```
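
To verify once the cron has run at least once (9100 is node-exporter's default port; apt_info.py emits metrics prefixed apt_, e.g. apt_upgrades_pending):

curl -s http://localhost:9100/metrics | grep '^apt_'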

Thursday, May 29, 2025

SRE & monitoring of distributed systems


Two commonly recommended monitoring methods: USE (Utilization, Saturation, Errors, per resource, from Brendan Gregg) and RED (Rate, Errors, Duration, per service, from Tom Wilkie). A PromQL sketch of RED follows below.
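
A minimal RED sketch in PromQL, assuming the service exposes the conventional http_requests_total counter and http_request_duration_seconds histogram (metric and label names are illustrative, not from the original note):

# Rate: requests per second, averaged over 5 minutes
sum(rate(http_requests_total[5m]))

# Errors: fraction of requests answered with a 5xx
sum(rate(http_requests_total{code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m]))

# Duration: 95th-percentile latency
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

USE is the resource-side counterpart, typically built on node-exporter CPU, memory, and disk metrics.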

Wednesday, May 28, 2025

Some cryptographic references and blockchain applications

 

Preface: [...] This book is about exactly that: constructing practical cryptosystems for which we can argue security under plausible assumptions. The book covers many constructions for different tasks in cryptography. For each task we define a precise security goal that we aim to achieve and then present constructions that achieve the required goal. To analyze the constructions, we develop a unified framework for doing cryptographic proofs. A reader who masters this framework will be capable of applying it to new constructions that may not be covered in the book.[...]

 

Abstract: We construct new multi-signature schemes that provide new functionality. Our schemes are designed to reduce the size of the Bitcoin blockchain, but are useful in many other settings where multi-signatures are needed. All our constructions support both signature compression and public-key aggregation. Hence, to verify that a number of parties signed a common message m, the verifier only needs a short multi-signature, a short aggregation of their public keys, and the message m. We give new constructions that are derived from Schnorr signatures and from BLS signatures. Our constructions are in the plain public key model, meaning that users do not need to prove knowledge or possession of their secret key.

 

Intro: Consensus algorithm is one of the most important components in blockchain. Harmony Blockchain achieves consensus through the Fast Byzantine Fault Tolerance (FBFT) algorithm. In FBFT, instead of asking all validators to broadcast their votes, the leader runs a multi-signature signing process to collect the validators’ votes in a O(1)-sized multi-signature and then broadcast it to all validators. Consensus is reached when all the validators validate the aggregated signature against the aggregated public keys for this round of consensus.


 

(misc, to edit / to format) Prometheus sandbox - demo / Prometheus relabeling tools & references / Grafana demo


* The Art of Metric Relabeling in Prometheus:
https://heiioncall.com/guides/the-art-of-metric-relabeling-in-prometheus

* Relabeler, an online testing tool:
https://relabeler.promlabs.com/

* Relabeling cookbook (mostly compatible with Prometheus too):
https://docs.victoriametrics.com/victoriametrics/relabeling/#how-to-remove-labels-from-targets
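
As a reminder of the syntax these tools manipulate, a minimal sketch (job, target, and regex are arbitrary examples): metric_relabel_configs dropping the Go runtime series from a node-exporter scrape:

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
    metric_relabel_configs:
      # drop every metric whose name starts with go_
      - source_labels: [__name__]
        regex: 'go_.*'
        action: drop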


* Open / demo instance of Grafana: https://play.grafana.org/

* Grafana dashboards directory: https://grafana.com/grafana/dashboards/

* Open / demo instance of Prometheus:
https://prometheus.demo.prometheus.io/query
https://prometheus.demo.prometheus.io/config

Configuration of the demo instance (from the /config page above), reformatted as YAML; repeated per-job scrape boilerplate is factored out with a comment:

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  scrape_protocols:
    - OpenMetricsText1.0.0
    - OpenMetricsText0.0.1
    - PrometheusText1.0.0
    - PrometheusText0.0.4
  evaluation_interval: 15s
  external_labels:
    environment: demo-prometheus-io.c.macro-mile-203600.internal

runtime:
  gogc: 75

alerting:
  alertmanagers:
    - follow_redirects: true
      enable_http2: true
      scheme: http
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - demo.prometheus.io:9093

rule_files:
  - /etc/prometheus/rules/*.yml
  - /etc/prometheus/rules/*.yaml
  - /etc/prometheus/rules/*.rules

scrape_config_files:
  - /etc/prometheus/scrape_configs/*

scrape_configs:
  - job_name: prometheus
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    static_configs:
      - targets:
          - demo.prometheus.io:9090

  # The following jobs repeat the same scrape boilerplate as "prometheus"
  # (15s/10s intervals, same scrape_protocols, /metrics, http, compression on);
  # only their differences are shown.

  - job_name: random
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/random.yml
        refresh_interval: 5m

  - job_name: caddy
    static_configs:
      - targets:
          - localhost:2019

  - job_name: grafana
    static_configs:
      - targets:
          - demo.prometheus.io:3000

  - job_name: node
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/node.yml
        refresh_interval: 5m

  - job_name: alertmanager
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/alertmanager.yml
        refresh_interval: 5m

  - job_name: cadvisor
    track_timestamps_staleness: true # the only job with this enabled
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/cadvisor.yml
        refresh_interval: 5m

  - job_name: blackbox
    params:
      module:
        - http_2xx
    metrics_path: /probe
    relabel_configs:
      - source_labels: [__address__]
        separator: ;
        target_label: __param_target
        replacement: $1
        action: replace
      - source_labels: [__param_target]
        separator: ;
        target_label: instance
        replacement: $1
        action: replace
      - separator: ;
        target_label: __address__
        replacement: 127.0.0.1:9115
        action: replace
    static_configs:
      - targets:
          - http://localhost:9100