Showing posts with the label prometheus.

Thursday, 24 July 2025

apt info - Ansible tasks + roles to install apt_info.py automatically alongside node-exporter + Grafana dashboard

Create a file of OpenMetrics values so that it is exported alongside the node-exporter metrics.

=> A script runs every 12 hours, reports the apt packages waiting to be upgraded, and writes the result to /var/lib/node_exporter/apt_info.prom,

which Prometheus ingests when it scrapes node-exporter.
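
To make the textfile mechanism concrete, here is a minimal Python sketch of what such a script has to do: render metrics in the Prometheus text format and write them to a `.prom` file. It is not how apt_info.py itself is implemented. The metric name `example_pending_upgrades` and both function names are hypothetical. The file is written to a temporary name first and then renamed, so node-exporter never reads a half-written file (the textfile collector only picks up files ending in `.prom`).

```python
import os
import tempfile


def render_gauge(name, help_text, samples):
    """Render one gauge family; `samples` is a list of (labels_dict, value)."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        selector = f"{{{label_str}}}" if label_str else ""
        lines.append(f"{name}{selector} {value}")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write `text` next to `path` under a temporary name, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.chmod(tmp_path, 0o644)   # readable by the node-exporter user
        os.replace(tmp_path, path)  # atomic on the same filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

A call such as `write_atomically("/var/lib/node_exporter/example.prom", render_gauge("example_pending_upgrades", "Hypothetical gauge", [({"arch": "amd64"}, 3)]))` would produce a file that node-exporter serves on its next scrape.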


The metrics are used by a Grafana dashboard available here: https://grafana.com/grafana/dashboards/23777-apt-ugrades/


```
---
- name: Monitoring probes - setup exporters running on each server
  hosts: all
  become: true
  become_user: root
  # Role variables would go under "vars:", for example:
  # vars:
  #   node_exporter_textfile_dir: "/var/lib/node_exporter"  # default

  tasks:
    # https://github.com/ncabatoff/process-exporter
    - name: Install .deb package of process-exporter
      ansible.builtin.apt:
        deb: https://github.com/ncabatoff/process-exporter/releases/download/v0.8.3/process-exporter_0.8.3_linux_amd64.deb
      become: true

    - name: Download and install apt_info.py
      ansible.builtin.get_url:
        url: https://raw.githubusercontent.com/prometheus-community/node-exporter-textfile-collector-scripts/refs/heads/master/apt_info.py
        dest: /usr/local/bin/apt_info.py
        mode: '0755'
      become: true

    # The apt module accepts a list, so no loop is needed.
    - name: Install apt_info.py dependencies via apt
      ansible.builtin.apt:
        name:
          - python3-prometheus-client
          - python3-apt
          - cron
        state: present
        update_cache: true
      become: true

    # Runs at 00:00 and 12:00.
    - name: Add a cron job to run apt_info.py every 12 hours
      ansible.builtin.cron:
        name: "Run apt_info.py every 12 hours"
        minute: "0"
        hour: "*/12"
        job: "/usr/local/bin/apt_info.py > /var/lib/node_exporter/apt_info.prom"
      become: true
      ignore_errors: "{{ ansible_check_mode }}"

    - name: Ensure APT auto update is enabled
      ansible.builtin.copy:
        dest: /etc/apt/apt.conf.d/99_auto_apt_update.conf
        content: 'APT::Periodic::Update-Package-Lists "1";'
        owner: root
        group: root
        mode: '0644'
      become: true

  # Ansible runs roles before tasks, whatever their order in the play.
  roles:
    # https://github.com/prometheus-community/ansible/tree/main/roles/node_exporter
    - name: prometheus.prometheus.node_exporter
```
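
Because the file is refreshed only every 12 hours, a cron job that silently stops would leave old values in place. Here is a minimal Python sketch of a freshness check on the generated file. The function name and the 13-hour threshold in the usage note are assumptions, not part of the playbook above.

```python
import os
import time


def is_stale(path, max_age_seconds, now=None):
    """Return True when `path` is missing or older than `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return True  # no file at all counts as stale
    return (now - mtime) > max_age_seconds
```

For example, `is_stale("/var/lib/node_exporter/apt_info.prom", 13 * 3600)` leaves one hour of slack over the 12-hour schedule. node-exporter also exposes the modification time of each textfile as `node_textfile_mtime_seconds`, so a similar check can be written as an alert rule instead.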

Wednesday, 28 May 2025

(rough notes / to edit / to format) Prometheus sandbox - demo / Prometheus relabeling tool & ref / Grafana demo


* The Art of Metric Relabeling in Prometheus: https://heiioncall.com/guides/the-art-of-metric-relabeling-in-prometheus

* Online relabeling test tool: https://relabeler.promlabs.com/

* VictoriaMetrics relabeling cookbook (mostly compatible with Prometheus too): https://docs.victoriametrics.com/victoriametrics/relabeling/#how-to-remove-labels-from-targets

* Open / demo instance of Grafana: https://play.grafana.org/

* Grafana dashboards directory: https://grafana.com/grafana/dashboards/

* Open / demo instance of Prometheus:

https://prometheus.demo.prometheus.io/query

https://prometheus.demo.prometheus.io/config


global:
  scrape_interval: 15s
  scrape_timeout: 10s
  scrape_protocols:
    - OpenMetricsText1.0.0
    - OpenMetricsText0.0.1
    - PrometheusText1.0.0
    - PrometheusText0.0.4
  evaluation_interval: 15s
  external_labels:
    environment: demo-prometheus-io.c.macro-mile-203600.internal
runtime:
  gogc: 75
alerting:
  alertmanagers:
    - follow_redirects: true
      enable_http2: true
      scheme: http
      timeout: 10s
      api_version: v2
      static_configs:
        - targets:
            - demo.prometheus.io:9093
rule_files:
  - /etc/prometheus/rules/*.yml
  - /etc/prometheus/rules/*.yaml
  - /etc/prometheus/rules/*.rules
scrape_config_files:
  - /etc/prometheus/scrape_configs/*
scrape_configs:
  - job_name: prometheus
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    static_configs:
      - targets:
          - demo.prometheus.io:9090
  - job_name: random
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/random.yml
        refresh_interval: 5m
  - job_name: caddy
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    static_configs:
      - targets:
          - localhost:2019
  - job_name: grafana
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    static_configs:
      - targets:
          - demo.prometheus.io:3000
  - job_name: node
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/node.yml
        refresh_interval: 5m
  - job_name: alertmanager
    honor_timestamps: true
    track_timestamps_staleness: false
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/alertmanager.yml
        refresh_interval: 5m
  - job_name: cadvisor
    honor_timestamps: true
    track_timestamps_staleness: true
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /metrics
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    file_sd_configs:
      - files:
          - /etc/prometheus/file_sd/cadvisor.yml
        refresh_interval: 5m
  - job_name: blackbox
    honor_timestamps: true
    track_timestamps_staleness: false
    params:
      module:
        - http_2xx
    scrape_interval: 15s
    scrape_timeout: 10s
    scrape_protocols:
      - OpenMetricsText1.0.0
      - OpenMetricsText0.0.1
      - PrometheusText1.0.0
      - PrometheusText0.0.4
    metrics_path: /probe
    scheme: http
    enable_compression: true
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - source_labels: [__address__]
        separator: ;
        target_label: __param_target
        replacement: $1
        action: replace
      - source_labels: [__param_target]
        separator: ;
        target_label: instance
        replacement: $1
        action: replace
      - separator: ;
        target_label: __address__
        replacement: 127.0.0.1:9115
        action: replace
    static_configs:
      - targets:
          - http://localhost:9100
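
The three relabel rules of the blackbox job are easier to follow when run by hand. Here is a minimal Python sketch of the `replace` action only, applied to the job's single target. It is a simplification for illustration, not Prometheus's own implementation: the function names are hypothetical, the defaults (`regex: (.*)`, `separator: ;`, `replacement: $1`) are assumed to match Prometheus's documented defaults, and only numbered capture groups are handled.

```python
import re


def apply_replace(labels, rule):
    """Apply one relabel rule with action 'replace' and return new labels."""
    sources = rule.get("source_labels", [])
    separator = rule.get("separator", ";")
    pattern = re.compile(rule.get("regex", "(.*)"), re.DOTALL)
    replacement = rule.get("replacement", "$1")

    # Source label values are joined, then matched against the whole regex.
    joined = separator.join(labels.get(name, "") for name in sources)
    match = pattern.fullmatch(joined)
    result = dict(labels)
    if match is None:
        return result  # no match: labels are left untouched

    # Translate Go-style $1 / ${1} references into Python's \g<1> form.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    value = match.expand(template)
    if value:
        result[rule["target_label"]] = value
    else:
        result.pop(rule["target_label"], None)  # empty value drops the label
    return result


def relabel(labels, rules):
    """Apply the rules in order, each one seeing the previous result."""
    for rule in rules:
        labels = apply_replace(labels, rule)
    return labels


# The three rules of the blackbox job, with defaults left implicit.
BLACKBOX_RULES = [
    {"source_labels": ["__address__"], "target_label": "__param_target"},
    {"source_labels": ["__param_target"], "target_label": "instance"},
    {"target_label": "__address__", "replacement": "127.0.0.1:9115"},
]
```

Calling `relabel({"__address__": "http://localhost:9100"}, BLACKBOX_RULES)` copies the original target into `__param_target` and `instance`, then points `__address__` at the blackbox exporter on 127.0.0.1:9115, so the scrape goes to the exporter while the probed URL travels as the `target` parameter.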

Thursday, 3 October 2024

json_exporter prometheus debug

JSON URL to convert: http://json_url/foobar.json

json_exporter running locally: localhost:7979 or json_exporter:7979



Status of json_exporter itself. This endpoint does NOT contain the metrics from the targets, only metrics about the exporter process (useful to know whether it is up, for example).

localhost:7979/metrics


Check how a JSON URL is converted, i.e. what Prometheus should scrape:

curl http://json_url/foobar.json => original JSON

curl 'localhost:7979/probe?target=http://json_url/foobar.json' => OpenMetrics output converted by json_exporter
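
When the target URL itself carries a query string, pasting it raw after `target=` becomes ambiguous. Here is a small Python sketch that builds the probe URL with the target properly percent-encoded. The function name is hypothetical, and the host and target are the placeholders used in this post.

```python
from urllib.parse import urlencode


def probe_url(exporter, target):
    """Build the json_exporter /probe URL for `target`, percent-encoded."""
    return f"http://{exporter}/probe?{urlencode({'target': target})}"
```

`probe_url("localhost:7979", "http://json_url/foobar.json")` gives `http://localhost:7979/probe?target=http%3A%2F%2Fjson_url%2Ffoobar.json`, which can be passed to curl as is.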



Other useful commands:


* Edit, restart, wait, then check the json_exporter output:

sudo vi /etc/json_exporter/config.yml && sudo systemctl restart json_exporter && sleep 5 && curl 'http://localhost:7979/probe?target=http://json_url/foobar.json'


* Edit, check, and restart Prometheus:

sudo vi /etc/prometheus/prometheus.yml && sudo promtool check config /etc/prometheus/prometheus.yml && sudo systemctl restart prometheus.service



Tuesday, 17 October 2023

prometheus, grafana, alertmanager: number of alerts

Prometheus alert counts


Source: https://jaanhio.me/blog/visualizing-alerts-metrics-grafana/ + https://community.grafana.com/t/how-to-get-the-time-range-selected-on-the-dashboard-into-a-variable/2868/3

(sum by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) max_over_time(ALERTS{alertstate="firing"}[$__range])) + (count by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) max_over_time(ALERTS{alertstate="firing"}[$__range])) * 1))


Then use a Grafana panel of type "Gauge" with the following options:

* Value options: Show = Calculate, Calculation = Last *

* Orientation = horizontal, and 



Number of alerts by alert name over the last 2 months

PromQL = sum by(alertname) (changes(ALERTS_FOR_STATE[65d]))



Number of alerts by instance over the last 2 months

PromQL = sum by(instance_name) (changes(ALERTS_FOR_STATE[65d]))
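
The queries in this post rely on changes(), which counts how many times a series' value differs between consecutive samples inside the range. Assuming ALERTS_FOR_STATE holds the timestamp at which an alert became active, every new activation shows up as one change. Here is a minimal Python sketch of that counting rule. The function name and the sample values in the usage note are made up, and NaN handling is ignored.

```python
def changes(samples):
    """Count how many times consecutive sample values differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)
```

For instance, `changes([100, 100, 100, 250, 250, 400])` is 2: the alert was already active at the start of the window and was re-activated twice. A series with a single activation inside the window yields 0, which is presumably why the first query adds a `count by (alertname)` term on top of the sum.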