Wednesday, 1 November 2023
Tuesday, 17 October 2023
Prometheus, Grafana, Alertmanager: number of alerts
Prometheus alert counts
Sources: https://jaanhio.me/blog/visualizing-alerts-metrics-grafana/ + https://community.grafana.com/t/how-to-get-the-time-range-selected-on-the-dashboard-into-a-variable/2868/3
(sum by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) max_over_time(ALERTS{alertstate="firing"}[$__range])) + (count by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) max_over_time(ALERTS{alertstate="firing"}[$__range])) * 1))
Then display the query in a Grafana "Gauge" panel with the following options:
* Value options: Show = Calculate, Calculation = Last *
* Orientation = horizontal, and
Number of alerts by alert name over the last 2 months
PromQL = sum by(alertname) (changes(ALERTS_FOR_STATE[65d]))
Number of alerts by instance over the last 2 months
PromQL = sum by(instance_name) (changes(ALERTS_FOR_STATE[65d]))
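The two fixed-range queries above can also be run outside Grafana, through the Prometheus HTTP API (`/api/v1/query`). The sketch below is only an illustration: the base URL `http://localhost:9090` and the helper names `build_query_url` and `counts_by_label` are assumptions, not part of the original notes. Queries that use `$__range` need that Grafana variable replaced by a literal duration first.

```python
import json
import urllib.parse


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query_string


def counts_by_label(response_text, label):
    """Turn an instant-vector JSON response into {label value: count}."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed: %r" % body.get("error"))
    counts = {}
    for sample in body["data"]["result"]:
        # Each sample looks like {"metric": {...}, "value": [timestamp, "number"]}
        counts[sample["metric"].get(label, "")] = float(sample["value"][1])
    return counts


# Assumed local Prometheus address; adjust to your own setup.
url = build_query_url(
    "http://localhost:9090",
    "sum by(alertname) (changes(ALERTS_FOR_STATE[65d]))",
)
```

Fetching `url` with any HTTP client and passing the response body to `counts_by_label(body, "alertname")` gives one number per alert name.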
Wednesday, 6 September 2023
Request Bin / http endpoint for testing
In this step, we’ll set up a new contact point. This contact point will use the webhooks channel. In order to make this work, we also need an endpoint for our webhook channel to receive the alert. We will use requestbin.com to quickly set up that test endpoint. This way we can make sure that our alert is actually sending a notification somewhere.
- Browse to requestbin.com.
- Under the Create Request Bin button, click the public bin link.
Your request bin is now waiting for the first request.
- Copy the endpoint URL.
=> A handy tool for checking exactly what the endpoint receives!
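When a public bin is not an option (for example for alerts containing internal data), the same idea can be approximated locally. The following is a minimal sketch using only the Python standard library; the names `BinHandler`, `start_bin`, `post_json` and the `received` list are hypothetical, and the payload in the usage note is made up rather than the real alert format.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # body of every POST the local test endpoint has seen


class BinHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Store the raw request body so it can be inspected afterwards.
        length = int(self.headers.get("Content-Length", 0))
        received.append(self.rfile.read(length).decode("utf-8"))
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet


def start_bin(port=0):
    """Start the endpoint in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), BinHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url, payload):
    """Send a JSON payload the way a webhook sender would; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Bypass any configured proxy, since the target is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return response.status
```

Usage: call `server = start_bin()`, point the webhook contact point at `http://127.0.0.1:<server.server_address[1]>/`, trigger a test notification, then read `received`. Stop the endpoint with `server.shutdown()`.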
Thursday, 21 January 2021
Opsgenie webinar / resources
Opsgenie is a tool that filters alerts triggered by monitoring systems (Nagios, AWS SNS, Datadog, ...) and routes them to specific channels (SMS, phone call, Slack, Jira, ...).
Main features on top of this:
- on-call schedules (who is on call)
- centralized alert / incident resolution
- third-party integrations with 100+ tools
Opsgenie Learning Center : https://docs.opsgenie.com/
[video] Opsgenie : "What do we do?" https://www.youtube.com/watch?v=yphtZ9z2TtA&feature=youtu.be
[video] Opsgenie: "First Look" https://www.youtube.com/watch?v=pyM2dROKn6g
Opsgenie Pricing : https://www.atlassian.com/software/opsgenie/pricing
Implementing Nagios-to-Opsgenie heartbeats:
- basic demo : https://www.youtube.com/watch?v=wsN2E_ZHlkE&feature=youtu.be
- https://docs.opsgenie.com/docs/monitoring-nagios
- https://docs.opsgenie.com/docs/heartbeat-monitoring
- https://docs.opsgenie.com/docs/heartbeat-api
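As a rough illustration of what a heartbeat ping looks like, the sketch below builds (but does not send) the HTTP request. It assumes the v2 endpoint `/v2/heartbeats/{name}/ping` and the `GenieKey` authorization scheme described in the heartbeat API documentation linked above; check both against the current docs before relying on them. The heartbeat name and API key shown are placeholders, and `build_heartbeat_ping` is a hypothetical helper.

```python
import urllib.parse
import urllib.request

OPSGENIE_API = "https://api.opsgenie.com"  # assumed default (non-EU) API host


def build_heartbeat_ping(name, api_key):
    """Build the ping request for a named heartbeat, without sending it."""
    url = "%s/v2/heartbeats/%s/ping" % (
        OPSGENIE_API,
        urllib.parse.quote(name, safe=""),
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": "GenieKey " + api_key},
        method="GET",
    )


# Placeholder values for illustration only.
ping = build_heartbeat_ping("nagios-main", "YOUR-API-KEY")
```

Sending it is then a matter of `urllib.request.urlopen(ping, timeout=10)` from a cron job or a Nagios check, so that Opsgenie raises an alert when the pings stop arriving.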
Tuesday, 26 January 2016
Monitoring: POC around Monit + M/Monit
Monit: the "agent" (or "slave"), running on each server where Monit is used.
https://mmonit.com/monit/
M/Monit: the "master", which connects to all attached Monit agents, collects their events, and coordinates actions to and from them.
https://www.mmonit.com/
M/Monit manual and Monit configuration examples:
https://mmonit.com/documentation/mmonit_manual.pdf
https://mmonit.com/wiki/Monit/ConfigurationExamples
idea 1: a possible enhancement to contribute to this project, a "log snippet" =
alongside the "start/stop program" statements in the config file, add a "logfile path" setting; Monit would watch the given file(s) and make their content available to the agent, and then to the master.
idea 2 : interface monit & elasticsearch (or implement monit within elasticsearch ?)
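To make idea 1 more concrete: the "log snippet" an agent would expose could be as simple as the last few lines of the configured logfile. This is only a sketch of that one step, in Python rather than in Monit's own code base; `tail_lines` and its `max_lines` parameter are hypothetical and do not correspond to any existing Monit feature.

```python
from collections import deque


def tail_lines(path, max_lines=20):
    """Return the last `max_lines` lines of a text file, without newlines."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while streaming
        # through the file, so large logs are never fully held in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]
```

An agent could attach the result of `tail_lines(logfile_path)` to each event it reports, so the master shows recent log context next to a failed service.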
-----
Other monitoring tools :
* Prometheus
https://github.com/prometheus/prometheus
* Sensu : A monitoring framework that aims to be simple, malleable, and scalable
https://sensuapp.org/
https://github.com/sensu/sensu
* Ganglia
http://ganglia.info/
Friday, 19 June 2015
Ansible vs. Chef vs. Puppet vs. Salt
- [infoworld] Puppet vs. Chef : http://www.infoworld.com/article/2614204/data-center/puppet-or-chef--the-configuration-management-dilemma.html
- [infoworld] Review : Puppet vs. Chef vs. Ansible vs. Salt : http://www.infoworld.com/article/2609482/data-center/data-center-review-puppet-vs-chef-vs-ansible-vs-salt.html
- [infoworld] Review: Ansible orchestration is a veteran Unix admin's dream : http://www.infoworld.com/article/2612397/data-center/review--ansible-orchestration-is-a-veteran-unix-admin-s-dream.html
- [infoworld] Review: Salt keeps server automation simple : http://www.infoworld.com/article/2612536/data-center/review--salt-keeps-server-automation-simple.html
- [infoworld] Review: Puppet 3.0 pulls more strings : http://www.infoworld.com/article/2611099/data-center/review--puppet-3-0-pulls-more-strings.html