Showing posts with the label monitoring.

Tuesday, 17 October 2023

Prometheus, Grafana, Alertmanager: number of alerts

 Prometheus alert counts


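From: https://jaanhio.me/blog/visualizing-alerts-metrics-grafana/ + https://community.grafana.com/t/how-to-get-the-time-range-selected-on-the-dashboard-into-a-variable/2868/3 ($__range is Grafana's built-in variable holding the duration of the dashboard's selected time range.)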

(sum by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) max_over_time(ALERTS{alertstate="firing"}[$__range])) + (count by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) max_over_time(ALERTS{alertstate="firing"}[$__range])) * 1))
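Why does the expression add a count(...) term to the sum of changes(...)? The small Python simulation below illustrates the arithmetic. It assumes that each ALERTS_FOR_STATE sample holds the Unix timestamp at which the alert instance became active, so every new activation shows up as a new value. changes() only counts transitions between values, which misses the first activation of each series; adding one per series puts it back. The data is made up, and the toy treats every series as having fired (the real query enforces that with the AND ... ALERTS{alertstate="firing"} filter).

```python
"""Toy model of the alert-count expression (not PromQL itself).

Assumption: every ALERTS_FOR_STATE sample holds the Unix timestamp at which
the alert instance became active, so a new activation shows up as a new value.
"""

from typing import Dict, List


def changes(samples: List[float]) -> int:
    """Count how often consecutive samples differ, like PromQL changes()."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur != prev)


def alerts_per_alertname(series_by_name: Dict[str, List[List[float]]]) -> Dict[str, int]:
    """sum(changes) + count(series) per alertname.

    changes() never counts the first activation of a series, so the
    "+ count" term adds one per series. Every series given here is assumed
    to have fired (the real query enforces this with its AND filter).
    """
    return {
        name: sum(changes(s) for s in series) + len(series)
        for name, series in series_by_name.items()
    }


if __name__ == "__main__":
    data = {
        "HighCPU": [[1000, 1000, 1000, 5000, 5000]],  # one instance, activated twice
        "DiskFull": [[2000, 2000], [3000]],            # two instances, once each
    }
    print(alerts_per_alertname(data))  # {'HighCPU': 2, 'DiskFull': 2}
```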


Then use a Grafana panel of type "Gauge" with the following options:

* Value options: Show = Calculate, Calculation = Last *

* Orientation = horizontal, and
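For dashboards kept as code, the panel settings above can also be written in the dashboard JSON. The sketch below builds such a panel as a Python dict. It assumes the JSON model of recent Grafana versions, where "reduceOptions.values": false corresponds to Show = Calculate and the calculation "lastNotNull" is what the UI labels "Last *". The panel title is hypothetical; check the field names against your Grafana version.

```python
"""Sketch of a Grafana "Gauge" panel matching the options above.

Assumption: field names follow the dashboard JSON model of recent Grafana
versions; the title is a hypothetical example.
"""

import json

# The alert-count query from above, unchanged.
ALERT_COUNT_EXPR = (
    '(sum by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) '
    'max_over_time(ALERTS{alertstate="firing"}[$__range])) + '
    '(count by (alertname) (changes(ALERTS_FOR_STATE[$__range]) AND ignoring(alertstate) '
    'max_over_time(ALERTS{alertstate="firing"}[$__range])) * 1))'
)


def gauge_panel(title: str, expr: str) -> dict:
    """Return a gauge panel definition with one Prometheus query."""
    return {
        "type": "gauge",
        "title": title,
        "targets": [{"refId": "A", "expr": expr, "legendFormat": "{{alertname}}"}],
        "options": {
            "orientation": "horizontal",
            "reduceOptions": {
                "values": False,           # Show: Calculate (not "All values")
                "calcs": ["lastNotNull"],  # Calculation: "Last *"
                "fields": "",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(gauge_panel("Number of alerts", ALERT_COUNT_EXPR), indent=2))
```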



Number of alerts by alert name over the last 2 months (65 days)

PromQL = sum by(alertname) (changes(ALERTS_FOR_STATE[65d]))



Number of alerts by instance over the last 2 months (65 days)

PromQL = sum by(instance_name) (changes(ALERTS_FOR_STATE[65d]))
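The two 65-day queries above can also be run outside Grafana through the Prometheus HTTP API (/api/v1/query). The sketch below assumes Prometheus listens on a hypothetical http://localhost:9090 and retains at least 65 days of data (the default retention is only 15 days), and that instance_name is a label actually present on your alerts. Note that, unlike the first expression, these queries have no count(...) term, so they likely miss the first activation of each series.

```python
"""Run the 65-day alert-count queries through the Prometheus HTTP API.

Assumptions: PROM_URL is a hypothetical address, retention >= 65 days,
and "instance_name" exists as a label on the alerts.
"""

import json
import urllib.parse
import urllib.request
from typing import Dict

PROM_URL = "http://localhost:9090"  # hypothetical Prometheus address

BY_ALERTNAME = "sum by(alertname) (changes(ALERTS_FOR_STATE[65d]))"
BY_INSTANCE = "sum by(instance_name) (changes(ALERTS_FOR_STATE[65d]))"


def query_url(base: str, promql: str) -> str:
    """Build an instant-query URL for /api/v1/query."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: dict, label: str) -> Dict[str, float]:
    """Map label value -> sample value from an instant-vector response."""
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error')}")
    return {
        series["metric"].get(label, ""): float(series["value"][1])
        for series in body["data"]["result"]
    }


def run(promql: str, label: str) -> Dict[str, float]:
    """Send the query and return {label value: count}."""
    with urllib.request.urlopen(query_url(PROM_URL, promql), timeout=30) as resp:
        return parse_vector(json.load(resp), label)


if __name__ == "__main__":
    for name, count in sorted(run(BY_ALERTNAME, "alertname").items()):
        print(f"{name}: {count:g}")
```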

Wednesday, 6 September 2023

RequestBin / HTTP endpoint for testing

From: https://grafana.com/tutorials/grafana-fundamentals/#create-a-contact-point-for-grafana-managed-alerts

 

 In this step, we’ll set up a new contact point. This contact point will use the webhooks channel. In order to make this work, we also need an endpoint for our webhook channel to receive the alert. We will use requestbin.com to quickly set up that test endpoint. This way we can make sure that our alert is actually sending a notification somewhere.

  1. Browse to requestbin.com.
  2. Under the Create Request Bin button, click the public bin link.

Your request bin is now waiting for the first request.

  3. Copy the endpoint URL.


=> a handy tool to inspect exactly what the webhook sends!
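If posting test alerts to a public bin is not an option, a few lines of Python can play the same role locally. This swaps requestbin.com for the standard library's http.server; the port (8000) is an assumption. Point the webhook contact point at http://<this-host>:8000/ and every received payload is printed, pretty-printed when it is JSON.

```python
"""Minimal local stand-in for requestbin.com: print whatever a webhook POSTs.

Assumption: port 8000 is free and reachable from the Grafana server.
"""

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # every request seen so far, kept for later inspection


class BinHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        received.append({"path": self.path, "headers": dict(self.headers), "body": body})
        try:
            print(json.dumps(json.loads(body), indent=2))  # pretty-print JSON payloads
        except json.JSONDecodeError:
            print(body)  # non-JSON payloads are printed as-is
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")


def make_server(port: int = 8000) -> HTTPServer:
    """Listen on all interfaces; port 0 lets the OS pick a free port."""
    return HTTPServer(("0.0.0.0", port), BinHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```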

 

Thursday, 21 January 2021

Opsgenie webinar / resources

Opsgenie is a tool for filtering and routing alerts triggered by monitoring systems (Nagios, AWS SNS, Datadog, ...) to specific channels (SMS, phone call, Slack, Jira, ...).

Main features on top of this:

  • on-call schedules (who is on call)
  • centralized alert / incident resolution
  • third-party integrations with 100+ tools
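The filter-then-route idea can be pictured as an ordered list of rules, where the first rule matching an incoming alert decides the notification channels. The toy below is NOT Opsgenie's API or rule syntax; the rule conditions, sources, priorities and channel names are hypothetical and only illustrate the concept.

```python
"""Toy model of alert routing: match an incoming alert, pick notification channels.

Hypothetical throughout: this is not Opsgenie's API or rule syntax.
"""

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Alert:
    source: str    # e.g. "nagios", "aws-sns", "datadog"
    priority: str  # e.g. "P1" (highest) to "P5"
    message: str


@dataclass
class Rule:
    matches: Callable[[Alert], bool]
    channels: List[str]


RULES = [
    # evaluated top to bottom; the first matching rule wins
    Rule(lambda a: a.priority == "P1", ["phone-call", "sms", "slack"]),
    Rule(lambda a: a.source == "nagios", ["slack"]),
    Rule(lambda a: True, ["jira"]),  # catch-all
]


def route(alert: Alert, rules: List[Rule] = RULES) -> List[str]:
    """Return the channels of the first rule matching the alert."""
    for rule in rules:
        if rule.matches(alert):
            return rule.channels
    return []
```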


Opsgenie Learning Center: https://docs.opsgenie.com/


[video] Opsgenie: "What do we do?" https://www.youtube.com/watch?v=yphtZ9z2TtA&feature=youtu.be

[video] Opsgenie: "First Look" https://www.youtube.com/watch?v=pyM2dROKn6g

Opsgenie Pricing: https://www.atlassian.com/software/opsgenie/pricing



Implementing Nagios-to-Opsgenie heartbeats:
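A heartbeat lets Opsgenie raise an alert when Nagios itself stops reporting. One way to feed it is a small script that Nagios (or cron) runs periodically to ping the heartbeat. The sketch below assumes the Opsgenie Heartbeat API v2 ping endpoint (/v2/heartbeats/<name>/ping, authenticated with a "GenieKey" header), a heartbeat already created in Opsgenie under a hypothetical name such as "nagios-prod", and an API key in the OPSGENIE_API_KEY environment variable. Verify the endpoint against the current Opsgenie documentation.

```python
"""Sketch: ping an Opsgenie heartbeat so Opsgenie alerts if Nagios goes silent.

Assumptions: heartbeat name (hypothetical "nagios-prod"), API key in
OPSGENIE_API_KEY, Heartbeat API v2 ping endpoint.
"""

import os
import sys
import urllib.parse
import urllib.request

API_BASE = "https://api.opsgenie.com"  # EU accounts use https://api.eu.opsgenie.com


def ping_request(name: str, api_key: str, base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) the heartbeat ping request."""
    url = f"{base}/v2/heartbeats/{urllib.parse.quote(name, safe='')}/ping"
    return urllib.request.Request(url, method="GET", headers={"Authorization": f"GenieKey {api_key}"})


def main() -> int:
    name = sys.argv[1] if len(sys.argv) > 1 else "nagios-prod"
    req = ping_request(name, os.environ["OPSGENIE_API_KEY"])
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"OK - heartbeat {name} pinged (HTTP {resp.status})")
            return 0  # Nagios plugin convention: 0 = OK
    except OSError as exc:  # URLError and HTTPError are OSError subclasses
        print(f"CRITICAL - heartbeat {name} ping failed: {exc}")
        return 2  # Nagios plugin convention: 2 = CRITICAL


if __name__ == "__main__":
    sys.exit(main())
```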





Tuesday, 26 January 2016

Monitoring: POC around Monit + M/Monit





Monit + M/Monit
Monit is open source, on Bitbucket: https://bitbucket.org/tildeslash/monit/ (M/Monit itself is a commercial product).



Monit: the "agent" (or "slave"), running on each server where Monit is used.
https://mmonit.com/monit/

M/Monit: the "master", which connects to all Monit agents to collect and coordinate events and actions to and from them.
https://www.mmonit.com/



M/Monit manual and Monit configuration examples:
https://mmonit.com/documentation/mmonit_manual.pdf
https://mmonit.com/wiki/Monit/ConfigurationExamples



Idea 1: how to enhance this project: contribute a "log snippet" feature.
Alongside the "start/stop program" entries in the config file, add a "logfile path" setting that would watch the given file(s) and make their content available to the agent, and then to the master.
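To make idea 1 concrete, the core of a "log snippet" is reading the last few lines of each configured log file so the agent can attach them to an event. The sketch below is a hypothetical illustration, not Monit code: the LOGFILES mapping stands in for the proposed "logfile path" setting, and the function names are invented.

```python
"""Sketch of the "log snippet" idea: last N lines of each configured log file.

Hypothetical: LOGFILES stands in for the proposed per-service "logfile path"
setting; this is not Monit code.
"""

from collections import deque
from typing import Dict, List

# hypothetical per-service setting: service name -> log file paths
LOGFILES: Dict[str, List[str]] = {
    "nginx": ["/var/log/nginx/error.log"],
}


def tail(path: str, n: int = 20) -> List[str]:
    """Return the last n lines of a text file, streaming instead of loading it all."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def log_snippet(service: str, n: int = 20) -> Dict[str, List[str]]:
    """Snippet for one service as {path: last n lines}; unreadable files are skipped."""
    snippet = {}
    for path in LOGFILES.get(service, []):
        try:
            snippet[path] = tail(path, n)
        except OSError:
            continue  # missing or unreadable file: leave it out of the snippet
    return snippet
```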

Idea 2: interface Monit with Elasticsearch (or implement Monit within Elasticsearch?)
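One possible shape for idea 2 is to read each agent's status and index it into Elasticsearch. The sketch below assumes Monit's built-in HTTP interface is enabled and serves its status as XML (for example at http://localhost:2812/_status?format=xml); the element names read here (server/localhostname, service/name, status, monitor) are assumptions to check against your Monit's actual output, and the index name "monit-status" is hypothetical. It produces the newline-delimited JSON expected by Elasticsearch's _bulk endpoint.

```python
"""Sketch for idea 2: turn Monit status XML into Elasticsearch bulk-API lines.

Assumptions: XML element names as used below; hypothetical index name.
"""

import json
import xml.etree.ElementTree as ET
from typing import List


def services(xml_text: str) -> List[dict]:
    """Extract one document per <service> element."""
    root = ET.fromstring(xml_text)
    host = root.findtext("server/localhostname", default="")
    docs = []
    for svc in root.iter("service"):  # also works if services are wrapped in <services>
        docs.append({
            "host": host,
            "service": svc.findtext("name") or svc.get("name", ""),
            "type": int(svc.get("type", "-1")),
            "status": int(svc.findtext("status", default="-1")),  # assumed: 0 = no failed checks
            "monitored": svc.findtext("monitor", default="0") == "1",
        })
    return docs


def to_bulk(docs: List[dict], index: str = "monit-status") -> str:
    """NDJSON body for POST /_bulk: an action line followed by the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```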





-----
Other monitoring tools:

* Prometheus: "An open-source service monitoring system and time series database."
http://prometheus.io/docs/introduction/getting_started/
https://github.com/prometheus/prometheus


* Sensu: "A monitoring framework that aims to be simple, malleable, and scalable"
https://sensuapp.org/
https://github.com/sensu/sensu

* Ganglia
http://ganglia.info/

Friday, 19 June 2015

Ansible vs. Chef vs. Puppet vs. Salt

Several tools are currently available to automate infrastructure management. The four listed below seem to be the main ones.


·         Ansible
“Ansible is an IT automation tool. It can configure systems, deploy software, and orchestrate more advanced IT tasks such as continuous deployments or zero downtime rolling updates.”


·         Chef https://www.chef.io/
“Chef turns infrastructure into code. With Chef, you can automate how you build, deploy, and manage your infrastructure. Your infrastructure becomes as versionable, testable, and repeatable as application code.”


·         Puppet https://puppetlabs.com
“Puppet is a configuration management solution that allows you to define the state of your IT infrastructure, and then automatically enforces the desired state. Puppet automates every step of the software delivery process, from provisioning of physical and virtual machines to orchestration and reporting; from early-stage code development through testing, production release and updates.”


·         Salt http://saltstack.com
“SaltStack takes a new approach to infrastructure management by developing software that is easy enough to get running in seconds, scalable enough to manage tens of thousands of servers, and fast enough to control and communicate with them in milliseconds. SaltStack delivers a dynamic infrastructure communication bus used for orchestration, remote execution, configuration management and much more. The Salt project was launched in 2011 and today is the fastest-growing, most-active infrastructure orchestration and configuration management open source project in the world. The SaltStack community is committed to keeping the Salt project focused, friendly, healthy and open.”




And some comparisons: