
What do y’all use to monitor many linux servers?

I’m hoping to find something that:

  • has a nice dashboard
  • is quick and simple to install
  • is very lightweight and unobtrusive
  • can send alerts via http request
51 comments
    • Base ansible role installs Prometheus node exporter, configured with the text file collector
    • VM automations push DNS records so that the Prometheus dns-sd automatically discovers them
    • Ansible roles that add Cron jobs, which generate metrics for specific systems and dump them for the text file collector
    • Grafana for dashboards
    • Karma as a UI in front of Prometheus alert manager
    • Any chance you'd be willing to share playbooks or point me toward any resources you used?

      I use Ansible to manage config across all my workstations/servers but I haven't gotten around to automating log shipping yet or aggregating system metrics.

    • Cron jobs that generate metrics for specific systems and dump them for the text file collector

      Details please

        • https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector - which makes node exporter read a specific directory for files that contain metrics, then expose them alongside its own metrics when the central Prometheus server scrapes it
        • Some systems have their own metrics endpoints. Instead of getting Prometheus to scrape these directly, I set up a Cron job to curl them into files for node exporter. This means I don't need extra config in Prometheus to find the endpoints, and I don't need to mess with firewall rules
        • Other systems don't directly expose metrics in a format Prometheus can use. In this case I write or find a script that can do the conversion, then either set it up to write the metrics file directly and run it from Cron, or run it as a service with another Cron job to do the scrape
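
For anyone who wants the "write the metrics file directly" part spelled out, here is a minimal Python sketch of a script you could run from Cron. It is not the commenter's actual script: the metric name `backup_last_success_timestamp_seconds`, the file name `backup.prom`, and the helper names are all made up for illustration. The directory is whatever you pass to node exporter's `--collector.textfile.directory` flag. The script writes to a temp file and renames it so node exporter never reads a half-written file; the temp name does not end in `.prom`, so the collector skips it.

```python
import os
import sys
import tempfile
import time


def render_metric(name, value, help_text, metric_type="gauge", labels=None):
    """Render a single sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Escape backslashes, double quotes and newlines in label values.
        def esc(v):
            return str(v).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

        pairs = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name}{label_str} {value}\n"
    )


def write_textfile(directory, filename, content):
    """Write content to directory/filename via a temp file plus rename."""
    # The temp file lives in the same directory so the rename stays on one
    # filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="." + filename, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        # mkstemp creates the file as 0600; node exporter usually runs as a
        # different user, so make the file world-readable.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, os.path.join(directory, filename))
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical metric: when did the backup job last succeed?
    text = render_metric(
        "backup_last_success_timestamp_seconds",
        int(time.time()),
        "Unix time of the last successful backup.",
        labels={"job_name": "nightly"},
    )
    if len(sys.argv) > 1:
        write_textfile(sys.argv[1], "backup.prom", text)
    else:
        # No directory given: just print what would be written.
        sys.stdout.write(text)
```

Run it from Cron with the textfile directory as the only argument. The same `write_textfile` helper would also work for the "curl an endpoint into a file" case, with the fetched response body passed in as `content`.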