We know better than anyone how hard it is to keep track of the trends, health and availability of services and servers once their numbers grow out of all proportion.
You always start with a handful of servers and end up managing thousands of them, across multiple datacenters, locations and regions around the world, on different providers and technologies.
The constant challenge, and often the nightmare, of every self-respecting system administrator or DevOps engineer is to keep everything under control: to manage and tame the services rather than be overwhelmed by them.
Over time, various systems for monitoring and collecting metrics have been devised. For the record, we mention Zabbix and Nagios, the two most popular solutions on the market.
However, both fall short if you are looking for a bundled, all-in-one solution that does its job well and quickly, and whose installation and configuration on each system takes 30 seconds to 1 minute at most.
If you are looking for a highly professional solution capable of showing the data of the entire fleet on a single screen, the one we recommend is definitely Netdata.
What is Netdata?
Netdata is a highly optimized Linux utility that provides real-time (per-second) performance monitoring for Linux systems, FreeBSD, applications, SNMP devices and more, and renders every collected value as interactive charts in the web browser for analysis.
Netdata helps system administrators, SREs, DevOps engineers and IT professionals collect all possible metrics from systems and applications, visualize these metrics in real time, and solve complex performance problems.
Netdata's solution uses two components, Netdata Agent and Netdata Cloud, to provide real-time performance and health monitoring for both individual nodes and the entire infrastructure.
It was developed to be installed on any Linux system, without interrupting the current applications running on it. You can use this tool to monitor and get an overview of what is happening in real time, and what just happened, on your Linux systems and applications.
This is what it monitors:
- Total and per-core CPU utilization, interrupts, softirqs and frequency.
- Total memory, RAM, swap and kernel usage.
- Disk I/O (per disk: bandwidth, operations, backlog, utilization, etc.).
- Network interfaces (bandwidth, packets, errors, drops, etc.).
- Linux Netfilter / iptables firewall (connections, events, errors, etc.).
- Processes (running, blocked, forks, active, etc.).
- Applications, grouped by process tree (CPU, memory, swap, disk reads/writes, threads, etc.).
- Apache and Nginx status monitoring with mod_status.
- MySQL database monitoring: queries, updates, locks, issues, threads, etc.
- Postfix mail server message queue.
- Squid proxy server bandwidth and request monitoring.
- Hardware sensors (temperature, voltage, fans, power, humidity, etc.).
- SNMP devices.
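The metrics listed above can also be read programmatically. The following is a minimal sketch, assuming an agent listening on its default port 19999 and exposing the `/api/v1/data` endpoint with the `chart`, `after` and `format` parameters; the host name, the `system.cpu` chart and the helper names are illustrative assumptions, not something taken from this article.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_data_url(host: str, chart: str, seconds: int, port: int = 19999) -> str:
    """Build a URL asking the agent for the last `seconds` seconds of one chart."""
    # A negative `after` is relative to "now" (assumption about the agent API).
    query = urlencode({"chart": chart, "after": -seconds, "format": "json"})
    return f"http://{host}:{port}/api/v1/data?{query}"


def rows_to_dicts(payload: dict) -> list[dict]:
    """Turn a {"labels": [...], "data": [[...], ...]} payload into one dict per row."""
    labels = payload["labels"]
    return [dict(zip(labels, row)) for row in payload["data"]]


def fetch_chart(host: str, chart: str, seconds: int = 60) -> list[dict]:
    """Query a running agent; this needs network access to the node."""
    with urlopen(build_data_url(host, chart, seconds), timeout=5) as response:
        return rows_to_dicts(json.load(response))
```

On a node where the agent is running, `fetch_chart("localhost", "system.cpu")` would be expected to return one dictionary per second of CPU data; the exact dimension names depend on the system.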
Netdata's distributed monitoring agent collects thousands of metrics from systems, hardware and applications without any configuration. It runs continuously on all your physical and virtual servers, containers, cloud deployments and edge / IoT devices.
You can install Netdata on most Linux distributions (Ubuntu, Debian, CentOS and others), container / microservice platforms (Kubernetes cluster, Docker) and many other operating systems (FreeBSD, macOS).
Netdata Cloud is a web application that gives you real-time visibility for the entire infrastructure. With Netdata Cloud, you can view key metrics, in-depth graphs and active alarms from all your nodes in a single web interface. When an anomaly occurs, seamlessly log into any node to troubleshoot and find the root cause with Netdata's familiar dashboard.
Netdata Cloud is free! You can add an entire infrastructure of nodes, invite all of your colleagues, and view any number of metrics, graphs and alerts, all for free.
While Netdata Cloud offers a centralized method to monitor your agents, the metrics data is not stored or centralized in any way. The metric data remains with your nodes and is streamed to your browser, via the Cloud, only when you view the Netdata Cloud interface.
What can you do with Netdata Cloud?
Netdata is designed to be simple to use and flexible for every monitoring, viewing and troubleshooting use case:
- Collect: Netdata collects all available metrics from your system and applications with over 300 collectors, Kubernetes service discovery and deep container monitoring, all using just 1% CPU and a few MB of RAM. It also collects metrics from Windows machines.
- Visualize: The dashboard features meaningful charts that help you understand the relationships between hardware, operating system, running apps / services, and the rest of your infrastructure. Add nodes to Netdata Cloud for a complete view of your infrastructure from a single pane of glass.
- Monitor: Netdata's health watchdog uses hundreds of pre-configured alarms to alert you via Slack, email, PagerDuty and more when an anomaly occurs. Customize with dynamic thresholds, hysteresis, alarm templates and role-based notifications.
- Troubleshoot: 1-second granularity lets you detect and analyze anomalies that other monitoring platforms may miss. Interactive visualizations reduce your dependency on the console, and historical metrics help you trace problems back to their root cause.
- Store: Netdata's database engine efficiently stores per-second metrics for days, weeks or even months. Each distributed node stores metrics locally, simplifying deployment, reducing costs and enriching Netdata's interactive dashboards.
- Export: Integrate per-second metrics with other time series databases such as Graphite, Prometheus, InfluxDB, TimescaleDB and others through Netdata's interoperable and extensible core.
- Stream: Aggregate metrics from any number of distributed nodes into one location for in-depth analysis, including ephemeral nodes in a Kubernetes cluster.
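To make the hysteresis mentioned under alarms more concrete, here is a minimal generic sketch: the alarm turns on above one threshold and turns off only below a lower one, so a value hovering around a single limit does not make it flap. The class name and the 90 / 75 thresholds are hypothetical, and this is not Netdata's own implementation or configuration syntax.

```python
from dataclasses import dataclass


@dataclass
class HysteresisAlarm:
    raise_above: float    # alarm turns on when the value exceeds this
    clear_below: float    # alarm turns off only when the value drops below this
    active: bool = False

    def update(self, value: float) -> bool:
        """Feed one sample and return whether the alarm is active afterwards."""
        if not self.active and value > self.raise_above:
            self.active = True
        elif self.active and value < self.clear_below:
            self.active = False
        return self.active


def evaluate(samples, raise_above=90.0, clear_below=75.0):
    """Run a fresh alarm over a series of samples and return its state per sample."""
    alarm = HysteresisAlarm(raise_above, clear_below)
    return [alarm.update(value) for value in samples]
```

With these thresholds, `evaluate([50, 95, 85, 80, 70, 85])` returns `[False, True, True, True, False, False]`: the alarm stays raised at 85 and 80 while it is active, but 85 alone does not raise it again once it has cleared.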
Netdata takes a different approach to helping people build extraordinary infrastructure. It was created out of frustration with existing monitoring tools that are too complex, too expensive and do not help their users solve complex performance and health problems.
Simple to deploy
- One-line deployment on Linux distributions, plus support for Kubernetes / Docker infrastructures.
- Zero configuration and maintenance required to collect thousands of metrics, every second, from the underlying operating system and running applications.
- Pre-built charts and alarms flag common anomalies and performance problems without manual configuration.
- Distributed storage to reduce the cost and complexity of archiving metric data from any number of nodes.
Powerful and scalable
- 1% CPU utilization, a few MB of RAM, and minimal disk I/O to run the monitoring agent on bare metal, virtual machines, containers and even IoT devices.
- Per-second granularity for an unlimited number of metrics, depending on the hardware and applications running on the nodes.
- Interoperable exporters let you connect Netdata's per-second metrics to an existing monitoring stack and other time series databases.
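As an illustration of what connecting to an existing stack can look like on the receiving side, the sketch below parses a simplified subset of the Prometheus text exposition format (`name{labels} value [timestamp]`). The sample lines and metric names are hypothetical, and the parser deliberately ignores details such as escape sequences inside label values.

```python
import re

# One sample line: metric name, optional {labels}, value, optional timestamp.
LINE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+(?P<timestamp>-?\d+))?$'
)
# One label pair such as dimension="user" (escapes are kept as-is, not decoded).
LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a sample line in the simplified subset handled here
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Hypothetical sample, for illustration only.
SAMPLE = '''\
# TYPE netdata_system_cpu_percentage_average gauge
netdata_system_cpu_percentage_average{chart="system.cpu",dimension="user"} 1.25 1700000000000
netdata_system_cpu_percentage_average{chart="system.cpu",dimension="system"} 0.5 1700000000000
'''
```

Calling `parse_exposition(SAMPLE)` yields two samples, one per dimension, each with its labels as a dictionary and its value as a float.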
Optimized for troubleshooting
- Visual anomaly detection, with a UI / UX that emphasizes the relationships between charts.
- Customizable dashboards to identify related metrics, respond to incidents, and help you streamline workflows.
- Distributed metrics in a centralized interface, to help users or teams trace complex problems across distributed nodes.
Comparison with other solutions
Netdata offers many advantages over the existing monitoring landscape, be it expensive SaaS products or other open source tools.
| Netdata | Others (open source and commercial) |
|---|---|
| High-resolution metrics (1-second granularity) | Low-resolution metrics (10-second granularity at best) |
| Collects thousands of metrics per node | Collects only some metrics |
| Fast user interface optimized for anomaly detection | User interface only good for an abstract view |
| Long-term, self-contained storage with one-second granularity | Metrics centralized in an expensive data lake with 10-second granularity |
| Meaningful presentation that helps you understand the metrics | You need to know the metrics before starting |
| Install and get results immediately | Long sales process and complex installation |
| Use it to solve performance issues | Just collects past performance statistics |
| Replaces the console for tracking down performance issues | The console is always needed for troubleshooting |
| Does not require dedicated resources | Requires large dedicated resources |
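The difference between 1-second and 10-second granularity is easy to show with numbers. The sketch below uses a made-up CPU series with a two-second spike and assumes the coarser system stores the average of each 10-second window; real tools may sample or aggregate differently.

```python
def downsample_mean(samples, bucket):
    """Average consecutive groups of `bucket` samples (the last group may be shorter)."""
    result = []
    for start in range(0, len(samples), bucket):
        window = samples[start:start + bucket]
        result.append(sum(window) / len(window))
    return result


# Hypothetical per-second CPU usage: 30 seconds at 10%, with a 2-second spike to 100%.
per_second = [10.0] * 30
per_second[12] = 100.0
per_second[13] = 100.0

ten_second = downsample_mean(per_second, 10)

peak_at_1s = max(per_second)    # 100.0: the spike is fully visible
peak_at_10s = max(ten_second)   # 28.0: the same spike, averaged away
```

At 1-second resolution the spike shows up at 100%, while the 10-second averages are `[10.0, 28.0, 10.0]`, which would likely stay under most alert thresholds.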
We have just introduced and examined a very fast and powerful system for collecting a large number of metrics and displaying them on a fast, responsive and attractive dashboard. The value of the tool is immediately clear: it is adequate for almost all the needs of those who work with servers, and above all with web servers that provide web services.
As a hosting and systems engineering company, we found it faster, more accurate and easier to install than Zabbix, which we used previously and which today is probably the most complete system, provided you are willing to invest (or waste) the time needed to install and configure it.
To give an idea of what commissioning Zabbix involves: installing the master node alone takes an expert systems engineer about an hour, and each additional machine takes at least 15 minutes, even if you work very fast, know the tool well and have some degree of automation.
Rolling out Netdata with Netdata Cloud visualization takes an average of 1 hour of work for every 50 machines. Obviously, if you work from multiple terminals in a heterogeneous environment, the time can be drastically reduced.
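The figures above can be turned into a rough back-of-the-envelope comparison. This sketch assumes that effort scales linearly with the number of machines and takes the "at least 15 minutes" for Zabbix as a lower bound; the function names are illustrative.

```python
def zabbix_minutes(machines, master_minutes=60, per_machine_minutes=15):
    """Lower-bound setup time: one master node plus a fixed cost per machine."""
    return master_minutes + per_machine_minutes * machines


def netdata_minutes(machines, minutes_per_batch=60, batch_size=50):
    """Average setup time at the quoted rate of 1 hour per 50 machines."""
    return machines * minutes_per_batch / batch_size


for fleet in (50, 200):
    print(fleet, zabbix_minutes(fleet), netdata_minutes(fleet))
```

For 50 machines this gives at least 810 minutes (13.5 hours) for Zabbix against about 60 minutes for Netdata; for 200 machines, at least 3060 minutes against about 240.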