Alerts and charts using Victoria Metrics#1633

Open
gsanchietti wants to merge 37 commits into
nethsecurity-8.8from
alerts_victoria

@gsanchietti
Member

@gsanchietti gsanchietti commented Apr 30, 2026

This pull request migrates network monitoring and reporting functionality from Netdata to Telegraf and Victoria Metrics.
It raises alerts using Mimir as the Alertmanager for Victoria Metrics.
It keeps Netdata with its own alerting for backward compatibility.

Main changes:

  • ns.dashboard and ns.report APIs now use Victoria Metrics statistics
  • add a new Metrics page where the most relevant metrics are shown (this should replace the Netdata UI)
  • ping latency monitor has been ported from Netdata to Telegraf: ns.telegraf replaces ns.netdata; the output and API name for the UI are unchanged
  • implement mwan alerts using a new Telegraf plugin
  • implement service alerts using a new Telegraf plugin: only the most relevant services are monitored
  • port storage and backup alerts to Telegraf plugin
  • send alerts to current my.nethesis.it and my.nethserver.com using a local proxy that mimics Alertmanager behavior
  • remove Netdata
  • add Mimir support for next-gen my.nethesis.it
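
As an illustration of the mwan plugin mentioned above, here is a minimal sketch of how a Telegraf exec-style script could collect WAN state from /var/run/mwan3/iface_state/. It is not the code from this PR. It assumes that the directory holds one file per interface whose content is a single word such as "online" or "offline"; the measurement name `mwan`, the field name `online`, and both function names are hypothetical.

```python
from pathlib import Path

# Assumed layout: one file per WAN interface, containing a single state word
# such as "online" or "offline".
STATE_DIR = Path("/var/run/mwan3/iface_state")


def read_wan_states(state_dir=STATE_DIR):
    """Return {interface_name: state} for every regular file in state_dir."""
    states = {}
    if not state_dir.is_dir():
        # mwan3 not running or not installed: report nothing.
        return states
    for entry in sorted(state_dir.iterdir()):
        if entry.is_file():
            states[entry.name] = entry.read_text().strip()
    return states


def to_line_protocol(states):
    """Render one InfluxDB line-protocol record per interface.

    The measurement name "mwan" and the field name "online" are hypothetical.
    Interface names are assumed to contain no spaces or commas, so no
    escaping is applied.
    """
    return [
        f"mwan,interface={name} online={1 if state == 'online' else 0}i"
        for name, state in states.items()
    ]


if __name__ == "__main__":
    for line in to_line_protocol(read_wan_states()):
        print(line)
```

An alert rule could then fire when the `online` field stays at 0 for a given interface; the actual metric names and rules are defined by the PR itself.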

Replaces #1601
Reference: #1638

@gsanchietti gsanchietti self-assigned this May 4, 2026
@gsanchietti gsanchietti marked this pull request as draft May 5, 2026 06:00
@gsanchietti gsanchietti force-pushed the alerts_victoria branch 5 times, most recently from d25e6e0 to 6f9cfe3 Compare May 5, 2026 14:51
@Tbaile Tbaile force-pushed the alerts_victoria branch 2 times, most recently from b918ffa to aaaed16 Compare May 5, 2026 15:58
@gsanchietti gsanchietti changed the base branch from monitoring to nethsecurity-8.8 May 6, 2026 08:11
@gsanchietti gsanchietti force-pushed the alerts_victoria branch 7 times, most recently from 254a63b to 3ddb538 Compare May 6, 2026 15:42
@gsanchietti gsanchietti changed the title [EXPERIMENTAL] Alerts and charts using Victoria Metrics Alerts and charts using Victoria Metrics May 7, 2026
@gsanchietti gsanchietti mentioned this pull request May 7, 2026
@Tbaile Tbaile force-pushed the nethsecurity-8.8 branch from c798b54 to 32e7d39 Compare May 7, 2026 06:53
@gsanchietti gsanchietti force-pushed the alerts_victoria branch 4 times, most recently from 2c4d550 to f161aca Compare May 7, 2026 12:07
@Tbaile Tbaile force-pushed the nethsecurity-8.8 branch from 32e7d39 to 5a5b451 Compare May 7, 2026 13:16
@Tbaile Tbaile force-pushed the alerts_victoria branch from 84ebded to caf52d1 Compare May 7, 2026 14:49
@gsanchietti gsanchietti marked this pull request as ready for review May 12, 2026 04:27
@gsanchietti gsanchietti requested a review from Tbaile May 12, 2026 04:27
@Tbaile Tbaile linked an issue May 12, 2026 that may be closed by this pull request
@Tbaile Tbaile force-pushed the nethsecurity-8.8 branch from bec5c03 to 12597d6 Compare May 12, 2026 14:10
@Tbaile Tbaile force-pushed the alerts_victoria branch from e9d9670 to de95740 Compare May 12, 2026 14:22
Tbaile and others added 10 commits May 12, 2026 18:01
Replace Netdata alerting with vmalert:

- add vmalert init script (vmalert.initd) to start/stop vmalert service
- add vmalert UCI configuration file (vmalert.conf) with datasource settings
- add comprehensive alert rules
- update Makefile to install vmalert configuration and rules
- add detailed documentation of vmalert setup and metrics mapping
- support for Mimir integration when configured via ns-plug
- add ns-plug-alert-proxy that listens on 127.0.0.1:9095 and receives notifications
  from vmalert: the proxy checks whether each alert is firing or resolved,
  then translates selected alerts to the legacy portal format and forwards
  them to my.nethesis.it or my.nethserver.com
- if Mimir credentials are present in ns-plug UCI config, the Mimir
  alertmanager endpoint is added as a second notifier alongside the proxy
- also port the alert about non-encrypted backups to Victoria Metrics
- add telegraf-mwan Python script that reads /var/run/mwan3/iface_state/
  to collect WAN interface connectivity state.
- add telegraf-services Python script that queries ubus to collect the
  running state of all procd-managed services. Outputs JSON for

Assisted-by: Copilot:Sonnet4.6
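
The firing/resolved check performed by ns-plug-alert-proxy could look roughly like the sketch below. It is not the proxy's actual code. It assumes that vmalert posts Alertmanager-style alert objects carrying a `labels` map and an RFC 3339 `endsAt` timestamp, and that an alert counts as resolved once `endsAt` is in the past. The function names and the alert names used in the usage note are hypothetical.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Parse an RFC 3339 timestamp with an offset; return None when missing."""
    if not value:
        return None
    # Before Python 3.11, fromisoformat rejects a trailing "Z" and fractional
    # seconds that are not exactly 3 or 6 digits long; normalize the "Z" here.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def alert_state(alert, now=None):
    """Return "resolved" if the alert's endsAt is in the past, else "firing"."""
    now = now or datetime.now(timezone.utc)
    ends_at = parse_time(alert.get("endsAt"))
    # Assumption: a missing or zero-valued endsAt (year 1) means "not ended".
    if ends_at is None or ends_at.year == 1:
        return "firing"
    return "resolved" if ends_at <= now else "firing"


def classify(alerts, forwarded, now=None):
    """Return (alertname, state) pairs for alerts whose name is in `forwarded`.

    `forwarded` is the set of alert names selected for the legacy portal;
    every other alert is ignored.
    """
    result = []
    for alert in alerts:
        name = alert.get("labels", {}).get("alertname")
        if name in forwarded:
            result.append((name, alert_state(alert, now)))
    return result
```

For example, `classify(alerts, {"WanDown"})` would keep only alerts named `WanDown` (a made-up name) and report whether each one is firing or resolved. Translating them to the legacy portal format is left out, since that format is not described here.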
Changes:
- migrate ping monitoring from netdata's fping plugin to telegraf's native
  ping input plugin
- expose metrics to the UI

The ping plugin uses the native method (method="native"), which sends ICMP
packets directly without an external ping command and requires the CAP_NET_RAW
capability or root privileges. Metrics are tagged with
influxdb_db="ping-metrics" for proper InfluxDB database routing.

Assisted-by: Copilot:Sonnet4.6
These plugins are required to replace all Netdata features
Netdata has been replaced by Victoria Metrics.
@Tbaile Tbaile force-pushed the alerts_victoria branch from de95740 to facce07 Compare May 13, 2026 09:45
Collaborator

@Tbaile Tbaile left a comment

I really don't like that we use vm_query to bridge requests over to the API and format the data; later on we could let the UI send requests directly to the Victoria Metrics instance. That will need a change to the ns-api-server, which I'm not that fond of. Hold the merge for the moment: I'm going to check for issues on the UI and then merge everything together.

Check my latest commits for a few changes, nothing big but another couple of eyes never hurt.

@gsanchietti
Member Author

I really don't like that we use vm_query to bridge requests over to the API and format the data; later on we could let the UI send requests directly to the Victoria Metrics instance

Me neither, but I think it is good enough for most cases. Let's keep it as is; we will improve the implementation if it proves too heavy on real machines.

Check my latest commits for a few changes, nothing big but another couple of eyes never hurt.

It's good for me.

Please go with the merge when the UI part is good for you (it's a bit ugly right now, but it works and for me we can merge as is).

Development

Successfully merging this pull request may close these issues.

Monitoring: adding long time storage for metrics