Documentation
In Alertmanager, the group_by field allows you to define which alert labels to group together, which is useful to avoid alert flooding and to consolidate notifications about similar issues. Here are some common and useful options for group_by:
Common group_by Options
-
alertname: Groups alerts by their name. This is useful to ensure that alerts with the same name (e.g.,HighMemoryUsage) are grouped together, so you don’t get multiple notifications for the same type of alert.group_by: ['alertname'] -
job: Groups alerts by job. This is helpful if you want alerts grouped by the application or service that generated them, as defined in Prometheus jobs (e.g.,node_exporter,api_server).group_by: ['job'] -
instance: Groups alerts by instance (e.g.,server1.example.com). This is useful if you want all alerts from the same instance grouped together, which helps to monitor issues specific to each server or node.group_by: ['instance'] -
severity: Groups alerts by severity level (e.g.,critical,warning). This allows you to separate critical alerts from warnings, so you can prioritize and handle them accordingly.group_by: ['severity'] -
cluster: Groups alerts by cluster, which is useful in multi-cluster environments to manage alerts based on clusters and keep them organized.group_by: ['cluster'] -
environment: Groups alerts by environment (e.g.,production,staging). This is useful if you monitor multiple environments and want to distinguish between alerts from production versus staging or development.group_by: ['environment'] -
service: Groups alerts by a specific service name or type, which is useful if you have different services (e.g.,database,web,cache) and want to handle their alerts independently.group_by: ['service'] -
region: Groups alerts by region. This is helpful if you’re running a distributed setup across multiple regions and want to separate alerts based on geographical locations.group_by: ['region'] -
datacenter: Groups alerts by datacenter, which is useful for separating alerts in a multi-datacenter setup, especially when each datacenter may have unique characteristics.group_by: ['datacenter'] -
__name__: Groups by the metric name itself. This can be helpful if you have custom metrics and want to group by specific metric names across your infrastructure.group_by: ['__name__']
Combining Multiple Options
You can combine multiple labels for group_by to fine-tune the grouping. For example, grouping by alertname, job, and instance would mean that alerts with the same alert name, job, and instance will be grouped together:
group_by: ['alertname', 'job', 'instance']Special Option: group_by: ['...']
Setting group_by to ['...'] disables alert grouping entirely. Each alert will be sent as an individual notification, regardless of its labels. This is generally not recommended for high-volume alert environments, as it may lead to excessive notifications.
group_by: ['...']Example: Comprehensive Grouping
Here’s an example combining multiple labels to group alerts based on several relevant attributes:
group_by: ['alertname', 'severity', 'environment', 'service', 'instance']This setup would group alerts by their name, severity level, environment, service type, and specific instance, which is useful in complex environments with multiple applications and services running across different instances and environments.