# 監控要告警啊 AlertManager

上一篇，有提到這段敘述：

* **AlertManager**：收集來自`Prometheus Server`的`Alert event`，並可整合第三方、自訂的告警模式來發送警報，例如：**Slack**、**E-mail**、與其他 **Webhook** 等等。
* **AlertManager** 可定義收到的告警事件如何分類、處理重複性告警、發送管道等等。

下列我們將簡略提到告警配置的概念。

## 安裝

稍早透過HELM安裝`stable/prometheus-operator`時，其指定的`values.yaml`中啟用`alertmanager`即可。

```
alertmanager:
  ## Deploy alertmanager
  ##
  enabled: true
```

## 告警配置

### Prometheus Server

首先需在`Prometheus Server`定義`ScrapeConfig`(`Targets`)監控對象。\
凡不符合下列規則的監控目標，即會發送 **alert event** 到`AlertManeger`服務。

* **job\_name**：**Scrape** 目標，以`job`為配置單位。
* **static\_configs**：設定監控目標。
* **relabel\_configs**：相關`label`指定、替換規則。

```
    additionalScrapeConfigs:
    - job_name: "web-service"
      scrape_interval: 15s
      metrics_path: /probe
      params:
        module: [http_2xx]
      static_configs:
      - targets:  # 要檢查的網址
        # 檢查監控端狀態的服務網址
        - https://grafana.url.com.tw
        - https://kibana.url.com.tw
        - https://prometheus.url.com.tw
        - https://es-client.url.com.tw
      relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox-exporter:9115

```

### Alert Manager

再來是`AlertManager`面向的設定

* global：預設配置參數。
* route：收到符合match條件的規則，透過receivers發送alert event。
* receivers：定義接受告警的管道，與吿警內容格式定義。

```yaml
  config:
    global:
      resolve_timeout: 2m  # 未收到標記告警通知，等待 timeout 時間之後事件標記為 resolve。
    route:
      group_by: [prometheus, alertname]
      group_wait: 30s
      group_interval: 5m  # 重複發送告警的間隔時間
      repeat_interval: 12h
      receiver: slack-receiver
      routes:
      # 範例：符合 alertname = Watchdog 條件者，不進行通知。
      - receiver: 'null'
        match:
          alertname: Watchdog
      # 範例：符合 prometheus = 指定特定環境，將告警事件透過 k8s-receiver 發送通知。
      - receiver: 'k8s-receiver'
        group_wait: 30s
        match:
          prometheus: namespace/prometheus-operator-prometheus
    # 設定通知管道
    receivers:
      # 黑洞
      - name: 'null'
      # 預設 Slace channel
      - name: slack-receiver
        slack_configs:
          - api_url: "https://hooks.slack.com/services/dlksdjfio/sljfidjo/dlksjjsioij"
            channel: "#channel-name"
            title: "{{ .CommonAnnotations.env }}: {{ .CommonAnnotations.summary }}"
            text: "{{ range .Alerts }}{{ .Annotations.message }}\n{{ end }}"
            send_resolved: true
      # 設定另一組 Slack channel
      - name: k8s-receiver
        slack_configs:
          - api_url: "https://hooks.slack.com/services/dlksdjfio/djofjido/alsjdiodjoj"
            channel: "#channel-name2"
            title: "K8s : {{ .CommonAnnotations.summary }}"
            text: "{{ range .Alerts }}{{ .Annotations.message }}\n{{ end }}"
            send_resolved: true

```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://fufu.gitbook.io/kk8s/task-memory/14.alertmanager-info.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
