2026-02-12
kubernetes
Note: this article was written 101 days ago and last modified 101 days ago, so some of its information may be out of date.


k8s+prometheus+grafana

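kube-prometheus is a complete monitoring solution that can be deployed to a Kubernetes cluster with little effort. It includes:

  1. Prometheus, for metrics collection
  2. Alertmanager, for alerting and notifications
  3. Grafana, for the graphical user interface (dashboards)
  4. A set of Kubernetes-specific exporters that act as metric-collection agents
  5. The Prometheus Operator, which simplifies and automates setting up the whole stack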

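Download

Run kubectl version to check the cluster's Kubernetes version, then download the kube-prometheus release that supports it (the kube-prometheus README has a compatibility matrix). This article uses v0.15.0:

```shell
wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v0.15.0.tar.gz
```

Extract

```shell
tar -zxvf v0.15.0.tar.gz && cd kube-prometheus-0.15.0
```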
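The download and extraction can be kept together with the release version as a single parameter. This is a sketch: `kp_fetch_cmds` and `k8s_minor` are hypothetical helper names, the default 0.15.0 is simply the release used in this article, and `jq` is assumed to be installed for the usage line. The helper only prints the commands, so they can be reviewed before being piped to `bash`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) the commands that download and unpack
# a kube-prometheus release. $1 is the tag without the leading "v".
kp_fetch_cmds() {
  local version="${1:-0.15.0}"
  local url="https://github.com/prometheus-operator/kube-prometheus/archive/refs/tags/v${version}.tar.gz"
  printf 'wget %s\n' "$url"
  printf 'tar -zxvf v%s.tar.gz\n' "$version"
}

# Hypothetical helper: reduce a version such as "v1.30.2" to "1.30" so it can
# be compared with the compatibility matrix in the kube-prometheus README.
k8s_minor() {
  local v="${1#v}"          # drop the leading "v"
  printf '%s\n' "${v%.*}"   # drop the patch component
}

# Usage (jq assumed to be installed):
#   k8s_minor "$(kubectl version -o json | jq -r .serverVersion.gitVersion)"
#   kp_fetch_cmds 0.15.0 | bash && cd kube-prometheus-0.15.0
```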

Change the image addresses

For network reasons some of the images cannot be pulled from within China, so their addresses need to be changed. First list the images referenced by the manifests:

```shell
[root@k8s-master kube-prometheus-0.15.0]# grep "image:" manifests/*yaml
manifests/alertmanager-alertmanager.yaml: image: quay.io/prometheus/alertmanager:v0.28.1
manifests/blackboxExporter-deployment.yaml: image: quay.io/prometheus/blackbox-exporter:v0.26.0
manifests/blackboxExporter-deployment.yaml: image: ghcr.io/jimmidyson/configmap-reload:v0.15.0
manifests/blackboxExporter-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.19.1
manifests/grafana-deployment.yaml: image: grafana/grafana:12.0.1
manifests/kubeStateMetrics-deployment.yaml: image: harbor.local.com/local/kube-state-metrics:v2.15.0
manifests/kubeStateMetrics-deployment.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/kubeStateMetrics-deployment.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/nodeExporter-daemonset.yaml: image: quay.io/prometheus/node-exporter:v1.9.1
manifests/nodeExporter-daemonset.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.19.1
manifests/prometheusAdapter-deployment.yaml: image: registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
manifests/prometheusOperator-deployment.yaml: image: quay.io/prometheus-operator/prometheus-operator:v0.83.0
manifests/prometheusOperator-deployment.yaml: image: quay.io/brancz/kube-rbac-proxy:v0.19.1
manifests/prometheus-prometheus.yaml: image: quay.io/prometheus/prometheus:v3.4.0
```


Get the list of required images

```shell
[root@k8s-master kube-prometheus-0.15.0]# grep "image:" manifests/*yaml | awk '{print $3}'
quay.io/prometheus/alertmanager:v0.28.1
quay.io/prometheus/blackbox-exporter:v0.26.0
ghcr.io/jimmidyson/configmap-reload:v0.15.0
quay.io/brancz/kube-rbac-proxy:v0.19.1
grafana/grafana:12.0.1
harbor.local.com/local/kube-state-metrics:v2.15.0
harbor.local.com/local/kube-rbac-proxy:v0.19.1
harbor.local.com/local/kube-rbac-proxy:v0.19.1
quay.io/prometheus/node-exporter:v1.9.1
quay.io/brancz/kube-rbac-proxy:v0.19.1
registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0
quay.io/prometheus-operator/prometheus-operator:v0.83.0
quay.io/brancz/kube-rbac-proxy:v0.19.1
quay.io/prometheus/prometheus:v3.4.0
```
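Several images appear more than once (kube-rbac-proxy, for example), so a deduplicated list is easier to work from. A minimal sketch, assuming each image sits on its own `image:` line as in the output above; `list_images` is a hypothetical name. The Prometheus Operator may also receive some images (such as prometheus-config-reloader) as command-line arguments rather than `image:` fields, so searching manifests/prometheusOperator-deployment.yaml for `config-reloader` is worth doing too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every image referenced in the manifests once.
list_images() {
  local dir="${1:-manifests}"
  grep -h 'image:' "$dir"/*.yaml \
    | sed 's/.*image:[[:space:]]*//' \
    | LC_ALL=C sort -u          # byte-order sort, duplicates removed
}

# Usage: list_images manifests
```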

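Pull the images and push them to Harbor

The example below handles alertmanager through the Huawei Cloud SWR mirror; repeat it for every image in the list:

```shell
# Pull the image from the mirror
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/alertmanager:v0.28.1
# Tag it with the Harbor address
docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/alertmanager:v0.28.1 harbor.local.com/local/alertmanager:v0.28.1
# Push it to Harbor
docker push harbor.local.com/local/alertmanager:v0.28.1
# Remove the local mirror tag
docker rmi swr.cn-north-4.myhuaweicloud.com/ddn-k8s/quay.io/prometheus/alertmanager:v0.28.1
```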
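Repeating the pull/tag/push/rmi sequence by hand for every image is error-prone. The sketch below prints those four commands for one image; `mirror_source`, `harbor_target` and `sync_cmds` are hypothetical names. The mirror path for Docker Hub images (a `docker.io/` prefix, e.g. for grafana/grafana) is an assumption about how the mirror is laid out, so check it before relying on it. The Harbor path keeps only the last path component, matching the `harbor.local.com/local/<name>:<tag>` layout used in this article.

```shell
#!/usr/bin/env bash
# Prefixes used in this article; override them through the environment.
MIRROR="${MIRROR:-swr.cn-north-4.myhuaweicloud.com/ddn-k8s}"
HARBOR="${HARBOR:-harbor.local.com/local}"

# Image path on the mirror. ASSUMPTION: images without a registry host
# (e.g. grafana/grafana) are mirrored under a docker.io/ prefix.
mirror_source() {
  local img="$1"
  [[ "$img" == */* ]] || img="library/$img"   # official Docker Hub images
  case "${img%%/*}" in
    *.*|*:*|localhost) ;;                     # first part is already a registry host
    *) img="docker.io/$img" ;;
  esac
  printf '%s/%s\n' "$MIRROR" "$img"
}

# Harbor target: keep only "<name>:<tag>".
harbor_target() {
  printf '%s/%s\n' "$HARBOR" "${1##*/}"
}

# Print (not run) the pull/tag/push/rmi commands for one image.
sync_cmds() {
  local src dst
  src=$(mirror_source "$1")
  dst=$(harbor_target "$1")
  printf 'docker pull %s\n' "$src"
  printf 'docker tag %s %s\n' "$src" "$dst"
  printf 'docker push %s\n' "$dst"
  printf 'docker rmi %s\n' "$src"
}

# Usage: review the output first, then append "| bash" to execute it.
#   grep -h 'image:' manifests/*.yaml | sed 's/.*image:[[:space:]]*//' | sort -u \
#     | grep -v '^harbor.local.com/' | while read -r img; do sync_cmds "$img"; done
```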


Change the image addresses in the YAML files to Harbor

These are the manifest files that contain image references (one line per image, so some files appear more than once):

```shell
[root@k8s-master kube-prometheus-0.15.0]# grep "image:" manifests/*yaml | awk -F':' '{print $1}'
manifests/alertmanager-alertmanager.yaml
manifests/blackboxExporter-deployment.yaml
manifests/blackboxExporter-deployment.yaml
manifests/blackboxExporter-deployment.yaml
manifests/grafana-deployment.yaml
manifests/kubeStateMetrics-deployment.yaml
manifests/kubeStateMetrics-deployment.yaml
manifests/kubeStateMetrics-deployment.yaml
manifests/nodeExporter-daemonset.yaml
manifests/nodeExporter-daemonset.yaml
manifests/prometheusAdapter-deployment.yaml
manifests/prometheusOperator-deployment.yaml
manifests/prometheusOperator-deployment.yaml
manifests/prometheus-prometheus.yaml
```
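Instead of editing each file by hand, the registry part of every `image:` value can be rewritten with sed. A sketch, assuming GNU sed (`-i` without a backup suffix) and the flat `harbor.local.com/local/<name>:<tag>` layout; `rewrite_images` is a hypothetical name. The edit is in place, so back up or commit the manifests first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite "image: <registry>/<path>/<name>:<tag>" to
# "image: <harbor>/<name>:<tag>" in every manifest of a directory (GNU sed).
rewrite_images() {
  local harbor="${1:-harbor.local.com/local}" dir="${2:-manifests}"
  # \1 keeps the "image: " prefix, \2 keeps the final "<name>:<tag>" component.
  sed -i -E "s#(image:[[:space:]]*)[^[:space:]]*/([^/[:space:]]+)\$#\1${harbor}/\2#" \
    "$dir"/*.yaml
}

# Usage: rewrite_images harbor.local.com/local manifests
```

Images that already point at Harbor are rewritten to the same value, so running it twice is harmless.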

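To pull the private images from harbor.local.com with the registry Secret created earlier, add an imagePullSecrets field to each manifest: under spec.template.spec for Deployments and DaemonSets, and directly under spec for the Alertmanager and Prometheus custom resources. The Secret must exist in the monitoring namespace; replace registry-secret with its actual name (for example harbor-registry-secret).

```yaml
imagePullSecrets:
  - name: registry-secret
```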
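To find the manifests that still lack the field, and to create the Secret itself, something like the following can be used. `missing_pull_secret` is a hypothetical helper, and the credentials in the commented `kubectl create secret docker-registry` example are placeholders; the monitoring namespace has to exist before the Secret can be created in it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print manifests that reference an image but do not
# contain an imagePullSecrets field yet.
missing_pull_secret() {
  local dir="${1:-manifests}" f
  for f in "$dir"/*.yaml; do
    if grep -q 'image:' "$f" && ! grep -q 'imagePullSecrets:' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Creating the Secret (placeholder credentials):
#   kubectl create secret docker-registry registry-secret \
#     --docker-server=harbor.local.com \
#     --docker-username=<user> --docker-password=<password> -n monitoring
```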


Check the image addresses

```shell
[root@k8s-master kube-prometheus-0.15.0]# grep "image:" manifests/*yaml
manifests/alertmanager-alertmanager.yaml: image: harbor.local.com/local/alertmanager:v0.28.1
manifests/blackboxExporter-deployment.yaml: image: harbor.local.com/local/blackbox-exporter:v0.26.0
manifests/blackboxExporter-deployment.yaml: image: harbor.local.com/local/configmap-reload:v0.15.0
manifests/blackboxExporter-deployment.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/grafana-deployment.yaml: image: harbor.local.com/local/grafana:12.0.1-security-01
manifests/kubeStateMetrics-deployment.yaml: image: harbor.local.com/local/kube-state-metrics:v2.15.0
manifests/kubeStateMetrics-deployment.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/kubeStateMetrics-deployment.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/nodeExporter-daemonset.yaml: image: harbor.local.com/local/node-exporter:v1.9.1
manifests/nodeExporter-daemonset.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/prometheusAdapter-deployment.yaml: image: harbor.local.com/local/prometheus-adapter:v0.12.0
manifests/prometheusOperator-deployment.yaml: image: harbor.local.com/local/prometheus-operator:v0.83.0
manifests/prometheusOperator-deployment.yaml: image: harbor.local.com/local/kube-rbac-proxy:v0.19.1
manifests/prometheus-prometheus.yaml: image: harbor.local.com/local/prometheus:v3.4.0
```
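Rather than reading the whole list, a short check can report any image that does not yet point at Harbor. A sketch; `non_harbor_images` is a hypothetical name and the prefix defaults to the one used in this article. It returns non-zero when something is left over, so it can gate the install step in a script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file: image" for each image outside the given
# prefix and return 1 if there is at least one.
non_harbor_images() {
  local prefix="${1:-harbor.local.com/local/}" dir="${2:-manifests}"
  grep -H 'image:' "$dir"/*.yaml \
    | sed 's/:.*image:[[:space:]]*/: /' \
    | awk -v p="$prefix" '{ if (index($2, p) != 1) { print; bad = 1 } } END { exit bad }'
}

# Usage: non_harbor_images harbor.local.com/local/ manifests && echo "all images use Harbor"
```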


Access configuration

To reach Prometheus, Grafana and Alertmanager from outside the cluster, change the type of their Services to NodePort.

Edit the Prometheus Service (manifests/prometheus-service.yaml):

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 3.4.0
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    nodePort: 31922 # added
    port: 9090
    targetPort: web
  - name: reloader-web
    nodePort: 30981
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
  type: NodePort # added
```
Edit the Grafana Service (manifests/grafana-service.yaml):

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 12.0.1
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    nodePort: 30300 # added
    port: 3000
    targetPort: http
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
  type: NodePort # added
```
Edit the Alertmanager Service (manifests/alertmanager-service.yaml):

```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.28.1
  name: alertmanager-main
  namespace: monitoring
spec:
  ports:
  - name: web
    nodePort: 30200 # added
    port: 9093
    targetPort: web
  - name: reloader-web
    nodePort: 31160
    port: 8080
    targetPort: reloader-web
  selector:
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/instance: main
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
  type: NodePort # added
```
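The chosen nodePort values must fall inside the API server's `--service-node-port-range` (30000-32767 by default) and must not collide with each other. A pre-flight check over the edited Service manifests, as a sketch: `check_nodeports` is a hypothetical name, and the range arguments should be changed if the cluster uses a custom range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check nodePort values in *-service.yaml files.
# Each must lie in [min, max] and none may appear twice.
check_nodeports() {
  local dir="${1:-manifests}" min="${2:-30000}" max="${3:-32767}"
  local ports p rc=0
  ports=$(sed -n 's/.*nodePort:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$dir"/*-service.yaml)
  for p in $ports; do
    if (( p < min || p > max )); then
      echo "out of range: $p"
      rc=1
    fi
  done
  # uniq -d prints each value that occurs more than once (input must be sorted).
  for p in $(printf '%s\n' $ports | sort -n | uniq -d); do
    echo "duplicate: $p"
    rc=1
  done
  return "$rc"
}

# Usage: check_nodeports manifests && echo "nodePorts look fine"
```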

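Installation

Run the following from the kube-prometheus-0.15.0 directory. First apply manifests/setup, which creates the monitoring namespace and the CustomResourceDefinitions; --server-side is used because some of these CRDs are too large for the annotation that a client-side apply stores:

```shell
kubectl apply --server-side -f manifests/setup
```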


Then apply the rest of the manifests:

```shell
[root@k8s-master kube-prometheus-0.15.0]# kubectl apply -f manifests/
alertmanager.monitoring.coreos.com/main created
networkpolicy.networking.k8s.io/alertmanager-main created
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager-main created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
networkpolicy.networking.k8s.io/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-config created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-grafana-overview created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-multicluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-windows-cluster created
configmap/grafana-dashboard-k8s-resources-windows-namespace created
configmap/grafana-dashboard-k8s-resources-windows-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-k8s-windows-cluster-rsrc-use created
configmap/grafana-dashboard-k8s-windows-node-rsrc-use created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes-aix created
configmap/grafana-dashboard-nodes-darwin created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
networkpolicy.networking.k8s.io/grafana created
prometheusrule.monitoring.coreos.com/grafana-rules created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
networkpolicy.networking.k8s.io/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
networkpolicy.networking.k8s.io/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
networkpolicy.networking.k8s.io/prometheus-k8s created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
networkpolicy.networking.k8s.io/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
networkpolicy.networking.k8s.io/prometheus-operator created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
service/prometheus-operator created
serviceaccount/prometheus-operator created
servicemonitor.monitoring.coreos.com/prometheus-operator created
```
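The kube-prometheus README's quick start also waits for the CRDs to reach the Established condition between the two apply commands, which avoids "no matches for kind" errors when the custom resources are applied too early. The whole sequence as a sketch: `kp_install` is a hypothetical name, and the 120-second timeout is an assumption, not a value from the README.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install kube-prometheus in the README's order
# (CRDs, wait until Established, then everything else) and stop at the
# first failing step. Run from the kube-prometheus-0.15.0 directory.
kp_install() {
  local dir="${1:-manifests}"
  kubectl apply --server-side -f "$dir/setup" || return 1
  # ASSUMPTION: a 120s timeout; raise it on slow clusters.
  kubectl wait --for condition=Established --all CustomResourceDefinition \
    --namespace=monitoring --timeout=120s || return 1
  kubectl apply -f "$dir/" || return 1
}

# Usage: kp_install manifests
```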

Check the deployment status in the monitoring namespace:

```shell
kubectl get pods -n monitoring
```
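Instead of scanning the list by eye, the output can be filtered down to pods that are not fully ready yet. A sketch that parses the default `kubectl get pods` columns (NAME, READY, STATUS, ...); `not_ready_pods` is a hypothetical name. Alternatively, `kubectl wait --for=condition=Ready pods --all -n monitoring --timeout=300s` blocks until every pod is ready.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin and
# print pods whose READY count is incomplete or whose STATUS is not Running.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1, $2, $3 }'
}

# Usage: kubectl get pods -n monitoring --no-headers | not_ready_pods
```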


Verify Prometheus

Browse to any node's IP on port 31922 (http://<node-IP>:31922) to reach the Prometheus web UI.
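The same check can be made from the command line through the Prometheus HTTP API: the query `up == 0` returns every scrape target whose last scrape failed. A sketch assuming curl is available; `prom_down_targets` is a hypothetical name, the node IP is a placeholder, and 31922 is the nodePort configured above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ask the Prometheus HTTP API for targets that are down.
# $1: node IP (placeholder), $2: Prometheus nodePort (31922 in this article).
prom_down_targets() {
  local node_ip="$1" port="${2:-31922}"
  # -G turns the --data-urlencode pair into a URL-encoded query string.
  curl -sG "http://${node_ip}:${port}/api/v1/query" --data-urlencode 'query=up == 0'
}

# Usage: prom_down_targets 192.168.1.10   # prints the JSON response
```

An empty `result` array in the response means every target is up.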


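Verify Alertmanager

Browse to any node's IP on port 30200 (http://<node-IP>:30200) to reach the Alertmanager web UI; some alerts are already listed there. Because Alertmanager's notification configuration is fairly complex and its support for messaging tools popular in China is limited, PrometheusAlert can be used for alert configuration instead.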


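Verify Grafana

Browse to any node's IP on port 30300 (http://<node-IP>:30300) to reach the Grafana web UI. The default username and password are admin/admin; at first login you are prompted to change the password (here it was changed to Admin12345). After logging in you can see that a number of monitoring dashboards are already built in.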
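All three UIs also expose health endpoints, which are handy for a scripted smoke test: `/-/healthy` on Prometheus and Alertmanager, and `/api/health` on Grafana. A sketch using the nodePorts from this article; `smoke_test` is a hypothetical name and the node IP is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe the health endpoints through the NodePorts
# configured earlier; return 1 if any probe fails.
smoke_test() {
  local node_ip="$1" url rc=0
  for url in \
    "http://${node_ip}:31922/-/healthy" \
    "http://${node_ip}:30200/-/healthy" \
    "http://${node_ip}:30300/api/health"; do
    # -f: fail on HTTP errors, -sS: quiet but still report errors.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      rc=1
    fi
  done
  return "$rc"
}

# Usage: smoke_test 192.168.1.10
```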


Reset the Grafana password

Run grafana-cli inside the Grafana pod:

```shell
[root@k8s-master kube-prometheus-0.15.0]# kubectl exec -it -nmonitoring grafana-9988b94d4-rwngx -- grafana-cli admin reset-admin-password Admin@12345
```
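The pod name (grafana-9988b94d4-rwngx above) changes with every rollout. `kubectl exec` also accepts a `deploy/<name>` reference and picks a pod of that Deployment, so the command can be written without looking the name up first. A sketch; `reset_grafana_password` is a hypothetical wrapper that prompts for the password when none is given, which keeps it out of the shell history.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: reset the Grafana admin password through the grafana
# Deployment, so the generated pod name is not needed.
reset_grafana_password() {
  local password="${1:-}"
  if [ -z "$password" ]; then
    read -rsp 'New Grafana admin password: ' password
    echo
  fi
  kubectl -n monitoring exec deploy/grafana -- \
    grafana-cli admin reset-admin-password "$password"
}

# Usage: reset_grafana_password        # prompts for the new password
```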


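If Grafana, Prometheus or Alertmanager cannot be reached, this is because the kube-prometheus manifests include NetworkPolicy objects by default that restrict access to these pods. Delete the corresponding resources to allow access from outside the cluster:

```shell
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml
```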
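Listing the policies first (`kubectl -n monitoring get networkpolicy`) shows what will be removed; the three deletions can then run in one loop. A sketch; `drop_ui_netpols` is a hypothetical name, and `--ignore-not-found` makes a second run harmless. Editing the policies to admit your client network is a narrower alternative to deleting them outright.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete the NetworkPolicies in front of the three web UIs.
drop_ui_netpols() {
  local dir="${1:-manifests}" name
  for name in prometheus grafana alertmanager; do
    kubectl delete --ignore-not-found -f "$dir/${name}-networkPolicy.yaml" || return 1
  done
}

# Usage: drop_ui_netpols manifests
```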

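Dashboard metrics not shown after deployment

One possible cause: the prometheus-adapter shipped with kube-prometheus registers the v1beta1.metrics.k8s.io APIService (see the apply output above), the same aggregated API that metrics-server provides, so the metrics API may no longer be served by metrics-server. Deleting and re-applying the metrics-server manifest restores it:

```shell
[root@k8s-master kube-prometheus-0.15.0]# kubectl delete -f ../metrics-server.yaml
[root@k8s-master kube-prometheus-0.15.0]# kubectl apply -f ../metrics-server.yaml
```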
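To see which Service currently answers the resource-metrics API, query the v1beta1.metrics.k8s.io APIService object. A sketch; `metrics_api_backend` and `metrics_api_owner` are hypothetical names, and it assumes metrics-server runs behind a Service named metrics-server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<namespace>/<service>" backing the metrics API.
metrics_api_backend() {
  kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.spec.service.namespace}/{.spec.service.name}{"\n"}'
}

# Map the backend to a component name.
# ASSUMPTION: metrics-server is exposed as a Service called metrics-server.
metrics_api_owner() {
  case "$(metrics_api_backend)" in
    */metrics-server)              echo metrics-server ;;
    monitoring/prometheus-adapter) echo prometheus-adapter ;;
    *)                             echo unknown ;;
  esac
}

# Usage: metrics_api_owner
```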
