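Monitoring services outside the K8s cluster with kube-prometheus
Preface
In real production environments, not every component runs inside the K8s cluster. Typical examples are load balancers, databases, and middleware services.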
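Options for monitoring services outside the K8s cluster
For this kind of service there are two monitoring options:
- Through the prometheus spec of the Prometheus Operator CR. This option is tightly coupled with the rest of the Prometheus configuration.
- Through Service + Endpoints + ServiceMonitor. This option is more adaptable and more loosely coupled, and it is the one used for adding new monitoring targets going forward.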
Option 1
prometheus spec mode
In short, a static configuration (static_configs) like the following is added directly to the prometheus spec:
static_configs:
  - targets:
      - SERVICE-FQDN
With kube-prometheus, this is done through its custom parameter mode.
Add prometheus-additional.yaml
- job_name: 'node-exporter-others'
  static_configs:
    - targets:
        - 192.168.100.10:9100
  metrics_path: /metrics
Generate a Secret from the prometheus-additional.yaml file
kubectl create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml -n monitoring
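Edit prometheus-prometheus.yaml
Add the additional scrape configuration under spec. The name is the Secret created above and the key is the file name stored in it:
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml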
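For orientation, the following is a trimmed sketch of where that field sits in the Prometheus resource. It assumes the default kube-prometheus manifest, where the resource is named k8s in the monitoring namespace. All other fields of the real file are omitted here and should be left as they are.

```yaml
# Trimmed sketch of prometheus-prometheus.yaml (assumption: default kube-prometheus naming).
# Only the field relevant to additional scrape configs is shown.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s                # kube-prometheus default; yours may differ
  namespace: monitoring
spec:
  # ...existing fields such as replicas, serviceAccountName, resources stay unchanged...
  additionalScrapeConfigs:
    name: additional-scrape-configs      # Secret created with kubectl above
    key: prometheus-additional.yaml      # key inside that Secret (the file name)
```

The Secret has to live in the same namespace as the Prometheus resource, which is why the kubectl command above uses -n monitoring.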
Apply prometheus-prometheus.yaml
In the manifests directory, run:
kubectl apply -f prometheus-prometheus.yaml
Option 2
Use the ServiceMonitor approach, that is: Service + Endpoints + ServiceMonitor.
The following example is taken from an actual production setup and scrapes node-exporter:
Service.yaml
# The Service has no selector, so Kubernetes does not manage its Endpoints.
# The Endpoints object below is created by hand and must use the same name and namespace.
apiVersion: v1
kind: Service
metadata:
  labels:
    app: monitoring-external-node-exporter
    app.kubernetes.io/name: node-exporter
    release: monitoring
  name: monitoring-external-node-exporter
  namespace: monitoring
spec:
  type: ClusterIP
  ports:
    - name: http-metrics
      port: 39100
      protocol: TCP
      targetPort: 39100
---
apiVersion: v1
kind: Endpoints
metadata:
  name: monitoring-external-node-exporter
  labels:
    app: monitoring-external-node-exporter
    app.kubernetes.io/name: node-exporter
    release: monitoring
  namespace: monitoring
subsets:
  - addresses:
      - ip: 10.194.24.53   # IP of the external host
    ports:
      - name: http-metrics
        port: 39100
        protocol: TCP
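external-node-exporter-basic-auth-secret.yaml
For security, the metrics endpoint requires authentication, so the credentials are stored in a Secret:
apiVersion: v1
kind: Secret
metadata:
  name: external-node-exporter-basic-auth
  namespace: monitoring
data:
  password: <base64-encoded password>
  user: <base64-encoded username>
type: Opaque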
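As an alternative sketch, the same Secret can be written with stringData, which lets Kubernetes do the base64 encoding on write. Note that base64 is an encoding, not encryption, so either form needs to be kept out of version control or protected by other means. The user name and password values below are placeholders, not values from the original setup.

```yaml
# Sketch: same Secret using stringData, so values are given in plain text
# and stored base64-encoded by the API server.
apiVersion: v1
kind: Secret
metadata:
  name: external-node-exporter-basic-auth
  namespace: monitoring      # must match the ServiceMonitor's namespace
type: Opaque
stringData:
  user: prometheus           # placeholder user name (assumption)
  password: change-me        # placeholder password (assumption)
```

The key names (user, password) are what the ServiceMonitor refers to, so they have to stay in sync with it.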
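ServiceMonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: monitoring-external-node-exporter
    release: monitoring
  name: monitoring-external-node-exporter-sm
  namespace: monitoring
spec:
  endpoints:
    - interval: 30s
      port: http-metrics   # port name defined in the Service
      path: /metrics
      # basicAuth is a field of the endpoint itself; it is not nested under authorization
      basicAuth:
        username:
          name: external-node-exporter-basic-auth
          key: user
        password:
          name: external-node-exporter-basic-auth
          key: password
  # jobLabel is the name of a Service label whose value becomes the job name.
  # The Service above has no label called node-exporter, so the job name falls back
  # to the Service name; use app.kubernetes.io/name here to get job="node-exporter".
  jobLabel: node-exporter
  namespaceSelector:
    matchNames:
      - monitoring
  selector:
    matchLabels:
      app: monitoring-external-node-exporter
      release: monitoring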
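This approach is somewhat roundabout, but it guarantees that changing the monitoring of component A never touches the configuration of component B.
It also leaves the rest of the Prometheus monitoring configuration unaffected.
In short: more precise configuration, finer granularity, and looser coupling.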
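To illustrate that isolation, here is a sketch of adding a second external node-exporter host. Only the Endpoints object changes; the Service, the ServiceMonitor, and the Prometheus resource stay as they are. The second IP address is hypothetical and assumes the new host exposes the same port with the same credentials.

```yaml
# Sketch: extend the hand-written Endpoints object with one more external host.
apiVersion: v1
kind: Endpoints
metadata:
  name: monitoring-external-node-exporter   # must still match the Service name
  namespace: monitoring
  labels:
    app: monitoring-external-node-exporter
    app.kubernetes.io/name: node-exporter
    release: monitoring
subsets:
  - addresses:
      - ip: 10.194.24.53
      - ip: 10.194.24.54   # hypothetical second host (assumption)
    ports:
      - name: http-metrics
        port: 39100
        protocol: TCP
```

Each address becomes its own scrape target, so the hosts show up as separate instances under the same job.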
Other examples
Example 1
---
apiVersion: v1
kind: Service
metadata:
  name: external-node-exporter
  namespace: monitoring
  labels:
    app: external-node-exporter
    app.kubernetes.io/name: node-exporter
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-node-exporter
  namespace: monitoring
  labels:
    app: external-node-exporter
    app.kubernetes.io/name: node-exporter
subsets:
  - addresses:
      - ip: 192.168.10.10   # list the external hosts here
    ports:
      - name: metrics
        port: 9100
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: external-node-exporter
  namespace: monitoring
  labels:
    app: external-node-exporter
    release: prometheus
spec:
  selector:
    matchLabels:          # Service selector
      app: external-node-exporter
  namespaceSelector:      # Namespace selector
    matchNames:
      - monitoring
  endpoints:
    - port: metrics       # port name to scrape (as defined in the Service)
      interval: 10s       # scrape interval, set as needed; if omitted, the global scrape interval of the Prometheus instance applies
      path: /metrics      # defaults to /metrics
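Whether the release label on the ServiceMonitor matters depends on how the Prometheus resource selects ServiceMonitors. The sketch below shows a selector that would pick up the ServiceMonitor above. It is an assumption about the cluster rather than part of the original setup: the resource name k8s follows kube-prometheus naming, and a deployment whose selector is empty ({}) selects every ServiceMonitor regardless of labels.

```yaml
# Sketch: Prometheus resource fields that decide which ServiceMonitors are used.
# Assumption: resource name and label value are illustrative; check your own manifest.
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # {} means ServiceMonitors from all namespaces are considered
  serviceMonitorNamespaceSelector: {}
  # only ServiceMonitors carrying this label are picked up
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
```

If a new ServiceMonitor does not show up under the Prometheus targets, a mismatch between its labels and this selector is one of the first things worth checking.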
Example 2
An example from Stack Overflow
apiVersion: v1
kind: Endpoints
metadata:
  name: confluent-cloud-telemetry-svc
  namespace: monitoring
  labels:
    app: confluent-cloud-telemetry
    release: prometheus-operator
subsets:
  - addresses:
      - ip: 54.149.69.190
    ports:
      - name: confluent-cloud-telemetry-port
        protocol: TCP
        port: 443
---
apiVersion: v1
kind: Service
metadata:
  name: confluent-cloud-telemetry-svc
  namespace: monitoring
  labels:
    app: confluent-cloud-telemetry
    release: prometheus-operator
spec:
  type: ExternalName
  externalName: api.telemetry.confluent.cloud
  ports:
    - name: confluent-cloud-telemetry-port
      protocol: TCP
      port: 443
      targetPort: 443
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: confluent-cloud-telemetry-sm
  namespace: monitoring
  labels:
    app: confluent-cloud-telemetry
    release: prometheus-operator
spec:
  selector:
    matchLabels:
      app: confluent-cloud-telemetry
      release: prometheus-operator
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - basicAuth:
        password:
          name: kafka-basic-auth
          key: password
        username:
          name: kafka-basic-auth
          key: user
      port: confluent-cloud-telemetry-port
      interval: 60s
      honorLabels: true
      scheme: https
      path: /v2/metrics/cloud/export
      tlsConfig:
        insecureSkipVerify: true
      relabelings:
        - sourceLabels: [__address__]
          targetLabel: __address__
          regex: (.*)
          replacement: "api.telemetry.confluent.cloud:443"
          action: replace
      params:
        resource.kafka.id:
          - <YOUR_CLUSTER_ID>
---
apiVersion: v1
kind: Secret
metadata:
  name: kafka-basic-auth
  namespace: monitoring   # must be in the same namespace as the ServiceMonitor that references it
data:
  password: <YOUR_SECRET_BASE64>
  user: <YOUR_KEY_BASE64>
type: Opaque
References
https://blog.51cto.com/liubin0505star/5767918
https://blog.51cto.com/ewhisper/5897711
https://prometheus-operator.dev/docs/operator/api/#monitoring.coreos.com/v1.ServiceMonitorSpec
https://cloud.tencent.com/developer/ask/sof/107041525/answer/116863579