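Monitoring Redis with kube-prometheus
Redis is deployed on K8s this time, so a redis-exporter monitoring container is added next to it using the sidecar pattern. For a Redis instance outside the cluster, see the earlier note on monitoring services outside the K8s cluster with kube-prometheus.
Steps
Deploying standalone Redis itself is not covered here. The manifest below deploys it together with the exporter sidecar.
redis-deply.yaml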
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: redis-single-node
  name: redis-single-node
  namespace: tools
spec:
  progressDeadlineSeconds: 600 # maximum time allowed for a rollout to make progress
  replicas: 1
  revisionHistoryLimit: 2 # number of old revisions to keep
  selector:
    matchLabels:
      app: redis-single-node
  template:
    metadata:
      labels:
        app: redis-single-node
    spec:
      imagePullSecrets:
        - name: hub
      containers:
        - command:
            - sh
            - -c
            - redis-server "/mnt/redis.conf"
          env:
            - name: TZ
              value: Asia/Shanghai
            - name: LANG
              value: C.UTF-8
          image: 10.194.24.53/tools/redis:6.2.13-alpine
          imagePullPolicy: IfNotPresent
          name: redis-single-node
          ports:
            - containerPort: 6379
              name: addr
              protocol: TCP
          resources:
            limits:
              cpu: '1'
              memory: '2Gi'
            requests:
              cpu: 100m
              memory: 10Mi
          securityContext: # container security context
            privileged: false # privileged mode (highest level of access)
            runAsNonRoot: false # true would forbid starting the container as root
          volumeMounts:
            - mountPath: /mnt
              name: redis-conf
              readOnly: true
            - mountPath: /data
              name: redis-data
        - name: redis-exporter
          image: 10.194.24.53/k8s-component/oliver006/redis_exporter:v1.54.0
          env:
            # Not needed in sidecar mode: the exporter defaults to
            # redis://localhost:6379, which reaches Redis in the same Pod.
            # - name: REDIS_ADDR
            #   value: "redis-single-node:6379"
            - name: REDIS_PASSWORD
              value: "<Redis password; omit this variable if no password is set>"
          securityContext:
            runAsUser: 59000
            runAsGroup: 59000
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
          resources:
            requests:
              cpu: 100m
              memory: 100Mi
            limits:
              cpu: 250m
              memory: 180Mi
          ports:
            - containerPort: 9121
              name: redis-exporter
      restartPolicy: Always
      volumes:
        - configMap:
            defaultMode: 420
            name: redis-config
          name: redis-conf
        - name: redis-data
          persistentVolumeClaim:
            claimName: redis-pvc
redis-exporter-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: redis-exporter
  name: redis-exporter-svc
  namespace: tools
spec:
  ports:
    - name: http-metrics
      port: 9121
      protocol: TCP
      targetPort: 9121
  type: ClusterIP
  selector:
    app: redis-single-node
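Before Prometheus is pointed at this Service, it can help to confirm that the sidecar really serves metrics through it. The snippet below is only a sketch: metric_value is a hypothetical helper (not part of kubectl or redis_exporter) that extracts one un-labelled sample such as redis_up from the Prometheus text format, and the commands in the comments assume kubectl access to the tools namespace.

```shell
# metric_value NAME
# Read Prometheus text-format metrics on stdin and print the value of the
# first sample whose name is exactly NAME. Only un-labelled samples
# (for example "redis_up 1") are matched; comment lines are skipped
# because their first field is "#".
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Usage sketch (assumes kubectl access; run the two commands in separate terminals):
#   kubectl -n tools port-forward svc/redis-exporter-svc 9121:9121
#   curl -s http://localhost:9121/metrics | metric_value redis_up
```

A result of 1 for redis_up means the exporter could connect to Redis; 0 usually points at a wrong address or password.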
Create the ServiceMonitor custom resource
redis-exporter-sm.yaml
# ServiceMonitor: service auto-discovery rule
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor # CRD defined by prometheus-operator
metadata:
  labels:
    app: redis-exporter
    release: monitoring
  name: redis-exporter
  namespace: monitoring
spec:
  endpoints:
    - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      port: http-metrics # port to scrape; this is the Service port name, i.e. spec.ports[].name in the Service YAML
      interval: 30s
      path: /metrics
  jobLabel: app # name of the Service label whose value becomes the job label; the Service has app=redis-exporter, so job=redis-exporter
  namespaceSelector:
    # matchNames: # namespaces to discover; several can be listed
    #   - default
    any: true
  selector:
    matchLabels:
      app: redis-exporter
Monitoring a redis_exporter outside the Kubernetes cluster
cat > redis-monitor.yaml << 'EOF'
apiVersion: v1
kind: Endpoints
metadata:
  name: redis-metrics
  namespace: monitoring
  labels:
    k8s-app: redis-metrics
subsets:
  - addresses:
      - ip: 172.16.3.225
    ports:
      - name: redis-exporter
        port: 9121
        protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: redis-metrics
  namespace: monitoring
  labels:
    k8s-app: redis-metrics
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: redis-exporter
      port: 9121
      protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: redis-metrics
  namespace: monitoring
  labels:
    app: redis-metrics
    k8s-app: redis-metrics
    prometheus: kube-prometheus
    release: kube-prometheus
spec:
  endpoints:
    - port: redis-exporter
      interval: 15s
  selector:
    matchLabels:
      k8s-app: redis-metrics
  namespaceSelector:
    matchNames:
      - monitoring
EOF
Check the targets
Prometheus
Redis has been discovered automatically.
Log in to Grafana and import the dashboard template
Dashboard URL: https://grafana.com/grafana/dashboards/11835
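Fixing the memory panel showing ∞
Problem
Cause:
No maximum memory has been set for Redis, so redis_memory_max_bytes is 0 and the result of the calculation is infinite. The formula is: 100 * (redis_memory_used_bytes / redis_memory_max_bytes)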
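To make the division by zero concrete, here is a small sketch of the same calculation in shell. mem_usage_pct is a hypothetical helper, and printing "unlimited" for a zero limit is my own choice for illustration; in PromQL the division simply evaluates to +Inf, which the panel renders as ∞.

```shell
# mem_usage_pct USED_BYTES MAX_BYTES
# Mirrors 100 * (redis_memory_used_bytes / redis_memory_max_bytes).
# MAX_BYTES = 0 means Redis has no maxmemory configured, so a percentage
# cannot be computed; "unlimited" is printed instead of dividing by zero.
mem_usage_pct() {
  awk -v used="$1" -v max="$2" 'BEGIN {
    if (max + 0 == 0) { print "unlimited"; exit }
    printf "%.1f\n", 100 * used / max
  }'
}

# Example: 512 MiB used out of a 1 GiB limit
mem_usage_pct 536870912 1073741824
# Example: no maxmemory configured
mem_usage_pct 536870912 0
```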
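Solution:
1. Set the maximum memory from the command line (takes effect immediately, but is lost on restart unless it is also written to redis.conf)
10.211.11.110:6379> CONFIG SET maxmemory 1024mb
10.211.11.110:6379> CONFIG SET maxmemory-policy volatile-ttl
2. Add the following to the redis.conf configuration file
maxmemory 2048mb
maxmemory-policy volatile-ttl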
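Once maxmemory is set, redis_memory_max_bytes reports the limit in bytes, so it is handy to know what number to expect. In Redis size notation "mb" means 1024*1024 bytes while a bare "m" means 1,000,000 bytes. The helper below, redis_size_to_bytes, is a hypothetical name introduced for this sketch and covers only the k/kb/m/mb/g/gb suffixes.

```shell
# redis_size_to_bytes SIZE
# Convert a Redis-style size such as "1024mb" into bytes.
# Suffixes are case-insensitive: k=1000, kb=1024, m=1000^2, mb=1024^2,
# g=1000^3, gb=1024^3. A plain number is taken as bytes.
redis_size_to_bytes() {
  v=$(printf '%s' "$1" | tr 'A-Z' 'a-z')   # normalise case
  num=${v%%[a-z]*}                          # leading digits
  unit=${v#"$num"}                          # whatever follows the digits
  case "$unit" in
    '') mult=1 ;;
    k)  mult=1000 ;;
    kb) mult=1024 ;;
    m)  mult=1000000 ;;
    mb) mult=1048576 ;;
    g)  mult=1000000000 ;;
    gb) mult=1073741824 ;;
    *)  echo "unknown unit: $unit" >&2; return 1 ;;
  esac
  echo $((num * mult))
}

# The two values used in this note:
redis_size_to_bytes 1024mb
redis_size_to_bytes 2048mb
```

If the metric still shows 0 after the change, the setting most likely did not reach the running instance.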
Configure the PrometheusRule
redis-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: redis-rules
  namespace: monitoring
spec:
  groups:
    - name: redis.rules
      rules:
        - alert: RedisDown
          expr: redis_up == 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis down (instance {{ $labels.instance }})
            description: "Redis instance is down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingMaster
          expr: (count(redis_instance_info{role="master"}) or vector(0)) < 1
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis missing master (instance {{ $labels.instance }})
            description: "Redis cluster has no node marked as master.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyMasters
          expr: count(redis_instance_info{role="master"}) > 1
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis too many masters (instance {{ $labels.instance }})
            description: "Redis cluster has too many nodes marked as master.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisDisconnectedSlaves
          expr: count without (instance, job) (redis_connected_slaves) - sum without (instance, job) (redis_connected_slaves) - 1 > 1
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis disconnected slaves (instance {{ $labels.instance }})
            description: "Redis not replicating for all slaves. Consider reviewing the redis replication status.\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisReplicationBroken
          expr: delta(redis_connected_slaves[1m]) < 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis replication broken (instance {{ $labels.instance }})
            description: "Redis instance lost a slave\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisClusterFlapping
          expr: changes(redis_connected_slaves[1m]) > 1
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: Redis cluster flapping (instance {{ $labels.instance }})
            description: "Changes have been detected in Redis replica connection. This can occur when replica nodes lose connection to the master and reconnect (a.k.a. flapping).\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisMissingBackup
          expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis missing backup (instance {{ $labels.instance }})
            description: "Redis has not been backed up for 24 hours\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        # The exporter must be started with --include-system-metrics flag or REDIS_EXPORTER_INCL_SYSTEM_METRICS=true environment variable.
        - alert: RedisOutOfSystemMemory
          expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: Redis out of system memory (instance {{ $labels.instance }})
            description: "Redis is running out of system memory (> 90%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisOutOfConfiguredMaxmemory
          expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 90
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: Redis out of configured maxmemory (instance {{ $labels.instance }})
            description: "Redis is running out of configured maxmemory (> 90%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisTooManyConnections
          expr: redis_connected_clients > 500
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: Redis too many connections (instance {{ $labels.instance }})
            description: "Redis instance has too many connections\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        # - alert: RedisNotEnoughConnections
        #   expr: redis_connected_clients < 5
        #   for: 2m
        #   labels:
        #     severity: warning
        #   annotations:
        #     summary: Redis not enough connections (instance {{ $labels.instance }})
        #     description: "Redis instance should have more connections (> 5)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
        - alert: RedisRejectedConnections
          expr: increase(redis_rejected_connections_total[1m]) > 0
          for: 0m
          labels:
            severity: critical
          annotations:
            summary: Redis rejected connections (instance {{ $labels.instance }})
            description: "Some connections to Redis have been rejected\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
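The RedisMissingBackup rule compares the current time with the timestamp of the last successful RDB save. The same check can be reproduced by hand when deciding whether the alert is justified. This is only a sketch: rdb_age_exceeds is a hypothetical helper, and the usage line assumes redis-cli can reach the instance and prints the LASTSAVE timestamp as a bare integer when its output is captured.

```shell
# rdb_age_exceeds NOW LAST_SAVE [LIMIT_SECONDS]
# Succeeds (exit status 0) when LAST_SAVE lies more than LIMIT_SECONDS
# before NOW. Both arguments are Unix timestamps in seconds. The default
# limit is 86400 (60 * 60 * 24), the same threshold as the alert rule.
rdb_age_exceeds() {
  now=$1
  last=$2
  limit=${3:-86400}
  [ $((now - last)) -gt "$limit" ]
}

# Usage sketch (assumes redis-cli connectivity):
#   rdb_age_exceeds "$(date +%s)" "$(redis-cli LASTSAVE)" && echo "backup overdue"
```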
References
1. Official redis_exporter repository: https://github.com/oliver006/redis_exporter
2. https://blog.51cto.com/wutengfei/5997105
3. https://blog.51cto.com/u_14440843/5759684