Kubernetes resource metrics monitoring: deploying metrics-server

mac2026-04-08  7

Tool for collecting Kubernetes resource metrics: metrics-server. Tools for monitoring custom metrics: prometheus and k8s-prometheus-adapter.

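prometheus: Prometheus can collect resource metrics along many dimensions, such as CPU utilization, the number of network connections, packet send and receive rates, and process creation and reclamation rates. Early versions of Kubernetes did not support these metrics, so the metrics Prometheus collects have to be integrated into Kubernetes, which can then use them to decide whether pods need to be scaled.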

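Prometheus serves both as a monitoring system and as a provider of certain special resource metrics. These are not standard built-in Kubernetes metrics, so they are called custom metrics. To expose the data it collects as metrics, Prometheus needs a plugin called k8s-prometheus-adapter. These metrics are the basic criteria for deciding whether pods need to be scaled, for example scaling on CPU utilization or memory usage.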
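To make "deciding whether pods need to be scaled" concrete, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(currentReplicas * currentMetric / targetMetric). The function name desired_replicas is hypothetical, inputs are assumed to be positive integers, and the real controller also applies a tolerance band and min/max replica bounds that are left out here.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC
# Prints ceil(CURRENT_REPLICAS * CURRENT_METRIC / TARGET_METRIC) using
# integer arithmetic. Hypothetical helper; ignores tolerance and bounds.
desired_replicas() {
    replicas=$1
    current=$2
    target=$3
    # Adding (target - 1) before dividing rounds the quotient up.
    echo $(( (replicas * current + target - 1) / target ))
}

# Example: 3 replicas averaging 90% CPU against a 60% target.
desired_replicas 3 90 60   # prints 5
```

With a current value below the target, the same formula shrinks the replica count, for example 4 replicas at 30% against a 60% target gives 2.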

With the introduction of prometheus and k8s-prometheus-adapter, the new-generation Kubernetes architecture took shape.

The new-generation Kubernetes architecture

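Core metrics pipeline: made up of the kubelet, metrics-server, and the API exposed by the API server. It provides cumulative CPU usage, real-time memory usage, pod resource usage, and container disk usage.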

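Monitoring pipeline: collects all kinds of metrics from the system and provides them to end users, storage systems, and the HPA. It covers the core metrics plus many non-core metrics. Kubernetes cannot interpret non-core metrics on its own, so k8s-prometheus-adapter is needed to convert the data Prometheus collects into a format Kubernetes understands and can use.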

Core metrics monitoring

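Heapster was used previously, but it was deprecated after 1.12, and its replacement is metrics-server. metrics-server is a user-developed API server that serves resource metrics rather than resources such as pods or deployments. It is not part of Kubernetes itself; it runs as a pod hosted on Kubernetes. To let users consume the API that metrics-server provides seamlessly on Kubernetes, the new-generation architecture combines the two as shown in the figure: an aggregator aggregates the Kubernetes API server and metrics-server, and the metrics are then obtained through the group /apis/metrics.k8s.io/v1beta1. Any other API servers a user has can be integrated into the aggregator in the same way, and the aggregator then serves them, as shown in the figure.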
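As a rough illustration of what "obtained through the group /apis/metrics.k8s.io/v1beta1" means in practice, the sketch below builds the request paths for node and pod metrics under that group. The helper name metrics_path and the variable METRICS_API are hypothetical; the kubectl call in the trailing comment assumes a cluster where metrics-server is already running.

```shell
# Base path of the aggregated metrics API group named in the text.
METRICS_API=/apis/metrics.k8s.io/v1beta1

# metrics_path nodes            -> path listing metrics for all nodes
# metrics_path pods NAMESPACE   -> path listing pod metrics in NAMESPACE
# Hypothetical helper, for illustration only.
metrics_path() {
    kind=$1
    namespace=${2:-}
    case "$kind" in
        nodes) echo "$METRICS_API/nodes" ;;
        pods)  echo "$METRICS_API/namespaces/$namespace/pods" ;;
        *)     echo "unknown kind: $kind" >&2; return 1 ;;
    esac
}

metrics_path nodes
metrics_path pods kube-system

# On a live cluster the path could then be fetched through the API server:
#   kubectl get --raw "$(metrics_path nodes)"
```

Because the aggregator fronts metrics-server, the request goes to the ordinary API server endpoint and no separate address for metrics-server is needed.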

Listing the default Kubernetes api-versions shows that the metrics.k8s.io group is not present. Once metrics-server has been deployed, listing api-versions again shows the metrics.k8s.io group.
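The before-and-after check described here can be scripted. The sketch below reads a list of group/version lines, in the form that kubectl api-versions prints, and reports whether a given group is present. The function name has_api_group is hypothetical, and the sample input stands in for real cluster output.

```shell
# has_api_group GROUP: read group/version lines on stdin and succeed
# if any line belongs to GROUP. Hypothetical helper.
has_api_group() {
    awk -F/ -v group="$1" '
        $1 == group { found = 1 }    # exact match on the part before "/"
        END { exit !found }'
}

# Sample lines standing in for `kubectl api-versions` output.
if printf '%s\n' apps/v1 metrics.k8s.io/v1beta1 v1 | has_api_group metrics.k8s.io; then
    echo "metrics.k8s.io is registered"
fi

# On a live cluster:
#   kubectl api-versions | has_api_group metrics.k8s.io
```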

Deploying metrics-server

Go to cluster/addons in the kubernetes project, find the corresponding project, and download it.

[root@master bcia]# mkdir metrics-server -p
[root@master bcia]# cd metrics-server/
[root@master metrics-server]# for file in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml ; do wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/$file;done   # download all files in one go
--2019-11-02 10:18:10-- https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/metrics-server/auth-delegator.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.228.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.228.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 398 [text/plain]
Saving to: ‘auth-delegator.yaml’
100%[==========================================================================>] 398 --.-K/s in 0s
... (output omitted) ...
[root@master metrics-server]# ls
auth-delegator.yaml  metrics-apiservice.yaml         metrics-server-service.yaml
auth-reader.yaml     metrics-server-deployment.yaml  resource-reader.yaml
[root@master metrics-server]# kubectl apply -f .   # apply all files in one go
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
configmap/metrics-server-config created
deployment.apps/metrics-server-v0.3.6 created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created

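After applying the manifests, errors appeared. Delete everything in one go and change a few places, as shown in the figure.

1. metrics-server-deployment.yaml
In the metrics-server command, add - --kubelet-insecure-tls, which tells metrics-server not to verify the serving certificate presented by the kubelet. Comment out port 10255; metrics-server then uses port 10250 and communicates over HTTPS.

In the addon-resizer command, write concrete values for cpu, memory, and extra-memory, and comment out minClusterSize={{ metrics_server_min_cluster_size }}.

2. resource-reader.yaml
Add nodes/stats, as shown in the figure.

The modified metrics-server-deployment.yaml and resource-reader.yaml are reproduced in full at the end of this article.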
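Before re-applying, it can help to confirm that the edits to the deployment manifest really took effect. The sketch below is one possible check, not part of the upstream addon: it drops comment lines and then looks for the flags discussed above. The function name check_manifest is hypothetical, and it only does plain text matching rather than parsing the YAML.

```shell
# check_manifest FILE: succeed only if FILE, ignoring comment lines,
# contains --kubelet-insecure-tls and no longer contains the 10255 port
# flag or the unrendered minClusterSize template. Hypothetical helper.
check_manifest() {
    file=$1
    # Keep only lines that are not whole-line YAML comments.
    active=$(grep -v '^[[:space:]]*#' "$file")
    printf '%s\n' "$active" | grep -qF -e '--kubelet-insecure-tls' || return 1
    if printf '%s\n' "$active" | grep -qF -e '--kubelet-port=10255'; then
        return 1
    fi
    if printf '%s\n' "$active" | grep -qF -e '--minClusterSize={{'; then
        return 1
    fi
    return 0
}

# Possible usage in the directory that holds the manifests:
#   check_manifest metrics-server-deployment.yaml && kubectl apply -f .
```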

Testing that it works

# Check that the pods are running normally
[root@master metrics-server]# kubectl get pods -n kube-system
NAME                                     READY   STATUS    RESTARTS   AGE
coredns-8686dcc4fd-bzgss                 1/1     Running   0          9d
coredns-8686dcc4fd-xgd49                 1/1     Running   0          9d
etcd-master                              1/1     Running   0          9d
kube-apiserver-master                    1/1     Running   0          9d
kube-controller-manager-master           1/1     Running   0          9d
kube-flannel-ds-amd64-52d6n              1/1     Running   0          9d
kube-flannel-ds-amd64-k8qxt              1/1     Running   0          8d
kube-flannel-ds-amd64-lnss4              1/1     Running   0          9d
kube-proxy-4s5mf                         1/1     Running   0          8d
kube-proxy-b6szk                         1/1     Running   0          9d
kube-proxy-wsnfz                         1/1     Running   0          9d
kube-scheduler-master                    1/1     Running   0          9d
kubernetes-dashboard-76f6bf8c57-rncvn    1/1     Running   0          8d
metrics-server-v0.3.6-677d79858c-75vk7   2/2     Running   0          18m
tiller-deploy-57c977bff7-tcnrf           1/1     Running   0          7d20h

Listing api-versions now shows the additional metrics.k8s.io/v1beta1 entry.

Viewing node and pod metrics

# View node and pod metrics
[root@master metrics-server]# kubectl top nodes
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
master   145m         3%     1801Mi          11%
node2    697m         17%    12176Mi         77%
node3    838m         20%    12217Mi         77%
[root@master metrics-server]# kubectl top pods
NAME                                            CPU(cores)   MEMORY(bytes)
account-deploy-6d86f9df74-khv4v                 5m           444Mi
admin-deploy-55dcf4bc4d-srw8m                   2m           317Mi
backend-deploy-6f7bdd9bf4-w4sqc                 4m           497Mi
crm-deploy-7879694578-cngzp                     4m           421Mi
device-deploy-77768bf87c-ct5nc                  5m           434Mi
elassandra-0                                    168m         4879Mi
gateway-deploy-68c988676d-wnqsz                 4m           379Mi
jhipster-alerter-74fc8984c4-27bx8               1m           46Mi
jhipster-console-85556468d-kjfg6                3m           119Mi
jhipster-curator-67b58477b9-5f8br               1m           11Mi
jhipster-logstash-74878f8b49-mpn62              59m          860Mi
jhipster-zipkin-5b5ff7bdbc-bsxhk                1m           1571Mi
order-deploy-c4c846c54-2gxkp                    5m           440Mi
pos-registry-76bbd6c689-q5w2b                   442m         474Mi
recv-deploy-5dd686c947-v4qqh                    5m           424Mi
store-deploy-54c994c9b6-82b8z                   6m           493Mi
task-deploy-64c9984d88-fqxqq                    6m           461Mi
wiggly-cat-redis-ha-sentinel-655f7b5f9d-bbrz6   4m           4Mi
wiggly-cat-redis-ha-sentinel-655f7b5f9d-bj4bq   4m           5Mi
wiggly-cat-redis-ha-sentinel-655f7b5f9d-f9pdd   4m           5Mi
wiggly-cat-redis-ha-server-b58c8d788-6xlwk      3m           11Mi
wiggly-cat-redis-ha-server-b58c8d788-r949h      3m           8Mi
wiggly-cat-redis-ha-server-b58c8d788-w2gtb      3m           22Mi
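Once kubectl top returns data, the table is easy to post-process. As a small sketch, the function below filters kubectl top pods output for pods whose memory exceeds a threshold. The name pods_over_mem is hypothetical, and it assumes the memory column is always reported in Mi, as it is in the output above; rows in other units are skipped.

```shell
# pods_over_mem LIMIT_MI: read `kubectl top pods` output on stdin and print
# the name and memory of every pod using more than LIMIT_MI mebibytes.
# Hypothetical helper; only understands values with an "Mi" suffix.
pods_over_mem() {
    awk -v limit="$1" '
        NR == 1 { next }                # skip the header row
        $3 ~ /Mi$/ {
            mem = $3
            sub(/Mi$/, "", mem)         # "4879Mi" -> "4879"
            if (mem + 0 > limit) print $1, mem "Mi"
        }'
}

# Three rows copied from the output above as sample input.
pods_over_mem 1000 <<'EOF'
NAME                               CPU(cores)   MEMORY(bytes)
elassandra-0                       168m         4879Mi
jhipster-zipkin-5b5ff7bdbc-bsxhk   1m           1571Mi
order-deploy-c4c846c54-2gxkp       5m           440Mi
EOF

# On a live cluster:
#   kubectl top pods | pods_over_mem 1000
```

For the sample rows this prints elassandra-0 and jhipster-zipkin-5b5ff7bdbc-bsxhk, the two pods above 1000Mi.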

This completes the metrics-server deployment. The next post covers Prometheus.

metrics-server-deployment.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: metrics-server
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: metrics-server-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metrics-server-v0.3.3
  namespace: kube-system
  labels:
    k8s-app: metrics-server
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v0.3.3
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
      version: v0.3.3
  template:
    metadata:
      name: metrics-server
      labels:
        k8s-app: metrics-server
        version: v0.3.3
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      containers:
      - name: metrics-server
        image: gcr.azk8s.cn/google-containers/metrics-server-amd64:v0.3.3
        command:
        - /metrics-server
        - --metric-resolution=30s
        # - --kubeconfig=/key/kubeconfig
        # These are needed for GKE, which doesn't support secure communication yet.
        # Remove these lines for non-GKE clusters, and when GKE supports token-based auth.
        #- --kubelet-port=10255
        # - --deprecated-kubelet-completely-insecure=true
        #- --source=kubernetes.summary_api:https://kubernetes.default.svc?kubeletHttps=true&kubeletPort=10250&useServiceAccount=true&insecure=true
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        ports:
        - containerPort: 443
          name: https
          protocol: TCP
      - name: metrics-server-nanny
        image: gcr.azk8s.cn/google-containers/addon-resizer:1.8.4
        resources:
          limits:
            cpu: 100m
            memory: 300Mi
          requests:
            cpu: 5m
            memory: 50Mi
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        volumeMounts:
        - name: metrics-server-config-volume
          mountPath: /etc/config
        command:
          - /pod_nanny
          - --config-dir=/etc/config
          - --cpu=80m
          - --extra-cpu=0.5m
          - --memory=80Mi
          - --extra-memory=8Mi
          - --threshold=5
          - --deployment=metrics-server-v0.3.3
          - --container=metrics-server
          - --poll-period=300000
          - --estimator=exponential
          # Specifies the smallest cluster (defined in number of nodes)
          # resources will be scaled to.
          # - --minClusterSize={{ metrics_server_min_cluster_size }}
      volumes:
        - name: metrics-server-config-volume
          configMap:
            name: metrics-server-config
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"

resource-reader.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  - nodes/stats
  - namespaces
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - deployments
  verbs:
  - get
  - list
  - update
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:metrics-server
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system