Using Prometheus to Help Solve Performance Problems: Getting Started

mac2026-02-09  12

What is Prometheus? What does it do?

https://prometheus.io/

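It is an open-source monitoring system: it graphs your data, lets you define your own query rules, and ships client libraries for many languages.

Oh, so it runs jobs that scrape data from its own clients (you have to integrate its client library into your own project, which is intrusive, hey!) and turns the data into graphs. Don't panic, let's dive straight in.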
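To make "scraping data from a client" concrete, here is a minimal sketch of what a scrape target looks like, written with only the Python standard library. A real project would normally use an official client library instead; the metric name `demo_requests_total`, the port 8000, and the helper names are all made up for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a client library would manage this for you.
STATE = {"scrapes": 0}


def render_metrics(count: int) -> str:
    """Render one counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Requests handled by this demo app.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {count}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        STATE["scrapes"] += 1  # count scrapes, just to have a changing value
        body = render_metrics(STATE["scrapes"]).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def serve(port: int = 8000) -> None:
    """Block forever, serving /metrics on localhost (port is an assumption)."""
    HTTPServer(("localhost", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then opening `http://localhost:8000/metrics` in a browser shows the plain-text page that Prometheus would fetch on each scrape.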

 

How do you use Prometheus? Don't be timid, just work through the official tutorial.

Download the precompiled archive

https://prometheus.io/download/

Or use Docker

https://hub.docker.com/u/prom

 

I shamelessly downloaded the Windows version.

https://github.com/prometheus/prometheus/releases/download/v2.13.1/prometheus-2.13.1.windows-amd64.tar.gz

 

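Shouldn't we start it up first?

Double-click prometheus.exe. Windows warns that the file may be risky; click "More info" and choose "Run anyway".

Or run it from the command line. This thing starts up really fast.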

 

Whenever I see startup output I skim through it out of habit. Here is what it tells us:

level=info ts=2019-11-01T02:14:28.256Z caller=main.go:296 msg="no time or size retention was set so using the default time retention" duration=15d

Data is retained for 15 days by default.

level=info ts=2019-11-01T02:14:28.257Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.13.1, branch=HEAD, revision=6f92ce56053866194ae5937012c1bec40f1dd1d9)"

level=info ts=2019-11-01T02:14:28.258Z caller=main.go:333 build_context="(go=go1.13.1, user=root@88e419aa1676, date=20191017-13:31:33)"

level=info ts=2019-11-01T02:14:28.258Z caller=main.go:334 host_details=(windows)

level=info ts=2019-11-01T02:14:28.258Z caller=main.go:335 fd_limits=N/A

level=info ts=2019-11-01T02:14:28.258Z caller=main.go:336 vm_limits=N/A

level=info ts=2019-11-01T02:14:28.261Z caller=main.go:657 msg="Starting TSDB ..."

level=info ts=2019-11-01T02:14:28.261Z caller=web.go:450 component=web msg="Start listening for connections" address=0.0.0.0:9090

level=info ts=2019-11-01T02:14:28.264Z caller=head.go:514 component=tsdb msg="replaying WAL, this may take awhile"

level=info ts=2019-11-01T02:14:28.281Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0

level=info ts=2019-11-01T02:14:28.282Z caller=main.go:672 fs_type=unknown

level=info ts=2019-11-01T02:14:28.282Z caller=main.go:673 msg="TSDB started"

It started a TSDB (time-series database), which is what most metrics systems use these days. It can index data both by point in time and by time range.
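As a rough illustration of "index by point in time and by time range", here is a toy in-memory series. This is not how Prometheus's TSDB is implemented; the class name and the sample values are invented for this sketch.

```python
import bisect


class ToySeries:
    """Samples for one series, kept sorted by timestamp."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts: float, value: float) -> None:
        # Assumes samples arrive in time order, as scrapes normally do.
        self.timestamps.append(ts)
        self.values.append(value)

    def at(self, ts: float):
        """Most recent value at or before ts, or None if there is none."""
        i = bisect.bisect_right(self.timestamps, ts)
        return self.values[i - 1] if i else None

    def between(self, start: float, end: float):
        """All (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


series = ToySeries()
for ts, value in [(0, 10.0), (15, 12.0), (30, 15.0), (45, 19.0)]:
    series.append(ts, value)

print(series.at(20))           # 12.0
print(series.between(15, 30))  # [(15, 12.0), (30, 15.0)]
```

Because the timestamps are sorted, both lookups are binary searches rather than full scans, which is the basic reason time-ordered storage suits metrics.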

level=info ts=2019-11-01T02:14:28.283Z caller=main.go:743 msg="Loading configuration file" filename=prometheus.yml

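It loaded a configuration file, prometheus.yml. Config files are familiar territory, so we will surely be playing with the configuration in a moment.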

level=info ts=2019-11-01T02:14:28.295Z caller=main.go:771 msg="Completed loading of configuration file" filename=prometheus.yml

level=info ts=2019-11-01T02:14:28.295Z caller=main.go:626 msg="Server is ready to receive web requests."
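The lines above are in logfmt (space-separated `key=value` pairs, with quotes around values that contain spaces), so skimming them can be automated. A small sketch, assuming plain double-quoted values like the ones shown; `parse_logfmt` is a helper name invented here:

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split one logfmt line into a dict of key -> value."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


line = (
    "level=info ts=2019-11-01T02:14:28.261Z caller=web.go:450 component=web "
    'msg="Start listening for connections" address=0.0.0.0:9090'
)
fields = parse_logfmt(line)
print(fields["msg"], "->", fields["address"])
```

Run against the web component's line, this pulls out the listen address, which is the quickest way to see which port the server is using.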

 

So let's open the config file and take a look first. It isn't very complicated.

# my global config (global settings)
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration (alerting)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# Defines the rules used to generate graphs
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# It scrapes itself too; this section can be extended to other apps
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

# The startup output doesn't exactly advertise which port to visit (it is there, though,
# in the web component's log line: address=0.0.0.0:9090). The same port appears here as
# the self-scrape target. Paste localhost:9090 into a browser.
# The page loads, so it's up. Now let's see how to actually use it.

 

So how do we use it? By editing the config file, of course.

Prometheus scrapes data over HTTP, and it even scrapes its own metrics.
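What comes back from such an HTTP scrape is plain text, one sample per line. The following is a simplified sketch of reading that text; it handles only the common `name{labels} value` shape, and the sample text and its values are invented for illustration.

```python
import re

# name, optional {labels}, then the value; any trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Made-up scrape body, shaped like what a /metrics endpoint returns.
SAMPLE_TEXT = """\
# HELP rpc_durations_seconds RPC latency distributions.
# TYPE rpc_durations_seconds summary
rpc_durations_seconds{service="uniform",quantile="0.5"} 0.0001
rpc_durations_seconds_count{service="uniform"} 42
"""

samples = parse_metrics(SAMPLE_TEXT)
for name, labels, value in samples:
    print(name, labels, value)
```

Each parsed sample, plus the scrape timestamp, becomes one point in a time series identified by the metric name and its labels.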

 

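So let's scrape data from apps other than Prometheus itself.

Open prometheus.yml:

Add the following under global. It is optional and doesn't affect our little demo; as the comment says, these labels are used when talking to external systems.

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'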

 

 

Then add a new job named example-random under scrape_configs.

See, one more job to pull data.

scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'example-random'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
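To see what this config means for the stored data, here is a simplified model of how each target ends up with its own label set: the `job` label comes from `job_name`, `instance` defaults to the target address, and any `labels` on the group are attached as well. The function name is invented, and real Prometheus does this through its relabeling machinery rather than code like this.

```python
def expand_targets(scrape_configs):
    """Flatten scrape_configs into one label set per target."""
    result = []
    for job in scrape_configs:
        for group in job["static_configs"]:
            for address in group["targets"]:
                labels = dict(group.get("labels", {}))
                labels["job"] = job["job_name"]
                labels["instance"] = address
                result.append(labels)
    return result


# The same structure as the YAML above, written as Python data.
scrape_configs = [
    {"job_name": "prometheus",
     "static_configs": [{"targets": ["localhost:9090"]}]},
    {"job_name": "example-random",
     "static_configs": [
         {"targets": ["localhost:8080", "localhost:8081"],
          "labels": {"group": "production"}},
         {"targets": ["localhost:8082"],
          "labels": {"group": "canary"}},
     ]},
]

targets = expand_targets(scrape_configs)
for labels in targets:
    print(labels)
```

The `group` label is what later lets a query separate the two production instances from the canary one.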

 

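The official tutorial uses an example written in Go. Let's have a taste of that first; later, when we get to performance debugging, we will use Spring Boot as the example.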

Ensure you have the Go compiler installed and have a working Go build environment (with correct GOPATH) set up.

# Fetch the client library code and compile example.
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
go get -d
go build

# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082
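Before going back to the Prometheus page, it can help to confirm that the three example targets really answer on `/metrics`. A small sketch using only the standard library; the function names are invented, and the check is much cruder than a real scrape.

```python
import urllib.error
import urllib.request


def scrape_ok(address: str, timeout: float = 2.0) -> bool:
    """Return True if http://<address>/metrics answers with HTTP 200."""
    url = f"http://{address}/metrics"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as down.
        return False


def check_targets(addresses):
    """Map each target address to whether a scrape-like request succeeded."""
    return {address: scrape_ok(address) for address in addresses}
```

For the setup above, `check_targets(["localhost:8080", "localhost:8081", "localhost:8082"])` should return `True` for all three once the example processes are running. Prometheus exposes the same idea as the `up` metric, which is 1 when the last scrape of a target succeeded and 0 otherwise.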

 

That's all it takes. On the localhost:9090 page, pick a metric from the dropdown next to the Execute button and you can see its data.

 

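But usually we want aggregates such as count or avg, and looking at raw series alone won't do. Computing them ad hoc at query time in production isn't a good idea either, because it takes time and eats resources. It is better to define them up front, and this is where rules (recording rules) come in.

Create prometheus.rules.yml in the same directory as prometheus.yml.

Add the following:

groups:
- name: example
  rules:
  - record: job_service:rpc_durations_seconds_count:avg_rate5m
    expr: avg(rate(rpc_durations_seconds_count[5m])) by (job, service)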

Add the following to prometheus.yml. If you keep the rules file somewhere else, adjust the path accordingly.

rule_files:
  - 'prometheus.rules.yml'
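To get a feel for what `avg(rate(rpc_durations_seconds_count[5m])) by (job, service)` computes, here is a deliberately simplified sketch: a per-second rate for each series over the window, then an average per (job, service) group. It ignores counter resets and the extrapolation that PromQL's `rate()` performs, and all sample values are invented.

```python
from collections import defaultdict


def simple_rate(samples, window_seconds, now):
    """Per-second increase over the window, from its first and last sample."""
    in_window = [(ts, v) for ts, v in samples if now - window_seconds <= ts <= now]
    if len(in_window) < 2:
        return None  # not enough data to compute a rate
    (t0, v0), (t1, v1) = in_window[0], in_window[-1]
    return (v1 - v0) / (t1 - t0)


def avg_rate_by(series, group_keys, window_seconds, now):
    """Average the per-series rates within each group of label values."""
    groups = defaultdict(list)
    for labels, samples in series:
        rate = simple_rate(samples, window_seconds, now)
        if rate is not None:
            key = tuple(labels.get(k, "") for k in group_keys)
            groups[key].append(rate)
    return {key: sum(rates) / len(rates) for key, rates in groups.items()}


# Three instances of the same job and service; (timestamp, counter value) pairs.
series = [
    ({"job": "example-random", "service": "uniform", "instance": "localhost:8080"},
     [(0, 0), (300, 300)]),
    ({"job": "example-random", "service": "uniform", "instance": "localhost:8081"},
     [(0, 100), (300, 700)]),
    ({"job": "example-random", "service": "uniform", "instance": "localhost:8082"},
     [(0, 50), (300, 950)]),
]

result = avg_rate_by(series, ("job", "service"), window_seconds=300, now=300)
print(result)  # {('example-random', 'uniform'): 2.0}
```

The three instances run at 1, 2 and 3 increments per second, so the group average is 2.0. A recording rule has Prometheus evaluate this kind of expression on a schedule and store the result as a new series, so dashboards read a precomputed value.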

 

 

Restart, then open localhost:9090 as before and enter job_service:rpc_durations_seconds_count:avg_rate5m

Click Execute and take a look at both the Graph and Console tabs.
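The same query can also be run without the web page, through the HTTP API's instant-query endpoint `/api/v1/query`. A minimal sketch, assuming the server from this walkthrough is listening on `localhost:9090`; the helper names are invented.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, expr: str) -> str:
    """URL for an instant query; the expression is URL-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(base_url: str, expr: str, timeout: float = 5.0):
    """Run an instant query and return the list under data.result."""
    url = build_query_url(base_url, expr)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


url = build_query_url(
    "http://localhost:9090",
    "job_service:rpc_durations_seconds_count:avg_rate5m",
)
print(url)
```

With Prometheus running, `instant_query("http://localhost:9090", "job_service:rpc_durations_seconds_count:avg_rate5m")` returns one entry per (job, service) combination, matching what the Console tab shows.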

 

In fact, the Status menu has Configuration, Rules, and Targets pages. Just open them and click through; there is no need to type anything by hand.

 

OK, that counts as getting started.

The next article covers how to use Prometheus to help solve performance problems.

 

 
