What is Prometheus? What does it do?
https://prometheus.io/
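It is an open-source monitoring system: it graphs your data, lets you define custom query rules, and ships client libraries for many languages.
Oh, so it runs jobs that scrape data from its own clients (which means you have to integrate its client into your own project; hey, that's intrusive!) and turns the data into charts. Don't panic, let's dive straight in.
How do you use Prometheus? Don't chicken out, just work through the official tutorial.
Download the precompiled archive: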
https://prometheus.io/download/
or use Docker:
https://hub.docker.com/u/prom
I shamelessly downloaded the Windows build:
https://github.com/prometheus/prometheus/releases/download/v2.13.1/prometheus-2.13.1.windows-amd64.tar.gz
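Shouldn't we start it up first?
Double-click prometheus.exe. Windows warns that it may be risky; click More info, then Run anyway.
Or just run it from the command line. This thing starts really fast.
When I see startup output I habitually skim through it. Here is what it shows us: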
level=info ts=2019-11-01T02:14:28.256Z caller=main.go:296 msg="no time or size retention was set so using the default time retention" duration=15d
Data is kept for 15 days by default.
level=info ts=2019-11-01T02:14:28.257Z caller=main.go:332 msg="Starting Prometheus" version="(version=2.13.1, branch=HEAD, revision=6f92ce56053866194ae5937012c1bec40f1dd1d9)"
level=info ts=2019-11-01T02:14:28.258Z caller=main.go:333 build_context="(go=go1.13.1, user=root@88e419aa1676, date=20191017-13:31:33)"
level=info ts=2019-11-01T02:14:28.258Z caller=main.go:334 host_details=(windows)
level=info ts=2019-11-01T02:14:28.258Z caller=main.go:335 fd_limits=N/A
level=info ts=2019-11-01T02:14:28.258Z caller=main.go:336 vm_limits=N/A
level=info ts=2019-11-01T02:14:28.261Z caller=main.go:657 msg="Starting TSDB ..."
level=info ts=2019-11-01T02:14:28.261Z caller=web.go:450 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2019-11-01T02:14:28.264Z caller=head.go:514 component=tsdb msg="replaying WAL, this may take awhile"
level=info ts=2019-11-01T02:14:28.281Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
level=info ts=2019-11-01T02:14:28.282Z caller=main.go:672 fs_type=unknown
level=info ts=2019-11-01T02:14:28.282Z caller=main.go:673 msg="TSDB started"
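It started a TSDB (time-series database), which is what pretty much everyone doing metrics uses nowadays; the data can be indexed by point in time and by time range.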
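To make "indexed by point in time and by time range" concrete, here is a toy sketch in Python. It is not how Prometheus's TSDB is implemented; the class name TinySeries and the sample values are made up for illustration. It only shows the two lookups: the latest sample at or before a timestamp, and all samples inside a window.

```python
import bisect


class TinySeries:
    """Toy in-memory time series: samples kept sorted by timestamp."""

    def __init__(self):
        self._ts = []      # timestamps in seconds, sorted ascending
        self._values = []  # values aligned with self._ts

    def append(self, ts, value):
        # Insert at the sorted position so out-of-order appends still work.
        i = bisect.bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._values.insert(i, value)

    def at(self, ts):
        """Most recent sample at or before ts, or None if there is none."""
        i = bisect.bisect_right(self._ts, ts)
        if i == 0:
            return None
        return (self._ts[i - 1], self._values[i - 1])

    def between(self, start, end):
        """All samples with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._ts, start)
        hi = bisect.bisect_right(self._ts, end)
        return list(zip(self._ts[lo:hi], self._values[lo:hi]))


# Made-up samples, one every 15 seconds.
series = TinySeries()
for ts, value in [(0, 1.0), (15, 2.0), (30, 4.0), (45, 7.0)]:
    series.append(ts, value)
```

Calling series.at(20) gives the sample taken at second 15, and series.between(15, 30) gives the two samples inside that window.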
level=info ts=2019-11-01T02:14:28.283Z caller=main.go:743 msg="Loading configuration file" filename=prometheus.yml
It loaded a config file, prometheus.yml. Config files are familiar territory, so we will surely be playing with the configuration soon.
level=info ts=2019-11-01T02:14:28.295Z caller=main.go:771 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2019-11-01T02:14:28.295Z caller=main.go:626 msg="Server is ready to receive web requests."
So let's open the config file first. It is not very complicated:
# my global config (global settings)
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration (alerting)
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
# The rule files: precomputed expressions that you can then graph.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
# It scrapes itself too; this section can be extended to other apps.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
    # This target is the address Prometheus scrapes for its own metrics. It is the same
    # port the web UI listens on (the startup log showed address=0.0.0.0:9090),
    # so paste localhost:9090 into the browser.
    # The page shows up, so it is running. Let's go on and see how to use it.
So how do we use it? By editing the config file, of course.
Prometheus scrapes data over HTTP, and it even scrapes its own metrics.
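To show what "scraping over HTTP" means from the target's side, here is a minimal sketch using only the Python standard library. It serves one counter at /metrics in the Prometheus text exposition format. The metric name demo_requests_total, the sample counts, and port 8000 are made up for illustration; a real project would normally use an official client library, and this sketch does not escape special characters in label values.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data for illustration: requests counted per path.
REQUESTS = {"/": 3, "/login": 1}


def render_metrics(requests):
    """Render one counter in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_requests_total Requests handled, by path.",
        "# TYPE demo_requests_total counter",
    ]
    for path, count in sorted(requests.items()):
        # Label values are not escaped here; fine for these simple paths.
        lines.append('demo_requests_total{path="%s"} %d' % (path, count))
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; call serve() yourself to try it out.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

With serve() running, a target entry of localhost:8000 in scrape_configs would let Prometheus pull this page on every scrape interval.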
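So let's scrape an app other than Prometheus itself.
Open prometheus.yml:
Add the following under global. It is optional and does not affect our little demo; as the comment says, these labels are used when talking to external systems.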
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
Then add a new job called example-random under your scrape_configs.
See, yet another job that pulls data.
scrape_configs:
  - job_name: 'prometheus'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'example-random'

    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s

    static_configs:
      - targets: ['localhost:8080', 'localhost:8081']
        labels:
          group: 'production'

      - targets: ['localhost:8082']
        labels:
          group: 'canary'
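A rough way to picture what this config does to the scraped series: every target gets a job label from the job name, an instance label from its address, and whatever extra labels its static_configs entry declares. The Python sketch below mimics that expansion on a plain dict. The function name expand_targets is made up, and the sketch ignores relabeling and everything else a real scrape does.

```python
# The example-random job from the config above, written as a plain dict.
scrape_config = {
    "job_name": "example-random",
    "static_configs": [
        {"targets": ["localhost:8080", "localhost:8081"],
         "labels": {"group": "production"}},
        {"targets": ["localhost:8082"],
         "labels": {"group": "canary"}},
    ],
}


def expand_targets(config):
    """Return the label set each target's series would carry (simplified)."""
    expanded = []
    for static in config["static_configs"]:
        for address in static["targets"]:
            labels = {"job": config["job_name"], "instance": address}
            labels.update(static.get("labels", {}))
            expanded.append(labels)
    return expanded


targets = expand_targets(scrape_config)
for labels in targets:
    print(labels)
```

This is why the group label can later be used in queries to separate the production targets from the canary one.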
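The official example is written in Go, so let's get a taste of that first. Later, when we get to performance debugging, we will use Spring Boot for the examples.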
Ensure you have the Go compiler installed and have a working Go build environment (with correct GOPATH) set up.
# Fetch the client library code and compile example.
git clone https://github.com/prometheus/client_golang.git
cd client_golang/examples/random
go get -d
go build

# Start 3 example targets in separate terminals:
./random -listen-address=:8080
./random -listen-address=:8081
./random -listen-address=:8082
That's it. On the localhost:9090 page, pick a metric name from the dropdown next to the Execute button and you can look at the data.
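The same data can also be fetched without the browser, through the HTTP API endpoint /api/v1/query. Below is a small sketch using only the Python standard library. The helper names build_query_url, parse_vector and query are made up, and the sample payload in the comment-free part of the code is invented to show the response shape; only instant-vector results are handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def parse_vector(payload):
    """Turn an instant-vector response into (labels, float value) pairs."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got %r" % data["resultType"])
    # Each value is [unix_timestamp, "value as string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def query(base_url, promql):
    # Needs a running Prometheus; not called automatically.
    with urlopen(build_query_url(base_url, promql)) as resp:
        return parse_vector(json.load(resp))


# Invented payload, only to illustrate the response shape.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"job": "example-random", "group": "canary"},
             "value": [1572574468.0, "42"]},
        ],
    },
}
parsed = parse_vector(sample)
```

With the server from this walkthrough running, query("http://localhost:9090", "rpc_durations_seconds_count") should return one entry per scraped series.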
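But usually we want things like counts and averages, so looking at raw series is not enough. Computing those ad hoc in production is not a good idea either: on-the-fly computation takes time and eats resources, so it is better to define them ahead of time. This is where rules come in.
Create prometheus.rules.yml in the same directory as prometheus.yml
and add the following: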
groups:
- name: example
  rules:
  - record: job_service:rpc_durations_seconds_count:avg_rate5m
    expr: avg(rate(rpc_durations_seconds_count[5m])) by (job, service)

Then add the following to prometheus.yml; if your rules file lives somewhere else, adjust the path accordingly.
rule_files:
  - 'prometheus.rules.yml'
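For intuition about what the recorded expression avg(rate(rpc_durations_seconds_count[5m])) by (job, service) computes, here is a simplified Python sketch. It is an approximation, not PromQL's actual algorithm: simple_rate handles counter resets but skips the extrapolation to the window edges that rate() performs, and the function names and sample numbers are made up for illustration.

```python
from collections import defaultdict


def simple_rate(samples):
    """Per-second increase over (timestamp, value) samples of a counter.

    Simplified: handles counter resets, but skips the extrapolation to the
    window edges that PromQL's rate() performs.
    """
    if len(samples) < 2:
        return None
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero.
        increase += (cur - prev) if cur >= prev else cur
    return increase / elapsed


def avg_rate_by(series, keys):
    """Average the per-series rates within each group of label values."""
    groups = defaultdict(list)
    for labels, samples in series:
        rate = simple_rate(samples)
        if rate is not None:
            groups[tuple(labels.get(k, "") for k in keys)].append(rate)
    return {group: sum(rates) / len(rates) for group, rates in groups.items()}


# Made-up samples covering one 5-minute window (timestamps in seconds).
series = [
    ({"job": "example-random", "service": "uniform", "instance": "localhost:8080"},
     [(0, 0.0), (150, 300.0), (300, 600.0)]),
    ({"job": "example-random", "service": "uniform", "instance": "localhost:8081"},
     [(0, 100.0), (150, 400.0), (300, 1000.0)]),
    ({"job": "example-random", "service": "normal", "instance": "localhost:8080"},
     [(0, 50.0), (150, 20.0), (300, 80.0)]),  # counter reset at t=150
]
result = avg_rate_by(series, ("job", "service"))
```

The instance label disappears from the result because only job and service are kept by the grouping, which is exactly what the by (job, service) clause does in the rule.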
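Restart, open localhost:9090 as before, and enter job_service:rpc_durations_seconds_count:avg_rate5m
Execute it, and have a look at both the Graph and Console tabs.
In fact, the Status menu has Configuration, Rules and Targets pages; just open them and click through, no typing required.
OK, that counts as getting started.
The next post will cover how to use Prometheus to help solve performance problems.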
