Architecture diagram

Prometheus-01

Install the client

Prepare the installation package:

Upload the agent package bao.tar.gz to the /tmp directory.

Install the agent

Extract the package: tar -zxvf bao.tar.gz

cd bao/
cp *.service /etc/systemd/system/
mkdir -p /opt/monitor
cp -a node_exporter/ /opt/monitor

Start the agent

systemctl start node-exporter

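Install the federation servers

Federation server information

Environment                Server IP
Private cloud              10.152.35.71
Alibaba public cloud       10.252.100.168
Tencent public cloud       10.229.12.148
Tencent dedicated cloud    10.238.19.14
Jump (proxy) server        10.152.67.132

Collection logic

Each of the four clouds has its own Prometheus server that scrapes the agents in that cloud. All data is ultimately aggregated on the private cloud server.

Alibaba public cloud: data on 10.252.100.168 is aggregated to 10.152.35.71 in the private cloud through the proxy 10.152.67.132.

Tencent public cloud: data on 10.229.12.148 is first aggregated to 10.238.19.14 in the dedicated cloud, and from there to 10.152.35.71 in the private cloud.

Tencent dedicated cloud: data on 10.238.19.14 is aggregated directly to 10.152.35.71 in the private cloud.

Private cloud: 10.152.35.71 scrapes the private cloud agents directly.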
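The aggregation chain described above can be written down as a small lookup table. The following is only a sketch for reasoning about the topology; the names NEXT_HOP and federation_path are made up for illustration and are not part of Prometheus or of this deployment.

```python
from typing import List

# Federation hops taken from the collection logic described above.
# A value of None marks the final aggregation point.
NEXT_HOP = {
    "10.252.100.168": "10.152.35.71",  # Alibaba public cloud -> private cloud (via proxy 10.152.67.132)
    "10.229.12.148": "10.238.19.14",   # Tencent public cloud -> Tencent dedicated cloud
    "10.238.19.14": "10.152.35.71",    # Tencent dedicated cloud -> private cloud
    "10.152.35.71": None,              # private cloud: final aggregation point
}


def federation_path(server: str) -> List[str]:
    """Return the chain of Prometheus servers from `server` to the final aggregator."""
    path = [server]
    while NEXT_HOP[path[-1]] is not None:
        path.append(NEXT_HOP[path[-1]])
    return path


if __name__ == "__main__":
    for server in NEXT_HOP:
        print(" -> ".join(federation_path(server)))
```

The Tencent public cloud server is the only one whose data takes two hops before it reaches the private cloud.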

Prepare the installation package:

Upload prometheus-2.30.3.linux-amd64.tar.gz to /tmp.

Install the server

Extract the package: tar -zxvf prometheus-2.30.3.linux-amd64.tar.gz

mv prometheus-2.30.3.linux-amd64 prometheus
mv prometheus /usr/local/

Configure the systemd service

vi /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/usr/local/prometheus
ExecStart=/usr/local/prometheus/prometheus \
    --web.enable-lifecycle \
    --storage.tsdb.path=/data1/log/prometheus \
    --storage.tsdb.retention.time=30d \
    --config.file=/usr/local/prometheus/prometheus.yml
ExecStop=/bin/kill -s TERM $MAINPID

[Install]
WantedBy=multi-user.target

Reload the systemd configuration

systemctl daemon-reload

Start and stop

# Start
systemctl start prometheus
# Stop
systemctl stop prometheus
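Because the unit file starts Prometheus with --web.enable-lifecycle, a configuration change can also be applied without a restart by sending an HTTP POST to /-/reload. The sketch below assumes Prometheus listens on its default port 9090 on the local host; the helper names are illustrative only.

```python
import urllib.request

PROMETHEUS_URL = "http://127.0.0.1:9090"  # assumption: default listen address


def build_reload_request(base_url: str = PROMETHEUS_URL) -> urllib.request.Request:
    """Build the POST request that asks Prometheus to reload its configuration."""
    return urllib.request.Request(f"{base_url}/-/reload", data=b"", method="POST")


def reload_config(base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code."""
    with urllib.request.urlopen(build_reload_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Calling reload_config() on the server after editing prometheus.yml should return 200 when the new configuration is accepted.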

Configuration files

Alibaba public cloud configuration

prometheus.yml

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

alerting:
  alertmanagers:
    - static_configs:
        - targets:
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:

  # Basic resources
  - job_name: "drm-aliyun"
    file_sd_configs:
      - files: ['node/aliyun_node.yml']
        refresh_interval: 30s

  # Black-box monitoring
  - job_name: "drm-ali-blackbox_tcp"
    metrics_path: "/probe"
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files: ['/usr/local/blackbox_exporter/aliyun_bl.yml']
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.252.100.168:9115

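job_name: "drm-aliyun" defines the name of a scrape job.

static_configs: defines a set of statically configured targets.

- targets: [IP list] lists the targets that Prometheus scrapes.

files: the file-based service discovery files from which the targets are loaded.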
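The relabel_configs of the black-box job in the configuration above rewrite every target in three steps. The following sketch imitates those steps on a plain dictionary to show the result. It is a simplified model, not how Prometheus implements relabeling, and the function name blackbox_relabel is made up for illustration.

```python
def blackbox_relabel(target: str, exporter: str) -> dict:
    """Mimic the three relabel rules of the black-box job for one target."""
    labels = {"__address__": target}
    # Rule 1: copy the original address into the `target` URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: expose the probed endpoint as the `instance` label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the actual scrape to blackbox_exporter instead.
    labels["__address__"] = exporter
    return labels


result = blackbox_relabel("10.252.100.149:8080", "10.252.100.168:9115")
print(result)
```

After relabeling, Prometheus scrapes /probe on 10.252.100.168:9115 with module=tcp_connect and target=10.252.100.149:8080, while the stored series keep the probed endpoint as their instance label.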

aliyun_node.yml, the basic resource monitoring target list:

- targets:
    - '10.252.100.144:9100'
    - '10.252.100.118:9100'
    - '10.252.100.149:9100'
    - '10.252.100.150:9100'
    - '10.252.100.120:9100'
    - '10.252.100.147:9100'
    - '10.252.100.146:9100'
    - '10.252.100.117:9100'
    - '10.252.100.148:9100'
    - '10.252.100.119:9100'
    - '10.252.100.168:9100'
  labels:
    drm_server: 10.252.100.168
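A file_sd target file is a list of target groups, each with a targets list and optional labels. When the host list grows, the file can be generated instead of edited by hand. This is a minimal sketch; render_file_sd is a hypothetical helper, and it assumes every host exposes node_exporter on the same port.

```python
def render_file_sd(hosts, port, drm_server):
    """Render one file_sd target group in the layout used by the *_node.yml files."""
    lines = ["- targets:"]
    lines += [f"    - '{host}:{port}'" for host in hosts]
    lines += ["  labels:", f"    drm_server: {drm_server}"]
    return "\n".join(lines) + "\n"


print(render_file_sd(["10.252.100.144", "10.252.100.118"], 9100, "10.252.100.168"))
```

Because the job sets refresh_interval: 30s, Prometheus re-reads a rewritten target file without a restart.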

aliyun_bl.yml, the black-box monitoring target list:

- targets:
    - '10.252.100.149:8080'
    - '10.252.100.149:80'
    - '10.252.100.150:8719'
    - '10.252.100.150:9206'
    - '10.252.100.150:8720'
    - '10.252.100.150:9304'
    - '10.252.100.120:8719'
    - '10.252.100.120:9304'
    - '10.252.100.120:8720'
    - '10.252.100.120:9206'
    - '10.252.100.144:9401'
    - '10.252.100.144:8720'
    - '10.252.100.144:9402'
    - '10.252.100.144:9101'
    - '10.252.100.144:8721'
    - '10.252.100.144:9102'
    - '10.252.100.144:8719'
    - '10.252.100.144:9201'
    - '10.252.100.144:8722'
    - '10.252.100.144:80'
    - '10.252.100.144:443'
    - '10.252.100.118:9101'
    - '10.252.100.118:8720'
    - '10.252.100.118:9102'
    - '10.252.100.118:8719'
    - '10.252.100.118:8721'
    - '10.252.100.118:9201'
    - '10.252.100.118:9401'
    - '10.252.100.118:9402'
    - '10.252.100.118:80'
    - '10.252.100.118:443'
    - '10.252.100.147:7848'
    - '10.252.100.147:8848'
    - '10.252.100.147:9848'
    - '10.252.100.147:9849'
    - '10.252.100.146:7848'
    - '10.252.100.146:8848'
    - '10.252.100.146:9848'
    - '10.252.100.146:9849'
    - '10.252.100.117:7848'
    - '10.252.100.117:8848'
    - '10.252.100.117:9848'
    - '10.252.100.117:9849'
    - '10.252.100.148:1920'
    - '10.252.100.148:8220'
    - '10.252.100.148:5465'
    - '10.252.100.148:5466'
    - '10.252.100.119:1920'
    - '10.252.100.119:8220'
    - '10.252.100.119:5465'
    - '10.252.100.119:5466'

Tencent public cloud configuration

prometheus.yml

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

scrape_configs:

  # Basic resources
  - job_name: "drm-tengxungongyouyun"
    file_sd_configs:
      - files: ['node/txgongyouyun_node.yml']
        refresh_interval: 30s

  # Black-box monitoring
  - job_name: "drm-txgyy-blackbox_tcp"
    metrics_path: "/probe"
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files: ['/usr/local/blackbox_exporter/txgongyouyun_bl.yml']
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.229.12.148:9115

txgongyouyun_node.yml, the basic resource target list:

- targets:
    - '10.229.12.177:9100'
    - '10.229.12.80:9100'
    - '10.229.12.167:9100'
    - '10.229.12.153:9100'
    - '10.229.12.164:9100'
    - '10.229.12.172:9100'
    - '10.229.12.174:9100'
    - '10.229.12.162:9100'
    - '10.229.12.67:9100'
    - '10.229.12.69:9100'
    - '10.229.12.70:9100'
    - '10.229.12.77:9100'
    - '10.229.12.81:9100'
    - '10.229.12.66:9100'
    - '10.229.12.163:9100'
    - '10.229.12.169:9100'
    - '10.229.12.75:9100'
    - '10.229.12.72:9100'
    - '10.229.12.148:9100'
  labels:
    drm_server: 10.229.12.148

txgongyouyun_bl.yml, the black-box monitoring target list:

- targets:
    - '10.229.12.148:9090'
    - '10.229.12.148:9115'
    - '10.229.12.164:7848'
    - '10.229.12.164:8848'
    - '10.229.12.164:9848'
    - '10.229.12.164:9849'
    - '10.229.12.70:7848'
    - '10.229.12.70:8848'
    - '10.229.12.70:9848'
    - '10.229.12.70:9849'
    - '10.229.12.172:7848'
    - '10.229.12.172:8848'
    - '10.229.12.172:9848'
    - '10.229.12.172:9849'
    - '10.229.12.77:7848'
    - '10.229.12.77:8848'
    - '10.229.12.77:9848'
    - '10.229.12.77:9849'
    - '10.229.12.174:7848'
    - '10.229.12.174:8848'
    - '10.229.12.174:9848'
    - '10.229.12.174:9849'
    - '10.229.12.81:7848'
    - '10.229.12.81:8848'
    - '10.229.12.81:9848'
    - '10.229.12.81:9849'
    - '10.229.12.167:8719'
    - '10.229.12.167:9201'
    - '10.229.12.67:8719'
    - '10.229.12.67:9201'
    - '10.229.12.153:80'
    - '10.229.12.153:443'
    - '10.229.12.153:8719'
    - '10.229.12.153:9102'
    - '10.229.12.153:8720'
    - '10.229.12.153:9101'
    - '10.229.12.69:80'
    - '10.229.12.69:443'
    - '10.229.12.69:8719'
    - '10.229.12.69:9102'
    - '10.229.12.69:8720'
    - '10.229.12.69:9101'
    - '10.229.12.177:8719'
    - '10.229.12.177:9401'
    - '10.229.12.177:9402'
    - '10.229.12.80:8719'
    - '10.229.12.80:9401'
    - '10.229.12.80:9402'
    - '10.229.12.162:8719'
    - '10.229.12.162:9304'
    - '10.229.12.162:8720'
    - '10.229.12.162:9206'
    - '10.229.12.66:8719'
    - '10.229.12.66:9304'
    - '10.229.12.66:8720'
    - '10.229.12.66:9206'

Tencent dedicated cloud configuration

prometheus.yml

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.

scrape_configs:

  # Basic resources
  - job_name: "drm-zhuanyouyun"
    file_sd_configs:
      - files: ['node/zyy_node.yml']
        refresh_interval: 30s

  # Black-box monitoring
  - job_name: "drm-zyy-blackbox_tcp"
    metrics_path: "/probe"
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files: ['/usr/local/blackbox_exporter/zyy_bl.yml']
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.238.19.14:9115

  # Aggregation of the Tencent public cloud (federation)
  - job_name: "drm-tengxungongyouyun"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: "/federate"
    params:
      'match[]':
        - '{job=~".*"}'
    static_configs:
      - targets:
          - '10.229.12.148:9090'
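The federation job above amounts to a plain HTTP GET against the /federate endpoint of the downstream server, with each selector passed as a match[] query parameter. The sketch below builds that URL so it can be tried by hand, for example with curl or a browser. The helper name federate_url is made up for illustration.

```python
from urllib.parse import urlencode


def federate_url(server: str, matchers: list) -> str:
    """Build the /federate URL that corresponds to the params of a federation job."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"http://{server}/federate?{query}"


url = federate_url("10.229.12.148:9090", ['{job=~".*"}'])
print(url)
```

With honor_labels: true, the job and instance labels delivered by the downstream server are kept as they are instead of being overwritten by the federating server.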

zyy_node.yml

- targets:
    - '10.238.18.81:9100'
    - '10.238.18.35:9100'
    - '10.238.18.13:9100'
    - '10.238.18.20:9100'
    - '10.238.19.75:9100'
    - '10.238.19.81:9100'
    - '10.238.19.121:9100'
    - '10.238.19.61:9100'
    - '10.238.18.75:9100'
    - '10.238.19.29:9100'
    - '10.238.18.94:9100'
    - '10.238.19.85:9100'
    - '10.238.18.5:9100'
    - '10.238.19.14:9100'
    - '10.238.19.98:9100'
  labels:
    drm_server: 10.238.19.14

zyy_bl.yml

- targets:
    - '10.238.19.14:9090'
    - '10.238.19.14:9115'
    - '10.238.18.81:1920'
    - '10.238.18.81:8220'
    - '10.238.18.81:80'
    - '10.238.18.81:5465'
    - '10.238.18.81:5466'
    - '10.238.18.35:1920'
    - '10.238.18.35:8220'
    - '10.238.18.35:80'
    - '10.238.18.35:5465'
    - '10.238.18.35:5466'
    - '10.238.18.13:1920'
    - '10.238.18.13:8220'
    - '10.238.18.13:80'
    - '10.238.18.13:5465'
    - '10.238.18.13:5466'
    - '10.238.18.20:1920'
    - '10.238.18.20:8220'
    - '10.238.18.20:80'
    - '10.238.18.20:5465'
    - '10.238.18.20:5466'
    - '10.238.19.75:1920'
    - '10.238.19.75:8220'
    - '10.238.19.75:80'
    - '10.238.19.75:5465'
    - '10.238.19.75:5466'
    - '10.238.19.98:1920'
    - '10.238.19.98:8220'
    - '10.238.19.98:80'
    - '10.238.19.98:5465'
    - '10.238.19.98:5466'
    - '10.238.19.81:1920'
    - '10.238.19.81:8220'
    - '10.238.19.81:80'
    - '10.238.19.81:5465'
    - '10.238.19.81:5466'
    - '10.238.19.121:1920'
    - '10.238.19.121:8220'
    - '10.238.19.121:80'
    - '10.238.19.121:5465'
    - '10.238.19.121:5466'
    - '10.238.18.75:80'
    - '10.238.19.61:80'
    - '10.238.19.85:80'
    - '10.238.19.29:80'
    - '10.238.18.94:80'
    - '10.238.18.5:80'

Private cloud server configuration

This is the final configuration of the private cloud server.

vim /usr/local/prometheus/prometheus.yml

prometheus.yml

global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.

scrape_configs:

  # Basic resources
  - job_name: "drm-siyouyun"
    file_sd_configs:
      - files: ['node/siyouyun_node.yml']
        refresh_interval: 30s

  # Black-box monitoring
  - job_name: "drm-syy-blackbox_tcp"
    metrics_path: "/probe"
    params:
      module: [tcp_connect]
    file_sd_configs:
      - files: ['/usr/local/blackbox_exporter/siyouyun_bl.yml']
        refresh_interval: 30s
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.152.35.71:9115

  # Alibaba cloud (federation through the jump server)
  - job_name: "drm-aliiyun"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: "/federate"
    params:
      'match[]':
        - '{job=~".*"}'
    proxy_url: http://10.152.67.132:80
    static_configs:
      - targets:
          - '10.252.100.168:9090'

  # Tencent dedicated cloud and Tencent public cloud (federation)
  - job_name: "drm-zhuanyouyun"
    scrape_interval: 10s
    honor_labels: true
    metrics_path: "/federate"
    params:
      'match[]':
        - '{job=~".*"}'
    static_configs:
      - targets:
          - '10.238.19.14:9090'

siyouyun_node.yml

- targets:
    - '10.152.2.65:9100'
    - '10.152.2.66:9100'
    - '10.152.2.67:9100'
    - '10.152.2.68:9100'
    - '10.152.2.69:9100'
    - '10.152.2.70:9100'
    - '10.152.67.129:9100'
    - '10.152.67.130:9100'
    - '10.152.67.132:9100'
    - '10.152.67.133:9100'
    - '10.152.35.65:9100'
    - '10.152.35.66:9100'
    - '10.152.35.68:9100'
    - '10.152.35.69:9100'
    - '10.152.35.71:9100'
  labels:
    drm_server: 10.152.35.71

siyouyun_bl.yml

- targets:
    - '10.152.2.65:80'
    - '10.152.2.65:9090'
    - '10.152.2.66:80'
    - '10.152.2.66:9090'
    - '10.152.2.67:8100'
    - '10.152.2.67:80'
    - '10.152.2.67:9090'
    - '10.152.2.68:8100'
    - '10.152.2.68:80'
    - '10.152.2.68:9090'
    - '10.152.2.69:80'
    - '10.152.2.69:9090'
    - '10.152.2.70:9090'
    - '10.152.35.65:80'
    - '10.152.35.65:443'
    - '10.152.35.66:80'
    - '10.152.35.66:443'
    - '10.152.35.68:80'
    - '10.152.35.68:443'
    - '10.152.35.69:80'
    - '10.152.35.69:443'
    - '10.152.35.71:9090'
    - '10.152.35.71:9115'
    - '10.152.67.129:80'
    - '10.152.67.129:443'
    - '10.152.67.130:80'
    - '10.152.67.130:443'
    - '10.152.67.132:80'
    - '10.152.67.133:80'

Jump server configuration

Jump server: 10.152.67.132

vim /etc/nginx/conf.d/dmz-132-133-out.conf


server {
    listen 80;
    server_name localhost;

    location / {
        proxy_next_upstream off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_read_timeout 120s;
        proxy_pass http://10.252.100.162;
    }
    location /federate {
        proxy_next_upstream off;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_read_timeout 120s;
        proxy_pass http://10.252.100.168:9090/federate;
    }
}
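The private cloud Prometheus reaches the Alibaba cloud server through this nginx instance because its federation job sets proxy_url to http://10.152.67.132:80. The sketch below shows the shape of such a proxied request using Python's standard library. It only builds the request and does not send it. That nginx then selects the /federate location from the path part of the requested URL is an assumption about nginx behaviour, not something stated in this document.

```python
import urllib.request

PROXY = "10.152.67.132:80"                      # jump server (nginx)
TARGET = "http://10.252.100.168:9090/federate"  # Alibaba cloud Prometheus


def proxied_request(url: str = TARGET, proxy: str = PROXY) -> urllib.request.Request:
    """Build a request that is sent to the proxy but names the real target URL."""
    req = urllib.request.Request(url)
    req.set_proxy(proxy, "http")
    return req


req = proxied_request()
print(req.host)      # host the connection is opened to: the jump server
print(req.selector)  # request target sent to the proxy: the full target URL
```

A quick manual check of the jump server is to request /federate on 10.152.67.132:80 from the private cloud server and confirm that metrics from 10.252.100.168 come back.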

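Black-box service configuration

Port monitoring uses blackbox_exporter.

Install blackbox_exporter

Upload the installation package blackbox_exporter.tar.gz.

Install it on each monitoring server: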

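Alibaba cloud: 10.252.100.168

Tencent cloud: 10.238.19.14

Private cloud: 10.152.35.71

Note: the Tencent public cloud prometheus.yml also sends its probes to 10.229.12.148:9115, so blackbox_exporter must be running on that server as well.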

Installation path:

/usr/local/blackbox_exporter or
/usr/local/prometheus/blackbox_exporter

Configuration:

cat blackbox.yml

modules:
  tcp_connect:
    prober: tcp
  http_2xx:
    prober: http
    http:
      method: GET
  http_post_2xx:
    prober: http
    http:
      method: POST

Service unit configuration:

cat /etc/systemd/system/blackbox-exporter.service


[Unit]
Description=Prometheus Blackbox Exporter
After=network.target

[Service]
Type=simple
User=root
Group=root
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml --web.listen-address=:9115
Restart=on-failure

[Install]
WantedBy=multi-user.target

Start the service

systemctl start blackbox-exporter.service
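Once the service is running, a probe can be checked by requesting /probe on port 9115 with the module and target parameters and looking at the probe_success metric in the response. The sketch below only parses such a response. The sample text is a shortened, hypothetical response, and probe_succeeded is an illustrative helper name.

```python
def probe_succeeded(metrics_text: str) -> bool:
    """Return True if the /probe output reports probe_success 1."""
    for line in metrics_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False


# Hypothetical sample of a /probe response, shortened for illustration.
sample = """# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
"""
print(probe_succeeded(sample))
```

A value of 0 for a tcp_connect probe means the exporter could not open a TCP connection to the target port.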