Bumping alertmanager to 0.17.0
Problem:
Alertmanager 0.16.0 is unable to send email notifications without SMTP authentication.
Solution:
Upgrade to Alertmanager 0.17.0, which solves this problem.
Issue:
https://github.com/rancher/rancher/issues/20060

Remove nginx proxy_buffer from Monitoring
**Problem:**
Buffering proxied data makes the client receive responses slowly.
**Solution:**
Disable buffering via `proxy_buffering off;` so that responses are passed to the client directly.
**Issue:**
https://github.com/rancher/rancher/issues/19689

Keep the operator chart version at 0.0.2
**Problem:**
Upgrading the operator fails because we changed the version of
the operator chart.
**Solution:**
Keep the version 0.0.2 that the operator used before.

Fix wrong expression for container resources query
**Problem:**
There are two kubelet scrape targets in Prometheus: one scrapes `/metrics`, the other scrapes `/metrics/cadvisor`.
The metrics from the `/metrics` endpoint do not include `container_name`,
so a `container_*` expression doubles the actual amount unless it filters on
`container_name!=""`.
**Solution:**
Add `container_name!=""` to the expression.
**Issue:**
https://github.com/ran...
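
The double counting can be illustrated with a minimal Python sketch (not PromQL). The sample series, their label values, and the helper name `total` are made up for illustration; the sketch only assumes, as described above, that the same usage shows up once in a series without `container_name` and once in a per-container series.

```python
# Hypothetical samples for one container_* metric of a single pod.
# Each entry is (labels, value); all values are invented.
samples = [
    ({"pod_name": "web-0", "container_name": ""}, 100.0),     # series without container_name
    ({"pod_name": "web-0", "container_name": "web"}, 100.0),  # per-container series
]


def total(series, require_container_name):
    """Sum sample values, optionally keeping only series with container_name != ""."""
    return sum(
        value
        for labels, value in series
        if not require_container_name or labels.get("container_name", "") != ""
    )


unfiltered = total(samples, require_container_name=False)  # counts the usage twice
filtered = total(samples, require_container_name=True)     # mimics container_name!=""
print(unfiltered, filtered)
```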

Fix wrong expression for fluentd query
**Problem:**
The `Fluentd` panel in the `Rancher Components` dashboard does not show
the correct count of fluentd Pods.
**Solution:**
Change `sum(kube_pod_info{pod=~"fluentd.*"})` to `sum(kube_pod_info{pod=~".*fluentd.*",pod!~".*aggregator.*"})`.
**Issue:**
https://github.com/rancher/rancher/issues/19722
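
A small Python sketch of how the old and new selectors differ. Prometheus regex matchers are fully anchored, which `re.fullmatch` mimics here. The pod names are hypothetical and chosen only to show the difference; they are not taken from a real cluster.

```python
import re


def matches(pattern, value):
    # Prometheus anchors regex matchers, so fullmatch mimics =~ here.
    return re.fullmatch(pattern, value) is not None


def old_selector(pod):
    # pod=~"fluentd.*"
    return matches(r"fluentd.*", pod)


def new_selector(pod):
    # pod=~".*fluentd.*", pod!~".*aggregator.*"
    return matches(r".*fluentd.*", pod) and not matches(r".*aggregator.*", pod)


# Hypothetical pod names.
pods = [
    "fluentd-abc12",
    "rancher-logging-fluentd-x9z",
    "fluentd-aggregator-0",
    "grafana-1",
]

old_matched = [p for p in pods if old_selector(p)]
new_matched = [p for p in pods if new_selector(p)]
print(old_matched)
print(new_matched)
```

With these names, the old selector misses the prefixed fluentd pod and counts the aggregator pod, while the new one does the opposite.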

The node exporter should listen on 0.0.0.0 instead of the pod IP
Fix the issue that the node exporter crashes when deployed
onto a node without an internal IP.

Upgrade to version 0.0.3
- Embed operator as a subchart
  + Support configuring the operator like other charts
  + Adjust the operator's default limit
- Add permission to kube-state exporter
- Replace localhost with 127.0.0.1 on prometheus-auth
- Increase Nginx proxy buffers
- Configure PVC name of Prometheus or Alertmanager
  + Allow configuring the PVC name of Prometheus or Alertmanager via `prometheus.persistence.name` or `...

Start the Prometheus proxy nginx as process 1
**Problem:**
When we start nginx from our start-up script, the nginx process
becomes a child of the start-up script process rather than process 1.
In this case, the kill signal from kubelet/docker is sent to the
start-up script instead of nginx, so the nginx process does not stop
when killed.
**Solution:**
Change the proxy command so that nginx starts as process 1.
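
The commit does not show the new command. One common way to get this behavior is for the start-up script to replace itself with the server through `exec`, so the server inherits the script's PID; whether the chart does exactly that is an assumption. The Python sketch below only illustrates the underlying property on a POSIX system: a process that calls `exec` keeps its PID.

```python
import subprocess
import sys

# Child program: print its PID, then replace itself via exec with a second
# Python program that prints its own PID.
child = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    "os.execv(sys.executable, [sys.executable, '-c', 'import os; print(os.getpid())'])\n"
)

result = subprocess.run(
    [sys.executable, "-c", child], capture_output=True, text=True, check=True
)

# Two PIDs are printed: one before exec, one from the replacement program.
pid_before_exec, pid_after_exec = result.stdout.split()
print(pid_before_exec == pid_after_exec)
```

Applied to a container entrypoint, the same property means a server started via `exec` from PID 1 remains PID 1 and receives the stop signal directly.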

Use repository for image name key in templates
In system-charts, we need to use `repository` and `tag` to define a
container's image name. We can then collect them and
provide the list of images required by the system charts.
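
A minimal sketch of how such a list could be collected once every image is described by `repository` and `tag`. The nesting under an `image` key, the function name `collect_images`, and all image names and tags are assumptions for illustration, not the charts' actual layout.

```python
def collect_images(values):
    """Recursively collect 'repository:tag' strings from a nested values mapping."""
    images = set()
    if isinstance(values, dict):
        image = values.get("image")
        if isinstance(image, dict) and "repository" in image and "tag" in image:
            images.add(f"{image['repository']}:{image['tag']}")
        for child in values.values():
            images |= collect_images(child)
    elif isinstance(values, list):
        for item in values:
            images |= collect_images(item)
    return images


# Hypothetical values layout; names and tags are for illustration only.
values = {
    "prometheus": {"image": {"repository": "example/prometheus", "tag": "v1.0.0"}},
    "exporters": {
        "node": {"image": {"repository": "example/node-exporter", "tag": "v0.1.0"}},
    },
}

image_list = sorted(collect_images(values))
print(image_list)
```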

Upgrade fluentd image version
Problem:
Fluentd versions before 1.3.1 do not support adding a client certificate for fluentd output.
Solution:
Upgrade fluentd to 1.3.3. The related kafka gem also gets a minor
version upgrade; fluentd and kafka were tested after the upgrade.
Issue:
https://github.com/rancher/rancher/issues/18396

Changes made to the upstream chart
- Add a checksum over secrets to ensure that a change in secrets upgrades the deployment
- Use the Rancher image to ensure the air-gap case works too
- Add nodeSelector to ensure the workloads are never scheduled to Windows nodes
- Add resource limits
- Add private image registry for the air-gap case
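
The checksum item in the list above relies on a general property: if a hash of the secret content is written into the pod template (for example as an annotation), any change to the secret changes the template and so triggers a rollout. The sketch below shows that idea in Python rather than in Helm template syntax; the annotation key `checksum/secret` and the secret contents are hypothetical.

```python
import hashlib


def secret_checksum(rendered_secret: str) -> str:
    """Return the SHA-256 hex digest of the rendered secret manifest."""
    return hashlib.sha256(rendered_secret.encode("utf-8")).hexdigest()


def pod_annotations(rendered_secret: str) -> dict:
    # Hypothetical annotation key; any key on the pod template works the same way.
    return {"checksum/secret": secret_checksum(rendered_secret)}


# Two made-up renderings of the same secret with different data.
before = pod_annotations("apiVersion: v1\nkind: Secret\ndata:\n  token: b2xk\n")
after = pod_annotations("apiVersion: v1\nkind: Secret\ndata:\n  token: bmV3\n")

# A different annotation value means a different pod template.
changed = before != after
print(changed)
```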

Copy of upstream Helm chart for external-dns
We will be keeping up with the upstream chart:
https://github.com/helm/charts/tree/master/stable/external-dns

Make labels consistent in service monitor and logging charts
**Problem:**
With logging and monitoring enabled in `rancher/rancher:master`, fluentd metrics are not visible.
**Solution:**
Make the label and endpoint name consistent in `system-chart/rancher-monitoring:v0.0.2`.
**Issue:**
https://github.com/rancher/rancher/issues/18327
**Patch:**
https://github.com/rancher/system-charts/pull/17

Labels should be consistent in service monitor and logging charts
Problem:
With logging and monitoring enabled, fluentd metrics are not visible.
Solution:
Make the label and endpoint name consistent.
Issue:
https://github.com/rancher/rancher/issues/18327

Support choosing Prometheus sync mode between federate and remote
**Problem:**
- Remote reader mode only allows the `project-level` Prometheus to share the
metrics from the `cluster-level` Prometheus
- Remote reader mode cannot save the namespace-related metrics from the
`cluster-level` Prometheus
**Solution:**
- Add `prometheus.sync.mode` to choose the mode
- Add a "federate" scrape job when deploying in federation mode
**Issue:**
https://github.com/rancher/rancher/issues/17390

Support arbitrary serviceSelectorLabels label names
**Problem:**
Label names like `x.y.z/k` cannot be entered into serviceSelectorLabels.
**Solution:**
Use an array instead of an object for the values.

Support arbitrary nodeSelector label names
**Problem:**
Label names like `x.y.z/k` cannot be entered into nodeSelector.
**Solution:**
Use an array instead of an object for the values.
**Issue:**
https://github.com/rancher/rancher/issues/17340
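
The commit does not say why an object fails for such names. One plausible reason, assumed here, is that values are set through dotted key paths, so the dots inside `x.y.z/k` are read as nesting levels. The Python sketch below uses a deliberately naive, hypothetical `set_path` helper to show the effect, and a hypothetical `key=value` array entry to show how the array form sidesteps it.

```python
def set_path(values, path, value):
    """Naive setter that treats every '.' in path as a nesting separator."""
    keys = path.split(".")
    node = values
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return values


# Object form: the dots inside the label name are taken as nesting levels.
as_object = set_path({}, "nodeSelector.x.y.z/k", "v")

# Array form (hypothetical entry layout): the label name stays inside the value.
as_array = {"nodeSelector": ["x.y.z/k=v"]}
label, _, label_value = as_array["nodeSelector"][0].partition("=")

print(as_object)
print(label, label_value)
```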

Add ability to use private image registry when deploying monitoring tools
Problem:
We cannot deploy monitoring tools in an
air-gapped environment.
Solution:
Add the ability to use a private image registry when deploying
monitoring tools.
Issue:
https://github.com/rancher/rancher/issues/17842

Deploy Prometheus at project level
**Problem:**
- The previous charts cannot satisfy the project-level monitoring deployment design
- Grafana cannot be restarted after the password is changed
- node-exporter cannot be scheduled to `controlplane` or `etcd` role nodes
- Prometheus cannot be started with a PVC provided by a storage provisioner that does not respect the `SecurityContext`
**Solution:**
- Deploy "project level" monitoring wit...

Add ability to use private image registry when deploying logging tools
Problem:
After we refactored logging, we cannot deploy logging tools in an
air-gapped environment.
Solution:
Add the ability to use a private image registry when deploying logging
tools.
Issue:
https://github.com/rancher/rancher/issues/17568

Fix Grafana PV can't be mounted (Patch)
**Problem:**
Grafana still cannot mount the PV correctly in cloud environments, because there are two
`securityContext` fields and `fsGroup` only works in the Pod spec.
**Solution:**
Move the `securityContext` to the Pod spec.
**Issue:**
https://github.com/rancher/rancher/issues/16953

Add Rancher-Monitoring Chart
(+) Only used for Rancher 2.0 Monitoring and Alerting
(+) Support Grafana proxying with an authorization bearer token to the
Prometheus-Auth agent
(+) Support Prometheus web proxying with an authorization bearer token
to the Prometheus-Auth agent
(+) Rich metrics for Kubernetes and Rancher
Co-authored-by: aiwantaozi <michelia.feng@gmail.com>
Co-authored-by: ora...