Introduction
In this next part of the blog series, we are going to talk about proactive monitoring and maintenance tasks.
For those who have missed the previous posts, here are the links:
- Part 1: Management Packs
- Part 2: Pre-work & Importing Management Packs
- Part 3: Exploring discoveries
- Part 4: Fine-tuning methodologies
In many cases, action is only taken when the environment already has an issue and end-users are suffering from an outage. However, System Center Operations Manager, and more specifically our Management Pack, is built around proactive monitoring. Instead of the traditional fire-fighting that happens in many environments, you can use the different MPs and their views and reports to monitor proactively and solve issues before they actually happen.
Maintenance tasks
Proactive monitoring can be as easy as repeating certain tasks on a daily, weekly or monthly basis. No, it won’t protect you from sudden drops in performance or sudden crashes, but if you do this consistently, you will solve quite a few incidents before they actually happen.
For this example, we jump back to the Veeam MP. These management packs contain quite a few views that you can review on a daily basis. Let’s look at one of them.
Daily tasks
In the Veeam MP there is a view called Top Hosts.
This view shows at a glance how your hosts are doing on four important performance counters. As you can see, everything is currently green (healthy), but if a host turned yellow or red, you could drill through and look for potential upcoming problems.
Many management packs have views like this that you can check every day. Here is another real-life example. One of the big issues I once had at a company was that engineers were regularly called out of bed by a sudden alert stating that a disk on a server was running full. Of course I wanted an engineer to be notified when that happened, but after a couple of weeks I wanted to know why we never “saw it coming”…
It didn’t take me long to figure out what really happened. The hard drive monitor (in the base OS management pack) has both a warning and a critical state. Because everybody was busy fixing the critical issues during the day, nobody watched the warning ones. As a result, the first notification anyone acted on came when free space dropped from warning to critical, mostly at night because of certain processes that ran at that time.
The solution: I built a custom dashboard (rather primitive at the time) and made it a daily task for my engineers to review it. If they saw a warning, it had to be fixed immediately, before it could become critical. From then on, whenever somebody got called out because a disk was running out of space, the responsible engineer had to explain how he or she had missed it the day before (I know, sudden drops will always happen…).
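To make the warning/critical idea concrete, here is a minimal sketch of what such a daily review boils down to: classify each disk by free space and list the ones in the warning state, worst first, so they get fixed today. The thresholds, the `DiskSample` type and the sample data are all hypothetical; the actual monitor in the base OS management pack has its own (overridable) threshold settings and is not implemented this way.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds for illustration only; they are NOT the
# defaults of the base OS management pack's disk monitor.
WARNING_PCT = 10.0
CRITICAL_PCT = 5.0


@dataclass
class DiskSample:
    server: str
    drive: str
    free_pct: float  # free space as a percentage of total capacity


def classify(free_pct: float) -> str:
    """Map a free-space percentage to a monitor-style health state."""
    if free_pct < CRITICAL_PCT:
        return "Critical"
    if free_pct < WARNING_PCT:
        return "Warning"
    return "Healthy"


def daily_review(samples: List[DiskSample]) -> List[DiskSample]:
    """Return disks in the Warning state, lowest free space first.

    These are the ones to fix today, before they turn Critical
    and page someone in the middle of the night.
    """
    warnings = [s for s in samples if classify(s.free_pct) == "Warning"]
    return sorted(warnings, key=lambda s: s.free_pct)


if __name__ == "__main__":
    samples = [
        DiskSample("SRV01", "C:", 42.0),
        DiskSample("SRV02", "D:", 8.5),
        DiskSample("SRV03", "E:", 3.1),  # already Critical, alerting covers it
        DiskSample("SRV04", "C:", 9.9),
    ]
    for s in daily_review(samples):
        print(f"{s.server} {s.drive}: {s.free_pct:.1f}% free -> fix today")
```

The point of the sketch is the filter: critical disks already trigger a notification, so the daily task focuses on the warning band that nobody else is watching.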
Weekly / Monthly tasks
Some dashboards are worth checking daily; others you might review weekly or even monthly. Besides dashboards and views, you can also use reporting. In SCOM you can use historical data (by default about a year’s worth) to perform trend analysis, capacity planning and forecasting. Some management packs include that type of reporting out of the box; for others you will need to build the reports yourself.
This report, for example, can be reviewed every Friday to see how the storage on our virtualization platforms is behaving. It should tell us whether we are running into trouble with IOPS, latency, free space or other metrics well before it actually degrades the environment.
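As an illustration of the kind of trend analysis a weekly review supports, the sketch below fits a straight line to daily free-space samples and estimates how many days remain before a volume crosses a threshold. This is a simple least-squares linear forecast chosen for clarity, not how SCOM reports compute their trends; the function names, threshold and sample values are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) < 2 or len(xs) != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(free_gb_by_day: List[float],
                         threshold_gb: float) -> Optional[float]:
    """Estimate days after the last sample until free space hits threshold_gb.

    Returns None if the trend is flat or growing (no crossing forecast),
    and 0.0 if the fitted trend is already at or below the threshold.
    """
    xs = list(range(len(free_gb_by_day)))  # one sample per day
    slope, intercept = fit_line(xs, free_gb_by_day)
    if slope >= 0:
        return None
    current = intercept + slope * xs[-1]  # fitted value on the last day
    if current <= threshold_gb:
        return 0.0
    return (threshold_gb - current) / slope


if __name__ == "__main__":
    # Hypothetical week of daily free-space samples for one datastore (GB).
    week = [500, 490, 480, 470, 460, 450, 440]
    print(days_until_threshold(week, threshold_gb=100))
```

For the sample week, free space shrinks by 10 GB a day, so the sketch forecasts 34 days until the 100 GB mark: plenty of time to act if someone looks at it on Friday, and none at all if nobody does.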
Conclusion
By simply using the standard views and reports and reviewing them on a daily, weekly or monthly basis, you can catch issues before they actually “cause” problems.
Update: