Introduction
In part 1 of this series we talked about management packs (MPs), and in part 2 we did our preparations: we imported an MP and created an override MP. In part 3 we explored discoveries.
Now that we understand discoveries, we can look at other parts of management packs. One of the most crucial is overriding monitors and rules. Overriding doesn’t only mean disabling or enabling them; it can also mean adapting the default thresholds so that they better match your infrastructure.
There are different ways to approach this. I have seen people succeed at fine-tuning with quite different methods, so it comes down to how you prefer to work and what fits the workflow and methodology of your company.
The ‘I’ll fine-tune when it comes’ methodology
It’s not really my favorite, but it has proven very useful in many environments I have worked in. After importing the management pack, we wait until alerts show up. At that point, we investigate whether each alert points to a real problem or simply isn’t adapted to our environment. For example, after importing the Veeam Management Pack for System Center we might see alerts about snapshots (checkpoints) of our VMs. After discussing them with the teams involved, we decide that the default thresholds are not right for our environment, so we adapt them to our specific needs. The procedure is described later in this post.
Fine-tuning by reviewing the monitors first
This is my preferred way of working. As soon as we have imported the management pack, we review all of its monitors and rules and fine-tune them immediately. Here, too, there are two situations: some management packs include all the necessary information in their documentation, others don’t. In the end, running through the rules of a specific management pack and fine-tuning them isn’t that difficult. Making the decisions is harder. Not only do you have to decide on severity, priority, thresholds and sometimes additional parameters, you also have to decide whether each override applies to the entire environment or only to a selected group of objects.
Let’s take one monitor as an example: Veeam HyperV: VM Checkpoint Analysis. According to the manual, it has the following configuration:
| Target | Default threshold | Description |
|---|---|---|
| Hyper-V VM | Checkpoint Age Hours > 48; Checkpoint Size MB > 2048; State always = Warning; Priority always = Low (overridable) | Tracks threshold breaches for the following checkpoint properties: Checkpoint Age Hours (the age of the oldest checkpoint, in hours) and Checkpoint Size MB (the total size of all checkpoints, in MB). |
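To make the default logic in the table concrete, here is a minimal Python sketch that models the documented thresholds. It is only a model of the behaviour described above, not Veeam or Operations Manager code; the names `CheckpointStats` and `evaluate_checkpoints` are hypothetical, and it assumes that breaching either threshold is enough to trigger the monitor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Default thresholds as documented for "Veeam HyperV: VM Checkpoint Analysis".
DEFAULT_AGE_HOURS = 48   # Checkpoint Age Hours > 48
DEFAULT_SIZE_MB = 2048   # Checkpoint Size MB > 2048 (2 GB)


@dataclass
class CheckpointStats:
    """Hypothetical container for the two properties the monitor tracks."""
    oldest_age_hours: float  # age of the oldest checkpoint, in hours
    total_size_mb: float     # total size of all checkpoints, in MB


def evaluate_checkpoints(stats: CheckpointStats,
                         age_hours: float = DEFAULT_AGE_HOURS,
                         size_mb: float = DEFAULT_SIZE_MB) -> Optional[Tuple[str, str]]:
    """Return (state, priority) if either threshold is exceeded, otherwise None."""
    breached = stats.oldest_age_hours > age_hours or stats.total_size_mb > size_mb
    if breached:
        # Per the documentation: state is always Warning, priority defaults to Low.
        return ("Warning", "Low")
    return None


if __name__ == "__main__":
    print(evaluate_checkpoints(CheckpointStats(oldest_age_hours=72, total_size_mb=500)))  # ('Warning', 'Low')
    print(evaluate_checkpoints(CheckpointStats(oldest_age_hours=12, total_size_mb=500)))  # None
```

Note that the comparisons are strict: a checkpoint that is exactly 48 hours old, or checkpoints totalling exactly 2048 MB, do not trigger the monitor in this model.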
Whenever the oldest checkpoint on a VM is older than 48 hours, or the checkpoints on a VM together take up more than 2 GB, the monitor raises a warning with low priority. In my situation, after agreeing with the virtualization team, I want a high-priority alert for my production VMs, but only an informational alert with low priority for my development VMs. For the development VMs, I also want to raise the thresholds to 168 hours (one week) and 10 GB.
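One way to picture these two overrides is as group-specific values layered on top of the monitor’s defaults. The Python sketch below models that idea only; the group names and the `effective_settings` helper are hypothetical, the production override changes only the priority (the severity stays at the default), and it assumes each VM belongs to at most one of the two groups.

```python
from typing import Dict, Optional, Union

Setting = Union[str, int]

# Monitor defaults: alert severity, priority and the two thresholds.
DEFAULTS: Dict[str, Setting] = {
    "severity": "Warning",
    "priority": "Low",
    "age_hours": 48,
    "size_mb": 2048,
}

# Hypothetical group names; each entry holds only the values that differ from the defaults.
GROUP_OVERRIDES: Dict[str, Dict[str, Setting]] = {
    "Production VMs": {"priority": "High"},
    "Development VMs": {
        "severity": "Information",
        "priority": "Low",
        "age_hours": 168,       # one week
        "size_mb": 10 * 1024,   # 10 GB expressed in MB
    },
}


def effective_settings(group: Optional[str]) -> Dict[str, Setting]:
    """Return the defaults with the group's overrides applied on top.

    VMs outside both groups (group is None or unknown) keep the defaults.
    A new dict is returned, so DEFAULTS itself is never modified.
    """
    overrides = GROUP_OVERRIDES.get(group, {}) if group is not None else {}
    return {**DEFAULTS, **overrides}


if __name__ == "__main__":
    for name in ("Production VMs", "Development VMs", None):
        print(name, effective_settings(name))
```

Storing only the differences per group mirrors the idea of keeping overrides small and separate from the sealed defaults.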
Next, I create two groups, one for my production VMs and one for my development VMs, and put each VM in the correct group, either manually (explicit membership) or dynamically based on object properties. For more information on dynamic groups, visit: http://technet.microsoft.com/en-us/library/hh298605.aspx
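Group membership combines explicitly added objects with a rule evaluated against each object’s properties. The Python sketch below illustrates that combination using a purely hypothetical naming convention (VM names starting with `PRD-` or `DEV-`) and a hypothetical explicit member; in Operations Manager you define such rules in the group wizard, not in code, and the `group_members` helper is invented for illustration.

```python
import re
from typing import Iterable, List

# Hypothetical naming convention used as the dynamic membership rule.
DYNAMIC_RULES = {
    "Production VMs": re.compile(r"^PRD-", re.IGNORECASE),
    "Development VMs": re.compile(r"^DEV-", re.IGNORECASE),
}

# Hypothetical explicit (manual) members for VMs that don't follow the convention.
EXPLICIT_MEMBERS = {
    "Production VMs": {"SQL01"},
    "Development VMs": set(),
}


def group_members(group: str, vm_names: Iterable[str]) -> List[str]:
    """Return the sorted VMs in `group`: explicit members plus names matching the rule."""
    rule = DYNAMIC_RULES[group]
    explicit = EXPLICIT_MEMBERS.get(group, set())
    return sorted(name for name in vm_names if name in explicit or rule.search(name))


if __name__ == "__main__":
    vms = ["PRD-WEB01", "DEV-TEST01", "SQL01", "dev-build02", "LAB-01"]
    print(group_members("Production VMs", vms))   # ['PRD-WEB01', 'SQL01']
    print(group_members("Development VMs", vms))  # ['DEV-TEST01', 'dev-build02']
```

Keeping the two groups mutually exclusive avoids having to reason about which of two conflicting overrides applies to a VM that ends up in both.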
After that, I select the monitor and create two overrides on it: one for the production group and one for the development group. Make sure you save the overrides in the override management pack you created earlier.
If the alert already exists, I can create the override directly from the alerts folder; otherwise I go to Authoring -> Monitors and search for the monitor there.
By right-clicking the monitor, I can select Overrides -> Override the Monitor -> For a group…
Conclusion
Which method you use is up to you, and it should match the requirements of your environment and the methodology used in your company. However, it is very important that you do perform this task, and that you don’t start disabling monitors by default just because you assume they produce false positives. Only a clear, agreed-upon framework or methodology will ensure that you won’t run into issues afterwards. Finally, make sure management is convinced that this work is worth the time and effort. They have invested in the System Center suite, and it would be a shame to lose that investment by not following through.
Update: