The simplest example you can relate to when it comes to monitoring is your car dashboard. A car dashboard reveals various interesting/essential details to the driver. Some of the common things include
Your car dashboard is not just an accessory like leather seats. It’s one of the core components of the car, like the engine and tyres. Driving without a dashboard can have fatal consequences, because the dashboard gives you important notifications. Example: if your fuel level is running low, you know you need to stop at the next fuel station. Without the monitoring and notification system, it’s impossible for a driver to perform all these checks before starting the car, or to keep an eye on them while driving.
In a middleware system like BizTalk Server, monitoring plays the same role as a car dashboard. BizTalk Server may be running with many core components such as the messaging engine, orchestration engine, throttling, etc., but without monitoring in place your system is prone to accidents. It’s all about capturing the event well in advance and reacting to it.
Monitoring is all about 3 key items: Event, Threshold and Notification. In the car example, running out of fuel is the event, your car manufacturer might configure less than 50 km of remaining range as the threshold for alerting the driver, and your dashboard is the notification platform.
In a technical example: running out of disk space is the event, 20% free space could be the threshold, and once free space drops below 20% a notification is sent by email/SMS.
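The disk space example can be sketched in a few lines of Python. This is only an illustration of the event/threshold/notification split, not how any particular monitoring product works. The function names and the `notify` callback are hypothetical; in practice `notify` would send the email or SMS.

```python
import shutil
from typing import Callable, Optional


def free_space_percent(path: str) -> float:
    """Return the percentage of free space on the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total * 100.0


def check_disk_space(
    path: str,
    threshold_percent: float,
    notify: Callable[[str], None],
    measured_percent: Optional[float] = None,
) -> bool:
    """Notify when free space is below the threshold; return True if it alerted."""
    # Event: the current free-space reading (injectable so the logic can be tested).
    free = free_space_percent(path) if measured_percent is None else measured_percent
    # Threshold: alert only once free space is below the configured percentage.
    if free < threshold_percent:
        # Notification: hypothetical callback standing in for email/SMS.
        notify(
            f"Low disk space on {path}: {free:.1f}% free "
            f"(threshold {threshold_percent:.0f}%)"
        )
        return True
    return False
```

Calling `check_disk_space("/data", 20.0, print)` would read the real volume and print a message only when less than 20% is free.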
The important challenge is configuring the correct threshold values, so that you neither create too much noise (alerts) nor so little that an event goes unnoticed. It’s also important to notify the right person. In the car case you should notify the driver, not the passenger sleeping in the rear seat.
Now let’s take a BizTalk example and see how having proper monitoring in place can prevent an accident that is about to happen.
Spool size, i.e. the number of messages in the spool table of the BizTalk MessageBox database, is one of the key metrics in a BizTalk environment. Every in-flight message has an entry in the spool table, so the number of records fluctuates with the load in your environment. Ideally, if your environment is not processing anything, there shouldn’t be any data in this table. The depth of the spool table should not keep growing linearly; if the table grows non-stop, there is a problem in your environment.
Note: You can monitor the depth of the spool table by using the performance counter “BizTalk:Message Box:General Counters\Spool Size”.
It’s important to monitor the growth of the spool table. In the picture below, the point is to notify the corresponding users when the spool depth reaches the warning level, rather than leaving it to grow until it reaches the error threshold.
The best monitoring solution spots and reacts to the situation before it gets worse. In the above scenario, with a proper monitoring solution in place users start getting notifications as soon as the system enters the warning state, when immediate action can still mitigate the issue and bring the system back to a stable state.
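The warning and error levels described above could be expressed roughly as follows. This is a minimal sketch: the threshold values passed in and the "steadily growing" check are assumptions chosen for illustration, not values recommended by BizTalk.

```python
from enum import Enum
from typing import Sequence


class Level(Enum):
    OK = "ok"
    WARNING = "warning"
    ERROR = "error"


def classify_spool_size(spool_size: int, warning_at: int, error_at: int) -> Level:
    """Map a spool size reading to a level using two thresholds."""
    if warning_at >= error_at:
        raise ValueError("warning_at must be lower than error_at")
    if spool_size >= error_at:
        return Level.ERROR
    if spool_size >= warning_at:
        return Level.WARNING
    return Level.OK


def is_growing_steadily(samples: Sequence[int]) -> bool:
    """True when there are at least three samples and each exceeds the previous one."""
    return len(samples) >= 3 and all(b > a for a, b in zip(samples, samples[1:]))
```

Alerting at `Level.WARNING`, or when `is_growing_steadily` holds over recent readings, gives the team time to act before the error threshold is reached.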
Organisations should take monitoring seriously if they are serious about their business. Having a proper monitoring solution in place is like having police protecting your house and neighborhood. It’s much easier and safer to stop someone breaking into your house than to try to find the thief after the incident. Here are the key advantages proper monitoring brings to organizations:
When your systems are down, your company’s credibility is at risk. Once lost, customer confidence may be difficult to restore.
When you have a monitoring system, your IT department can focus on other, more important business tasks rather than checking whether your infrastructure or application is down.
Consistently monitoring your systems can reduce your costs and positively impact your bottom line because you will have better insight into how your infrastructure and applications are performing.
The main objective of this article is to highlight the key best practices when it comes to BizTalk monitoring.
Let’s take an example scenario where your BizTalk solution takes advantage of tracking and business activity monitoring (BAM). For some reason, the SSIS task responsible for moving data from the BAM primary import table to the BAM archive table has stopped running (maybe the password of the account it runs under was changed). At some point this is going to result in your SQL storage running out of disk space, and the consequences could be dreadful. The easy fix is to monitor both the health of the SSIS package execution and the available disk space, and to react to alerts proactively.
It’s important to understand your application’s service level agreement (SLA) and availability rating. Because BizTalk Server is a middleware product, it’s also important to understand the SLAs of the dependent systems. Example: if you are integrating a user-facing web application with your SAP back-end system, each with an SLA of 99.99%, then your BizTalk environment’s availability rating should be 99.999% or higher.
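A little arithmetic shows why the middleware needs the higher rating. The sketch below assumes a 365-day year, and that the web application, BizTalk and SAP sit in series and fail independently. Those are simplifying modelling assumptions for illustration, not something your SLAs guarantee.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100.0) * MINUTES_PER_YEAR


def serial_availability_percent(*availability_percents: float) -> float:
    """End-to-end availability when every system in the chain must be up.

    Assumes the systems fail independently of each other.
    """
    return prod(p / 100.0 for p in availability_percents) * 100.0
```

Under these assumptions, 99.99% allows roughly 52.6 minutes of downtime a year and 99.999% roughly 5.3 minutes. A chain of 99.99%, 99.999% and 99.99% gives about 99.979% end to end, against about 99.970% if BizTalk were also only 99.99%.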
In most organizations, infrastructure and application monitoring are handled by completely separate teams. There will be a dedicated IT team (e.g. WinTel) looking after the infrastructure, and there will be separate application support teams.
Typical infrastructure monitoring items will include things like Disks, NT Services, Cluster Availability, Server Availability, SQL Jobs, BizTalk Host Instances availability, performance counters etc.
Application monitoring covers the functional, business aspects of your BizTalk application, for example: handling suspended instances, checking for failed messages, checking for transaction threshold violations (e.g. a PO over $1m), etc.
When talking about monitoring, the Windows event log is one of the important stores. Most of the critical events that happen in the environment get logged to the event viewer. The event log is shared by all the applications and the OS itself, so it’s essential to make sure developers don’t pollute it with unnecessary information.
During development, clearly define the event IDs for each application. Example: event ID range 20001 to 20050 for the customer application, 30001 to 30020 for the dealing application, etc. Once you have clear policies around event IDs, you can send monitoring alerts to the appropriate teams based on the event ID range.
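Routing by event ID range can be as simple as a lookup table. The sketch below uses the two ranges from the example above; the team names are hypothetical placeholders.

```python
from typing import Optional

# (event ID range, owning team). range() excludes its end value, hence 20051 and 30021.
# Team names are hypothetical, for illustration only.
EVENT_ID_ROUTES = [
    (range(20001, 20051), "customer-app-support"),
    (range(30001, 30021), "dealing-app-support"),
]


def team_for_event(event_id: int) -> Optional[str]:
    """Return the team that owns this event ID, or None if no range matches."""
    for id_range, team in EVENT_ID_ROUTES:
        if event_id in id_range:
            return team
    return None
```

An event ID that falls outside every range returns `None`, which is a useful signal in itself: some application is logging outside the agreed policy.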
Insist that developers do not log informational messages like “Message Processed”, “Refreshed Cache”, etc. in the event viewer. They can use log files for that instead.
Define clear policies on what should be treated as a warning and what should be treated as an error. As a rule of thumb, if something can’t be processed any further (a genuine exception), it should be treated as an error. If something works but may not be configured appropriately, it should be treated as a warning.
When it comes to application monitoring, no external monitoring solution will be able to provide insight into your business data. With BizTalk applications you should take advantage of the BAM infrastructure provided by BizTalk Server.
Once the BAM infrastructure is in place for your application, you can take advantage of BAM alerting capabilities to raise an alert if the system violates certain thresholds. Example: a purchase order value greater than $5 million.
Message Box Viewer, or MBV for short, is a tool from Microsoft PSS. It contains over 400 rules based on issues that Microsoft support people encountered during customer engagements. Example: your environment is not running the appropriate service packs, required hotfixes are not installed, there are too many instances in the message box, etc.
Monitoring tools like SCOM, HP OpenView, etc. are very generic and won’t have the same level of knowledge as MBV when it comes to BizTalk. You can configure a custom monitoring solution like BizTalk360 to monitor for threshold violations, for example: MBV reports more than 5 critical errors or 10 non-critical errors.
Most modern monitoring solutions are capable of monitoring hundreds of things. It’s important to find the right balance. If you monitor too much, you may end up getting tens of alerts each day, and eventually nobody looks at any of them. That’s pretty much equivalent to not having any monitoring.
It’s important to have a policy and a clear understanding of who receives what. There is no point in sending an infrastructure-related alert to an application support person, or vice versa. If the alert is not relevant, that person is going to skip it and not take any action. This pretty much defeats the whole point of monitoring and notification.
Your environment is not static; the threshold you set a few months ago may not be appropriate today. Example: you might have set a spool depth threshold of 5000 when your system went live, but now you have more applications running in your environment and that value is no longer valid. Instead of simply ignoring the alert, take the effort to correct the threshold and assign an appropriate value.
The threshold in one environment may not be the same in another. For example, you may have 2 BizTalk production environments, one used for low-latency transactions and the other for normal transactions. The threshold values may not be identical in both environments. Maintain appropriate threshold values per environment so that the alerts make sense, rather than simply ignoring them.
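One way to keep per-environment thresholds explicit is a small table keyed by environment. The sketch below is illustrative only: the environment names and threshold values are hypothetical, and real values would come from observing each environment under its normal load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpoolThresholds:
    warning: int
    error: int


# Hypothetical environment names and values, for illustration only.
THRESHOLDS_BY_ENVIRONMENT = {
    "prod-low-latency": SpoolThresholds(warning=2000, error=5000),
    "prod-standard": SpoolThresholds(warning=5000, error=20000),
}


def spool_state(environment: str, spool_size: int) -> str:
    """Return 'ok', 'warning' or 'error' using that environment's own thresholds."""
    limits = THRESHOLDS_BY_ENVIRONMENT[environment]  # KeyError if not configured
    if spool_size >= limits.error:
        return "error"
    if spool_size >= limits.warning:
        return "warning"
    return "ok"
```

With these example values, a spool size of 6000 is an error in the low-latency environment but only a warning in the standard one. Keeping the values in one place also makes a periodic threshold review a small edit.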
Monitoring solutions like SCOM, HP OpenView, etc. are capable of executing tasks when a threshold is violated, in addition to sending notifications. A lot of infrastructure issues are intermittent, such as a temporary loss of connectivity or access denied errors. Sometimes it’s possible to resolve an issue by simply restarting a service. Note: Take care when going down this route so that you don’t lose any sessions. Example: resetting IIS will result in losing active sessions.
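The restart-then-notify idea can be sketched as below. This is not how SCOM or HP OpenView implement it. The `is_healthy`, `restart` and `notify` callbacks are hypothetical hooks you would supply, and the restart count is capped so that a persistent failure still reaches a person.

```python
from typing import Callable


def remediate_then_escalate(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    notify: Callable[[str], None],
    max_restarts: int = 2,
) -> bool:
    """Try a bounded number of restarts; notify either way. Return final health."""
    if is_healthy():
        return True  # nothing to do, no notification needed
    for attempt in range(1, max_restarts + 1):
        restart()
        if is_healthy():
            # Still tell someone: repeated self-healing hides a real problem.
            notify(f"Service recovered after {attempt} automatic restart(s)")
            return True
    notify(
        f"Service still unhealthy after {max_restarts} automatic restart(s); "
        "manual action needed"
    )
    return False
```

The sketch deliberately does not check for active sessions before restarting. For a service such as IIS, that check would belong inside the `restart` hook.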
BizTalk Server, being a middleware product, relies on the healthy functioning of all the systems it integrates with. The health of an underlying system can directly impact the health of your BizTalk environment. Wherever possible, maintain a minimum level of monitoring for your external LOB systems/applications.
These days the cost of LCD monitors is negligible, and most monitoring solutions come with a lot of interesting, fully configurable dashboards. You should take advantage of this: set up a few monitors, each displaying the health of a different part of the system, and assign monitoring responsibilities to the corresponding team/team member.
In some organizations the work done by monitoring teams goes unnoticed, because there is no business failure and everything works smoothly. Top management tends to notice a team or person only when there is a disaster and someone helps to recover the system. Monitoring teams work hard so that the system never gets to that state, so it is important to appreciate and reward your monitoring team/people.