How Do You Monitor a Hosted Application?
With the rise of hosted applications there is an increasing need to understand how these applications are being delivered to users. Many hosted services are sold on the basis that the supplier has a shiny data centre, full of shiny new switches and ESX hosts, upon which your services are going to run beautifully. The problems begin with the assumption that nothing will go wrong now that you are in the hands of experts rather than your overworked, underfunded IT team. The reality is proving rather different: the same issues you had before usually reappear, but now there are third parties to deal with and more places to shift the blame.

There are ways of approaching this, and the first thing to say is: don't assume the hosted supplier has any better information than you. Typically they will use SNMP-style tools to check memory, CPU and processes; if those look OK they may report that it's not their problem, and many customers stumble at this point. Querying your WAN supplier can produce a similar result. Often simple utilisation statistics are offered as proof that everything is fine. Don't be fooled by that either.

In reality, all these suppliers are handling multiple customers across the same infrastructure and trying to make the economics of shared investment work in their favour. They are in it to make money, so they will run the minimum they need to survive their SLA, just as your IT department did before things were hosted. They will also only care about their bit, so complete delivery of the service to the end user is still your problem, even if you no longer own the majority of the service delivery. Here is our guide to approaching this.
1. Split it into three
- a. The hosted data centre
- b. The WAN carrier
- c. Your LAN network
2. Have some way of measuring delivery times
This is the only thing the users care about. Someone in a DC saying their CPU and memory levels are fine is not enough. Other factors affecting delivery times include shared resources, jitter, latency, application turn ratios, DB lookups, number of threads, and so on. Very few solutions look at all these factors, and for good reason: it's complicated stuff, and if it works 99% of the time, people have better things to do. The most important thing here is that the page/data turns up consistently within an acceptable time at the user's location, so measure that first and worry about all the details later.
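The "measure delivery time first" advice can be sketched very simply. A minimal, hedged example: it times complete page downloads with the Python standard library and checks them against a delivery budget. The sample URL and the 2-second budget in the usage note are assumptions for illustration, not recommendations.

```python
import time
import urllib.request

def sample_delivery_times(url, samples=5, timeout=10):
    """Fetch a page repeatedly, timing the complete download each time."""
    times = []
    for _ in range(samples):
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # wait for the whole body: users see the full page, not the first byte
        times.append(time.monotonic() - start)
    return times

def within_budget(times, budget_s):
    """True only if every sample arrived inside the acceptable delivery time."""
    return max(times) <= budget_s

# Illustrative usage (URL and budget are made up):
#   times = sample_delivery_times("https://example.com/", samples=3)
#   print(within_budget(times, 2.0))
```

Running this from the user's location, not from inside the data centre, is the point: it measures what the user actually experiences, end to end.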
3. Don’t assume suppliers have much more than utilisation statistics for their service
We have walked into a number of data centres hosting hundreds of different customers' applications, where the monitoring is often nothing more than freeware polling every few minutes. If you think you are being fobbed off, ask for delivery time data or the TCP error logs and see if they can give you anything more. TCP is the carrier mechanism here, so anything over 2% errors means it is struggling to get the data through. There are a number of TCP error types, and they can give good clues as to where the problem is.
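The 2% rule of thumb above is easy to apply to the raw TCP counters most operating systems already expose (for example in `netstat -s` output). A minimal sketch; the 2% threshold comes from the text, while the example counter values are made up for illustration:

```python
def retransmit_pct(segments_sent, segments_retransmitted):
    """Percentage of TCP segments that had to be resent."""
    if segments_sent == 0:
        return 0.0
    return 100.0 * segments_retransmitted / segments_sent

def tcp_struggling(segments_sent, segments_retransmitted, threshold_pct=2.0):
    """Apply the rule of thumb: over ~2% retransmissions means TCP is
    struggling to get the data through."""
    return retransmit_pct(segments_sent, segments_retransmitted) > threshold_pct

# Hypothetical counters read from a supplier's `netstat -s` style report:
# 1,000,000 segments sent with 25,000 retransmitted is 2.5% -- over the
# threshold, and worth pushing the supplier on.
```

Asking a supplier for these counters is a quick way to get past "CPU and memory are fine" style answers.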
4. Look for devices being shared by a number of different services
When something is running slow, people look for a busy interface. But with devices such as ESX hosts and firewalls/routers being shared by large numbers of different services, the problem often lies in the internal resources of these devices even though all the interfaces seem OK. Often just pinging a local device and seeing how consistently it replies can be a good clue as to how stable it is.
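The ping-consistency idea boils down to comparing the spread of reply times against their average. A sketch assuming you already have a series of round-trip times in milliseconds (gathered however you like, e.g. repeated pings); the 0.5 spread-to-mean ratio is an arbitrary illustrative threshold, not a standard:

```python
import statistics

def reply_consistency(rtts_ms):
    """Summarise a series of round-trip times as (mean, spread)."""
    return statistics.mean(rtts_ms), statistics.pstdev(rtts_ms)

def looks_unstable(rtts_ms, ratio=0.5):
    """Flag the device when the spread of reply times exceeds half the mean:
    erratic replies suggest internal resource trouble even when the
    interfaces look idle. The 0.5 ratio is an assumption for illustration."""
    mean, spread = reply_consistency(rtts_ms)
    return spread > ratio * mean
```

A device replying in a steady 1 ms is healthy; one swinging between 1 ms and 40 ms is worth a closer look, even if its interface counters are quiet.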
Things that might help
- Observer Live – simple probe solution that targets hosted pages and measures how long it takes to get them back. Contains first-level diagnostics as well.
- Observer Apex – TCP response time analyser for looking at the conversations and splitting out the delays
- PathView – Measures the path between the users and the hosted application and points out which hops are losing performance.
- AppView Web - See detailed timings on destination webpages, broken down into the contributing elements.