Do what the telcos do

Communications News, April, 1998 by Kevin Payne

Trying to set up a service-level agreement (SLA)? Remote pings can give valuable insights into how a network is performing.

Service-level agreements (SLAs) are an increasingly essential part of service management. They allow a provider to document and showcase service achievements, and they allow a user to confirm that service stays at the expected level.

Considering the highly competitive environment in which they operate, it is hardly surprising that telcos have been among the first to implement SLAs in their organizations. It is instructive to look at some of their experiences in developing SLAs.

MEASUREMENT PROBLEMS

An agreement must be based on an ability to quantify service and to prove whether or not obligations are being met. But finding a method of measuring service has been a significant hurdle. SLA requirements have been refined to two simple questions:

* Can I get there? (availability)

* How fast can I get there? (response time)

An agreement also requires network paths to be grouped into different classes of service defined by "tiers." A tier is a classification for a particular location. For instance, a large backbone site is classified as a Tier 1 site, and a small sales office is considered a Tier 4 site. Tier 2 and Tier 3 lie somewhere between. Each network path is defined by the pair of tiers at its two end points.

An agreement could state, for example, that for all Tier 1 to Tier 2 paths, daily availability should be more than 99.99% and average response time should be within 50 milliseconds. Four tiers result in 10 separate groups of paths, each with its own class of service: Tier 1 - Tier 1, Tier 1 - Tier 2, etc., to Tier 4 - Tier 4. Each class of service has its own SLA.
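The count of 10 classes follows from pairing four tiers without regard to direction and allowing a tier to pair with itself. A minimal Python sketch of that grouping is below; the names `service_classes` and `path_class` are illustrative only and do not come from any telco's tooling.

```python
from itertools import combinations_with_replacement

TIERS = [1, 2, 3, 4]

def service_classes(tiers):
    """Every unordered pair of tiers, including a tier paired with itself."""
    return list(combinations_with_replacement(tiers, 2))

def path_class(tier_a, tier_b):
    """Class of service for a path; the order of the end points does not matter."""
    return (min(tier_a, tier_b), max(tier_a, tier_b))

classes = service_classes(TIERS)
print(len(classes))       # 10
print(path_class(2, 1))   # (1, 2)
```

A path from a Tier 2 site to a Tier 1 site falls into the same class as the reverse direction, which is why 16 ordered pairs collapse to 10 classes.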

A SOLUTION

The telcos decided to use remote pings to directly measure response time and availability. The results are used as the basis of their service level measures. The idea is to actually test a network path by sending a packet of data along the path. The round trip response time can be measured, and success of the transmission indicates an available path.
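As a rough illustration of how the two measures come out of the same test, the Python sketch below summarizes a batch of ping results for one path. It assumes the round-trip times have already been collected and that a missing reply is recorded as `None`; the function name and the data layout are hypothetical.

```python
from statistics import mean

def summarize(samples_ms):
    """Summarize pings for one path.

    samples_ms: round-trip times in milliseconds; None marks a ping
    that received no reply.
    """
    replies = [s for s in samples_ms if s is not None]
    return {
        "sent": len(samples_ms),
        "received": len(replies),
        # Average response time is taken over successful pings only.
        "avg_rtt_ms": mean(replies) if replies else None,
    }

print(summarize([40.0, 60.0, None]))
```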

Using remote ping has two important advantages: 1) it is universally accepted and supported by any IP device, and 2) by remotely requesting one device to ping another, you are able to test many different paths from many different originating devices.

The remote ping solution must be flexible. Most network analysts would agree that just because a ping is unsuccessful does not mean that a network path is unavailable for connection-oriented protocols such as TCP. But if there are 20 consecutive unsuccessful pings, then the path is probably unavailable to connection-oriented protocols as well. This is the formula currently used.

The formula can be modified to change the number of consecutive unsuccessful pings that would define unavailability. The goal is to establish a formula representing the end users' actual experience.
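One way to turn the consecutive-failure rule into an availability figure is sketched below in Python. The article does not say how the percentage is derived, so this sketch assumes that pings are sent at a fixed interval and that only the pings inside a failure run of at least `threshold` count as unavailable time. The function name and parameters are illustrative.

```python
def availability(results, threshold=20):
    """Fraction of ping intervals during which the path counts as available.

    results: list of booleans in time order, True for a successful ping.
    threshold: consecutive failures needed before a run counts as an outage.
    """
    if not results:
        raise ValueError("no ping results")
    unavailable = 0
    run = 0  # length of the current run of failed pings
    for ok in results:
        if ok:
            if run >= threshold:
                unavailable += run
            run = 0
        else:
            run += 1
    if run >= threshold:  # an outage still in progress at the end
        unavailable += run
    return (len(results) - unavailable) / len(results)

day = [True] * 90 + [False] * 10
print(availability(day))               # 10 failures do not reach the default threshold
print(availability(day, threshold=5))  # a stricter rule counts the same run as an outage
```

Changing `threshold` is the adjustment the text describes: a lower value makes the measure more sensitive to short interruptions, a higher value ignores them.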

With this method of obtaining a measure of service, the whole network can be completely monitored without the use of probes or other special equipment. It also measures that part of a computing infrastructure that is important to a telco.

A reporting example is shown in Figure 1. The top graph shows the average availability of all paths in the Tier 1 to Tier 2 classification. There is a pair of measures for each day, each represented by a bar.

[Figure 1 ILLUSTRATION OMITTED]

The bar on the right is the availability measure as stated in the SLA. The bar on the left is the actual result for the day. The graph spans a week. This allows the manager or customer to take a look at the performance of the network for the past week.

The bottom graph shows the availability for each path in the Tier 1 to Tier 2 classification. The availability measures span one day and correspond to the last day on the above graph. If the Tier 1 to Tier 2 classification misses its SLA target, the bottom graph is used to identify which path was at fault. Further analysis can then be done to identify the time of the SLA violation and the cause.
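The same drill-down can be expressed in a few lines of Python. This is a sketch under assumptions: the path names and availability figures are invented for illustration, and the class result is taken as a simple average of its paths, as the top graph is described.

```python
from statistics import mean

def class_report(path_availability, target):
    """Compare one class of service against its SLA target for one day.

    path_availability: dict mapping path name to availability in percent.
    Returns (class average, whether the target was met, paths below target,
    worst first).
    """
    average = mean(path_availability.values())
    offenders = sorted(
        (p for p, a in path_availability.items() if a < target),
        key=lambda p: path_availability[p],
    )
    return average, average >= target, offenders

# Hypothetical Tier 1 to Tier 2 paths for a single day.
paths = {"path-a": 100.0, "path-b": 100.0, "path-c": 99.0}
average, met, offenders = class_report(paths, target=99.99)
print(met, offenders)
```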

TRADEOFFS

Using an ICMP (Internet Control Message Protocol) ping has disadvantages. ICMP generally is handled as a low-priority protocol, so response-time measurements based on ICMP pings will deteriorate before response times for connection-oriented protocols do, and the measurements may not accurately reflect customer experiences. The risk is that funds will be spent needlessly to upgrade a network.

If the ping response time measurement is viewed as a leading indicator, however, then problems may be identified before customers experience them.

Another disadvantage is that maintaining the paths involves significant effort. The number of paths that can be defined in a large network can be overwhelming.

To avoid complexity, only paths that have important end points are included. If a remote site in Salem, Ore., communicates primarily with the Seattle, Wash., regional office, there is no reason to measure service levels to all other regional offices.
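To put a number on the difference, the Python sketch below compares the count of paths when every site is measured against every other site with the count when each remote site is measured only against its primary regional office. The site counts are made up for illustration, and the second function assumes each remote site has exactly one primary office.

```python
def full_mesh_paths(sites):
    """Paths when every site is paired with every other site."""
    return sites * (sites - 1) // 2

def primary_only_paths(remote_sites, regional_offices):
    """Paths when each remote site is paired only with its primary regional
    office, and the regional offices are all paired with each other."""
    return remote_sites + full_mesh_paths(regional_offices)

# Hypothetical network: 190 remote sites and 10 regional offices.
print(full_mesh_paths(200))         # 19900
print(primary_only_paths(190, 10))  # 235
```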

COPYRIGHT 1998 Nelson Publishing
COPYRIGHT 2008 Gale, Cengage Learning
 

Content provided in partnership with Thomson Gale