Managing 100,000+ online users is no game: network monitoring tool helps track traffic and locate infrastructure problems

Communications News, July, 2003

Sony Online Entertainment (SOE), a worldwide leader in the multiplayer online gaming market, boasts a distributed network community of more than 13 million registered users. The SOE Web site hosts games and player communities that span numerous genres. SOE's EverQuest game alone has more than 430,000 subscribers and supports 75,000 to 106,000 online players at any one time.

With EverQuest a profitable revenue-generator, SOE has more than a passing interest in keeping the game up and running 24/7. Given the complexity of the application and of the extensive array of servers, systems software, business-layer applications and network infrastructure that support it, the company depends heavily on automated monitoring and management tools for assistance.

"EverQuest subscribers expect access upon demand," says Adam Joffe, chief technology officer for SOE. "It's critical for our network operations center (NOC) employees to be able to pinpoint developing problems quickly no matter where they're located in the network. To do that means consistently monitoring the application and the infrastructure that's in place to support it."

EverQuest runs all over the world, Joffe explains. SOE has more than 1,000 servers at its headquarters in San Diego, as well as several hundred servers deployed at six sites in Europe and Asia. There are more than a hundred switches and routers across this environment, as well as a variety of operating systems and servers that support the business-layer functionality of the company's online service. "We monitor it all," he says.

MONITORING THE INFRASTRUCTURE

SOE uses a custom tool, developed in-house, to monitor the EverQuest application, and Fidelia NetVigil to monitor and manage the infrastructure upon which it depends. This includes a Unix-based service that handles business applications, SOE's Web server infrastructure and all network devices, including switches and routers.

The mix of tools SOE uses today to monitor its environment is greatly streamlined from earlier days, when it was dependent on freeware, homegrown scripts and simple alerting tools. Such tools came with severe limitations.

"There was no consolidated interface, very little reporting, no physical analysis, no graphs, no hierarchical organization, no console--it was all reactionary," Joffe says. "If something went wrong, it would send a page to alert us, but we couldn't do any historical analysis or capacity planning with it. And to build the plethora of applications we needed to test all of our various devices would have been too much work to do ourselves."

Plus, supporting the massive SOE environment using these tools would have been a full-time job for some of SOE's NOC employees. "By adding NetVigil to our toolkit, we are able to manage nearly 11,000 object tests on our distributed IT infrastructure with a fraction of the resources required before," Joffe offers. "The solution provides us with real-time network and infrastructure monitoring, and a consolidated view of the IT environment. It also offers scalability we did not have with the early freeware tools."

The ability to set thresholds on tests is an important consideration, he adds. "With so many servers in our network, not having to dedicate a person to watch the throughput on all of these servers all the time is very valuable to us."
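To illustrate what threshold-based testing means in practice, here is a minimal Python sketch. It is not NetVigil's actual configuration or API, which this article does not describe. The class, the host names, the throughput figures and the threshold values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity labels; a real monitoring product defines its own.
OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


@dataclass
class ThresholdTest:
    """One monitored metric with a warning and a critical threshold."""
    name: str
    warning: float
    critical: float

    def evaluate(self, value: float) -> str:
        # Check the most severe threshold first.
        if value >= self.critical:
            return CRITICAL
        if value >= self.warning:
            return WARNING
        return OK


def find_alerts(test: ThresholdTest, readings: dict) -> dict:
    """Return {host: severity} only for hosts that crossed a threshold."""
    alerts = {}
    for host, value in readings.items():
        severity = test.evaluate(value)
        if severity != OK:
            alerts[host] = severity
    return alerts


# Illustrative data only: host names and values are made up.
throughput = ThresholdTest("throughput_mbps", warning=80.0, critical=95.0)
readings = {"game-01": 42.0, "game-02": 87.5, "game-03": 99.1}
alerts = find_alerts(throughput, readings)
print(alerts)
```

The point of the design is that operators see only the hosts that cross a threshold, so nobody has to watch every server's throughput by hand.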

With both subscriber traffic and the complexity of SOE's network infrastructure expected to increase as new products are added, having a tool that can monitor a large number of devices is critical for SOE, according to Joffe.

"As we move forward, this solution will monitor the network infrastructure, including all of the servers that are deployed to operate these new games. It's a core tool to our 24/7 operations group, with an interface that is central to the heads-up display that runs in our NOC," he says.

SERVICES-BASED APPROACH

Isolated revenue streams from new games will present a different challenge for SOE, which will adopt a services-based approach to infrastructure management. This will allow the company to correlate its underlying network applications, servers and infrastructure to its business offerings.

"The way we organize things now, it's based on functionality--from the network devices to the hosts," Joffe explains. "When we bring up the new servers, the first thing we'll do is implement the device hierarchies in NetVigil that will allow us to set up service-based views. Instead of looking at a list of network devices or hosts, I'll be able to look at all of my infrastructures for the various games, and know the health of all the devices that are in play for use in serving each of those games.

"Service is what customers are looking for. Our ability to support that service is key to what our operations staff does. Being able to group and organize devices in such a way that shows how certain events affect a particular service means we're able to focus on that service and make sure it's readily available to the user."
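The service-based view Joffe describes can be pictured as a roll-up: each device is tagged with the game it serves, and the worst device status becomes the status of that service. The Python sketch below is one way to express the idea under stated assumptions. It does not represent how NetVigil implements device hierarchies. The device names, the "new-game" service and the three status labels are hypothetical.

```python
# Hypothetical status ranking: a higher number means a worse condition.
SEVERITY_RANK = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def service_health(devices):
    """Roll per-device status up to one status per service (worst wins)."""
    health = {}
    for device in devices:
        service = device["service"]
        current = health.get(service, "OK")
        # Keep whichever of the two statuses ranks as more severe.
        health[service] = max(current, device["status"],
                              key=SEVERITY_RANK.__getitem__)
    return health


# Illustrative inventory only: names and statuses are made up.
devices = [
    {"name": "router-1", "service": "EverQuest", "status": "OK"},
    {"name": "host-17", "service": "EverQuest", "status": "WARNING"},
    {"name": "switch-4", "service": "new-game", "status": "OK"},
]
health = service_health(devices)
print(health)
```

Grouping this way lets an operator answer "is this game healthy?" directly instead of scanning a flat list of routers and hosts.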

 

Content provided in partnership with Thomson Gale