When talk around the water cooler turns to the latest and greatest technology trends, network monitoring and management usually don’t come up. Yet these venerable back-end IT activities play a critical role in keeping data flowing freely through an organization's communication pipeline.
Meanwhile, behind the scenes, demands on the network have grown more complicated. Convergence, the consumerization of IT, virtualized data centers and cloud computing create a variety of new service demands that require new approaches to maintaining the network performance users need to do their jobs effectively.
Fortunately, there’s help. Makers of network monitoring tools have added capabilities to meet more stringent Quality of Service (QoS) requirements, such as those for voice and video traffic over data networks, while also addressing the needs of bring-your-own-device (BYOD) policies, cloud computing and other emerging areas. In addition, best practices are emerging for using the newest tools effectively.
Mapping out a cohesive network infrastructure is becoming increasingly difficult. A spate of new applications traveling over the network, many with different priorities and requirements, creates a variety of bandwidth, infrastructure and security issues, particularly as systems become more intertwined.
Consider just two examples and the impact they have on network operations:
Now, combine these new technology solutions with the traditional management areas that IT shops already monitor to keep their networks humming, and the network administrator’s job becomes more complicated than ever.
“Network management is a multiheaded beast,” says Jim Frey, managing research director for the consulting firm Enterprise Management Associates.
Network management veterans say network administrators must address six factors if they hope to build a solid tool set and implementation strategy for monitoring and managing today’s mission-critical networks.
Why are the latest IT initiatives so challenging for network monitoring efforts?
One reason is they make it difficult to nail down performance levels at a particular time. For example, traditional network monitoring techniques that only use probes to gather performance data start to break down in virtualized and cloud environments. In those settings, IP addresses change constantly; an address used one week by an application may be assigned to a completely different app the next week.
“You are looking at moving targets, and suddenly you are not able to map traffic easily,” says Lori MacVittie, senior technical marketing manager with F5 Networks, a provider of network monitoring and management solutions.
Another monitoring nuance created by cloud and virtualized computing is that administrators can no longer just focus on traffic going into and out of the data center.
“They have a lot more traffic that’s just flowing between servers or even on the same server between virtual machines,” she explains. “That’s not something that would naturally be detected by traditional monitoring solutions, because they watch events out on the wire at a switch or router. Suddenly, you have all of these variables, and you are trying to figure out where the traffic is coming from and how you can manage it. The first challenge is determining where the problems are and what is creating them.”
The process begins before any problems arise. Network administrators should establish a baseline for their network performance: how the pipes operate under normal conditions when bottlenecks and other types of breakdowns are not evident.
“If you don’t know what normal is, it’s very difficult to understand what abnormal is,” says Don Rumford, consulting systems engineer with Avaya, a provider of unified communications, contact centers, networking and related services.
Baselines should document bandwidth usage, the kind of traffic normally flowing over the communication links and the number of people who use the infrastructure.
“You now have something that you can use to compare anomalies against,” he says. “So, if it’s 8 o’clock in the morning, and everyone is logging on to the network, you expect to see a peak in traffic. But if it’s in the middle of the day, and you see a spike in traffic, you need to look into it. We have found that most networks are properly designed, but that administrators can’t really predict how people will use the network — like when someone sends an e-mail to everybody about a cool video on YouTube. Those kinds of things add disruption.”
Baselines also make it possible to establish alert thresholds that warn administrators when network utilization rates, packet losses or other factors begin to rise significantly.
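As a rough illustration of how a baseline can drive alert thresholds, the sketch below builds a per-hour baseline from historical utilization samples and flags a reading that sits well above the norm for that hour. The sample data, the function names (build_baseline, is_anomalous) and the three-standard-deviation threshold are assumptions chosen for the example, not recommendations from the experts quoted here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples):
    """Group (hour, utilization_pct) samples by hour of day and return
    {hour: (mean, standard deviation)}."""
    by_hour = defaultdict(list)
    for hour, utilization in samples:
        by_hour[hour].append(utilization)
    return {h: (mean(v), pstdev(v)) for h, v in by_hour.items()}


def is_anomalous(baseline, hour, utilization, sigmas=3.0, min_spread=1.0):
    """Return True when a reading is more than `sigmas` standard deviations
    above the baseline mean for that hour."""
    if hour not in baseline:
        return False  # no history for this hour, so nothing to compare against
    avg, spread = baseline[hour]
    # min_spread keeps a very flat history from flagging tiny changes
    return utilization > avg + sigmas * max(spread, min_spread)


# Hypothetical history: busy at 8 a.m., quiet at 1 p.m.
history = [(8, 70), (8, 74), (8, 72), (13, 30), (13, 32), (13, 28)]
baseline = build_baseline(history)

print(is_anomalous(baseline, 8, 75))   # False: within the usual morning peak
print(is_anomalous(baseline, 13, 75))  # True: an unexpected midday spike
```

Keeping a separate baseline for each hour is what lets the same 75 percent reading pass at 8 a.m. and raise a flag at midday.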
What data points are most important to measure?
Network administrators should track a mix of traditional metrics, such as utilization levels, NetFlow statistics and packet losses. In addition, because voice and video traffic have become more common, the IT staff will also need to home in on data detailing latency and jitter. This monitoring will let the team determine whether even subsecond delays are contributing to unacceptable outcomes for users.
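To make the two terms concrete, here is a minimal sketch that computes average latency and a running jitter estimate from a list of per-packet transit times. The jitter update follows the smoothing formula used for RTP interarrival jitter in RFC 3550; the function names and the sample values are hypothetical, and a real tool would take its measurements from probes or endpoints rather than a hard-coded list.

```python
def mean_latency(transit_times_ms):
    """Average one-way transit time in milliseconds."""
    return sum(transit_times_ms) / len(transit_times_ms)


def interarrival_jitter(transit_times_ms):
    """Running jitter estimate in the style of RFC 3550:
    J += (|D| - J) / 16, where D is the change in transit time
    between consecutive packets."""
    jitter = 0.0
    for previous, current in zip(transit_times_ms, transit_times_ms[1:]):
        delta = abs(current - previous)
        jitter += (delta - jitter) / 16.0
    return jitter


# Hypothetical transit times for two voice streams
steady = [20.0, 20.0, 20.0, 20.0, 20.0]
bursty = [20.0, 45.0, 18.0, 60.0, 22.0]

print(mean_latency(steady), interarrival_jitter(steady))
print(mean_latency(bursty), interarrival_jitter(bursty))
```

The two streams show why both numbers matter: a stream can have acceptable average latency and still vary enough from packet to packet to degrade a call.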
Enterprise technology managers also should look at application performance to gauge overall performance problems.
“There’s high value in having some sense of the user experience and how applications and services are performing,” Frey says. “I’m a tireless advocate for having networking practitioners increase their awareness of the health of the applications, and one component of that is the user experience. There’s nothing more important in the long run than having a direct understanding of how user experience relates back to the application.”
On the data side, the easiest way to measure user experience is to document response times, says Eric Bear, director of managed services at Visual Network Systems, which offers solutions for managing application, network and Voice-over-IP (VoIP) performance.
For example, network managers might measure how quickly a website performs when a customer tries to buy a product or service. “Look at each web server’s response time and the Domain Name System [DNS] resolve time for the website,” he explains. “Determine how long it takes to download images and paint them on the screen. Then, when a request comes through to a back-end database server, measure the overall transaction to see if there is a problem and figure out what’s contributing to a potentially poor response time.”
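Once each step of a transaction has been timed, finding the main contributor is simple arithmetic. The sketch below assumes the per-stage timings have already been collected; the stage names and millisecond values are invented for illustration.

```python
def slowest_stage(stage_ms):
    """Return (stage name, share of total time) for the largest contributor."""
    total = sum(stage_ms.values())
    name = max(stage_ms, key=stage_ms.get)
    return name, stage_ms[name] / total


# Hypothetical timings for one purchase transaction, in milliseconds
purchase = {
    "dns_resolve": 40,
    "web_server_response": 120,
    "image_download": 300,
    "database_query": 940,
}

stage, share = slowest_stage(purchase)
print(f"{stage} accounts for {share:.0%} of the transaction time")
```

In this made-up case the back-end database dominates the response time, which points the investigation away from the web tier.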
In particular, enterprises that rely on VoIP and video calls will want to gain a user perspective on network performance, because jitter and packet loss can easily wreak havoc with this data. Mean opinion scores, or MOS, can help formulate a picture of video and VoIP performance. To gather MOS data, network managers can place a probe somewhere on the network to measure video and voice streams and quantify traffic performance.
“Some VoIP phones can provide scores of the calls as they are taking place, and there are tools that collect this information at both ends of a connection,” Bear says.
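Tools differ in how they arrive at a MOS figure. One widely used approach, the ITU-T G.107 E-model, first derives a transmission rating called the R-factor from measured impairments and then maps it to an estimated MOS. The sketch below shows only that final mapping; how a given product computes R from latency, jitter and packet loss is not covered here, and the sample R values are hypothetical.

```python
def r_to_mos(r_factor):
    """Map an E-model R-factor to an estimated MOS using the
    ITU-T G.107 conversion formula."""
    if r_factor <= 0:
        return 1.0
    if r_factor >= 100:
        return 4.5
    mos = (1 + 0.035 * r_factor
           + r_factor * (r_factor - 60) * (100 - r_factor) * 7e-6)
    # The polynomial dips just under 1 for very small R, so clamp it
    return max(1.0, mos)


# Hypothetical R-factors for three calls
for r in (93.2, 80, 50):
    print(r, round(r_to_mos(r), 2))
```

An R-factor of 93.2, the E-model default with no added impairments, maps to a MOS of about 4.4, which is why estimated scores for narrowband calls top out below 5.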
Veterans of network monitoring also point to proactive testing as an important technique for avoiding performance problems. The strategy hinges on finding potential bottlenecks before they significantly affect users.
“If only network managers could know about an underlying problem before that 8 a.m. spike in traffic, they could proactively avoid the impact on service to their staff,” says Jeffrey Buddington, an Avaya consulting sales engineer.
Again, because emerging IT initiatives such as unified communications and VoIP are so vulnerable to quality of service hiccups, these areas benefit from proactive testing.
One way to proactively test network performance levels is by configuring VoIP phones to call each other for a test. By sending actual voice traffic over the communication links, the network team can accurately measure latency, jitter, packet loss and MOS.
“This [approach] would gauge the quality that users would experience if they were actually talking on those particular phones,” Buddington says.
Because users are not actually talking on the phones, the tests are known as “synthetic transactions.” Nevertheless, the transactions can measure all aspects of call quality for key network resources and alert managers to problems before someone picks up a phone to make a call.
Devices that support the Simple Network Management Protocol (SNMP) remain a cornerstone of network monitoring, especially because makers have developed tools over time to collect a wider range of data about network status. NetFlow support, now common in routers and switches, lets analyzers show who is using the network and how much bandwidth they consume.
A combination of tools for active and passive performance testing offers another important resource for determining ongoing network status. With active testing, network administrators use software agents that mimic the activities of actual users. For example, the agents can log in and initiate transactions with enterprise applications.
They can send alerts when a transaction is out of bounds, based on established baselines. Administrators can schedule these events to occur at specified times, every hour on the hour or on particular days of the year. The value? These tests let network administrators create trend reports.
“You look for peak-hour times to see when you might need to raise your network capacity, for instance,” Bear says.
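A bare-bones version of an active test might look like the sketch below: a scripted transaction is run and timed, and the result is flagged when it fails or takes longer than a bound derived from the baseline. The function run_synthetic_check, the stand-in transactions and the two-second bound are all assumptions for illustration; a real agent would log in to the actual application and would be launched by a scheduler.

```python
import time


def run_synthetic_check(name, transaction, max_seconds, clock=time.perf_counter):
    """Run one scripted transaction, time it, and report whether it stayed
    within the bound derived from the baseline."""
    start = clock()
    try:
        transaction()
        succeeded = True
    except Exception:
        succeeded = False  # a failed transaction is always out of bounds
    elapsed = clock() - start
    return {
        "name": name,
        "elapsed_seconds": elapsed,
        "alert": (not succeeded) or elapsed > max_seconds,
    }


def fake_login():
    """Stand-in for a scripted login; a real agent would call the application."""
    time.sleep(0.01)


def broken_login():
    """Stand-in for a transaction that fails outright."""
    raise ConnectionError("application did not respond")


ok = run_synthetic_check("login", fake_login, max_seconds=2.0)
failed = run_synthetic_check("login", broken_login, max_seconds=2.0)
print(ok["alert"], failed["alert"])
```

Storing each result with a timestamp, rather than only raising alerts, is what would make the trend reports and peak-hour analysis possible.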
With passive network monitoring, administrators place a data-collection probe along a communication path to capture information about the transactions taking place. “You can monitor single points in the network to watch the round-trip results for transactions, requests for information, retransmissions and other activities,” Bear says. “This gives an overall set of end-user response-time measurements.”
He concedes that there are strengths and weaknesses associated with active and passive testing. “Let’s suppose someone is using a software as a service application from home or at an airport. The transactions are going across the Internet and not accessing the enterprise network whatsoever,” Bear explains. “In that case, it’s difficult, if not impossible, to put a probe in place to passively capture information. In that scenario, you need active testing. It’s almost the only way to determine if the service is up and running at the level it’s supposed to deliver.”
By contrast, Bear describes a scenario in which an enterprise relies on a private cloud. In a shared networking environment, performance conflicts could arise between VoIP transmissions and the traffic going to and from the private cloud.
“In that case, you want to study your critical links and have a view of all the different applications that are being accessed to understand the interactions among them,” he says. “In this type of scenario, passive testing is the only way to go, because with the active test, you lose the ability to drill down and to see interaction between other traffic. You are isolated to knowing only the performance of the single application that you are testing.”
The ultimate answer is to deploy tools for both active and passive testing and use the combined output to assess the total health of the network, Bear says. “That will give you the ability to see each application overall and drill down and troubleshoot as needed.”
Finally, he recommends deploying tools that are compliant with FCAPS, an International Organization for Standardization (ISO) network management model that divides management into fault, configuration, accounting (sometimes rendered as administration), performance and security functions. For example, FCAPS tools can prioritize performance alerts according to severity and create reports using data collected over time from across the network to track usage and bottleneck trends.
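The prioritizing and reporting described above can be pictured with a small sketch. The severity scale, the Alert fields and the sample alerts are hypothetical and do not come from the FCAPS model or from any particular product.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity scale; lower rank means more urgent
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}


@dataclass
class Alert:
    device: str
    category: str  # for example "fault" or "performance"
    severity: str
    message: str


def prioritize(alerts):
    """Order alerts from most to least severe; ties keep arrival order."""
    return sorted(alerts, key=lambda a: SEVERITY_RANK[a.severity])


def alerts_per_device(alerts):
    """Count alerts per device, a simple input for a trend report."""
    return Counter(a.device for a in alerts)


alerts = [
    Alert("switch-12", "performance", "minor", "utilization above baseline"),
    Alert("router-3", "fault", "critical", "interface down"),
    Alert("switch-12", "performance", "major", "packet loss rising"),
]

for alert in prioritize(alerts):
    print(alert.severity, alert.device, alert.message)
print(alerts_per_device(alerts).most_common(1))
```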
Gathering a lot of data won’t, by itself, improve network performance. Administrators need tools to help them slice and dice the data and thus determine the most effective responses.
“Finding problems in a network is like finding a needle in a haystack,” Buddington says. “Managers need an automated way for the network monitors to make them aware of problems.” A root cause analysis engine provides that automation by using artificial intelligence–based software to suggest responses.
MacVittie advises that data centers also adopt tools that can aggregate performance information and display summaries in an electronic dashboard and additionally convert the data into report-friendly formats.
Plus, the analysis tool must support the ability to drill down into the infrastructure, she says. “Look at the application level, because that’s generally what will cause users to complain first: ‘This application is running too slow,’” MacVittie explains. “But you have to be able to drill down from an application perspective into the network to see if there’s a problem with a misconfigured switch, for example, or a port is going bad and dropping packets.”
Network managers face a fundamental question when they evaluate any monitoring solution: Is it better to buy a comprehensive tool suite from a single vendor or to integrate best-of-breed tools from multiple companies?
“We’ve gone full circle on that question a couple of times,” Bear says. “Now, I think there’s a general belief that no one vendor is the best at every aspect of network monitoring. So organizations will have to go with some mixture of best-of-breed solutions.”
No matter which side of this question a network administrator comes down on, the key will be for all of the tools to connect to a central data-collection platform so that managers can gain a common view of network health.
“I advocate for integrated management, where you don’t set up separate operations and practices for each technology,” Frey notes.
Tool integration is particularly important when organizations embrace new initiatives, because network administrators often assume that they’ll need a new set of monitoring tools as well, Frey says.
“You may have some new tools for the detailed troubleshooting, configuration and administration of a specific technology,” he adds. “That’s fine for the initial assessment and pilot phase. But once you move to production, the operations and planning functions need to be integrated into” the big-picture network monitoring platform.
Integration of network performance data is also important. A shared information database, also known as a performance management database, enables organizations to create internal best practices for resolving problems quickly.
“This is an important resource for organizations, particularly more complex ones with large IT teams. Such a database means that everybody has access to the same information,” Bear adds. “It gives everyone a common starting point when problems arise and reduces finger-pointing, because everyone is looking at the same collection of data.”
By adopting these tactics, network managers should be able to instill best practices that will help them scale and grow their networks as their organizations expand. But even after they develop a monitoring strategy and pull together a portfolio of solutions to help them carry it out, they will need to be vigilant.
“Organizations must constantly adapt — to acquisitions, to new applications and to new users,” Rumford points out. “Therefore, networks have to adapt over time, too.”