“I have become a little lazy,” confesses Vlad Karpel, vice president of IT at optionsXpress, a Chicago-based online brokerage. “I try to convince myself that I can solve these problems myself, but in my heart I don’t think I really believe that.”
Karpel’s apparent crisis of confidence is a welcome one, now that he’s found some troubleshooting help. The problems he needs help with are occasional lags in the performance of his Web site, which handles more than 20,000 options trades a day from 170,000 customers. When optionsXpress launched six years ago, Karpel built the software systems himself, from scratch. Soon he was dealing with a maze of complexity, born of a rapidly growing business — a 39 percent revenue increase from 2004 to 2005 — and the addition of dozens of third-party applications.
“As we grew, we noticed load and performance issues,” says Karpel. “The application environment is very complicated, and it was almost impossible to reproduce. We needed some software that could help with that.”
Karpel is not alone. Web application downtime is the bugaboo of many small businesses that increasingly rely on online transactions for their livelihood. And with the complexity of software environments growing faster than most IT staffs, companies need solutions that go beyond simple network monitoring.
“Application management is a lot more sophisticated than network monitoring because it gives you some root-cause analysis,” says J.P. Farbini, an analyst at Forrester Research, who says the market for application monitoring software is about $300 million and growing at about 5 percent per year. “People need this level of monitoring, and it has taken off very quickly.”
Today Karpel uses monitoring tools to trace the source of mysterious slowdowns and, heaven forbid, crashes. Recreating the entire application environment to locate the point of failure could take weeks, so optionsXpress now relies instead on the log of transactions the software provides. Karpel loads an agent onto each server. When a problem occurs, he simply pulls the log on the affected server. “It tells you every piece of data that is being exchanged, in real time,” says Karpel. The data is segmented into different profiles (database calls, trade executions, etc.) to further narrow the search. And Karpel can have notification of a problem sent to his e-mail or mobile phone.
All this sleuthing arms Karpel with valuable knowledge of the specific nature of a problem before he calls a third-party software provider and demands support. In the past, Karpel would spend days discovering which application was failing, and even more time convincing the software vendor to take responsibility for the problem. “Having that information is crucial,” says Karpel, who says he spent countless hours on “Is it us or is it you?” calls to tech support. “They take you more seriously if you sound intelligent.”
For optionsXpress, Web downtime translates into lost revenues. For the Nemours Center for Children’s Health Media in Wilmington, Del., the consequences of a crash can be even more dire. As part of a nonprofit children’s health network, the center serves up timely and relevant health information on its Web site, www.kidshealth.org, and to hospitals, HMOs and pharmaceutical companies that license the content. In total, the site racks up 180 million page views per year.
To keep the site running optimally, P.J. Gorenc, the director of operations, uses a combination of homegrown scripts and off-the-shelf tools. The combo is effective at measuring transaction times, but when a problem arises, Gorenc still finds himself sifting through code by hand — a task that puts a serious strain on his IT staff of six.
“We’re constantly checking for response and availability, and we’ve got agents loaded on every server,” says Gorenc. “It helps us find the location [of a problem], but after that, it’s a pretty manual process. We’d like to be even more automated.”
Finding the location is just the first step in the process, though. Recently, Gorenc was alerted that one of the partner sites was experiencing lag times. The monitoring software located the problem. He then set up a trace and measured every step of the individual transactions. He found that in a transaction totaling two seconds, there were several steps that took 0.2 seconds each, then one Structured Query Language (SQL) statement that was taking a full second and a half.
Upon further examination, he found that the statement was loading twice, slowing down the process. Once he isolated the problem, he was able to restructure the database table to speed up the transaction. It was relatively easy to fix — it only took an hour — but finding the cause was the tricky bit. “Just to look at the Web logs, we wouldn’t have been able to notice the problem,” he says. “If we had manually tried to find it, I don’t even know how long it would have taken.”
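The kind of step-by-step trace Gorenc describes can be sketched in a few lines of Python. This is only an illustration of the idea, not the monitoring product his team uses: the `Trace` class, the step names and the timings below are all hypothetical. The sketch records how long each step of a transaction takes, picks out the slowest one, and flags any step that runs more than once.

```python
import time
from collections import Counter
from contextlib import contextmanager


class Trace:
    """Hypothetical helper: records how long each named step of one transaction takes."""

    def __init__(self):
        # (step_name, seconds) pairs, kept in execution order
        self.steps = []

    @contextmanager
    def step(self, name):
        """Time a live block of code and record it under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.steps.append((name, time.perf_counter() - start))

    def record(self, name, seconds):
        """Add a step that was already measured, e.g. one read from a log."""
        self.steps.append((name, seconds))

    def total(self):
        """Total time across all recorded steps."""
        return sum(seconds for _, seconds in self.steps)

    def slowest(self):
        """Return the (name, seconds) pair of the single slowest step."""
        return max(self.steps, key=lambda item: item[1])

    def repeated(self):
        """Return the names of steps that ran more than once."""
        counts = Counter(name for name, _ in self.steps)
        return [name for name, n in counts.items() if n > 1]


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the real trace.
    trace = Trace()
    trace.record("parse request", 0.2)
    trace.record("load partner content (SQL)", 0.75)
    trace.record("load partner content (SQL)", 0.75)
    trace.record("render page", 0.2)

    name, seconds = trace.slowest()
    print(f"total: {trace.total():.2f}s, slowest step: {name} ({seconds:.2f}s)")
    print("steps that ran more than once:", trace.repeated())
```

In live code, a step would be wrapped as `with trace.step("render page"): ...` rather than recorded by hand. The point of the sketch is the same one Gorenc makes: a Web log shows only the total, while a per-step breakdown shows where the time actually goes.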
The end goal of all the monitoring tools and procedures is to avoid having customers point out a problem. Yet, often it still comes down to that. Dave Christensen, chief operating officer at CustomInk, an online supplier of customized T-shirts, hats and other merchandise, uses all kinds of site-monitoring software. But still, problems sometimes slip through undetected.
“We do segmentation reports, log everything and tag the whole site,” says Christensen. “But with more subtle problems, we’ll listen to the reps who are answering the phones. We still want to listen to what our customers are telling us. You have to have your ears open.”
Though application monitoring tools are no substitute for listening to customers, they may soon offer more help to IT managers trying to keep their Web sites at top performance. Recently, many of the smaller software vendors that specialize in application monitoring have been swallowed up by bigger companies looking to incorporate those tools into their products. Computer Associates recently snapped up Wily Technology to fold Wily's software into its Unicenter suite. Borland is in the process of buying Segue Software and its Application Lifecycle Management suite. And IBM has added several new components to its Tivoli Monitoring Express product.
There may be numerous ways to speed up Web sites, but there is one thing that most companies can agree on: There is no such thing as fast enough.