When millions of people across the globe depend on your website and mobile applications, pinpointing problems can be as difficult as finding a needle in a haystack. Not so for Care.com, which cut the time needed to resolve most incidents from more than an hour to less than 10 minutes.
It achieved those results with Splunk’s application performance monitoring (APM) solution, part of a new crop of APM and end-user monitoring technologies that help IT teams locate and solve application problems quickly.
For 15 years, the company has transformed the way families find and manage care for children, seniors, pets and homes, leveraging technology to meet the ever-evolving needs of families around the world. Available in more than 17 countries, the Care.com platform requires a multifaceted IT ecosystem, which has been built over time.
“Care.com has a number of back-end services that support our websites and provide APIs for our mobile apps in the U.S. and in other countries,” says Distinguished DevOps Engineer Matt Coddington. With help from Splunk, “our engineers are able to trace requests through the sometimes complex paths between these back-end microservices and monolithic applications.”
Today’s APM solutions provide observability of a company’s IT infrastructure, including servers, networks, applications, browsers, individual laptops and the application code itself. All these components generate mountains of data, which have been difficult to store and analyze until recently, according to Stephen Elliot, a group vice president at IDC.
“In the past five years, there’s been an explosion of data and types of data like metrics, logs, traces and events,” says Elliot. “All of these different layers of infrastructure and data are driving the need for an end-to-end application service that can bring it all together.”
How DevOps Monitors and Diagnoses Problems
At Care.com, Coddington’s team uses a variety of observability tools, including Splunk APM and Splunk Cloud. He says the visibility has been “invaluable” to the DevOps and engineering teams for monitoring and diagnosing problems.
“Once an application is instrumented, it starts to send both traces and metrics to the Splunk platform,” he says. “Our engineering teams use the metrics to monitor for problems and then utilize a combination of metrics, traces and logs to debug those issues. Splunk allows for alerts to be set up against those metrics so that teams are notified when thresholds are exceeded.”
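The alerting pattern Coddington describes, in which a team is notified once a metric crosses a threshold, can be sketched in a few lines of Python. This is an illustrative sketch only, not Splunk's API: the ThresholdAlert class, the metric name and the 500 ms limit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ThresholdAlert:
    """Hypothetical alert rule: fire when a metric exceeds a fixed limit."""
    metric: str
    threshold: float
    notify: Callable[[str], None]  # e.g. a pager or chat hook (assumed)

    def check(self, value: float) -> bool:
        """Notify and return True when the value exceeds the threshold."""
        if value > self.threshold:
            self.notify(f"{self.metric}={value} exceeded threshold {self.threshold}")
            return True
        return False


# Collect notifications in a list instead of paging a real team.
notifications = []
alert = ThresholdAlert("checkout.latency_ms", 500.0, notifications.append)

# Made-up latency samples; only the last one crosses the limit.
for sample in (120.0, 480.0, 910.0):
    alert.check(sample)
```

In a real deployment the rule would live in the monitoring platform rather than in application code; the sketch shows only the logic of comparing an incoming metric against a limit.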
Having a dependable observability platform also helped Care.com make the decision to introduce microservice architectures to its organization.
“We wouldn’t have embarked on our migration to the more complex microservices architecture without the distributed tracing and APM metrics available to us,” Coddington says. “An APM solution becomes a lot more critical to an engineering organization when microservice architectures are introduced, given the added complexity of troubleshooting.”
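To see why distributed tracing matters in a microservices environment, consider a minimal sketch of trace propagation. Each service records a span that carries the same trace ID and points to its parent span, so a single request can be followed across services. The service names and the start_span helper are hypothetical, and the in-memory list stands in for an APM backend; real tracing libraries handle this propagation automatically.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # None for the root span
    service: str


collected: List[Span] = []  # stands in for the APM backend (assumption)


def start_span(service: str, parent: Optional[Span] = None) -> Span:
    """Create a span, inheriting the trace ID from the parent if present."""
    span = Span(
        trace_id=parent.trace_id if parent else uuid.uuid4().hex,
        span_id=uuid.uuid4().hex[:16],
        parent_id=parent.span_id if parent else None,
        service=service,
    )
    collected.append(span)
    return span


def profile_service(parent: Span) -> None:
    start_span("profile-service", parent)


def legacy_monolith(parent: Span) -> None:
    start_span("legacy-monolith", parent)


def api_gateway() -> Span:
    """Entry point: opens the root span and calls two downstream services."""
    root = start_span("api-gateway")
    profile_service(root)
    legacy_monolith(root)
    return root


root = api_gateway()
# Reconstruct the request's path by filtering on the shared trace ID.
path = [s.service for s in collected if s.trace_id == root.trace_id]
```

Filtering on the shared trace ID is what lets an engineer reassemble one request's journey out of spans reported by many independent services.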
Seamless Solutions in a Competitive Market
Like Care.com, Charter Communications depends on its monitoring solutions to find and resolve issues quickly. The cable and broadband company serves 32 million customers through its Spectrum brand. In a competitive market, delivering a flawless product is imperative.
“Charter develops and maintains many applications to provide our customers and employees with a great experience,” says Jeff Gutterman, group vice president of IT enterprise infrastructure at Charter. “For customers, this needs to be from the time of purchase through the consistent delivery of our products and services.”
Charter uses Cisco’s AppDynamics to monitor nearly every layer of its application environment, from hardware to the applications themselves. Within these systems, AppDynamics tracks metrics such as transaction times, response times, load times and throughput.
Like most APM tools, AppDynamics presents this information in a single interface. When an event is detected, Charter’s 24/7 incident management, surveillance and application support teams use the information to begin an investigation.
“We use a playbook similar to what firefighters do,” Gutterman says. “When an event is triggered, an incident commander is assigned and takes command. That person creates a ticket for the start time and tracks milestones as the incident continues. The incident commander will pull in other teams to assist, depending on what the telemetry is indicating as the suspected root cause.”
The teams at Charter also conduct a post-incident analysis. “After the incident is resolved, the reason-for-outage investigation begins,” Gutterman explains. “This is where we review the incident ticket logs, telemetry, application, network-related logging and advanced telemetry tools to determine what exactly happened and what we need to do to ensure that it does not happen again. During this time, we also review how we responded to the incident and identify if any improvements are needed.”
Charter’s incident resolution teams have also started using artificial intelligence in the mitigation process. The tools and algorithms look for patterns and anomalies in alarms the teams receive that can help identify issues earlier or pinpoint an issue’s root cause. Since Charter has been using AppDynamics, its mean time to repair (a critical metric for Gutterman’s team) has been cut in half.
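Mean time to repair is simple arithmetic: the average time between an incident's start and its resolution. The sketch below computes it from start and resolution timestamps of the kind an incident ticket would record. The timestamps are invented for illustration and are not Charter's data, and mean_time_to_repair is a hypothetical helper.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average (resolved - started) over a list of (started, resolved) pairs."""
    durations = [resolved - started for started, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


# Invented example incidents: one lasting 90 minutes, one lasting 30.
incidents = [
    (datetime(2022, 1, 3, 9, 0), datetime(2022, 1, 3, 10, 30)),
    (datetime(2022, 1, 9, 14, 0), datetime(2022, 1, 9, 14, 30)),
]

mttr = mean_time_to_repair(incidents)  # 60 minutes for this sample
```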
“Tools like AppDynamics do a good job of identifying the endpoints automatically and creating average response times for each action,” Gutterman says. “If an action or response time falls out of the average range, it can trigger alerts to our surveillance teams to take a closer look.”
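One simple way to model "falling out of the average range" is to compare each new response time with a baseline built from recent history. The sketch below flags a value that lies more than k standard deviations from the historical mean. This is an assumption made for illustration, not a description of how AppDynamics computes its baselines; the out_of_range function, the sample values and the default k of 3 are all hypothetical.

```python
from statistics import mean, stdev


def out_of_range(history, latest, k=3.0):
    """Return True when latest is more than k standard deviations from the mean."""
    baseline = mean(history)
    spread = stdev(history)
    return abs(latest - baseline) > k * spread


# Invented response times in milliseconds for one endpoint.
history = [210, 195, 205, 220, 200, 190, 215]

normal = out_of_range(history, 212)  # close to the 205 ms average
slow = out_of_range(history, 480)    # far outside the usual range
```

A fixed band like this is easy to reason about but ignores daily or weekly traffic patterns, which is one reason commercial tools use more elaborate baselines.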
The 2020 value of the worldwide APM market
Source: IDC, "Worldwide Application Performance Management Software Forecast, 2021–2025: Market Pivots to Observability," November 202
Building Customer Confidence Throughout the Lifecycle
Duck Creek Technologies is a back-end Software as a Service platform for the insurance industry, developing products to serve the entire insurance lifecycle, from sales support to policy design and customer service.
Microsoft Azure’s Application Insights has been key to continuously creating, improving and releasing new products. “The ability to see how our apps interact with the cloud is the biggest benefit,” says Quinn Easterbrook, Duck Creek’s chief enterprise architect. “Before, there was a gray box between transactions. Now, you can tie them together and see the underlying processing.”
That observability means that Duck Creek can ensure the applications are performing as effectively as possible. The company also shares this information with its clients. “The more insights and confidence we can give to our customers, the better,” Easterbrook says.
APM technology has allowed Duck Creek to continue to evolve its architecture to meet its mission of revolutionizing the insurance industry with technology. “Just moving to the cloud isn’t enough,” says Easterbrook. “Working in the cloud is the most important thing.”
Michael Austin/Theispot (illustration); 4x6/Getty Images