Pandora Media has a data management challenge — a huge one. And it’s solving it in the cloud.
Casual fans of the music and entertainment streaming service Pandora know it as a great way to access their favorite songs and other content while discovering new artists. But behind the scenes, at Pandora’s headquarters in Oakland, Calif., is a massive data cluster running in the company’s data center.
Most companies store data. Many are beginning to use insights from their data to make business decisions and drive customer experiences. But since its founding in 2005, Pandora has made data the very heart and soul of its business.
“The core of Pandora is personalization,” explained Brett Uyeshiro, the company’s vice president of platform services, speaking at Google Cloud Next ’19. “The idea of Pandora is you can have a laid-back experience. You give us some cues, and we program a continuous stream of content for you, whether it’s music or nonmusic content like podcasts or comedy. And at the heart of this personalization is our data analytics system. It’s really the workhorse behind it.”
Uyeshiro spoke Wednesday at the massive conference for developers and other users of the Google Cloud Platform, which has attracted 30,000 attendees to San Francisco and concludes April 11.
How Pandora Manages Data From 68 Million Users
To date, Pandora has created 13 billion stations for its users, who listen to its content on more than 2,000 different types of devices and generate feedback for the company by giving a digital thumbs up or down to whatever they’re hearing. Pandora uses that feedback to recommend additional content that users might enjoy. Pandora has received and managed roughly 90 billion of these unique feedback points in its history.
To make Pandora work for its 68 million users around the world, its army of engineers and data scientists conducts nonstop analytics on the 7 petabytes of data it has running on 2,700 data nodes. It’s a massive undertaking that continues to get bigger as the weeks roll by and Pandora attracts more users and more data, and as it grows its services.
“And we’ve grown on-prem quite a bit,” Uyeshiro explained. “Every year we’ve had growth — more engineers, more data scientists, all working on a stable cluster. We have one production cluster that runs everything at Pandora. And we’ve added a lot of personalization features, like Personal Soundtrack, and we’ve added features for our advertisers too, including some call campaign reporting features and some segment targeting as well.”
Pandora’s decision last year to move its data to the Google Cloud Platform was a big one. It was necessary, Uyeshiro explained, in part because managing that much data on a single cluster had caused significant strains. Its data scientists and engineers, working with a variety of data management and analysis tools including Apache’s Kafka, Sqoop and Spark, were often jockeying for time on the cluster to run their computations.
“People can’t always get access to the cluster when they want to,” he explained.
The Benefits of Google Cloud for Pandora
But as the data has shifted to the cloud, those resource constraints are easing. One reason is that Google Cloud’s own Big Data tools, such as BigQuery, are boosting the performance of Pandora’s analyses; in particular, they are cutting the time it takes to run certain queries.
For example, an analysis of impressions run for advertisers that used to take two hours now takes four minutes. Another query — a crucial daily query that the company simply calls “the nightly job” — once took five hours a day. Now it takes 90 minutes. “That’s huge, because a lot of other jobs depend on that nightly process,” Uyeshiro said.
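To put those two figures side by side, here is a minimal sketch of the arithmetic in Python. The durations come from the numbers quoted above (two hours and five hours converted to minutes); the function name and job labels are illustrative only and do not reflect anything in Pandora's own tooling.

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Return how many times faster a job runs after the move."""
    return before_minutes / after_minutes


# (before, after) run times in minutes, taken from the figures quoted above.
jobs = {
    "advertiser impressions analysis": (120, 4),   # two hours -> four minutes
    "nightly job": (300, 90),                      # five hours -> 90 minutes
}

for name, (before, after) in jobs.items():
    factor = speedup(before, after)
    saved = before - after
    print(f"{name}: about {factor:.1f}x faster, {saved} minutes saved per run")
```

By this arithmetic the impressions analysis runs about 30 times faster, while the nightly job runs a little more than three times faster, which frees up about three and a half hours for the jobs that depend on it.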
Another benefit is disaster recovery. Because of the sheer size of Pandora’s data cluster, having a true backup is prohibitive; instead, it has what Uyeshiro calls a “business continuity” backup data center outside of Washington, D.C.
“The big drawback with this is that if we did need to fail back and rely on our backup data center, it would really take some time to scale that out to really take on production work,” he said. “Being able to have a true disaster recovery situation is amazing.”