Getting data right: 3 symptoms of data inaccuracies and how to cure them


From time to time, we invite our trusted partners to share their views on important topics that can help improve your mobile app business. The team at Adjust, the mobile app tracking platform, contributed the following piece on the importance of clean data, identifying three key symptoms that can indicate underlying data collection issues:

App marketing is possibly the most data-driven marketing discipline.

The marketers that we meet are using a variety of tools and platforms to effectively identify the approaches and marketing initiatives that reach the right audiences, resonate with them, and ultimately result in return on investment. These systems are all tied together, talking to each other over postbacks and APIs. Some of them, such as Adjust’s, are there to collect and analyze data. Others, like Leadbolt’s, are designed to act on the data and turn insights into action.

But before you even start any analysis, you need to be able to rely on your data in order to optimize your campaigns effectively. Say you’re counting the change in your pocket, but on every fourth coin you roll a die and add the number of pips to your total. Obviously, you’ll toss the change back into your pocket with only a fuzzy idea of how much money you actually have. This is analogous to the set of problems we refer to as “dirty data”.

Dirty data covers everything from unexplained discrepancies to large-scale piracy or fraud. We’ve all come across sums that don’t match or disproportionate conversion rates. By contrast, we use “clean data” to mean datasets that are free from systematic errors, biases, or intentionally spoofed data points.

We’ve identified three key symptoms that can indicate underlying data collection issues:

  • “True” revenues from app stores don’t match up with the analytics dataset.

This is a common discrepancy that many teams struggle with. The accountants say one thing, and the UA team says another (typically, the latter reports higher revenues than the former).

This is often driven by fake in-app purchases. Especially in games, utility apps, or other verticals that rely on the app stores for billing, some end users attempt to pirate your app. The app’s in-app purchase receipts are spoofed, and the app itself – along with any integrated conversion or analytics SDKs – responds as it would to a legitimate purchase.

This is most common for high-value IAPs, so many marketers take to ignoring or discounting high-ticket users when optimizing their campaigns. Counter-intuitive, yet completely reasonable under the circumstances.
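A better cure than discounting high-ticket users is to validate receipts server-side before they ever reach your analytics. As an illustrative sketch (the endpoint and status codes come from Apple’s legacy verifyReceipt API; the decision logic here is our own simplification, not Adjust’s actual pipeline), a backend could check each receipt with the App Store and only count purchases that come back clean:

```python
import base64
import json
import urllib.request

# Apple's (legacy) receipt validation endpoints.
PRODUCTION_URL = "https://buy.itunes.apple.com/verifyReceipt"
SANDBOX_URL = "https://sandbox.itunes.apple.com/verifyReceipt"

def interpret_status(status: int) -> str:
    """Map a verifyReceipt status code to a decision for the revenue pipeline."""
    if status == 0:
        return "valid"          # genuine purchase: count the revenue
    if status == 21007:
        return "retry_sandbox"  # sandbox receipt sent to production: retry there
    return "reject"             # spoofed or malformed receipt: do not count

def verify_receipt(receipt_bytes: bytes, shared_secret: str,
                   url: str = PRODUCTION_URL) -> str:
    """POST the raw receipt to Apple and decide whether to count the purchase."""
    payload = json.dumps({
        "receipt-data": base64.b64encode(receipt_bytes).decode("ascii"),
        "password": shared_secret,
    }).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        status = json.load(resp).get("status", -1)
    return interpret_status(status)
```

With a check like this in place, a pirated client can spoof the purchase event, but the fake receipt fails validation and the revenue never enters the dataset.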

  • Some campaigns seem too good to be true.

As the old adage goes, they probably are. Most marketers we work with notice this at the subpublisher level – individual ad groups or segments that show outsize ROI for the channel.

Typically, what’s going on here is a particularly nefarious type of fraud. By spamming background clicks from second-rate apps, a malicious publisher can fool analytics systems and ad networks alike into misattributing an organic install to a randomly spammed click from the same device. As a result, these subpublishers collect the CPI payouts.

Long after the install, these genuinely organic users remain highly engaged, but their post-install activity is attributed to the fraudulent subpublisher, pushing up the average engagement KPIs for those campaigns. The result is campaigns that perform far better than you’d expect.
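One tell-tale signature of this scheme is the click-to-install time (CTIT): legitimate clicks convert quickly, in a sharp peak within the first hour or so, while spammed background clicks fire long before the organic install happens, so the CTIT for a click-spamming subpublisher stretches out over hours or days. A minimal sketch of flagging subpublishers this way (the threshold and sample data are illustrative assumptions, not Adjust’s actual model):

```python
from statistics import median

# Assumed heuristic: if the median click-to-install time for a subpublisher
# is many hours, the clicks were likely spammed rather than acted on.
SUSPICIOUS_MEDIAN_HOURS = 10.0

def flag_click_spam(ctit_hours, threshold=SUSPICIOUS_MEDIAN_HOURS):
    """Return True if a subpublisher's CTIT distribution looks spammed."""
    return median(ctit_hours) > threshold

honest = [0.1, 0.2, 0.5, 1.0, 2.0]      # most installs land within hours of the click
spammy = [3.0, 12.0, 20.0, 30.0, 44.0]  # installs land at random over days
```

Here `flag_click_spam(honest)` passes while `flag_click_spam(spammy)` trips the flag; a production system would look at the full shape of the distribution rather than a single threshold.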

  • Conversion rates across the funnel are inconsistent and, for some campaigns, weighted toward the early parts of the funnel.

A leaky funnel is a product problem – but a leaky funnel for specific marketing segments is a marketing problem.

One explanation for a high rate of incomplete funnels is that some subpublishers pad out their CPI revenues by simulating installs altogether. In this simple fraud scheme, a fraudster boots up a cloud server instance and runs off-the-shelf OS emulation software (the kind developers use for testing), faking an ad click and an associated install. Typically, though, these fraudsters don’t bother to simulate any further in-app activity.

This would result in post-install conversion rates that appear to drop off sharply for some segments.
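Because these simulated installs originate from servers rather than phones, a common countermeasure is to screen install traffic against a blocklist of datacenter IP ranges before attribution. A minimal sketch, assuming you have such a blocklist (the CIDR blocks below are documentation-reserved placeholders, not a real list):

```python
import ipaddress

# Illustrative stand-ins for a real datacenter/anonymous-IP blocklist.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),   # placeholder (TEST-NET-3)
    ipaddress.ip_network("198.51.100.0/24"),  # placeholder (TEST-NET-2)
]

def is_anonymous_ip(ip: str) -> bool:
    """True if the address falls inside a known datacenter range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DATACENTER_RANGES)

def filter_installs(installs):
    """Drop installs that originate from datacenter IPs before attribution."""
    return [i for i in installs if not is_anonymous_ip(i["ip"])]

installs = [
    {"device": "a", "ip": "203.0.113.7"},  # simulated install from a server
    {"device": "b", "ip": "192.0.2.55"},   # not on the blocklist
]
clean = filter_installs(installs)  # only device "b" survives
```

Filtering at this stage keeps the simulated installs, and their empty funnels, out of the dataset entirely rather than trying to correct for them afterwards.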

When significant in size, these issues can distort your optimization and will affect the campaigns you’re running with other networks. Compared against any of these “unusual” campaigns, your legitimate campaigns appear relatively weak. As a consequence, many marketers fall into a trap: they redistribute budget from legitimate, clean campaigns to campaigns mixed, to varying degrees, with the profiteering of fraudsters.

Even if you spot any of the issues above and take their results with a grain of salt, it can be difficult to regain confidence in your data. Decisive data-driven optimizations – those that would result in the greatest success – are often broken down into several “testing” steps. This is generally a good practice, but it can also be a symptom of a general distrust of the analytics dataset.

At Adjust, we’ve worked with many app marketers in this position and realized that access to clean data is often more important than access to more data. That was one of our greatest learnings from building the Fraud Prevention Suite, which we launched in February 2016. For one, filtering out IPs permanently associated with data centers and similar anonymous infrastructure – where device simulation takes place – straightens out the funnels of suspicious campaigns. Distribution Modelling has also proven effective at detecting and preventing click spamming, the culprit in campaigns where organic users are poached. Both of these approaches are part of the Suite.

With the Suite in place, fraudsters saw their illegitimate revenues take a nosedive. But many of our early adopters weren’t emphasizing this gain in our case studies. Instead, UA professionals from companies like Rovio and Viber said that their greatest advantage was trust in their data – and thus their ability to make stronger decisions.


