User Privacy Doesn’t Solve Publisher Privacy

By Paul Bannister

User privacy is the largest trend shaping the direction of digital advertising. Governments are creating and enforcing new regulations that impact how advertising works, browsers and app stores are limiting how data can be shared and users are demanding more control over their data.

Some publishers believe that this more private future will increase privacy for their own data, and reduce information they leak to the ad tech ecosystem. While some types of data leakage will indeed be eliminated, many issues remain, and new types of leakage will emerge that publishers still need to understand and mitigate.

Four ways data leakage will persist

The first two issues exist in both Google’s Privacy Sandbox proposals and their alternatives. Whichever bird proposal is ultimately adopted, they all suffer from one major flaw: Advertisers can request that publishers expose cohorts on their sites. The TURTLEDOVE proposal lays this mechanism out plainly.

This means that an advertiser can gather data from a given publisher and use that data to target cohorts around the web – without any money flowing back to the original publisher. Cue the email saying, “If you add these cohort tags to your pages, our agency promises we will spend millions of dollars with your site!” (Not.)

The second issue exists within the FLoC proposal. FLoC uses machine learning and federated learning technologies to let browsers figure out which cohorts they belong to, based on the pages the user visits, the content of those pages and other signals. The browser will expose that data via FLoC IDs – short alphanumeric codes, such as “49A7”, that don’t reveal what the cohort means; the browser itself doesn’t know either.

Ad tech firms can then experiment with running different campaigns against those FLoC IDs (“Oh, cohort 49A7 converts really well for sneaker sales campaigns.”) and optimize campaigns toward the cohorts that work best.
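That optimization loop is easy to picture. A minimal sketch, assuming a hypothetical event log of (cohort ID, converted) pairs – the IDs and numbers here are invented:

```python
from collections import defaultdict

def best_cohorts(events, min_impressions=100):
    """Rank opaque FLoC cohort IDs by observed conversion rate.

    `events` is an iterable of (cohort_id, converted) pairs logged by
    an ad platform; the IDs themselves carry no meaning to anyone.
    """
    stats = defaultdict(lambda: [0, 0])  # cohort_id -> [impressions, conversions]
    for cohort_id, converted in events:
        stats[cohort_id][0] += 1
        stats[cohort_id][1] += int(converted)
    ranked = [
        (cid, conversions / impressions)
        for cid, (impressions, conversions) in stats.items()
        if impressions >= min_impressions
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

# Hypothetical log: cohort "49A7" converts far better than "1C3F".
events = [("49A7", True)] * 30 + [("49A7", False)] * 70 + \
         [("1C3F", True)] * 5 + [("1C3F", False)] * 95
print(best_cohorts(events))  # [('49A7', 0.3), ('1C3F', 0.05)]
```

The point is who holds the log: the platform running this loop learns which cohorts convert, while the publishers whose pages shaped those cohorts learn nothing.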

And there’s the issue: The companies that will know the most about which cohorts work for which campaigns will be ad tech companies, and all of that cohort data, while user-private, is entirely based on data that is leaked from publisher and advertiser sites.

The first two issues will only exist in Chrome browsers, as other browsers are unlikely to adopt any of the advertising-related Privacy Sandbox proposals, while the next two issues will exist in any web browser.

Many publishers are flocking to “email identity” platforms that let them link users’ email addresses, collected via login or newsletter sign-up, to first-party cookies. This makes it easier for publishers to connect data to their users and easier for buyers to find their own customers and prospects on publishers’ sites. But the downside is that every ad tech intermediary can see that a given user was on a specific publisher and got a specific ad. Plus, the buyer will now know an extra piece of information about the user – they can attach the context of the page to their user profile. There are some mitigations that can be taken against this, but a lot of data will still easily leak out.
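As an illustration of why these identifiers travel so well: email identity platforms typically derive the shared ID by normalizing the address and hashing it (SHA-256 is common; real platforms layer salting, encryption and rotation on top). A minimal sketch of the core idea:

```python
import hashlib

def email_to_id(email: str) -> str:
    """Derive a stable identifier from an email address.

    Normalization (trimming, lowercasing) followed by SHA-256 is the
    typical base pattern; this sketch omits the salting and encryption
    that production platforms add.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same user yields the same ID on every site where they log in --
# which is exactly what lets intermediaries join observations together.
print(email_to_id("Jane.Doe@example.com") == email_to_id(" jane.doe@example.com "))  # True
```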

The last form of data leakage is the page context itself. Publishers’ pages are public on the internet. Even those behind paywalls usually have some way of being scraped. And the URL that a given user is visiting is available in every OpenRTB bid request to advertisers – which means that any contextual targeting company can use their algorithm to classify the content.
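To see how little effort this takes, here is a sketch of pulling the page URL out of an OpenRTB bid request and running a toy classifier over it. `site.page` is a standard OpenRTB 2.x field; the keyword rules are invented stand-ins for a contextual vendor’s real model:

```python
import json

# Hypothetical keyword-to-category rules; real contextual vendors use
# full-page crawls and ML classifiers, not the URL alone.
RULES = {"recipes": "Food & Drink", "sneaker": "Style & Fashion"}

def classify_bid_request(raw):
    """Extract site.page from an OpenRTB bid request and classify it.

    Every bidder that receives the request can run this kind of
    classification for free, with no relationship to the publisher.
    """
    request = json.loads(raw)
    url = request.get("site", {}).get("page", "")
    for keyword, category in RULES.items():
        if keyword in url:
            return url, category
    return url, None

bid_request = json.dumps({"id": "1", "site": {"page": "https://example.com/recipes/pasta"}})
print(classify_bid_request(bid_request))  # ('https://example.com/recipes/pasta', 'Food & Drink')
```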

Buyers will want their contextual targeting options to be built into their demand-side platforms (DSPs). By default, buyers are going to rely on those integrated tools, which will be blunt-instrument (at best) and not do a good job of presenting the true deep context of a page that only the publisher will know.

Is there any hope for publishers?

There are some ways to cut down on data leakage. One of the best, if uninspired, ways to do so is with contracts. CCPA and California Ballot Proposition 24 offer some direction with respect to “service providers.” Ensuring that your partners and their downstream partners only use your data for your transactions may be hard to enforce, but it’s a great push toward protecting publishers’ valuable data.

Another mitigation is taking shape through efforts from Prebid and the IAB. The IAB has already defined a content taxonomy, but there’s no agreed-upon way for publishers to create this data and push it out to advertisers. The Prebid organization has created a new committee to standardize these processes and allow publishers to push their own data out to buyers. While there are a number of obstacles to overcome, publisher data will ultimately be far more valuable to buyers than mass-produced third-party context scraping, and will put publishers more in control of their own data.
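What publisher-declared context might look like on the wire is sketched below: the publisher attaches IAB Content Taxonomy categories to the `site.content.cat` field of an OpenRTB bid request (“IAB8” is Food & Drink in Taxonomy 1.0). How this data actually flows through Prebid is the part still being standardized, so treat the shape as an assumption:

```python
def declare_context(page_url, categories):
    """Build the site fragment of an OpenRTB bid request with
    publisher-declared IAB content categories.

    Declaring categories directly lets the publisher describe its own
    pages instead of leaving classification to third-party scrapers.
    """
    return {
        "site": {
            "page": page_url,
            "content": {"cat": categories},
        }
    }

fragment = declare_context("https://example.com/recipes/pasta", ["IAB8"])
print(fragment["site"]["content"]["cat"])  # ['IAB8']
```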

Google’s bird proposals still have their issues, and there’s little that can be done to mitigate them. Advertisers pressuring publishers to add TURTLEDOVE tags to their pages is going to be a serious problem in the future with no easy remedy. On all of these fronts, publishers need to become more active in the W3C, Prebid and IAB organizations, or entrust other parties to push for their best interests in these technical negotiations.

That said, it’s questionable whether a world without data leakage is actually the best outcome for open web publishers. There may be a happy medium that works best for advertisers and for publishers.

If publishers expect the new privacy-friendly world to protect the privacy of their own data, they should look behind the curtain and understand the next generation of data leakage issues.

This article was originally posted on AdExchanger.