[Case Study] The Challenges of Building a Privacy-focused Ad Platform
Ad-Annonce is a privacy-focused, Swiss-based AdWords equivalent that offers targeted advertising while maintaining website visitors’ anonymity. We discuss three challenges we faced while building this ad platform.
Sergey Slepokurov recently developed a privacy-focused ad platform, called Ad-Annonce, that targets the German-speaking market in Germany and Switzerland.
First, some Swiss cows.
Before we dive into the details of Ad-Annonce, we have to talk about some Swiss cows. No, these aren’t alpine-dwelling bovines wearing cute bells. We’re actually talking about a well-designed anonymous search engine — swisscows.ch. Swisscows takes data privacy seriously. They don’t collect any personal information about their visitors — no data about web browsers or operating systems, no geolocation, no cookies. They don’t store IP addresses. In other words, Swisscows is the Swiss version of DuckDuckGo.
The Swiss certainly care about their data privacy. In fact, data privacy is even a right protected by the 1992 Federal Act on Data Protection and by Article 13 of the Swiss Constitution, which states:
- Every person has the right to privacy in their private and family life and in their home, and in relation to their mail and telecommunications.
- Every person has the right to be protected against the misuse of their personal data.
Swisscows itself states:
“The headquarters of Swisscows is located in Switzerland. Swisscows is solely subject to Swiss law.”
“Swisscows only transmits personal data to third parties if we are legally obligated to do so …”
Note: NPR ran an insightful article in 2014 about Switzerland’s increasing appeal as a data safe zone, titled “Switzerland: From Banking Paradise To Data Safe Zone.”
Figure: Swisscows, a visually appealing search engine that focuses on data privacy.
The anonymous search model of Swisscows differs markedly from Google’s model. Google’s entire business model is built on tracking user data and analyzing every detail of every search. Google also tracks user actions on websites that run Google Analytics or display AdSense ads. Using all of this data, Google constructs user-specific ad profiles to offer highly targeted advertising.
Swisscows does none of this. But they DO offer in-search advertising through their Ad-Annonce advertising platform. Ad-Annonce is like Google AdWords, but with anonymity. The Ad-Annonce network can also be integrated with sites other than Swisscows.ch, and no matter where ads are shown, Ad-Annonce respects visitor privacy.
Ads shown through the Ad-Annonce network are chosen solely based on a visitor’s real-time actions on a site. In other words, if you search for “gold iPhone cases,” then the ads you’re shown will be based on that query — and only on that query!
Let’s look at three of the biggest challenges we faced while developing this privacy-focused ad platform: (1) matching web texts to keyword rules, (2) dynamically pricing ads, and (3) rotating ads to balance the number of ad views.
Online ads target keywords. And these keywords have specific rules attached to them. Keyword rules tell the search engine (or the internal ads engine) what content your ad should be paired with.
For example, an ad might target the keywords “buy” and “MacBook” — but exclude the keyword “used.” (This exclusion is called negative keywording.) One ad can have many positive and many negative rules: in fact, an ad may have as many as 100 rules.
The first challenge, then, is pairing web content with rules-based ads. It turns out that matching web texts to keyword rules is a bit more challenging than it appears at first glance. The second challenge is dynamically pricing ads, taking into account demand and several constants. The third challenge is rotating ads so that all ads have an equal opportunity to be displayed. If 100 ads have overlapping rules, for instance, then we need to make sure that all ads are displayed over time — not only the top 10.
1. Matching web texts to keyword rules.
When you search the web on any search engine, you’re essentially scanning full website texts for a specific keyword or phrase. This is a fairly common task, and is well understood. But when developing Ad-Annonce we needed to do the opposite: we needed to match texts to rules-based keywords, not locate keywords within a text.
Current out-of-the-box solutions don’t offer the inverse search functionality that we required. The closest product we found, Elastic’s Percolator API, inverts the usual search by matching documents against stored queries, but unfortunately it’s still in beta and couldn’t handle all of the rules we needed to take into account for this project.
Since we couldn’t find a ready-made solution, we decided to design our own SQL-based engine to match web texts to keywords, taking into account all keyword rules established by advertising clients.
We’ll explain our SQL-based text-to-keyword solution in five steps (A–E).
A) Basics of keyword rules.
Each Ad-Annonce ad campaign can have as many as 100 triggers, and each trigger can be as long as 250 characters. Triggers are written in the following format:
| Trigger | php, | php help, | = php faq, | “php help”, | – not, | – not wiki |
|---|---|---|---|---|---|---|
| Rule Type | simple | complex | equal | match | stop | stop complex |
A trigger is the entire top row, composed of individual comma-separated rules; whether a trigger passes or fails depends on these rules. The bottom row shows each rule’s type.
As we can see in the table above, there are six types of keyword rules: (1) simple, (2) complex, (3) equal, (4) match, (5) stop, and (6) stop complex.
Here’s what these rules mean:
| Rule Type | Example | Meaning |
|---|---|---|
| Equal | = php faq | The search query must be exactly “php faq,” in that word order, with no other words. |
| Stop Complex | – not wiki | The trigger fails if “not” and “wiki” appear in that order within the search query, with no other words between them. |
| Stop | – not | The trigger fails if “not” appears anywhere in the search query. |
| Simple | php | The word “php” must occur at least once, anywhere in the search query. |
| Complex | php help | The words “php” and “help” must both occur in the search query. They may be separated by other words, and they may appear in either order. |
| Match | “php help” | This exact string, “php help,” must be found within the search query. |
These are the rules against which we will check our search query.
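To make the six rule types concrete, here is a minimal Python sketch that parses one trigger and checks it against a search query. It is an illustration, not Ad-Annonce’s SQL-based engine. It assumes that rules are comma-separated, that a leading `=` marks an equal rule, double quotes mark a match rule, and a leading hyphen (or en dash) marks a stop rule, and that text is tokenized into lowercase words. It also assumes our reading of the tables: a trigger passes when at least one positive rule matches and no stop rule does, a simplification that ignores the round order described in step D. `parse_rule`, `rule_matches`, and `trigger_passes` are hypothetical names.

```python
import re

POSITIVE = {"equal", "simple", "complex", "match"}

def tokenize(text: str) -> list:
    """Lowercase the text and split it into words (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())

def parse_rule(raw: str):
    """Classify one comma-separated rule and return (rule_type, words)."""
    raw = raw.strip()
    if raw.startswith("="):
        return "equal", tokenize(raw[1:])
    if raw[:1] in ("-", "\u2013"):  # hyphen or en dash marks a negative rule
        words = tokenize(raw[1:])
        return ("stop" if len(words) == 1 else "stop_complex"), words
    if raw[:1] in ('"', "\u201c"):  # straight or curly opening quote
        return "match", tokenize(raw)
    words = tokenize(raw)
    return ("simple" if len(words) == 1 else "complex"), words

def contains_sequence(words, seq) -> bool:
    """True if seq occurs in words as a contiguous, in-order run."""
    n = len(seq)
    return any(words[i:i + n] == seq for i in range(len(words) - n + 1))

def rule_matches(rule_type: str, rule_words: list, query: str) -> bool:
    words = tokenize(query)
    if rule_type == "equal":  # exactly these words, in this order, nothing else
        return words == rule_words
    if rule_type in ("simple", "complex", "stop"):  # all words present, any order
        return all(w in words for w in rule_words)
    if rule_type in ("match", "stop_complex"):  # adjacent words, in order
        return contains_sequence(words, rule_words)
    raise ValueError(f"unknown rule type: {rule_type}")

def trigger_passes(trigger: str, query: str) -> bool:
    rules = [parse_rule(r) for r in trigger.split(",") if r.strip()]
    # Any matching stop rule fails the whole trigger.
    if any(rule_matches(t, w, query) for t, w in rules if t not in POSITIVE):
        return False
    # Otherwise, a single matching positive rule is enough.
    return any(rule_matches(t, w, query) for t, w in rules if t in POSITIVE)

TRIGGER = 'php, php help, = php faq, "php help", -not, -not wiki'
print(trigger_passes(TRIGGER, "PHP tutorial"))       # True (simple rule "php")
print(trigger_passes(TRIGGER, "php is not a wiki"))  # False (stop rule "not")
```

Word-level matching is a design choice here: a raw substring test for the match rule would also accept “php helpers,” which a word-based index would not.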
B) Storing triggers in a word index.
We must store these triggers (and their constituent rules) in an easily searchable index. To do this we create a word index that contains the following values:
- Words (each word stored individually)
- Word Type
- Language (Ad-Annonce currently supports German, French, and English)
- Word Count (for multi-word rules; e.g., the match rule “php count” has a Word Count of 2)
- Order (for example, in “php count,” “php” is Order == 0 and “count” is Order == 1)
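The word index can be sketched as a single SQLite table, using Python’s built-in `sqlite3` module. The schema is hypothetical, since the article does not give Ad-Annonce’s actual table layout. We assume `word_type` stores the rule type, and we add `trigger_id` and `rule_id` columns (not in the list above) so that the words of one rule can be grouped back together; `order` becomes `word_order` because ORDER is an SQL keyword.

```python
import re
import sqlite3

def tokenize(text):
    """Lowercase the text and split it into words (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())

def classify(raw):
    """Return (rule_type, words) for one rule, assuming the trigger syntax above."""
    raw = raw.strip()
    if raw.startswith("="):
        return "equal", tokenize(raw[1:])
    if raw[:1] in ("-", "\u2013"):
        words = tokenize(raw[1:])
        return ("stop" if len(words) == 1 else "stop_complex"), words
    if raw[:1] in ('"', "\u201c"):
        return "match", tokenize(raw)
    words = tokenize(raw)
    return ("simple" if len(words) == 1 else "complex"), words

def create_word_index(conn):
    conn.execute("""
        CREATE TABLE word_index (
            trigger_id INTEGER,
            rule_id    INTEGER,
            word       TEXT,
            word_type  TEXT,     -- rule type (assumed meaning of Word Type)
            language   TEXT,     -- e.g. 'de', 'fr', 'en'
            word_count INTEGER,  -- number of words in the rule
            word_order INTEGER   -- 0-based position of the word in the rule
        )""")

def index_trigger(conn, trigger_id, trigger, language):
    """Insert one row per word of every rule in the trigger."""
    rules = [r for r in trigger.split(",") if r.strip()]
    for rule_id, raw in enumerate(rules):
        rule_type, words = classify(raw)
        conn.executemany(
            "INSERT INTO word_index VALUES (?, ?, ?, ?, ?, ?, ?)",
            [(trigger_id, rule_id, w, rule_type, language, len(words), i)
             for i, w in enumerate(words)],
        )

conn = sqlite3.connect(":memory:")
create_word_index(conn)
index_trigger(conn, 1, 'php, php help, = php faq, "php help", -not, -not wiki', "en")
for row in conn.execute("SELECT word, word_type, word_count, word_order "
                        "FROM word_index WHERE rule_id = 3 ORDER BY word_order"):
    print(row)  # ('php', 'match', 2, 0), then ('help', 'match', 2, 1)
```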
C) Denormalizing the word index.
Once we’ve created our word index, we “denormalize” it to optimize future search operations. Denormalizing improves read performance by adding redundant data and by grouping related data, at the cost of extra storage. For this project search speed is what matters, so data redundancy isn’t a big deal.
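One way to picture this denormalization, sketched in Python rather than SQL: copy each rule’s complete word list into the entry for every word it contains, so that a single lookup per query word returns whole candidate rules without re-joining rows. This works as a pre-filter because no rule can match a query that shares none of its words. It is our illustration of the idea, not Ad-Annonce’s actual denormalized schema; the row layout and function names are hypothetical.

```python
from collections import defaultdict

def denormalize(rows):
    """rows: (trigger_id, rule_id, word, rule_type, word_count, word_order) tuples."""
    # Reassemble each rule's full word list from its per-word rows.
    by_rule = defaultdict(dict)
    rule_types = {}
    for trigger_id, rule_id, word, rule_type, _count, order in rows:
        by_rule[(trigger_id, rule_id)][order] = word
        rule_types[(trigger_id, rule_id)] = rule_type
    # Copy the complete rule into the entry of every word it contains (redundant on purpose).
    index = defaultdict(list)
    for key, words_by_order in by_rule.items():
        words = tuple(words_by_order[i] for i in sorted(words_by_order))
        entry = (key[0], key[1], rule_types[key], words)
        for word in set(words):
            index[word].append(entry)
    return dict(index)

def candidate_rules(index, query_words):
    """All rules sharing at least one word with the query, without duplicates."""
    found = {}
    for word in query_words:
        for entry in index.get(word, []):
            found[entry[:2]] = entry  # key on (trigger_id, rule_id)
    return list(found.values())

rows = [
    (1, 0, "php", "complex", 2, 0),
    (1, 0, "help", "complex", 2, 1),
    (2, 0, "php", "simple", 1, 0),
]
index = denormalize(rows)
print(candidate_rules(index, ["help", "me"]))  # [(1, 0, 'complex', ('php', 'help'))]
```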
D) Searching by input query using the denormalized word index.
Once we have our denormalized index, we now need to run a website visitor’s search query through that index. Our search query could literally be any word or phrase — for example, “argyle socks made in Poland.”
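To facilitate our search, we build a table of “dirty results”: the triggers that have not yet passed or failed. As we move through the search process, we erase each trigger from the dirty results as soon as it passes or fails. Because the dirty results shrink with every round, each later (and more expensive) round has fewer triggers to check, so processing becomes faster over time.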
Triggers that pass are stored in a temporary table, corresponding to the search round during which the trigger passed.
Figure: Both triggers that pass and triggers that fail are removed from the Dirty Results table (left); triggers that pass are stored in a Temporary Table (right).
We search starting with the least expensive rule and working our way to the most expensive rule. The “expense” of a rule is determined by what system resources are required to test that rule (CPU, memory, etc.). Here’s the order of operations, from least expensive to most expensive:
- Equal (positive rule)
- Stop Complex (negative rule)
- Stop (negative rule)
- Simple (positive rule)
- Complex (positive rule)
- Match (positive rule)
We create a temporary table for each positive rule type: Equal, Simple, Complex, and Match. Each temporary table is populated with the triggers that pass during the corresponding search round.
Because we erase triggers from the dirty results whenever they pass or fail, a trigger can pass during only one search round; once it has passed, it is no longer considered. As a result, the triggers that pass are spread across all four temporary tables. This brings us to step E.
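The round-by-round process can be sketched in memory as follows. This is a Python stand-in for the SQL implementation, not the implementation itself: triggers are assumed to be pre-parsed into `(rule_type, words)` pairs, the dirty results are a set of undecided trigger IDs, and each positive rule type gets its own set standing in for a temporary table. Treating triggers that are still undecided after the last round as failed is our assumption.

```python
import re

# Cheapest round first, following the order of operations listed above.
ROUNDS = ["equal", "stop_complex", "stop", "simple", "complex", "match"]
POSITIVE = ("equal", "simple", "complex", "match")

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def matches(rule_type, rule_words, words):
    """Check one rule against an already tokenized query."""
    if rule_type == "equal":
        return words == rule_words
    if rule_type in ("match", "stop_complex"):  # adjacent words, in order
        n = len(rule_words)
        return any(words[i:i + n] == rule_words for i in range(len(words) - n + 1))
    return all(w in words for w in rule_words)  # simple, complex, stop

def run_rounds(triggers, query):
    """triggers: {trigger_id: [(rule_type, words), ...]} -> passed IDs per rule type."""
    words = tokenize(query)
    dirty = set(triggers)                  # undecided triggers ("dirty results")
    passed = {t: set() for t in POSITIVE}  # one "temporary table" per positive type
    for rule_type in ROUNDS:
        decided = set()
        for tid in dirty:
            rules = [w for t, w in triggers[tid] if t == rule_type]
            if any(matches(rule_type, w, words) for w in rules):
                if rule_type in POSITIVE:
                    passed[rule_type].add(tid)  # passed in this round
                decided.add(tid)                # passed or failed: stop considering it
        dirty -= decided                        # later, costlier rounds see fewer triggers
    return passed                               # anything left in `dirty` failed

triggers = {
    1: [("simple", ["php"]), ("stop", ["not"])],
    2: [("equal", ["php", "faq"])],
    3: [("complex", ["php", "help"]), ("match", ["php", "help"])],
}
print(run_rounds(triggers, "php help"))
# {'equal': set(), 'simple': {1}, 'complex': {3}, 'match': set()}
```

Note that trigger 3 lands only in the Complex table: it passes in that round, so it is no longer in the dirty results when the Match round runs.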
E) Generating final results.
Once all search rounds are complete, we merge the four temporary tables into a single, unified list of activated triggers.
When a trigger is activated, its associated ad goes onto a Score Table. Ads that are displayed to website visitors are selected from the Score Table.
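Merging the per-round results maps naturally onto SQL’s `UNION`, which also removes duplicates. The sketch below uses SQLite through Python’s `sqlite3` module. The table names (`trigger_ad`, `passed_equal`, `score`, and so on) are hypothetical, and this score table holds nothing but ad IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trigger_ad (trigger_id INTEGER, ad_id INTEGER);
    CREATE TEMP TABLE passed_equal   (trigger_id INTEGER);
    CREATE TEMP TABLE passed_simple  (trigger_id INTEGER);
    CREATE TEMP TABLE passed_complex (trigger_id INTEGER);
    CREATE TEMP TABLE passed_match   (trigger_id INTEGER);
    CREATE TEMP TABLE score (ad_id INTEGER PRIMARY KEY);
""")

def build_score_table(conn):
    """Union the four per-round tables and put each activated trigger's ad on the score table."""
    conn.execute("""
        INSERT OR IGNORE INTO score (ad_id)
        SELECT ad_id FROM trigger_ad
        WHERE trigger_id IN (
            SELECT trigger_id FROM passed_equal
            UNION SELECT trigger_id FROM passed_simple
            UNION SELECT trigger_id FROM passed_complex
            UNION SELECT trigger_id FROM passed_match
        )
    """)
    return [row[0] for row in conn.execute("SELECT ad_id FROM score ORDER BY ad_id")]

# Example data: triggers 1 and 3 belong to ad 10, trigger 2 to ad 20, trigger 4 to ad 40.
conn.executemany("INSERT INTO trigger_ad VALUES (?, ?)", [(1, 10), (2, 20), (3, 10), (4, 40)])
conn.execute("INSERT INTO passed_equal VALUES (2)")
conn.execute("INSERT INTO passed_simple VALUES (1)")
conn.execute("INSERT INTO passed_match VALUES (3)")
print(build_score_table(conn))  # [10, 20]
```

`INSERT OR IGNORE` against the primary key keeps an ad from appearing twice when several of its triggers fire.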
2. Dynamically pricing ads.
Our second challenge while building Ad-Annonce was dynamically pricing our ads. All advertising campaigns on Ad-Annonce specify budget and/or time limitations in addition to specifying keywords and rules.
In the first version of Ad-Annonce, campaigns were charged a fixed rate for every click, with fraud detection built in to reduce the risk of a client paying for fraudulent clicks. In the current version, however, Ad-Annonce offers dynamic pricing — just like Google’s AdWords. This means that the cost for a click varies depending on how many clients have ads targeting the same keyword. Each keyword can cost a varying amount over time due to basic supply and demand.
We calculate a dynamic price for each keyword that matches one of our passed triggers. This dynamic price is calculated at the step where we merge our lists of passed triggers (discussed in the previous section of this article). Here’s the formula that we use to calculate the dynamic price:
Dynamic Price = Base Cost + (Number Unique Ads / Price Point Gap) × Delta
| Term | Meaning |
|---|---|
| Base Cost | A constant, for example 1 Swiss franc. |
| Number Unique Ads | The number of unique advertising clients who have ads with a given rule. |
| Price Point Gap | A constant that determines the gap between price points; set in the admin panel. |
| Delta | A constant, also set in the admin panel. |
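As a worked example with made-up admin-panel values (Base Cost 1.00 CHF, Price Point Gap 4, Delta 0.25 CHF), 12 unique advertisers on a keyword give 1.00 + (12 / 4) × 0.25 = 1.75 CHF. The Python sketch below computes the same thing, using `Decimal` to avoid floating-point rounding in currency amounts. The article does not say whether the division is rounded down into discrete price points, so the sketch keeps plain division exactly as the formula is written; `dynamic_price` is a hypothetical name.

```python
from decimal import Decimal

def dynamic_price(base_cost: Decimal, unique_ads: int,
                  price_point_gap: Decimal, delta: Decimal) -> Decimal:
    """Dynamic Price = Base Cost + (Number Unique Ads / Price Point Gap) * Delta."""
    if price_point_gap <= 0:
        raise ValueError("price_point_gap must be positive")
    return base_cost + (unique_ads / price_point_gap) * delta

# Hypothetical admin-panel settings.
print(dynamic_price(Decimal("1.00"), 12, Decimal("4"), Decimal("0.25")))  # 1.75
```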
If an ad campaign uses dynamic pricing, it is automatically charged the calculated price for all displayed ads. Alternatively, a campaign can use static pricing, in which case its ad is shown only if the static price is at or above the dynamically calculated price (i.e., if it has outbid its dynamically priced competitors).
However, a rule may have only fixed-price ads and no dynamically priced ads. In that case, the calculated dynamic price can exceed the highest fixed-price bid. One logical response would be to display no ads, since no bid is equal to or greater than the dynamic price. However, it is better to show ads at some price than to show no ads whatsoever. So in this case we downgrade the dynamic price to match the highest fixed-price bid. In other words, if no fixed-price ad meets or exceeds the dynamically calculated price, we still go ahead and show the ads that pay the most.
We should also note that there’s a maximum cost ceiling for ads. This means that even with infinite competition there is still a fixed price limit that’s within reason.
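The selection logic of the last three paragraphs can be sketched for the ads competing on one rule. Assumptions to flag: an ad with `static_price=None` is taken to use dynamic pricing, the ceiling is applied by capping the dynamic price before bids are compared, and the `Ad` class, the `resolve_price` function, and all prices are hypothetical. The article does not describe charging beyond this, so the sketch only decides the effective price and which ads are eligible.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional, Tuple

@dataclass
class Ad:
    ad_id: int
    static_price: Optional[Decimal]  # None: the campaign uses dynamic pricing

def resolve_price(ads: List[Ad], dynamic: Decimal,
                  ceiling: Decimal) -> Tuple[Decimal, List[Ad]]:
    """Return the effective price for one rule and the ads eligible at that price."""
    price = min(dynamic, ceiling)  # never exceed the maximum cost ceiling
    has_dynamic = any(ad.static_price is None for ad in ads)
    static_bids = [ad.static_price for ad in ads if ad.static_price is not None]
    if not has_dynamic and static_bids and max(static_bids) < price:
        # Only fixed-price ads, all below the dynamic price: downgrade the
        # price to the highest fixed bid rather than showing no ads at all.
        price = max(static_bids)
    eligible = [ad for ad in ads
                if ad.static_price is None or ad.static_price >= price]
    return price, eligible

price, eligible = resolve_price(
    [Ad(1, Decimal("1.20")), Ad(2, Decimal("1.50"))],
    dynamic=Decimal("2.00"), ceiling=Decimal("5.00"))
print(price, [ad.ad_id for ad in eligible])  # 1.50 [2]
```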
3. Rotating ads to balance the number of ad views.
Our third challenge was “normalizing” ad views. In short, all ads should have an equal right to be displayed, so we needed some way to balance which ads are shown.
Let’s consider our “buy MacBook” ad again, only let’s assume that we actually have 1000 ads that match the rules “buy” and “MacBook.” In this case, the default action would be to return only the top five ads. Clearly we need to rotate ads — and not just randomly — to give equal visibility to all 1000 of those ads.
To give equal visibility, we select ads for display using view counters, then sort ads in ascending order based on number of views. This pushes ads with lower view counts to the top.
When there’s a view count tie, we show higher-paying ads first. Then we continue rotating through all other ads. This means that higher-paying ads won’t always be first in the queue, but they will return to first place every time we normalize, or flush, the view counts.
To normalize views, we flush the view counters daily. Flushing rebalances the rotation: ads that have already received many views are still shown, while fresh ads are mixed in with older ones. Without this daily flush, over time only fresh ads with low view counts would be displayed, displacing all older ads that have already had many views.
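A minimal sketch of this rotation: sort by view count ascending, break ties by price descending, increment the counters of the ads shown, and reset every counter in a daily job. The `AdSlot` class, its `price` field, and the function names are hypothetical, and a production system would keep the counters in the database rather than in memory.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AdSlot:
    ad_id: int
    price: float  # what the ad pays; used only to break view-count ties
    views: int = 0

def pick_ads(ads: List[AdSlot], n: int) -> List[AdSlot]:
    """Show the n least-viewed ads; on a tie, the higher-paying ad goes first."""
    chosen = sorted(ads, key=lambda ad: (ad.views, -ad.price))[:n]
    for ad in chosen:
        ad.views += 1
    return chosen

def flush_view_counters(ads: List[AdSlot]) -> None:
    """Daily reset: all counts tie again, so higher-paying ads return to the front."""
    for ad in ads:
        ad.views = 0

ads = [AdSlot(1, 0.50), AdSlot(2, 2.00), AdSlot(3, 1.00)]
print([ad.ad_id for ad in pick_ads(ads, 2)])  # [2, 3]
print([ad.ad_id for ad in pick_ads(ads, 2)])  # [1, 2]
flush_view_counters(ads)
print([ad.ad_id for ad in pick_ads(ads, 2)])  # [2, 3]
```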
View counters also check if ad campaigns have reached their view limits. However, this introduces another complication: detecting spam clicks. To detect spam, we initially track all clicks. After one day of tracking, we are able to detect which clicks are spam and which clicks are legitimate. This means that click counts become accurate on the second day of an ad campaign, as it takes one day to detect which clicks are spam. After we’ve weeded out the spam, we can then count all valid clicks against a campaign’s limit (including legitimate clicks from the first day).
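The one-day delay can be modelled as a ledger that holds clicks as pending until they are at least a day old, then counts only those that pass a spam check. The article does not describe how spam is detected, so the sketch takes the detector as a hypothetical callback (`is_spam`); `Click`, `ClickLedger`, and the `source` field are likewise illustrative names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, List

@dataclass
class Click:
    campaign_id: int
    day: date
    source: str  # hypothetical attribute a spam detector might inspect

class ClickLedger:
    def __init__(self) -> None:
        self.pending: List[Click] = []   # clicks not yet reviewed for spam
        self.valid: Dict[int, int] = {}  # campaign_id -> confirmed legitimate clicks

    def record(self, click: Click) -> None:
        self.pending.append(click)

    def settle(self, today: date, is_spam: Callable[[Click], bool]) -> None:
        """Review clicks at least one day old and count the legitimate ones."""
        still_pending = []
        for click in self.pending:
            if today - click.day < timedelta(days=1):
                still_pending.append(click)  # too recent to judge yet
            elif not is_spam(click):
                self.valid[click.campaign_id] = self.valid.get(click.campaign_id, 0) + 1
        self.pending = still_pending  # reviewed spam clicks are simply dropped

    def limit_reached(self, campaign_id: int, limit: int) -> bool:
        return self.valid.get(campaign_id, 0) >= limit

ledger = ClickLedger()
ledger.record(Click(7, date(2024, 1, 1), "browser"))
ledger.record(Click(7, date(2024, 1, 1), "bot"))
ledger.record(Click(7, date(2024, 1, 2), "browser"))
ledger.settle(date(2024, 1, 2), is_spam=lambda c: c.source == "bot")
print(ledger.valid, len(ledger.pending))  # {7: 1} 1
```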
Ad-Annonce offers anonymity for website visitors while letting companies extend their advertising reach and letting websites earn money from paid ad clicks.
It’s simple to take AdWords campaigns and duplicate them in the Ad-Annonce ad platform. Online ads through Google may be shown on millions of sites, but they’re only shown on sites where users can be tracked. Without Google scripts, Google ads cannot be integrated. Ad-Annonce allows ad campaigns to extend their reach onto the Swisscows search engine and onto other websites that prize their visitors’ anonymity.
While developing the Ad-Annonce privacy-focused ad platform we encountered three primary challenges: (1) matching texts to keyword rules, (2) pricing ads, and (3) rotating ads to normalize number of views. To overcome these challenges we (1) developed our own SQL-based text-to-keyword tool, (2) established a formula for dynamic pricing that can work alongside statically-priced ads, and (3) weighed view counters and ad payout amounts to achieve balanced ad rotation.