Rules in content_filter
The need for speed: ADBlock extensions
ADBlock is one the most used extensions, and we noticed that on our comptetitors it heavily decreases the speed of the browser, making it even 3 times slower.
As our content_filter is very fast, we thought that this could have been a good opportunity to take the speed lead when this particular extension is used.
The main idea was to design an API to allow Opera to ideally execute the URL filtering done by ADBlock without using Javascript in the filtering process (Javascript is of course used by the extension to build the list of url filters).
We were ready to sacrify some of ADBlocck less important funcionalities, to give priority to speed.
ADBlock filtering mechanism is quite sophisticated, and reproducing it without using Javascript meant that a simple list of pattenr was not enough, so we had to introduce a new mechanism: rules.
More flexibility: rules
Simply put, a rule is an object that can validate a pattern match.
So an URL will be blocked only if:
- there is a pattern that matches the URL and
- all there rules associated to the pattern are valid for the URL (rules are in logical AND)
If there are no rules associated to a pattern, then if there is a match is already valid.
A side effect of rules is that if we don't have enough context to apply the rules, we cannot block a URL based only on the pattern, so we have to be careful to cover all the cases (as blocking URLs only at a network level is not always possible anymore).
Example of rules
There are several rules that can be attached to each pattern:
-
thirdParty: a URL from site A is loaded by another site
-
includeDomains: the URL is loaded by a site belonging to this list
-
excludeDomains: the URL is loaded by a site not belonging to this list
-
resources: the URL is loaded by one of the resources listed (for example it is loaded as an image, or as a font)
The need for context
Supporting these rules was not possible with the previous version of content_filter, as we did not have enough information to apply the rules.
content_filter was born as a pure network module, were the URL string was enough to apply the patterns and decide if the loading could have been executed or not.
Now this is no longer enough, and when a loading is requested some context has to be provide, to tell content filter which ServerName is requesting the URL, and which element initiated the load. More information (like the DOM_Environment) are also required for the events that we send to the extension.
Please note that for performance reason (and to try avoiding abuse from the extensions developer), we fire events only when a URL is blocked, or when it has ben "reallowed" (meaning that the black list blocks it, but the white list allows it).
Rules hierarchy
This is a diagram of the main rule classes: