Privoxy uses Perl-style "regular
- expressions" in its actions
- files and filter file,
- through the PCRE and
- PCRS libraries.
If you are reading this, you probably don't understand what "regular
- expressions" are, or what they can do. So this will be a very brief
- introduction only. A full explanation would require a book ;-)
Regular expressions provide a language to describe patterns that can be
- run against strings of characters (letter, numbers, etc), to see if they
- match the string or not. The patterns are themselves (sometimes complex)
- strings of literal characters, combined with wild-cards, and other special
- characters, called meta-characters. The "meta-characters" have
- special meanings and are used to build complex patterns to be matched against.
- Perl Compatible Regular Expressions are an especially convenient
- "dialect" of the regular expression language.
To make a simple analogy, we do something similar when we use wild-card
- characters when listing files with the dir command in DOS.
- *.* matches all filenames. The "special"
- character here is the asterisk which matches any and all characters. We can be
- more specific and use ? to match just individual
- characters. So "dir file?.txt" would match
- "file1.txt", "file2.txt", etc. We are pattern
- matching, using a similar technique to "regular expressions"!
Regular expressions do essentially the same thing, but are much, much more
- powerful. There are many more "special characters" and ways of
- building complex patterns however. Let's look at a few of the common ones,
- and then some examples:
. - Matches any single character, e.g. "a",
- "A", "4", ":", or "@".
-
? - The preceding character or expression is matched ZERO or ONE
- times. Either/or.
-
+ - The preceding character or expression is matched ONE or MORE
- times.
-
* - The preceding character or expression is matched ZERO or MORE
- times.
-
\ - The "escape" character denotes that
- the following character should be taken literally. This is used where one of the
- special characters (e.g. ".") needs to be taken literally and
- not as a special meta-character. Example: "example\.com", makes
- sure the period is recognized only as a period (and not expanded to its
- meta-character meaning of any single character).
-
[ ] - Characters enclosed in brackets will be matched if
- any of the enclosed characters are encountered. For instance, "[0-9]"
- matches any numeric digit (zero through nine). As an example, we can combine
- this with "+" to match any digit one or more times: "[0-9]+".
-
( ) - parentheses are used to group a sub-expression,
- or multiple sub-expressions.
-
| - The "bar" character works like an
- "or" conditional statement. A match is successful if the
- sub-expression on either side of "|" matches. As an example:
- "/(this|that) example/" uses grouping and the bar character
- and would match either "this example" or "that
- example", and nothing else.
-
These are just some of the ones you are likely to use when matching URLs with
- Privoxy, and are a long way from a definitive
- list. This is enough to get us started with a few simple examples which may
- be more illuminating:
/.*/banners/.* - A simple example
- that uses the common combination of "." and "*" to
- denote any character, zero or more times. In other words, any string at all.
- So we start with a literal forward slash, then our regular expression pattern
- (".*") another literal forward slash, the string
- "banners", another forward slash, and lastly another
- ".*". We are building
- a directory path here. This will match any file with the path that has a
- directory named "banners" in it. The ".*" matches
- any characters, and this could conceivably be more forward slashes, so it
- might expand into a much longer looking path. For example, this could match:
- "/eye/hate/spammers/banners/annoy_me_please.gif", or just
- "/ads/banners/annoying.html", or almost an infinite number of other
- possible combinations, just so it has "banners" in the path
- somewhere.
And now something a little more complex:
/.*/adv((er)?ts?|ertis(ing|ements?))?/ -
- We have several literal forward slashes again ("/"), so we are
- building another expression that is a file path statement. We have another
- ".*", so we are matching against any conceivable sub-path, just so
- it matches our expression. The only true literal that must
- match our pattern is adv, together with
- the forward slashes. What comes after the "adv" string is the
- interesting part.
Remember the "?" means the preceding expression (either a
- literal character or anything grouped with "(...)" in this case)
- can exist or not, since this means either zero or one match. So
- "((er)?ts?|ertis(ing|ements?))" is optional, as are the
- individual sub-expression "(er)" and each "s" that is
- followed by "?". The "|"
- means "or". We have two of those. For instance,
- "(ing|ements?)" can expand to match either "ing"
- OR "ements?". What is being done here is an
- attempt at matching as many variations of "advertisement", and
- similar, as possible. So this would expand to match just "adv",
- or "advert", or "adverts", or
- "advertising", or "advertisement", or
- "advertisements". You get the idea. But it would not match
- "advertizements" (with a "z"). We could fix that by
- changing our regular expression to:
- "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which would then match
- either spelling.
/.*/advert[0-9]+\.(gif|jpe?g) - Again
- another path statement with forward slashes. Anything in the square brackets
- "[ ]" can be matched. This is using "0-9" as a
- shorthand expression to mean any digit zero through nine. It is the same as
- saying "0123456789". So any digit matches. The "+"
- means one or more of the preceding expression must be included. The preceding
- expression here is what is in the square brackets -- in this case, any digit
- zero through nine. Then, at the end, we have a grouping: "(gif|jpe?g)".
- This includes a "|", so this needs to match the expression on
- either side of that bar character also. A simple "gif" on one side, and the other
- side will in turn match either "jpeg" or "jpg",
- since the "?" means the letter "e" is optional and
- can be matched once or not at all. So we are building an expression here to
- match GIF or JPEG image files. It must include the literal
- string "advert", then one or more digits, and a "."
- (which is now a literal, and not a special character, since it is escaped
- with "\"), and lastly either "gif", or
- "jpeg", or "jpg". Some possible matches would
- include: "//advert1.jpg",
- "/nasty/ads/advert1234.gif",
- "/banners/from/hell/advert99.jpg". It would not match
- "advert1.gif" (no leading slash), or
- "/adverts232.jpg" (the expression does not include an
- "s"), or "/advert1.jsp" ("jsp" is not
- in the expression anywhere).
We are barely scratching the surface of regular expressions here so that you
- can understand the default Privoxy
- configuration files, and maybe use this knowledge to customize your own
- installation. There is much, much more that can be done with regular
- expressions. Now that you know enough to get started, you can learn more on
- your own :/
For information on regular expression based substitutions and their applications
- in filters, please see the filter file tutorial
- in this manual.
14.2. Privoxy's Internal Pages
Since Privoxy proxies each requested
- web page, it is easy for Privoxy to
- trap certain special URLs. In this way, we can talk directly to
- Privoxy, and see how it is
- configured, see how our rules are being applied, change these
- rules and other configuration options, and even turn
- Privoxy's filtering off, all with
- a web browser.
The URLs listed below are the special ones that allow direct access
- to Privoxy. Of course,
- Privoxy must be running to access these. If
- not, you will get a friendly error message. Internet access is not
- necessary either.
These may be bookmarked for quick reference. See next.
14.2.1. Bookmarklets
Below are some "bookmarklets" to allow you to easily access a
- "mini" version of some of Privoxy's
- special pages. They are designed for MS Internet Explorer, but should work
- equally well in Netscape, Mozilla, and other browsers which support
- JavaScript. They are designed to run directly from your bookmarks - not by
- clicking the links below (although that should work for testing).
To save them, right-click the link and choose "Add to Favorites"
- (IE) or "Add Bookmark" (Netscape). You will get a warning that
- the bookmark "may not be safe" - just click OK. Then you can run the
- Bookmarklet directly from your favorites/bookmarks. For even faster access,
- you can put them on the "Links" bar (IE) or the "Personal
- Toolbar" (Netscape), and run them with a single click.
Credit: The site which gave us the general idea for these bookmarklets is
- www.bookmarklets.com. They
- have more information about bookmarklets.
14.3. Chain of Events
Let's take a quick look at the basic sequence of events when a web page is
- requested by your browser and Privoxy is on duty:
First, your web browser requests a web page. The browser knows to send
- the request to Privoxy, which will in turn,
- relay the request to the remote web server after passing the following
- tests:
-
Privoxy traps any request for its own internal CGI
- pages (e.g. http://p.p/) and sends the CGI page back to the browser.
-
Next, Privoxy checks to see if the URL
- matches any "+block" patterns. If
- so, the URL is then blocked, and the remote web server will not be contacted.
- "+handle-as-image"
- is then checked and if it does not match, an
- HTML "BLOCKED" page is sent back. Otherwise, if it does match,
- an image is returned. The type of image depends on the setting of "+set-image-blocker"
- (blank, checkerboard pattern, or an HTTP redirect to an image elsewhere).
-
Untrusted URLs are blocked. If URLs are being added to the
- trust file, then that is done.
-
If the URL pattern matches the "+fast-redirects" action,
- it is then processed. Unwanted parts of the requested URL are stripped.
-
Now the rest of the client browser's request headers are processed. If any
- of these match any of the relevant actions (e.g. "+hide-user-agent",
- etc.), headers are suppressed or forged as determined by these actions and
- their parameters.
-
Now the web server starts sending its response back (i.e. typically a web page and related
- data).
-
If the "+kill-popups"
- action applies, and it is an HTML or JavaScript document, the popup-code in the
- response is filtered on-the-fly as it is received.
-
If a "+filter"
- or "+deanimate-gifs"
- action applies (and the document type fits the action), the rest of the page is
- read into memory (up to a configurable limit). Then the filter rules (from
- default.filter and any other filter files) are
- processed against the buffered content. Filters are applied in the order
- they are specified in one of the filter files. Animated GIFs, if present,
- are reduced to either the first or last frame, depending on the action
- setting. The entire page, which is now filtered, is then sent by
- Privoxy back to your browser.
-
If neither "+filter"
- nor "+deanimate-gifs"
- matches, then Privoxy passes the raw data through
- to the client browser as it becomes available.
-
As the browser receives the now (possibly filtered) page content, it
- reads and then requests any URLs that may be embedded within the page
- source, e.g. ad images, stylesheets, JavaScript, other HTML documents (e.g.
- frames), sounds, etc. For each of these objects, the browser issues a new
- request. And each such request is in turn processed as above. Note that a
- complex web page may have many such embedded URLs.
-
14.4. Troubleshooting: Anatomy of an Action
The way Privoxy applies
- actions and filters
- to any given URL can be complex, and it is not always
- easy to understand what is happening. Sometimes we need to be able to
- see just what Privoxy is
- doing, especially if something Privoxy is doing
- is inadvertently causing us a problem. It can be a little daunting to look at
- the actions and filters files themselves, since they tend to be filled with
- regular expressions whose consequences are not
- always so obvious.
One quick test to see if Privoxy is causing a problem
- or not, is to disable it temporarily. This should be the first troubleshooting
- step. See the Bookmarklets section on a quick
- and easy way to do this (be sure to flush caches afterward!). Looking at the
- logs is a good idea too.
Another easy troubleshooting step, if you have done any
- customization of your installation, is to revert to the installed
- defaults and see if that helps. There are times the developers get complaints
- about one thing or another, and the problem is more related to a customized
- configuration issue.
Privoxy also provides the
- http://config.privoxy.org/show-url-info
- page that can show us very specifically how actions
- are being applied to any given URL. This is a big help for troubleshooting.
First, enter one URL (or partial URL) at the prompt, and then
- Privoxy will tell us
- how the current configuration will handle it. This will not
- help with filtering effects (i.e. the "+filter" action) from
- one of the filter files since this is handled very
- differently and not so easy to trap! It also will not tell you about any other
- URLs that may be embedded within the URL you are testing. For instance, images
- such as ads are expressed as URLs within the raw page source of HTML pages. So
- you will only get info for the actual URL that is pasted into the prompt area
- -- not any sub-URLs. If you want to know about embedded URLs like ads, you
- will have to dig those out of the HTML source. Use your browser's "View
- Page Source" option for this. Or right click on the ad, and grab the
- URL.
Let's try an example, google.com,
- and look at it one section at a time in a sample configuration (your real
- configuration may vary):
Privoxy uses Perl-style
+ "regular expressions" in its actions files and filter file, through the PCRE and PCRS libraries.
+
+
If you are reading this, you probably don't understand what
+ "regular expressions" are, or what they can
+ do. So this will be a very brief introduction only. A full explanation
+ would require a book ;-)
+
+
Regular expressions provide a language to describe patterns that can
+ be run against strings of characters (letter, numbers, etc), to see if
+ they match the string or not. The patterns are themselves (sometimes
+ complex) strings of literal characters, combined with wild-cards, and
+ other special characters, called meta-characters. The "meta-characters" have special meanings and are used to
+ build complex patterns to be matched against. Perl Compatible Regular
+ Expressions are an especially convenient "dialect" of the regular expression language.
+
+
To make a simple analogy, we do something similar when we use
+ wild-card characters when listing files with the dir command in DOS. *.* matches
+ all filenames. The "special" character here
+ is the asterisk which matches any and all characters. We can be more
+ specific and use ? to match just individual
+ characters. So "dir file?.txt" would match
+ "file1.txt", "file2.txt", etc. We are pattern matching, using a
+ similar technique to "regular
+ expressions"!
+
+
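The wildcard analogy is easy to try out. The following is a minimal sketch in Python (our choice for illustration, not something Privoxy itself uses), with a made-up list of filenames. It applies the "?" wildcard through the standard fnmatch module and shows an equivalent regular expression next to it:

```python
import re
from fnmatch import fnmatchcase

# Hypothetical filenames, invented for this illustration.
names = ["file1.txt", "file2.txt", "file10.txt", "notes.txt"]

# Shell-style wildcard: "?" stands for exactly one character.
wildcard_hits = [n for n in names if fnmatchcase(n, "file?.txt")]

# The same idea as a regular expression: "." is any single character,
# and the backslash makes the following period a literal period.
regex_hits = [n for n in names if re.fullmatch(r"file.\.txt", n)]

print(wildcard_hits)
print(regex_hits)
```

Both lists should contain only the names with a single character between "file" and ".txt".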
Regular expressions do essentially the same thing, but are much,
+ much more powerful. There are many more "special
+ characters" and ways of building complex patterns however. Let's
+ look at a few of the common ones, and then some examples:
+
+
+
+
+
. -
+ Matches any single character, e.g. "a", "A", "4", ":", or
+ "@".
+
+
+
+
+
+
+
+
? - The
+ preceding character or expression is matched ZERO or ONE times.
+ Either/or.
+
+
+
+
+
+
+
+
+ - The
+ preceding character or expression is matched ONE or MORE
+ times.
+
+
+
+
+
+
+
+
* - The
+ preceding character or expression is matched ZERO or MORE
+ times.
+
+
+
+
+
+
+
+
\ - The
+ "escape" character denotes that the
+ following character should be taken literally. This is used where
+ one of the special characters (e.g. ".") needs to be taken literally and not as a
+ special meta-character. Example: "example\.com", makes sure the period is
+ recognized only as a period (and not expanded to its
+ meta-character meaning of any single character).
+
+
+
+
+
+
+
+
[ ] -
+ Characters enclosed in brackets will be matched if any of the
+ enclosed characters are encountered. For instance, "[0-9]" matches any numeric digit (zero through
+ nine). As an example, we can combine this with "+" to match any digit one or more times:
+ "[0-9]+".
+
+
+
+
+
+
+
+
( ) -
+ parentheses are used to group a sub-expression, or multiple
+ sub-expressions.
+
+
+
+
+
+
+
+
| - The
+ "bar" character works like an
+ "or" conditional statement. A match is
+ successful if the sub-expression on either side of "|" matches. As an example: "/(this|that) example/" uses grouping and the bar
+ character and would match either "this
+ example" or "that example", and
+ nothing else.
+
+
+
+
+
These are just some of the ones you are likely to use when matching
+ URLs with Privoxy, and are a long way
+ from a definitive list. This is enough to get us started with a few
+ simple examples which may be more illuminating:
+
+
/.*/banners/.* - A simple example that uses
+ the common combination of "." and
+ "*" to denote any character, zero or more
+ times. In other words, any string at all. So we start with a literal
+ forward slash, then our regular expression pattern (".*") another literal forward slash, the string
+ "banners", another forward slash, and lastly
+ another ".*". We are building a directory
+ path here. This will match any file with the path that has a directory
+ named "banners" in it. The ".*" matches any characters, and this could conceivably
+ be more forward slashes, so it might expand into a much longer looking
+ path. For example, this could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or
+ just "/ads/banners/annoying.html", or almost an
+ infinite number of other possible combinations, just so it has
+ "banners" in the path somewhere.
+
+
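To experiment with this pattern outside of Privoxy, a rough sketch in Python follows. It assumes that Python's re module behaves closely enough to PCRE for a pattern this simple, and it uses re.match(), which only matches at the start of the string, as a stand-in for the way a path pattern is applied. The second path is invented for the example:

```python
import re

# The pattern from the text, compiled unchanged.
BANNERS = re.compile(r"/.*/banners/.*")

def banner_path_matches(path):
    """Return True if the pattern matches starting at the beginning of path."""
    return BANNERS.match(path) is not None

print(banner_path_matches("/eye/hate/spammers/banners/annoy_me_please.gif"))
print(banner_path_matches("/eye/hate/spammers/logos/site.gif"))
```

The first path contains a "banners" directory and should match; the second does not.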
And now something a little more complex:
+
+
/.*/adv((er)?ts?|ertis(ing|ements?))?/ - We
+ have several literal forward slashes again ("/"), so we are building another expression that is a
+ file path statement. We have another ".*",
+ so we are matching against any conceivable sub-path, just so it matches
+ our expression. The only true literal that must match our pattern is
+ adv, together with the forward
+ slashes. What comes after the "adv" string
+ is the interesting part.
+
+
Remember the "?" means the preceding
+ expression (either a literal character or anything grouped with
+ "(...)" in this case) can exist or not,
+ since this means either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as are the
+ individual sub-expression "(er)" and each "s" that is
+ followed by "?". The "|" means
+ "or". We have two of those. For instance,
+ "(ing|ements?)" can expand to match either
+ "ing" OR "ements?". What is
+ being done here is an attempt at matching as many variations of
+ "advertisement", and similar, as possible.
+ So this would expand to match just "adv", or
+ "advert", or "adverts", or "advertising",
+ or "advertisement", or "advertisements". You get the idea. But it would not
+ match "advertizements" (with a "z"). We could fix that by changing our regular
+ expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which
+ would then match either spelling.
+
+
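The two variants of this expression can be compared side by side. This is only a sketch in Python, assuming its re module treats these patterns the same way PCRE does. The "/x/" prefix on each path is invented just to satisfy the leading "/.*/":

```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
WITH_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")

# Directory names taken from the text above.
words = ["adv", "advert", "adverts", "advertising",
         "advertisement", "advertisements", "advertizements"]

for word in words:
    path = f"/x/{word}/"  # hypothetical path built around each word
    print(word, bool(ORIGINAL.match(path)), bool(WITH_Z.match(path)))
```

Only the last word, spelled with a "z", should differ between the two columns.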
/.*/advert[0-9]+\.(gif|jpe?g) - Again another
+ path statement with forward slashes. Anything in the square brackets
+ "[ ]" can be matched. This is using
+ "0-9" as a shorthand expression to mean any
+ digit zero through nine. It is the same as saying "0123456789". So any digit matches. The "+" means one or more of the preceding expression must
+ be included. The preceding expression here is what is in the square
+ brackets -- in this case, any digit zero through nine. Then, at the end,
+ we have a grouping: "(gif|jpe?g)". This
+ includes a "|", so this needs to match the
+ expression on either side of that bar character also. A simple
+ "gif" on one side, and the other side will
+ in turn match either "jpeg" or "jpg", since the "?" means
+ the letter "e" is optional and can be
+ matched once or not at all. So we are building an expression here to
+ match GIF or JPEG image files. It must include the literal
+ string "advert", then one or more digits,
+ and a "." (which is now a literal, and not a
+ special character, since it is escaped with "\"), and lastly either "gif", or "jpeg", or
+ "jpg". Some possible matches would include:
+ "//advert1.jpg", "/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not match
+ "advert1.gif" (no leading slash), or
+ "/adverts232.jpg" (the expression does not
+ include an "s"), or "/advert1.jsp" ("jsp" is not
+ in the expression anywhere).
+
+
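The matches and non-matches listed above can be checked with a short sketch. It is written in Python and assumes that re.match(), which only matches at the start of the string, is a fair approximation of how the path pattern is applied:

```python
import re

ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")

# Paths quoted in the text as matching and as not matching.
should_match = ["//advert1.jpg",
                "/nasty/ads/advert1234.gif",
                "/banners/from/hell/advert99.jpg"]
should_not_match = ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]

for path in should_match + should_not_match:
    print(path, ADVERT_IMAGE.match(path) is not None)
```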
We are barely scratching the surface of regular expressions here so
+ that you can understand the default Privoxy configuration files, and maybe use this
+ knowledge to customize your own installation. There is much, much more
+ that can be done with regular expressions. Now that you know enough to
+ get started, you can learn more on your own :/
Since Privoxy proxies each
+ requested web page, it is easy for Privoxy to trap certain special URLs. In this way,
+ we can talk directly to Privoxy, and
+ see how it is configured, see how our rules are being applied, change
+ these rules and other configuration options, and even turn Privoxy's filtering off, all with a web
+ browser.
+
+
The URLs listed below are the special ones that allow direct access
+ to Privoxy. Of course, Privoxy must be running to access these. If not,
+ you will get a friendly error message. Internet access is not necessary
+ either.
Toggle Privoxy on or off. This feature can be turned off/on in
+ the main config file. When toggled
+ "off", "Privoxy" continues to run, but only as a
+ pass-through proxy, with no actions taking place:
Below are some "bookmarklets" to allow
+ you to easily access a "mini" version of
+ some of Privoxy's special pages.
+ They are designed for MS Internet Explorer, but should work equally
+ well in Netscape, Mozilla, and other browsers which support
+ JavaScript. They are designed to run directly from your bookmarks -
+ not by clicking the links below (although that should work for
+ testing).
+
+
To save them, right-click the link and choose "Add to Favorites" (IE) or "Add
+ Bookmark" (Netscape). You will get a warning that the bookmark
+ "may not be safe" - just click OK. Then
+ you can run the Bookmarklet directly from your favorites/bookmarks.
+ For even faster access, you can put them on the "Links" bar (IE) or the "Personal
+ Toolbar" (Netscape), and run them with a single click.
Let's take a quick look at how some of Privoxy's core features are triggered, and the
+ ensuing sequence of events when a web page is requested by your
+ browser:
+
+
+
+
First, your web browser requests a web page. The browser knows
+ to send the request to Privoxy,
+ which will in turn, relay the request to the remote web server
+ after passing the following tests:
+
+
+
+
Privoxy traps any request for
+ its own internal CGI pages (e.g. http://p.p/) and sends the CGI page back to the
+ browser.
+
+
+
+
Next, Privoxy checks to see if
+ the URL matches any "+block" patterns. If so, the URL is then
+ blocked, and the remote web server will not be contacted. "+handle-as-image" and "+handle-as-empty-document" are then checked,
+ and if there is no match, an HTML "BLOCKED" page is sent back to the browser.
+ Otherwise, if it does match, an image is returned for the former,
+ and an empty text document for the latter. The type of image would
+ depend on the setting of "+set-image-blocker" (blank, checkerboard
+ pattern, or an HTTP redirect to an image elsewhere).
+
+
+
+
Untrusted URLs are blocked. If URLs are being added to the
+ trust file, then that is done.
+
+
+
+
If the URL pattern matches the "+fast-redirects" action, it is then processed.
+ Unwanted parts of the requested URL are stripped.
+
+
+
+
Now the rest of the client browser's request headers are
+ processed. If any of these match any of the relevant actions (e.g.
+ "+hide-user-agent", etc.), headers are
+ suppressed or forged as determined by these actions and their
+ parameters.
+
+
+
+
Now the web server starts sending its response back (i.e.
+ typically a web page).
If any "+filter" action or "+deanimate-gifs" action applies (and the
+ document type fits the action), the rest of the page is read into
+ memory (up to a configurable limit). Then the filter rules (from
+ default.filter and any other filter
+ files) are processed against the buffered content. Filters are
+ applied in the order they are specified in one of the filter files.
+ Animated GIFs, if present, are reduced to either the first or last
+ frame, depending on the action setting. The entire page, which is
+ now filtered, is then sent by Privoxy back to your browser.
+
+
If neither a "+filter" action nor "+deanimate-gifs" matches, then Privoxy passes the raw data through to the
+ client browser as it becomes available.
+
+
+
+
As the browser receives the now (possibly filtered) page
+ content, it reads and then requests any URLs that may be embedded
+ within the page source, e.g. ad images, stylesheets, JavaScript,
+ other HTML documents (e.g. frames), sounds, etc. For each of these
+ objects, the browser issues a separate request (this is easily
+ viewable in Privoxy's logs). And
+ each such request is in turn processed just as above. Note that a
+ complex web page will have many, many such embedded URLs. If these
+ secondary requests are to a different server, then quite possibly a
+ very differing set of actions is triggered.
+
+
+
+
NOTE: This is somewhat of a simplistic overview of what happens with
+ each URL request. For the sake of brevity and simplicity, we have
+ focused on Privoxy's core features
+ only.
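The blocking step in the sequence above can be pictured as a small decision function. This is only an illustrative model in Python, not Privoxy's code: the dictionary of actions, the function name, and the order in which the two "handle-as" actions are tested are all assumptions made for the sketch.

```python
def blocked_response(actions):
    """Describe the response for a request, given the actions that apply.

    `actions` is a hypothetical dict such as {"block": True}; it is not
    a Privoxy data structure.
    """
    if not actions.get("block"):
        return "not blocked: continue with the remaining checks"
    if actions.get("handle-as-image"):
        # The kind of image depends on the set-image-blocker setting.
        return "image: " + actions.get("set-image-blocker", "unspecified")
    if actions.get("handle-as-empty-document"):
        return "empty text document"
    return "HTML BLOCKED page"

print(blocked_response({"block": True}))
print(blocked_response({"block": True, "handle-as-image": True,
                        "set-image-blocker": "blank"}))
```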
The way Privoxy applies actions and filters to any given URL can be complex,
+ and it is not always easy to understand what is happening. Sometimes
+ we need to be able to see just what Privoxy is doing, especially if something
+ Privoxy is doing is inadvertently causing us a
+ problem. It can be a little daunting to look at the
+ actions and filters files themselves, since they tend to be filled with
+ regular expressions whose
+ consequences are not always so obvious.
+
+
One quick test to see if Privoxy is
+ causing a problem or not, is to disable it temporarily. This should be
+ the first troubleshooting step. See the Bookmarklets section on a quick
+ and easy way to do this (be sure to flush caches afterward!). Looking
+ at the logs is a good idea too. (Note that both the toggle feature and
+ logging are enabled via config file settings,
+ and may need to be turned "on".)
+
+
Another easy troubleshooting step, if you have done any
+ customization of your installation, is to revert to the installed
+ defaults and see if that helps. There are times the developers get
+ complaints about one thing or another, and the problem is more related
+ to a customized configuration issue.
+
+
Privoxy also provides the http://config.privoxy.org/show-url-info page that can show
+ us very specifically how actions are
+ being applied to any given URL. This is a big help for
+ troubleshooting.
+
+
First, enter one URL (or partial URL) at the prompt, and then
+ Privoxy will tell us how the current
+ configuration will handle it. This will not help with filtering effects
+ (i.e. the "+filter" action) from one of the filter files since
+ this is handled very differently and not so easy to trap! It also will
+ not tell you about any other URLs that may be embedded within the URL
+ you are testing. For instance, images such as ads are expressed as URLs
+ within the raw page source of HTML pages. So you will only get info for
+ the actual URL that is pasted into the prompt area -- not any sub-URLs.
+ If you want to know about embedded URLs like ads, you will have to dig
+ those out of the HTML source. Use your browser's "View Page Source" option for this. Or right click on
+ the ad, and grab the URL.
+
+
Let's try an example, google.com, and look at it one section at a time in a sample
+ configuration (your real configuration may vary):
This is telling us how we have defined our
- "actions", and
- which ones match for our test case, "google.com".
- Displayed are all the actions that are available to us. Remember,
- the + sign denotes "on". -
- denotes "off". So some are "on" here, but many
- are "off". Each example we try may provide a slightly different
- end result, depending on our configuration directives.
The first listing
- is for our default.action file. The large, multi-line
- listing, is how the actions are set to match for all URLs, i.e. our default
- settings. If you look at your "actions" file, this would be the
- section just below the "aliases" section near the top. This
- will apply to all URLs as signified by the single forward slash at the end
- of the listing -- " / ".
But we have defined additional actions that would be exceptions to these general
- rules, and then we list specific URLs (or patterns) that these exceptions
- would apply to. Last match wins. Just below this then are two explicit
- matches for ".google.com". The first is negating our previous
- cookie setting, which was for "+session-cookies-only"
- (i.e. not persistent). So we will allow persistent cookies for google, at
- least that is how it is in this example. The second turns
- off any "+fast-redirects"
- action, allowing this to take place unmolested. Note that there is a leading
- dot here -- ".google.com". This will match any hosts and
- sub-domains, in the google.com domain also, such as
- "www.google.com" or "mail.google.com". But it would not
- match "www.google.de"! So, apparently, we have these two actions
- defined as exceptions to the general rules at the top somewhere in the lower
- part of our default.action file, and
- "google.com" is referenced somewhere in these latter sections.
Then, for our user.action file, we again have no hits.
- So there is nothing google-specific that we might have added to our own, local
- configuration. If there were, those actions would over-rule any actions from
- previously processed files, such as default.action.
- user.action typically has the last word. This is the
- best place to put hard and fast exceptions.
And finally we pull it all together in the bottom section and summarize how
- Privoxy is applying all its "actions"
- to "google.com":
Final results:
-
+In file: user.action [ View ][ Edit ]
+(no matches in this file)
+
+
+
+
+
+
This is telling us how we have defined our "actions",
+ and which ones match for our test case, "google.com". Displayed are all the actions that are
+ available to us. Remember, the + sign denotes
+ "on". - denotes
+ "off". So some are "on" here, but many are "off". Each example we try may provide a slightly
+ different end result, depending on our configuration directives.
+
+
The first listing is for our default.action file. The large, multi-line listing, is
+ how the actions are set to match for all URLs, i.e. our default
+ settings. If you look at your "actions"
+ file, this would be the section just below the "aliases" section near the top. This will apply to all
+ URLs as signified by the single forward slash at the end of the listing
+ -- " / ".
+
+
But we have defined additional actions that would be exceptions to
+ these general rules, and then we list specific URLs (or patterns) that
+ these exceptions would apply to. Last match wins. Just below this then
+ are two explicit matches for ".google.com".
+ The first is negating our previous cookie setting, which was for
+ "+session-cookies-only" (i.e. not persistent). So we
+ will allow persistent cookies for google, at least that is how it is in
+ this example. The second turns off any "+fast-redirects" action, allowing this to take
+ place unmolested. Note that there is a leading dot here -- ".google.com". This will match any hosts and
+ sub-domains, in the google.com domain also, such as "www.google.com" or "mail.google.com". But it would not match "www.google.de"! So, apparently, we have these two
+ actions defined as exceptions to the general rules at the top somewhere
+ in the lower part of our default.action file,
+ and "google.com" is referenced somewhere in
+ these latter sections.
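As a rough sketch, the two exception sections described above might look like the following in default.action. Only the two actions and the ".google.com" pattern are taken from the listing; the layout, the comments, and the absence of other patterns in these sections are assumptions made for illustration.

```
# Hypothetical excerpt; the real sections may list other patterns as well.
# The leading dot covers www.google.com and mail.google.com,
# but not www.google.de.
{ -session-cookies-only }
.google.com

{ -fast-redirects }
.google.com
```

Because these sections come after the general " / " section, they are the later match and so override the defaults for this domain.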
+
+
Then, for our user.action file, we again
+ have no hits. So there is nothing google-specific that we might have
+ added to our own, local configuration. If there was, those actions
+ would over-rule any actions from previously processed files, such as
+ default.action. user.action typically has the last word. This is the
+ best place to put hard and fast exceptions.
+
+
And finally we pull it all together in the bottom section and
+ summarize how Privoxy is applying all
+ its "actions" to "google.com":
Notice the only difference here to the previous listing, is to
- "fast-redirects" and "session-cookies-only",
- which are activated specifically for this site in our configuration,
- and thus show in the "Final Results".
Now another example, "ad.doubleclick.net":
{ +block }
+
+
+
+
+
+
Notice that the only difference from the previous listing is in
+ "fast-redirects" and "session-cookies-only", which are activated specifically
+ for this site in our configuration, and thus show in the "Final Results".
We'll just show the interesting part here - the explicit matches. It is
- matched three different times. Two "+block" sections,
- and a "+block +handle-as-image",
- which is the expanded form of one of our aliases that had been defined as:
- "+block-as-image". ("Aliases" are defined in
- the first section of the actions file and typically used to combine more
- than one action.)
Any one of these would have done the trick and blocked this as an unwanted
- image. This is unnecessarily redundant since the last case effectively
- would also cover the first. No point in taking chances with these guys
- though ;-) Note that if you want an ad or obnoxious
- URL to be invisible, it should be defined as "ad.doubleclick.net"
- is done here -- as both a "+block"
- and an
- "+handle-as-image".
- The custom alias "+block-as-image" just
- simplifies the process and make it more readable.
One last example. Let's try "http://www.example.net/adsl/HOWTO/".
- This one is giving us problems. We are getting a blank page. Hmmm ...
We'll just show the interesting part here - the explicit matches. It
+ is matched three different times. Two "+block{}" sections, and a "+block{}
+ +handle-as-image", which is the expanded form of one of our
+ aliases that had been defined as: "+block-as-image". ("Aliases"
+ are defined in the first section of the actions file and typically used
+ to combine more than one action.)
+
+
Any one of these would have done the trick and blocked this as an
+ unwanted image. This is unnecessarily redundant since the last case
+ effectively would also cover the first. No point in taking chances with
+ these guys though ;-) Note that if you want an ad or obnoxious URL to
+ be invisible, it should be defined as "ad.doubleclick.net" is done here -- as both a "+block{}"
+ and a "+handle-as-image". The custom alias "+block-as-image" just
+ simplifies the process and makes it more readable.
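A minimal sketch of how such an alias could be defined and then used is shown below. The reason text inside "+block{...}" is a placeholder chosen for this example, and the exact definition in your own actions file may differ.

```
{{alias}}
# Hypothetical alias definition: one name standing in for two actions.
+block-as-image = +block{Blocked image request.} +handle-as-image

# Using the alias has the same effect as listing both actions.
{ +block-as-image }
ad.doubleclick.net
```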
+
+
One last example. Let's try "http://www.example.net/adsl/HOWTO/". This one is giving
+ us problems. We are getting a blank page. Hmmm ...
Ooops, the "/adsl/" is matching "/ads" in our
- configuration! But we did not want this at all! Now we see why we get the
- blank page. It is actually triggering two different actions here, and
- the effects are aggregated so that the URL is blocked, and Privoxy is told
- to treat the block as if it were an image. But this is, of course, all wrong.
- We could now add a new action below this (or better in our own
- user.action file) that explicitly
- un blocks (
- "{-block}") paths with
- "adsl" in them (remember, last match in the configuration
- wins). There are various ways to handle such exceptions. Example:
{ -block }
- /adsl
Now the page displays ;-)
- Remember to flush your browser's caches when making these kinds of changes to
- your configuration to insure that you get a freshly delivered page! Or, try
- using Shift+Reload.
But now what about a situation where we get no explicit matches like
- we did with:
{ +block +handle-as-image }
- /ads
That actually was very helpful and pointed us quickly to where the problem
- was. If you don't get this kind of match, then it means one of the default
- rules in the first section of default.action is causing
- the problem. This would require some guesswork, and maybe a little trial and
- error to isolate the offending rule. One likely cause would be one of the
- "+filter" actions.
- These tend to be harder to troubleshoot.
- Try adding the URL for the site to one of aliases that turn off
- "+filter":
Ooops, the "/adsl/" is matching
+ "/ads" in our configuration! But we did not
+ want this at all! Now we see why we get the blank page. It is actually
+ triggering two different actions here, and the effects are aggregated
+ so that the URL is blocked, and Privoxy is told to treat the block as if it were
+ an image. But this is, of course, all wrong. We could now add a new
+ action below this (or better in our own user.action file) that explicitly unblocks ("{-block}")
+ paths with "adsl" in them (remember, last
+ match in the configuration wins). There are various ways to handle such
+ exceptions. Example:
+
+
+
+
+
+
{ -block }
+ /adsl
+
+
+
+
+
+
Now the page displays ;-) Remember to flush your browser's caches
+ when making these kinds of changes to your configuration to ensure that
+ you get a freshly delivered page! Or, try using Shift+Reload.
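As one of the other possible ways to handle this exception, the unblock rule could be limited to the single host from our test case instead of applying to every site. This is only a sketch; whether the narrower or the broader pattern suits you better depends on your own configuration.

```
# Hypothetical variant for user.action: unblock the path on one host only.
{ -block }
www.example.net/adsl
```

With this form, "/adsl" paths on other hosts would still be caught by the "/ads" rule.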
+
+
But now what about a situation where we get no explicit matches like
+ we did with:
That actually was very helpful and pointed us quickly to where the
+ problem was. If you don't get this kind of match, then it means one of
+ the default rules in the first section of default.action is causing the problem. This would
+ require some guesswork, and maybe a little trial and error to isolate
+ the offending rule. One likely cause would be one of the "+filter"
+ actions. These tend to be harder to troubleshoot. Try adding the URL
+ for the site to one of the aliases that turn off "+filter":
"{ shop }" is an "alias" that expands to
- "{ -filter -session-cookies-only }".
- Or you could do your own exception to negate filtering:
{ -filter }
+ .forbes.com
+
+
+
+
+
+
"{ shop }" is an
+ "alias" that expands to "{ -filter -session-cookies-only
+ }". Or you could do your own exception to negate
+ filtering:
+
+
+
+
+
+
{ -filter }
# Disable ALL filter actions for sites in this section
.forbes.com
developer.ibm.com
- localhost
This would turn off all filtering for these sites. This is best
- put in user.action, for local site
- exceptions. Note that when a simple domain pattern is used by itself (without
- the subsequent path portion), all sub-pages within that domain are included
- automatcially in the scope of the action.
Images that are inexplicably being blocked, may well be hitting the
-"+filter{banners-by-size}"
- rule, which assumes
- that images of certain sizes are ad banners (works well
- most of the time since these tend to be standardized).
"{ fragile }" is an alias that disables most
- actions that are the most likely to cause trouble. This can be used as a
- last resort for problem sites.
{ fragile }
+ localhost
+
+
+
+
+
+
This would turn off all filtering for these sites. This is best put
+ in user.action, for local site exceptions.
+ Note that when a simple domain pattern is used by itself (without the
+ subsequent path portion), all sub-pages within that domain are included
+ automatically in the scope of the action.
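To illustrate the difference in scope, here is a small sketch contrasting a domain-only pattern with one that carries a path. The "/docs/" path is invented for this example and is not part of the configuration shown above.

```
{ -filter }
# Domain only: every page under this domain is covered.
.forbes.com
# Domain plus path (hypothetical): only URLs whose path starts with /docs/.
developer.ibm.com/docs/
```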
+
+
Images that are inexplicably being blocked may well be hitting the
+ "+filter{banners-by-size}" rule, which assumes that
+ images of certain sizes are ad banners (works well most of the time since these
+ tend to be standardized).
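If this filter is the suspect, a narrower exception than turning off all filtering is to disable just that one filter for the affected site. A sketch, assuming the problem site is "www.example.net" (a stand-in host):

```
# Hypothetical user.action entry: switch off only the banners-by-size filter.
{ -filter{banners-by-size} }
www.example.net
```

All other filters stay active for that site, which makes it easier to confirm whether this rule was the cause.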
+
+
"{ fragile }" is
+ an alias that disables most of the actions that are most likely to cause
+ trouble. This can be used as a last resort for problem sites.
+
+
+
+
+
+
{ fragile }
# Handle with care: easy to break
mail.google.
- mybank.example.com
Remember to flush caches! Note that the
- mail.google reference lacks the TLD portion (e.g.
- ".com". This will effectively match any TLD with
- google in it, such as mail.google.de,
- just as an example.
- If this still does not work, you will have to go through the remaining
- actions one by one to find which one(s) is causing the problem.
\ No newline at end of file
+ mybank.example.com
+
+
+
+
+
+
Remember to flush
+ caches! Note that the mail.google
+ reference lacks the TLD portion (e.g. ".com"). This will effectively match any TLD with
+ google in it, such as mail.google.de, just as an example.
+
+
If this still does not work, you will have to go through the
+ remaining actions one by one to find which one(s) is causing the
+ problem.