From 5dcee0c0f1752475759b5b6acb8b7d65b4180fb1 Mon Sep 17 00:00:00 2001 From: oes Date: Wed, 17 Apr 2002 18:04:16 +0000 Subject: [PATCH] Proofreading part 2 --- doc/source/user-manual.sgml | 297 ++++++++++++++++++++++++------------ 1 file changed, 198 insertions(+), 99 deletions(-) diff --git a/doc/source/user-manual.sgml b/doc/source/user-manual.sgml index bc0b046b..02f985dd 100644 --- a/doc/source/user-manual.sgml +++ b/doc/source/user-manual.sgml @@ -24,7 +24,7 @@ This file belongs into ijbswa.sourceforge.net:/home/groups/i/ij/ijbswa/htdocs/ - $Id: user-manual.sgml,v 1.76 2002/04/16 04:25:51 hal9 Exp $ + $Id: user-manual.sgml,v 1.77 2002/04/17 13:51:23 oes Exp $ Written by and Copyright (C) 2001 the SourceForge Privoxy team. http://www.privoxy.org/ @@ -45,7 +45,7 @@ Privoxy User Manual -$Id: user-manual.sgml,v 1.76 2002/04/16 04:25:51 hal9 Exp $ +$Id: user-manual.sgml,v 1.77 2002/04/17 13:51:23 oes Exp $ @@ -2354,91 +2354,148 @@ Removed references to Win32. HB 09/23/01 The Actions File - The default.action file (formerly + The actions file (default.action, formerly: actionsfile or ijb.action) is used - to define what actions Privoxy takes, and thus - determines how ad images, cookies and various other aspects of HTTP content - and transactions are handled. These can be accepted or rejected for all - sites, or just those sites you choose. See below for a complete list of - actions. + to define what actions Privoxy takes for which + URLs, and thus determines how ad images, cookies and various other aspects + of HTTP content and transactions are handled on which sites (or even parts + thereof). + Anything you want can blocked, including ads, banners, or just some obnoxious URL that you would rather not see. Cookies can be accepted or rejected, or - accepted only during the current browser session (i.e. not written to disk). - Changes to default.action should be immediately visible - to Privoxy without the need to restart. 
+ accepted only during the current browser session (i.e. not written to disk), + content can be modified, JavaScripts tamed, user-tracking fooled, and much more. + See below for a complete list of available actions. + + +Finding the Right Mix - Note that some sites may misbehave, or possibly not work at all with some - actions. This may require some tinkering with the rules to get the most - mileage of Privoxy's features, and still be - able to see and enjoy just what you want to. There is no general rule of - thumb on these things. There just are too many variables, and sites are - always changing. - + Note that some actions like cookie suppression or script disabling may + render sites that rely on these techniques unusable. + Finding the right mix of actions is not easy and certainly a matter of personal + taste. In general, it can be said that the more aggressive + your default settings (in the top section of the actions file) are, + the more exceptions for trusted sites you will have to + make later. If, for example, you want to kill popup windows by default, you'll + have to make exceptions from that rule for sites that you regularly use + and that require popups for actually useful content, like maybe your bank, + favourite shop, or newspaper. - The easiest way to edit the actions file is with a browser by - loading http://config.privoxy.org/ - (shortcut: http://p.p/), and then select - Edit Actions List. A text editor can also be used. + We have tried to provide you with reasonable rules to start from in the + distribution actions file. But there is no general rule of thumb on these + things. There just are too many variables, and sites are constantly changing. + Sooner or later you will want to change the rules (and read this chapter). + + + +How to Edit - To determine which actions apply to a request, the URL of the request is - compared to all patterns in this file. 
Every time it matches, the list of - applicable actions for the URL is incrementally updated. You can trace - this process by visiting http://p.p/show-url-info. + The easiest way to edit the actions file is with a browser by + using our browser-based editor, which is available at http://config.privoxy.org/edit-actions. - - There are four types of lines in this file: comments (begin with a - # character), actions, aliases and patterns, all of which are - explained below, as well as the configuration file syntax that - Privoxy understands. - + If you prefer plain text editing to GUIs, you can of course also directly edit the + default.action file. + - -URL Domain and Path Syntax +How Actions are Applied to URLs - Generally, a pattern has the form <domain>/<path>, where both the - <domain> and <path> part are optional. If you only specify a - domain part, the / can be left out: + The actions file is separated into sections. There are special sections, + like the alias sections which will be discussed later. For now let's + concentrate on regular sections: They have a heading line (often split + up into multiple lines for readability) which consists of a list of actions, + separated by whitespace and enclosed in curly braces. Below that, there + is a list of URL patterns, each on a separate line. - www.example.com - is a domain only pattern and will match any request to - www.example.com. + To determine which actions apply to a request, the URL of the request is + compared to all patterns in this file. Every time it matches, the list of + applicable actions for the URL is incrementally updated, using the heading + of the section in which the pattern is located. If multiple matches for + the same URL set the same action differently, the last match wins. - www.example.com/ - means exactly the same. + You can trace this process by visiting http://config.privoxy.org/show-url-info. 
- www.example.com/index.html - matches only the single - document /index.html on www.example.com. + More detail on this is provided in the Appendix Anatomy of an Action. + + + +Patterns - /index.html - matches the document /index.html, - regardless of the domain. So would match any page named index.html - on any site. + Generally, a pattern has the form <domain>/<path>, where both the + <domain> and <path> part are optional. If you only specify a + domain part, the / can be left out: - - index.html - matches nothing, since it would be - interpreted as a domain name and there is no top-level domain called - .html. - + + + www.example.com + + + is a domain only pattern and will match any request to www.example.com, + regardless of which document on that server is requested. + + + + + www.example.com/ + + + means exactly the same. + + + + + www.example.com/index.html + + + matches only the single document /index.html + on www.example.com. + + + + + /index.html + + + matches the document /index.html, regardless of the domain, + i.e. on any web server. + + + + + index.html + + + matches nothing, since it would be interpreted as a domain name and + there is no top-level domain called .html. + + + + + +The Domain Pattern The matching of the domain part offers some flexible options: if the @@ -2446,79 +2503,118 @@ Removed references to Win32. HB 09/23/01 For example: - - .example.com - matches any domain or sub-domain that - ENDS in .example.com. - - - - www. - matches any domain that STARTS with - www. - + + + .example.com + + + matches any domain that ENDS in + .example.com + + + + + www. + + + matches any domain that STARTS with + www. + + + + + .example. + + + matches any domain that CONTAINS .example. + (Correctly speaking: It matches any FQDN that contains example as a domain.) + + + + Additionally, there are wild-cards that you can use in the domain names themselves. They work pretty similar to shell wild-cards: * stands for zero or more arbitrary characters, ? 
stands for - any single character. And you can define character classes in square - brackets and they can be freely mixed: + any single character, you can define character classes in square + brackets and all of that can be freely mixed: - - ad*.example.com - matches adserver.example.com, - ads.example.com, etc but not sfads.example.com. - + + + ad*.example.com + + + matches adserver.example.com, + ads.example.com, etc but not sfads.example.com + + + + + *ad*.example.com + + + matches all of the above, and then some. + + + + + .?pix.com + + + matches www.ipix.com, + pictures.epix.com, a.b.c.d.e.upix.com etc. + + + + + www[1-9a-ez].example.c* + + + matches www1.example.com, + www4.example.cc, wwwd.example.cy, + wwwz.example.com etc., but not + wwww.example.com. + + + + - - *ad*.example.com - matches all of the above, and then some. - + - - .?pix.com - matches www.ipix.com, - pictures.epix.com, a.b.c.d.e.upix.com, etc. - +The Path Pattern - www[1-9a-ez].example.com - matches www1.example.com, - www4.example.com, wwwd.example.com, - wwwz.example.com, etc., but not - wwww.example.com. + Privoxy uses Perl compatible regular expressions + (through the PCRE library) for + matching the path. - If Privoxy was compiled with - pcre support (the default), Perl compatible regular expressions - can be used. These are more flexible and powerful than other types - of regular expressions. See the pcre/docs/ directory or man - perlre (also available on http://www.perldoc.com/perl5.6/pod/perlre.html) - for details. A brief discussion of regular expressions is in the - Appendix. For instance: + There is an Appendix with a brief quickstart into regular + expressions, and full (very technical) documentation on PCRE regex syntax is available online + at http://www.pcre.org/man.txt. + You might also find the Perl man page on regular expressions (man perlre) + useful, which is available online at http://www.perldoc.com/perl5.6/pod/perlre.html. 
- /.*/advert[0-9]+\.jpe?g - would match a URL from any - domain, with any path that includes advert followed - immediately by one or more digits, then a . and ending in - either jpeg or jpg. So we match - example.com/ads/advert2.jpg, and - www.example.com/ads/banners/advert39.jpeg, but not - www.example.com/ads/banners/advert39.gif (no gifs in the - example pattern). + Note that the pattern is automatically left-anchored at the /, + i.e. it matches as if it started with a ^. - Please note that matching in the path is case + Please also note that matching in the path is case INSENSITIVE by default, but you can switch to case sensitive at any point in the pattern by using the (?-i) switch: - - - - www.example.com/(?-i)PaTtErN.* - will match only - documents whose path starts with PaTtErN in + www.example.com/(?-i)PaTtErN.* will match only + documents whose path starts with PaTtErN in exactly this capitalization. + @@ -4338,6 +4434,9 @@ Requests Temple Place - Suite 330, Boston, MA 02111-1307, USA. $Log: user-manual.sgml,v $ + Revision 1.77 2002/04/17 13:51:23 oes + Proofreading, part one + Revision 1.76 2002/04/16 04:25:51 hal9 -Added 'Note to Upgraders' and re-ordered the 'Quickstart' section. -Note about proxy may need requests to re-read config files. -- 2.39.2