1 Internet Junkbuster User's Manual
3 By: Junkbuster Developers 10/15/01
7 The user manual gives the users information on how to install and configure
8 Internet Junkbuster. Internet Junkbuster is an application that provides
9 privacy and security to users of the World Wide Web.
12 You can find the latest version of the
13 user manual at http://ijbswa.sourceforge.net/doc/user-manual/
15 Feel free to send a note to the developers at
16 ijbswa-developers@lists.sourceforge.net
18 -----------------------------------------------------------
23 Junkbuster Configuration
24 Quickstart to Using Junkbuster
25 Contact the Developers
32 Internet Junkbuster is a web proxy with advanced filtering capabilities for
33 protecting privacy, filtering web page content, managing cookies, controlling
34 access, and removing ads, banners, pop-ups and other obnoxious Internet Junk.
35 Junkbuster has a very flexible configuration and can be customized to suit
36 individual needs and tastes. Internet Junkbuster has application for both
37 stand-alone systems and multi-user networks.
39 This documentation is included with the current development version of Internet
40 Junkbuster and is incomplete at this point. The most up to date reference for
41 the time being is still the comments in the source files and in the individual
42 configuration files. Development of version 3.0 is currently underway, and
includes many significant changes and enhancements over earlier versions. The
44 target release date for stable v3.0 is December 2001.
46 Since this is a development version, some features are in the process of being
47 implemented. This documentation may be slightly out of sync as a result. And
48 there are bugs, though hopefully not many!
50 -------------------------------------------------------------------------------
54 In addition to Junkbuster's traditional features of ad and banner blocking and
55 cookie management, this is a list of new features currently under development:
57 * Modularized configuration that will allow for system wide settings, and
58 individual user settings.
60 * A browser based GUI configuration utility (not finished).
62 * Blocking of annoying pop-up browser windows (previously available as a
65 * Partial support for HTTP/1.1.
67 * Support for Perl Compatible Regular Expressions in the configuration files,
68 and generally a more sophisticated configuration syntax over previous
71 * Web page content filtering.
75 -------------------------------------------------------------------------------
79 Junkbuster is available as raw source code, or pre-compiled binaries. See the
80 Junkbuster Home Page for current release info. Junkbuster is also available via
81 CVS. This is the recommended approach at this time. But please be aware that
82 CVS is constantly changing, and it may break in mysterious ways.
84 -------------------------------------------------------------------------------
88 For gzipped tar archives, unpack the source:
90 tar zxvf ijb_source_2.9*
94 For retrieving the current CVS sources, you'll need the CVS package installed
95 first. To download CVS source:
97 cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login
98 cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co current
102 This will create a directory named current/, which will contain the source
105 Then, in either case, to build from source:
113 For Redhat and SuSE Linux RPM packages, see below.
115 -------------------------------------------------------------------------------
119 To build Redhat RPM packages, install source as above. Then:
125 This will create both binary and src RPMs in the usual places. Example:
/usr/src/redhat/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
129 /usr/src/redhat/SRPMS/junkbuster-2.9.9-1.src.rpm
131 To install, of course:
133 rpm -Uvv /usr/src/redhat/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
136 This will place the Junkbuster configuration files in /etc/junkbuster/, and log
137 files in /var/log/junkbuster/.
139 -------------------------------------------------------------------------------
143 To build SuSE RPM packages, install source as above. Then:
149 This will create both binary and src RPMs in the usual places. Example:
151 /usr/src/suse/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
153 /usr/src/suse/SRPMS/junkbuster-2.9.9-1.src.rpm
155 To install, of course:
157 rpm -Uvv /usr/src/suse/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
160 This will place the Junkbuster configuration files in /etc/junkbuster/, and log
161 files in /var/log/junkbuster/.
163 -------------------------------------------------------------------------------
167 The OS/2 version of Junkbuster requires the EMX runtime library to be
168 installed. The EMX runtime library is available on the hobbes OS/2 archive,
among many other locations:
http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=emxrt.zip&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fdev%2Femx%2Fv0.9d
Junkbuster is packaged in a WarpIN self-installing archive. The
173 self-installing program will be named depending on the release version,
174 something like: ijbos123.exe. In order to install it, simply run this
175 executable or double-click on its icon and follow the WarpIN installation
176 panels. A shadow of the Junkbuster executable will be placed in your startup
177 folder so it will start automatically whenever OS/2 starts.
179 The directory you choose to install Junkbuster into will contain all of the
182 If you would like to build binary images on OS/2 yourself, you will need a
183 working EMX/GCC environment, plus several Unix-like tools. The Hobbes OS/2
184 archive is a good place to start when building such an environment. A set of
Unix-like tools named gnupack is located here:
http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&key=gnupack&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fapps
188 Once you have the source code unpacked as above, you can build the binaries
189 from the current/ directory:
196 -------------------------------------------------------------------------------
200 Click-click. (I need help on this. Not a clue here. Also for configuration
203 -------------------------------------------------------------------------------
207 Some quick notes on other Operating Systems.
209 For FreeBSD (and other *BSDs?), the build will need gmake instead of the
210 included make. gmake is available from http://www.gnu.org. The rest should be
211 the same as above for Linux/Unix.
213 -------------------------------------------------------------------------------
215 Junkbuster Configuration
For Unix, *BSD and Linux, all configuration files are located in
/etc/junkbuster/ by default. For MS Windows and OS/2, these are all in the same directory as
219 the Junkbuster executable. The name and number of configuration files has
220 changed from previous versions, and is subject to change as development
223 The installed defaults provide a reasonable starting point. For the time being,
224 there are only three default configuration files (this will change in time):
226 * The main configuration file is named config on Linux, Unix, BSD, and OS/2,
227 and junkbustr.txt on Windows. On Amiga, it is AmiTCP:db/junkbuster/config.
* The actionsfile file is used to define various actions relating to images,
banners, pop-ups and cookies.
232 * The re_filterfile file can be used to rewrite the raw page content,
233 including text as well as embedded HTML and JavaScript.
235 actionsfile and re_filterfile can use Perl style regular expressions for
236 maximum flexibility. All files use the "#" character to denote a comment. Such
237 lines are not processed by Junkbuster. After making any changes, restart
238 Junkbuster in order for the changes to take effect.
240 -------------------------------------------------------------------------------
242 The Main Configuration File
244 Again, the main configuration file is named config on Linux/Unix/BSD and OS/2,
245 and junkbustr.txt on Windows. Configuration lines consist of an initial keyword
246 followed by a list of values, all separated by whitespace (any number of spaces
247 or tabs). For example:
249 blockfile blocklist.ini
252 Indicates that the blockfile is named "blocklist.ini".
254 The "#" indicates a comment. Any part of a line following a "#" is ignored,
255 except if the "#" is preceded by a "\".
257 Thus, by placing a "#" at the start of an existing configuration line, you can
258 make it a comment and it will be treated as if it weren't there. This is called
259 "commenting out" an option and can be useful to turn off features: If you
260 comment out the "logfile" line, junkbuster will not log to a file at all. Watch
261 for the "default:" section in each explanation to see what happens if the
262 option is left unset (or commented out).
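The keyword/value and comment rules above can be sketched in a few lines of Python. This is only an illustration, not Junkbuster's own parser: the function name parse_config_line is made up for this example, and the sketch assumes that a "\#" sequence leaves a literal "#" in the value (the manual only says that such a "#" does not start a comment).

```python
def parse_config_line(line):
    """Return (keyword, values) for one line, or None if only a comment or blank is left."""
    kept = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] == "#":
            kept.append("#")  # assumption: "\#" yields a literal "#"
            i += 2
            continue
        if ch == "#":
            break  # unescaped "#": the rest of the line is a comment
        kept.append(ch)
        i += 1
    fields = "".join(kept).split()  # any number of spaces or tabs separates fields
    if not fields:
        return None
    return fields[0], fields[1:]


if __name__ == "__main__":
    print(parse_config_line("blockfile blocklist.ini"))
    print(parse_config_line("#logfile logfile"))
```

A commented-out line such as "#logfile logfile" yields None here, which mirrors the "treated as if it weren't there" behaviour described above.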
264 Long lines can be continued on the next line by using a "\" as the very last
267 There are various aspects of Junkbuster behavior that can be adjusted.
269 -------------------------------------------------------------------------------
271 Defining Other Configuration Files
273 Junkbuster can use a number of other files to tell it what ads to block, what
274 cookies to accept, etc. This section of the configuration file tells Junkbuster
275 where to find all those other files.
277 On Windows, Junkbuster looks for these files in the same directory as the
278 executable. On Unix and OS/2, Junkbuster looks for these files in the current
279 working directory. In either case, an absolute path name can be used to avoid
282 When development goes modular and multiuser, the blocker, filter, and per-user
283 config will be stored in subdirectories of "confdir". For now, only confdir/
284 templates is used for storing HTML templates for CGI results.
286 The location of the configuration files:
288 confdir /etc/junkbuster # No trailing /, please.
291 The directory where all logging (i.e. logfile and jarfile) takes place. No
292 trailing "/", please:
294 logdir /var/log/junkbuster
297 Note that all file specifications below are relative to the above two
300 The "actionsfile" contains patterns to specify the actions to apply to requests
301 for each site. Default: Cookies to and from all destinations are filtered.
302 Popups are disabled for all sites. All sites are filtered if re_filterfile
303 specified. No sites are blocked. An empty image is displayed for filtered ads
304 and other images (formerly "tinygif"). The syntax of this file is explained in
307 actionsfile actionsfile
310 The "re_filterfile" file contains content modification rules. These rules
311 permit powerful changes on the content of Web pages, e.g., you could disable
312 your favourite JavaScript annoyances, rewrite the actual content, or just have
313 some fun replacing "Microsoft" with "MicroSuck" wherever it appears on a Web
314 page. Default: No content modification, or whatever the developers are playing
317 re_filterfile re_filterfile
320 The logfile is where all logging and error messages are written. The logfile
321 can be useful for tracking down a problem with Junkbuster (e.g., it's not
322 blocking an ad you think it should block) but in most cases you probably will
325 Your logfile will grow indefinitely, and you will probably want to periodically
326 remove it. On Unix systems, you can do this with a cron job (see "man cron").
327 For Redhat, a logrotate script has been included.
329 On SuSE Linux systems, you can place a line like "/var/log/junkbuster.* +1024k
330 644 nobody.nogroup" in /etc/logfiles, with the effect that cron.daily will
331 automatically archive, gzip, and empty the log, when it exceeds 1M size.
Default: Log to a file named logfile. Comment out to disable logging.
338 The "jarfile" defines where Junkbuster stores the cookies it intercepts. Note
339 that if you use a "jarfile", it may grow quite large. Default: Don't store
345 If you specify a "trustfile", Junkbuster will only allow access to sites that
346 are named in the trustfile. You can also mark sites as trusted referrers, with
347 the effect that access to untrusted sites will be granted, if a link from a
348 trusted referrer was used. The link target will then be added to the
349 "trustfile". This is a very restrictive feature that typical users most
probably want to leave disabled. Default: Disabled, don't use the trust
356 If you use the trust mechanism, it is a good idea to write up some online
357 documentation about your blocking policy and to specify the URL(s) here. They
358 will appear on the page that your users receive when they try to access
359 untrusted content. Use multiple times for multiple URLs. Default: Don't display
360 links on the "untrusted" info page.
362 trust-info-url http://www.your-site.com/why_we_block.html
363 trust-info-url http://www.your-site.com/what_we_allow.html
366 -------------------------------------------------------------------------------
368 Other Configuration Options
370 This part of the configuration file contains options that control how
373 "Admin-address" should be set to the email address of the proxy administrator.
374 It is used in many of the proxy-generated pages. Default: fill@me.in.please.
376 #admin-address fill@me.in.please
379 "Proxy-info-url" can be set to a URL that contains more info about this
Junkbuster installation, its configuration and policies. It is used in many of
381 the proxy-generated pages and its use is highly recommended in multi-user
382 installations, since your users will want to know why certain content is
383 blocked or modified. Default: Don't show a link to online documentation.
385 proxy-info-url http://www.your-site.com/proxy.html
388 "Listen-address" specifies the address and port where Junkbuster will listen
389 for connections from your Web browser. The default is to listen on the
390 localhost port 8000, and this is suitable for most users. (In your web browser,
391 under proxy configuration, list the proxy server as "localhost" and the port as
394 If you already have another service running on port 8000, or if you want to
395 serve requests from other machines (e.g. on your local network) as well, you
396 will need to override the default. The syntax is "listen-address
[<ip-address>]:<port>". If you leave out the IP address, junkbuster will bind to
398 all interfaces (addresses) on your machine and may become reachable from the
399 internet. In that case, consider using access control lists (acl's) (see
402 For example, suppose you are running Junkbuster on a machine which has the
403 address 192.168.0.1 on your local private network (192.168.0.0) and has another
404 outside connection with a different address. You want it to serve requests from
407 listen-address 192.168.0.1:8000
410 If you want it to listen on all addresses (including the outside connection):
415 If you do this, consider using ACLs (see "aclfile" above). Note: you will need
416 to point your browser(s) to the address and port that you have configured here.
417 Default: localhost:8000 (127.0.0.1:8000).
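To make the "[<ip-address>]:<port>" form concrete, here is a small Python sketch that splits such a value into its two parts. The function name parse_listen_address is hypothetical, and the sketch only illustrates the syntax described above; it is not how Junkbuster itself reads the option.

```python
def parse_listen_address(value):
    """Split "[<ip-address>]:<port>" into (host, port); host is None when omitted."""
    host, sep, port = value.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError("expected [<ip-address>]:<port>, got %r" % value)
    # An empty host part means "bind to all interfaces" according to the text above.
    return (host or None), int(port)


if __name__ == "__main__":
    for example in ("192.168.0.1:8000", ":8000", "localhost:8000"):
        print(example, "->", parse_listen_address(example))
```

A result with host None corresponds to the "all interfaces" case, which is the one where access controls are worth considering.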
419 The debug option sets the level of debugging information to log in the logfile
420 (and to the console in the Windows version). A debug level of 1 is informative
421 because it will show you each request as it happens. Higher levels of debug are
422 probably only of interest to developers.
424 debug 1 # GPC = show each GET/POST/CONNECT request
425 debug 2 # CONN = show each connection status
426 debug 4 # IO = show I/O status
427 debug 8 # HDR = show header parsing
428 debug 16 # LOG = log all data into the logfile
429 debug 32 # FRC = debug force feature
430 debug 64 # REF = debug regular expression filter
431 debug 128 # = debug fast redirects
432 debug 256 # = debug GIF deanimation
433 debug 512 # CLF = Common Log Format
434 debug 1024 # = debug kill popups
435 debug 4096 # INFO = Startup banner and warnings.
436 debug 8192 # ERROR = Non-fatal errors
439 It is highly recommended that you enable ERROR reporting (debug 8192), at least
440 until the next stable release.
442 The reporting of FATAL errors (i.e. ones which crash JunkBuster) is always on
443 and cannot be disabled.
445 If you want to use CLF (Common Log Format), you should set "debug 512" ONLY, do
446 not enable anything else.
Multiple "debug" directives are OK - they're logical-OR'd together.
450 debug 15 # same as setting the first 4 listed above
debug 8192 # Errors - *we highly recommend enabling this*
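Since the debug levels are bit flags, combining several "debug" lines is plain bitwise OR. The short Python sketch below shows the arithmetic. The names DEBUG_FLAGS, combine and describe are made up for this example, and only the levels that have an abbreviation in the table above are listed.

```python
# Debug levels and their abbreviations, copied from the table above.
DEBUG_FLAGS = {
    1: "GPC", 2: "CONN", 4: "IO", 8: "HDR", 16: "LOG",
    32: "FRC", 64: "REF", 512: "CLF", 4096: "INFO", 8192: "ERROR",
}


def combine(levels):
    """OR together the values of several "debug" directives."""
    total = 0
    for level in levels:
        total |= level
    return total


def describe(level):
    """List the abbreviations of all flags that are set in a combined level."""
    return [name for bit, name in sorted(DEBUG_FLAGS.items()) if level & bit]


if __name__ == "__main__":
    print(combine([1, 2, 4, 8]))             # the first four levels
    print(describe(combine([1, 4096, 8192])))
```

So "debug 1", "debug 4096" and "debug 8192" on three separate lines have the same effect as a single "debug 12289".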
460 Junkbuster normally uses "multi-threading", a software technique that permits
461 it to handle many different requests simultaneously. In some cases you may wish
462 to disable this -- particularly if you're trying to debug a problem. The
463 "single-threaded" option forces Junkbuster to handle requests sequentially.
464 Default: Multi-threaded mode.
469 "toggle" allows you to temporarily disable all Junkbuster's filtering. Just set
472 The Windows version of Junkbuster puts an icon in the system tray, which allows
473 you to change this option without having to edit this file. If you right-click
474 on that icon (or select the "Options" menu), one choice is "Enable". Clicking
475 on enable toggles Junkbuster on and off. This is useful if you want to
476 temporarily disable Junkbuster, e.g., to access a site that requires cookies
477 which you normally have blocked.
479 "toggle 1" means Junkbuster runs normally, "toggle 0" means that Junkbuster
480 becomes a non-anonymizing non-blocking proxy. Default: 1.
485 -------------------------------------------------------------------------------
487 Access Control List (ACL)
489 Access controls are included at the request of some ISPs and systems
490 administrators, and are not usually needed by individual users. Please note the
491 warnings in the FAQ that this proxy is not intended to be a substitute for a
492 firewall or to encourage anyone to defer addressing basic security weaknesses.
494 If no access settings are specified, the proxy talks to anyone that connects.
If any access settings are specified, then the proxy talks only to IP
496 addresses permitted somewhere in this file and not denied later in this file.
498 Summary -- if using an ACL:
500 Client must have permission to receive service.
502 LAST match in ACL wins.
504 Default behavior is to deny service.
506 The syntax for an entry in the Access Control List is:
508 ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ]
511 Where the individual fields are:
513 ACTION = "permit-access" or "deny-access"
515 SRC_ADDR = client hostname or dotted IP address
516 SRC_MASKLEN = number of bits in the subnet mask for the source
518 DST_ADDR = server or forwarder hostname or dotted IP address
519 DST_MASKLEN = number of bits in the subnet mask for the target
522 The field separator (FS) is whitespace (space or tab).
524 IMPORTANT NOTE: If the junkbuster is using a forwarder (see below) or a gateway
525 for a particular destination URL, the DST_ADDR that is examined is the address
526 of the forwarder or the gateway and NOT the address of the ultimate target.
527 This is necessary because it may be impossible for the local Junkbuster to
528 determine the address of the ultimate target (that's often what gateways are
531 Here are a few examples to show how the ACL features work:
533 "localhost" is OK -- no DST_ADDR implies that ALL destination addresses are OK:
535 permit-access localhost
538 A silly example to illustrate permitting any host on the class-C subnet with
539 Junkbuster to go anywhere:
541 permit-access www.junkbusters.com/24
544 Except deny one particular IP address from using it at all:
546 deny-access ident.junkbusters.com
549 You can also specify an explicit network address and subnet mask. Explicit
550 addresses do not have to be resolved to be used.
552 permit-access 207.153.200.0/24
555 A subnet mask of 0 matches anything, so the next line permits everyone.
557 permit-access 0.0.0.0/0
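The "last match wins, default is deny" rule can be sketched in Python with the standard ipaddress module. This is an illustration under simplifying assumptions, not Junkbuster's implementation: the function names parse_rule and is_allowed are hypothetical, only dotted IP addresses are handled (no hostname lookups), and the mask length is read as a prefix length.

```python
import ipaddress


def parse_rule(line):
    """Parse 'ACTION SRC_ADDR[/SRC_MASKLEN] [DST_ADDR[/DST_MASKLEN]]' (dotted addresses only)."""
    fields = line.split("#", 1)[0].split()
    if len(fields) not in (2, 3) or fields[0] not in ("permit-access", "deny-access"):
        raise ValueError("bad ACL line: %r" % line)
    src = ipaddress.ip_network(fields[1], strict=False)
    # No DST_ADDR means that all destination addresses match.
    dst = ipaddress.ip_network(fields[2], strict=False) if len(fields) == 3 else None
    return fields[0] == "permit-access", src, dst


def is_allowed(rules, client, server):
    """Apply the rules in order; the LAST matching rule decides, default is deny."""
    if not rules:
        return True  # no access settings at all: the proxy talks to anyone
    client_ip = ipaddress.ip_address(client)
    server_ip = ipaddress.ip_address(server)
    allowed = False
    for permit, src, dst in rules:
        if client_ip in src and (dst is None or server_ip in dst):
            allowed = permit
    return allowed


if __name__ == "__main__":
    rules = [parse_rule("permit-access 207.153.200.0/24"),
             parse_rule("deny-access 207.153.200.72")]
    print(is_allowed(rules, "207.153.200.10", "192.0.2.1"))
    print(is_allowed(rules, "207.153.200.72", "192.0.2.1"))
```

The address 192.0.2.1 is just a placeholder destination. Because the deny line comes after the permit line, the single host 207.153.200.72 is refused while the rest of its subnet is served.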
560 Note, you cannot say:
565 to allow all *.org domains. Every IP address listed must resolve fully.
567 An ISP may want to provide a Junkbuster that is accessible by "the world" and
568 yet restrict use of some of their private content to hosts on its internal
569 network (i.e. its own subscribers). Say, for instance the ISP owns the Class-B
570 IP address block 123.124.0.0 (a 16 bit netmask). This is how they could do it:
572 permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere
573 # with the following exceptions:
575 deny-access 0.0.0.0/0 123.124.0.0/16 # block all external requests for
576 # sites on the ISP's network
permit-access 0.0.0.0/0 www.my_isp.com # except for the ISP's main
permit-access 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go
585 Note that if some hostnames are listed with multiple IP addresses, the primary
586 value returned by DNS (via gethostbyname()) is used. Default: Anyone can access
589 -------------------------------------------------------------------------------
593 This feature allows chaining of HTTP requests via multiple proxies. It can be
594 used to better protect privacy and confidentiality when accessing specific
595 domains by routing requests to those domains to a special purpose filtering
596 proxy such as lpwa.com.
598 It can also be used in an environment with multiple networks to route requests
599 via multiple gateways allowing transparent access to multiple networks without
600 having to modify browser configurations.
Also specified here are SOCKS proxies. Junkbuster supports SOCKS 4 and SOCKS 4A. The
603 difference is that SOCKS 4A will resolve the target hostname using DNS on the
604 SOCKS server, not our local DNS client.
606 The syntax of each line is:
608 forward target_domain[:port] http_proxy_host[:port]
forward-socks4 target_domain[:port] socks_proxy_host[:port] http_proxy_host[:port]
forward-socks4a target_domain[:port] socks_proxy_host[:port] http_proxy_host[:port]
615 If http_proxy_host is ".", then requests are not forwarded to a HTTP proxy but
616 are made directly to the web servers.
618 Lines are checked in sequence, and the last match wins.
620 There is an implicit line equivalent to the following, which specifies that
621 anything not finding a match on the list is to go out without forwarding or
622 gateway protocol, like so:
624 forward .* . # implicit
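The selection logic (check every line in sequence, let the last match win, fall back to the implicit direct rule) can be sketched as follows. This is a simplified Python illustration only: the name choose_forwarder is made up, the target patterns are treated as ordinary regular expressions matched against the host name, which is an assumption since the real target_domain syntax may differ, and the second rule in the usage example is hypothetical.

```python
import re

# The implicit rule from the text above: anything unmatched goes out directly (".").
IMPLICIT_RULE = (".*", ".")


def choose_forwarder(rules, host):
    """Return the forwarder for host; "." means a direct connection."""
    chosen = None
    for pattern, forwarder in [IMPLICIT_RULE] + list(rules):
        if re.search(pattern, host):
            chosen = forwarder  # keep going: a later match overrides an earlier one
    return chosen


if __name__ == "__main__":
    rules = [
        (r".*", "lpwa.com:8000"),
        (r"my_company\.com$", "."),  # hypothetical exception for internal hosts
    ]
    print(choose_forwarder(rules, "www.example.org"))
    print(choose_forwarder(rules, "intranet.my_company.com"))
```

Because later lines override earlier ones, general rules belong at the top of the list and exceptions below them.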
627 In the following common configuration, everything goes to Lucent's LPWA, except
628 SSL on port 443 (which it doesn't handle):
630 forward .* lpwa.com:8000
634 See the FAQ for instructions on how to automate the login procedure for LPWA.
635 Some users have reported difficulties related to LPWA's use of "." as the last
636 element of the domain, and have said that this can be fixed with this:
638 forward lpwa. lpwa.com:8000
(NOTE: the syntax for specifying target_domain has changed since the previous
642 paragraph was written -- it will not work now. More information is welcome.)
644 In this fictitious example, everything goes via an ISP's caching proxy, except
645 requests to that ISP:
647 forward .* caching.myisp.net:8000
651 For the @home network, we're told the forwarding configuration is this:
653 forward .* proxy:8080
656 Also, we're told they insist on getting cookies and JavaScript, so you need to
657 add home.com to the cookie file. We consider JavaScript a security risk. Java
660 In this example direct connections are made to all "internal" domains, but
661 everything else goes through Lucent's LPWA by way of the company's SOCKS
662 gateway to the Internet.
forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080
665 forward my_company.com .
668 This is how you could set up a site that always uses SOCKS but no forwarders:
forward-socks4a .* . firewall.my_company.com:1080
673 An advanced example for network administrators:
675 If you have links to multiple ISPs that provide various special content to
676 their subscribers, you can configure forwarding to pass requests to the
677 specific host that's connected to that ISP so that everybody can see all of the
678 content on all of the ISPs.
680 This is a bit tricky, but here's an example:
682 host-a has a PPP connection to isp-a.com. And host-b has a PPP connection to
683 isp-b.com. host-a can run a Junkbuster proxy with forwarding like this:
686 forward isp-b.com host-b:8000
689 host-b can run a Junkbuster proxy with forwarding like this:
692 forward isp-a.com host-a:8000
695 Now, anyone on the Internet (including users on host-a and host-b) can set
696 their browser's proxy to either host-a or host-b and be able to browse the
697 content on isp-a or isp-b.
699 Here's another practical example, for University of Kent at Canterbury students
700 with a network connection in their room, who need to use the University's Squid
703 forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for:
704 forward .ukc.ac.uk . # Anything on the same domain as us
705 forward * . # Host with no domain specified
706 forward 129.12.*.* . # A dotted IP on our /16 network.
707 forward 127.*.*.* . # Loopback address
708 forward localhost.localdomain . # Loopback address
709 forward www.ukc.mirror.ac.uk . # Specific host
If you intend to chain Junkbuster and squid locally, the recommended order is
browser -> squid -> junkbuster.
715 Your squid configuration could then look like this:
717 # Define junkbuster as parent cache
719 cache_peer 127.0.0.1 parent 8000 0 no-query
721 # Define ACL for protocol FTP
724 # Do not forward ACL FTP to junkbuster
725 always_direct allow FTP
727 # Do not forward ACL CONNECT (https) to junkbuster
728 always_direct allow CONNECT
730 # Forward the rest to junkbuster
731 never_direct allow all
734 -------------------------------------------------------------------------------
738 Junkbuster has a number of options specific to the Windows GUI interface:
740 If "activity-animation" is set to 1, the Junkbuster icon will animate when
741 "Junkbuster" is active. To turn off, set to 0.
746 If "log-messages" is set to 1, Junkbuster will log messages to the console
752 If "log-buffer-size" is set to 1, the size of the log buffer, i.e. the amount
753 of memory used for the log messages displayed in the console window, will be
754 limited to "log-max-lines" (see below).
Warning: Setting this to 0 will cause the buffer to grow infinitely and eat
762 log-max-lines is the maximum number of lines held in the log buffer. See above.
767 If "log-highlight-messages" is set to 1, Junkbuster will highlight portions of
768 the log messages with a bold-faced font:
770 log-highlight-messages 1
773 The font used in the console window:
775 log-font-name Comic Sans MS
778 Font size used in the console window:
783 "show-on-task-bar" controls whether or not Junkbuster will appear as a button
784 on the Task bar when minimized:
789 If "close-button-minimizes" is set to 1, the Windows close button will minimize
790 Junkbuster instead of closing the program (close with the exit option on the
793 close-button-minimizes 1
796 The "hide-console" option is specific to the MS-Win console version of
797 JunkBuster. If this option is used, Junkbuster will disconnect from and hide
803 -------------------------------------------------------------------------------
807 The "actionsfile" is used to define what actions Junkbuster takes, and thus
808 determines how images, cookies and various other aspects of HTTP content and
809 transactions are handled. Images can be anything you want, including ads,
810 banners, or just some obnoxious image that you would rather not see. Cookies
811 can be accepted or rejected. The default file is in fact named actionsfile.
813 To determine which actions apply to a request, the URL of the request is
814 compared to all patterns in this file. Every time it matches, the list of
815 applicable actions for the URL is incrementally updated. You can trace this
816 process by visiting http://i.j.b/show-url-info.
818 There are four types of lines in this file: comments (begin with a "#"
819 character), actions, aliases and patterns, all of which are explained below.
821 -------------------------------------------------------------------------------
823 URL Domain and Path Syntax
825 Generally, a pattern has the form <domain>/<path>, where both the <domain> and
826 <path> part are optional. If you only specify a domain part, the "/" can be
829 www.example.com - is a domain only pattern and will match any request to
832 www.example.com/ - means exactly the same.
834 www.example.com/index.html - matches only the single document "/index.html" on
837 /index.html - matches the document "/index.html", regardless of the domain.
839 index.html - matches nothing, since it would be interpreted as a domain name
840 and there is no top-level domain called ".html".
842 The matching of the domain part offers some flexible options: if the domain
843 starts or ends with a dot, it becomes unanchored at that end. For example:
845 .example.com - matches any domain that ENDS in ".example.com".
847 www. - matches any domain that STARTS with "www".
849 Additionally, there are wildcards that you can use in the domain names
themselves. They work much like shell wildcards: "*" stands for zero or
more arbitrary characters, "?" stands for any single character. You can also
define character classes in square brackets, and all of these can be freely mixed:
854 ad*.example.com - matches "adserver.example.com", "ads.example.com", etc but
855 not "sfads.example.com".
857 *ad*.example.com - matches all of the above, and then some.
859 .?pix.com - matches "www.ipix.com", "pictures.epix.com", "a.b.c.d.e.upix.com",
862 www[1-9a-ez].example.com - matches "www1.example.com", "www4.example.com",
863 "wwwd.example.com", "wwwz.example.com", etc., but not "wwww.example.com".
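The domain rules above (anchoring, leading or trailing dot, shell-style wildcards) can be tried out with a short Python sketch built on the standard fnmatch module. It is an approximation for experimenting, not Junkbuster's matcher: the name domain_matches is made up, and comparing the domain label by label (the parts between the dots) is an assumption of this sketch.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Match a domain pattern against a host name, one dot-separated label at a time."""
    pattern, host = pattern.lower(), host.lower()
    open_left = pattern.startswith(".")   # leading dot: unanchored on the left
    open_right = pattern.endswith(".")    # trailing dot: unanchored on the right
    p_labels = [label for label in pattern.split(".") if label]
    h_labels = host.split(".")
    n = len(p_labels)
    if n > len(h_labels):
        return False
    if open_left and open_right:
        starts = range(len(h_labels) - n + 1)
    elif open_left:
        starts = [len(h_labels) - n]      # compare against the last n labels
    elif open_right:
        starts = [0]                      # compare against the first n labels
    elif n == len(h_labels):
        starts = [0]                      # fully anchored: label counts must agree
    else:
        return False
    return any(
        all(fnmatchcase(h, p) for h, p in zip(h_labels[s:s + n], p_labels))
        for s in starts
    )


if __name__ == "__main__":
    print(domain_matches("ad*.example.com", "adserver.example.com"))
    print(domain_matches("ad*.example.com", "sfads.example.com"))
    print(domain_matches(".?pix.com", "a.b.c.d.e.upix.com"))
```

The sketch reproduces the examples listed above. Cases the manual does not spell out may behave differently in the real program.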
865 If Junkbuster was compiled with "pcre" support (default), Perl compatible
regular expressions can be used. See the pcre/docs/ directory or "man perlre"
867 (also available on http://www.perldoc.com/perl5.6/pod/perlre.html) for details.
868 A brief discussion of regular expressions is in the Appendix. For instance:
870 /.*/advert[0-9]+\.jpe?g - would match a URL from any domain, with any path that
871 includes "advert" followed immediately by one or more digits, then a "." and
872 ending in either "jpeg" or "jpg". So we match "example.com/ads/advert2.jpg",
873 and "www.example.com/ads/banners/advert39.jpeg", but not "www.example.com/ads/
874 banners/advert39.gif" (no gifs in the example pattern).
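You can try the path pattern from this example with Python's re module, whose syntax agrees with Perl's for a simple pattern like this one. The sketch assumes that the pattern is applied to the path part of the URL only, starting at the first "/", and ignores case as described for path matching. The name path_matches is made up for this example.

```python
import re

# The example pattern, applied to the path part of a URL, ignoring case.
ADVERT = re.compile(r"/.*/advert[0-9]+\.jpe?g", re.IGNORECASE)


def path_matches(path):
    """True if the path matches the advert pattern from its beginning."""
    return ADVERT.match(path) is not None


if __name__ == "__main__":
    for path in ("/ads/advert2.jpg",
                 "/ads/banners/advert39.jpeg",
                 "/ads/banners/advert39.gif"):
        print(path, path_matches(path))
```

The first two paths match and the ".gif" path does not, in line with the explanation above.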
876 Please note that matching in the path is case INSENSITIVE by default, but you
877 can switch to case sensitive at any point in the pattern by using the "(?-i)"
880 www.example.com/(?-i)PaTtErN.* - will match only documents whose path starts
881 with "PaTtErN" in exactly this capitalization.
883 -------------------------------------------------------------------------------
887 Actions are enabled if preceded with a "+", and disabled if preceded with a
888 "-". Actions are invoked by enclosing the action name in curly braces (e.g.
889 {+some_action}), followed by a list of URLs to which the action applies. There
890 are three classes of actions:
892 * Boolean (e.g. "+/-block"):
894 {+name} # enable this action
895 {-name} # disable this action
898 * Parameterized (e.g. "+/-hide-user-agent"):
900 {+name{param}} # enable action and set parameter to "param"
901 {-name} # disable action
904 * Multi-value (e.g. "{+/-add-header{Name: value}}", "{+/-wafer{name=value}}
907 {+name{param}} # enable action and add parameter "param"
908 {-name{param}} # remove the parameter "param"
909 {-name} # disable this action totally
912 If nothing is specified in this file, no "actions" are taken. So in this case
913 JunkBuster would just be a normal, non-blocking, non-anonymizing proxy. You
914 must specifically enable the privacy and blocking features you need (although
915 the provided default actionsfile file will give a good starting point).
917 Later defined actions always over-ride earlier ones. For multi-valued actions,
918 the actions are applied in the order they are specified.
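The three classes of actions, and the rule that later settings win, can be modelled in a few lines of Python. This is only an illustration of the behaviour described above, not Junkbuster's own code: the function name, the token syntax it accepts and the dictionary used as "state" are all assumptions of this sketch.

```python
import re

# Token syntax assumed for this sketch: "+name", "-name" or "+name{param}".
TOKEN = re.compile(r"([+-])([a-z-]+)(?:\{(.*)\})?$")

# Actions the manual lists as multi-value.
MULTI_VALUE = {"add-header", "wafer"}


def apply_actions(state, tokens):
    """Apply action tokens in order, so later tokens override earlier ones."""
    for token in tokens:
        found = TOKEN.match(token)
        if found is None:
            raise ValueError("not an action token: %r" % token)
        sign, name, param = found.groups()
        if name in MULTI_VALUE:
            values = state.setdefault(name, [])
            if sign == "+":
                if param is None:
                    raise ValueError("%s needs a parameter" % name)
                values.append(param)   # "+name{param}": add one more value
            elif param is None:
                values.clear()         # "-name": disable the action totally
            elif param in values:
                values.remove(param)   # "-name{param}": remove just this value
        elif sign == "+":
            # Boolean actions become True, parameterized ones keep their parameter.
            state[name] = True if param is None else param
        else:
            state[name] = False
    return state


state = apply_actions({}, ["+block", "+add-header{X-A: 1}", "+add-header{X-B: 2}"])
apply_actions(state, ["-block", "-add-header{X-A: 1}"])
print(state)
```

After the second call, "block" is off again and only the "X-B: 2" header remains.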
The valid Junkbuster "actions" are:
922 * Add the specified HTTP header, which is not checked for validity. You may
923 specify this many times to specify many different headers:
925 +add-header{Name: value}
928 * Block this URL totally.
933 * De-animate all animated GIF images, i.e. reduce them to their last frame.
934 This will also shrink the images considerably (in bytes, not pixels!). If
935 the option "first" is given, the first frame of the animation is used as
936 the replacement. If "last" is given, the last frame of the animation is
used instead, which probably makes more sense for most banner animations,
938 but also has the risk of not showing the entire last frame (if it is only a
939 delta to an earlier frame).
941 +deanimate-gifs{last}
942 +deanimate-gifs{first}
945 * "+downgrade" will downgrade HTTP/1.1 client requests to HTTP/1.0 and
946 downgrade the responses as well. Use this action for servers that use HTTP/
947 1.1 protocol features that Junkbuster doesn't handle well yet. HTTP/1.1 is
948 only partially implemented. Default is not to downgrade requests.
953 * Many sites, like yahoo.com, don't just link to other sites. Instead, they
954 will link to some script on their own server, giving the destination as a
955 parameter, which will then redirect you to the final target. URLs resulting
956 from this scheme typically look like: http://some.place/some_script?http://
959 Sometimes, there are even multiple consecutive redirects encoded in the
URL. These redirections via scripts make your web browsing more traceable,
since the server from which you follow such a link can see where you go to.
Apart from that, valuable bandwidth and time is wasted, while your browser
asks the server for one redirect after the other. Plus, it feeds the
966 The "+fast-redirects" option enables interception of these requests by
Junkbuster, which will cut off all but the last valid URL in the request and
968 send a local redirect back to your browser without contacting the remote
974 * Filter the website through the re_filterfile:
979 * Block any existing X-Forwarded-for header, and do not add a new one:
984 * If the browser sends a "From:" header containing your e-mail address, this
985 either completely removes the header ("block"), or changes it to the
986 specified e-mail address.
989 +hide-from{spam@sittingduck.xqq}
992 * Don't send the "Referer:" (sic) header to the web site. You can block it,
993 forge a URL to the same server as the request (which is preferred because
994 some sites will not send images otherwise) or set it to a constant string
999 +hide-referer{http://nowhere.com}
1002 * Alternative spelling of "+hide-referer". It has the same parameters, and
can be freely mixed with "+hide-referer". ("referrer" is the correct
1004 English spelling, however the HTTP specification has a bug - it requires it
1005 to be spelled "referer".)
1010 * Change the "User-Agent:" header so web servers can't tell your browser
1011 type. Warning! This breaks many web sites. Specify the user-agent value you
want. For example, to pretend to be using Netscape on Linux:
1014 +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)}
1017 * Treat this URL as an image. This only matters if it's also "+block"ed, in
which case a "blocked" image can be sent rather than an HTML page. See
1019 "+image-blocker{}" below for the control over what is actually sent.
1024 * Decides what to do with URLs that end up tagged with "{+block +image}".
There are 4 options. "-image-blocker" will send an HTML "blocked" page,
usually resulting in a "broken image" icon. "+image-blocker{logo}" will
send a "JunkBuster" image. "+image-blocker{blank}" will send a 1x1
transparent GIF image. And finally, "+image-blocker{http://xyz.com}" will
send an HTTP temporary redirect to the specified image. This has the
advantage of the icon being cached by the browser, which will speed
1033 +image-blocker{logo}
1034 +image-blocker{blank}
1035 +image-blocker{http://i.j.b/send-banner}
* By default (i.e. in the absence of a "+limit-connect" action), Junkbuster
will, as a precaution, only allow CONNECT requests to port 443, which is
the standard port for https.
The CONNECT method exists in HTTP to allow access to secure websites
(https:// URLs) through proxies. It works very simply: the proxy connects
to the server on the specified port, and then short-circuits its
connections to the client and to the remote server. This can be a big
1046 security hole, since CONNECT-enabled proxies can be abused as TCP relays
1049 If you want to allow CONNECT for more ports than this, or want to forbid
1050 CONNECT altogether, you can specify a comma separated list of ports and
1051 port ranges (the latter using dashes, with the minimum defaulting to 0 and
+limit-connect{443} # This is the default and need not be specified.
+limit-connect{80,443} # Ports 80 and 443 are OK.
+limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100
# and above 500 are OK.
* "+no-compression" prevents the website from compressing the data. Some
websites do this, which can be a problem for Junkbuster, since "+filter",
"+no-popup" and "+deanimate-gifs" will not work on compressed data. This
will slow down connections to those websites, though. By default,
"+no-compression" is turned on.
1069 * Prevent the website from reading cookies:
1074 * Prevent the website from setting cookies:
1079 * Filter the website through a built-in filter to disable those obnoxious
1080 JavaScript pop-up windows via window.open(), etc. The two alternative
1081 spellings are equivalent.
1087 * This action only applies if you are using a jarfile for saving cookies. It
1088 sends a cookie to every site stating that you do not accept any copyright
1089 on cookies sent to you, and asking them not to track you. Of course, this
1090 is a (relatively) unique header they could use to track you.
1095 * This allows you to add an arbitrary cookie. It can be specified multiple
1096 times in order to add as many cookies as you like.
1101 The meaning of any of the above is reversed by preceding the action with a "-",
1102 in place of the "+".
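To make the port lists accepted by "+limit-connect" more concrete, here is a small Python sketch of how such a list could be read. It is not taken from the Junkbuster source. It assumes ranges include both end points, and it uses 65535 (the highest TCP port number) where the upper end of a range is left out; both are assumptions of this example, as are the function names.

```python
def parse_port_list(spec, max_port=65535):
    """Turn a list such as "-3, 7, 20-100, 500-" into (low, high) pairs."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        low, dash, high = item.partition("-")
        if dash:
            # A range: a missing minimum means 0, a missing maximum means max_port.
            first = int(low) if low else 0
            last = int(high) if high else max_port
        else:
            # A single port.
            first = last = int(low)
        ranges.append((first, last))
    return ranges


def connect_allowed(port, spec="443"):
    """Return True if a CONNECT to this port is permitted by the port list."""
    return any(first <= port <= last for first, last in parse_port_list(spec))


print(parse_port_list("-3, 7, 20-100, 500-"))
print(connect_allowed(80))
```

With the default list of "443", the second call reports that a CONNECT to port 80 is not allowed.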
1106 Turn off cookies by default, then allow a few through for specified sites:
1108 # Turn off all cookies
1109 { +no-cookies-read }
# Exceptions to the above, sites that need cookies
1113 { -no-cookies-read }
1121 # Alternative way of saying the same thing
1122 {-no-cookies-set -no-cookies-read}
1127 Now turn off "fast redirects", and then we allow two exceptions:
1132 # Reverse it for these two sites, which don't work right without it.
1134 www.ukc.ac.uk/cgi-bin/wac\.cgi\?
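To see what "+fast-redirects" has to do with a request, consider this Python sketch of the "keep only the last URL" idea. It is a simplification written for this manual, not the actual implementation: it only looks for "http://", the function name is made up, and the redirector URLs are invented for the example.

```python
from typing import Optional


def last_embedded_url(request_url: str) -> Optional[str]:
    """Return the last "http://" URL embedded in request_url, or None."""
    start = request_url.rfind("http://")
    if start <= 0:
        # Either no "http://" at all, or only the one the request begins with.
        return None
    return request_url[start:]


print(last_embedded_url("http://redirector.example/go?http://target.example/page"))
print(last_embedded_url("http://plain.example/index.html"))
```

A request without an embedded URL yields None and would simply be passed on unchanged.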
1138 Turn on page filtering, with one exception for sourceforge:
1140 # Run everything through the default filter file (re_filterfile):
1143 # But please don't re_filter code from sourceforge!
1145 .cvs.sourceforge.net
Now some URLs that we want "blocked", i.e. we won't see them. Many of these use
1149 regular expressions that will expand to match multiple URLs:
1153 /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g))
1154 /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/])
1155 /.*/(ng)?adclient\.cgi
1156 /.*/(plain|live|rotate)[-_.]?ads?/
1157 /.*/(sponsor)s?[0-9]?/
1158 /.*/_?(plain|live)?ads?(-banners)?/
1160 /.*/ad(sdna_image|gifs?)/
1161 /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe)
1165 /.*/adv((er)?ts?|ertis(ing|ements?))?/
1169 /.*/cgi-bin/centralad/getimage
1170 /.*/images/addver\.gif
1171 /.*/images/marketing/.*\.(gif|jpe?g)
1175 /.*/sponsors?[0-9]?/
1176 /.*/advert[0-9]+\.jpg
1183 /graphics/defaultAd/
1185 /image\.ng/transactionID
1186 /images/.*/.*_anim\.gif # alvin brattli
1187 /ip_img/.*\.(gif|jpe?g)
1191 /cgi-bin/nph-adclick.exe/
1192 /.*/Image/BannerAdvertising/
1194 /.*/adlib/server\.cgi
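A handful of the patterns above can be checked against sample paths with Python's re module. This is only a sketch: the paths are invented, the helper is_blocked is not part of Junkbuster, and the patterns are assumed to be matched from the start of the path without regard to case.

```python
import re

# Four of the block patterns listed above, copied unchanged.
BLOCK_PATTERNS = [re.compile(pattern, re.IGNORECASE) for pattern in (
    r"/.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g))",
    r"/.*/(ng)?adclient\.cgi",
    r"/.*/sponsors?[0-9]?/",
    r"/cgi-bin/nph-adclick.exe/",
)]


def is_blocked(path):
    """Return True if any block pattern matches the start of the path."""
    return any(pattern.match(path) for pattern in BLOCK_PATTERNS)


for path in ("/images/top-ad2.gif", "/shop/sponsors/logo.png", "/docs/manual.html"):
    print(path, is_blocked(path))
```

The first two paths are caught by the patterns, while the ordinary documentation page is left alone.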
1198 -------------------------------------------------------------------------------
1202 Custom "actions", known to Junkbuster as "aliases", can be defined by combining
1203 other "actions". These can in turn be invoked just like the built-in "actions".
Currently, an alias name can contain any character except space, tab, "=", "{"
or "}". But please use only "a"-"z", "0"-"9", "+", and "-". Alias names are not
case sensitive, and must be defined before anything else in actionsfile! And
only one set of "aliases" can be defined.
1209 Now let's define a few aliases:
# Useful custom aliases we can use later. These must come first!
1213 +no-cookies = +no-cookies-set +no-cookies-read
1214 -no-cookies = -no-cookies-set -no-cookies-read
fragile = -block -no-cookies -filter -fast-redirects -hide-referer -no-popups
1217 shop = -no-cookies -filter -fast-redirects
1218 +imageblock = +block +image
1220 #For people who don't like to type too much: ;-)
1223 c2 = -no-cookies-set +no-cookies-read
1224 c3 = +no-cookies-set -no-cookies-read
1225 #... etc. Customize to your heart's content.
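Since aliases may themselves use other aliases (as "shop" uses "-no-cookies" above), it can help to see how an alias unfolds into plain actions. The following Python sketch does that for a few of the aliases defined above. It is an illustration only: it splits on white space, so parameters containing spaces are not handled, and it does not guard against an alias that refers to itself.

```python
# A few of the alias definitions from above, keyed by lower-cased name.
ALIASES = {
    "+no-cookies": "+no-cookies-set +no-cookies-read",
    "-no-cookies": "-no-cookies-set -no-cookies-read",
    "shop": "-no-cookies -filter -fast-redirects",
    "+imageblock": "+block +image",
}


def expand(action_list, aliases=ALIASES):
    """Replace every alias in a space-separated action list by its definition."""
    result = []
    for word in action_list.split():
        key = word.lower()  # alias names are not case sensitive
        if key in aliases:
            result.extend(expand(aliases[key], aliases))
        else:
            result.append(word)
    return result


print(expand("shop"))
print(expand("+imageblock -no-popups"))
```

The "shop" alias unfolds into four plain actions, two of which come from the nested "-no-cookies" alias.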
1228 Some examples using our "shop" and "fragile" aliases from above:
1230 # These sites are very complex and require
1231 # minimal interference.
1233 .office.microsoft.com
1234 .windowsupdate.microsoft.com
1237 # Shopping sites - still want to block ads.
1240 .worldpay.com # for quietpc.com
1244 # These shops require pop-ups
1250 -------------------------------------------------------------------------------
1254 The filter file defines what filtering of web pages Junkbuster does. The
1255 default filter file is re_filterfile, located in the config directory. In this
1256 file, any document content, whether viewable text or embedded non-visible
1257 content, can be changed.
1259 This file uses regular expressions to alter or remove any string in the target
1260 page. Some examples from the included default re_filterfile:
1262 Stop web pages from displaying annoying messages in the status bar by deleting
1265 # The status bar is for displaying link targets, not pointless buzzwords.
1266 # Again, check it out on http://www.airport-cgn.de/.
1267 s/status='.*?';*//ig
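The same substitution can be tried in Python, where the "i" flag becomes re.IGNORECASE and the "g" flag is simply the default behaviour of sub(). The HTML fragment and the function name are made up for this sketch.

```python
import re

# Python counterpart of the filter "s/status='.*?';*//ig".
STATUS_ASSIGNMENT = re.compile(r"status='.*?';*", re.IGNORECASE)


def strip_status(html):
    """Delete every status='...' assignment, including trailing semicolons."""
    return STATUS_ASSIGNMENT.sub("", html)


page = '<a href="x.html" onMouseOver="window.status=\'Click here!\';return true">link</a>'
print(strip_status(page))
```

The result still contains a harmless "window." fragment in front of "return true", which is all this fairly blunt filter leaves behind.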
1270 Just for kicks, replace any occurrence of "Microsoft" with "MicroSuck":
1272 s/microsoft(?!.com)/MicroSuck/ig
1275 Kill those auto-refresh tags:
1277 # Kill refresh tags. I like to refresh myself. Manually.
1278 # check it out on http://www.airport-cgn.de/ and go to the arrivals page.
1280 s/<meta[^>]*http-equiv[^>]*refresh.*URL=([^>]*?)"?>/<link rev="x-refresh" href
s/<meta[^>]*http-equiv="?page-enter"?[^>]*content=[^>]*>/<!--no page enter for me-->/i
1286 -------------------------------------------------------------------------------
1288 Quickstart to Using Junkbuster
Install the package, then run and enjoy! Junkbuster accepts only one command line
1291 option -- the configuration file to be used. Example Unix startup command:
1294 # /usr/sbin/junkbuster /etc/junkbuster/config &
1298 If no configuration file is specified on the command line, Junkbuster will look
for a file named config in the current directory, except on Amiga, where it
will look for AmiTCP:db/junkbuster/config, and Win32, where it will try
junkbstr.txt.
1301 If no file is specified on the command line and no default configuration file
1302 can be found, Junkbuster will fail to start.
1304 Be sure your browser is set to use the proxy which is by default at localhost,
1305 port 8000. With Netscape (and Mozilla), this can be set under Edit ->
Preferences -> Advanced -> Proxies -> HTTP Proxy. For Internet Explorer: Tools
-> Internet Properties -> Connections -> LAN Settings. Then, check "Use Proxy"
and fill in the appropriate info (Address: localhost, Port: 8000). Do the same
for the HTTPS proxy setting, if the browser has one.
1311 The included default configuration files should give a reasonable starting
point, though they may be somewhat aggressive in blocking junk. You will
probably want to keep an eye out for sites that require cookies, and add these
to actionsfile as needed. By default, most cookies will be blocked until you
add the sites to the configuration. If you want the browser to handle this
instead, you will need to edit actionsfile and disable this feature. If you use
more than one browser, it would make more sense to let Junkbuster handle this,
in which case the browser(s) should be set to accept all cookies.
1320 If a particular site shows problems loading properly, try adding it to the
1321 {fragile} section of actionsfile. This will turn off most actions for this
1324 HTTP/1.1 support is not fully implemented. If browsers that support HTTP/1.1
1325 (like Mozilla or recent versions of I.E.) experience problems, you might try to
force HTTP/1.0 compatibility. For Mozilla, look under Edit -> Preferences ->
1327 Debug -> Networking. Or set the "+downgrade" config option in actionsfile.
1329 After running Junkbuster for a while, you can start to fine tune the
1330 configuration to suit your personal, or site, preferences and requirements.
1331 There are many, many aspects that can be customized.
1333 If you encounter problems, please verify it is a Junkbuster bug, by disabling
1334 Junkbuster, and then trying the same page. Also, try another browser if
1335 possible to eliminate browser or site problems. Before reporting it as a bug,
1336 see if there is not a configuration option that is enabled that is causing the
page not to load. You can then add an exception for that page or site. If it
is a bug, please report it to the developers (see below).
1340 -------------------------------------------------------------------------------
1342 Contact the Developers
1344 Feature requests and other questions should be posted to the Feature request
1345 page at SourceForge. There is also an archive there.
1347 Anyone interested in actively participating in development and related
1348 discussions can join the appropriate mailing list here. Archives are available
Please report bugs using the form at SourceForge. Please try to verify that it
1352 is a Junkbuster bug, and not a browser or site bug first. Also, check to make
1353 sure this is not already a known bug.
1355 -------------------------------------------------------------------------------
1357 Copyright and History
1361 Internet Junkbuster is free software; you can redistribute it and/or modify it
1362 under the terms of the GNU General Public License as published by the Free
1363 Software Foundation; either version 2 of the License, or (at your option) any
1366 This program is distributed in the hope that it will be useful, but WITHOUT ANY
1367 WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
1368 PARTICULAR PURPOSE. See the GNU General Public License for more details, which
1369 is available from the Free Software Foundation, Inc, 59 Temple Place - Suite
1370 330, Boston, MA 02111-1307, USA.
1372 -------------------------------------------------------------------------------
1376 Junkbuster was originally written by Anonymous Coders and JunkBusters
1377 Corporation, and was released as free open-source software under the GNU GPL.
1378 Stefan Waldherr made many improvements, and started the SourceForge project to
1379 rekindle development. The last stable release was v2.0.2, which has now grown
1382 -------------------------------------------------------------------------------
1386 http://sourceforge.net/projects/ijbswa
1388 http://ijbswa.sourceforge.net/
1390 http://ijbswa.sourceforge.net/config/
1392 http://www.junkbusters.com/ht/en/cookies.html
1394 http://www.waldherr.org/junkbuster/
1396 http://privacy.net/analyze/
1398 http://www.squid-cache.org/
1402 -------------------------------------------------------------------------------
Junkbuster can use "regular expressions" in various config files, assuming
support for "pcre" (Perl Compatible Regular Expressions) is compiled in, which
is the default. Such configuration directives do not require regular
1411 expressions, but they can be used to increase flexibility by matching a pattern
1412 with wildcards against URLs.
1414 If you are reading this, you probably don't understand what "regular
1415 expressions" are, or what they can do. So this will be a very brief
1416 introduction only. A full explanation would require a book ;-)
1418 "Regular expressions" is a way of matching one character expression against
1419 another to see if it matches or not. One of the "expressions" is a literal
string of readable characters (letters, numbers, etc.), and the other is a
1421 complex string of literal characters combined with wildcards, and other special
1422 characters, called metacharacters. The "metacharacters" have special meanings
1423 and are used to build the complex pattern to be matched against. Perl
1424 Compatible Regular Expressions is an enhanced form of the regular expression
1425 language with backward compatibility.
1427 To make a simple analogy, we do something similar when we use wildcard
1428 characters when listing files with the dir command in DOS. *.* matches all
filenames. The "special" character here is the asterisk which matches any and
all characters. We can be more specific and use ? to match just individual
characters. So "dir file?.txt" would match "file1.txt", "file2.txt", etc. We
1432 are pattern matching, using a similar technique to "regular expressions"!
1434 Regular expressions do essentially the same thing, but are much, much more
1435 powerful. There are many more "special characters" and ways of building complex
1436 patterns however. Let's look at a few of the common ones, and then some
1439 . - Matches any single character, e.g. "a", "A", "4", ":", or "@".
1441 ? - The preceding character or expression is matched ZERO or ONE times. Either/
1444 + - The preceding character or expression is matched ONE or MORE times.
1446 * - The preceding character or expression is matched ZERO or MORE times.
1448 \ - The "escape" character denotes that the following character should be taken
1449 literally. This is used where one of the special characters (e.g. ".") needs to
1450 be taken literally and not as a special metacharacter.
1452 [] - Characters enclosed in brackets will be matched if any of the enclosed
1453 characters are encountered.
() - Parentheses are used to group a sub-expression, or multiple
1458 | - The "bar" character works like an "or" conditional statement. A match is
1459 successful if the sub-expression on either side of "|" matches.
1461 s/string1/string2/g - This is used to rewrite strings of text. "string1" is
1462 replaced by "string2" in this example.
These are just some of the ones you are likely to use when matching URLs with
Junkbuster, and are a long way from a definitive list. This is enough to get us
started with a few simple examples which may be more illuminating:
/.*/banners/.* - A simple example that uses the common combination of "." and
"*" to denote any character, zero or more times. In other words, any string at
all. So we start with a literal forward slash, then our regular expression
pattern (".*"), another literal forward slash, the string "banners", another
forward slash, and lastly another ".*". We are building a directory path here.
This will match any file with the path that has a directory named "banners" in
it. The ".*" matches any characters, and this could conceivably be more forward
slashes, so it might expand into a much longer looking path. For example, this
could match: "/eye/hate/spammers/banners/annoy_me_please.gif", or just
"/banners/annoying.html", or almost an infinite number of other possible
combinations, just so it has "banners" in the path somewhere.
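The pattern is easy to try out in Python. One caveat for this sketch: under the rules of Python's re module, the leading "/.*/" needs a second slash somewhere before "banners", so a top-level "/banners/annoying.html" does not match here. Junkbuster's own matcher may treat that case differently. The function name and the extra sample paths are made up for the example.

```python
import re

# The example pattern, matched from the start of the path and ignoring case.
BANNERS = re.compile(r"/.*/banners/.*", re.IGNORECASE)


def in_banners_dir(path):
    """Return True if the pattern matches the path from its first character."""
    return BANNERS.match(path) is not None


for path in ("/eye/hate/spammers/banners/annoy_me_please.gif",
             "/banners/annoying.html",
             "/images/logo.gif"):
    print(path, in_banners_dir(path))
```

A pattern such as "/(.*/)?banners/.*" would also cover a "banners" directory at the top level in this Python version.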
And now something a little more complex:
1482 /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal forward
1483 slashes again ("/"), so we are building another expression that is a file path
1484 statement. We have another ".*", so we are matching against any conceivable
1485 sub-path, just so it matches our expression. The only true literal that must
1486 match our pattern is adv, together with the forward slashes. What comes after
1487 the "adv" string is the interesting part.
1489 Remember the "?" means the preceding expression (either a literal character or
1490 anything grouped with "(...)" in this case) can exist or not, since this means
either zero or one match. So "((er)?ts?|ertis(ing|ements?))" is optional, as
are the sub-expression "(er)" and each "s" that is followed by a "?". The
"|" means "or". We have two of those. For instance, "(ing|ements?)" can expand
to match either "ing" OR "ements?". What is being done here is an attempt at
1495 matching as many variations of "advertisement", and similar, as possible. So
1496 this would expand to match just "adv", or "advert", or "adverts", or
1497 "advertising", or "advertisement", or "advertisements". You get the idea. But
1498 it would not match "advertizements" (with a "z"). We could fix that by changing
1499 our regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", which
1500 would then match either spelling.
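The list of spellings can be verified with a few lines of Python. In this sketch each word is wrapped in a made-up directory path ("/site/WORD/"), because the pattern expects slashes on both sides; the names ORIGINAL, WITH_Z and matches are inventions of this example.

```python
import re

ORIGINAL = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/", re.IGNORECASE)
WITH_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/", re.IGNORECASE)


def matches(pattern, word):
    """Test the word as a directory name inside a made-up path."""
    return pattern.match("/site/%s/" % word) is not None


for word in ("adv", "advert", "adverts", "advertising",
             "advertisement", "advertisements", "advertizements"):
    print(word, matches(ORIGINAL, word), matches(WITH_Z, word))
```

Only "advertizements" behaves differently between the two patterns: the original rejects it, the version with "(s|z)" accepts it.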
/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with forward
slashes. Anything in the square brackets "[]" can be matched. This is using
"0-9" as a shorthand expression to mean any digit zero through nine. It is the
same as saying "0123456789". So any digit matches. The "+" means one or more of
the preceding expression must be included. The preceding expression here is
what is in the square brackets -- in this case, any digit zero through nine.
Then, at the end, we have a grouping: "(gif|jpe?g)". This includes a "|", so
this needs to match the expression on either side of that bar character. A
simple "gif" on one side, and the other side will in turn match either "jpeg"
or "jpg", since the "?" means the letter "e" is optional and can be matched
once or not at all. So we are building an expression here to match GIF or
JPEG image files. It must include the literal string "advert", then one or
more digits, and a "." (which is now a literal, and not a special character,
since it is escaped with "\"), and lastly either "gif", or "jpeg", or "jpg".
Some possible matches would include: "//advert1.jpg",
"/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It would not
match "advert1.gif" (no leading slash), or "/adverts232.jpg" (the expression
does not include an "s"), or "/advert1.jsp" ("jsp" is not in the expression
anywhere).
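The matches and non-matches listed above can be confirmed with Python's re module. As before, this is a sketch that assumes the pattern is applied from the start of the path and without regard to case; the function name is made up.

```python
import re

ADVERT_IMAGE = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)", re.IGNORECASE)


def is_advert_image(path):
    """Return True if the pattern matches the path from its first character."""
    return ADVERT_IMAGE.match(path) is not None


should_match = ["//advert1.jpg", "/nasty/ads/advert1234.gif",
                "/banners/from/hell/advert99.jpg"]
should_not_match = ["advert1.gif", "/adverts232.jpg", "/advert1.jsp"]

print([is_advert_image(p) for p in should_match])
print([is_advert_image(p) for p in should_not_match])
```

The first list comes out all True and the second all False, in line with the explanation above.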
1521 s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck" will
replace any occurrence of "microsoft". The "i" at the end of the expression
1523 means ignore case. The "(?!.com)" means the match should fail if "microsoft" is
1524 followed by ".com". In other words, this acts like a "NOT" modifier. In case
1525 this is a hyperlink, we don't want to break it ;-).
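Here is the same substitution in Python, for those who want to see the "NOT" behaviour in action. Since the expression above has no "g" flag, this sketch replaces only the first match (count=1). The function name and sample strings are invented. Note also that the "." inside "(?!.com)" is not escaped, so it stands for any single character, not just a literal dot.

```python
import re

# "microsoft", in any capitalization, unless followed by any character and "com".
NOT_DOT_COM = re.compile(r"microsoft(?!.com)", re.IGNORECASE)


def rewrite(text):
    """Replace the first "microsoft" that is not followed by ".com"."""
    return NOT_DOT_COM.sub("MicroSuck", text, count=1)


print(rewrite("Microsoft Windows"))
print(rewrite("see http://www.microsoft.com/"))
```

The first string is rewritten, while the hyperlink in the second string is left intact.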
1527 We are barely scratching the surface of regular expressions here so that you
1528 can understand the default Junkbuster configuration files, and maybe use this
1529 knowledge to customize your own installation. There is much, much more that can
1530 be done with regular expressions. Now that you know enough to get started, you
1531 can learn more on your own :/
More reading on Perl Compatible Regular expressions:
http://www.perldoc.com/perl5.6/pod/perlre.html