4 By: Junkbuster Developers
6 $Id: user-manual.sgml,v 1.20 2001/10/24 23:58:25 hal9 Exp $
8 The user manual gives the users information on how to install and
9 configure Internet Junkbuster. Internet Junkbuster is an application
10 that provides privacy and security to users of the World Wide Web.
12 You can find the latest version of the user manual at
13 [1]http://ijbswa.sourceforge.net/user-manual/.
15 Feel free to send a note to the developers at
16 <[2]ijbswa-developers@lists.sourceforge.net>.
17 _________________________________________________________________
33 3. [12]Junkbuster Configuration
35 3.1. [13]The Main Configuration File
36 3.2. [14]The Actions File
37 3.3. [15]The Filter File
39 4. [16]Quickstart to Using Junkbuster
40 5. [17]Contact the Developers
41 6. [18]Copyright and History
49 8.1. [23]Regular Expressions
53 Internet Junkbuster is a web proxy with advanced filtering
54 capabilities for protecting privacy, filtering web page content,
55 managing cookies, controlling access, and removing ads, banners,
56 pop-ups and other obnoxious Internet Junk. Junkbuster has a very
57 flexible configuration and can be customized to suit individual needs
58 and tastes. Internet Junkbuster has application for both stand-alone
59 systems and multi-user networks.
61 This documentation is included with the current development version of
62 Internet Junkbuster and is incomplete at this point. The most up to
63 date reference for the time being is still the comments in the source
64 files and in the individual configuration files. Development of
65 version 3.0 is currently underway, and includes many significant
66 changes and enhancements over earlier versions. The target release date
67 for stable v3.0 is December 2001.
69 Since this is a development version, some features are in the process
70 of being implemented. This documentation may be slightly out of sync
71 as a result. And there are bugs, though hopefully not many!
72 _________________________________________________________________
76 In addition to Junkbuster's traditional features of ad and banner
77 blocking and cookie management, this is a list of new features
78 currently under development:
80 * A browser based configuration utility (WIP at [24]http://i.j.b).
81 * Modularized configuration that will allow for system wide
82 settings, and individual user settings. (not implemented yet)
83 * Blocking of annoying pop-up browser windows (previously available
85 * Support for HTTP/1.1 (partially implemented at this point).
86 * Support for Perl Compatible Regular Expressions in the
87 configuration files, and generally a more sophisticated
88 configuration syntax over previous versions.
89 * Web page content filtering.
92 In addition, the configuration is more versatile overall.
93 _________________________________________________________________
97 Junkbuster is available as raw source code, or pre-compiled binaries.
98 See the [25]Junkbuster Home Page for current release info. Junkbuster
99 is also available via [26]CVS. This is the recommended approach at
100 this time. But please be aware that CVS is constantly changing, and it
101 may break in mysterious ways.
102 _________________________________________________________________
106 For gzipped tar archives, unpack the source:
108 tar zxvf ijb_source_2.9*
111 For retrieving the current CVS sources, you'll need the CVS package
112 installed first. To download CVS source:
114 cvs -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa login
115 cvs -z3 -d:pserver:anonymous@cvs.ijbswa.sourceforge.net:/cvsroot/ijbswa co cu
119 This will create a directory named current/, which will contain the
122 Then, in either case, to build from source:
124 autoconf #recommended for CVS source
130 For Redhat and SuSE Linux RPM packages, see below.
131 _________________________________________________________________
135 To build Redhat RPM packages, install source as above. Then:
137 autoconf #recommended for CVS source
141 This will create both binary and src RPMs in the usual places.
144 /usr/src/redhat/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
146 /usr/src/redhat/SRPMS/junkbuster-2.9.9-1.src.rpm
148 To install, of course:
150 rpm -Uvv /usr/src/redhat/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
152 This will place the Junkbuster configuration files in
153 /etc/junkbuster/, and log files in /var/log/junkbuster/.
154 _________________________________________________________________
158 To build SuSE RPM packages, install source as above. Then:
160 autoconf #recommended for CVS source
164 This will create both binary and src RPMs in the usual places.
167 /usr/src/suse/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
169 /usr/src/suse/SRPMS/junkbuster-2.9.9-1.src.rpm
171 To install, of course:
173 rpm -Uvv /usr/src/suse/RPMS/i686/junkbuster-2.9.9-1.i686.rpm
175 This will place the Junkbuster configuration files in
176 /etc/junkbuster/, and log files in /var/log/junkbuster/.
177 _________________________________________________________________
181 The OS/2 version of Junkbuster requires the EMX runtime library to be
182 installed. The EMX runtime library is available on the hobbes OS/2
183 archive, among many other locations:
184 [27]http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=emx
185 rt.zip&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fdev%2Femx%2Fv0.9d
187 Junkbuster is packaged in a WarpIN self-installing archive. The
188 self-installing program will be named depending on the release
189 version, something like: ijbos123.exe. In order to install it, simply
190 run this executable or double-click on its icon and follow the WarpIN
191 installation panels. A shadow of the Junkbuster executable will be
192 placed in your startup folder so it will start automatically whenever
195 The directory you choose to install Junkbuster into will contain all
196 of the configuration files.
198 If you would like to build binary images on OS/2 yourself, you will
199 need a working EMX/GCC environment, plus several Unix-like tools. The
200 Hobbes OS/2 archive is a good place to start when building such an
201 environment. A set of Unix-like tools named gnupack is located here:
202 [28]http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&key=gnupack&stype=all
203 &sort=type&dir=%2Fpub%2Fos2%2Fapps
205 Once you have the source code unpacked as above, you can build the
206 binaries from the current/ directory:
211 _________________________________________________________________
215 Click-click. (I need help on this. Not a clue here. Also for
216 configuration section below. HB.)
217 _________________________________________________________________
221 Some quick notes on other Operating Systems.
223 For FreeBSD (and other *BSDs?), the build will need gmake instead of
224 the included make. gmake is available from [29]http://www.gnu.org. The
225 rest should be the same as above for Linux/Unix.
226 _________________________________________________________________
228 3. Junkbuster Configuration
230 For Unix, *BSD and Linux, all configuration files are located in
231 /etc/junkbuster/ by default. For MS Windows and OS/2, these are all in
232 the same directory as the Junkbuster executable. The name and number
233 of configuration files has changed from previous versions, and is
234 subject to change as development progresses.
236 The installed defaults provide a reasonable starting point. For the
237 time being, there are only three default configuration files (this
238 will change in time):
240 * The main configuration file is named config on Linux, Unix, BSD,
241 and OS/2, and junkbustr.txt on Windows. On Amiga, it is
242 AmiTCP:db/junkbuster/config.
243 * The actionsfile file is used to define various "actions" relating
244 to images, banners, pop-ups, access restrictions and
245 cookies. There is a CGI based editor for this file that can be
246 accessed via [30]http://i.j.b./. This is the easiest method of
247 configuring actions. (Still under active development.)
248 * The re_filterfile file can be used to rewrite the raw page
249 content, including text as well as embedded HTML and JavaScript.
251 actionsfile and re_filterfile can use Perl style regular expressions
252 for maximum flexibility. All files use the "#" character to denote a
253 comment. Such lines are not processed by Junkbuster. After making any
254 changes, restart Junkbuster in order for the changes to take effect.
256 While under development, the configuration content is subject to
257 change. The below documentation may not be accurate by the time you
258 read this. Also, what constitutes a "default" setting, may change, so
259 please check all your configuration files on important issues.
260 _________________________________________________________________
262 3.1. The Main Configuration File
264 Again, the main configuration file is named config on Linux/Unix/BSD
265 and OS/2, and junkbustr.txt on Windows. Configuration lines consist of
266 an initial keyword followed by a list of values, all separated by
267 whitespace (any number of spaces or tabs). For example:
269 blockfile blocklist.ini
271 Indicates that the blockfile is named "blocklist.ini".
273 A "#" indicates a comment. Any part of a line following a "#" is
274 ignored, except if the "#" is preceded by a "\".
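The keyword/value and comment rules above can be sketched in a few lines of Python. This is only an illustration, not Junkbuster's actual parser: the function name parse_config_line is hypothetical, and the sketch assumes that an escaped "\#" stands for a literal "#" in the value.

```python
import re

def parse_config_line(line):
    """Return (keyword, [values]), or None for a blank or comment-only line."""
    # Drop everything from the first "#" that is not preceded by a backslash.
    stripped = re.sub(r'(?<!\\)#.*', '', line)
    # Assumption: an escaped "\#" is meant as a literal "#" character.
    stripped = stripped.replace('\\#', '#')
    # Keyword and values are separated by any number of spaces or tabs.
    fields = stripped.split()
    if not fields:
        return None
    return fields[0], fields[1:]
```

A commented-out line such as "#logfile logfile" yields None here, which mirrors the idea that the option is treated as if it were not there.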
276 Thus, by placing a "#" at the start of an existing configuration line,
277 you can make it a comment and it will be treated as if it weren't
278 there. This is called "commenting out" an option and can be useful to
279 turn off features: If you comment out the "logfile" line, junkbuster
280 will not log to a file at all. Watch for the "default:" section in
281 each explanation to see what happens if the option is left unset (or
284 Long lines can be continued on the next line by using a "\" as the
287 There are various aspects of Junkbuster behavior that can be tuned.
288 _________________________________________________________________
290 3.1.1. Defining Other Configuration Files
292 Junkbuster can use a number of other files to tell it what ads to
293 block, what cookies to accept, etc. This section of the configuration
294 file tells Junkbuster where to find all those other files.
296 On Windows, Junkbuster looks for these files in the same directory as
297 the executable. On Unix and OS/2, Junkbuster looks for these files in
298 the current working directory. In either case, an absolute path name
299 can be used to avoid problems.
301 When development goes modular and multiuser, the blocker, filter, and
302 per-user config will be stored in subdirectories of "confdir". For
303 now, only confdir/templates is used for storing HTML templates for CGI
306 The location of the configuration files:
308 confdir /etc/junkbuster # No trailing /, please.
310 The directory where all logging (i.e. logfile and jarfile) takes
311 place. No trailing "/", please:
313 logdir /var/log/junkbuster
315 Note that all file specifications below are relative to the above two
318 The "actionsfile" contains patterns to specify the actions to apply to
319 requests for each site. Default: Cookies to and from all destinations
320 are filtered. Popups are disabled for all sites. All sites are
321 filtered if re_filterfile specified. No sites are blocked. An empty
322 image is displayed for filtered ads and other images (formerly
323 "tinygif"). The syntax of this file is explained in detail [31]below.
325 actionsfile actionsfile
327 The "re_filterfile" file contains content modification rules. These
328 rules permit powerful changes on the content of Web pages, e.g., you
329 could disable your favourite JavaScript annoyances, rewrite the actual
330 content, or just have some fun replacing "Microsoft" with "MicroSuck"
331 wherever it appears on a Web page. Default: No content modification,
332 or whatever the developers are playing with :-/
334 re_filterfile re_filterfile
336 The logfile is where all logging and error messages are written. The
337 logfile can be useful for tracking down a problem with Junkbuster
338 (e.g., it's not blocking an ad you think it should block) but in most
339 cases you probably will never look at it.
341 Your logfile will grow indefinitely, and you will probably want to
342 periodically remove it. On Unix systems, you can do this with a cron
343 job (see "man cron"). For Redhat, a logrotate script has been
346 On SuSE Linux systems, you can place a line like
347 "/var/log/junkbuster.* +1024k 644 nobody.nogroup" in /etc/logfiles,
348 with the effect that cron.daily will automatically archive, gzip, and
349 empty the log, when it exceeds 1M size.
351 Default: Log to a file named logfile. Comment out to disable
356 The "jarfile" defines where Junkbuster stores the cookies it
357 intercepts. Note that if you use a "jarfile", it may grow quite large.
358 Default: Don't store intercepted cookies.
362 If you specify a "trustfile", Junkbuster will only allow access to
363 sites that are named in the trustfile. You can also mark sites as
364 trusted referrers, with the effect that access to untrusted sites will
365 be granted, if a link from a trusted referrer was used. The link
366 target will then be added to the "trustfile". This is a very
367 restrictive feature that typical users most probably want to leave
368 disabled. Default: Disabled, don't use the trust mechanism.
372 If you use the trust mechanism, it is a good idea to write up some
373 online documentation about your blocking policy and to specify the
374 URL(s) here. They will appear on the page that your users receive when
375 they try to access untrusted content. Use multiple times for multiple
376 URLs. Default: Don't display links on the "untrusted" info page.
378 trust-info-url http://www.your-site.com/why_we_block.html
379 trust-info-url http://www.your-site.com/what_we_allow.html
380 _________________________________________________________________
382 3.1.2. Other Configuration Options
384 This part of the configuration file contains options that control how
387 "Admin-address" should be set to the email address of the proxy
388 administrator. It is used in many of the proxy-generated pages.
389 Default: fill@me.in.please.
391 #admin-address fill@me.in.please
393 "Proxy-info-url" can be set to a URL that contains more info about
394 this Junkbuster installation, its configuration and policies. It is
395 used in many of the proxy-generated pages and its use is highly
396 recommended in multi-user installations, since your users will want to
397 know why certain content is blocked or modified. Default: Don't show a
398 link to online documentation.
400 proxy-info-url http://www.your-site.com/proxy.html
402 "Listen-address" specifies the address and port where Junkbuster will
403 listen for connections from your Web browser. The default is to listen
404 on the localhost port 8000, and this is suitable for most users. (In
405 your web browser, under proxy configuration, list the proxy server as
406 "localhost" and the port as "8000").
408 If you already have another service running on port 8000, or if you
409 want to serve requests from other machines (e.g. on your local
410 network) as well, you will need to override the default. The syntax is
411 "listen-address [<ip-address>]:<port>". If you leave out the IP
412 address, junkbuster will bind to all interfaces (addresses) on your
413 machine and may become reachable from the Internet. In that case,
414 consider using access control lists (ACLs) (see Section 3.1.3 below), or
417 For example, suppose you are running Junkbuster on a machine which has
418 the address 192.168.0.1 on your local private network (192.168.0.0)
419 and has another outside connection with a different address. You want
420 it to serve requests from inside only:
422 listen-address 192.168.0.1:8000
424 If you want it to listen on all addresses (including the outside
429 If you do this, consider using ACLs (see Section 3.1.3 below). Note: you
430 will need to point your browser(s) to the address and port that you
431 have configured here. Default: localhost:8000 (127.0.0.1:8000).
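As a rough illustration of the "listen-address [<ip-address>]:<port>" syntax, the following Python sketch splits a value into its address and port parts. The function name parse_listen_address is hypothetical, and returning None for an omitted address is simply this sketch's way of saying "bind to all interfaces".

```python
def parse_listen_address(value):
    """Split '[<ip-address>]:<port>' into (address or None, port)."""
    # Split on the last colon; the part before it may be empty.
    host, sep, port = value.rpartition(":")
    if not sep or not port.isdigit():
        raise ValueError("expected [<ip-address>]:<port>, got: " + value)
    # None stands for "no address given", i.e. all interfaces.
    return (host or None), int(port)
```

A result with None as the address is the case where the proxy may become reachable from the Internet, so it is the one to pair with access controls.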
433 The debug option sets the level of debugging information to log in the
434 logfile (and to the console in the Windows version). A debug level of
435 1 is informative because it will show you each request as it happens.
436 Higher levels of debug are probably only of interest to developers.
438 debug 1 # GPC = show each GET/POST/CONNECT request
439 debug 2 # CONN = show each connection status
440 debug 4 # IO = show I/O status
441 debug 8 # HDR = show header parsing
442 debug 16 # LOG = log all data into the logfile
443 debug 32 # FRC = debug force feature
444 debug 64 # REF = debug regular expression filter
445 debug 128 # = debug fast redirects
446 debug 256 # = debug GIF deanimation
447 debug 512 # CLF = Common Log Format
448 debug 1024 # = debug kill popups
449 debug 4096 # INFO = Startup banner and warnings.
450 debug 8192 # ERROR = Non-fatal errors
452 It is highly recommended that you enable ERROR reporting (debug 8192),
453 at least until the next stable release.
455 The reporting of FATAL errors (i.e. ones which crash JunkBuster) is
456 always on and cannot be disabled.
458 If you want to use CLF (Common Log Format), you should set "debug 512"
459 ONLY, do not enable anything else.
461 Multiple "debug" directives are OK - they're logical-OR'd together.
463 debug 15 # same as setting the first 4 listed above
469 debug 8192 # Errors - *we highly recommend enabling this*
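The way several "debug" lines combine can be shown with a small Python sketch. The table simply copies the values and labels listed above; the function names are hypothetical and the code only illustrates the bitwise OR, not how Junkbuster stores the setting internally.

```python
# Debug bits and labels as listed in this section.
DEBUG_FLAGS = {
    1: "GPC", 2: "CONN", 4: "IO", 8: "HDR", 16: "LOG", 32: "FRC",
    64: "REF", 128: "fast redirects", 256: "GIF deanimation",
    512: "CLF", 1024: "kill popups", 4096: "INFO", 8192: "ERROR",
}

def combine_debug(values):
    """OR together the values of all 'debug' directives."""
    level = 0
    for value in values:
        level |= value
    return level

def enabled_flags(level):
    """List the labels whose bit is set in the combined level."""
    return [name for bit, name in sorted(DEBUG_FLAGS.items()) if level & bit]
```

For example, combine_debug([1, 2, 4, 8]) gives 15, which is why "debug 15" is the same as setting the first four values separately.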
471 Junkbuster normally uses "multi-threading", a software technique that
472 permits it to handle many different requests simultaneously. In some
473 cases you may wish to disable this -- particularly if you're trying to
474 debug a problem. The "single-threaded" option forces Junkbuster to
475 handle requests sequentially. Default: Multi-threaded mode.
479 "toggle" allows you to temporarily disable all Junkbuster's filtering.
482 The Windows version of Junkbuster puts an icon in the system tray,
483 which also allows you to change this option. If you right-click on
484 that icon (or select the "Options" menu), one choice is "Enable".
485 Clicking on enable toggles Junkbuster on and off. This is useful if
486 you want to temporarily disable Junkbuster, e.g., to access a site
487 that requires cookies which you normally have blocked. This can also
488 be toggled via a web browser at the Junkbuster internal address of
489 [32]http://i.j.b./ on any platform.
491 "toggle 1" means Junkbuster runs normally, "toggle 0" means that
492 Junkbuster becomes a non-anonymizing non-blocking proxy. Default: 1
497 For content filtering, i.e. the "+filter" and "+deanimate-gif"
498 actions, it is necessary that Junkbuster buffers the entire document
499 body. This can be potentially dangerous, since a server could just
500 keep sending data indefinitely and wait for your RAM to exhaust. With
503 The buffer-limit option lets you set the maximum size in Kbytes that
504 each buffer may use. When the document's buffer exceeds this size, it
505 is flushed to the client unfiltered and no further attempt to filter
506 the rest of it is made. Remember that there may be multiple threads
507 running, which might require increasing the "buffer-limit" Kbytes
508 each, unless you have enabled "single-threaded" above.
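To make the memory trade-off concrete, here is a minimal Python sketch of the two ideas in this paragraph: the per-document cutoff and the worst case across threads. Both function names are hypothetical, and the sketch assumes that one Kbyte means 1024 bytes.

```python
def exceeds_buffer_limit(buffered_bytes, buffer_limit_kb):
    """True once a document's buffer has grown past the configured limit."""
    # Assumption: buffer-limit is given in units of 1024 bytes.
    return buffered_bytes > buffer_limit_kb * 1024

def worst_case_buffer_kb(buffer_limit_kb, threads):
    """Upper bound on buffer memory if every thread fills its buffer."""
    return buffer_limit_kb * threads
```

With a hypothetical limit of 4096 Kbytes and 10 busy threads, the upper bound would be 40960 Kbytes, so the limit is worth choosing with the expected number of simultaneous requests in mind.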
512 To enable the web-based actionsfile editor set enable-edit-actions to
513 1, or 0 to disable. Note that you must have compiled JunkBuster with
514 support for this feature, otherwise this option has no effect. This
515 internal page can be reached at [33]http://i.j.b./.
517 Security note: If this is enabled, anyone who can use the proxy can
518 edit the actions file, and their changes will affect all users. For
519 shared proxies, you probably want to disable this. Default: enabled.
521 enable-edit-actions 1
523 Allow JunkBuster to be toggled on and off remotely, using your web
524 browser. Set "enable-remote-toggle" to 1 to enable, and 0 to disable.
525 Note that you must have compiled JunkBuster with support for this
526 feature, otherwise this option has no effect.
528 Security note: If this is enabled, anyone who can use the proxy can
529 toggle it on or off (see [34]http://i.j.b./), and their changes will
530 affect all users. For shared proxies, you probably want to disable
531 this. Default: enabled.
533 enable-remote-toggle 1
534 _________________________________________________________________
536 3.1.3. Access Control List (ACL)
538 Access controls are included at the request of some ISPs and systems
539 administrators, and are not usually needed by individual users. Please
540 note the warnings in the FAQ that this proxy is not intended to be a
541 substitute for a firewall or to encourage anyone to defer addressing
542 basic security weaknesses.
544 If no access settings are specified, the proxy talks to anyone that
545 connects. If any access settings are specified, then the proxy
546 talks only to IP addresses permitted somewhere in this file and not
547 denied later in this file.
549 Summary -- if using an ACL:
551 Client must have permission to receive service.
553 LAST match in ACL wins.
555 Default behavior is to deny service.
557 The syntax for an entry in the Access Control List is:
559 ACTION SRC_ADDR[/SRC_MASKLEN] [ DST_ADDR[/DST_MASKLEN] ]
561 Where the individual fields are:
563 ACTION = "permit-access" or "deny-access"
564 SRC_ADDR = client hostname or dotted IP address
565 SRC_MASKLEN = number of bits in the subnet mask for the source
566 DST_ADDR = server or forwarder hostname or dotted IP address
567 DST_MASKLEN = number of bits in the subnet mask for the target
569 The field separator (FS) is whitespace (space or tab).
571 IMPORTANT NOTE: If the junkbuster is using a forwarder (see below) or
572 a gateway for a particular destination URL, the DST_ADDR that is
573 examined is the address of the forwarder or the gateway and NOT the
574 address of the ultimate target. This is necessary because it may be
575 impossible for the local Junkbuster to determine the address of the
576 ultimate target (that's often what gateways are used for).
578 Here are a few examples to show how the ACL features work:
580 "localhost" is OK -- no DST_ADDR implies that ALL destination
583 permit-access localhost
585 A silly example to illustrate permitting any host on the class-C
586 subnet with Junkbuster to go anywhere:
588 permit-access www.junkbusters.com/24
590 Except deny one particular IP address from using it at all:
592 deny-access ident.junkbusters.com
594 You can also specify an explicit network address and subnet mask.
595 Explicit addresses do not have to be resolved to be used.
597 permit-access 207.153.200.0/24
599 A subnet mask of 0 matches anything, so the next line permits
602 permit-access 0.0.0.0/0
604 Note, you cannot say:
608 to allow all *.org domains. Every IP address listed must resolve
611 An ISP may want to provide a Junkbuster that is accessible by "the
612 world" and yet restrict use of some of their private content to hosts
613 on its internal network (i.e. its own subscribers). Say, for instance
614 the ISP owns the Class-B IP address block 123.124.0.0 (a 16 bit
615 netmask). This is how they could do it:
617 permit-access 0.0.0.0/0 0.0.0.0/0 # other clients can go anywhere
618 # with the following exceptions
621 deny-access 0.0.0.0/0 123.124.0.0/16 # block all external request
623 # sites on the ISP's network
624 permit-access 0.0.0.0/0 www.my_isp.com # except for the ISP's main
626 permit-access 123.124.0.0/16 0.0.0.0/0 # the ISP's clients can go
629 Note that if some hostnames are listed with multiple IP addresses, the
630 primary value returned by DNS (via gethostbyname()) is used. Default:
631 Anyone can access the proxy.
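The ACL rules summarized in this section (no ACL means everyone is served, otherwise the LAST matching entry wins and the default is to deny) can be sketched in Python as follows. This is an illustration under stated assumptions, not Junkbuster's code: the function names are hypothetical, only dotted IP addresses are handled (hostnames would first need a DNS lookup), and the standard ipaddress module stands in for the proxy's own mask handling.

```python
import ipaddress

# A missing DST_ADDR means that all destinations are covered.
ANYWHERE = ipaddress.ip_network("0.0.0.0/0")

def parse_acl_line(line):
    """Parse 'ACTION SRC[/LEN] [DST[/LEN]]' into (action, src_net, dst_net)."""
    fields = line.split("#", 1)[0].split()
    action = fields[0]
    if action not in ("permit-access", "deny-access"):
        raise ValueError("unknown ACL action: " + action)
    # strict=False accepts a host address together with a mask length.
    src = ipaddress.ip_network(fields[1], strict=False)
    dst = ipaddress.ip_network(fields[2], strict=False) if len(fields) > 2 else ANYWHERE
    return action, src, dst

def is_permitted(rules, client, server):
    """Apply the rules in order; the last matching entry decides."""
    if not rules:
        return True  # no ACL at all: the proxy talks to anyone
    client_ip = ipaddress.ip_address(client)
    server_ip = ipaddress.ip_address(server)
    permitted = False  # with an ACL present, the default is to deny
    for action, src, dst in rules:
        if client_ip in src and server_ip in dst:
            permitted = (action == "permit-access")
    return permitted
```

Feeding it a simplified form of the ISP example (permit everything, deny all requests to 123.124.0.0/16, then permit clients from 123.124.0.0/16) shows the ordering effect: an outside client is refused for internal destinations, while an ISP client is allowed because its permit entry comes last.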
632 _________________________________________________________________
636 This feature allows chaining of HTTP requests via multiple proxies. It
637 can be used to better protect privacy and confidentiality when
638 accessing specific domains by routing requests to those domains to a
639 special purpose filtering proxy such as lpwa.com. Or to use a caching
640 proxy to speed up browsing.
642 It can also be used in an environment with multiple networks to route
643 requests via multiple gateways allowing transparent access to multiple
644 networks without having to modify browser configurations.
646 Also specified here are SOCKS proxies. Junkbuster supports SOCKS 4 and SOCKS
647 4A. The difference is that SOCKS 4A will resolve the target hostname
648 using DNS on the SOCKS server, not our local DNS client.
650 The syntax of each line is:
652 forward target_domain[:port] http_proxy_host[:port]
653 forward-socks4 target_domain[:port] socks_proxy_host[:port]
654 http_proxy_host[:port]
655 forward-socks4a target_domain[:port] socks_proxy_host[:port]
656 http_proxy_host[:port]
658 If http_proxy_host is ".", then requests are not forwarded to an HTTP
659 proxy but are made directly to the web servers.
661 Lines are checked in sequence, and the last match wins.
663 There is an implicit line equivalent to the following, which specifies
664 that anything not finding a match on the list is to go out without
665 forwarding or gateway protocol, like so:
667 forward .* . # implicit
669 In the following common configuration, everything goes to Lucent's
670 LPWA, except SSL on port 443 (which it doesn't handle):
672 forward .* lpwa.com:8000
675 See the FAQ for instructions on how to automate the login procedure
676 for LPWA. Some users have reported difficulties related to LPWA's use
677 of "." as the last element of the domain, and have said that this can
680 forward lpwa. lpwa.com:8000
682 (NOTE: the syntax for specifying target_domain has changed since the
683 previous paragraph was written -- it will not work now. More
684 information is welcome.)
686 In this fictitious example, everything goes via an ISP's caching
687 proxy, except requests to that ISP:
689 forward .* caching.myisp.net:8000
692 For the @home network, we're told the forwarding configuration is
695 forward .* proxy:8080
697 Also, we're told they insist on getting cookies and JavaScript, so you
698 need to add home.com to the cookie file. We consider JavaScript a
699 security risk. Java need not be enabled.
701 In this example direct connections are made to all "internal" domains,
702 but everything else goes through Lucent's LPWA by way of the company's
703 SOCKS gateway to the Internet.
705 forward-socks4 .* lpwa.com:8000 firewall.my_company.com:1080
706 forward my_company.com .
708 This is how you could set up a site that always uses SOCKS but no
711 forward-socks4a .* . firewall.my_company.com:1080
713 An advanced example for network administrators:
715 If you have links to multiple ISPs that provide various special
716 content to their subscribers, you can configure forwarding to pass
717 requests to the specific host that's connected to that ISP so that
718 everybody can see all of the content on all of the ISPs.
720 This is a bit tricky, but here's an example:
722 host-a has a PPP connection to isp-a.com. And host-b has a PPP
723 connection to isp-b.com. host-a can run a Junkbuster proxy with
724 forwarding like this:
727 forward isp-b.com host-b:8000
729 host-b can run a Junkbuster proxy with forwarding like this:
732 forward isp-a.com host-a:8000
734 Now, anyone on the Internet (including users on host-a and host-b) can
735 set their browser's proxy to either host-a or host-b and be able to
736 browse the content on isp-a or isp-b.
738 Here's another practical example, for University of Kent at Canterbury
739 students with a network connection in their room, who need to use the
740 University's Squid web cache.
742 forward *. ssbcache.ukc.ac.uk:3128 # Use the proxy, except for:
743 forward .ukc.ac.uk . # Anything on the same domain as us
744 forward * . # Host with no domain specified
745 forward 129.12.*.* . # A dotted IP on our /16 network.
746 forward 127.*.*.* . # Loopback address
747 forward localhost.localdomain . # Loopback address
748 forward www.ukc.mirror.ac.uk . # Specific host
750 If you intend to chain Junkbuster and squid locally, the recommended
751 order is browser -> squid -> junkbuster.
753 Your squid configuration could then look like this:
755 # Define junkbuster as parent cache
757 cache_peer 127.0.0.1 parent 8000 0 no-query
759 # Define ACL for protocol FTP
761 # Do not forward ACL FTP to junkbuster
762 always_direct allow FTP
763 # Do not forward ACL CONNECT (https) to junkbuster
764 always_direct allow CONNECT
765 # Forward the rest to junkbuster
766 never_direct allow all
767 _________________________________________________________________
769 3.1.5. Windows GUI Options
771 Junkbuster has a number of options specific to the Windows GUI
774 If "activity-animation" is set to 1, the Junkbuster icon will animate
775 when "Junkbuster" is active. To turn off, set to 0.
779 If "log-messages" is set to 1, Junkbuster will log messages to the
784 If "log-buffer-size" is set to 1, the size of the log buffer, i.e. the
785 amount of memory used for the log messages displayed in the console
786 window, will be limited to "log-max-lines" (see below).
788 Warning: Setting this to 0 will cause the buffer to grow without
789 limit and eat up all your memory!
793 log-max-lines is the maximum number of lines held in the log buffer.
798 If "log-highlight-messages" is set to 1, Junkbuster will highlight
799 portions of the log messages with a bold-faced font:
801 log-highlight-messages 1
803 The font used in the console window:
805 log-font-name Comic Sans MS
807 Font size used in the console window:
811 "show-on-task-bar" controls whether or not Junkbuster will appear as a
812 button on the Task bar when minimized:
816 If "close-button-minimizes" is set to 1, the Windows close button will
817 minimize Junkbuster instead of closing the program (close with the
818 exit option on the File menu).
820 close-button-minimizes 1
822 The "hide-console" option is specific to the MS-Win console version of
823 JunkBuster. If this option is used, Junkbuster will disconnect from
824 and hide the command console.
827 _________________________________________________________________
829 3.2. The Actions File
831 The "actionsfile" is used to define what actions Junkbuster takes, and
832 thus determines how images, cookies and various other aspects of HTTP
833 content and transactions are handled. Images can be anything you want,
834 including ads, banners, or just some obnoxious image that you would
835 rather not see. Cookies can be accepted or rejected. The default file
836 is in fact named actionsfile.
838 To determine which actions apply to a request, the URL of the request
839 is compared to all patterns in this file. Every time it matches, the
840 list of applicable actions for the URL is incrementally updated. You
841 can trace this process by visiting [35]http://i.j.b/show-url-info.
843 The actions file can be edited with a browser by loading
844 [36]http://i.j.b, and then selecting "Edit Actions".
846 There are four types of lines in this file: comments (begin with a "#"
847 character), actions, aliases and patterns, all of which are explained
848 below, as well as the configuration file syntax that Junkbuster
850 _________________________________________________________________
852 3.2.1. URL Domain and Path Syntax
854 Generally, a pattern has the form <domain>/<path>, where both the
855 <domain> and <path> part are optional. If you only specify a domain
856 part, the "/" can be left out:
858 www.example.com - is a domain only pattern and will match any request
859 to "www.example.com".
861 www.example.com/ - means exactly the same.
863 www.example.com/index.html - matches only the single document
864 "/index.html" on "www.example.com".
866 /index.html - matches the document "/index.html", regardless of the
869 index.html - matches nothing, since it would be interpreted as a
870 domain name and there is no top-level domain called ".html".
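The <domain>/<path> split described above can be sketched in Python. The function name split_pattern is hypothetical, and the sketch only shows how a pattern divides into its two optional parts; it does not perform any matching.

```python
def split_pattern(pattern):
    """Split an actions-file pattern into (domain, path); either may be empty."""
    # Everything before the first "/" is the domain part.
    domain, slash, rest = pattern.partition("/")
    path = slash + rest
    # A bare "/" carries no path information, so "www.example.com/"
    # is treated exactly like "www.example.com".
    if path == "/":
        path = ""
    return domain, path
```

Note how "index.html" ends up entirely in the domain part, which is why that pattern matches nothing.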
872 The matching of the domain part offers some flexible options: if the
873 domain starts or ends with a dot, it becomes unanchored at that end.
876 .example.com - matches any domain that ENDS in ".example.com".
878 www. - matches any domain that STARTS with "www".
Additionally, there are wildcards that you can use in the domain names
themselves. They work much like shell wildcards: "*" stands for zero
or more arbitrary characters, and "?" stands for any single character.
You can also define character classes in square brackets, and all of
these can be freely mixed:
886 ad*.example.com - matches "adserver.example.com", "ads.example.com",
887 etc but not "sfads.example.com".
889 *ad*.example.com - matches all of the above, and then some.
891 .?pix.com - matches "www.ipix.com", "pictures.epix.com",
892 "a.b.c.d.e.upix.com", etc.
894 www[1-9a-ez].example.com - matches "www1.example.com",
895 "www4.example.com", "wwwd.example.com", "wwwz.example.com", etc., but
896 not "wwww.example.com".
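To experiment with the domain rules above, here is a rough Python approximation built on the standard fnmatch module, whose "*", "?" and "[...]" wildcards behave like shell wildcards. It is a sketch under assumptions: the helper name domain_matches is invented, and Junkbuster's real matcher may treat edge cases (for example dots inside "*") differently.

```python
from fnmatch import fnmatchcase


def domain_matches(pattern, host):
    """Approximate the domain matching rules with shell-style wildcards."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("."):
        pattern = "*" + pattern      # leading dot: unanchored on the left
    if pattern.endswith("."):
        pattern = pattern + "*"      # trailing dot: unanchored on the right
    return fnmatchcase(host, pattern)


if __name__ == "__main__":
    print(domain_matches(".example.com", "www.example.com"))
    print(domain_matches("ad*.example.com", "sfads.example.com"))
```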
If Junkbuster was compiled with "pcre" support (the default), Perl
compatible regular expressions can be used. See the pcre/docs/
directory or "man perlre" (also available on
[37]http://www.perldoc.com/perl5.6/pod/perlre.html) for details. A
brief discussion of regular expressions is in the [38]Appendix. For
example:
/.*/advert[0-9]+\.jpe?g - would match a URL from any domain, with any
path that includes "advert" followed immediately by one or more
digits, then a "." and ending in either "jpeg" or "jpg". So we match
"example.com/ads/advert2.jpg", and
"www.example.com/ads/banners/advert39.jpeg", but not
"www.example.com/ads/banners/advert39.gif" (no gifs in the example
pattern).
913 Please note that matching in the path is case INSENSITIVE by default,
914 but you can switch to case sensitive at any point in the pattern by
915 using the "(?-i)" switch:
917 www.example.com/(?-i)PaTtErN.* - will match only documents whose path
918 starts with "PaTtErN" in exactly this capitalization.
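The two path examples above can be tried out with Python's re module, which is close to, but not identical with, pcre. In this sketch the path is matched from its start and case is ignored by default, as the text describes. Python has no bare "(?-i)" switch, so the scoped form "(?-i:...)" (Python 3.6 or later) stands in for it; the names AD_IMAGE, CASE_SENSITIVE and path_matches are made up for the illustration.

```python
import re

# Path part of the first example; case-insensitive by default.
AD_IMAGE = re.compile(r"/.*/advert[0-9]+\.jpe?g", re.IGNORECASE)

# Python spelling of "/(?-i)PaTtErN.*": case sensitivity is switched
# back on only inside the (?-i:...) group.
CASE_SENSITIVE = re.compile(r"/(?-i:PaTtErN).*", re.IGNORECASE)


def path_matches(regex, path):
    """Return True if the path matches the pattern, anchored at the start."""
    return regex.match(path) is not None


if __name__ == "__main__":
    for path in ("/ads/advert2.jpg", "/ads/banners/advert39.jpeg",
                 "/ads/banners/advert39.gif"):
        print(path, path_matches(AD_IMAGE, path))
```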
919 _________________________________________________________________
923 Actions are enabled if preceded with a "+", and disabled if preceded
924 with a "-". Actions are invoked by enclosing the action name in curly
925 braces (e.g. {+some_action}), followed by a list of URLs to which the
926 action applies. There are three classes of actions:
928 * Boolean (e.g. "+/-block"):
929 {+name} # enable this action
930 {-name} # disable this action
932 * Parameterized (e.g. "+/-hide-user-agent"):
933 {+name{param}} # enable action and set parameter to "param"
934 {-name} # disable action
936 * Multi-value (e.g. "{+/-add-header{Name: value}}",
937 "{+/-wafer{name=value}}"):
938 {+name{param}} # enable action and add parameter "param"
939 {-name{param}} # remove the parameter "param"
940 {-name} # disable this action totally
If nothing is specified in this file, no "actions" are taken. In
this case Junkbuster would just be a normal, non-blocking,
non-anonymizing proxy. You must specifically enable the privacy and
blocking features you need (although the provided default actionsfile
will give a good starting point).
Later defined actions always override earlier ones. For multi-valued
actions, the actions are applied in the order they are specified.
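One way to picture how the three classes of actions accumulate is the following Python model. It is a hypothetical sketch, not Junkbuster's data structure: the names apply_action, MULTI_VALUE and TOKEN are invented, and the set of multi-valued actions is assumed from the examples above.

```python
import re

# Assumption for this sketch: only these two actions are multi-valued.
MULTI_VALUE = {"add-header", "wafer"}
TOKEN = re.compile(r"([+-])([a-z0-9_-]+)(?:\{(.*)\})?$")


def apply_action(state, token):
    """Update state (action name -> setting) with one "+name{param}" token."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError("not an action token: %r" % token)
    sign, name, param = match.groups()
    if name in MULTI_VALUE:
        values = state.setdefault(name, [])
        if sign == "+" and param is not None:
            values.append(param)        # add one more parameter
        elif sign == "-" and param is None:
            values.clear()              # -name: disable totally
        elif sign == "-" and param in values:
            values.remove(param)        # -name{param}: remove just this one
    elif sign == "+":
        state[name] = True if param is None else param
    else:
        state[name] = False             # a later "-" overrides an earlier "+"
    return state


if __name__ == "__main__":
    state = {}
    for tok in ("+block", "+add-header{X-A: 1}", "-block"):
        apply_action(state, tok)
    print(state)
```

Feeding the tokens in file order reproduces the "later overrides earlier" rule, while multi-valued actions keep their parameters in the order given.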
The valid Junkbuster "actions" are:
953 * Add the specified HTTP header, which is not checked for validity.
954 You may specify this many times to specify many different headers:
955 +add-header{Name: value}
957 * Block this URL totally.
* De-animate all animated GIF images, i.e. reduce them to their last
frame. This will also shrink the images considerably (in bytes,
not pixels!). If the option "first" is given, the first frame of
the animation is used as the replacement. If "last" is given, the
last frame of the animation is used instead, which probably makes
more sense for most banner animations, but also has the risk of
not showing the entire last frame (if it is only a delta to an
earlier frame).
+deanimate-gifs{last}
+deanimate-gifs{first}
971 * "+downgrade" will downgrade HTTP/1.1 client requests to HTTP/1.0
972 and downgrade the responses as well. Use this action for servers
973 that use HTTP/1.1 protocol features that Junkbuster doesn't handle
974 well yet. HTTP/1.1 is only partially implemented. Default is not
975 to downgrade requests.
* Many sites, like yahoo.com, don't just link to other sites.
Instead, they will link to some script on their own server, giving
the destination as a parameter, which will then redirect you to
the final target. URLs resulting from this scheme typically look
like: http://some.place/some_script?http://some.where-else.
Sometimes, there are even multiple consecutive redirects encoded
in the URL. These redirections via scripts make your web browsing
more traceable, since the server from which you follow such a link
can see where you go to. Apart from that, valuable bandwidth and
time are wasted while your browser asks the server for one redirect
after the other. Plus, it feeds the advertisers.
The "+fast-redirects" option enables interception of these
requests by Junkbuster, which will cut off all but the last valid
URL in the request and send a local redirect back to your browser
without contacting the remote site.
995 * Filter the website through the re_filterfile:
* Block any existing X-Forwarded-for header, and do not add a new one:
1002 * If the browser sends a "From:" header containing your e-mail
1003 address, this either completely removes the header ("block"), or
1004 changes it to the specified e-mail address.
1006 +hide-from{spam@sittingduck.xqq}
1008 * Don't send the "Referer:" (sic) header to the web site. You can
1009 block it, forge a URL to the same server as the request (which is
1010 preferred because some sites will not send images otherwise) or
1011 set it to a constant string of your choice.
1012 +hide-referer{block}
1013 +hide-referer{forge}
1014 +hide-referer{http://nowhere.com}
* Alternative spelling of "+hide-referer". It has the same
parameters, and can be freely mixed with "+hide-referer".
("Referrer" is the correct English spelling; however, the HTTP
specification has a bug: it requires it to be spelled "referer".)
* Change the "User-Agent:" header so web servers can't tell your
browser type. Warning! This breaks many web sites. Specify the
user-agent value you want. For example, to pretend to be using Netscape:
1026 +hide-user-agent{Mozilla (X11; I; Linux 2.0.32 i586)}
* Treat this URL as an image. This only matters if it's also
"+block"ed, in which case a "blocked" image can be sent rather
than an HTML page. See "+image-blocker{}" below for the control
over what is actually sent.
* Decides what to do with URLs that end up tagged with "{+block
+image}". There are four options. "-image-blocker" will send an HTML
"blocked" page, usually resulting in a "broken image" icon.
"+image-blocker{logo}" will send a "JunkBuster" image.
"+image-blocker{blank}" will send a 1x1 transparent GIF image. And
finally, "+image-blocker{http://xyz.com}" will send an HTTP
temporary redirect to the specified image. This has the advantage
of the icon being cached by the browser, which will speed things
up.
+image-blocker{logo}
+image-blocker{blank}
+image-blocker{http://i.j.b/send-banner}
* By default (i.e. in the absence of a "+limit-connect" action),
Junkbuster will only allow CONNECT requests to port 443, which is
the standard port for https, as a precaution.
The CONNECT method exists in HTTP to allow access to secure
websites (https:// URLs) through proxies. It works very simply:
the proxy connects to the server on the specified port, and then
short-circuits its connections to the client and to the remote
server. This can be a big security hole, since CONNECT-enabled
proxies can be abused as TCP relays very easily.
If you want to allow CONNECT for more ports than this, or want to
forbid CONNECT altogether, you can specify a comma separated list
of ports and port ranges (the latter using dashes, with the
minimum defaulting to 0 and max to 65K):
+limit-connect{443} # This is the default and need not be specified.
+limit-connect{80,443} # Ports 80 and 443 are OK.
+limit-connect{-3, 7, 20-100, 500-} # Ports less than 3, 7, 20 to 100,
                                    # and above 500 are OK.
* "+no-compression" prevents the website from compressing the data.
Some websites do this, which can be a problem for Junkbuster,
since "+filter", "+no-popup" and "+deanimate-gifs" will not work on
compressed data. This will slow down connections to those
websites, though. By default, "+no-compression" is turned on.
1074 * Prevent the website from reading cookies:
1077 * Prevent the website from setting cookies:
1080 * Filter the website through a built-in filter to disable those
1081 obnoxious JavaScript pop-up windows via window.open(), etc. The
1082 two alternative spellings are equivalent.
1086 * This action only applies if you are using a jarfile for saving
1087 cookies. It sends a cookie to every site stating that you do not
1088 accept any copyright on cookies sent to you, and asking them not
1089 to track you. Of course, this is a (relatively) unique header they
1090 could use to track you.
1093 * This allows you to add an arbitrary cookie. It can be specified
1094 multiple times in order to add as many cookies as you like.
1097 The meaning of any of the above is reversed by preceding the action
1098 with a "-", in place of the "+".
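As an illustration of the port list syntax accepted by "+limit-connect" above, here is a small Python sketch that expands such a list and checks a port against it. It rests on assumptions: ranges are treated as inclusive, an open upper end is taken to be 65535 (the text only says "65K"), and the names parse_port_list and connect_allowed are invented for the example.

```python
def parse_port_list(spec):
    """Turn e.g. "-3, 7, 20-100, 500-" into inclusive (low, high) ranges."""
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        low, dash, high = item.partition("-")
        if dash:
            # A missing bound falls back to the assumed default.
            ranges.append((int(low) if low else 0,
                           int(high) if high else 65535))
        else:
            ranges.append((int(low), int(low)))   # single port
    return ranges


def connect_allowed(spec, port):
    """Return True if a CONNECT to this port is permitted by the list."""
    return any(low <= port <= high for low, high in parse_port_list(spec))


if __name__ == "__main__":
    print(parse_port_list("-3, 7, 20-100, 500-"))
    print(connect_allowed("80,443", 8080))
```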
Turn off cookies by default, then allow a few through for specified
sites:
1105 # Turn off all cookies
1106 { +no-cookies-read }
# Exceptions to the above, sites that need cookies
1109 { -no-cookies-read }
1116 # Alternative way of saying the same thing
1117 {-no-cookies-set -no-cookies-read}
1121 Now turn off "fast redirects", and then we allow two exceptions:
1126 # Reverse it for these two sites, which don't work right without it.
1128 www.ukc.ac.uk/cgi-bin/wac\.cgi\?
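For readers curious what "+fast-redirects" does to a URL, the idea of keeping only the last embedded URL can be sketched in Python as follows. This is a simplification under assumptions: it only looks for literal "http://" or "https://" inside the request URL, does not handle URL-encoded targets, and the function name last_embedded_url is made up.

```python
def last_embedded_url(url):
    """Return the last URL embedded inside url, or None if there is none."""
    start = max(url.rfind("http://"), url.rfind("https://"))
    # Position 0 is the request URL itself, so it does not count.
    return url[start:] if start > 0 else None


if __name__ == "__main__":
    print(last_embedded_url(
        "http://some.place/some_script?http://some.where-else"))
```

Sites whose scripts legitimately take a URL as a parameter would break under such a rewrite, which is the kind of case the exceptions above are for.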
1131 Turn on page filtering, with one exception for sourceforge:
1133 # Run everything through the default filter file (re_filterfile):
1136 # But please don't re_filter code from sourceforge!
1138 .cvs.sourceforge.net
Now some URLs that we want "blocked", i.e. we won't see them. Many of
these use regular expressions that will expand to match multiple URLs:
1145 /.*/(.*[-_.])?ads?[0-9]?(/|[-_.].*|\.(gif|jpe?g))
1146 /.*/(.*[-_.])?count(er)?(\.cgi|\.dll|\.exe|[?/])
1147 /.*/(ng)?adclient\.cgi
1148 /.*/(plain|live|rotate)[-_.]?ads?/
1149 /.*/(sponsor)s?[0-9]?/
1150 /.*/_?(plain|live)?ads?(-banners)?/
1152 /.*/ad(sdna_image|gifs?)/
1153 /.*/ad(server|stream|juggler)\.(cgi|pl|dll|exe)
1157 /.*/adv((er)?ts?|ertis(ing|ements?))?/
1161 /.*/cgi-bin/centralad/getimage
1162 /.*/images/addver\.gif
1163 /.*/images/marketing/.*\.(gif|jpe?g)
1167 /.*/sponsors?[0-9]?/
1168 /.*/advert[0-9]+\.jpg
1175 /graphics/defaultAd/
1177 /image\.ng/transactionID
1178 /images/.*/.*_anim\.gif # alvin brattli
1179 /ip_img/.*\.(gif|jpe?g)
1183 /cgi-bin/nph-adclick.exe/
1184 /.*/Image/BannerAdvertising/
1186 /.*/adlib/server\.cgi
1188 _________________________________________________________________
Custom "actions", known to Junkbuster as "aliases", can be defined by
combining other "actions". These can in turn be invoked just like the
built-in "actions". Currently, an alias name can contain any character
except space, tab, "=", "{" or "}". But please use only "a"-"z",
"0"-"9", "+", and "-". Alias names are not case sensitive, and must be
defined before anything else in actionsfile! And there can only be one
set of "aliases" defined.
1200 Now let's define a few aliases:
# Useful custom aliases we can use later. These must come first!
1204 +no-cookies = +no-cookies-set +no-cookies-read
1205 -no-cookies = -no-cookies-set -no-cookies-read
1206 fragile = -block -no-cookies -filter -fast-redirects -hide-refere
1208 shop = -no-cookies -filter -fast-redirects
1209 +imageblock = +block +image
1210 #For people who don't like to type too much: ;-)
1213 c2 = -no-cookies-set +no-cookies-read
1214 c3 = +no-cookies-set -no-cookies-read
1215 #... etc. Customize to your heart's content.
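To see what an alias stands for once it is fully expanded, the following Python sketch resolves aliases recursively, since an alias such as "shop" may itself use another alias ("-no-cookies"). It is an illustration only: the table copies some of the definitions above, the function name expand is invented, and the sketch assumes aliases never refer to themselves.

```python
# A few of the alias definitions from above, as a lookup table.
ALIASES = {
    "+no-cookies": "+no-cookies-set +no-cookies-read",
    "-no-cookies": "-no-cookies-set -no-cookies-read",
    "shop": "-no-cookies -filter -fast-redirects",
    "+imageblock": "+block +image",
}


def expand(actions, aliases=ALIASES):
    """Replace every alias in a space separated action list by its actions."""
    result = []
    for token in actions.split():
        key = token.lower()             # alias names are not case sensitive
        if key in aliases:
            result.extend(expand(aliases[key], aliases))
        else:
            result.append(token)        # already a built-in action
    return result


if __name__ == "__main__":
    print(expand("shop"))
```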
1217 Some examples using our "shop" and "fragile" aliases from above:
1219 # These sites are very complex and require
1220 # minimal interference.
1222 .office.microsoft.com
1223 .windowsupdate.microsoft.com
1225 # Shopping sites - still want to block ads.
1228 .worldpay.com # for quietpc.com
1231 # These shops require pop-ups
1235 _________________________________________________________________
1237 3.3. The Filter File
1239 The filter file defines what filtering of web pages Junkbuster does.
1240 The default filter file is re_filterfile, located in the config
1241 directory. In this file, any document content, whether viewable text
1242 or embedded non-visible content, can be changed.
This file uses regular expressions to alter or remove any string in
the target page. Some examples from the included default
re_filterfile:
1248 Stop web pages from displaying annoying messages in the status bar by
1249 deleting such references:
1251 # The status bar is for displaying link targets, not pointless buzzwo
1253 # Again, check it out on http://www.airport-cgn.de/.
1254 s/status='.*?';*//ig
Just for kicks, replace any occurrence of "Microsoft" with
"MicroSuck":
1259 s/microsoft(?!.com)/MicroSuck/ig
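The effect of the two substitutions above can be previewed outside Junkbuster with Python's re.sub. The "i" modifier corresponds to re.IGNORECASE, and re.sub replaces every occurrence, like "g". This is a sketch for experimenting with the expressions, not the filter engine itself; the function names are made up.

```python
import re


def strip_status(html):
    """Python equivalent of  s/status='.*?';*//ig"""
    return re.sub(r"status='.*?';*", "", html, flags=re.IGNORECASE)


def rename_microsoft(html):
    """Python equivalent of  s/microsoft(?!.com)/MicroSuck/ig"""
    return re.sub(r"microsoft(?!.com)", "MicroSuck", html,
                  flags=re.IGNORECASE)


if __name__ == "__main__":
    print(rename_microsoft("Microsoft Word at www.microsoft.com"))
    print(strip_status("<a onmouseover=\"window.status='Buy now!';\">x</a>"))
```

The "(?!.com)" lookahead is what leaves "www.microsoft.com" untouched, so links to that site keep working.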
1261 Kill those auto-refresh tags:
1263 # Kill refresh tags. I like to refresh myself. Manually.
1264 # check it out on http://www.airport-cgn.de/ and go to the arrivals p
1267 s/<meta[^>]*http-equiv[^>]*refresh.*URL=([^>]*?)"?>/<link rev="x-refr
1269 s/<meta[^>]*http-equiv="?page-enter"?[^>]*content=[^>]*>/<!--no page
1271 _________________________________________________________________
1273 4. Quickstart to Using Junkbuster
Install the package, then run and enjoy! Junkbuster accepts only one
command line option: the configuration file to be used. Example for
Unix:
1280 # /usr/sbin/junkbuster /etc/junkbuster/config &
If no configuration file is specified on the command line, Junkbuster
will look for a file named config in the current directory, except on
Amiga, where it will look for AmiTCP:db/junkbuster/config, and Win32,
where it will try junkbstr.txt. If no file is specified on the command
1287 line and no default configuration file can be found, Junkbuster will
Be sure your browser is set to use the proxy, which is by default at
localhost, port 8000. With Netscape (and Mozilla), this can be set
under Edit -> Preferences -> Advanced -> Proxies -> HTTP Proxy. For
Internet Explorer: Tools -> Internet Properties -> Connections -> LAN
Settings. Then, check "Use Proxy" and fill in the appropriate info
(Address: localhost, Port: 8000). Fill in the same values for HTTPS
proxy support too, if you want it.
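Programs other than browsers can be pointed at the proxy in the same way. As a hedged example, this Python sketch builds a urllib opener that sends HTTP and HTTPS requests through a proxy at localhost, port 8000 (the default mentioned above). The function name make_proxy_opener is invented, and nothing is sent over the network until the opener is actually used.

```python
import urllib.request


def make_proxy_opener(host="localhost", port=8000):
    """Build a urllib opener that routes requests through the proxy."""
    proxy = "http://%s:%d" % (host, port)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    opener = make_proxy_opener()
    # With Junkbuster running, a request would then look like:
    #   opener.open("http://www.example.com/").read()
    print([type(h).__name__ for h in opener.handlers])
```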
The included default configuration files should give a reasonable
starting point, though they may be somewhat aggressive in blocking
junk. You will probably want to keep an eye out for sites that require
cookies, and add these to actionsfile as needed. By default, most of
these will be blocked until you add them to the configuration. If you
want the browser to handle this instead, you will need to edit
actionsfile and disable this feature. If you use more than one
browser, it would make more sense to let Junkbuster handle this, in
which case the browser(s) should be set to accept all cookies.
If a particular site shows problems loading properly, try adding it to
the {fragile} section of actionsfile. This will turn off most actions
for that site.
HTTP/1.1 support is not fully implemented. If browsers that support
HTTP/1.1 (like Mozilla or recent versions of I.E.) experience
problems, you might try to force HTTP/1.0 compatibility. For Mozilla,
look under Edit -> Preferences -> Debug -> Networking. Or set the
"+downgrade" config option in actionsfile.
After running Junkbuster for a while, you can start to fine tune the
configuration to suit your personal, or site, preferences and
requirements. There are many, many aspects that can be customized.
"Actions" (from actionsfile) can be adjusted by pointing your browser
to [39]http://i.j.b./ and then following the link to "edit the actions
list". (This is an internal page and does not require Internet
access.)
In fact, various aspects of Junkbuster configuration can be viewed
from this page, including current configuration parameters, source
code version numbers, the browser's request headers, and "actions"
that apply to a given URL. In addition to the actionsfile editor
mentioned above, Junkbuster can also be turned "on" and "off" from
this page.
1332 If you encounter problems, please verify it is a Junkbuster bug, by
1333 disabling Junkbuster, and then trying the same page. Also, try another
1334 browser if possible to eliminate browser or site problems. Before
1335 reporting it as a bug, see if there is not a configuration option that
1336 is enabled that is causing the page not to load. You can then add an
1337 exception for that page or site. If a bug, please report it to the
1338 developers (see below).
1339 _________________________________________________________________
1341 5. Contact the Developers
1343 Feature requests and other questions should be posted to the
1344 [40]Feature request page at SourceForge. There is also an archive
1347 Anyone interested in actively participating in development and related
1348 discussions can join the appropriate mailing list [41]here. Archives
1349 are available here too.
1351 Please report bugs, using the form at [42]Sourceforge. Please try to
1352 verify that it is a Junkbuster bug, and not a browser or site bug
1353 first. Also, check to make sure this is not already a known bug.
1354 _________________________________________________________________
1356 6. Copyright and History
1360 Internet Junkbuster is free software; you can redistribute it and/or
1361 modify it under the terms of the GNU General Public License as
1362 published by the Free Software Foundation; either version 2 of the
1363 License, or (at your option) any later version.
1365 This program is distributed in the hope that it will be useful, but
1366 WITHOUT ANY WARRANTY; without even the implied warranty of
1367 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
1368 General Public License for more details, which is available from
1369 [43]the Free Software Foundation, Inc, 59 Temple Place - Suite 330,
1370 Boston, MA 02111-1307, USA.
1371 _________________________________________________________________
1375 Junkbuster was originally written by Anonymous Coders and
1376 [44]JunkBusters Corporation, and was released as free open-source
1377 software under the GNU GPL. [45]Stefan Waldherr made many
1378 improvements, and started the [46]SourceForge project to rekindle
1379 development. The last stable release was v2.0.2, which has now grown
1381 _________________________________________________________________
1385 [47]http://sourceforge.net/projects/ijbswa
1387 [48]http://ijbswa.sourceforge.net/
1391 [50]http://www.junkbusters.com/ht/en/cookies.html
1393 [51]http://www.waldherr.org/junkbuster/
1395 [52]http://privacy.net/analyze/
1397 [53]http://www.squid-cache.org/
1398 _________________________________________________________________
1402 8.1. Regular Expressions
Junkbuster can use "regular expressions" in various config files,
assuming support for "pcre" (Perl Compatible Regular Expressions) is
compiled in, which is the default. Such configuration directives do
not require regular expressions, but they can be used to increase
flexibility by matching a pattern with wildcards against URLs.
1410 If you are reading this, you probably don't understand what "regular
1411 expressions" are, or what they can do. So this will be a very brief
1412 introduction only. A full explanation would require a book ;-)
1414 "Regular expressions" is a way of matching one character expression
1415 against another to see if it matches or not. One of the "expressions"
1416 is a literal string of readable characters (letter, numbers, etc), and
1417 the other is a complex string of literal characters combined with
1418 wildcards, and other special characters, called metacharacters. The
1419 "metacharacters" have special meanings and are used to build the
1420 complex pattern to be matched against. Perl Compatible Regular
1421 Expressions is an enhanced form of the regular expression language
1422 with backward compatibility.
To make a simple analogy, we do something similar when we use wildcard
characters when listing files with the dir command in DOS. *.* matches
all filenames. The "special" character here is the asterisk, which
matches any and all characters. We can be more specific and use ? to
match just individual characters. So "dir file?.txt" would match
"file1.txt", "file2.txt", etc. We are pattern matching, using a
similar technique to "regular expressions"!
1432 Regular expressions do essentially the same thing, but are much, much
1433 more powerful. There are many more "special characters" and ways of
1434 building complex patterns however. Let's look at a few of the common
1435 ones, and then some examples:
. - Matches any single character, e.g. "a", "A", "4", ":", or "@".

? - The preceding character or expression is matched ZERO or ONE
times.

+ - The preceding character or expression is matched ONE or MORE
times.

* - The preceding character or expression is matched ZERO or MORE
times.

\ - The "escape" character denotes that the following character should
be taken literally. This is used where one of the special characters
(e.g. ".") needs to be taken literally and not as a special
metacharacter.

[] - Characters enclosed in brackets will be matched if any of the
enclosed characters are encountered.

() - Parentheses are used to group a sub-expression, or multiple
sub-expressions.

| - The "bar" character works like an "or" conditional statement. A
match is successful if the sub-expression on either side of "|"
matches.

s/string1/string2/g - This is used to rewrite strings of text.
"string1" is replaced by "string2" in this example.
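Each of the metacharacters above can be tried out interactively. The following Python sketch uses the standard re module, whose syntax for these basic constructs agrees with Perl compatible regular expressions. The table of sample patterns and the helper name full are made up for the demonstration.

```python
import re


def full(pattern, text):
    """Return True if the whole of text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, sample text, should it match?)
SAMPLES = [
    (r"a.c", "a@c", True),          # "." matches any single character
    (r"jpe?g", "jpg", True),        # "?": the "e" is optional
    (r"[0-9]+", "", False),         # "+" needs at least one digit
    (r"[0-9]*", "", True),          # "*" accepts zero digits
    (r"a\.c", "abc", False),        # "\." is a literal dot only
    (r"(gif|jpe?g)", "jpeg", True), # "()" groups, "|" means "or"
]

if __name__ == "__main__":
    for pattern, text, expected in SAMPLES:
        print("%-14s %-6r %s" % (pattern, text, full(pattern, text) == expected))
    # s/string1/string2/g corresponds to re.sub, which replaces all matches:
    print(re.sub(r"string1", "string2", "string1 and string1"))
```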
These are just some of the ones you are likely to use when matching
URLs with Junkbuster, and are a long way from a definitive list. This
is enough to get us started with a few simple examples:
1471 /.*/banners/.* - A simple example that uses the common combination of
1472 "." and "*" to denote any character, zero or more times. In other
1473 words, any string at all. So we start with a literal forward slash,
1474 then our regular expression pattern (".*") another literal forward
1475 slash, the string "banners", another forward slash, and lastly another
1476 ".*". We are building a directory path here. This will match any file
1477 with the path that has a directory named "banners" in it. The ".*"
1478 matches any characters, and this could conceivably be more forward
1479 slashes, so it might expand into a much longer looking path. For
1480 example, this could match:
1481 "/eye/hate/spammers/banners/annoy_me_please.gif", or just
1482 "/banners/annoying.html", or almost an infinite number of other
1483 possible combinations, just so it has "banners" in the path somewhere.
And now something a little more complex:
1487 /.*/adv((er)?ts?|ertis(ing|ements?))?/ - We have several literal
1488 forward slashes again ("/"), so we are building another expression
1489 that is a file path statement. We have another ".*", so we are
1490 matching against any conceivable sub-path, just so it matches our
1491 expression. The only true literal that must match our pattern is adv,
1492 together with the forward slashes. What comes after the "adv" string
1493 is the interesting part.
1495 Remember the "?" means the preceding expression (either a literal
1496 character or anything grouped with "(...)" in this case) can exist or
1497 not, since this means either zero or one match. So
1498 "((er)?ts?|ertis(ing|ements?))" is optional, as are the individual
1499 sub-expressions: "(er)", "(ing|ements?)", and the "s". The "|" means
1500 "or". We have two of those. For instance, "(ing|ements?)", can expand
1501 to match either "ing" OR "ements?". What is being done here, is an
1502 attempt at matching as many variations of "advertisement", and
1503 similar, as possible. So this would expand to match just "adv", or
1504 "advert", or "adverts", or "advertising", or "advertisement", or
1505 "advertisements". You get the idea. But it would not match
1506 "advertizements" (with a "z"). We could fix that by changing our
1507 regular expression to: "/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/",
1508 which would then match either spelling.
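The claims in the last two paragraphs are easy to check by running both versions of the expression against sample paths. The sketch below uses Python's re module as a stand-in for pcre; the directory name "/img/" and the names ADV, ADV_Z and matches are arbitrary choices for the demonstration.

```python
import re

# The original expression and the variant that also accepts a "z".
ADV = re.compile(r"/.*/adv((er)?ts?|ertis(ing|ements?))?/")
ADV_Z = re.compile(r"/.*/adv((er)?ts?|erti(s|z)(ing|ements?))?/")


def matches(regex, path):
    """Return True if the path matches, anchored at its start."""
    return regex.match(path) is not None


if __name__ == "__main__":
    words = ("adv", "advert", "adverts", "advertising",
             "advertisement", "advertisements", "advertizements")
    for word in words:
        path = "/img/%s/" % word
        print("%-24s %-5s %s" % (path, matches(ADV, path), matches(ADV_Z, path)))
```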
/.*/advert[0-9]+\.(gif|jpe?g) - Again another path statement with
forward slashes. Anything in the square brackets "[]" can be matched.
This is using "0-9" as a shorthand expression to mean any digit zero
through nine. It is the same as saying "0123456789". So any digit
matches. The "+" means one or more of the preceding expression must be
included. The preceding expression here is what is in the square
brackets, in this case any digit zero through nine. Then, at the
end, we have a grouping: "(gif|jpe?g)". This includes a "|", so this
needs to match the expression on either side of that bar character
also. A simple "gif" on one side, and the other side will in turn
match either "jpeg" or "jpg", since the "?" means the letter "e" is
optional and can be matched once or not at all. So we are building an
expression here to match GIF or JPEG image files. It must
include the literal string "advert", then one or more digits, and a
"." (which is now a literal, and not a special character, since it is
escaped with "\"), and lastly either "gif", or "jpeg", or "jpg". Some
possible matches would include: "//advert1.jpg",
"/nasty/ads/advert1234.gif", "/banners/from/hell/advert99.jpg". It
would not match "advert1.gif" (no leading slash), or "/adverts232.jpg"
(the expression does not include an "s"), or "/advert1.jsp" ("jsp" is
not in the expression anywhere).
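The matches and non-matches listed above can be verified with a few lines of Python, again using the re module as an approximation of pcre and anchoring the match at the start of the path. The names ADVERT and matches are invented for this sketch.

```python
import re

ADVERT = re.compile(r"/.*/advert[0-9]+\.(gif|jpe?g)")


def matches(path):
    """Return True if the path matches the expression from its start."""
    return ADVERT.match(path) is not None


if __name__ == "__main__":
    for path in ("//advert1.jpg", "/nasty/ads/advert1234.gif",
                 "/banners/from/hell/advert99.jpg", "advert1.gif",
                 "/adverts232.jpg", "/advert1.jsp"):
        print("%-34s %s" % (path, matches(path)))
```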
s/microsoft(?!.com)/MicroSuck/i - This is a substitution. "MicroSuck"
will replace any occurrence of "microsoft". The "i" at the end of the
expression means ignore case. The "(?!.com)" means the match should
fail if "microsoft" is followed by ".com". In other words, this acts
like a "NOT" modifier. In case this is a hyperlink, we don't want to
alter it.
1539 We are barely scratching the surface of regular expressions here so
1540 that you can understand the default Junkbuster configuration files,
1541 and maybe use this knowledge to customize your own installation. There
1542 is much, much more that can be done with regular expressions. Now that
1543 you know enough to get started, you can learn more on your own :/
1545 More reading on Perl Compatible Regular expressions:
1546 [54]http://www.perldoc.com/perl5.6/pod/perlre.html
1550 1. http://ijbswa.sourceforge.net/user-manual/
1551 2. mailto:ijbswa-developers@lists.sourceforge.net
1552 3. file://localhost/home/swa/sf/current/doc/source/tmp.html#INTRODUCTION
1553 4. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN27
1554 5. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION
1555 6. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-SOURCE
1556 7. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-RH
1557 8. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-SUSE
1558 9. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-OS2
1559 10. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-WIN
1560 11. file://localhost/home/swa/sf/current/doc/source/tmp.html#INSTALLATION-OTHER
1561 12. file://localhost/home/swa/sf/current/doc/source/tmp.html#CONFIGURATION
1562 13. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN158
1563 14. file://localhost/home/swa/sf/current/doc/source/tmp.html#ACTIONSFILE
1564 15. file://localhost/home/swa/sf/current/doc/source/tmp.html#FILTERFILE
1565 16. file://localhost/home/swa/sf/current/doc/source/tmp.html#QUICKSTART
1566 17. file://localhost/home/swa/sf/current/doc/source/tmp.html#CONTACT
1567 18. file://localhost/home/swa/sf/current/doc/source/tmp.html#COPYRIGHT
1568 19. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1161
1569 20. file://localhost/home/swa/sf/current/doc/source/tmp.html#AEN1167
1570 21. file://localhost/home/swa/sf/current/doc/source/tmp.html#SEEALSO
1571 22. file://localhost/home/swa/sf/current/doc/source/tmp.html#APPENDIX
1572 23. file://localhost/home/swa/sf/current/doc/source/tmp.html#REGEX
1574 25. http://sourceforge.net/projects/ijbswa/
1575 26. http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/ijbswa/current/
1576 27. http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&button=Search&key=emxrt.zip&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fdev%2Femx%2Fv0.9d
1577 28. http://hobbes.nmsu.edu/cgi-bin/h-search?sh=1&key=gnupack&stype=all&sort=type&dir=%2Fpub%2Fos2%2Fapps
1578 29. http://www.gnu.org/
1580 31. file://localhost/home/swa/sf/current/doc/source/tmp.html#ACTIONSFILE
1584 35. http://i.j.b/show-url-info
1586 37. http://www.perldoc.com/perl5.6/pod/perlre.html
1587 38. file://localhost/home/swa/sf/current/doc/source/tmp.html#REGEX
1589 40. http://sourceforge.net/tracker/?atid=361118&group_id=11118&func=browse
1590 41. http://sourceforge.net/mail/?group_id=11118
1591 42. http://sourceforge.net/tracker/?group_id=11118&atid=111118
1592 43. http://www.gnu.org/copyleft/gpl.html
1593 44. http://www.junkbusters.com/ht/en/ijbfaq.html
1594 45. http://www.waldherr.org/junkbuster/
1595 46. http://sourceforge.net/projects/ijbswa/
1596 47. http://sourceforge.net/projects/ijbswa
1597 48. http://ijbswa.sourceforge.net/
1599 50. http://www.junkbusters.com/ht/en/cookies.html
1600 51. http://www.waldherr.org/junkbuster/
1601 52. http://privacy.net/analyze/
1602 53. http://www.squid-cache.org/
1603 54. http://www.perldoc.com/perl5.6/pod/perlre.html