From: oes Date: Sat, 18 Aug 2001 11:36:32 +0000 (+0000) Subject: Wrote a manpage, finally X-Git-Tag: v_2_9_9~148 X-Git-Url: http://www.privoxy.org/gitweb/%40user-manual%40%40actions-help-prefix%40HIDE-IF-MODIFIED-SINCE?a=commitdiff_plain;h=9f9883408d76529bf25cba580429ed68e4f74eda;p=privoxy.git Wrote a manpage, finally --- diff --git a/doc/pcrs.3 b/doc/pcrs.3 new file mode 100644 index 00000000..c6925652 --- /dev/null +++ b/doc/pcrs.3 @@ -0,0 +1,472 @@ +.\" Copyright (c) 2001 Andreas S. Oesterhelt +.\" +.\" This is free documentation; you can redistribute it and/or +.\" modify it under the terms of the GNU General Public License as +.\" published by the Free Software Foundation; either version 2 of +.\" the License, or (at your option) any later version. +.\" +.\" The GNU General Public License's references to "object code" +.\" and "executables" are to be interpreted as the output of any +.\" document formatting or typesetting system, including +.\" intermediate and printed output. +.\" +.\" This manual is distributed in the hope that it will be useful, +.\" but WITHOUT ANY WARRANTY; without even the implied warranty of +.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +.\" GNU General Public License for more details. +.\" +.\" You should have received a copy of the GNU General Public +.\" License along with this manual; if not, write to the Free +.\" Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111, +.\" USA. +.\" +.TH PCRS 3 "17 August 2001" +.SH NAME +pcrs - Perl-compatible regular substitution. +.SH SYNOPSIS +.br +.BI "#include " +.PP +.br +.BI "pcrs_job *pcrs_compile(const char *" "pattern" "," +.ti +5n +.BI "const char *" "substitute" ", const char *" "options" "," +.ti +5n +.BI "int *" "errptr" ");" +.PP +.br +.BI "pcrs_job *pcrs_compile_command(const char *" "command" "," +.ti +5n +.BI "int *" "errptr"); +.PP +.br +.BI "int pcrs_execute(pcrs_job *" "job" ", char *" "subject" "," +.ti +5n +.BI "int " "subject_length" ", char **" "result" "," +.ti +5n +.BI "int *" "result_length" ");" +.PP +.br +.BI "int pcrs_execute_list (pcrs_job *" "joblist" ", char *" "subject" "," +.ti +5n +.BI "int " "subject_length" ", char **" "result" "," +.ti +5n +.BI "int *" "result_length" ");" +.PP +.br +.BI "pcrs_job *pcrs_free_job(pcrs_job *" "job" ");" +.PP +.br +.BI "void pcrs_free_joblist(pcrs_job *" "joblist" ");" +.PP +.br +.BI "char *pcrs_strerror(int " "err" ");" +.PP +.br + +.SH DESCRIPTION +The +.SM PCRS +library is a supplement to the +.SB PCRE(3) +library that implements +.RB "regular expression based substitution, like provided by " "Perl(1)" "'s 's'" +operator. It uses the same syntax and semantics as Perl 5, with just a few +differences (see below). + +In a first step, the information on a substitution, i.e. the pattern, the +substitute and the options are compiled from Perl syntax to an internal form +.RB "called " "pcrs_job" " by using either the " "pcrs_compile()" " or " +.BR "pcrs_compile_command()" " functions." + +Once the job is compiled, it can be used on subjects, which are arbitrary +memory areas containing string or binary data, by calling +.BR "pcrs_execute()" ". Jobs can be chained to joblists and whole" +.RB "joblists can be applied to a subject using " "pcrs_execute_list()" "." + +There are also convenience functions for freeing the jobs and for errno-to-string +.RB "conversion, namely " "pcrs_free_job()" ", " "pcrs_free_joblist()" " and " +.BR "pcrs_strerror()" "." + +.SH COMPILING JOBS +.RB "The function " "pcrs_compile()" " is called to compile a " "pcrs_job" +.RI "from a " "pattern" ", " "substitute" " and " "options" " string. +.RB "The resulting " "pcrs_job" " structure is dynamically allocated and it" +.RB "is the caller's responsibility to " "free()" " it when it's no longer needed." + +.BR "pcrs_compile_command()" " is a convenience wrapper function that parses a Perl" +.IR "command" " of the form" +.BI "s/" "pattern" "/" "substitute" "/[" "options" "]" +.RB "into its components and then calls " "pcrs_compile()" ". As in Perl, you" +.RB "are not bound to the '" "/" "' character: Whatever" +.RB "follows the '" "s" "' will be used as the delimiter. Patterns or substitutes" +that contain the delimiter need to quote it: +\fBs/th\\/is/th\\/at/\fR +.RB "will replace " "th/is" " by " "th/at" " and can be written more simply as" +.BR "s|th/is|th/at|" "." + +.IR "pattern" ", " "substitute" ", " "options" " and " "command" " must be" +.RI "zero-terminated C strings. " "substitute" " and " "options" " may be" +.BR "NULL" ", in which case they are treated like the empty string." + +.SS "Return value and diagnostics" +On success, both functions return a pointer to the compiled job. +.RB "On failure, " "NULL" +.RI "is returned. In that case, the pcrs error code is written to *" "err" "." + +.SS Patterns +.RI "For the syntax of the " "pattern" ", see the " +.BR "PCRE(3)" " manual page." + +.SS Substitutes +.RI "The " "substitute" " uses" +.RB "Perl syntax as documented in the " "perlre(1)" " manual page, with" +some exceptions: + +Most notably and evidently, since +.SM PCRS +is not Perl, variable interpolation or Perl command substitution won't work. +Special variables that do get interpolated, are: +.TP +.B "$1, $2, ..., $n" +Like in Perl, these variables refer to what the nth capturing subpattern +in the pattern matched. +.TP +.B "$& and $0" +.RB "refer to the whole match. Note that " "$0" " is deprecated in recent" +Perl versions and now refers to the program name. +.TP +.B "$+" +refers to what the last capturing subpattern matched. +.TP +.BR "$` and $'" " (backtick and tick)" +.RI "refer to the areas of the " "subject" " before and after the match, respectively." +.RB "Note that, like in Perl, the " "unmodified" " subject is used, even" +if a global substitution previously matched. + +.PP +Perl4-style references to subpattern matches of the form +\fB\\1, \\2, ...\fR +.RB "which only exist in Perl5 for backwards compatibility, are " "not" +supported. + +Also, since the substitute is a double-quoted string in Perl, you +might expect all Perl syntax for special characters to apply. In fact, +only the following are supported: + +.TP +\fB\\n\fR +newline (0x0a) +.TP +\fB\\r\fR +carriage return (0x0d) +.TP +\fB\\t\fR +horizontal tab (0x09) +.TP +\fB\\f\fR +form feed (0x0c) +.TP +\fB\\b\fR +backspace (0x08) +.TP +\fB\\a\fR +alarm, bell (0x07) +.TP +\fB\\e\fR +escape (0x1b) +.TP +\fB\\0\fR +binary zero (0x00) + +.SS "Options" +.RB "The options " "gmisx" " are supported. " "e" " is not, since it would" +.RB "require a Perl interpreter and neither is " o ", because the pattern +is explicitly compiled, anyway. Additionally, +.SM PCRS +.RB "honors the options " "U" " and " "T" "." +Where +.SM PCRE +.RB "options are mentioned below, refer to " PCRE(3) " for the subtle differences" +to Perl behaviour. + +.TP +.B g +.RB "Replace " all " instances of" +.IR pattern " in " subject , +not just the first one. + +.TP +.B i +.RI "Match the " pattern " without respect to case. This translates to" +.SM PCRE_CASELESS. + +.TP +.B m +.RI "Treat the " subject " as consisting of multiple lines, i.e." +.RB ' ^ "' matches immediately after, and '" $ "' immediately before each newline." +Translates to +.SM PCRE_MULTILINE. + +.TP +.B s +.RI "Treat the " subject " as consisting of one single line, i.e." +.RB "let the scope of the '" . "' metacharacter include newlines." +Translates to +.SM PCRE_DOTALL. + +.TP +.B x +.RI "Allow extended regular expression syntax in the " pattern "," +.RB "enabling whitespace and comments in complex patterns." +Translates to +.SM PCRE_EXTENDED. + +.TP +.B U +.RB "Switch the default behaviour of the '" * "' and '" + "' quantifiers" +.RB "to ungreedy. Note that appending a '" ? "' switches back to greedy(!)." +.RB "The explicit in-pattern switches " (?U) " and " (?-U) " remain unaffected." +Translates to +.SM PCRE_UNGREEDY. + +.TP +.B T +.RI "Consider the " substitute " trivial, i.e. do not interpret any references" +or special character escape sequences in the substitute. Handy for large +user-supplied substitutes, which would otherwise have to be examined and properly +quoted. + +.PP +Unsupported options are silently ignored. + +.SH EXECUTING JOBS + +.RI "Calling " pcrs_execute() " produces a modified copy of the " subject ", in which" +.RB "the first (or all, if the '" g "' option was given when compiling the job)" +.RI "occurance(s) of the job's " pattern " in the " subject " is replaced by the job's" +.IR substitute . + +.RI "The first " subject_length " bytes following " subject " are processed, so" +.RI "a " subject_length " that exceeds the actual " subject " is dangerous." +Note that if you want to get your zero-terminated C strings back including their +.RI "termination, you must let " subject_length " include the binary zero, i.e." +set it to +.BI strlen( subject ") + 1." + +.RI "The " subject " itself is left untouched, and the " *result " is dynamically" +.RB "allocated, so it is the caller's responsibility to " free() " it when it's" +no longer needed. + +.RI "The result's length is written to " *result_length "." + +.RB "If the job matched, the " PCRS_SUCCESS " flag in" +.IB job ->flags +is set. + +.SS Return value and diagnostics + +.RB "On success, " pcrs_execute() " returns the number of substitutions that" +were made, which is limited to 0 or 1 for non-global searches. +.RI "On failure, a negative error code is returned and " *result " is set" +.RB "to " NULL . + +.SH FREEING JOBS +.RB "It is not sufficient to call " free() " on a " pcrs_job ", because it " +contains pointers to other dynamically allocated structures. +.RB "Use " pcrs_free() " instead. It is safe to pass " NULL " pointers " +.RB "(or pointers to invalid " pcrs_job "s that contain " NULL " pointers" +.RB "to dependant structures) to " pcrs_free() "." + +.SS Return value +.RB "The value of the job's " next " pointer." + + +.SH CHAINING JOBS + +.SM PCRS +.RB "supports to some extent the chaining of multiple " pcrs_job " structures by" +.RB "means of their " next " member." + +Chaining the jobs is up to you, but once you have built a linked list of jobs, +.RI "you can execute a whole " joblist " on a given subject by" +.RB "a single call to " pcrs_execute_list() ", which will sequentially traverse" +.RB "the linked list until it reaches a " NULL " pointer, and call " pcrs_execute() +.RI "for each job it encounters, feeding the " result " and " result_length " of each" +.RI "call into the next as the " subject " and " subject_length ". As in the single" +.RI "job case, the original " subject " remains untouched, but all interim " result "s" +.RB "are of course " free() "d. The return value is the accumulated number of matches" +.RI "for all jobs in the " joblist "." +.RI "Note that while this is handy, it reduces the diagnostic value of " err ", since " +you won't know which job failed. + +.RI "In analogy, you can free all jobs in a given " joblist " by calling" +.BR pcrs_free_joblist() . + +.SH QUOTING +The quote character is (surprise!) '\fB\\\fR'. It quotes the delimiter in a +.IR command ", the" +.RB ' $ "' in a" +.IR substitute ", and, of course, itself. Note that the" +.RB ' $ "'doesn't need to be quoted if it isn't followed by " [0-9+'`&] "." + +.RI "For quoting in the " pattern ", please refer to" +.BR PCRE(3) . + +.SH DIAGNOSTICS + +.RB "When " compiling " a job either via the " pcrs_compile() " or " pcrs_compile_command() +.RB "functions, you know that something went wrong when you are returned a " NULL " pointer." +.RI "In that case, or in the event of non-fatal warnings, the integer pointed to by " err +contains a nonzero error code, which is either a passed-through +.SM PCRE +error code or one generated by +.SM PCRS. +Under normal circumstances, it can take the following values: +.TP +.B PCRE_ERROR_NOMEMORY +While compiling the pattern, +.SM PCRE +ran out of memory. +.TP +.B PCRS_ERR_NOMEM +While compiling the job, +.SM PCRS +ran out of memory. +.TP +.B PCRS_ERR_CMDSYNTAX +.BR pcrs_compile_command() " didn't find four tokens while parsing the" +.IR command . +.TP +.B PCRS_ERR_STUDY +A +.SM PCRE +.RB "error occured while studying the compiled pattern. Since " pcre_study() +only provides textual diagnostic information, the details are lost. +.TP +.B PCRS_WARN_BADREF +.RI "The " substitute " contains a reference to a capturing subpattern that" +.RI "has a higher index than the number of capturing subpatterns in the " pattern +or that exceeds the current hard limit of 33 (See LIMITATIONS below). As in Perl, +this is non-fatal and results in substitutions with the empty string. + +.PP +.RB "When " executing " jobs via " pcrs_execute() " or " pcrs_execute_list() "," +.RI "a negative return code indicates an error. In that case, *" result +.RB "is " NULL ". Possible error codes are:" +.TP +.B PCRE_ERROR_NOMEMORY +While matching the pattern, +.SM PCRE +ran out of memory. This can only happen if there are more than 33 backrefrences +.RI "in the " pattern "(!)" +.BR and " memory is too tight to extend storage for more." +.TP +.B PCRS_ERR_NOMEM +While executing the job, +.SM PCRS +ran out of memory. +.TP +.B PCRS_ERR_BADJOB +.RB "The " pcrs_job "* passed to " pcrs_execute " was NULL, or the" +.RB "job is bogus (it contains " NULL " pointers to the compiled +pattern, extra, or substitute). + +.PP +If you see any other +.SM PCRE +error code passed through, you've either messed with the compiled job +or found a bug in +.SM PCRS. +Please send me an email. + +.RB "Ah, and don't look for " PCRE_ERROR_NOMATCH ", since this" +is not an error in the context of +.SM PCRS. +.RI "Should there be no match, an exact copy of the " subject " is" +.RI "found at *" result " and the return code is 0 (matches)." + +All error codes can be translated into human readable text by means +.RB "of the " pcrs_strerror() " function." + + +.SH EXAMPLE +A trivial command-line test program for +.SM PCRS +might look like: + +.nf +#include +#include + +int main(int Argc, char **Argv) +{ + pcrs_job *job; + char *result; + int newsize, err; + + if (Argc != 3) + { + fprintf(stderr, "Usage: %s s/pattern/substitute/[options] subject\\n", Argv[0]); + return 1; + } + + if (NULL == (job = pcrs_compile_command(Argv[1], &err))) + { + printf("Compile error: %s (%d).\\n", pcrs_strerror(err), err); + } + + if (0 > (err = pcrs_execute(job, Argv[2], strlen(Argv[2]) + 1, &result, &newsize))) + { + printf("Exec error: %s (%d).\\n", pcrs_strerror(err), err); + } + + /* Will tolerate NULL result */ + printf("Result: *%s*\\n", result); + + pcrs_free_job(job); + if (result) free(result); + + return 0; +} + +.fi + + +.SH LIMITATIONS +The number of matches that a global job can have is only limited by the +available memory. An initial storage for 40 matches is reserved, which +is dynamically resized by the factor 1.6 if exhausted. + +The number of capturing subpatterns is currently limited to 33, which +is a Bad Thing[tm]. It should be dynamically expanded until it reaches the +.SM PCRE +limit of 99. + +All of the above values can be adjusted in the "Capacity" section +.RB "of " pcrs.h "." + +The Perl-style escape sequences for special characters \\\fInnn\fR, +\\x\fInn\fR, and \\c\fIX\fR are currently unsupported. + +.SH BUGS +This library has only been tested in the context of one application +and should be considered high risk. + +.SH HISTORY +.SM PCRS +was originally written for the Internet Junkbuster project +(http://sourceforge.net/projects/ijbswa/). + +.SH SEE ALSO +.B PCRE(3), perl(1), perlre(1) + +.SH AUTHOR + +.SM PCRS +is Copyright 2000, 2001 by Andreas Oesterhelt and is +licensed under the Terms of the GNU Lesser General Public License (LGPL), which +should be included in this distribution. + +If not, refer to http://www.gnu.org/licenses/lgpl.html or write to the Free +Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. \ No newline at end of file