T H E   C O M B   P R O J E C T

  Common Object Meta Builder.
  Idea copyright (C) 2007 Matous Jan Fialka.
  Released under the terms of GNU/FDL.



IMPORTANT NOTES BEFORE YOU START READING

  In syntax definitions, in tables and in text enclosed by double
  quotes, everything enclosed by "[" and "]" is a regular expression
  as defined in POSIX 1003.2.

  In syntax definitions, in text enclosed by double quotes and in
  example outputs, every three-digit number preceded by a backslash
  ("\") is the octal code of a character in the ASCII table.

  In syntax definitions, curly brackets, "{" and "}", group several
  elements together and the "pipe" character, "|", means logical OR.

  In every example in this document, "=>" introduces an intermediate
  product of the parser and "->" introduces the expected final output,
  which goes to standard output in debugging mode or is passed to the
  specified interpreter in normal (processing) mode.



INPUT

  Syntax:

    <input>   = <ascii> <input>
    <ascii>   = [\000-\377]



SPECIAL CHARACTERS

  Syntax:

    <sc>      = [\(\)\{\}\[\]\<\,\ \t\r\n\v\f\^\-\\]

  Table of Special Characters:

    CHAR            TOKEN     MEANING

    [\(]            CO        Combination Opening
    [\)]            CC        Combination Closure
    [\{]            RCO       Reverse Combination Opening
    [\}]            RCC       Reverse Combination Closure
    [\[]            CRO       Character Range Opening
    [\]]            CRC       Character Range Closure
    [\<]            FIO       File Inclusion
    [\,\t\r\n\v\f]  FS        Field Separator
    [\-]            CRD       Character Range Distinguisher
    [\ \t]          BS        Blank Space
    [\\]            QC        Quote Character
    [\^]            CSC       Control Sequence Constructor



QUOTING

  Syntax:

    <quoted>  = <qc> <ascii>

  Logic:

    if read <qc> <sc>
    then
            write <sc>
    else
            act
    fi



CONTROL SEQUENCES

  Syntax:

    <ctlseq>  = { <qc> <ic> } | { <csc> <cc> }
    <qc>      = [\\]
    <csc>     = [\^]
    <ic>      = [0qabtnvfreld]
    <cc>      = [@A-Z\[\]\^\_\?\\]

  Table of Interpreted Characters:

    CHAR    ASCII   HEX     MEANING

    [0]     NUL     00
    [q]     EOT     04      end of transmission (active)
    [a]     BEL     07
    [b]     BS      08      back space
    [t]     HT      09
    [n]     LF      0a
    [v]     VT      0b
    [f]     FF      0c
    [r]     CR      0d
    [e]     ESC     1b
    [d]     DEL     7f      delete (active)

  Table of Control Characters:

    CHAR    ASCII   HEX     MEANING

    [@]     NUL     00
    [A]     SOH     01
    [B]     STX     02
    [C]     ETX     03
    [D]     EOT     04      end of transmission (passive)
    [E]     ENQ     05
    [F]     ACK     06
    [G]     BEL     07
    [H]     BS      08      back space
    [I]     HT      09
    [J]     LF      0a
    [K]     VT      0b
    [L]     FF      0c
    [M]     CR      0d
    [N]     SO      0e
    [O]     SI      0f
    [P]     DLE     10
    [Q]     DC1     11
    [R]     DC2     12
    [S]     DC3     13
    [T]     DC4     14
    [U]     NAK     15
    [V]     SYN     16
    [W]     ETB     17
    [X]     CAN     18
    [Y]     EM      19
    [Z]     SUB     1a
    [\[]    ESC     1b
    [\\]    FS      1c
    [\]]    GS      1d
    [\^]    RS      1e
    [\_]    US      1f
    [\?]    DEL     7f      delete (passive)
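
  The two tables above can be read as a simple translation step. The
  following Python sketch is only an illustration, not COMB's actual
  implementation: the function name "decode_control_sequences" is
  hypothetical, and it only translates the sequences into characters.
  It does not perform the "active" behaviour of \q, \b or \d, and it
  copies any other backslash pair (such as a quoted special character)
  through unchanged.

```python
# Interpreted characters (<qc> <ic>), taken from the first table above.
INTERPRETED = {
    "0": 0x00, "q": 0x04, "a": 0x07, "b": 0x08, "t": 0x09, "n": 0x0A,
    "v": 0x0B, "f": 0x0C, "r": 0x0D, "e": 0x1B, "d": 0x7F,
}
# Control characters (<csc> <cc>), taken from the second table above.
CONTROL = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_?"


def decode_control_sequences(text: str) -> str:
    """Replace <qc><ic> and <csc><cc> pairs with the character they name.

    Hypothetical helper: translation only, no "active" side effects.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\\" and nxt in INTERPRETED:
            out.append(chr(INTERPRETED[nxt]))
            i += 2
        elif ch == "\\" and nxt:
            # Some other quoted character: keep the pair for a later stage.
            out.append(ch + nxt)
            i += 2
        elif ch == "^" and nxt and nxt in CONTROL:
            # Flipping bit 0x40 maps "@".."_" to 0x00..0x1f and "?" to 0x7f.
            out.append(chr(ord(nxt) ^ 0x40))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

  With this sketch, both "\n" and "^J" in the input become the LF
  character, and "^?" becomes DEL (octal 177).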



QUOTING EXAMPLES

  Hello\, World!
  =>
  (Hello\, World!)
  ->
  Hello, World!

  Hello\,\tWorld!
  =>
  (Hello\,        World!)
  ->
  Hello,        World!

  Hello\,\n World!
  =>
  (Hello\,
   World!)
  ->
  Hello,
   World!

  Hello\,^J World!
  =>
  (Hello\,
   World!)
  ->
  Hello,
   World!

  Hello\, World!\b!^?!
  =>
  (Hello\, World!\010!\177!)
  ->
  Hello, World!

  Hello\q\, World!
  =>
  (Hello\004\, World!)
  ->
  Hello



CHARACTER RANGES EXAMPLES

  [a-c]
  =>
  (a,b,c)

  [0-9a-f]
  =>
  (0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f)

  [1357a-c]
  =>
  (1,3,5,7,a,b,c)

  [12[3-5]]
  =>
  (1,2,3,4,5)
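
  The expansions above can be reproduced with a short Python sketch.
  It is an illustration under stated assumptions, not the parser COMB
  uses: the name "expand_range" is hypothetical, quoting is ignored,
  and the behaviour for a descending range such as [c-a] is not
  defined by this document (the sketch yields nothing for it).

```python
from typing import List


def _members(spec: str) -> List[str]:
    """Expand the text between "[" and "]" into its member characters."""
    items: List[str] = []
    i = 0
    while i < len(spec):
        if spec[i] == "[":
            # Nested range: find the matching "]" and expand recursively.
            depth, j = 1, i + 1
            while j < len(spec) and depth:
                depth += {"[": 1, "]": -1}.get(spec[j], 0)
                j += 1
            if depth:
                raise ValueError("unbalanced '[' in character range")
            items.extend(_members(spec[i + 1 : j - 1]))
            i = j
        elif i + 2 < len(spec) and spec[i + 1] == "-" and spec[i + 2] != "[":
            # "a-c" style span: every character from the first to the last.
            items.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            items.append(spec[i])
            i += 1
    return items


def expand_range(text: str) -> List[str]:
    """Expand a whole "[...]" character range (hypothetical helper)."""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("a character range must be enclosed in [ and ]")
    return _members(text[1:-1])
```

  Joining the result of expand_range("[1357a-c]") with commas gives
  the field list 1,3,5,7,a,b,c shown in the third example.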



COMBINATIONS EXAMPLES

  (foo(1,2,3))
  =>
  (foo1,foo2,foo3)
  ->
  foo1
  foo2
  foo3

  {foo(1,2,3)}
  =>
  {foo1,foo2,foo3}
  ->
  foo3
  foo2
  foo1

  (foo{1,2,3})
  =>
  (foo3,foo2,foo1)
  ->
  foo3
  foo2
  foo1

  {foo(bar{1,(2,3)},uff)}
  =>
  {foobar3,foobar2,foobar1,foouff}
  ->
  foouff
  foobar1
  foobar2
  foobar3

  (foo[1-3])
  =>
  (foo,(1,2,3))
  =>
  (foo,1,2,3)
  ->
  foo
  1
  2
  3

  {foo([1-3])}
  =>
  {foo(1,2,3)}
  =>
  {foo1,foo2,foo3}
  ->
  foo3
  foo2
  foo1

  Notice that all the FS (Field Separator) characters separate the
  combinational fields! Thus, for instance, the sequence

  (a1,a2,a3)

  is equivalent to, for instance,

  (
     a1,
     a2,
     a3
  )

  or even, for instance,

  (a1
    a2
     a3)

  et cetera. This is why COMB is such a powerful and user-friendly tool!
  In all three cases above the final output will be:

  ->
  a1
  a2
  a3
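
  One possible way to model the combination rules is a small
  recursive-descent expander. The Python sketch below is an assumption
  about how the examples in this section fit together, not COMB's real
  parser: the names "expand", "_sequence" and "_group" are
  hypothetical, only the comma is treated as a field separator (the
  other FS characters, blank space, quoting, character ranges and file
  inclusion are left out), and adjacent text and groups are combined
  as a cartesian product.

```python
from typing import List, Tuple

# Opening bracket -> matching closing bracket.
OPEN = {"(": ")", "{": "}"}


def expand(text: str) -> List[str]:
    """Expand one combination expression into its list of output rules."""
    rules, pos = _sequence(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
    return rules


def _sequence(text: str, pos: int) -> Tuple[List[str], int]:
    """Parse literal text and groups up to a separator or closing bracket."""
    results = [""]
    while pos < len(text) and text[pos] not in ",)}":
        if text[pos] in OPEN:
            alternatives, pos = _group(text, pos)
        else:
            alternatives, pos = [text[pos]], pos + 1
        # Concatenation: every prefix so far combined with every alternative.
        results = [head + tail for head in results for tail in alternatives]
    return results, pos


def _group(text: str, pos: int) -> Tuple[List[str], int]:
    """Parse "(a,b,...)" or "{a,b,...}"; the curly form reverses the list."""
    opener = text[pos]
    closer = OPEN[opener]
    pos += 1
    alternatives: List[str] = []
    while True:
        field, pos = _sequence(text, pos)
        alternatives.extend(field)
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        break
    if pos >= len(text) or text[pos] != closer:
        raise ValueError(f"expected {closer!r} at offset {pos}")
    if opener == "{":
        alternatives.reverse()
    return alternatives, pos + 1
```

  For example, expand("(foo(1,2,3))") returns the three rules foo1,
  foo2 and foo3 in that order, while expand("{foo(1,2,3)}") returns
  them in reverse order.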



FILE INCLUSION EXAMPLES

  < /path/to/filename
  ->
  (file content gets here)

  < /path/to/filename{1,2,3}, foo bar
  =>
  < /path/to/filename3 
  < /path/to/filename2 
  < /path/to/filename1 
  foo bar



ARGUMENT OPTIONS

  Run-time options are passed in the program's single argument in a
  rather unusual manner. This is because the GNU/Linux "shebang"
  mechanism does not naturally handle more than one argument. What a
  pity! Therefore the argument consists of several options separated by
  colons (":").
  
  Each option has its value. An option's value is separated from the
  option itself by the equal sign ("="). If more than one value needs to
  be passed to an option, separate the values with commas (",").

  If a space character (" ") needs to be written anywhere in the
  program's argument, it MUST be replaced with an underscore ("_"). The
  underscore itself, if needed, MUST be quoted with a backslash ("\").
  To use a backslash itself, double it.

  You also need to quote all the other characters that act as special
  characters in the argument, or enclose them in double quotes.

  Argument options are: INTERPRETER, STEP, DEBUG, SOURCE and BLANK.
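
  The argument format described above can be sketched in Python as
  follows. This is a minimal illustration, not COMB's own option
  parser: the names "parse_argument" and "_tokens" are hypothetical,
  values are kept as plain strings, and enclosing special characters
  in double quotes is not handled.

```python
from typing import Dict, Iterator, List, Tuple


def _tokens(argument: str) -> Iterator[Tuple[str, bool]]:
    """Yield (character, literal) pairs with the quoting rules resolved."""
    i = 0
    while i < len(argument):
        ch = argument[i]
        if ch == "\\" and i + 1 < len(argument):
            yield argument[i + 1], True   # quoted character, taken literally
            i += 2
        elif ch == "_":
            yield " ", True               # bare underscore stands for a space
            i += 1
        else:
            yield ch, False
            i += 1


def parse_argument(argument: str) -> Dict[str, List[str]]:
    """Split "NAME=value,value:NAME=value" into a dict of value lists."""
    options: Dict[str, List[str]] = {}
    name = None
    values: List[str] = []
    current: List[str] = []
    # A trailing ":" sentinel flushes the last option.
    for ch, literal in list(_tokens(argument)) + [(":", False)]:
        if literal:
            current.append(ch)
        elif ch == "=" and name is None:
            name, current = "".join(current), []
        elif ch == "," and name is not None:
            values.append("".join(current))
            current = []
        elif ch == ":":
            if name is None:
                name = "".join(current)   # option given without a value
            else:
                values.append("".join(current))
            if name:
                options[name] = values
            name, values, current = None, [], []
        else:
            current.append(ch)
    return options
```

  Under these assumptions, the argument
  "INTERPRETER=/sbin/iptables_:STEP=1" yields the INTERPRETER value
  "/sbin/iptables " (with a trailing space) and the STEP value "1".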


  Interpreter

    Option: INTERPRETER

    Description:
    
      The program (interpreter) that the parsed output is passed to,
      in either step-by-step or normal mode.

    Default: "/bin/sh_--posix"

    Example: #! /usr/bin/comb INTERPRETER=/bin/echo_-e_-n_:STEP=1


  Step-by-step mode

    Option: STEP

    Values: "[01]" (where "0" means false, "1" means true)

    Description:
    
      If step-by-step mode is on, each output rule is passed to the
      interpreter separately, one at a time.

    Default: "0"

    Example: #! /usr/bin/comb INTERPRETER=/sbin/iptables-restore_


  Debugging

    Option: DEBUG

    Values: "[01]" (where "0" means false, "1" means true)

    Description:

      In debug (dry-run) mode, rules are only written to the standard
      output and are not passed to the interpreter. Some debugging
      information is written to the standard error output as well.

    Default: "0"

    Example: #! /usr/bin/comb INTERPRETER=/sbin/iptables_:DEBUG=1

    Input example:

      ---
      #! /usr/bin/comb INTERPRETER=/sbin/iptables_:STEP=1:DEBUG=1

      -P (INPUT, FORWARD, OUTPUT) DROP

      -A INPUT -i (
        lo+
        eth0 (
          -m state --state RELATED\,ESTABLISHED
          -p tcp --dport 22
        )
      ) -j ACCEPT

      -A OUTPUT -o (lo+, eth0) -j ACCEPT
      ---

    Debug output example:

      ---
      COMB is Common Object Meta Builder (version 1.0).
      Copyright (C) 2007 Guy Josef Wiltfang, Matous Jan Fialka.
      Released under the terms of GNU/GPL.

      # Runtime options...

      % INTERPRETER = "/sbin/iptables "
      % STEP = 1
      % DEBUG = 1
      % SOURCE = "/dev/stdin"
      % BLANK = 0

      # Preparsed source code...

      < "/dev/stdin"
      | -P (INPUT,FORWARD,OUTPUT) DROP
      | -A INPUT -i (
      | lo+
      | eth0 (
      | -m state --state RELATED\,ESTABLISHED
      | -p tcp --dport 22
      | )
      | ) -j ACCEPT
      | -A OUTPUT -o (lo+,eth0) -j ACCEPT

      # Parser output...

      ! "/sbin/iptables "
      + -P INPUT DROP
      + -P FORWARD DROP
      + -P OUTPUT DROP
      + -A INPUT -i lo+ -j ACCEPT
      + -A INPUT -i eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
      + -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT
      + -A OUTPUT -o lo+ -j ACCEPT
      + -A OUTPUT -o eth0 -j ACCEPT

      # Runtime statistics...

      ? RULES: 3
      ? EVALS: 4
      ? NESTS: 1
      ? STEPS: 8

      # End of transmission...
      ---

    As you can see, the debugging output is designed to be easy to
    parse: every kind of line starts with its own marker character. You
    can easily extract the rules part, the steps part, the option
    settings, the run-time statistics, the messages or the surrounding
    plain text.
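
    As an illustration of such parsing, the Python sketch below groups
    the lines of a debug dump by their leading marker. The meaning
    assigned to each marker is inferred from the sample output above,
    not from a formal definition, and the names "SECTIONS" and
    "split_debug_output" are hypothetical.

```python
from typing import Dict, List

# Marker meanings are inferred from the sample output (an assumption).
SECTIONS = {
    "#": "messages",
    "%": "options",
    "<": "sources",
    "|": "rules",
    "!": "interpreter",
    "+": "steps",
    "?": "statistics",
}


def split_debug_output(output: str) -> Dict[str, List[str]]:
    """Group debug lines by their leading marker; the rest goes to "text"."""
    groups: Dict[str, List[str]] = {name: [] for name in SECTIONS.values()}
    groups["text"] = []
    for line in output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # The marker is the first space-delimited word of the line.
        marker, _, rest = stripped.partition(" ")
        if marker in SECTIONS:
            groups[SECTIONS[marker]].append(rest)
        else:
            groups["text"].append(stripped)
    return groups
```

    The "steps" list then holds the lines that would be handed to the
    interpreter, and the "statistics" list holds entries such as
    "STEPS: 8".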


  Sourcing files

    Option: SOURCE

    Description:

      A list of files to be sourced in the parser.

    Default: "/dev/stdin"

    Example: #! /usr/bin/comb SOURCE=file1,file2


  Processing blank output

    Option: BLANK

    Values: "[01]" (where "0" means false, "1" means true)

    Description:

      If blank processing is set on, blank output will be sent to the
      interpreter as well.

    Default: "0"

    Example: #! /usr/bin/comb INTERPRETER=/bin/echo_-e_:BLANK=1



REAL LIFE EXAMPLE

  Here is a small real-life example of a COMB script. Why write
  expensive scripts in BASH, AWK, Perl or whatever if you have COMB?
  In the next example, suppose you have two lists of IP addresses
  (separated by either commas or newline characters) in two files with
  the ".iplist" extension in the directory /etc/iptables/. You can
  write a small COMB pre-processor that generates rules for Netfilter
  from the two files. The script can look like this:

    ---
    #! /usr/bin/comb INTERPRETER=/sbin/iptables_:STEP=1

    -P (INPUT, FORWARD, OUTPUT) DROP

    -A INPUT (
            -i eth0 (
                    -m state --state RELATED\,ESTABLISHED
                    -p tcp --dport (
                            ssh -s ( < /etc/iptables/ssh.iplist )
                            ftp -s ( < /etc/iptables/ftp.iplist )
                    )
            )
    ) -j ACCEPT 

    -A OUTPUT -o eth0 -j ACCEPT

    \quit

    This will never get reached because the "\q" sequence above ended
    the transmission just as if you had pressed Control-D in a
    terminal. This way large documentation or comments can be included
    at the end of COMB source code. Isn't it handy?
    ---


  For more examples, please check out some of the test macros.



TIPS AND TRICKS

  1. Separate your long comments, poems or whatever from the source
     code with the EOT control sequence! Have more fun!



Last edited on Wed Jul  4 21:49:35 CEST 2007.