Unlike commonly used regexp libraries, Felix does not represent regular expressions as strings: instead, a first-class syntax is used to define them.
Felix allows you to name regular expressions with the syntax:

    regexp <name> = <regexp> ;

The name is an identifier. A string used in a regexp stands for a match of each character of the string in sequence. The following symbols are special, and are given from weakest to strongest binding order:

| symbol | syntax | meaning |
|---|---|---|
| `\|` | infix | alternatives |
| `*` | postfix | 0 or more occurrences |
| `+` | postfix | 1 or more occurrences |
| `?` | postfix | 0 or 1 occurrences |
| `<juxtaposition>` | infix | concatenation |
| `<name>` | atomic | regexp denoted by the name in a `regexp` definition |
| `<string>` | atomic | sequence of chars of the string |
| `[<charset>]` | atomic | any char of the charset |
| `[^<charset>]` | atomic | any char not in the charset |
| `.` | atomic | any char other than end of line |
| `_` | atomic | any char |
| `eof` | atomic | end marker |
| `(<regexp>)` | atomic | brackets |

A charset is a sequence of the following elements:

| symbol | meaning |
|---|---|
| `<string>` | any character in the string |
| `<char>-<char>` | any char between the two chars, inclusive |

    include "std";
    regexp lower = ["abcdefghijklmnopqrstuvwxyz"];
    regexp upper = ["ABCDEFGHIJKLMNOPQRSTUVWXYZ"];
    regexp digit = ["0123456789"];
    regexp alpha = lower | upper | "_";
    regexp id = alpha (alpha | digit) *;

    print
      regmatch "identifier" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "9999" with
      | digit+ => "Number"
      | id => "Identifier"
      endmatch
    ;
    endl;

    print
      regmatch "999xxx" with
      | digit+ => "Number"
      | id => "Identifier"
      | _* => "Neither"
      endmatch
    ;
    endl;

Note: the generated code is *extremely* fast, within one or two memory fetches of the fastest possible code. Here is the generated code for the inner loop of a regmatch:

    while(state && start != end)
      state = matrix[*start++][state];
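
To make the inner loop concrete, here is a minimal C++ sketch of a table-driven matcher built around the same loop, for the regexp `digit+`. It is not Felix's actual generated code. The table layout (`matrix[character][state]`, with state 0 as the dead state) is inferred from the loop above. The names `Matrix`, `make_digit_plus_matrix` and `matches_digit_plus`, the three-state numbering, and the choice to require a whole-string match are all assumptions made for illustration.

```cpp
#include <array>
#include <string>

// Hypothetical hand-built transition table for the regexp digit+.
// Assumed state numbering: 0 = dead (no match possible), 1 = start,
// 2 = accepting (at least one digit seen).
constexpr int kStates = 3;
using Matrix = std::array<std::array<unsigned char, kStates>, 256>;

Matrix make_digit_plus_matrix() {
    Matrix m{};  // value-initialised: every transition leads to dead state 0
    for (unsigned char c = '0'; c <= '9'; ++c) {
        m[c][1] = 2;  // first digit: start -> accepting
        m[c][2] = 2;  // further digits: stay in accepting
    }
    return m;
}

// Returns true when the whole input matches digit+ (an assumption of this
// sketch; the source does not spell out the matching policy).
bool matches_digit_plus(const std::string& input) {
    static const Matrix matrix = make_digit_plus_matrix();
    const unsigned char* start =
        reinterpret_cast<const unsigned char*>(input.data());
    const unsigned char* end = start + input.size();
    unsigned char state = 1;

    // Same shape as the inner loop shown above: one table lookup per
    // input character, stopping early once the dead state is reached.
    while (state && start != end)
        state = matrix[*start++][state];

    return state == 2;
}
```

Under these assumptions, `matches_digit_plus("9999")` returns `true`, while `matches_digit_plus("999xxx")` returns `false` because the first `x` drives the automaton into the dead state and the loop exits.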