mumble

A Lisp written in C, following the *Build Your Own Lisp* book


Micro Parser Combinators
========================

Version 0.9.0


About
-----

_mpc_ is a lightweight and powerful Parser Combinator library for C.

Using _mpc_ might be of interest to you if you are...

* Building a new programming language
* Building a new data format
* Parsing an existing programming language
* Parsing an existing data format
* Embedding a Domain Specific Language
* Implementing [Greenspun's Tenth Rule](http://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule)


Features
--------

* Type-Generic
* Predictive, Recursive Descent
* Easy to Integrate (One Source File in ANSI C)
* Automatic Error Message Generation
* Regular Expression Parser Generator
* Language/Grammar Parser Generator


Alternatives
------------

The current main alternative for a C-based parser combinator library is a branch of [Cesium3](https://github.com/wbhart/Cesium3/tree/combinators).

_mpc_ provides a number of features that this project does not offer, and also overcomes a number of potential downsides:

* _mpc_ works for generic types
* _mpc_ doesn't rely on Boehm-Demers-Weiser garbage collection
* _mpc_ doesn't use `setjmp` and `longjmp` for errors
* _mpc_ doesn't pollute the namespace


Quickstart
==========

Here is how one would use _mpc_ to create a parser for a basic mathematical expression language.

```c
mpc_parser_t *Expr  = mpc_new("expression");
mpc_parser_t *Prod  = mpc_new("product");
mpc_parser_t *Value = mpc_new("value");
mpc_parser_t *Maths = mpc_new("maths");

mpca_lang(MPCA_LANG_DEFAULT,
  " expression : <product> (('+' | '-') <product>)*; "
  " product    : <value>   (('*' | '/')   <value>)*; "
  " value      : /[0-9]+/ | '(' <expression> ')';    "
  " maths      : /^/ <expression> /$/;               ",
  Expr, Prod, Value, Maths, NULL);

mpc_result_t r;

if (mpc_parse("input", input, Maths, &r)) {
  mpc_ast_print(r.output);
  mpc_ast_delete(r.output);
} else {
  mpc_err_print(r.error);
  mpc_err_delete(r.error);
}

mpc_cleanup(4, Expr, Prod, Value, Maths);
```

If you were to set `input` to the string `(4 * 2 * 11 + 2) - 5`, the printed output would look like this.

```
>
  regex
  expression|>
    value|>
      char:1:1 '('
      expression|>
        product|>
          value|regex:1:2 '4'
          char:1:4 '*'
          value|regex:1:6 '2'
          char:1:8 '*'
          value|regex:1:10 '11'
        char:1:13 '+'
        product|value|regex:1:15 '2'
      char:1:16 ')'
    char:1:18 '-'
    product|value|regex:1:20 '5'
  regex
```

Getting Started
===============

Introduction
------------

Parser Combinators are structures that encode how to parse particular languages. They can be combined using intuitive operators to create new parsers of increasing complexity. Using these operators, detailed grammars and languages can be parsed and processed in a quick, efficient, and easy way.

The trick behind Parser Combinators is the observation that by structuring the library in a particular way, one can make building parser combinators look like writing a grammar itself. Therefore, instead of describing _how to parse a language_, a user need only specify _the language itself_, and the library will work out how to parse it... as if by magic!

_mpc_ can be used in this mode, or, as shown in the above example, you can specify the grammar directly as a string or in a file.

Basic Parsers
-------------

### String Parsers

All of the following functions construct new basic parsers of type `mpc_parser_t *`. Each of these parsers returns a newly allocated `char *` containing the character(s) it matched. If unsuccessful, they return an error. Their behaviour is as follows.

* * *

```c
mpc_parser_t *mpc_any(void);
```

Matches any individual character

* * *

```c
mpc_parser_t *mpc_char(char c);
```

Matches a single given character `c`

* * *

```c
mpc_parser_t *mpc_range(char s, char e);
```

Matches any single character in the range `s` to `e` (inclusive)

* * *

```c
mpc_parser_t *mpc_oneof(const char *s);
```

Matches any single character in the string `s`

* * *

```c
mpc_parser_t *mpc_noneof(const char *s);
```

Matches any single character not in the string `s`

* * *

```c
mpc_parser_t *mpc_satisfy(int(*f)(char));
```

Matches any single character satisfying function `f` (that is, for which `f` returns non-zero)
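
As an illustration, a predicate that could be handed to `mpc_satisfy` might look like the sketch below. The name `is_hex_letter` is hypothetical, and the sketch assumes, as the signature suggests, that returning non-zero accepts the character.

```c
#include <ctype.h>

/* Hypothetical predicate for mpc_satisfy: accepts the letters a-f and A-F.
   Returns non-zero to accept the character and zero to reject it. */
int is_hex_letter(char c) {
  /* <ctype.h> functions need a value representable as unsigned char. */
  int lower = tolower((unsigned char)c);
  return lower >= 'a' && lower <= 'f';
}

/* Usage, assuming mpc.h is included:
   mpc_parser_t *hex_letter = mpc_satisfy(is_hex_letter); */
```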

* * *

```c
mpc_parser_t *mpc_string(const char *s);
```

Matches exactly the string `s`


### Other Parsers

Several other functions construct parsers with special functionality.

* * *

```c
mpc_parser_t *mpc_pass(void);
```

Consumes no input, always successful, returns `NULL`

* * *

```c
mpc_parser_t *mpc_fail(const char *m);
mpc_parser_t *mpc_failf(const char *fmt, ...);
```

Consumes no input, always fails with message `m` or formatted string `fmt`.

* * *

```c
mpc_parser_t *mpc_lift(mpc_ctor_t f);
```

Consumes no input, always successful, returns the result of function `f`

* * *

```c
mpc_parser_t *mpc_lift_val(mpc_val_t *x);
```

Consumes no input, always successful, returns `x`

* * *

```c
mpc_parser_t *mpc_state(void);
```

Consumes no input, always successful, returns a copy of the parser state as an `mpc_state_t *`. This state is newly allocated, so it must be released with `free` when no longer needed.

* * *

```c
mpc_parser_t *mpc_anchor(int(*f)(char,char));
```

Consumes no input. Successful when function `f` returns true. Always returns `NULL`.

Function `f` is an _anchor_ function. It takes as input the last character parsed and the next character in the input, and returns success or failure. The user can supply this function to ensure some condition is met, for example to test that the input is at a boundary between words and non-words.

At the start of the input, the first argument is set to `'\0'`. At the end of the input, the second argument is set to `'\0'`.
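
A minimal sketch of an anchor function follows, assuming the calling convention described above (previous character, next character, `'\0'` at either end of the input) and that a non-zero return means success. The names `is_word_char` and `word_boundary` are hypothetical; the Common Parsers table describes `mpc_boundary` as serving a similar purpose, but its actual implementation may differ.

```c
#include <ctype.h>

/* Hypothetical helper: letters, digits and '_' count as word characters.
   '\0' (passed at the start and end of input) is never a word character. */
static int is_word_char(char c) {
  return c != '\0' && (isalnum((unsigned char)c) || c == '_');
}

/* Anchor function for mpc_anchor: succeeds (returns non-zero) only when
   exactly one side of the current position is a word character, i.e. at
   a boundary between a word and a non-word. */
int word_boundary(char prev, char next) {
  return is_word_char(prev) != is_word_char(next);
}

/* Usage, assuming mpc.h is included:
   mpc_parser_t *b = mpc_anchor(word_boundary); */
```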


Parsing
-------

Once you've built a parser, you can run it on some input using one of the following functions. These functions return `1` on success and `0` on failure, and write either the result or an error into an `mpc_result_t` variable. This type is defined as follows.

```c
typedef union {
  mpc_err_t *error;
  mpc_val_t *output;
} mpc_result_t;
```

where `mpc_val_t *` is synonymous with `void *` and simply represents a pointer to some data, the exact type of which depends on the parser.


* * *

```c
int mpc_parse(const char *filename, const char *string, mpc_parser_t *p, mpc_result_t *r);
```

Run a parser on some string.

* * *

```c
int mpc_parse_file(const char *filename, FILE *file, mpc_parser_t *p, mpc_result_t *r);
```

Run a parser on some file.

* * *

```c
int mpc_parse_pipe(const char *filename, FILE *pipe, mpc_parser_t *p, mpc_result_t *r);
```

Run a parser on some pipe (such as `stdin`).

* * *

```c
int mpc_parse_contents(const char *filename, mpc_parser_t *p, mpc_result_t *r);
```

Run a parser on the contents of some file.


Combinators
-----------

Combinators are functions that take one or more parsers and return a new parser of some given functionality.

These combinators work independently of the data type that the supplied parser(s) return. In languages such as Haskell, the compiler ensures that you don't feed one type of data into a parser expecting a different type. C offers no such luxury, so it is up to the programmer to deal correctly with the outputs of different parser types.

A second annoyance in C is manual memory management. Some parsers might get half-way and then fail, which means they need to clean up any partial result collected so far. In Haskell this is handled by the garbage collector, but in C these combinators need to take _destructor_ functions as input, which say how to clean up any partial data that has been collected.

Here are the main combinators and how to use them.

* * *

```c
mpc_parser_t *mpc_expect(mpc_parser_t *a, const char *e);
mpc_parser_t *mpc_expectf(mpc_parser_t *a, const char *fmt, ...);
```

Returns a parser that runs `a`, and on success returns the result of `a`, while on failure reports that `e` (or the formatted string `fmt`) was expected.

* * *

```c
mpc_parser_t *mpc_apply(mpc_parser_t *a, mpc_apply_t f);
mpc_parser_t *mpc_apply_to(mpc_parser_t *a, mpc_apply_to_t f, void *x);
```

Returns a parser that applies function `f` (optionally taking extra input `x`) to the result of parser `a`.

* * *

```c
mpc_parser_t *mpc_check(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *e);
mpc_parser_t *mpc_check_with(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *e);
mpc_parser_t *mpc_checkf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_t f, const char *fmt, ...);
mpc_parser_t *mpc_check_withf(mpc_parser_t *a, mpc_dtor_t da, mpc_check_with_t f, void *x, const char *fmt, ...);
```

Returns a parser that applies function `f` (optionally taking extra input `x`) to the result of parser `a`. If `f` returns non-zero, the parser succeeds and returns the value of `a` (possibly modified by `f`). If `f` returns zero, the parser fails with message `e` (or the formatted string `fmt`), and the result of `a` is destroyed with the destructor `da`.
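
The sketch below shows a possible `mpc_check_with_t` function that accepts an integer only if it falls inside bounds passed through the extra pointer `x`. The names `int_bounds` and `in_bounds` are hypothetical. Since `mpc_val_t *` is described as synonymous with `void *`, the sketch spells the parameter as `void **` so it compiles without `mpc.h`; it also assumes the value comes from a parser that returns an `int*`, such as `mpc_int`.

```c
/* Hypothetical inclusive bounds, passed to the check through void *x. */
typedef struct {
  int lo;
  int hi;
} int_bounds;

/* Check function in the shape of mpc_check_with_t (void stands in for
   mpc_val_t). Returns non-zero to accept the int* result, zero to fail.
   The value is never freed here: per the description above, on failure
   the destructor da given to mpc_check_with releases it. */
int in_bounds(void **v, void *x) {
  const int *n = *v;
  const int_bounds *b = x;
  return *n >= b->lo && *n <= b->hi;
}

/* Usage, assuming mpc.h is included:
   static int_bounds byte = { 0, 255 };
   mpc_parser_t *p = mpc_check_with(mpc_int(), free, in_bounds, &byte,
                                    "integer between 0 and 255"); */
```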

* * *

```c
mpc_parser_t *mpc_not(mpc_parser_t *a, mpc_dtor_t da);
mpc_parser_t *mpc_not_lift(mpc_parser_t *a, mpc_dtor_t da, mpc_ctor_t lf);
```

Returns a parser with the following behaviour. If parser `a` succeeds, the returned parser fails and consumes no input. If `a` fails, the returned parser succeeds, consumes no input, and returns `NULL` (or the result of lift function `lf`). Destructor `da` is used to destroy the result of `a` when `a` succeeds.

* * *

```c
mpc_parser_t *mpc_maybe(mpc_parser_t *a);
mpc_parser_t *mpc_maybe_lift(mpc_parser_t *a, mpc_ctor_t lf);
```

Returns a parser that runs `a`. If `a` is successful then it returns the result of `a`. If `a` is unsuccessful then it succeeds, but returns `NULL` (or the result of `lf`).

* * *

```c
mpc_parser_t *mpc_many(mpc_fold_t f, mpc_parser_t *a);
```

Runs `a` zero or more times until it fails. Results are combined using fold function `f`. See the _Function Types_ section for more details.

* * *

```c
mpc_parser_t *mpc_many1(mpc_fold_t f, mpc_parser_t *a);
```

Runs `a` one or more times until it fails. Results are combined with fold function `f`.

* * *

```c
mpc_parser_t *mpc_count(int n, mpc_fold_t f, mpc_parser_t *a, mpc_dtor_t da);
```

Runs `a` exactly `n` times. If this fails, any partial results are destroyed with `da`. If successful, the results of `a` are combined using fold function `f`.

* * *

```c
mpc_parser_t *mpc_or(int n, ...);
```

Attempts to run `n` parsers in turn, returning the result of the first one that succeeds. If all fail, it returns an error.

* * *

```c
mpc_parser_t *mpc_and(int n, mpc_fold_t f, ...);
```

Attempts to run `n` parsers in sequence, returning the fold of their results using fold function `f`. The parsers are given first, followed by a destructor for each parser except the last; these are used to clean up in case of partial success. For example, `mpc_and(3, mpcf_strfold, mpc_char('a'), mpc_char('b'), mpc_char('c'), free, free);` attempts to match `'a'` followed by `'b'` followed by `'c'`, and if successful concatenates them using `mpcf_strfold`. Otherwise it calls `free` on any partial results.
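
A hand-written fold for the common case of matching something between delimiters might look like this. The name `keep_second` is hypothetical, `void` stands in for `mpc_val_t`, and the sketch assumes every result was allocated with `malloc`, as the `char *` results of the basic string parsers are. The Fold Functions table lists `mpcf_snd_free`, which is described as doing the same job.

```c
#include <stdlib.h>

/* Hypothetical fold in the shape of mpc_fold_t: keeps the second result
   and frees all the others, which are no longer needed. Returns NULL if
   there is no second result. */
void *keep_second(int n, void **xs) {
  int i;
  for (i = 0; i < n; i++) {
    if (i != 1) {
      free(xs[i]);
    }
  }
  return n > 1 ? xs[1] : NULL;
}
```

With `mpc.h` included, `mpc_and(3, keep_second, mpc_char('('), mpc_ident(), mpc_char(')'), free, free)` would then be expected to return just the identifier, with both parentheses freed.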

* * *

```c
mpc_parser_t *mpc_predictive(mpc_parser_t *a);
```

Returns a parser that runs `a` with backtracking disabled. This means that if `a` consumes more than one character, the input will not be reverted, even on failure. Turning backtracking off gives good performance benefits for grammars that are `LL(1)`. These are grammars where the first character completely determines the parse result, such as deciding whether to parse a C identifier, number, or string literal. This option should not be used for non-`LL(1)` grammars, or it will produce incorrect results or crash the parser.

Another way to think of `mpc_predictive` is that it can be applied to a parser (for a performance improvement) if either successfully parsing the first character will result in a completely successful parse, or all of the referenced sub-parsers are also `LL(1)`.


Function Types
--------------

The combinator functions take a number of special function types as function pointers. Here is a short explanation of those types and how they are expected to behave. It is important that these behave correctly; otherwise it is easy to introduce memory leaks or crashes into the system.

* * *

```c
typedef void(*mpc_dtor_t)(mpc_val_t*);
```

Given a pointer to a data value, it ensures the memory it points to is freed correctly.
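
For a result type that owns other allocations, the destructor has to release those too. In the sketch below, `kv_pair` and `kv_pair_delete` are hypothetical, and `void *` stands in for `mpc_val_t *`.

```c
#include <stdlib.h>

/* Hypothetical result type: a key/value pair that owns both strings. */
typedef struct {
  char *key;
  char *value;
} kv_pair;

/* Destructor in the shape of mpc_dtor_t: frees both owned strings and
   then the pair itself. Accepting NULL mirrors free() and makes it safe
   to call on an empty result. */
void kv_pair_delete(void *x) {
  kv_pair *p = x;
  if (p == NULL) {
    return;
  }
  free(p->key);
  free(p->value);
  free(p);
}
```

Such a function could then be passed wherever a combinator asks for an `mpc_dtor_t`, for example as the `da` argument of `mpc_check`.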

* * *

```c
typedef mpc_val_t*(*mpc_ctor_t)(void);
```

Returns some data value when called. It can be used to create _empty_ versions of data types when certain combinators have no known default value to return. For example, it may be used to return a newly allocated empty string.
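
The sketch below is a constructor that could supply a default value, for instance to `mpc_maybe_lift`. The name `ctor_zero_int` is hypothetical, and `void *` stands in for `mpc_val_t *`. It allocates a fresh value on every call, on the assumption that whoever consumes the result will eventually free it like any other parse result.

```c
#include <stdlib.h>

/* Hypothetical constructor in the shape of mpc_ctor_t: returns a newly
   allocated int* holding 0, or NULL if the allocation fails. */
void *ctor_zero_int(void) {
  int *x = malloc(sizeof *x);
  if (x != NULL) {
    *x = 0;
  }
  return x;
}

/* Usage, assuming mpc.h is included:
   mpc_parser_t *opt = mpc_maybe_lift(mpc_int(), ctor_zero_int); */
```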

* * *

```c
typedef mpc_val_t*(*mpc_apply_t)(mpc_val_t*);
typedef mpc_val_t*(*mpc_apply_to_t)(mpc_val_t*,void*);
```

This takes a pointer to some data and returns a new or modified pointer to data, freeing the input data if it is no longer used. The `apply_to` variation takes an extra pointer to some data, such as global state.
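
Two hypothetical functions sketching these signatures follow, with `void` standing in for `mpc_val_t`. `str_to_int` turns a decimal string into an `int*` and frees the string (the Apply Functions table lists `mpcf_int` for this job); `count_values` uses the extra pointer as a counter and passes its input through unchanged.

```c
#include <stdlib.h>

/* Hypothetical apply function in the shape of mpc_apply_t: converts a
   decimal string (e.g. from mpc_digits) into a newly allocated int*,
   then frees the input string, which is no longer needed. */
void *str_to_int(void *x) {
  int *n = malloc(sizeof *n);
  if (n != NULL) {
    *n = (int)strtol(x, NULL, 10);
  }
  free(x);
  return n;
}

/* Hypothetical apply_to function in the shape of mpc_apply_to_t: bumps
   the int counter passed as the extra pointer and returns x unchanged,
   so ownership of x simply passes through. */
void *count_values(void *x, void *counter) {
  (*(int *)counter)++;
  return x;
}
```

With `mpc.h` included, these might be used as `mpc_apply(mpc_digits(), str_to_int)` and `mpc_apply_to(mpc_int(), count_values, &count)`.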

* * *

```c
typedef int(*mpc_check_t)(mpc_val_t**);
typedef int(*mpc_check_with_t)(mpc_val_t**,void*);
```

This takes a pointer to a pointer to some data and returns 0 if parsing should stop with an error. Additionally, it may change or free the input data. The `check_with` variation takes an extra pointer to some data, such as global state.
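
Receiving the value by pointer lets a check modify what it inspects. In the hypothetical `not_reserved` sketch below (again writing `void` for `mpc_val_t`), an identifier is lowercased in place and then rejected if it is a reserved word. The keyword list is illustrative only.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical check in the shape of mpc_check_t: lowercases the string
   in place, then returns 0 (fail) if it is a reserved word, non-zero
   otherwise. It does not free the string on failure, since the mpc_check
   description says the result is then destroyed with da. */
int not_reserved(void **x) {
  static const char *reserved[] = { "if", "else", "while", "return" };
  char *s = *x;
  size_t i;
  for (i = 0; s[i] != '\0'; i++) {
    s[i] = (char)tolower((unsigned char)s[i]);
  }
  for (i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
    if (strcmp(s, reserved[i]) == 0) {
      return 0;
    }
  }
  return 1;
}

/* Usage, assuming mpc.h is included:
   mpc_check(mpc_ident(), free, not_reserved, "non-reserved identifier") */
```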

* * *

```c
typedef mpc_val_t*(*mpc_fold_t)(int,mpc_val_t**);
```

This takes a count `n` and an array of `n` pointers to data values, and must return some combined or folded version of these data values. It must free any input data that is no longer used once the combination has taken place.
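
As a sketch of this contract, the hypothetical `sum_ints` below adds up a list of `int*` results, for example from `mpc_many` over `mpc_int`, and frees every input because none of them is reused. `void` again stands in for `mpc_val_t`.

```c
#include <stdlib.h>

/* Hypothetical fold in the shape of mpc_fold_t: sums n int* values into
   a newly allocated int* and frees each input. With n == 0 it returns a
   pointer to 0, so an empty repetition still yields a valid result.
   Returns NULL only if the allocation fails (inputs are still freed). */
void *sum_ints(int n, void **xs) {
  int *total = malloc(sizeof *total);
  int i;
  if (total != NULL) {
    *total = 0;
  }
  for (i = 0; i < n; i++) {
    if (total != NULL) {
      *total += *(int *)xs[i];
    }
    free(xs[i]);
  }
  return total;
}

/* Usage, assuming mpc.h is included:
   mpc_parser_t *sum = mpc_many(sum_ints, mpc_tok(mpc_int())); */
```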


Case Study - Identifier
=======================

Combinator Method
-----------------

Using the above combinators, we can create a parser that matches a C identifier.

When using the combinators, we need to supply a function that says how to combine several `char *` results.

For this we build a fold function that concatenates zero or more strings together. For the sake of this tutorial we will write it by hand, but this (as well as many other useful fold functions) is actually included in _mpc_ under the `mpcf_*` namespace, as `mpcf_strfold`.

```c
#include <stdlib.h>
#include <string.h>
#include "mpc.h"

/* Concatenate n strings into one newly allocated string, freeing each input. */
mpc_val_t *strfold(int n, mpc_val_t **xs) {
  char *x = calloc(1, 1);  /* start from an empty string */
  int i;
  for (i = 0; i < n; i++) {
    x = realloc(x, strlen(x) + strlen(xs[i]) + 1);
    strcat(x, xs[i]);
    free(xs[i]);  /* the input is no longer needed */
  }
  return x;
}
```

We can use this to specify a C identifier, making use of some combinators to say how the basic parsers are combined.

```c
mpc_parser_t *alpha = mpc_or(2, mpc_range('a', 'z'), mpc_range('A', 'Z'));
mpc_parser_t *digit = mpc_range('0', '9');
mpc_parser_t *underscore = mpc_char('_');

/* alpha and underscore are used twice, so the second use takes a copy;
   passing the same parser into two combinators would free it twice. */
mpc_parser_t *ident = mpc_and(2, strfold,
  mpc_or(2, alpha, underscore),
  mpc_many(strfold, mpc_or(3, mpc_copy(alpha), digit, mpc_copy(underscore))),
  free);

/* Do Some Parsing... */

mpc_delete(ident);
```

Notice that previous parsers are used as input to the new parsers we construct from the combinators, and that only the final parser `ident` must be deleted. When we input a parser into a combinator, we should consider it to become part of the output of that combinator.

Because of this, we shouldn't input the same parser into multiple places, or it will be freed twice. A parser that is needed in more than one place should be copied with `mpc_copy` for each additional use.


Regex Method
------------

There is an easier way to do this than the above method. _mpc_ comes with a handy regex function for constructing parsers using regex syntax. We can specify an identifier using a regex pattern as shown below.

```c
mpc_parser_t *ident = mpc_re("[a-zA-Z_][a-zA-Z_0-9]*");

/* Do Some Parsing... */

mpc_delete(ident);
```


Library Method
--------------

If we really wanted a parser for C identifiers, though, _mpc_ already includes a function that creates one, along with many other common parsers.

```c
mpc_parser_t *ident = mpc_ident();

/* Do Some Parsing... */

mpc_delete(ident);
```

Parser References
=================

Building parsers in the above way can cause problems with self-reference or cyclic reference. To overcome this, we can separate the creation of parsers into two steps: construction and definition.

* * *

```c
mpc_parser_t *mpc_new(const char *name);
```

This constructs a parser called `name`, which can then be used as input to others, including itself, without fear of being deleted. Any parser created using `mpc_new` is said to be _retained_. This means it behaves differently from a normal parser when referenced: when a parser that includes a _retained_ parser is deleted, the _retained_ parser is not deleted along with it. To delete a retained parser, `mpc_delete` must be called on it directly.

A _retained_ parser can then be _defined_ using...

* * *

```c
mpc_parser_t *mpc_define(mpc_parser_t *p, mpc_parser_t *a);
```

This assigns the contents of parser `a` to `p`, and deletes `a`. With this technique parsers can now reference each other, as well as themselves, without trouble.

* * *

```c
mpc_parser_t *mpc_undefine(mpc_parser_t *p);
```

A final step is required: parsers that reference each other must all be undefined before any of them is deleted. The reason is that deleting a parser means looking at each sub-parser it uses. If any of these has already been deleted, a segfault is unavoidable, even if it was retained beforehand.

* * *

```c
void mpc_cleanup(int n, ...);
```

To ease the task of undefining and then deleting parsers, `mpc_cleanup` can be used. It takes `n` parsers as input, undefines them all, and then deletes them all.

* * *

```c
mpc_parser_t *mpc_copy(mpc_parser_t *a);
```

This function makes a copy of parser `a`. This can be useful when you want to
use a parser as input to several other parsers without retaining
it.

* * *

```c
mpc_parser_t *mpc_re(const char *re);
mpc_parser_t *mpc_re_mode(const char *re, int mode);
```

These functions take as input the regular expression `re` and build a parser
for it. With `mpc_re_mode`, optional mode flags can also be given.
Available flags are `MPC_RE_MULTILINE` / `MPC_RE_M`, where the start-of-input
character `^` also matches the beginning of new lines and the end-of-input
character `$` also matches new lines, and `MPC_RE_DOTALL` / `MPC_RE_S`, where
the any-character token `.` also matches newlines (by default it doesn't).


Library Reference
=================

Common Parsers
--------------


<table>

  <tr><td><code>mpc_soi</code></td><td>Matches only the start of input, returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_eoi</code></td><td>Matches only the end of input, returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_boundary</code></td><td>Matches only the boundary between words, returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_boundary_newline</code></td><td>Matches the start of a new line, returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_whitespace</code></td><td>Matches any whitespace character <code>" \f\n\r\t\v"</code></td></tr>
  <tr><td><code>mpc_whitespaces</code></td><td>Matches zero or more whitespace characters</td></tr>
  <tr><td><code>mpc_blank</code></td><td>Matches whitespace and frees the result, returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_newline</code></td><td>Matches <code>'\n'</code></td></tr>
  <tr><td><code>mpc_tab</code></td><td>Matches <code>'\t'</code></td></tr>
  <tr><td><code>mpc_escape</code></td><td>Matches a backslash followed by any character</td></tr>
  <tr><td><code>mpc_digit</code></td><td>Matches any character in the range <code>'0'</code> - <code>'9'</code></td></tr>
  <tr><td><code>mpc_hexdigit</code></td><td>Matches any character in the range <code>'0'</code> - <code>'9'</code> as well as <code>'A'</code> - <code>'F'</code> and <code>'a'</code> - <code>'f'</code></td></tr>
  <tr><td><code>mpc_octdigit</code></td><td>Matches any character in the range <code>'0'</code> - <code>'7'</code></td></tr>
  <tr><td><code>mpc_digits</code></td><td>Matches one or more digits</td></tr>
  <tr><td><code>mpc_hexdigits</code></td><td>Matches one or more hexdigits</td></tr>
  <tr><td><code>mpc_octdigits</code></td><td>Matches one or more octdigits</td></tr>
  <tr><td><code>mpc_lower</code></td><td>Matches any lower case character</td></tr>
  <tr><td><code>mpc_upper</code></td><td>Matches any upper case character</td></tr>
  <tr><td><code>mpc_alpha</code></td><td>Matches any alphabet character</td></tr>
  <tr><td><code>mpc_underscore</code></td><td>Matches <code>'_'</code></td></tr>
  <tr><td><code>mpc_alphanum</code></td><td>Matches any alphabet character, underscore or digit</td></tr>
  <tr><td><code>mpc_int</code></td><td>Matches digits and returns an <code>int*</code></td></tr>
  <tr><td><code>mpc_hex</code></td><td>Matches hexdigits and returns an <code>int*</code></td></tr>
  <tr><td><code>mpc_oct</code></td><td>Matches octdigits and returns an <code>int*</code></td></tr>
  <tr><td><code>mpc_number</code></td><td>Matches <code>mpc_int</code>, <code>mpc_hex</code> or <code>mpc_oct</code></td></tr>
  <tr><td><code>mpc_real</code></td><td>Matches some floating-point number as a string</td></tr>
  <tr><td><code>mpc_float</code></td><td>Matches some floating-point number and returns a <code>float*</code></td></tr>
  <tr><td><code>mpc_char_lit</code></td><td>Matches some character literal surrounded by <code>'</code></td></tr>
  <tr><td><code>mpc_string_lit</code></td><td>Matches some string literal surrounded by <code>"</code></td></tr>
  <tr><td><code>mpc_regex_lit</code></td><td>Matches some regex literal surrounded by <code>/</code></td></tr>
  <tr><td><code>mpc_ident</code></td><td>Matches a C-style identifier</td></tr>

</table>


Useful Parsers
--------------

<table>

  <tr><td><code>mpc_startswith(mpc_parser_t *a);</code></td><td>Matches the start of input followed by <code>a</code></td></tr>
  <tr><td><code>mpc_endswith(mpc_parser_t *a, mpc_dtor_t da);</code></td><td>Matches <code>a</code> followed by the end of input</td></tr>
  <tr><td><code>mpc_whole(mpc_parser_t *a, mpc_dtor_t da);</code></td><td>Matches the start of input, <code>a</code>, and the end of input</td></tr>
  <tr><td><code>mpc_stripl(mpc_parser_t *a);</code></td><td>Matches <code>a</code>, first consuming any whitespace to the left</td></tr>
  <tr><td><code>mpc_stripr(mpc_parser_t *a);</code></td><td>Matches <code>a</code>, then consumes any whitespace to the right</td></tr>
  <tr><td><code>mpc_strip(mpc_parser_t *a);</code></td><td>Matches <code>a</code>, consuming any surrounding whitespace</td></tr>
  <tr><td><code>mpc_tok(mpc_parser_t *a);</code></td><td>Matches <code>a</code> and consumes any trailing whitespace</td></tr>
  <tr><td><code>mpc_sym(const char *s);</code></td><td>Matches string <code>s</code> and consumes any trailing whitespace</td></tr>
  <tr><td><code>mpc_total(mpc_parser_t *a, mpc_dtor_t da);</code></td><td>Matches <code>a</code> with surrounding whitespace consumed, enclosed by the start and end of input</td></tr>
  <tr><td><code>mpc_between(mpc_parser_t *a, mpc_dtor_t ad, <br /> const char *o, const char *c);</code></td><td> Matches <code>a</code> between strings <code>o</code> and <code>c</code></td></tr>
  <tr><td><code>mpc_parens(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between <code>"("</code> and <code>")"</code></td></tr>
  <tr><td><code>mpc_braces(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between <code>"<"</code> and <code>">"</code></td></tr>
  <tr><td><code>mpc_brackets(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between <code>"{"</code> and <code>"}"</code></td></tr>
  <tr><td><code>mpc_squares(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between <code>"["</code> and <code>"]"</code></td></tr>
  <tr><td><code>mpc_tok_between(mpc_parser_t *a, mpc_dtor_t ad, <br /> const char *o, const char *c);</code></td><td>Matches <code>a</code> between <code>o</code> and <code>c</code>, where <code>o</code> and <code>c</code> have their trailing whitespace stripped.</td></tr>
  <tr><td><code>mpc_tok_parens(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between trailing whitespace consumed <code>"("</code> and <code>")"</code></td></tr>
  <tr><td><code>mpc_tok_braces(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between trailing whitespace consumed <code>"<"</code> and <code>">"</code></td></tr>
  <tr><td><code>mpc_tok_brackets(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between trailing whitespace consumed <code>"{"</code> and <code>"}"</code></td></tr>
  <tr><td><code>mpc_tok_squares(mpc_parser_t *a, mpc_dtor_t ad);</code></td><td>Matches <code>a</code> between trailing whitespace consumed <code>"["</code> and <code>"]"</code></td></tr>

</table>
    648 
    649 
    650 Apply Functions
    651 ---------------
    652 
    653 <table>
    654 
    655   <tr><td><code>void mpcf_dtor_null(mpc_val_t *x);</code></td><td>Empty destructor. Does nothing</td></tr>
    656   <tr><td><code>mpc_val_t *mpcf_ctor_null(void);</code></td><td>Returns <code>NULL</code></td></tr>
    657   <tr><td><code>mpc_val_t *mpcf_ctor_str(void);</code></td><td>Returns <code>""</code></td></tr>
    658   <tr><td><code>mpc_val_t *mpcf_free(mpc_val_t *x);</code></td><td>Frees <code>x</code> and returns <code>NULL</code></td></tr>
    659   <tr><td><code>mpc_val_t *mpcf_int(mpc_val_t *x);</code></td><td>Converts a decimal string <code>x</code> to an <code>int*</code></td></tr>
    660   <tr><td><code>mpc_val_t *mpcf_hex(mpc_val_t *x);</code></td><td>Converts a hex string <code>x</code> to an <code>int*</code></td></tr>
    661   <tr><td><code>mpc_val_t *mpcf_oct(mpc_val_t *x);</code></td><td>Converts a oct string <code>x</code> to an <code>int*</code></td></tr>
    662   <tr><td><code>mpc_val_t *mpcf_float(mpc_val_t *x);</code></td><td>Converts a string <code>x</code> to a <code>float*</code></td></tr>
    663   <tr><td><code>mpc_val_t *mpcf_escape(mpc_val_t *x);</code></td><td>Converts a string <code>x</code> to an escaped version</td></tr>
    664   <tr><td><code>mpc_val_t *mpcf_escape_regex(mpc_val_t *x);</code></td><td>Converts a regex <code>x</code> to an escaped version</td></tr>
    665   <tr><td><code>mpc_val_t *mpcf_escape_string_raw(mpc_val_t *x);</code></td><td>Converts a raw string <code>x</code> to an escaped version</td></tr>
    666   <tr><td><code>mpc_val_t *mpcf_escape_char_raw(mpc_val_t *x);</code></td><td>Converts a raw character <code>x</code> to an escaped version</td></tr>
    667   <tr><td><code>mpc_val_t *mpcf_unescape(mpc_val_t *x);</code></td><td>Converts a string <code>x</code> to an unescaped version</td></tr>
    668   <tr><td><code>mpc_val_t *mpcf_unescape_regex(mpc_val_t *x);</code></td><td>Converts a regex <code>x</code> to an unescaped version</td></tr>
    669   <tr><td><code>mpc_val_t *mpcf_unescape_string_raw(mpc_val_t *x);</code></td><td>Converts a raw string <code>x</code> to an unescaped version</td></tr>
    670   <tr><td><code>mpc_val_t *mpcf_unescape_char_raw(mpc_val_t *x);</code></td><td>Converts a raw character <code>x</code> to an unescaped version</td></tr>
    671   <tr><td><code>mpc_val_t *mpcf_strtriml(mpc_val_t *x);</code></td><td>Trims whitespace from the left of string <code>x</code></td></tr>
    672   <tr><td><code>mpc_val_t *mpcf_strtrimr(mpc_val_t *x);</code></td><td>Trims whitespace from the right of string <code>x</code></td></tr>
    673   <tr><td><code>mpc_val_t *mpcf_strtrim(mpc_val_t *x);</code></td><td>Trims whitespace from either side of string <code>x</code></td></tr>
    674 </table>
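The apply functions in the table all share one shape: they take one parse result and return a new one. Below is a minimal sketch of a hypothetical apply function in that shape, `apply_strlen`, which turns a matched string into an `int*` holding its length. It assumes the ownership convention the table implies for functions such as `mpcf_int`: the function owns its input and frees it. The local `typedef` mirrors the one in `mpc.h` (where `mpc_val_t` is `void`) only so that the sketch compiles without the library; drop it when including `mpc.h`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the typedef in mpc.h, so this sketch compiles on its own. */
typedef void mpc_val_t;

/* Hypothetical apply function: returns a newly allocated int* holding the
   length of the input string, and frees the input it was given. */
mpc_val_t *apply_strlen(mpc_val_t *x) {
  int *len = malloc(sizeof(int));
  if (len != NULL) { *len = (int)strlen((const char *)x); }
  free(x); /* the apply function owns its input */
  return len;
}
```

With _mpc_ linked in, it would be used like the built-in functions, for example `mpc_apply(mpc_digits(), apply_strlen)` to count the digits in a number.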


Fold Functions
--------------

<table>

  <tr><td><code>mpc_val_t *mpcf_null(int n, mpc_val_t** xs);</code></td><td>Returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_val_t *mpcf_fst(int n, mpc_val_t** xs);</code></td><td>Returns first element of <code>xs</code></td></tr>
  <tr><td><code>mpc_val_t *mpcf_snd(int n, mpc_val_t** xs);</code></td><td>Returns second element of <code>xs</code></td></tr>
  <tr><td><code>mpc_val_t *mpcf_trd(int n, mpc_val_t** xs);</code></td><td>Returns third element of <code>xs</code></td></tr>
  <tr><td><code>mpc_val_t *mpcf_fst_free(int n, mpc_val_t** xs);</code></td><td>Returns first element of <code>xs</code> and calls <code>free</code> on others</td></tr>
  <tr><td><code>mpc_val_t *mpcf_snd_free(int n, mpc_val_t** xs);</code></td><td>Returns second element of <code>xs</code> and calls <code>free</code> on others</td></tr>
  <tr><td><code>mpc_val_t *mpcf_trd_free(int n, mpc_val_t** xs);</code></td><td>Returns third element of <code>xs</code> and calls <code>free</code> on others</td></tr>
  <tr><td><code>mpc_val_t *mpcf_all_free(int n, mpc_val_t** xs);</code></td><td>Calls <code>free</code> on all elements of <code>xs</code> and returns <code>NULL</code></td></tr>
  <tr><td><code>mpc_val_t *mpcf_strfold(int n, mpc_val_t** xs);</code></td><td>Concatenates all <code>xs</code> together as strings and returns the result</td></tr>

</table>
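Fold functions follow a similar ownership rule: a fold receives `n` results and returns one. The following minimal sketch of a hypothetical fold, `fold_join_comma`, is a comma-separated variant of what `mpcf_strfold` is described as doing. It assumes the fold owns, and therefore frees, every element of `xs`. As before, the local `typedef` only stands in for `mpc.h`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the typedef in mpc.h, so this sketch compiles on its own. */
typedef void mpc_val_t;

/* Hypothetical fold: joins n strings with ',' between them. Every input is
   freed, and a newly allocated string is returned ("" when n == 0). */
mpc_val_t *fold_join_comma(int n, mpc_val_t **xs) {
  size_t len = 1; /* room for the terminating '\0' */
  char *out;
  int i;

  for (i = 0; i < n; i++) { len += strlen((char *)xs[i]) + 1; } /* +1 for ',' */
  out = malloc(len);
  if (out != NULL) { out[0] = '\0'; }

  for (i = 0; i < n; i++) {
    if (out != NULL) {
      if (i > 0) { strcat(out, ","); }
      strcat(out, (char *)xs[i]);
    }
    free(xs[i]); /* the fold owns its inputs, whether or not malloc worked */
  }
  return out;
}
```

It could then be passed wherever an `mpc_fold_t` is expected, for instance as the first argument of `mpc_many`.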


Case Study - Maths Language
===========================

Combinator Approach
-------------------

Passing around all these function pointers might seem clumsy, but having parsers be type-generic is important, as it lets users define their own output types for parsers. For example, we could design our own syntax tree type to use. We can also use this method to do some specific house-keeping or data processing in the parsing phase.

As an example of this power, we can specify a simple maths grammar that outputs `int *` and computes the result of the expression as it goes along.

We start with a fold function that takes two `int *` values separated by a `char *` operator and folds them into a single `int *`.
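
```c
#include <stdlib.h>
#include <string.h>
#include "mpc.h"

/* Folds [int*, char* operator, int*] into the first int*, freeing the rest. */
mpc_val_t *fold_maths(int n, mpc_val_t **xs) {

  int **vs = (int**)xs;
  (void)n; /* always 3 here, as the fold is used with mpc_and(3, ...) */

  /* Note: there is no guard against division or modulo by zero. */
  if      (strcmp(xs[1], "*") == 0) { *vs[0] *= *vs[2]; }
  else if (strcmp(xs[1], "/") == 0) { *vs[0] /= *vs[2]; }
  else if (strcmp(xs[1], "%") == 0) { *vs[0] %= *vs[2]; }
  else if (strcmp(xs[1], "+") == 0) { *vs[0] += *vs[2]; }
  else if (strcmp(xs[1], "-") == 0) { *vs[0] -= *vs[2]; }

  /* The operator string and the right operand are no longer needed. */
  free(xs[1]); free(xs[2]);

  return xs[0];
}
```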

And then we use this to specify a basic grammar, which folds together any results.

```c
mpc_parser_t *Expr   = mpc_new("expr");
mpc_parser_t *Factor = mpc_new("factor");
mpc_parser_t *Term   = mpc_new("term");
mpc_parser_t *Maths  = mpc_new("maths");

mpc_define(Expr, mpc_or(2,
  mpc_and(3, fold_maths,
    Factor, mpc_oneof("+-"), Factor,
    free, free),
  Factor
));

mpc_define(Factor, mpc_or(2,
  mpc_and(3, fold_maths,
    Term, mpc_oneof("*/"), Term,
    free, free),
  Term
));

mpc_define(Term, mpc_or(2, mpc_int(), mpc_parens(Expr, free)));
mpc_define(Maths, mpc_whole(Expr, free));

/* Do Some Parsing... */

/* Parsers created with mpc_new refer to each other, so free them together. */
mpc_cleanup(4, Expr, Factor, Term, Maths);
```

If we supply this parser with something like `(4*2)+5`, we can expect it to output `13`. Note that `Expr` and `Factor` each apply at most one operator, so an input such as `1+2+3` needs parentheses, e.g. `(1+2)+3`.
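For comparison, and purely as an illustration rather than anything _mpc_ provides, the same rules can be hand-written as recursive-descent functions in plain C that evaluate as they parse. The function names are hypothetical. The sketch assumes the same restrictions as the grammar above: no whitespace, unsigned integer literals, at most one operator per level, and only `*` and `/` at the factor level. Unlike `fold_maths`, it treats division by zero as a parse failure.

```c
#include <assert.h>
#include <ctype.h>

/* Hand-written mirror of the Expr/Factor/Term/Maths rules. Each function
   advances *s past what it consumed and sets *ok to 0 on failure. */
static int eval_expr(const char **s, int *ok);

/* term : integer | '(' expr ')' */
static int eval_term(const char **s, int *ok) {
  int v = 0;
  if (**s == '(') {
    (*s)++;
    v = eval_expr(s, ok);
    if (**s == ')') { (*s)++; } else { *ok = 0; }
    return v;
  }
  if (!isdigit((unsigned char)**s)) { *ok = 0; return 0; }
  while (isdigit((unsigned char)**s)) { v = v * 10 + (**s - '0'); (*s)++; }
  return v;
}

/* factor : term ('*' | '/') term | term */
static int eval_factor(const char **s, int *ok) {
  int v = eval_term(s, ok);
  if (*ok && (**s == '*' || **s == '/')) {
    char op = *(*s)++;
    int r = eval_term(s, ok);
    if (!*ok) { return 0; }
    if (op == '*') { v *= r; }
    else if (r != 0) { v /= r; }
    else { *ok = 0; } /* division by zero: a choice of this sketch */
  }
  return v;
}

/* expr : factor ('+' | '-') factor | factor */
static int eval_expr(const char **s, int *ok) {
  int v = eval_factor(s, ok);
  if (*ok && (**s == '+' || **s == '-')) {
    char op = *(*s)++;
    int r = eval_factor(s, ok);
    v = (op == '+') ? v + r : v - r;
  }
  return v;
}

/* maths : expr, which must consume the whole input.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int eval_maths(const char *input, int *out) {
  int ok = 1;
  int v = eval_expr(&input, &ok);
  if (!ok || *input != '\0') { return 0; }
  *out = v;
  return 1;
}
```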


Language Approach
-----------------

It is possible to avoid passing in and around all those function pointers if you don't care what type is output by _mpc_. For this, a generic Abstract Syntax Tree type `mpc_ast_t` is included in _mpc_. The combinator functions which act on this don't need information on how to destruct or fold instances of the result, as they know it will be a `mpc_ast_t`. So there are a number of combinator functions which work specifically (and only) on parsers that return this type. They reside under `mpca_*`.

Doing things via this method means that all the data processing must take place after the parsing. In many instances this is not an issue, or is even preferable.

It also allows for one more trick. As all the fold and destructor functions are implicit, the user can simply specify the grammar of the language in some nice way and the system can try to build a parser for the AST type from this alone. For this there are a few functions supplied which take in a string and output a parser. The format for these grammars is simple and familiar to those who have used parser generators before. It looks something like this.

```
number "number" : /[0-9]+/ ;
expression      : <product> (('+' | '-') <product>)* ;
product         : <value>   (('*' | '/')   <value>)* ;
value           : <number> | '(' <expression> ')' ;
maths           : /^/ <expression> /$/ ;
```
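Since all processing happens after parsing in this approach, evaluating an expression means walking the finished tree. The sketch below assumes a hypothetical `node` struct that copies only the `tag`, `contents`, `children_num` and `children` fields of `mpc_ast_t`, so it compiles without _mpc_. It deliberately avoids relying on the exact tags or tree shape _mpc_ produces: leaves whose tag contains `number` are integers, single-character `+ - * /` leaves are operators, other leaves (brackets, the empty matches of `/^/` and `/$/`) are skipped, and each inner node folds its operands left to right.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in holding the mpc_ast_t fields this sketch reads. */
typedef struct node {
  const char *tag;
  const char *contents;
  int children_num;
  struct node **children;
} node;

/* A leaf whose contents are exactly one of + - * / is an operator. */
static int is_op(const node *n) {
  return n->children_num == 0 && strlen(n->contents) == 1
      && strchr("+-*/", n->contents[0]) != NULL;
}

static int apply_op(char op, int a, int b) {
  switch (op) {
    case '+': return a + b;
    case '-': return a - b;
    case '*': return a * b;
    default:  return a / b; /* '/': no division-by-zero check here */
  }
}

/* Number leaves are converted with atoi; inner nodes fold their operands
   left to right using the most recently seen operator. */
int eval_ast(const node *n) {
  int i, v, acc = 0, have_acc = 0;
  char op = '+';
  if (n->children_num == 0) { return atoi(n->contents); }
  for (i = 0; i < n->children_num; i++) {
    const node *c = n->children[i];
    if (is_op(c)) { op = c->contents[0]; continue; }
    if (c->children_num == 0 && strstr(c->tag, "number") == NULL) { continue; }
    v = eval_ast(c);
    acc = have_acc ? apply_op(op, acc, v) : v;
    have_acc = 1;
  }
  return acc;
}
```

With _mpc_, the same walk would be applied to the `mpc_ast_t` found in the parse result's output.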

The syntax for this is defined as follows.

<table class='table'>
  <tr><td><code>"ab"</code></td><td>The string <code>ab</code> is required.</td></tr>
  <tr><td><code>'a'</code></td><td>The character <code>a</code> is required.</td></tr>
  <tr><td><code>'a' 'b'</code></td><td>First <code>'a'</code> is required, then <code>'b'</code> is required.</td></tr>
  <tr><td><code>'a' | 'b'</code></td><td>Either <code>'a'</code> is required, or <code>'b'</code> is required.</td></tr>
  <tr><td><code>'a'*</code></td><td>Zero or more <code>'a'</code> are required.</td></tr>
  <tr><td><code>'a'+</code></td><td>One or more <code>'a'</code> are required.</td></tr>
  <tr><td><code>'a'?</code></td><td>Zero or one <code>'a'</code> is required.</td></tr>
  <tr><td><code>'a'{x}</code></td><td>Exactly <code>x</code> (integer) copies of <code>'a'</code> are required.</td></tr>
  <tr><td><code>&lt;abba&gt;</code></td><td>The rule called <code>abba</code> is required.</td></tr>
</table>

Rules are specified by rule name, optionally followed by an _expected_ string, followed by a colon `:`, followed by the definition, and ending in a semicolon `;`. Multiple rules can be specified. The _rule names_ must match the names given to any parsers created by `mpc_new`, otherwise the function will crash.

The `flags` argument is a set of flags, `MPCA_LANG_DEFAULT`, `MPCA_LANG_PREDICTIVE` or `MPCA_LANG_WHITESPACE_SENSITIVE`, which specify whether the language is predictive and whether it is whitespace sensitive.

As with the regular expressions, this user input is parsed by existing parts of the _mpc_ library. It provides one of the more powerful features of the library.

* * *

```c
mpc_parser_t *mpca_grammar(int flags, const char *grammar, ...);
```

This takes a single right-hand side of a rule, as well as a list of any parsers it references, and outputs a parser that does what is specified by the rule. The list of referenced parsers can be terminated with `NULL` to get an error, instead of a crash, when a required parser is not supplied.

* * *

```c
mpc_err_t *mpca_lang(int flags, const char *lang, ...);
```

This takes a full language (zero or more rules), as well as any parsers referred to by either the right or left hand sides. Any parser named on the left-hand side of a rule will be assigned a parser equivalent to what is specified on the right. On valid user input this returns `NULL`, while if there are any errors in the user input it will return an instance of `mpc_err_t` describing the issues. The list of referenced parsers can be terminated with `NULL` to get an error, instead of a crash, when a required parser is not supplied.

* * *

```c
mpc_err_t *mpca_lang_file(int flags, FILE* f, ...);
```

This reads the contents of file `f` and passes them to `mpca_lang`.

* * *

```c
mpc_err_t *mpca_lang_contents(int flags, const char *filename, ...);
```

This opens and reads the contents of the file given by `filename` and passes them to `mpca_lang`.


Case Study - Tokenizer
======================

Another common task we might be interested in doing is tokenizing some block of
text (splitting the text into individual elements) and performing some function
on each one of these elements as it is read. We can do this with _mpc_ too.

First, we can build a regular expression which parses an individual token. For
example, if our tokens are identifiers, integers, commas, periods and colons, we
could build something like this: `mpc_re("\\s*([a-zA-Z_]+|[0-9]+|,|\\.|:)")`.
Next we can strip any whitespace and add a callback function using `mpc_apply`,
which gets called every time this regex is parsed successfully:
`mpc_apply(mpc_strip(mpc_re("\\s*([a-zA-Z_]+|[0-9]+|,|\\.|:)")), print_token)`.
Finally, we can surround all of this in `mpc_many` to parse it zero or more
times. The final code might look something like this:

```c
#include <stdio.h>
#include "mpc.h"

static mpc_val_t *print_token(mpc_val_t *x) {
  printf("Token: '%s'\n", (char*)x);
  return x;
}

int main(void) {

  const char *input = "  hello 4352 ,  \n foo.bar   \n\n  test:ing   ";

  /* Each token is matched, stripped, printed, then freed by mpcf_all_free. */
  mpc_parser_t* Tokens = mpc_many(
    mpcf_all_free,
    mpc_apply(mpc_strip(mpc_re("\\s*([a-zA-Z_]+|[0-9]+|,|\\.|:)")), print_token));

  mpc_result_t r;
  if (!mpc_parse("input", input, Tokens, &r)) {
    mpc_err_print(r.error);
    mpc_err_delete(r.error);
  }

  mpc_delete(Tokens);

  return 0;
}
```

Running this program should produce output something like this:

```
Token: 'hello'
Token: '4352'
Token: ','
Token: 'foo'
Token: '.'
Token: 'bar'
Token: 'test'
Token: ':'
Token: 'ing'
```
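To make the token classes concrete, here is a hand-written sketch in plain C (not _mpc_ code) of the same scanning rule: skip whitespace, then take a run of letters and underscores, a run of digits, or one of `,` `.` `:`. The function `next_token` and its interface are hypothetical. Like `mpc_many`, a loop over it simply stops at the first position where no token matches.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner mirroring \s*([a-zA-Z_]+|[0-9]+|,|\.|:).
   On a match, sets *start and *len to the token (leading whitespace
   excluded) and returns a pointer just past it; returns NULL otherwise. */
const char *next_token(const char *s, const char **start, size_t *len) {
  const char *p;
  while (isspace((unsigned char)*s)) { s++; }
  p = s;
  if (isalpha((unsigned char)*p) || *p == '_') {
    while (isalpha((unsigned char)*p) || *p == '_') { p++; }
  } else if (isdigit((unsigned char)*p)) {
    while (isdigit((unsigned char)*p)) { p++; }
  } else if (*p == ',' || *p == '.' || *p == ':') {
    p++;
  } else {
    return NULL;
  }
  *start = s;
  *len = (size_t)(p - s);
  return p;
}
```

Run over the same input string, this loop yields the nine tokens listed in the output above.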

By extending the regex we can support many more types of tokens and quickly
build a tokenizer for whatever language we are interested in.


Error Reporting
===============

_mpc_ provides some automatic generation of error messages. These can be enhanced by the user with `mpc_expect`, but many of the defaults should already be both useful and readable. An example of an error message might look something like this:

```
<test>:0:3: error: expected one or more of 'a' or 'd' at 'k'
```


Misc
====

Here are some other miscellaneous functions that _mpc_ provides. These functions are liable to change between versions, so use them with some care.

* * *

```c
void mpc_print(mpc_parser_t *p);
```

Prints out a parser in some weird format. This is generally used for debugging, so don't expect to understand the output right away without looking at the source code a little.

* * *

```c
void mpc_stats(mpc_parser_t *p);
```

Prints out some basic stats about a parser. Again, this is used for debugging and optimisation.

* * *

```c
void mpc_optimise(mpc_parser_t *p);
```

Performs some basic optimisations on a parser to reduce its size and increase its running speed.


Limitations & FAQ
=================

### I'm getting namespace issues due to `libmpc`, what can I do?

There is a renamed version of this project, `pcq`, hosted on the [pcq branch](https://github.com/orangeduck/mpc/tree/pcq), which should be usable without namespace issues.

### Does _mpc_ support Unicode?

_mpc_ only supports ASCII. Sorry! Writing a parser library that supports Unicode is pretty difficult. I welcome contributions!


### Is _mpc_ binary safe?

No. Sorry! Including NUL (zero) bytes in a string or a file will probably break it. Avoid this if possible.


### The Parser is going into an infinite loop!

While it is certainly possible there is an issue with _mpc_, it is more likely that your grammar contains _left recursion_. This is something _mpc_ cannot deal with. _Left recursion_ is when a rule directly or indirectly references itself on the left hand side of a derivation. For example, consider this left-recursive grammar intended to parse an expression.

```
expr : <expr> '+' (<expr> | <int> | <string>);
```

When the rule `expr` is called, it looks for the first rule on the left. This happens to be the rule `expr` again, so again it looks for the first rule on the left, which is `expr` again, and so on, without ever consuming any input. To avoid left recursion this can be rewritten (for example) as the following. Note that rewriting it this way also changes the operator associativity.

```
value : <int> | <string> ;
expr  : <value> ('+' <expr>)* ;
```

Avoiding left recursion can be tricky, but is easy once you get a feel for it. For more information you can look on [Wikipedia](http://en.wikipedia.org/wiki/Left_recursion), which covers some common techniques and more examples. Possibly in the future _mpc_ will support functionality to warn the user about, or rewrite, grammars which contain left recursion, but it won't for now.
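The associativity change is easiest to see with an operator that is not associative. The sketch below is plain C rather than _mpc_, and uses `-` instead of `+` (an assumption for illustration, since `+` gives the same result either way) with single-digit values. `eval_right` follows the shape of the rewritten rule, recursing into `expr` after the operator, so `8-4-2` groups as `8-(4-2)`. `eval_left` is a loop-based alternative that keeps the left-to-right grouping of the original left-recursive rule. Both names are hypothetical.

```c
#include <assert.h>

/* value : a single digit (simplified for this sketch, no validation). */
static int value(const char **s) { return *(*s)++ - '0'; }

/* Shape of expr : <value> ('-' <expr>)* ; the recursive call consumes the
   rest of the input, so the result groups to the right. */
int eval_right(const char **s) {
  int v = value(s);
  if (**s == '-') { (*s)++; v -= eval_right(s); }
  return v;
}

/* Loop with an accumulator: groups to the left, without left recursion. */
int eval_left(const char **s) {
  int v = value(s);
  while (**s == '-') { (*s)++; v -= value(s); }
  return v;
}
```

Both consume all of `8-4-2`, but `eval_right` returns `6` while `eval_left` returns `2`.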


### Backtracking isn't working!

_mpc_ supports backtracking, but it may not work as you expect. It isn't a silver bullet, and you still must structure your grammar to be unambiguous. To demonstrate this behaviour, examine the following erroneous grammar, intended to parse either a C-style identifier or a C-style function call.

```
factor : <ident>
       | <ident> '('  <expr>? (',' <expr>)* ')' ;
```

This grammar will never correctly parse a function call, because it will always first succeed in parsing the initial identifier and return a factor. At this point it will encounter the parenthesis of the function call, give up, and throw an error. Even if it were to try to parse a factor again on this failure, it would never reach the correct function call option, because it always tries the other options first, and always succeeds with the identifier.

The solution to this is to always structure grammars with the most specific clause first, and more general clauses afterwards. This is the natural technique used for avoiding left-recursive grammars and ambiguity, so it is a good habit to get into anyway.

Now the parser will try to match a function call first, and if this fails, backtrack and try to match just an identifier.

```
factor : <ident> '('  <expr>? (',' <expr>)* ')'
       | <ident> ;
```

An alternative, and better, option is to remove the ambiguity completely by factoring out the first identifier. This is better because it removes any need for backtracking at all! Now the grammar is predictive!

```
factor : <ident> ('('  <expr>? (',' <expr>)* ')')? ;
```
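The factored rule can be read directly as a predictive procedure: parse the identifier, then look at one character to decide whether an argument list follows, with no backtracking. Below is a plain-C sketch of that idea, not _mpc_ code. To keep it short it assumes `<expr>` is just another identifier and that there is no whitespace; the names `parse_ident` and `parse_factor` are hypothetical.

```c
#include <assert.h>
#include <ctype.h>

/* ident : [a-zA-Z_][a-zA-Z0-9_]* ; returns 1 and advances *s on success. */
static int parse_ident(const char **s) {
  if (!isalpha((unsigned char)**s) && **s != '_') { return 0; }
  while (isalnum((unsigned char)**s) || **s == '_') { (*s)++; }
  return 1;
}

/* factor : <ident> ('(' <expr>? (',' <expr>)* ')')? ;  with <expr> taken
   to be <ident>. Returns the argument count for a call, -1 for a plain
   identifier, or -2 on a syntax error. */
int parse_factor(const char **s) {
  int args = 0;
  if (!parse_ident(s)) { return -2; }
  if (**s != '(') { return -1; } /* one character of lookahead decides */
  (*s)++;
  if (parse_ident(s)) { args++; }
  while (**s == ',') {
    (*s)++;
    if (!parse_ident(s)) { return -2; }
    args++;
  }
  if (**s != ')') { return -2; }
  (*s)++;
  return args;
}
```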


### How can I avoid the maximum string literal length?

Some compilers limit the maximum length of string literals. If you have a huge language string in the source file to be passed into `mpca_lang`, you might encounter this. The ANSI standard only requires compilers to support string literals of at least 509 characters, and most compilers support more than this. Visual Studio supports up to 2048 characters, while gcc allocates memory dynamically and so has no real limit.

There are a couple of ways to overcome this issue if it arises. You could instead use `mpca_lang_contents` and load the language from a file, or you could use a string literal for each line and let the compiler automatically concatenate them together, avoiding the limit. The final option is to upgrade your compiler. In C99 this minimum has been increased to 4095.
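As a sketch of the string-per-line option, the maths grammar from earlier can be written as adjacent literals, which the compiler joins into one string. Whether this sidesteps a particular compiler's limit depends on whether that limit applies before or after concatenation, so it is worth checking your compiler's documentation. The variable name `maths_grammar` is hypothetical; the resulting string would be passed to `mpca_lang` as usual.

```c
#include <assert.h>
#include <string.h>

/* Adjacent string literals are joined into a single literal during
   translation, so each grammar rule can sit on its own source line. */
static const char *maths_grammar =
  "number \"number\" : /[0-9]+/ ;\n"
  "expression      : <product> (('+' | '-') <product>)* ;\n"
  "product         : <value>   (('*' | '/')   <value>)* ;\n"
  "value           : <number> | '(' <expression> ')' ;\n"
  "maths           : /^/ <expression> /$/ ;\n";
```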


### The automatic tags in the AST are annoying!

When parsing from a grammar, the abstract syntax tree is tagged with different tags for each primitive type it encounters. For example, a regular expression will be automatically tagged as `regex`, character literals as `char` and strings as `string`. This is to help people wondering exactly how they might need to convert the node contents.

If you have a rule in your grammar called `string`, `char` or `regex`, you may encounter some confusion. This is because nodes will be tagged with (for example) `string` _either_ if they are a string primitive, _or_ if they were parsed via your `string` rule. If you are detecting node types using something like `strstr`, this may break. One solution is to always check that `string` is the innermost tag when testing for string primitives, or to rename your rule called `string` to something that doesn't conflict.

Yes, it is annoying, but it's probably not going to change!
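One way to implement the "innermost tag" check is to compare only the last `|`-separated component of a node's tag. This sketch assumes that tags are `|`-separated with the innermost name last (for example `value|string` for a string primitive inside a `value` rule); the function name `tag_innermost_is` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the last '|'-separated component of tag equals name. */
int tag_innermost_is(const char *tag, const char *name) {
  const char *last = strrchr(tag, '|');
  last = (last == NULL) ? tag : last + 1;
  return strcmp(last, name) == 0;
}
```

Unlike `strstr(tag, "string")`, this does not match a tag such as `string|regex`, which would come from a rule named `string` wrapping a regular expression.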