1 What’s New in Pyparsing 3.1.0

author:

Paul McGuire

date:

October, 2024

abstract:

This document summarizes the changes made in the 3.1.x releases of pyparsing.

1.1 Supported Python versions

  • Added support for Python 3.12.

  • All internal string expressions using ‘%’ string interpolation and str.format() converted to f-strings.

1.2 New Features

  • Added new Tag ParserElement, for inserting metadata into the parsed results. This allows a parser to add metadata or annotations to the parsed tokens. The Tag element also accepts an optional value parameter, defaulting to True. See the new tag_metadata.py example in the examples directory.

    Example:

    # add tag indicating mood
    end_punc = "." | ("!" + Tag("enthusiastic")))
    greeting = "Hello" + Word(alphas) + end_punc
    
    result = greeting.parse_string("Hello World.")
    print(result.dump())
    
    result = greeting.parse_string("Hello World!")
    print(result.dump())
    

    prints:

    ['Hello', 'World', '.']
    
    ['Hello', 'World', '!']
    - enthusiastic: True
    
  • Extended expr[] notation for repetition of expr to accept a slice, where the slice’s stop value indicates a stop_on expression:

    test = "BEGIN aaa bbb ccc END"
    BEGIN, END = Keyword.using_each("BEGIN END".split())
    body_word = Word(alphas)
    
    # new slice syntax support
    expr = BEGIN + Group(body_word[...:END]) + END
    # equivalent to
    # expr = BEGIN + Group(ZeroOrMore(body_word, stop_on=END)) + END
    
    print(expr.parse_string(test))
    

    Prints:

    ['BEGIN', ['aaa', 'bbb', 'ccc'], 'END']
    
  • Added new class method ParserElement.using_each, to simplify code that creates a sequence of Literals, Keywords, or other ParserElement subclasses.

    For instance, to define suppressible punctuation, you would previously write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = map(Suppress, "(){};")
    

    You can now write:

    LPAR, RPAR, LBRACE, RBRACE, SEMI = Suppress.using_each("(){};")
    

    using_each will also accept optional keyword args, which it will pass through to the class initializer. Here is an expression for single-letter variable names that might be used in an algebraic expression:

    algebra_var = MatchFirst(
        Char.using_each(string.ascii_lowercase, as_keyword=True)
    )
    
  • Added new builtin python_quoted_string, which will match any form of single-line or multiline quoted strings defined in Python.

  • Word arguments are now validated if min and max are both given, that min <= max; raises ValueError if values are invalid.

  • Added ‘·’ (Unicode MIDDLE DOT) to the set of Latin1.identbodychars.

  • Added ieee_float expression to pyparsing.common, which parses float values, plus “NaN”, “Inf”, “Infinity”.

  • Minor performance speedup in trim_arity, to benefit any parsers using parse actions.

1.3 API Changes

  • Optional(expr) may now be written as expr | ""

    This will make this code:

    "{" + Optional(Literal("A") | Literal("a")) + "}"
    

    writable as:

    "{" + (Literal("A") | Literal("a") | "") + "}"
    

    Some related changes implemented as part of this work: - Literal("") now internally generates an Empty() (and no longer raises an exception) - Empty is now a subclass of Literal

  • Added new class property identifier to all Unicode set classes in pyparsing.unicode, using the class’s values for cls.identchars and cls.identbodychars. Now Unicode-aware parsers that formerly wrote:

    ppu = pyparsing.unicode
    ident = Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
    

    can now write:

    ident = ppu.Greek.identifier
    # or
    # ident = ppu.Ελληνικά.identifier
    
  • Added bool embed argument to ParserElement.create_diagram(). When passed as True, the resulting diagram will omit the <DOCTYPE>, <HEAD>, and <BODY> tags so that it can be embedded in other HTML source. (Useful when embedding a call to create_diagram() in a PyScript HTML page.)

  • Added recurse argument to ParserElement.set_debug to set the debug flag on an expression and all of its sub-expressions.

  • Reworked delimited_list function into the new DelimitedList class. DelimitedList has the same constructor interface as delimited_list, and in this release, delimited_list changes from a function to a synonym for DelimitedList. delimited_list and the older delimitedList method will be deprecated in a future release, in favor of DelimitedList.

  • ParseResults now has a new method deepcopy(), in addition to the current copy() method. copy() only makes a shallow copy - any contained ParseResults are copied as references - changes in the copy will be seen as changes in the original. In many cases, a shallow copy is sufficient, but some applications require a deep copy. deepcopy() makes a deeper copy: any contained ParseResults or other mappings or containers are built with copies from the original, and do not get changed if the original is later changed.

  • Added named field “url” to pyparsing.common.url, returning the entire parsed URL string.

  • Added exception type to trace_parse_action exception output.

  • Added <META> tag to HTML generated for railroad diagrams to force UTF-8 encoding with older browsers, to better display Unicode parser characters.

  • To address a compatibility issue in RDFLib, added a property setter for the ParserElement.name property, to call ParserElement.set_name.

  • Modified ParserElement.set_name() to accept a None value, to clear the defined name and corresponding error message for a ParserElement.

  • Updated railroad diagram generation for ZeroOrMore and OneOrMore expressions with stop_on expressions.

1.4 Discontinued Features

1.4.1 Python 2.x no longer supported

Removed Py2.x support and other deprecated features. Pyparsing now requires Python 3.6.8 or later. If you are using an earlier version of Python, you must use a Pyparsing 2.4.x version.

1.4.2 Other discontinued / deprecated features

  • ParserElement.validate() is deprecated. It predates the support for left-recursive parsers, and was prone to false positives (warning that a grammar was invalid when it was in fact valid). It will be removed in a future pyparsing release. In its place, developers should use debugging and analytical tools, such as ParserElement.set_debug() and ParserElement.create_diagram().

1.5 Fixed Bugs

  • Updated ci.yml permissions to limit default access to source.

  • Updated create_diagram() code to be compatible with railroad-diagrams package version 3.0.

  • Fixed bug in pyparsing.common.url, when input URL is not alone on an input line.

  • Fixed bug in srange, when parsing escaped ‘/’ and ‘' inside a range set.

  • Fixed exception messages for some ParserElements with custom names, which instead showed their contained expression names.

  • Fixed bug in Word when max=2. Also added performance enhancement when specifying exact argument.

  • Fixed bug when parse actions returned an empty string for an expression that had a results name, that the results name was not saved. That is:

    expr = Literal("X").add_parse_action(lambda tokens: "")("value")
    result = expr.parse_string("X")
    print(result["value"])
    

    would raise a KeyError. Now empty strings will be saved with the associated results name.

  • Fixed bug in SkipTo where ignore expressions were not properly handled while scanning for the target expression.

  • Fixed bug in NotAny, where parse actions on the negated expr were not being run. This could cause NotAny to incorrectly fail if the expr would normally match, but would fail to match if a condition used as a parse action returned False.

  • Fixed create_diagram() to accept keyword args, to be passed through to the template.render() method to generate the output HTML.

  • Fixed bug in python_quoted_string regex.

  • Fixed regression in Word(min).

  • Fixed bug in bad exception messages raised by Forward expressions.

  • Fixed regression in SkipTo, where ignored expressions were not checked when looking for the target expression.

  • Updated pep8 synonym wrappers for better type checking compatibility.

  • Fixed empty error message bug. This _should_ return pyparsing’s exception messages to a former, more helpful form. If you have code that parses the exception messages returned by pyparsing, this may require some code changes.

  • Fixed issue where PEP8 compatibility names for ParserElement static methods were not themselves defined as staticmethods. When called using a ParserElement instance, this resulted in a TypeError exception.

  • Fixed some cosmetics/bugs in railroad diagrams:

    • fixed groups being shown even when show_groups = False

    • show results names as quoted strings when show_results_names = True

    • only use integer loop counter if repetition > 2

1.6 New / Enhanced Examples

  • Added example mongodb_query_expression.py, to convert human-readable infix query expressions, such as:

    a==100 and b>=200
    

    and transform them into an equivalent query argument for the pymongo package:

    {'$and': [{'a': 100}, {'b': {'$gte': 200}}]}
    

    Supports many equality and inequality operators - see the docstring for the transform_query function for many more examples.

  • invRegex.py example renamed to inv_regex.py and updated to PEP-8 variable and method naming.

  • Removed examples sparser.py and pymicko.py, since each included its own GPL license in the header. Since this conflicts with pyparsing’s MIT license, they were removed from the distribution to avoid confusion among those making use of them in their own projects.

  • Updated the lucene_grammar.py example (better support for ‘*’ and ‘?’ wildcards) and corrected the test cases!

  • Added bf.py Brainf*ck parser/executor example. Illustrates using a pyparsing grammar to parse language syntax, and attach executable AST nodes to the parsed results.

  • Added tag_emitter.py to examples. This example demonstrates how to insert tags into your parsed results that are not part of the original parsed text.

  • Updated example select_parser.py to use PEP8 names and added Groups for better retrieval of parsed values from multiple SELECT clauses.

  • Added example email_address_parser.py.

  • Added example directx_x_file_parser.py to parse DirectX template definitions, and generate a Pyparsing parser from a template to parse .x files.

  • delta_time, lua_parser, decaf_parser, and roman_numerals examples cleaned up to use latest PEP8 names and add minor enhancements.

  • Fixed bug (and corresponding test code) in delta_time example that did not handle weekday references in time expressions (like “Monday at 4pm”) when the weekday was the same as the current weekday.

1.7 Acknowledgments

Again, thanks to the many contributors who submitted issues, questions, suggestions, and PRs.