The Lexer now only provides the tokens to the parser, while the
parser is responsible for determining which attributes are placed
on nodes. This only needs to be done when the attributes are
actually needed, rather than for all tokens.
This removes the usedAttributes lexer option (and lexer options
entirely). The attributes are now enabled unconditionally. They
have less overhead now, and the need to explicitly enable them for
some use cases (e.g. formatting-preserving printing) doesn't seem
like a good tradeoff anymore.
There are some additional changes to the Lexer interface that
should be done after this, and the docs / upgrading guide haven't
been adjusted yet.
Doc strings have a trailing \n. If the string contents end in \r,
the two get interpreted as \r\n and the \r is removed from the
string contents.
For nowdoc, fall back to single quote if there's a trailing \r.
For heredoc, escape all isolated \r -- unlike \n and \r\n, a lone
\r really is a special character, because it is no longer relevant
as an actual newline character.
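A minimal sketch of the two rules above, written in Python purely for
illustration. The function names are hypothetical and this is not the
printer's actual code:

```python
import re

def escape_isolated_cr(contents: str) -> str:
    # Heredoc case: escape every CR that is not followed by LF,
    # leaving CRLF pairs and plain LF untouched.
    return re.sub(r'\r(?!\n)', r'\\r', contents)

def nowdoc_is_safe(contents: str) -> bool:
    # Nowdoc case: nothing can be escaped inside a nowdoc, so a
    # trailing CR means we have to fall back to single quotes.
    return not contents.endswith('\r')
```

Only the trailing \r matters for nowdoc, because that is the one that
would merge with the doc string's final \n.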
When detecting whether the string contains the end label, allow
leading whitespace in front of it. This is legal since the
introduction of flexible doc strings.
With the introduction of flexible doc strings, the ending label
is no longer required to be followed by a semicolon or newline.
We need to prevent doc string printing if the label is followed
by any non-label character.
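The end label check described above could look roughly like the
following Python sketch. The helper name and the exact character class
are assumptions made for illustration, not the printer's real
implementation:

```python
import re

# Characters that can continue a label: ASCII letters, digits,
# underscore, and anything at or above U+0080 (assumed here).
LABEL_CHAR = '[A-Za-z0-9_\x80-\U0010ffff]'

def contains_end_label(contents: str, label: str) -> bool:
    # The label counts as an end label if it sits at the start of a
    # line, optionally indented, and is NOT followed by another label
    # character (end of string also qualifies).
    pattern = (r'(?:^|[\r\n])[ \t]*' + re.escape(label)
               + '(?!' + LABEL_CHAR + ')')
    return re.search(pattern, contents) is not None
```

If this returns True for a given string and label, doc string printing
has to be avoided for that combination.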
This makes pretty printing round trip to another Float literal,
rather than a constant lookup. The 1e1000 form in particular is
chosen because that seems to be the typical form used in various
tests.
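As an illustration of the round trip, here is a small Python sketch
(print_float is a hypothetical name). It relies on an overflowing
literal such as 1e1000 being parsed as infinity, which also holds for
Python's float():

```python
import math

def print_float(value: float) -> str:
    # Infinity has no literal spelling of its own, so emit a literal
    # that overflows to infinity when it is parsed again.
    if math.isinf(value):
        return '1e1000' if value > 0 else '-1e1000'
    return repr(value)
```

Parsing the printed text again yields a float rather than a reference
to a named constant.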
This needs to go through something like Encapsed or ShellExec to
determine quotation type. Explicitly throw an exception to avoid
getting an undefined method error.
This is a huge hack... We temporarily create a new node with the
correct structure and use that for printing.
I think it would be better to always use a separate node type for
NewAnonClass, rather than using a combination of New and Class,
but this would require some larger changes, as this node type would
have to be both Expr and ClassLike, which is not possible right now,
as the latter is a class rather than an interface...
The parser will now always generate Identifier nodes (for
non-namespaced identifiers). This obsoletes the useIdentifierNodes
parser option.
Node constructors still accept strings and will implicitly create
an Identifier wrapper. Identifier implements __toString(), so that
outside of strict mode many things continue to work without changes.
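The wrapping behaviour can be pictured with a Python analogue, where
__str__ plays the role of __toString(). PropertyFetch here is only a
stand-in class for the sketch, not the library's node:

```python
class Identifier:
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        # Lets the identifier be used where a string is expected.
        return self.name

class PropertyFetch:
    def __init__(self, var, name):
        self.var = var
        # Plain strings are still accepted and wrapped implicitly.
        self.name = Identifier(name) if isinstance(name, str) else name
```

Code that only formats or concatenates the name keeps working, while
code that needs the raw string can read the name subnode directly.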
Instead assign attributes on Nop nodes and in the pretty printer
specially handle end<start offsets. It's a somewhat weird case,
but not wrong per se given the meaning the offsets have.
In this mode non-namespaced names that are currently represented
using strings will be represented using Identifier nodes instead.
Identifier nodes have a string $name subnode and coerce to string.
This allows preserving attributes and in particular location
information on identifiers.
Scalar\String_ and Scalar\Encapsed now have an additional "kind"
attribute, which may be one of:
* String_::KIND_SINGLE_QUOTED
* String_::KIND_DOUBLE_QUOTED
* String_::KIND_NOWDOC
* String_::KIND_HEREDOC
Additionally, if the string kind is one of the latter two, an
attribute "docLabel" is provided, which contains the doc string
label (STR in <<<STR) that was originally used.
The pretty printer will try to take the original kind of the string,
as well as the used doc string label into account.
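A rough Python sketch of how a printer might pick delimiters from
these attributes. The enum values, the function name and the fallback
rules for a missing attribute are assumptions; escaping of the string
contents is deliberately left out:

```python
from enum import Enum

class Kind(Enum):
    SINGLE_QUOTED = 'single'
    DOUBLE_QUOTED = 'double'
    NOWDOC = 'nowdoc'
    HEREDOC = 'heredoc'

def delimiters(attributes: dict) -> tuple:
    # Assumed default: single quotes when no kind was recorded.
    kind = attributes.get('kind', Kind.SINGLE_QUOTED)
    label = attributes.get('docLabel')
    if kind is Kind.NOWDOC and label:
        return ("<<<'" + label + "'\n", "\n" + label)
    if kind is Kind.HEREDOC and label:
        return ("<<<" + label + "\n", "\n" + label)
    # Assumed fallback: without a label, heredoc degrades to double
    # quotes and nowdoc to single quotes.
    if kind in (Kind.DOUBLE_QUOTED, Kind.HEREDOC):
        return ('"', '"')
    return ("'", "'")
```

A real printer would additionally have to check that the contents can
be represented in the chosen form before using it.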
Add an attribute to distinguish array() and [] syntax. The pretty
printer respects this attribute. The shortArraySyntax pretty printer
option acts as a default in case the attribute is not specified.
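The precedence between the attribute and the option can be sketched in
Python as follows. The 'short' / 'long' values and the function name
are placeholders chosen for the example:

```python
def print_array(items, attributes, short_array_syntax=False):
    # The option only supplies the default; an explicit attribute on
    # the node always wins.
    default = 'short' if short_array_syntax else 'long'
    kind = attributes.get('kind', default)
    body = ', '.join(items)
    return '[' + body + ']' if kind == 'short' else 'array(' + body + ')'
```

For example, a node recorded as long syntax is printed with array()
even when the option asks for short syntax.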