diff --git a/CHANGELOG.md b/CHANGELOG.md index 57ca173d..d3b663e4 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -27,6 +27,8 @@ This release primarily improves our support for error recovery. * Due to the error handling changes, the `Parser` interface and `Lexer` API have changed. * The emulative lexer now directly postprocesses tokens, instead of using `~__EMU__~` sequences. This changes the protected API of the lexer. +* The `Name::slice()` method now returns `null` for empty slices; previously, `new Name([])` was + used. `Name::concat()` now also supports concatenation with `null`. ### Removed diff --git a/README.md b/README.md index 320ecfae..2e7ccd7a 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,9 @@ PHP Parser This is a PHP 5.2 to PHP 7.1 parser written in PHP. Its purpose is to simplify static code analysis and manipulation. -[**Documentation for version 2.x**][doc_master] (stable; for running on PHP >= 5.4; for parsing PHP 5.2 to PHP 7.0). +[Documentation for version 3.x][doc_master] (beta; for running on PHP >= 5.5; for parsing PHP 5.2 to PHP 7.1). + +[**Documentation for version 2.x**][doc_2_x] (stable; for running on PHP >= 5.4; for parsing PHP 5.2 to PHP 7.0). [Documentation for version 1.x][doc_1_x] (unsupported; for running on PHP >= 5.3; for parsing PHP 5.2 to PHP 5.6). @@ -89,7 +91,7 @@ Documentation Component documentation: - 1. [Error](doc/component/Error.markdown) + 1. [Error handling](doc/component/Error_handling.markdown) 2. [Lexer](doc/component/Lexer.markdown) [doc_1_x]: https://github.com/nikic/PHP-Parser/tree/1.x/doc diff --git a/UPGRADE-3.0.md b/UPGRADE-3.0.md index 92fd4dad..6fad42e1 100644 --- a/UPGRADE-3.0.md +++ b/UPGRADE-3.0.md @@ -152,3 +152,5 @@ The following methods, arguments or options have been removed: * The constants on `NameTraverserInterface` have been moved into the `NameTraverser` class. * The emulative lexer now directly postprocesses tokens, instead of using `~__EMU__~` sequences. 
This changes the protected API of the emulative lexer. + * The `Name::slice()` method now returns `null` for empty slices; previously, `new Name([])` was + used. `Name::concat()` now also supports concatenation with `null`. diff --git a/doc/3_Other_node_tree_representations.markdown b/doc/3_Other_node_tree_representations.markdown index 691640e6..0830f399 100644 --- a/doc/3_Other_node_tree_representations.markdown +++ b/doc/3_Other_node_tree_representations.markdown @@ -8,7 +8,7 @@ Simple serialization It is possible to serialize the node tree using `serialize()` and also unserialize it using `unserialize()`. The output is not human readable and not easily processable from anything -but PHP, but it is compact and generates fast. The main application thus is in caching. +but PHP, but it is compact and generates quickly. The main application thus is in caching. Human readable dumping ---------------------- @@ -86,6 +86,134 @@ array( ) ``` +JSON encoding +------------- + +Nodes (and comments) implement the `JsonSerializable` interface. 
As such, it is possible to JSON +encode the AST directly using `json_encode()`: + +```php +$code = <<<'CODE' +<?php + +function printLine($msg) { + echo $msg, "\n"; +} + +printLine('Hello World!!!'); +CODE; + +$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7); +$nodeDumper = new PhpParser\NodeDumper; + +try { + $stmts = $parser->parse($code); + + echo json_encode($stmts, JSON_PRETTY_PRINT), "\n"; +} catch (PhpParser\Error $e) { + echo 'Parse Error: ', $e->getMessage(); +} +``` + +This will result in the following output (which includes attributes): + +```json +[ + { + "nodeType": "Stmt_Function", + "byRef": false, + "name": "printLine", + "params": [ + { + "nodeType": "Param", + "type": null, + "byRef": false, + "variadic": false, + "name": "msg", + "default": null, + "attributes": { + "startLine": 3, + "endLine": 3 + } + } + ], + "returnType": null, + "stmts": [ + { + "nodeType": "Stmt_Echo", + "exprs": [ + { + "nodeType": "Expr_Variable", + "name": "msg", + "attributes": { + "startLine": 4, + "endLine": 4 + } + }, + { + "nodeType": "Scalar_String", + "value": "\n", + "attributes": { + "startLine": 4, + "endLine": 4, + "kind": 2 + } + } + ], + "attributes": { + "startLine": 4, + "endLine": 4 + } + } + ], + "attributes": { + "startLine": 3, + "endLine": 5 + } + }, + { + "nodeType": "Expr_FuncCall", + "name": { + "nodeType": "Name", + "parts": [ + "printLine" + ], + "attributes": { + "startLine": 7, + "endLine": 7 + } + }, + "args": [ + { + "nodeType": "Arg", + "value": { + "nodeType": "Scalar_String", + "value": "Hello World!!!", + "attributes": { + "startLine": 7, + "endLine": 7, + "kind": 1 + } + }, + "byRef": false, + "unpack": false, + "attributes": { + "startLine": 7, + "endLine": 7 + } + } + ], + "attributes": { + "startLine": 7, + "endLine": 7 + } + } +] +``` + +There is currently no mechanism to convert JSON back into a node tree. Furthermore, not all ASTs +can be JSON encoded. In particular, JSON only supports UTF-8 strings. 
+ Serialization to XML -------------------- diff --git a/doc/component/Error.markdown b/doc/component/Error_handling.markdown similarity index 66% rename from doc/component/Error.markdown rename to doc/component/Error_handling.markdown index eed81776..c1579e9d 100644 --- a/doc/component/Error.markdown +++ b/doc/component/Error_handling.markdown @@ -35,6 +35,8 @@ the source code of the parsed file. An example for printing an error: if ($e->hasColumnInfo()) { echo $e->getRawMessage() . ' from ' . $e->getStartLine() . ':' . $e->getStartColumn($code) . ' to ' . $e->getEndLine() . ':' . $e->getEndColumn($code); + // or: + echo $e->getMessageWithColumnInfo(); } else { echo $e->getMessage(); } @@ -46,27 +48,23 @@ file. Error recovery -------------- -> **EXPERIMENTAL** +The error behavior of the parser (and other components) is controlled by an `ErrorHandler`. Whenever an error is +encountered, `ErrorHandler::handleError()` is invoked. The default error handling strategy is `ErrorHandler\Throwing`, +which will immediately throw when an error is encountered. -By default the parser will throw an exception upon encountering the first error during parsing. An alternative mode is -also supported, in which the parser will remember the error, but try to continue parsing the rest of the source code. - -To enable this mode the `throwOnError` parser option needs to be disabled. Any errors that occurred during parsing can -then be retrieved using `$parser->getErrors()`. The `$parser->parse()` method will either return a partial syntax tree -or `null` if recovery fails. - -A usage example: +To instead collect all encountered errors into an array, while trying to continue parsing the rest of the source code, +an instance of `ErrorHandler\Collecting` can be passed to the `Parser::parse()` method. 
A usage example: ```php -$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, null, array( - 'throwOnError' => false, -)); +$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7); +$errorHandler = new PhpParser\ErrorHandler\Collecting; -$stmts = $parser->parse($code); -$errors = $parser->getErrors(); +$stmts = $parser->parse($code, $errorHandler); -foreach ($errors as $error) { - // $error is an ordinary PhpParser\Error +if ($errorHandler->hasErrors()) { + foreach ($errorHandler->getErrors() as $error) { + // $error is an ordinary PhpParser\Error + } } if (null !== $stmts) { @@ -74,4 +72,4 @@ if (null !== $stmts) { } ``` -The error recovery implementation is experimental -- it currently won't be able to recover from many types of errors. +The `NameResolver` visitor also accepts an `ErrorHandler` as a constructor argument. \ No newline at end of file diff --git a/doc/component/Lexer.markdown b/doc/component/Lexer.markdown index 422dd378..b22942dd 100644 --- a/doc/component/Lexer.markdown +++ b/doc/component/Lexer.markdown @@ -95,13 +95,14 @@ Lexer extension A lexer has to define the following public interface: - void startLexing(string $code); + void startLexing(string $code, ErrorHandler $errorHandler = null); array getTokens(); string handleHaltCompiler(); int getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null); The `startLexing()` method is invoked with the source code that is to be lexed (including the opening tag) whenever the -`parse()` method of the parser is called. It can be used to reset state or preprocess the source code or tokens. +`parse()` method of the parser is called. It can be used to reset state or preprocess the source code or tokens. The +passed `ErrorHandler` should be used to report lexing errors. The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. 
This method is not used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes. @@ -122,9 +123,10 @@ node and the `$endAttributes` from the last token that is part of the node. E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the `T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token. -An application of custom attributes is storing the original formatting of literals: The parser does not retain -information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type or used -escape sequences). This can be remedied by storing the original value in an attribute: +An application of custom attributes is storing the exact original formatting of literals: While the parser does retain +some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it +does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). 
This +can be remedied by storing the original value in an attribute: ```php use PhpParser\Lexer; @@ -135,9 +137,10 @@ class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) { $tokenId = parent::getNextToken($value, $startAttributes, $endAttributes); - if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING // non-interpolated string - || $tokenId == Tokens::T_LNUMBER // integer - || $tokenId == Tokens::T_DNUMBER // floating point number + if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING // non-interpolated string + || $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string + || $tokenId == Tokens::T_LNUMBER // integer + || $tokenId == Tokens::T_DNUMBER // floating point number ) { // could also use $startAttributes, doesn't really matter here $endAttributes['originalValue'] = $value;