PHP Parser
==========

This is a PHP parser written in PHP. Its purpose is to simplify static code analysis and
manipulation.

***Note: This project is experimental. There are no known bugs in the parser itself, but the API is
subject to change.***

Components
==========

This package currently bundles several components:

 * The `Parser` itself
 * A `NodeDumper` to dump the nodes to a human-readable string representation
 * A `NodeTraverser` to traverse and modify the node tree
 * A `PrettyPrinter` to translate the node tree back to PHP

Autoloader
----------

To include the required files automatically, register `PHPParser_Autoloader`:

    require_once 'path/to/PHP-Parser/lib/PHPParser/Autoloader.php';
    PHPParser_Autoloader::register();

Parser and Parser_Debug
-----------------------

Parsing is performed using `PHPParser_Parser->parse()`. This method accepts a `PHPParser_Lexer`
as its only parameter and returns an array of statement nodes. If an error occurs, it throws a
`PHPParser_Error`.

    $code = '<?php // some code';

    try {
        $parser = new PHPParser_Parser;
        $stmts = $parser->parse(new PHPParser_Lexer($code));
    } catch (PHPParser_Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }

The `PHPParser_Parser_Debug` class also parses PHP code, but outputs a debug trace while doing so.

Node Tree
---------

The output of the parser is an array of statement nodes. All nodes implement the `PHPParser_Node`
interface (and extend `PHPParser_NodeAbstract`). Furthermore, nodes are divided into three categories:

 * `PHPParser_Node_Stmt`: A statement
 * `PHPParser_Node_Expr`: An expression
 * `PHPParser_Node_Scalar`: A scalar (a string, a number, and so on).
   `PHPParser_Node_Scalar` inherits from `PHPParser_Node_Expr`.

Each node may have subnodes. For example, `PHPParser_Node_Expr_Plus` has two subnodes, `left`
and `right`, which represent the left-hand side and right-hand side expressions of the plus operation.
Subnodes are accessed as normal properties:

    $node->left

The subnodes a given node can have are documented as `@property` doc comments in the
respective files.

Additionally, all nodes have two methods, `getLine()` and `getDocComment()`.
`getLine()` returns the line on which the node started.
`getDocComment()` returns the doc comment before the node, or `null` if there was none.

NodeDumper
----------

Nodes can be dumped into a string representation using the `PHPParser_NodeDumper->dump()` method:

    $code = <<<'CODE'
    <?php
        function printLine($msg) {
            echo $msg, "\n";
        }

        printLine('Hallo World!!!');
    CODE;

    try {
        $parser = new PHPParser_Parser;
        $stmts = $parser->parse(new PHPParser_Lexer($code));

        $nodeDumper = new PHPParser_NodeDumper;
        echo '<pre>' . htmlspecialchars($nodeDumper->dump($stmts)) . '</pre>';
    } catch (PHPParser_Error $e) {
        echo 'Parse Error: ', $e->getMessage();
    }

This script will produce output similar to the following:

    array(
        0: Stmt_Func(
            byRef: false
            name: printLine
            params: array(
                0: Stmt_FuncParam(
                    type: null
                    name: msg
                    byRef: false
                    default: null
                )
            )
            stmts: array(
                0: Stmt_Echo(
                    exprs: array(
                        0: Variable(
                            name: msg
                        )
                        1: Scalar_String(
                            value:

                        )
                    )
                )
            )
        )
        1: Expr_FuncCall(
            func: Name(
                parts: array(
                    0: printLine
                )
            )
            args: array(
                0: Arg(
                    value: Scalar_String(
                        value: Hallo World!!!
                    )
                    byRef: false
                )
            )
        )
    )

NodeTraverser
-------------

The node traverser allows traversing the node tree using a visitor class. A visitor class must
implement the `PHPParser_NodeVisitor` interface, which defines the following four methods:

    public function beforeTraverse(array $nodes);
    public function enterNode(PHPParser_Node $node);
    public function leaveNode(PHPParser_Node $node);
    public function afterTraverse(array $nodes);

The `beforeTraverse` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before the traversal or
preparing the tree for traversal.

The `afterTraverse` method is similar to the `beforeTraverse` method, the only difference being that
it is called once after the traversal.

The `enterNode` and `leaveNode` methods are called on every node: the former when the node is entered,
i.e. before its subnodes are traversed, and the latter when it is left.

All four methods can either return the changed node or return nothing (or return `null`), in which
case the current node is not changed. The `leaveNode` method can furthermore return two special
values: If `false` is returned, the current node is removed from the parent array. If an `array`
is returned, its elements are merged into the parent array at the offset of the current node.
For example, if `leaveNode` returns `array(X, Y, Z)` for node `B` in `array(A, B, C)`, the result is
`array(A, X, Y, Z, C)`.

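The return-value rules for `leaveNode` can be illustrated with plain PHP arrays. The following is a minimal sketch of the semantics only, not the traverser's actual implementation; `applyLeaveResult()` is a hypothetical helper introduced for this illustration, and strings stand in for nodes.

```php
<?php
// Hypothetical helper (not part of PHP-Parser): applies a leaveNode-style
// return value to the element at $index of a parent array.
function applyLeaveResult(array $nodes, $index, $result)
{
    if ($result === null) {
        // null (or no return value): the node stays unchanged
        return $nodes;
    }
    if ($result === false) {
        // false: remove the node from the parent array
        array_splice($nodes, $index, 1);
        return $nodes;
    }
    if (is_array($result)) {
        // array: merge its elements in at the offset of the node
        array_splice($nodes, $index, 1, $result);
        return $nodes;
    }
    // any other value: replace the node with the returned one
    $nodes[$index] = $result;
    return $nodes;
}

$nodes = array('A', 'B', 'C');

echo implode(', ', applyLeaveResult($nodes, 1, array('X', 'Y', 'Z'))), "\n"; // A, X, Y, Z, C
echo implode(', ', applyLeaveResult($nodes, 1, false)), "\n";                // A, C
echo implode(', ', applyLeaveResult($nodes, 1, null)), "\n";                 // A, B, C
```

In this sketch, removing a node re-indexes the parent array; whether the traverser itself re-indexes is not stated here and should be checked against the source.
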
The visitors described above are registered with the `NodeTraverser` class:

    $visitor = new MyVisitor;

    $traverser = new PHPParser_NodeTraverser;
    $traverser->addVisitor($visitor);

    $stmts = $parser->parse($lexer);
    $stmts = $traverser->traverse($stmts);

Here `MyVisitor` could look something like this:

    class MyVisitor extends PHPParser_NodeVisitorAbstract
    {
        public function enterNode(PHPParser_Node $node) {
            // ...
        }
    }

As you can see above, you don't need to define all four methods if you extend
`PHPParser_NodeVisitorAbstract` instead of directly implementing the interface.

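As a more concrete illustration, here is a sketch of a visitor that upper-cases every string literal. It rests on several assumptions: the class name `PHPParser_Node_Scalar_String` is inferred from the `Scalar_String` entries in the dump output above, subnodes are assumed to be writable as properties in the same way they are read, and `UppercaseStringsVisitor` is a hypothetical name. The autoloader path is the same placeholder used earlier.

```php
<?php
require_once 'path/to/PHP-Parser/lib/PHPParser/Autoloader.php';
PHPParser_Autoloader::register();

// Hypothetical visitor: upper-cases the value of every string scalar.
class UppercaseStringsVisitor extends PHPParser_NodeVisitorAbstract
{
    public function leaveNode(PHPParser_Node $node) {
        // Class name inferred from the dump output; treat it as an assumption.
        if ($node instanceof PHPParser_Node_Scalar_String) {
            $node->value = strtoupper($node->value);
            // Return the changed node so the traverser picks it up.
            return $node;
        }
        // Returning nothing leaves all other nodes unchanged.
    }
}

$code = '<?php echo "hello";';

try {
    $parser = new PHPParser_Parser;
    $stmts  = $parser->parse(new PHPParser_Lexer($code));

    $traverser = new PHPParser_NodeTraverser;
    $traverser->addVisitor(new UppercaseStringsVisitor);
    $stmts = $traverser->traverse($stmts);

    // Dump the modified tree to inspect the result.
    $nodeDumper = new PHPParser_NodeDumper;
    echo $nodeDumper->dump($stmts), "\n";
} catch (PHPParser_Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```

Only `leaveNode` is overridden here; the remaining three methods come from `PHPParser_NodeVisitorAbstract`.
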
PrettyPrinter
-------------

The pretty printer compiles nodes back to PHP code. "Pretty printing" here is just the formal
name of the process and does not mean that the output is in any way pretty.

    $prettyPrinter = new PHPParser_PrettyPrinter_Zend;
    echo '<pre>' . htmlspecialchars($prettyPrinter->prettyPrint($stmts)) . '</pre>';

For the code from the NodeDumper section above, this should produce the following output:

    function printLine($msg)
    {
        echo $msg, "\n";
    }
    printLine('Hallo World!!!');

You can also pretty print only a single expression using the `prettyPrintExpr()` method.

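A short sketch of printing a single expression follows. It assumes that `prettyPrintExpr()` takes one expression node and returns a string, and that a top-level function call is parsed into an expression node directly (as the `Expr_FuncCall` entry in the dump output above suggests). The autoloader path is the same placeholder used earlier.

```php
<?php
require_once 'path/to/PHP-Parser/lib/PHPParser/Autoloader.php';
PHPParser_Autoloader::register();

$code = "<?php printLine('Hallo World!!!');";

try {
    $parser = new PHPParser_Parser;
    $stmts  = $parser->parse(new PHPParser_Lexer($code));

    $prettyPrinter = new PHPParser_PrettyPrinter_Zend;

    // Assumption: $stmts[0] is the Expr_FuncCall node for the call above.
    if ($stmts[0] instanceof PHPParser_Node_Expr) {
        echo $prettyPrinter->prettyPrintExpr($stmts[0]), "\n";
    }
} catch (PHPParser_Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```
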