Small docs touchups and typo fixes

nikic 2014-09-12 00:20:22 +02:00
parent 7a3789f1a9
commit e65fd664d1
5 changed files with 84 additions and 73 deletions


@ -1,16 +1,16 @@
Introduction
============
This project is a PHP 5.5 (and older) parser **written in PHP itself**.
This project is a PHP 5.2 to PHP 5.6 parser **written in PHP itself**.
What is this for?
-----------------
A parser is useful for [static analysis][0] and manipulation of code and basically any other
A parser is useful for [static analysis][0], manipulation of code and basically any other
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.
There are other ways of dealing with source code. One that PHP supports natively is using the
There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much more low level than
the AST and thus has different applications: It also allows analyzing the exact formatting of
a file. On the other hand the token stream is much harder to deal with for more complex analysis.
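To get a feel for how low level the token stream is, the following minimal sketch dumps the tokens of a short snippet using only the built-in `token_get_all` and `token_name` functions. The helper name `describeTokens` is made up for this illustration and is not part of the parser.

```php
<?php

// Hypothetical helper (not part of PHP-Parser): returns one line of text per token.
function describeTokens($code) {
    $lines = array();
    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array tokens have the form array(id, text, line number).
            $lines[] = token_name($token[0]) . ' ' . var_export($token[1], true);
        } else {
            // Single-character tokens such as ',' or ';' are plain strings.
            $lines[] = $token;
        }
    }
    return $lines;
}

echo implode("\n", describeTokens("<?php echo 'Hi', 'World!';")), "\n";
```

Whitespace shows up as separate `T_WHITESPACE` tokens, which is what makes the token stream suitable for formatting analysis but cumbersome for structural analysis.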
@ -26,13 +26,13 @@ programmatic PHP code analysis are incidentally PHP developers, not C developers
What can it parse?
------------------
The parser uses a PHP 5.5 compliant grammar, which is backwards compatible with at least PHP 5.4, PHP 5.3
and PHP 5.2 (and maybe older).
The parser uses a PHP 5.6 compliant grammar, which is backwards compatible with all PHP versions from PHP 5.2
upwards (and maybe older).
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), additionally a wrapper for emulating new tokens from 5.3, 5.4 and 5.5 is provided. This
allows to parse PHP 5.5 source code running on PHP 5.2, for example. This emulation is very hacky and not
yet perfect, but it should work well on any sane code.
version it runs on), additionally a wrapper for emulating new tokens from 5.3, 5.4, 5.5 and 5.6 is provided.
This allows parsing PHP 5.6 source code while running on PHP 5.3, for example. This emulation is very hacky and not
perfect, but it should work well on any sane code.
What output does it produce?
----------------------------
@ -56,7 +56,7 @@ array(
)
```
This matches the semantics the program had: An echo statement, which takes two strings as expressions,
This matches the structure of the code: An echo statement, which takes two strings as expressions,
with the values `Hi` and `World!`.
You can also see that the AST does not contain any whitespace information (but most comments are saved).


@ -3,11 +3,6 @@ Installation
There are multiple ways to include the PHP parser into your project:
Installing from the Zip- or Tarball
-----------------------------------
Download the latest version from [the download page][2], unpack it and move the files somewhere into your project.
Installing via Composer
-----------------------
@ -34,6 +29,10 @@ Run the following command to install the parser into the `vendor/PHP-Parser` fol
git submodule add git://github.com/nikic/PHP-Parser.git vendor/PHP-Parser
Installing from the Zip- or Tarball
-----------------------------------
Download the latest version from [the download page][2], unpack it and move the files somewhere into your project.
[1]: http://getcomposer.org/composer.phar


@ -26,31 +26,38 @@ This ensures that there will be no errors when traversing highly nested node tre
Parsing
-------
In order to parse some source code you first have to create a `PhpParser\Parser` object (which
needs to be passed a `PhpParser\Lexer` instance) and then pass the code (including `<?php` opening
tags) to the `parse` method. If a syntax error is encountered `PhpParser\Error` is thrown, so this
exception should be `catch`ed.
In order to parse some source code you first have to create a `PhpParser\Parser` object, which
needs to be passed a `PhpParser\Lexer` instance:
```php
<?php
$parser = new PhpParser\Parser(new PhpParser\Lexer);
// or
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
```
Use of the emulative lexer is required if you want to parse PHP code from newer versions than the one
you're running on. For example it will allow you to parse PHP 5.6 code while running on PHP 5.3.
Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
create a syntax tree. If a syntax error is encountered, a `PhpParser\Error` exception will be thrown:
```php
<?php
$code = '<?php // some code';
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
try {
    $stmts = $parser->parse($code);
    // $stmts is an array of statement nodes
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```
The `parse` method will return an array of statement nodes (`$stmts`).
### Emulative lexer
Instead of `PhpParser\Lexer` one can also use `PhpParser\Lexer\Emulative`. This class will emulate tokens
of newer PHP versions and as such allow parsing PHP 5.5 on PHP 5.2, for example. So if you want to parse
PHP code of newer versions than the one you are running, you should use the emulative lexer.
A parser instance can be reused to parse multiple files.
Node tree
---------
@ -104,7 +111,7 @@ with a PHP keyword.
Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. So in order to access it
in the above example you would write `$stmts[0]->exprs`. If you wanted to access name of the function
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
call, you would write `$stmts[0]->exprs[1]->name`.
All nodes also define a `getType()` method that returns the node type. The type is the class name
@ -131,7 +138,7 @@ namely `PhpParser\PrettyPrinter\Standard`.
<?php
$code = "<?php echo 'Hi ', hi\\getTarget();";
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
try {
@ -143,10 +150,10 @@ try {
->exprs // sub expressions
[0] // the first of them (the string node)
->value // its value, i.e. 'Hi '
= 'Hallo '; // change to 'Hallo '
= 'Hello '; // change to 'Hello '
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
$code = $prettyPrinter->prettyPrint($stmts);
echo $code;
} catch (PhpParser\Error $e) {
@ -156,7 +163,7 @@ try {
The above code will output:
<?php echo 'Hallo ', hi\getTarget();
<?php echo 'Hello ', hi\getTarget();
As you can see the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
@ -164,8 +171,8 @@ again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag and handle
inline HTML as the first/last sentence more gracefully.
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.
Node traversal
--------------
@ -180,9 +187,8 @@ structure of a program using this `PhpParser\NodeTraverser` looks like this:
```php
<?php
$code = "<?php // some code";
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
$traverser = new PhpParser\NodeTraverser;
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
@ -190,6 +196,8 @@ $prettyPrinter = new PhpParser\PrettyPrinter\Standard;
$traverser->addVisitor(new MyNodeVisitor);
try {
$code = file_get_contents($fileName);
// parse
$stmts = $parser->parse($code);
@ -197,7 +205,7 @@ try {
$stmts = $traverser->traverse($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
$code = $prettyPrinter->prettyPrintFile($stmts);
echo $code;
} catch (PhpParser\Error $e) {
@ -205,14 +213,16 @@ try {
}
```
A same node visitor for this code might look like this:
The corresponding node visitor might look like this:
```php
<?php
use PhpParser\Node;
class MyNodeVisitor extends PhpParser\NodeVisitorAbstract
{
    public function leaveNode(PhpParser\Node $node) {
        if ($node instanceof PhpParser\Node\Scalar\String) {
    public function leaveNode(Node $node) {
        if ($node instanceof Node\Scalar\String) {
            $node->value = 'foo';
        }
    }
@ -221,7 +231,7 @@ class MyNodeVisitor extends PhpParser\NodeVisitorAbstract
The above node visitor would change all string literals in the program to `'foo'`.
All visitors must implement the `PhpParser\NodeVisitor` interface, which defined the following four
All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:
public function beforeTraverse(array $nodes);
@ -240,11 +250,12 @@ The `enterNode` and `leaveNode` methods are called on every node, the former whe
i.e. before its subnodes are traversed, the latter when it is left.
All four methods can either return the changed node or not return at all (i.e. `null`) in which
case the current node is not changed. The `leaveNode` method can furthermore return two special
values: If `false` is returned the current node will be removed from the parent array. If an `array`
is returned the current node will be merged into the parent array at the offset of the current node.
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will be
`array(A, X, Y, Z, C)`.
case the current node is not changed. The `leaveNode` method can additionally return two special
values:
If `false` is returned the current node will be removed from the parent array. If an array is returned
it will be merged into the parent array at the offset of the current node. I.e. if in `array(A, B, C)`
the node `B` should be replaced with `array(X, Y, Z)` the result will be `array(A, X, Y, Z, C)`.
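The merge behaviour for returned arrays can be pictured with plain PHP arrays. The following sketch only mimics the effect using the built-in `array_splice`; it makes no claim about how the traverser implements this internally, and the letters merely stand in for nodes.

```php
<?php

// Stand-ins for the nodes of the parent array.
$nodes = array('A', 'B', 'C');

// Replace the one element at offset 1 ('B') with three new elements.
array_splice($nodes, 1, 1, array('X', 'Y', 'Z'));

// $nodes is now array('A', 'X', 'Y', 'Z', 'C')
echo implode(', ', $nodes), "\n";
```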
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.
@ -283,10 +294,9 @@ We start off with the following base code:
```php
<?php
const IN_DIR = '/some/path';
const OUT_DIR = '/some/other/path';
$inDir = '/some/path';
$outDir = '/some/other/path';
// use the emulative lexer here, as we are running PHP 5.2 but want to parse PHP 5.3
$parser = new PhpParser\Parser(new PhpParser\Lexer\Emulative);
$traverser = new PhpParser\NodeTraverser;
$prettyPrinter = new PhpParser\PrettyPrinter\Standard;
@ -295,7 +305,7 @@ $traverser->addVisitor(new PhpParser\NodeVisitor\NameResolver); // we will need
$traverser->addVisitor(new NodeVisitor\NamespaceConverter); // our own node visitor
// iterate over all .php files in the directory
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator(IN_DIR));
$files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($inDir));
$files = new RegexIterator($files, '/\.php$/');
foreach ($files as $file) {
@ -310,11 +320,11 @@ foreach ($files as $file) {
$stmts = $traverser->traverse($stmts);
// pretty print
$code = '<?php ' . $prettyPrinter->prettyPrint($stmts);
$code = $prettyPrinter->prettyPrintFile($stmts);
// write the converted file to the target directory
file_put_contents(
substr_replace($file->getPathname(), OUT_DIR, 0, strlen(IN_DIR)),
substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
$code
);
} catch (PhpParser\Error $e) {
@ -323,7 +333,7 @@ foreach ($files as $file) {
}
```
Now lets start with the main code, the `NodeVisitor_NamespaceConverter`. One thing it needs to do
Now let's start with the main code, the `NodeVisitor\NamespaceConverter`. One thing it needs to do
is convert `A\\B` style names to `A_B` style ones.
```php
@ -340,14 +350,14 @@ class NodeVisitor_NamespaceConverter extends PhpParser\NodeVisitorAbstract
```
The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. All the need to create a string with the name parts separated
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.
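The effect of `$node->toString('_')` on a resolved name can be imitated on a plain string. This sketch uses a hypothetical helper, `toUnderscoreName`, that works on strings rather than name nodes, so it only illustrates the intended result and is not the visitor itself:

```php
<?php

// Hypothetical helper: converts 'A\B\C' style names into 'A_B_C' style names.
function toUnderscoreName($name) {
    // Drop a leading backslash (fully qualified form), then swap the separators.
    return str_replace('\\', '_', ltrim($name, '\\'));
}

echo toUnderscoreName('A\\B'), "\n"; // prints A_B
```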
Another thing we need to do is change the class/function/const declarations. Currently they contain
only the shortname (i.e. the last part of the name), but they need to contain the complete class
name:
only the shortname (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:
```php
<?php


@ -1,7 +1,7 @@
Other node tree representations
===============================
It is possible to convert the AST in several textual representations, which serve different uses.
It is possible to convert the AST into several textual representations, which serve different uses.
Simple serialization
--------------------
@ -13,33 +13,34 @@ but PHP, but it is compact and generates fast. The main application thus is in c
Human readable dumping
----------------------
Furthermore it is possible to dump nodes into a human readable form using the `dump` method of
Furthermore it is possible to dump nodes into a human readable format using the `dump` method of
`PhpParser\NodeDumper`. This can be used for debugging.
```php
<?php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hallo World!!!');
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$nodeDumper = new PhpParser\NodeDumper;
try {
    $stmts = $parser->parse($code);
    echo '<pre>' . htmlspecialchars($nodeDumper->dump($stmts)) . '</pre>';
    echo $nodeDumper->dump($stmts), "\n";
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
```
The above output will have an output looking roughly like this:
The above script will have an output looking roughly like this:
```
array(
@ -77,7 +78,7 @@ array(
args: array(
0: Arg(
value: Scalar_String(
value: Hallo World!!!
value: Hello World!!!
)
byRef: false
)
@ -97,20 +98,21 @@ interfacing with other languages and applications or for doing transformation us
<?php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hallo World!!!');
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$parser = new PhpParser\Parser(new PhpParser\Lexer);
$serializer = new PhpParser\Serializer\XML;
try {
    $stmts = $parser->parse($code);
    echo '<pre>' . htmlspecialchars($serializer->serialize($stmts)) . '</pre>';
    echo $serializer->serialize($stmts);
} catch (PhpParser\Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}
@ -185,7 +187,7 @@ Produces:
<subNode:value>
<node:Scalar_String line="6">
<subNode:value>
<scalar:string>Hallo World!!!</scalar:string>
<scalar:string>Hello World!!!</scalar:string>
</subNode:value>
</node:Scalar_String>
</subNode:value>


@ -42,7 +42,7 @@ getNextToken
------------
`getNextToken` returns the ID of the next token and sets some additional information in the three variables which it
accepts by-ref. If no more tokens are available it has to return `0`, which is the ID of the `EOF` token.
accepts by-ref. If no more tokens are available it must return `0`, which is the ID of the `EOF` token.
The first by-ref variable `$value` should contain the textual content of the token. It is what will be available as `$1`
etc in the parser.