![](img/logo.png)
# JSON Machine

*This README is in sync with the code. For the README of a specific version, see its committed README.md.*\
See [CHANGELOG.md](CHANGELOG.md) to keep up with changes in new versions and master.\
**0.4 is the last version to support PHP 5.6**. Starting with 0.5, PHP 7.0+ will be required.

[![Build Status](https://travis-ci.com/halaxa/json-machine.svg?branch=master)](https://travis-ci.com/halaxa/json-machine)
[![Latest Stable Version](https://poser.pugx.org/halaxa/json-machine/v/stable)](https://packagist.org/packages/halaxa/json-machine)
[![Monthly Downloads](https://poser.pugx.org/halaxa/json-machine/d/monthly)](https://packagist.org/packages/halaxa/json-machine)

---

## TL;DR;
JSON Machine is an efficient drop-in replacement for the memory-hungry iteration of big JSON files or streams, for PHP 5.6+:

```diff
<?php

use \JsonMachine\JsonMachine;

// this often causes Allowed Memory Size Exhausted
- $users = json_decode(file_get_contents('500MB-users.json'));

// this usually takes few kB of memory no matter the file size
+ $users = JsonMachine::fromFile('500MB-users.json');

foreach ($users as $id => $user) {
    // just process $user as usual
}
```

Random access like `$users[42]` or counting results like `count($users)` **is not possible** by design.
Use the `foreach` shown above and find the item or count the collection there.
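
As a rough illustration of finding an item and counting inside `foreach`, here is a minimal sketch. It uses `JsonMachine::fromString()` with a tiny inline document so that it is self-contained. The sample data, the variable names and the `vendor/autoload.php` path (a standard Composer install) are assumptions made for this example.

```php
<?php

use JsonMachine\JsonMachine;

// Assumes the package was installed with Composer in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample data standing in for a big users file.
$json = '[{"id":41,"name":"Ann"},{"id":42,"name":"Bob"},{"id":43,"name":"Cy"}]';
$users = JsonMachine::fromString($json);

$count = 0;
$found = null;
foreach ($users as $user) {
    $count++;                  // replaces count($users)
    if ($user['id'] === 42) {  // replaces random access such as $users[42]
        $found = $user;
    }
}

echo "Users: $count", PHP_EOL;
echo 'Found: ', $found === null ? 'nothing' : $found['name'], PHP_EOL;
```

If you only need the lookup and not the count, `break` out of the loop once the item is found so the rest of the document is not parsed.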

Requires `ext-json` if used out of the box. See [custom decoder](#custom-decoder).

## Introduction
JSON Machine is an efficient, easy-to-use and fast JSON stream parser based on generators,
developed for unpredictably long JSON streams or documents. Main features are:

- Constant memory footprint for unpredictably large JSON documents.
- Ease of use. Just iterate JSON of any size with `foreach`. No events and callbacks.
- Efficient iteration on any subtree of the document, specified by [Json Pointer](#json-pointer).
- Speed. Performance-critical code contains no unnecessary function calls, no regular expressions
  and uses native `json_decode` to decode JSON document chunks by default. See [custom decoder](#custom-decoder).
- Thoroughly tested. More than 100 tests and 700 assertions.

## Parsing JSON documents
### Simple document
Let's say that `fruits.json` contains this really big JSON document:
```json
// fruits.json
{
    "apple": {
        "color": "red"
    },
    "pear": {
        "color": "yellow"
    }
}
```
It can be parsed this way:
```php
<?php

use \JsonMachine\JsonMachine;

$fruits = JsonMachine::fromFile('fruits.json');

foreach ($fruits as $name => $data) {
    // 1st iteration: $name === "apple" and $data === ["color" => "red"]
    // 2nd iteration: $name === "pear" and $data === ["color" => "yellow"]
}
```

Parsing a JSON array instead of a JSON object follows the same logic.
The key in a `foreach` will be a numeric index of an item.

If you prefer JSON Machine to return objects instead of arrays, use `new ExtJsonDecoder()` as the decoder,
which by default decodes objects, the same as `json_decode`:
```php
<?php

use JsonMachine\JsonDecoder\ExtJsonDecoder;
use JsonMachine\JsonMachine;

$objects = JsonMachine::fromFile('path/to.json', '', new ExtJsonDecoder);
```
### Parsing a subtree
If you want to iterate only the `results` subtree in this `fruits.json`:
```json
// fruits.json
{
    "results": {
        "apple": {
            "color": "red"
        },
        "pear": {
            "color": "yellow"
        }
    }
}
```
do it like this:
```php
<?php

use \JsonMachine\JsonMachine;

$fruits = JsonMachine::fromFile("fruits.json", "/results" /* <- Json Pointer */);
foreach ($fruits as $name => $data) {
    // The same as above, which means:
    // 1st iteration: $name === "apple" and $data === ["color" => "red"]
    // 2nd iteration: $name === "pear" and $data === ["color" => "yellow"]
}
```

> Note:
>
> Value of `results` is not loaded into memory at once, but only one item in
> `results` at a time. It is always one item in memory at a time at the level/subtree
> you are currently iterating. Thus the memory consumption is constant.

<a name="json-pointer"></a>
#### A few words about Json Pointer
It's a way of addressing one item in a JSON document. See the [Json Pointer RFC 6901](https://tools.ietf.org/html/rfc6901).
It's very handy, because sometimes the JSON structure goes deeper, and you want to iterate a subtree,
not the main level. So you just specify the pointer to the JSON array or object you want to iterate and off you go.
When the parser hits the collection you specified, iteration begins. The pointer is always the second parameter in all
`JsonMachine::from*` functions. If you specify a pointer to a scalar value (which logically cannot be iterated)
or to a non-existent position in the document, an exception is thrown.

Some examples:

| Json Pointer value | Will iterate through |
|--------------------|---------------------------------------------------------------------------------------------------|
| `""` (empty string - default) | `["this", "array"]` or `{"a": "this", "b": "object"}` will be iterated (main level) |
| `"/result/items"` | `{"result":{"items":["this","array","will","be","iterated"]}}` |
| `"/0/items"` | `[{"items":["this","array","will","be","iterated"]}]` (supports array indexes) |
| `"/"` (gotcha! - a slash followed by an empty string, see the [spec](https://tools.ietf.org/html/rfc6901#section-5)) | `{"":["this","array","will","be","iterated"]}` |
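
To make the `"/result/items"` row concrete, the sketch below iterates only that subtree of a small inline document. The document content, the variable names and the `vendor/autoload.php` path (a standard Composer install) are assumptions made for this example.

```php
<?php

use JsonMachine\JsonMachine;

// Assumes the package was installed with Composer in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical document: only the array under /result/items should be iterated.
$json = '{"meta":{"total":2},"result":{"items":[{"id":1},{"id":2}]}}';

$items = JsonMachine::fromString($json, '/result/items');

$ids = [];
foreach ($items as $index => $item) {
    // $index is the numeric position within the "items" array
    $ids[$index] = $item['id'];
}

var_dump($ids);
```

The `meta` key is skipped by the parser and never yielded, because it lies outside the subtree the pointer addresses.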

<a name="custom-decoder"></a>
## Using custom decoder
The optional third parameter of all `JsonMachine::from*` functions is an instance of
`JsonMachine\JsonDecoder\Decoder`. If none is specified, `ExtJsonDecoder` is used by
default. It requires the `ext-json` PHP extension to be present, because it uses
`json_decode`. When `json_decode` doesn't do what you want, you can make or use your
own decoder, which must implement `JsonMachine\JsonDecoder\Decoder`.
### Available decoders
- `ExtJsonDecoder` - **Default.** Uses `json_decode` to decode keys and values.
  Constructor takes the same params as `json_decode`.
- `PassThruDecoder` - uses `json_decode` to decode keys but returns values as pure JSON strings.
  Constructor takes the same params as `json_decode`.

Example:
```php
<?php

use JsonMachine\JsonDecoder\PassThruDecoder;
use JsonMachine\JsonMachine;

$jsonMachine = JsonMachine::fromFile('path/to.json', '', new PassThruDecoder);
```
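
A possible use of `PassThruDecoder` is to defer decoding: each value arrives as a raw JSON string, and you decode only the ones you need. The sketch below assumes a small inline document and a standard Composer `vendor/autoload.php`; the variable names are made up for the example.

```php
<?php

use JsonMachine\JsonDecoder\PassThruDecoder;
use JsonMachine\JsonMachine;

// Assumes the package was installed with Composer in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical sample document.
$json = '{"apple":{"color":"red"},"pear":{"color":"yellow"}}';

$fruits = JsonMachine::fromString($json, '', new PassThruDecoder);

$rawValues = [];
foreach ($fruits as $name => $rawJson) {
    // $name is a decoded key, $rawJson is still an undecoded JSON string
    $rawValues[$name] = $rawJson;
}

// Decode only the value that is actually needed.
$pear = json_decode($rawValues['pear'], true);
echo $pear['color'], PHP_EOL;
```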
## Parsing stream API responses
A stream API response or any other JSON stream is parsed exactly the same way as a file. The only difference
is that you use `JsonMachine::fromStream($streamResource)` for it, where `$streamResource` is the stream
resource with the JSON document. The rest is the same as with parsing files. Here are some examples of
popular HTTP clients which support streaming responses:

### GuzzleHttp
Guzzle uses its own streams, but they can be converted back to PHP streams by calling
`\GuzzleHttp\Psr7\StreamWrapper::getResource()`. Pass the result of this function to the
`JsonMachine::fromStream` function and you're set up. See the working
[GuzzleHttp example](src/examples/guzzleHttp.php).

### Symfony HttpClient
A stream response of Symfony HttpClient works as an iterator. And because JSON Machine is
based on iterators, the integration with Symfony HttpClient is very simple. See the
[HttpClient example](src/examples/symfonyHttpClient.php).

## Efficiency of parsing streams/files
JSON Machine reads the stream or file one JSON item at a time and generates one corresponding PHP array at a time.
This is the most efficient way, because if you had, say, 10,000 users in a JSON file and wanted to parse it using
`json_decode(file_get_contents('big.json'))`, you'd have the whole string in memory as well as all the 10,000
PHP structures. The following table demonstrates the concept:

|                                                                 | String items in memory at a time | Decoded PHP items in memory at a time | Total |
|-----------------------------------------------------------------|---------------------------------:|--------------------------------------:|------:|
| `json_decode()`                                                 |                            10000 |                                 10000 | 20000 |
| `JsonMachine::fromStream()`, `::fromFile()`, `::fromIterable()` |                                1 |                                     1 |     2 |

This means that the memory footprint of `JsonMachine::fromStream` stays constant for any size of processed JSON. 100 GB no problem.
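
The stream case can be tried without any HTTP client. The sketch below uses a `php://temp` stream as a stand-in for a real response stream; the sample document, the variable names and the `vendor/autoload.php` path (a standard Composer install) are assumptions made for this example.

```php
<?php

use JsonMachine\JsonMachine;

// Assumes the package was installed with Composer in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// A temporary stream standing in for a real file or HTTP response stream.
$stream = fopen('php://temp', 'r+');
fwrite($stream, '{"apple":{"color":"red"},"pear":{"color":"yellow"}}');
rewind($stream);

$colors = [];
foreach (JsonMachine::fromStream($stream) as $name => $data) {
    // Only the current item is decoded at this point.
    $colors[$name] = $data['color'];
}

fclose($stream);

var_dump($colors);
```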
## Efficiency of parsing in-memory JSON strings
There is also a method `JsonMachine::fromString()`. You may wonder why it is there. Why not just use
`json_decode`? True, when parsing short strings, JSON Machine may be overhead. But if you are
forced to parse a big string and the stream is not available, JSON Machine may be better than `json_decode`.
The reason is that unlike `json_decode` it still traverses the JSON string one item at a time and doesn't
load the whole resulting PHP structure into memory at once.

Let's continue with the example with 10,000 users. This time they are all in a string in memory.
When decoding that string with `json_decode`, 10,000 arrays (objects) are created in memory and then the result
is returned. JSON Machine, on the other hand, creates a single array for each item found in the string and yields it back
to you. When you process this item and iterate to the next one, another single array is created. This is the same
behaviour as with streams/files. The following table puts the concept into perspective:

|                             | String items in memory at a time | Decoded PHP items in memory at a time | Total |
|-----------------------------|---------------------------------:|--------------------------------------:|------:|
| `json_decode()`             |                            10000 |                                 10000 | 20000 |
| `JsonMachine::fromString()` |                            10000 |                                     1 | 10001 |

The reality is even brighter. `JsonMachine::fromString` consumes about **5 times less memory** than `json_decode`.
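
If you want to check the difference on your own data, a rough measurement could look like the sketch below. It is not the benchmark behind the figure above. The generated user records, the variable names and the `vendor/autoload.php` path are assumptions, and `memory_get_usage()` gives only an approximate picture (for instance, it also counts library classes loaded during the loop).

```php
<?php

use JsonMachine\JsonMachine;

// Assumes the package was installed with Composer in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// Build a hypothetical JSON string with 10,000 users.
$users = [];
for ($i = 0; $i < 10000; $i++) {
    $users[] = ['id' => $i, 'name' => "user$i"];
}
$json = json_encode($users);
unset($users);

// json_decode: the whole decoded structure is held in memory at once.
$before = memory_get_usage();
$decoded = json_decode($json, true);
$jsonDecodeBytes = memory_get_usage() - $before;
$decodedCount = count($decoded);
unset($decoded);

// JSON Machine: track the highest extra memory seen while iterating.
$before = memory_get_usage();
$jsonMachineBytes = 0;
$streamedCount = 0;
foreach (JsonMachine::fromString($json) as $user) {
    $streamedCount++;
    $jsonMachineBytes = max($jsonMachineBytes, memory_get_usage() - $before);
}

printf("json_decode:  %d items, ~%d bytes\n", $decodedCount, $jsonDecodeBytes);
printf("JSON Machine: %d items, ~%d bytes\n", $streamedCount, $jsonMachineBytes);
```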
## Error handling
When any part of the JSON stream is malformed, a `SyntaxError` exception is thrown. A better solution is on the way.
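
Because parsing is lazy, the exception surfaces during iteration, not when the `JsonMachine` instance is created, so the `try` block has to wrap the `foreach`. In the sketch below the malformed sample document and the variable names are made up for illustration. This README does not state the fully qualified class name of `SyntaxError`, so the example catches the generic `\Exception` and prints the concrete class instead of assuming a namespace.

```php
<?php

use JsonMachine\JsonMachine;

// Assumes the package was installed with Composer in the current directory.
require_once __DIR__ . '/vendor/autoload.php';

// Hypothetical malformed document: the comma between the two objects is missing.
$items = JsonMachine::fromString('[{"id": 1} {"id": 2}]');

$processed = 0;
$caught = false;
try {
    foreach ($items as $item) {
        $processed++;
    }
} catch (\Exception $e) {
    // Catching broadly on purpose; narrow this to the library's SyntaxError
    // class once you have confirmed its namespace in your installed version.
    $caught = true;
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}

echo "Items processed before the error: $processed", PHP_EOL;
```

Keep in mind that items yielded before the malformed part have already been processed by the time the exception is thrown, so any side effects of the loop body need to tolerate a partial run.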
## Running tests
```bash
tests/run.sh
```
This uses the php and composer installations already present on your machine.
### Running tests on all supported PHP platforms
[Install docker](https://docs.docker.com/install/) on your machine and run
```bash
tests/docker-run-all-platforms.sh
```
This needs neither php nor composer installed on your machine. Only Docker.

## Installation
```bash
composer require halaxa/json-machine
```
or clone or download this repository (not recommended).
## Do you like it?
Star it, share it, show it :)
## License
Apache 2.0

Cogwheel element: Icons made by [TutsPlus](https://www.flaticon.com/authors/tutsplus)
from [www.flaticon.com](https://www.flaticon.com/)
is licensed by [CC 3.0 BY](http://creativecommons.org/licenses/by/3.0/)