Andrea Marco Sartori e21558c39d Update README
2023-01-14 13:13:10 +10:00

🧩 JSON Parser

Zero-dependency pull parser to read large JSON from any source in a memory-efficient way.

📦 Install

Via Composer:

composer require cerbero/json-parser

🔮 Usage

JSON Parser provides a minimal API to read large JSON from any source:

use Cerbero\JsonParser\JsonParser;

// the JSON source in this example is an API endpoint
$source = 'https://randomuser.me/api/1.4?seed=json-parser&results=5';

foreach (new JsonParser($source) as $key => $value) {
    // instead of loading the whole JSON, we keep in memory only one key and value at a time
}

Depending on preference, we can instantiate the parser in 3 different ways:

use Cerbero\JsonParser\JsonParser;

// classic object instantiation
new JsonParser($source);

// static instantiation, facilitates method chaining
JsonParser::parse($source);

// namespaced function
use function Cerbero\JsonParser\parseJson;

parseJson($source);

Sources

A wide range of JSON sources is supported. Here is the full list:

  • strings, e.g. {"foo":"bar"}
  • iterables, i.e. arrays or instances of Traversable
  • files, e.g. /path/to/large_file.json
  • resources, e.g. streams
  • API endpoint URLs, e.g. https://endpoint.json or any instance of Psr\Http\Message\UriInterface
  • PSR-7 requests, i.e. any instance of Psr\Http\Message\RequestInterface
  • PSR-7 messages, i.e. any instance of Psr\Http\Message\MessageInterface
  • PSR-7 streams, i.e. any instance of Psr\Http\Message\StreamInterface
  • Laravel HTTP client responses, i.e. any instance of Illuminate\Http\Client\Response
  • user-defined sources, i.e. any instance of Cerbero\JsonParser\Sources\Source

If the source we need to parse is not supported by default, we can implement our own custom source.

To implement a custom source, we need to extend Source and implement 3 methods:

use Cerbero\JsonParser\Sources\Source;
use Traversable;

class CustomSource extends Source
{
    public function getIterator(): Traversable
    {
        // return a Traversable holding the JSON source, e.g. a Generator yielding chunks of JSON
    }

    public function matches(): bool
    {
        // return TRUE if this class can handle the JSON source
    }

    protected function calculateSize(): ?int
    {
        // return the size of the JSON in bytes or NULL if it can't be calculated
    }
}

The parent class Source gives us access to 2 properties:

  • $source: the JSON source we pass to the parser, i.e.: new JsonParser($source)
  • $config: the configuration we set by chaining methods, e.g.: $parser->pointer('/foo')

The method getIterator() defines the logic to read the JSON source in a memory-efficient way. It feeds the parser with small pieces of JSON. Please refer to the already existing sources to see some implementations.

The method matches() determines whether the JSON source passed to the parser can be handled by our custom implementation. In other words, it tells the parser whether it should use our class for the JSON to parse.

Finally, calculateSize() computes the total size of the JSON source. It's used to track the parsing progress. However, it's not always possible to know the size of a JSON source. In that case, or if we don't need to track the progress, we can return null.
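
As a rough illustration of the three responsibilities described above, here is a minimal sketch for a file-based source. It is written as plain functions so it runs without the package. The function names readJsonChunks(), looksLikeJsonFile() and jsonFileSize() and the 8192-byte default are assumptions made for this example, not part of the package API.

```php
<?php

// Hypothetical helpers for illustration only; they are not part of the package.

/**
 * Yield the contents of a file in small chunks, similar in spirit
 * to what getIterator() is expected to return.
 */
function readJsonChunks(string $path, int $chunkSize = 8192): Generator
{
    $handle = fopen($path, 'rb');

    if ($handle === false) {
        throw new RuntimeException("Unable to open {$path}");
    }

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);

            // skip the empty read that may happen right before EOF is detected
            if ($chunk !== false && $chunk !== '') {
                yield $chunk;
            }
        }
    } finally {
        fclose($handle);
    }
}

/** Decide whether a value looks like a readable file, as matches() might do. */
function looksLikeJsonFile(mixed $source): bool
{
    return is_string($source) && is_file($source) && is_readable($source);
}

/** Return the size in bytes, or null when unknown, as calculateSize() might do. */
function jsonFileSize(string $path): ?int
{
    $size = filesize($path);

    return $size === false ? null : $size;
}

// Usage: write a small JSON file, then read it back 4 bytes at a time.
$path = tempnam(sys_get_temp_dir(), 'json');
file_put_contents($path, '{"foo":"bar"}');

$chunks = iterator_to_array(readJsonChunks($path, 4), false);
```

In a real custom source, the same logic would live in the three methods of the class extending Source and would read from the $source property instead of a function argument.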

Pointers

JSON Pointer is a standard (RFC 6901) for pointing to nodes within a JSON document. This package leverages JSON pointers to extract only some sub-trees from large JSONs.
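
To make the pointer syntax concrete, the sketch below resolves a pointer against an already decoded array. It only illustrates how RFC 6901 tokens and escapes work: unlike this package, it needs the whole JSON in memory. The functions pointerTokens() and resolvePointer() are hypothetical names chosen for this example.

```php
<?php

// Hypothetical helpers for illustration only; they are not part of the package.

/**
 * Split an RFC 6901 pointer into unescaped reference tokens.
 * "~1" is decoded to "/" first, then "~0" is decoded to "~".
 */
function pointerTokens(string $pointer): array
{
    if ($pointer === '') {
        return []; // the empty pointer refers to the whole document
    }

    if ($pointer[0] !== '/') {
        throw new InvalidArgumentException("Invalid JSON pointer: {$pointer}");
    }

    return array_map(
        fn (string $token) => str_replace(['~1', '~0'], ['/', '~'], $token),
        explode('/', substr($pointer, 1)),
    );
}

/** Walk the decoded data one token at a time and return the node found. */
function resolvePointer(array $data, string $pointer): mixed
{
    $current = $data;

    foreach (pointerTokens($pointer) as $token) {
        if (!is_array($current) || !array_key_exists($token, $current)) {
            throw new OutOfBoundsException("Nothing found at {$pointer}");
        }

        $current = $current[$token];
    }

    return $current;
}

// Usage: decode a tiny JSON and resolve two pointers against it.
$data = json_decode('[{"gender":"female","a/b":1},{"gender":"male"}]', true);

$first = resolvePointer($data, '/0/gender');
$escaped = resolvePointer($data, '/0/a~1b'); // "~1" stands for "/" in the key "a/b"
```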

Consider this JSON for example. To extract only the first gender and avoid parsing the rest of the JSON, we can set the /0/gender pointer:

$json = JsonParser::parse($source)->pointer('/0/gender');

foreach ($json as $key => $value) {
    // 1st and only iteration: $key === 'gender', $value === 'female'
}

JSON Parser uses the - character to match any array index, so we can extract all the genders with the /-/gender pointer:

$json = JsonParser::parse($source)->pointer('/-/gender');

foreach ($json as $key => $value) {
    // 1st iteration: $key === 'gender', $value === 'female'
    // 2nd iteration: $key === 'gender', $value === 'female'
    // 3rd iteration: $key === 'gender', $value === 'male'
    // and so on for all the objects in the array...
}
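
One way to picture the - wildcard is as a comparison between the pointer tokens and the path of the node currently being parsed. The sketch below shows that idea under simplifying assumptions: pathMatchesPointer() is a hypothetical function, escapes such as ~0 and ~1 are ignored, and the package's actual matching logic may differ.

```php
<?php

// Hypothetical helper for illustration only; it is not part of the package.

/**
 * Check whether the path of a node matches a pointer in which
 * the "-" token stands for any array index.
 *
 * @param array<int, string|int> $path keys (strings) and indexes (integers) leading to the node
 */
function pathMatchesPointer(array $path, string $pointer): bool
{
    $path = array_values($path);
    $tokens = $pointer === '' ? [] : explode('/', substr($pointer, 1));

    if (count($tokens) !== count($path)) {
        return false;
    }

    foreach ($tokens as $depth => $token) {
        $segment = $path[$depth];

        if ($token === '-' && is_int($segment)) {
            continue; // wildcard: accept whatever array index we are at
        }

        if ($token !== (string) $segment) {
            return false;
        }
    }

    return true;
}

// Usage: the gender of any element matches, other nodes do not.
$anyIndex = pathMatchesPointer([3, 'gender'], '/-/gender');
$fixedIndex = pathMatchesPointer([1, 'gender'], '/0/gender');
$otherNode = pathMatchesPointer([0, 'location', 'country'], '/-/gender');
```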

If we want to extract more sub-trees, we can set multiple pointers. Let's extract all genders and countries:

$json = JsonParser::parse($source)->pointers(['/-/gender', '/-/location/country']);

foreach ($json as $key => $value) {
    // 1st iteration: $key === 'gender', $value === 'female'
    // 2nd iteration: $key === 'country', $value === 'Germany'
    // 3rd iteration: $key === 'gender', $value === 'female'
    // 4th iteration: $key === 'country', $value === 'Mexico'
    // and so on for all the objects in the array...
}

We can also specify a callback to execute when JSON pointers are found. This is handy when we have different pointers and we need to run custom logic for each of them:

$json = JsonParser::parse($source)->pointers([
    '/-/gender' => fn (string $gender, string $key) => new Gender($gender),
    '/-/location/country' => fn (string $country, string $key) => new Country($country),
]);

foreach ($json as $key => $value) {
    // 1st iteration: $key === 'gender', $value instanceof Gender
    // 2nd iteration: $key === 'country', $value instanceof Country
    // and so on for all the objects in the array...
}

⚠️ Please note the parameter order of the callbacks: the value is passed before the key.

The same can also be achieved by chaining the method pointer() multiple times:

$json = JsonParser::parse($source)
    ->pointer('/-/gender', fn (string $gender, string $key) => new Gender($gender))
    ->pointer('/-/location/country', fn (string $country, string $key) => new Country($country));

foreach ($json as $key => $value) {
    // 1st iteration: $key === 'gender', $value instanceof Gender
    // 2nd iteration: $key === 'country', $value instanceof Country
    // and so on for all the objects in the array...
}
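
The value-before-key order of the callbacks can be illustrated with a small dispatcher. This is only a sketch of the idea, assuming callbacks are stored by pointer: applyPointerCallback() is a hypothetical function and does not reflect how the package is implemented internally.

```php
<?php

// Hypothetical helper for illustration only; it is not part of the package.

/**
 * Apply the callback registered for a pointer, passing the value first
 * and the key second. Values without a callback are returned untouched.
 *
 * @param array<string, callable> $callbacks callbacks indexed by pointer
 */
function applyPointerCallback(array $callbacks, string $pointer, mixed $value, string|int $key): mixed
{
    $callback = $callbacks[$pointer] ?? null;

    return $callback === null ? $value : $callback($value, $key);
}

$callbacks = [
    '/-/gender' => fn (string $gender, string $key) => strtoupper($gender),
    '/-/location/country' => fn (string $country, string $key) => "{$key}: {$country}",
];

// Usage: the first argument received by each callback is the value, the second is the key.
$gender = applyPointerCallback($callbacks, '/-/gender', 'female', 'gender');
$country = applyPointerCallback($callbacks, '/-/location/country', 'Germany', 'country');
$untouched = applyPointerCallback($callbacks, '/-/email', 'a@example.com', 'email');
```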

If the callbacks are enough to handle the pointers and we don't need to run any common logic for all pointers, we can avoid writing a foreach loop manually by chaining the traverse() method:

JsonParser::parse($source)
    ->pointer('/-/gender', $this->storeGender(...))
    ->pointer('/-/location/country', $this->storeCountry(...))
    ->traverse();

// no foreach needed

Otherwise, if some common logic is needed for all pointers and we prefer method chaining to manual loops, we can pass a callback to the traverse() method:

JsonParser::parse($source)
    ->pointer('/-/gender', fn (string $gender, string $key) => new Gender($gender))
    ->pointer('/-/location/country', fn (string $country, string $key) => new Country($country))
    ->traverse(function (Gender|Country $value, string $key) {
        // 1st iteration: $key === 'gender', $value instanceof Gender
        // 2nd iteration: $key === 'country', $value instanceof Country
        // and so on for all the objects in the array...
    });

// no foreach needed

⚠️ Please note the parameter order of the callbacks: the value is passed before the key.

📆 Change log

Please see CHANGELOG for more information on what has changed recently.

🧪 Testing

composer test

💞 Contributing

Please see CONTRIBUTING and CODE_OF_CONDUCT for details.

🧯 Security

If you discover any security-related issues, please email andrea.marco.sartori@gmail.com instead of using the issue tracker.

🏅 Credits

⚖️ License

The MIT License (MIT). Please see License File for more information.
