PHP and JSON: Cut #987

JSON Decoding in PHP 5.2.1 is Broken

As of PHP 5.2.1, json_decode() no longer follows the published standards for JSON-encoded texts.

Why not? For no reason other than the convenience of those ignorant of JSON standards.

Prior to PHP 5.2.1, this:

var_dump(json_decode('true'));

resulted in:

NULL

As of PHP 5.2.1, it results in:

bool(true)

Nice and handy, perhaps … but a blatant violation of the JSON specification, since 'true' is not a valid JSON-encoded text.

A little history

Back in August, I spent a lot of time with JSON. I was working on adding Prototype/script.aculo.us support to the Solar Framework, and wanted a handy utility for passing options around in JavaScript with JSON.

Rather than roll some new JSON interpreter, I chose to leverage the Services_JSON package along with ext/json to build a package that was compatible with ext/json, but usable for those who did not have that extension installed. With compatibility built in, developers could move application code back and forth between systems without having to worry about whether or not the extension was installed — if it was, the application would benefit from the added performance of a native extension. If it wasn’t, everything should work exactly the same way.
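The fallback idea can be sketched roughly as follows. This is not Solar_Json's actual code: the function name compat_json_decode() and its $fallback parameter are hypothetical, standing in for whatever pure-PHP decoder (a wrapper around Services_JSON, for instance) happens to be available.

```php
<?php
// Hypothetical sketch: prefer the native extension and fall back to a
// caller-supplied pure-PHP decoder when ext/json is not loaded.
function compat_json_decode($text, callable $fallback)
{
    if (function_exists('json_decode')) {
        // Native path: ext/json is installed, so use it for speed.
        return json_decode($text);
    }

    // Pure-PHP path: $fallback is an assumption of this sketch, e.g. a
    // closure wrapping a userland JSON decoder.
    return $fallback($text);
}

// Usage: the closure below is only a placeholder, not a real parser.
$result = compat_json_decode('[1, 2, 3]', function ($text) {
    return null; // a real fallback would parse $text itself
});
var_dump($result);
```

The calling code never needs to know which path ran, which is the point: the same application code works with or without the extension.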

In the course of my research for the Solar_Json package, I learned a lot about JSON and how it is supposed to behave. JSON.org is a spartan but complete resource about the format, and includes the JSON Checker and a comprehensive JSON test suite. There’s also a link to RFC 4627, which details JSON’s structure in a proposal for the formal application/json media type.

While digging through all this JSON goodness, I came to appreciate ext/json’s strict adherence to JSON’s format. The version of ext/json bundled with PHP 5.2.0 (version 1.2.1) was right on the money in its parsing, and by the time I was done, Solar_Json matched it every step of the way. To verify that compatibility, I wrote a series of unit tests (26 in all) checking that Solar_Json’s “pure PHP” implementation matched ext/json’s output exactly.

It was challenging, but it worked out well. The result was Solar_Json.

JSON Bundled with PHP, Confusion Ensues

To my delight, ext/json was bundled with PHP 5.2.0, and enabled by default. This was great news for PHP developers everywhere who are working with rich applications that need to exchange a lot of data with JavaScript.

All was good, for a while.

(Yesterday, Paul M. Jones re-ran the JSON unit tests I'd written for Solar_Json using PHP 5.2.1 in preparation for a new release of Solar. He mentioned that some of the tests started failing, which sparked this discussion. Thanks, Paul!)

Sure, a couple of people (myself included) didn’t fully understand JSON for a while. I even opened (and quickly closed) a bug about the way I thought certain strings should be decoded by the json_decode() function. Others had the same confusion.

What’s so confusing?

Well, the common thing that people want to do is something like this:


var_dump(json_decode(true));
// or
var_dump(json_decode('true'));

The confusing part about these two snippets is that they both return NULL instead of true. Based on the two bug reports (#38440 and #38680), it’s common for people to expect the output to be a boolean true in these examples.

However, if you understand JSON at all, you’ll know that NULL is a perfectly reasonable result, because true and 'true' are not valid JSON texts.

Standard, Shmandard

Note that I said “if you understand JSON at all.” If you do, you’ll realize that NULL is a perfectly reasonable result when attempting to decode an invalid JSON text.

I should amend that: If you understand JSON at all and actually care about standards and compatibility, you’ll realize that NULL is a perfectly reasonable result of parsing an invalid JSON text.

Just like other formats, there’s actually a specification that defines what is valid and what is not when it comes to JSON. No, really.

Section 2 of the standard states very plainly:

A JSON text is a serialized object or array.

JSON-text = object / array

Translated, that means that a valid JSON-text is either an object or an array. It’s not a string literal, an integer, or a boolean. The list of what a valid JSON-text can be is short. It can be an object. It can be an array. It can be … whoops, that’s it. An object, an array, or it just isn’t JSON.

I mean, think about it: JavaScript Object Notation. Not JavaScript Boolean Notation. Not JavaScript Assorted Stuff Notation. Objects. Arrays thrown in because in JavaScript, they’re basically the same thing. Period, the end.
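For what it's worth, the top-level rule from RFC 4627 is easy to enforce in userland, whatever json_decode() itself decides to accept. Here is a minimal sketch, assuming a hypothetical helper named strict_json_decode() (it is not part of ext/json or Solar_Json). It only checks that the text opens with an object or an array and leaves the rest of the parsing to json_decode().

```php
<?php
// Hypothetical helper: reject any text whose top-level value is not an
// object or an array, per "JSON-text = object / array" in RFC 4627.
function strict_json_decode(string $text)
{
    // Skip the insignificant whitespace the RFC allows before a value:
    // space, horizontal tab, line feed, carriage return.
    $trimmed = ltrim($text, " \t\n\r");

    // A JSON-text must begin with "{" (object) or "[" (array).
    if ($trimmed === '' || ($trimmed[0] !== '{' && $trimmed[0] !== '[')) {
        return null;
    }

    // Hand the real parsing to the extension; malformed input still
    // comes back as NULL from json_decode() itself.
    return json_decode($text);
}

var_dump(strict_json_decode('true'));   // scalar at top level: rejected
var_dump(strict_json_decode('[true]')); // array at top level: decoded
```

Wrapping the call this way keeps the stricter pre-5.2.1 behavior for top-level scalars without depending on which version of the extension is installed.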

DAMN, that’s inconvenient, you may be thinking. Yep, it is. But, it is what it is. If you don’t like it, submit an RFC to have it changed. That’s the way this crazy thing called the internet works.

Put another way: if you don’t like it, you do not just start making things up. Apparently, enough people unclear on the concept of JSON complained about their lack of understanding that PHP now just does whatever it wants with JSON. Check this out for the details. (And to reiterate, I’m not knocking the people who aren’t clear about JSON. I was one of them too, up until I actually researched how JSON is supposed to behave.)

Imagine if the core team behind every language did that. Hey, if you don’t like the standards, just ignore them! We can explain it away with documentation, right?

Cut #987

The cavalier attitude taken by the PHP internals team on this issue is inexcusable. Yep, cavalier: a colleague who spoke to a member of the PHP internals team about this change confirmed that the break from the JSON spec is deliberate.

To make matters worse, the version number of ext/json did not change between PHP 5.2.0 and PHP 5.2.1. In both releases, ext/json claims to be at version 1.2.1, despite this significant change.

While some are lobbying to compile the definitive business case for PHP (and I even piped up and agreed that it was necessary), some PHP internals folks are effectively shooting that effort in the foot by disregarding published standards.

I’ve spent the better part of the last two years defending my choice of PHP 5 as my preferred language, first at Feedster, now at Mashery. With all the buzz about other languages these days, the case for PHP is getting harder to make. Incidents like this will not make the case for PHP any easier.

Is this a big flap over a little thing? That’s certainly one way of looking at it. I see this flagrant disregard for published specs as one more cut toward a death by a thousand cuts.

Talented and notable developers are dropping PHP, or seriously considering other languages. If PHP’s next 10 years are to be as poignant as its first, a significant attitude adjustment is required.