Wishlist for JSON

JSON is great. It’s a nice implementation of the “heterogeneous structure of hashes and arrays” pattern that I have come across over and over. Just off the top of my head:

  • Perl, Python, Ruby, JavaScript, and pretty much any language like them use them as their basic data structure
  • The Tibco Rendezvous message format.
  • The NeXTSTEP property list file format.
  • The XML-RPC data format (even though it’s encoded in XML).
  • PDF files use them to specify data
  • Other times I can’t mention because I encountered them inside companies

This pattern is seriously everywhere. This is why JSON is nice: now there’s a sort of quasi-standard encoding people can reach for when they encounter this problem.

However, there are two things I reeeaallly wish it had, that many of the formats I mentioned above do have.

  1. A raw binary type. All it’s got right now are Unicode strings, which are great if you’ve got known-good Unicode data. But if you want to dump data of unknown encoding, or that might be malformed Unicode, or that is just a big binary blob, JSON can’t help you. You’d have to convert it to 7-bit text first using an encoding like quoted-printable or Base64. This is a bummer, and as an extra bummer, the data would no longer be self-describing: you’d have to store elsewhere (or know out-of-band) that the data is encoded this way.
  2. A date type. Right now your only option is to encode a date as a string or a number. I kind of like this guy’s proposal for doing that. He proposes encoding dates like so:
    ["FOO": @2007-06-28@]
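
To make the two workarounds above concrete, here is a minimal Python sketch of what you end up doing today. The field names "blob" and "day" and the helper functions are made up for illustration. The point is that after parsing, both values are plain strings, so the reader has to know out-of-band which fields to decode.

```python
import base64
import json
from datetime import date
from typing import Tuple


def encode_record(blob: bytes, day: date) -> str:
    """Serialize raw bytes and a date using only types JSON already has."""
    return json.dumps({
        # Hypothetical field names; Base64 turns arbitrary bytes into ASCII text.
        "blob": base64.b64encode(blob).decode("ascii"),
        # ISO 8601 calendar date, e.g. "2007-06-28".
        "day": day.isoformat(),
    })


def decode_record(text: str) -> Tuple[bytes, date]:
    """Reverse encode_record. The caller must already know which fields to decode."""
    raw = json.loads(text)
    return base64.b64decode(raw["blob"]), date.fromisoformat(raw["day"])


blob = b"\xff\xfe not valid UTF-8 \x00"
text = encode_record(blob, date(2007, 6, 28))

# A generic JSON parser sees nothing but strings here.
plain = json.loads(text)
```

Nothing in the JSON text itself says that "blob" is Base64 or that "day" is a date, which is exactly the loss of self-description complained about above.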

With these two features added, JSON would be a lot more useful. Yeah, it wouldn’t be JavaScript compatible any more, but these features should make it into JavaScript also! Apparently there is a chance that the date literal notation will make it into JavaScript v2…

  1. buffalo
    June 28th, 2007 at 16:32 | #1

    Despite my usual unreasoning desire to disagree with Josh, JSON is so good that it’s just hard to argue with. Although it works a hellofalot better in languages where you can say “give me a list or a hash or whatever you’ve got there”.

    But what do you mean by “self describing”? To me JSON is the canonical non-self describing language – all its structure is inferred in the spec and none of it comes from the data you’re getting.

  2. josh
    June 28th, 2007 at 17:14 | #2

    I call JSON self-describing because the data alone gives you a complete description of its structure. The data alone tells you whether you’ve got an array of arrays of arrays or a hash mapping strings to booleans (or something else).

    I don’t think I understand what *you* mean by “self-describing.”
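
The sense of "self-describing" used in comment #2 can be illustrated with a short Python sketch. The describe helper is hypothetical, written only for this example; it recovers a rough description of the structure from the parsed data alone, with no schema.

```python
import json


def describe(value) -> str:
    """Return a rough description of a parsed JSON value's structure."""
    if isinstance(value, dict):
        inner = sorted({describe(v) for v in value.values()})
        return "hash of " + ("/".join(inner) if inner else "nothing")
    if isinstance(value, list):
        inner = sorted({describe(v) for v in value})
        return "array of " + ("/".join(inner) if inner else "nothing")
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if value is None:
        return "null"
    if isinstance(value, (int, float)):
        return "number"
    return "string"


nested = describe(json.loads("[[[1]]]"))
flags = describe(json.loads('{"a": true, "b": false}'))
```

What this cannot recover is meaning: a Base64 string and an ordinary string both come back as "string".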

  3. June 29th, 2007 at 11:52 | #3

    I’m wary of dates, since they’re not as self-describing as they look. What calendar is it in? What time zone? With or without leap seconds? How do you represent dates before 1 AD? Dates and times have important semantics that aren’t captured in naive representations.

    “@2007-06-28@” may work for some limited cases, but to use it as a general format is asking for trouble (in the same way sending character data without a defined encoding is asking for trouble). There are standard, unambiguous representations of dates, but they might not be as human-writeable as desired.

    Also, different applications need different semantics from their date/time objects. Some don’t care about time zones; others do. Some need “time of day” or “day of week” objects. For many, date/time data is useless without being able to encode relative times and dates. I’m not sure any single interchange format can be useful in the real world while still remaining simple to write and parse. (At least with character encoding, basically everyone can agree on UTF-8.)

  4. josh
    July 2nd, 2007 at 09:41 | #4

    I see your point, Matt, but if you make some smart choices I think the problem can be defined elegantly.

    First things first: more complicated ideas like “time of day” or “day of week” definitely don’t belong in JSON. The simplest reason is that no date-related API I have ever encountered treats these as anything more complicated than an integer, which JSON already has. Part of the beauty of JSON is that the types it contains map very nicely onto all sorts of type systems and standard libraries that have evolved organically over time. No matter what language I’m using, if I want to call a function that takes or returns a “time of day” or “day of week” value, it is almost invariably passed/returned as an integer. So there’s no sense in adding something like this to JSON, any more than adding a way to annotate a string to say “this is a person’s last name.”

    As far as calendars, time zones, leap seconds, and the like, I think a nice way to reconcile it all would be to introduce two different types: timestamp and date.

    A timestamp would define a point in time, and be represented by Unix epoch seconds. Advantages: pretty much everyone uses it already, it can be extended arbitrarily both forwards and backwards by throwing more bits at the problem, it can represent arbitrary precision using real numbers, programs have easy access to it through their existing standard libraries, and almost every date library ever invented can manipulate it natively. Sure, it doesn’t work for people who have super-specialized needs, but those people KNOW that time_t doesn’t work for them. :)

    A timestamp would say nothing about calendar day, time zone, time of day, or anything like that. It would be just a timestamp.

    The other type could be a calendar day. Year, month, day, that’s it. Although this is slightly ambiguous (since it doesn’t specify the calendar), again I think we’re talking about a situation where assuming it’s Gregorian will work for the vast, vast majority of applications. And again, people for whom this doesn’t work will definitely know who they are.

    With these two types, I think you’d have a generally useful solution that covers the vast majority of programmers’ needs, and that doesn’t catch anyone off guard.
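
As a rough illustration of the two types proposed in comment #4, here is a minimal Python sketch. The class names Timestamp and CalendarDay, their fields, and their methods are hypothetical, invented for this example; the comment does not specify any concrete API or wire syntax.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta, timezone


@dataclass(frozen=True)
class Timestamp:
    """A point in time as Unix epoch seconds; no calendar or time zone attached."""
    epoch_seconds: float

    def as_datetime(self, utc_offset_hours: float = 0.0) -> datetime:
        # Interpreting the instant requires the reader to pick a zone.
        zone = timezone(timedelta(hours=utc_offset_hours))
        return datetime.fromtimestamp(self.epoch_seconds, tz=zone)


@dataclass(frozen=True)
class CalendarDay:
    """Year, month, day and nothing else; assumed to be Gregorian."""
    year: int
    month: int
    day: int

    def as_date(self) -> date:
        return date(self.year, self.month, self.day)


# Midnight UTC on 2007-06-28, expressed as epoch seconds.
instant = Timestamp(1182988800.0)
day_in_utc = instant.as_datetime().date()
day_in_utc_minus_8 = instant.as_datetime(utc_offset_hours=-8).date()

posted_on = CalendarDay(2007, 6, 28).as_date()
```

The same Timestamp lands on different calendar days depending on the zone chosen by the reader, which is why the proposal keeps the calendar day as a separate type.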
