Ruby 1.9.1 released

January 31st, 2009 Josh

Ruby 1.9.1 was released today. You wouldn’t know it from the version number “1.9.1” but this is a big deal in the Ruby community. It’s the Python 3000 of Ruby, except it’s an even bigger deal than that because it’s not only a bunch of backward-incompatible language changes, but it’s also a totally new implementation with much better performance than the Ruby 1.8.x line.

I’m glad to see this release, because I was starting to worry it would never happen, sort of like Perl 6. Now that Perl 6 has become the Duke Nukem Forever of programming languages, I was worried that Ruby 2.0 (oops, 1.9.x now) was going to suffer the same fate, and cause stagnation in the Ruby community.

But now it’s out, and the performance numbers show it still lagging behind Python a bit, but significantly faster than Ruby 1.8. It also apparently has native threads, which Ruby 1.8 did not.

I like Ruby quite a lot as a language. But I’m finding it harder to buy into its culture and its future. One major example is the documentation (or lack thereof) that accompanies this release. Being that this is a major release with major improvements and backward-incompatible changes, you would expect a great deal of information about what has changed. But here is what the “NEWS” file (their release notes) looks like:

NEWS

This document is a list of user visible feature changes made between releases except for bug fixes.

Note that each entry is kept so brief that no reason behind or reference information is supplied with. For a full list of changes with all sufficient information, see the ChangeLog file.

Changes since the 1.8.7 release

See doc/NEWS-1.8.7 for changes between 1.8.6 and 1.8.7.

Compatibility issues

  • language core
    • New syntax and semantics
      • Block arguments are always local
      • New semantics for block arguments
      • defined? and local variables
      • Parser expects that your source code has only valid byte
        sequence in some character encoding. Use magic comments
        to tell the parser which encoding you use.
      • New semantics for constant definition in instance_eval
        or in module_eval.

And so on, for pages and pages. This list is meaningless to anyone who isn’t already familiar with the changes that have been happening in Ruby 1.9.x. There’s simply no way to know what any of it is talking about. Now the top of the file claims that the ChangeLog has rationale and reference information, but take a look for yourself: it isn’t any better.

Not to mention that there appears to be absolutely no documentation accompanying this release. No documentation about the language, no documentation about the libraries (it’s probably possible to generate some using RDoc, but no pre-built version appears to be available anywhere), just none. The one document that’s included about writing extensions makes absolutely no mention of threads, despite the fact that Ruby now has threads (which presumably extension writers have to deal with somehow).

And while one would like to think that it’s just a matter of time, my experience with Ruby libraries is that you come to expect very sparse and under-documented stuff like this.

Like I said, this is disappointing to me, because Ruby is my favorite language of its kind. I’ve written a lot of Ruby. Hell, at Amazon I started what I called the “Ruby Revolution,” and poured tons of time and energy into it. But although Matz has designed a very nice language, the support structures around that language seem more and more amateurish the more I think about them.

I mean, I’m just one guy, and Gazelle has approximately zero users at the moment, but I still write up descriptive release notes and have an extensive manual.

But for a fairer comparison, look at Python 3000. Look at its list of what’s changed since Python 2.6, in which every change references a more detailed “Python Enhancement Proposal” that gives rationale and yet further references to discussion. And look at its list of documentation: a tutorial, a language reference, a library reference, detailed information about extending and embedding, and more. It makes Ruby’s lack of documentation even more stark.

To be fair, the Pickaxe book has always been Ruby’s unofficial manual, and it appears that the Ruby 1.9 version of the book is already in “beta”. I think I’ll buy a copy just so I can have some idea of what the heck is going on with Ruby 1.9.

But it’s still hard to shake the feeling that Python is more mature, and a safer bet.

A bet for what you ask? I have another project (something impractically big and ambitious) kicking around in my head for which something like Ruby or Python is the right answer. And I’ve been trying to decide which way to go…

Update: I did just think of one significant difference between Ruby 1.9.x and Python 3000. The difference has to do with character encoding. Python has taken an “everything is Unicode” approach to character encoding: it doesn’t support any other character sets internally, which means that round-trips from any encodings that are not injective onto Unicode are lossy (revised to speak more precisely, since many people took the wrong meaning from my original wording). Ruby, on the other hand, supports arbitrary encodings, for both Ruby source files and for data that Ruby programs deal with.

This is one issue about which Ruby seems like the somewhat safer bet. “Everything is Unicode” seems fine to us westerners, but apparently isn’t as favored by people from Japan. I can’t find a reference now, but the two biggest reasons seem to be that Japanese characters all get blown up to 2 or 3 bytes or something, and Han Unification, which means that unless you get the font right, a Japanese character can get displayed as the corresponding Chinese character with the same meaning.

But regardless of the reasons, putting all your eggs in one encoding basket (as Python has done) seems less future-proof than supporting arbitrary encodings, as Ruby 1.9.x does. +1 Ruby.
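To make “supports arbitrary encodings” a little more concrete, here is a minimal sketch of how a Ruby 1.9-style string carries its encoding along with its bytes. It assumes the source file is saved as UTF-8 and that the stock Shift_JIS transcoder is available; the sample word is only an illustration.

```ruby
# encoding: utf-8

# Every String is a sequence of bytes plus an encoding tag.
utf8 = "ひらがな"
sjis = utf8.encode("Shift_JIS")   # same four characters, different bytes

puts utf8.encoding   # the source encoding declared above
puts sjis.encoding   # Shift_JIS
puts sjis.length     # length counts characters, not bytes

# Strings in different encodings do not compare equal byte-for-byte,
# so transcode one side before comparing the text.
equal_as_is      = (utf8 == sjis)
equal_transcoded = (sjis.encode("UTF-8") == utf8)
puts equal_as_is
puts equal_transcoded
```

Nothing here forces the Shift_JIS data through Unicode unless the program asks for it with `encode`.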

  1. Joe
    January 31st, 2009 at 08:09 | #1

    While Ruby 1.9.1 has native threads, they can’t be run in parallel. Ruby and Python will both need another major breaking release to get there.

    The benchmarks you cite are from mid 2008, before either the new python or ruby official release.

  2. Nathan Youngman
    January 31st, 2009 at 08:29 | #2

    Python certainly does seem to have a more mature ecosystem, which I expect has little to do with it being ~3 years older than Ruby.

    I was thinking about picking up “The Ruby Programming Language” which is reviewed as better than PickAxe and has some 1.9 coverage. But it’s a year old, and Dave Thomas has blogged about pretty significant changes to multilingual support in 1.9.1, which may only be documented in the new PickAxe. Hmm.

  3. josh
    January 31st, 2009 at 09:27 | #3

    @Joe: Yes, that’s true (that only one thread runs at a time in Ruby and Python), but I don’t think that either language is on its way to ditching its global lock. Python tried and rejected fine-grained locking because all the locking/unlocking slowed down single-threaded code by 2x.

    Yeah, the benchmarks are a bit old, but I don’t get the impression that Ruby has undergone a lot of optimization work since YARV was merged. And Python 3000 is actually slower than Python 2.x, though only by about 10%.

    @Nathan: I did end up buying the PDF pickaxe book, and its 1.9 coverage is pretty good, except that there doesn’t seem to be a comprehensive “what’s new in 1.9.x” list anywhere. Then again, the book is only in beta, so perhaps this will appear before the final version.

  4. chromatic
    January 31st, 2009 at 11:00 | #4

    One quick correction regarding Perl 6: we release a new stable version of Rakudo (Perl 6 on Parrot) on the third Tuesday of every month, and we’ve done so for over a year. You can see daily statistics of progress at http://rakudo.de/ — and I’ve never seen Python or Ruby or PHP or any other language offer regular releases, a comprehensive specification and test suite, and daily, public review of progress toward full completion.

  5. josh
    January 31st, 2009 at 11:58 | #5

    @chromatic: regular releases are great for development transparency. But Perl 6 was announced nine years ago! It has been in development for approximately one third of my life. There is no more reason to believe it will be finished in the next year than in any of the last nine. On the contrary, each year it isn’t finished makes it more likely (at least IMO) that it never will be.

    I’m sure you’re doing very good work, I just think that Perl 6 got too big and too complicated to be able to sustain itself.

  6. Rudd-O
    January 31st, 2009 at 12:10 | #6

    Um, unless the japs decide to use only 256 characters to represent their language, there is NO WAY that an asian character (more than 60.000 ideograms exist) is going to take less than two or three bytes. And, hello, this is 2008, a couple bytes here and there is NOTHING. But I’ll grant you the problem with Han unification.

  7. January 31st, 2009 at 13:16 | #7

    “Python certainly does seem to have a more mature ecosystem, which I expect has little to do with it being ~3 years older than Ruby”

    That’s not it. It goes much deeper. Python has always had a very well thought out design and documentation. I assure you that, three years ago, Python had much better documentation, a more mature ecosystem, and a more mature community than Ruby has today. It’s the core difference in the communities: Python values pragmatic, stable, sound evolution over glitz and glamour.

  8. chromatic
    January 31st, 2009 at 13:26 | #8

    Actually, Rakudo has been in development since November 2007.

    Perl 6 and Parrot have had a lot of false starts, like many other ambitious projects. For example, Python 3000 was first announced in February 2000, and it finally came out as Python 3.0 in late 2008 — and the Python developers didn’t have to invent anything as inventive as Perl 6 grammars.

    You’re welcome to doubt that any Perl 6 effort will ever release a complete implementation, but I invite everyone to look at the passing test rates for Rakudo as well as the monthly stable release cycle. Software development often defies predictions, but I trust groups which make and then meet their commitments reliably.

  9. troelskn
    January 31st, 2009 at 13:32 | #9

    > Python has taken an “everything is Unicode” approach to character encoding — it doesn’t support any other encodings.

    Unicode is not an encoding – It’s a charset. It’s also the only sane way to represent characters in memory.

  10. Paul Prescod
    January 31st, 2009 at 14:36 | #10

    As Troelskn said, Unicode is not an encoding.

    If it is a “risk”, it is risk that has been taken by Windows, Mac OS X, Java, Open Office and XML. Without getting into the technical merits, I can’t see how it is “risky” to adopt the same standard as every operating system, programming language and software program. Most of them made the switch around a decade ago, so it isn’t as if it is new and experimental technology.

    With respect to Ruby documentation: my experience is that the Ruby community uses blogs and Google as their documentation system. You figure out the part that’s interesting to you and document it. Then Google indexes it and everybody labels that feature “documented”. Yes, it’s ridiculous.

    For example, Python’s exception hierarchy is in this document:

    http://docs.python.org/library/exceptions.html#bltin-exceptions

    I have never been able to find anything more official about Ruby’s than this blog post:

    http://blog.nicksieger.com/articles/2006/09/06/rubys-exception-hierarchy

  11. January 31st, 2009 at 15:41 | #11

    “This is one issue about which Ruby seems like the somewhat safer bet. “Everything is Unicode” seems fine to us westerners, but apparently isn’t as favored by people from Japan.”

    No, this is bullshit fear mongering by the Japanese. First of all, Python’s attitude is not “everything is Unicode” in terms of encoding because… Unicode is NOT an encoding! Unicode is a mapping of code points (numbers in the Platonic sense) to characters or character parts. UTF-8 is an encoding. UTF-16 is an encoding. Unicode is not an encoding. Python 3.0’s attitude is to translate on input all bytestrings it receives into unicode strings on the basis of an encoding. What encoding? Well, you can specify whatever encoding you want. There are different defaults and so on, but it’s up to you if you specify. The same thing applies when you write a unicode string out. First you turn it into a byte string by means of some encoding, then you write it out.
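The decode-on-input, encode-on-output model described above is also available in Ruby 1.9 and later when a program wants it. The following is a minimal sketch, assuming a UTF-8 source file and a temporary file standing in for real input; the `"sjis-demo"` name is just a placeholder.

```ruby
# encoding: utf-8
require "tempfile"

sjis_bytes = "ひらがな".encode("Shift_JIS").b
decoded = nil
written = nil

Tempfile.create("sjis-demo") do |f|
  # Stand-in for an external file that arrives as Shift_JIS bytes.
  File.binwrite(f.path, sjis_bytes)

  # "external:internal" decodes Shift_JIS on input into UTF-8 strings.
  decoded = File.read(f.path, encoding: "Shift_JIS:UTF-8")

  # On output, the external encoding decides which bytes get written.
  File.write(f.path, decoded, encoding: "Shift_JIS")
  written = File.binread(f.path)
end

puts decoded.encoding        # UTF-8
puts written == sjis_bytes   # the bytes survive the round trip for this sample
```

The difference from the Python 3.0 model is that in Ruby this conversion is opt-in: omit the internal encoding and the string stays tagged as Shift_JIS.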

    “I can’t find a reference now, but the two biggest reasons seem to be that Japanese characters all get blown up to 2 or 3 bytes or something,”

    Japanese characters are always going to get blown up to 2 or 3 bytes, since there are an ass-ton of them. ‘S just the way it is. Now, it is true that when you take a Unicode string that consists primarily of Japanese text and encode into UTF-8 or UTF-16 the resulting file will be somewhat larger than if the same Unicode string were encoded with Shift-JIS. However, again Unicode is not an encoding, so if you want to use Shift-JIS, Python 3.0 won’t stand in your way. (In fact, on my Japanese Mac, the default encoding in Python 3.0 is “X-MAC-JAPANESE” and not UTF-8/16 at all.)
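The size difference is easy to measure. This is a small sketch, assuming a UTF-8 source file; the sample string is the one used elsewhere in this thread, and other text will give other ratios.

```ruby
# encoding: utf-8

text = "ひらがな平仮名"   # 7 characters

# Encode the same text three ways and record how many bytes each needs.
sizes = %w[UTF-8 UTF-16LE Shift_JIS].map do |name|
  [name, text.encode(name).bytesize]
end

sizes.each { |name, bytes| puts "#{name}: #{bytes} bytes" }
```

For this sample, UTF-8 needs 3 bytes per character (21 in total), while UTF-16LE and Shift_JIS both need 2 (14 in total).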

    “and Han Unification which means that unless you get the font right, a Japanese character can get displayed as the corresponding Chinese character with the same meaning.”

    Again, this is stupid. When Unicode was invented, some Asians complained about how the Chinese-style characters were only being encoded once, even though they are written slightly differently in different countries. The biggest example of this is 骨, where the box inside the box at the top switches from right to left depending on your font. This is not a big deal. The character is legible either way. 令 shifts around a bit more so that if you’re not familiar with other writing systems you might get confused, but here’s the first thing: Encoding this stuff differently would be like having a different encoding for cursive Q or 7 and z with the dash in the middle. In some countries different variants are more frequent, but it’s fundamentally a font issue, not a character issue. The other thing is, today in 2009, now that Unicode exists as opposed to when it was planned, there’s no alternative way to do things. Unicode encodes every character that is in Shift_JIS. Some people might bitch that it shouldn’t encode it the same way that all the characters in the Chinese standards are encoded but a) it’s too late for that and b) the only alternative one has to using Unicode is to mix different encodings into the same file, which is a spectacularly idiotic idea. If for some reason one wanted to show both ways of writing 令 (and the only purpose for that would be if you were writing a Chinese textbook in Japanese or something meta, along those lines), switching encodings in the middle of a bytestream would be insane, since it would screw up whatever programs you tried to use to handle the file. The simpler, sane solution to the so-called problem is, specify a font if you want to ensure that 令 looks a certain way. The end.

    “But regardless of the reasons, putting all your eggs in one encoding basket (as Python has done) seems less future-proof than supporting arbitrary encodings, as Ruby 1.9.x does. +1 Ruby.”

    Again, this is completely inaccurate. Python 3.0 supports arbitrary encodings. Unicode is not an encoding.

  12. Paul Prescod
    January 31st, 2009 at 15:55 | #12

    Chromatic’s history of Python 3000 is misleading (unintentionally, I hope). Python 3000 is so-named because it was not originally supposed to be a real release in any foreseeable time frame. The release was “promised” sometime before the year 3000. It was basically a way to label ideas that could not be feasibly implemented in the existing code base rather than simply dismiss them as impractical.

    Personally, my bug trackers always have a version identifier that represents the Nirvana release where all problems will be solved. It doesn’t mean that I’ve committed to release such a version during this lifetime.

    Around 2006 Guido decided that he actually did have the bandwidth to execute a backwards-incompatible Python release. At that point, Python 3000 was renamed Python 3K and a professional scoping and scheduling process began.

    The scoping mailing list starts on March 2006:

    http://mail.python.org/pipermail/python-3000/2006-March/thread.html

    On June 29, 2006, a “skeleton” schedule was released:

    http://svn.python.org/view/peps/trunk/pep-3000.txt?rev=43672&view=markup

    “At this moment, I hope to have a first alpha release out sometime in 2007; it may take
    another year after that (or more) before the first proper release, named Python 3.0.”

    The first release was in August 2007. Approximately a year and a half later the final version of Python 3.0 was released.

    I think that we all know that the Perl 6 development process has been much less predictable and not at all comparable. Even Ruby 1.9’s relatively minor update schedule has been fairly opaque, to me, at least, compared to Python’s, which has been documented since the beginning and updated regularly.

  13. Paweł Kondzior
    January 31st, 2009 at 16:09 | #13

    First link was not hyperlinked correctly. So here is the right link

    http://eigenclass.org/hiki.rb?Changes in Ruby 1.9

  14. josh
    January 31st, 2009 at 16:35 | #14

    @all the “unicode is not an encoding” people:

    http://blog.reverberate.org/2009/01/31/unicode-not-an-encoding/

    @troelskn: What makes Unicode the only sane way to store data in memory? As long as the support for a particular encoding/charset includes the operations you need (count characters, extract an individual character, extract substrings, etc.), what is so insane about storing data in its native encoding, annotated with what encoding it is? You can always dispatch these operations to the proper implementation based on the native encoding.

    @Paul: your point is taken, but it’s never too late for other cultures to go down their own path because the current crop of software isn’t doing it for them. For example, haven’t I heard rumors about China and Russia working on their own operating systems? I know it’s not for any reason having to do with Unicode, but the point is just that the world’s a big place with lots of people who have totally different perspectives than us.

    “Risk” is probably the wrong word, since it will probably never keep your software from becoming popular or succeeding. But it might keep it from being useful to someone at some point, because they really want to not be stuck with what to them are the downsides of Unicode.

    I guess the point is, why commit when you don’t have to? I haven’t studied Ruby’s charset/encoding support in depth, but it appears to demonstrate that you don’t have to.

    @Carl: your arguments boil down to “the problems with Unicode are not important because they are not important to me.” Not very convincing. Also, you say that Python 3.0 supports “arbitrary encodings.” It may be able to transcode from arbitrary encodings, but this transcoding is not lossless if everything is Unicode internally.

  15. January 31st, 2009 at 17:29 | #15

    [Paul Prescod] > I have never been able to find anything more official about Ruby’s than this blog post: http://blog.nicksieger.com/articles/2006/09/06/rubys-exception-hierarchy

    I use Zen Spider’s Ruby QuickRef for thumbnail help. (http://www.zenspider.com/Languages/Ruby/QuickRef.html) Among other things, it shows an exception hierarchy. Yet there’s no indication what version of Ruby any of this is coming from, which is a shame.

    What I find really disturbing, however, is that the exception hierarchy changes every few weeks or months! That’s not to fault zenspider. You’ll notice the blog.nicksieger.com link above advises you to ask Ruby itself to display its exception hierarchy: “that way you’ll always have an up-to-date list.”

    Yikes. I’m not at all keen on an exception hierarchy that’s so sensitive to Ruby version/release.
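Asking Ruby itself for the hierarchy can be sketched as follows. This is a minimal sketch, not the code from the linked post: `exception_tree` is a hypothetical helper name, and it assumes a Ruby recent enough to have `Module#singleton_class?` (2.1 or later).

```ruby
# Walk every loaded class, keep the ones descending from `root`,
# and return them as indented lines, one level of indent per generation.
def exception_tree(root = Exception)
  classes = ObjectSpace.each_object(Class).select do |klass|
    !klass.singleton_class? && klass <= root
  end
  children = classes.group_by(&:superclass)

  lines = []
  walk = lambda do |klass, depth|
    lines << ("  " * depth) + klass.inspect
    (children[klass] || []).sort_by(&:inspect).each do |child|
      walk.call(child, depth + 1)
    end
  end
  walk.call(root, 0)
  lines
end

tree = exception_tree
puts tree
```

The output reflects whatever is loaded in the running interpreter, so it varies with the Ruby version and with any libraries that have been required.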

  16. Paul Prescod
    January 31st, 2009 at 17:56 | #16

    @josh: “I guess the point is, why commit when you don’t have to? I haven’t studied Ruby’s charset/encoding support in depth, but it appears to demonstrate that you don’t have to.”

    You’ve got two different engineering decisions made. One is more flexible and yet many extremely knowledgeable software developers make the other choice. One system has been in widespread use for about a decade. The other has had its first production release a week ago. I’d suggest you reserve judgement until you have some data.

    For example, can you answer these questions?

    how do the APIs compare in complexity and power? For example, are operations between strings from different encoding sources predictable? Is the regular expression engine equally powerful for all encodings?

    how does the performance compare? For example, how fast is it to find the last character in a string, or slice a chunk based on indexes at the beginning and end? How is regular expression performance?

    I mean sure: if there is NO COST then it’s great to have more options. But when did you ever see an engineering decision where one option gave you more flexibility at NO COST and yet competent engineers consistently (with one or two exceptions) make the opposite choice?

  17. josh
    January 31st, 2009 at 18:20 | #17

    @Paul: You suggest that I reserve judgment and yet you appear to have made up your mind.

    I don’t think there’s any requirement that the regular expression engine be “equally powerful for all encodings.” If it’s just as powerful for Unicode as Python is (which I suspect it is — I’m sure all regular expression implementers who want good Unicode support are reading this these days), but less powerful for other encodings, it still has Python beat.

    Without having looked at a single line of any of these implementations, I feel fairly confident in claiming that the performance difference is roughly one indirect branch per operation you want to perform in time, and one byte per string of space. Because the implementation is: store the encoding as 1-byte integer into a table of encodings, and then dispatch the operations by doing a table lookup into the table of encodings and branch to the appropriate implementation for the string you’re currently manipulating/inspecting.
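The table-dispatch idea described above can be sketched in a few lines. This is an illustration of the scheme only, not how any Ruby implementation actually stores strings: `ENCODING_OPS`, `TaggedString` and the three-entry table are all hypothetical, and a real table would hold many more operations per encoding.

```ruby
# encoding: utf-8

# Hypothetical operation table; a string's encoding id is an index into it.
ENCODING_OPS = [
  { name: "US-ASCII", char_count: ->(bytes) { bytes.bytesize } },
  # In UTF-8, every byte that is not a continuation byte starts a character.
  { name: "UTF-8",    char_count: ->(bytes) { bytes.each_byte.count { |b| (b & 0xC0) != 0x80 } } },
  { name: "UTF-32LE", char_count: ->(bytes) { bytes.bytesize / 4 } }
].freeze

# Raw bytes plus a small integer naming the encoding.
TaggedString = Struct.new(:bytes, :enc_id) do
  def char_count
    # One table lookup, then one indirect call into the right implementation.
    ENCODING_OPS.fetch(enc_id)[:char_count].call(bytes)
  end
end

ascii = TaggedString.new("abc".b, 0)
utf8  = TaggedString.new("ひらがな".b, 1)
utf32 = TaggedString.new("ひらがな".encode("UTF-32LE").b, 2)

counts = [ascii, utf8, utf32].map(&:char_count)
p counts
```

The dispatch itself is cheap; what differs per encoding is the cost of the operation behind it, as the UTF-8 entry (a full scan) and the UTF-32LE entry (a division) show.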

    “I mean sure: if there is NO COST then it’s great to have more options. But when did you ever see an engineering decision where one option gave you more flexibility at NO COST and yet competent engineers consistently (with one or two exceptions) make the opposite choice?”

    I think that western engineers are very prone to treating text encoding as a problem solved completely by Unicode. Which I will agree it mostly is, but I think it fits perfectly into the engineer mentality to subscribe to a solution that very much looks as though it has exhaustively solved a particular problem.

    I think your best point is the one about complexity/predictability when it comes to using the APIs. But I somehow doubt that it will add much more complexity than the text/bytes distinction that both languages already had to introduce to support encodings properly. I will be interested to get more experience on this point however.

  18. Isaac Gouy
    January 31st, 2009 at 18:21 | #18

    josh > Yeah, the benchmarks are a bit old, but…
    Shiny new: Python 3, Ruby 1.9

  19. thedarky
    January 31st, 2009 at 19:10 | #19

    The reason why Ruby’s core documentation is so lacking or in some cases simply nonexistent has nothing to do with Ruby’s culture and everything to do with the fact that AFAIK nobody from the Japanese Ruby core group speaks English very well. I would even go as far as to say that their English is very bad. It is *not* their fault and there is very little that they can reasonably do about it.

    Now I’d guess that the majority of developers have no idea how difficult and time consuming it is to write in one’s non native language (in case you wouldn’t have guessed: it is really, really hard). The Ruby core dev team has basically two options: write docs in Japanese or write almost nothing at all. Since writing documentation in Japanese would be of little use for the Ruby community as a whole (I’d also guess the core guys would consider this to be very rude/unfair to the rest of us), what logically follows is what we have now: no docs for Ruby.

    Everybody knows that Ruby documentation sucks; the problem is that almost nobody seems to understand *why* it is so. The problem is also amplified by the fact that the core devs can’t solve the problem by themselves and are too shy/modest to ask for help from the western community.

    Does anybody else know of any other platform that is/has been faced by the same language barrier/difficulty? Have they managed to solve it somehow?

  20. gmoney
    January 31st, 2009 at 19:29 | #20

    wow. troll fodder is indeed the main ruby side effect.

  21. anon
    January 31st, 2009 at 20:57 | #21

    “That’s not it. It goes much deeper. Python has always had a very well thought out design and documentation.”

    Well, there’s something to be said about “Design by Committee”. A more meticulous, thoroughly documented design process doesn’t necessarily mean a better product.

    Still, I think both Python and Ruby are great. Ruby seriously lacks in documentation. I found this extremely frustrating when I first picked up Ruby 4 years ago. I think it’s gotten a lot better since, but a good book (like The Ruby Programming Language) is still needed. I actually feel the pickaxe was debilitating to a lot of people, since it didn’t do a good job of explaining how to use Ruby’s best features. I fear that it will be a while before we get a book that does a good job of Ruby 1.9 features.

    It would be awesome if some intrepid western developer would spend the time to write thorough, free documentation like Python has.

  22. Paul Prescod
    January 31st, 2009 at 22:03 | #22

    josh: “You suggest that I reserve judgment and yet you appear to have made up your mind.”

    No, I haven’t made up my mind. I would love to hear that Ruby has solved the various design challenges with no trade-offs.

    But I’ve been around the discussion of the implementation of multi-lingualization in many languages and know that the idea of going beyond Unicode is frequently embraced until the cost in complexity and performance is considered. As such, I’m skeptical that Ruby has found the magic wand to sweep those costs aside. Sometime next year it will all be documented and benchmarked and we’ll know.

    But just to prove my point, try running this program in Ruby 1.8 and Ruby 1.9.1.

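    # encoding: utf-8

    # Build a long string of multibyte characters, then index it from the end.
    data = "ひらがな平仮名" * 10**5
    for i in 1..data.size do
      data[-i]  # negative index: the i-th element counting from the end
    end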

    On my computer it ran in less than a second in Ruby 1.8 and Python (the equivalent) but I had to stop Ruby 1.9 after 23 minutes. I got bored waiting.

    I thought I would help Ruby out by converting to a different encoding like UTF-16 or UCS-4. But no matter what I tried, I got an error message:

    $ ruby19 -e "'abc'.encode('UTF-16')"
    -e:1:in `encode': code converter not found (UTF-8 to UTF-16) (Encoding::ConverterNotFoundError)
    from -e:1:in `<main>'

    I had to keep checking that I was really using the “production quality” Ruby 1.9.1 release. It’s kind of weird that this doesn’t work. No encoder from UTF-8 to UTF-16? Really?

    So in 30 minutes of testing I ran into both of the problems I thought might arise: poor performance and confusing and complicated API.

    Even so, I’m not going to write off the Ruby scheme. I will say that it’s experimental and thus riskier than Python’s technique which is roughly the same as Java’s and Javascript etc. But maybe these pathological cases won’t occur in real code. Or maybe they will, and they’ll fix them in Ruby 1.9.2. Either way, the situation is not nearly as simple as you make it out. It’s not just a virtual method table per encoding.

    There are tough engineering tradeoffs and Ruby’s way has yet to prove itself as production-ready much less clearly superior.

  23. josh
    January 31st, 2009 at 22:50 | #23

    @Paul: I LOVE that you brought up specific arguments — seriously. This discussion just got some actual substance.

    I also like your examples. With regard to your first example, if it’s taking that long (and I could replicate the experiment) then it’s clearly counting from the beginning each time, which will clearly be inefficient. But that’s to be expected because it’s UTF-8.

    With regard to your attempt to help Ruby out, it appears from reading my 1.9 Pickaxe book (which you couldn’t have done because I’m guessing you haven’t bought it) that the reason for the error is that Ruby wants you to specify little-endian or big endian. Again, this makes sense if you think about it a bit, because Ruby’s exposing the fact that internally, it’s always storing the data as a set of ordered bytes. So you’ll get more luck if you do this:

    ruby19 -e "'abc'.encode('UTF-16LE')"
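What the byte order changes is visible in the encoded bytes. A minimal sketch, using only `String#encode` and `String#bytes`:

```ruby
# The same three characters in both UTF-16 byte orders.
le = "abc".encode("UTF-16LE")
be = "abc".encode("UTF-16BE")

p le.bytes   # low byte first:  [97, 0, 98, 0, 99, 0]
p be.bytes   # high byte first: [0, 97, 0, 98, 0, 99]
```

Both strings hold the same text; only the layout of each 16-bit unit differs.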

    You can see a list of all the encodings that are loaded by default by doing this:

    ./ruby -e 'Encoding.list.each {|enc| puts enc.name}'

    When I replicate your experiment with it encoded as UTF-32LE, it runs in 50 seconds on my machine. Which is clearly sub-optimal — the equivalent program using ASCII on Ruby 1.8 takes under a second on my machine. Not sure why the performance disparity — there’s certainly nothing preventing the implementation I described, which would make every encoding-specific operation about as expensive as a C virtual function call. Hopefully they will optimize this in the future.
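The UTF-32LE variant of the experiment looks roughly like this. It is a sketch with a much shorter string than the original test and it makes no timing claims; actual speed depends on the Ruby version in use.

```ruby
# encoding: utf-8

data = "ひらがな平仮名" * 1_000

# Transcode once to a fixed-width encoding: every character is 4 bytes,
# so a character's position can be computed instead of scanned for.
wide = data.encode("UTF-32LE")

# Index from the end, converting each result back to UTF-8 for display.
last_chars = (1..3).map { |i| wide[-i].encode("UTF-8") }
p last_chars
puts wide.bytesize
```

The price of the fixed width is space: this string takes 28,000 bytes as UTF-32LE against 21,000 as UTF-8.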

    So you’ve demonstrated that the Ruby 1.9 implementation of multiple encodings is not currently very efficient. I don’t think you’ve demonstrated that it’s confusing (though it still might be, I don’t have the experience to say).

    But one performance-related characteristic of the Python approach is that you always pay the cost to transcode into Unicode first. So take a program that does nothing but count the number of characters in a file. Python will always run the entire file through a transcoder first, then perform the len() operation on Unicode internally. Using the Ruby model, it could conceivably perform the least amount of work possible — nothing but an algorithm optimized to do len() on the input byte data directly, for whatever encoding the input is in. The cost to read data from the outside world into a string is essentially a memcpy (unless you want to do validation up-front). With Python you always pay a transcode up-front, unless your data is already in Python’s internal format (UCS-2? UCS-4? UTF-8? Some mixture? I don’t actually know what Python does here, and would be interested for more info.)
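Counting characters without transcoding can be sketched with `force_encoding`, which relabels a string's bytes without converting them. This is a minimal sketch assuming a UTF-8 source file; the `raw` string stands in for bytes read from a Shift_JIS file.

```ruby
# encoding: utf-8

# Stand-in for bytes read from disk with no encoding applied yet.
raw = "ひらがな平仮名".encode("Shift_JIS").b
puts raw.encoding          # ASCII-8BIT, i.e. plain bytes

# Relabel the same bytes as Shift_JIS; no conversion to Unicode happens.
tagged = raw.force_encoding("Shift_JIS")
valid  = tagged.valid_encoding?   # optional up-front validation
count  = tagged.length            # counted by Shift_JIS rules on the original bytes

puts valid
puts count
puts tagged.bytesize
```

Whether this is faster in practice than transcoding first is a separate question from whether it is possible, and depends on how well each encoding's routines are optimized.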

    So I think that in the end, Ruby’s approach actually has *greater* performance potential, though I can’t vouch for whether it’s currently optimized very well.

  24. January 31st, 2009 at 23:55 | #24

    You seem to think that nine years of development is too long, Josh. Tell us, how long should Parrot, the Perl 6 specification, and Rakudo have taken? On what do you base your estimates? How exactly is Perl 6 “too big and too complicated”? What is complicated? What should be removed to reduce complexity?

  25. James
    February 1st, 2009 at 00:18 | #25

    I don’t see why writing docs in Japanese is a problem. FLOSS has a huge volunteer translation effort; it is mostly from English into other languages, but the reverse does happen.

    I think the encoding of a string is the wrong flag to store. You should be storing the language of the string instead, as that’s what matters. Conflating the encoding to mean the language is the problem here.

    I’ll note the top two results for [han unification ruby] both reject the notion: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/18730 http://www.orbeon.com/blog/2006/07/12/ruby-not-all-rosy/

  26. February 1st, 2009 at 04:35 | #26

    I left Ruby and Rails for Python and Django about six months ago. I loved Ruby syntax but I had problems stemming from lack of documentation, weird defaults, and not-so-great libraries. I had problems getting things like timezones to work properly, stuff that I did in PHP with no problems.

    I found that Python was made for human beings. I think it’s because of Guido’s emphasis on software engineering and making tools for programmers. Python is not difficult to understand, it’s very consistent, and it does what I think it will do. The quality of its libraries and documentation is very high. I’m not a Python fanboy or anything; I’d ditch it as soon as it started to suck. I’m not a genius programmer. I use Python because it works for me.

    When I have time I work on a basic site in Japanese (http://momhawaii.org) and I have no problems with strange characters anywhere. UTF-8 might suck at presenting the Japanese language (I’ve had problems before), but with Python and Django I get back the characters that I put in. It doesn’t seem to lose any bits of data.

    Use whatever works for you. Everything has its pros and cons.

    FYI Django Setup
    Make sure your templates are in shift-jis encoding in your text editor. (I use Komodo Edit) Add
    FILE_CHARSET = 'shift_jis' and DEFAULT_CHARSET = 'shift_jis' to settings.py.
    HTML META Tag

  27. Deepak Kannan
    February 1st, 2009 at 04:55 | #27

    take a look at:
    http://eigenclass.org/hiki.rb?cmd=view&p=Changes in Ruby 1.9&key=Ruby

  28. February 1st, 2009 at 06:21 | #28

    The benchmarks you reference are quite old. For a recent comparison between Ruby VMs, take a look at the shootout I ran in December.

  29. zorg
    February 1st, 2009 at 06:41 | #29

    I have dabbled with both Ruby and Python, and personally I really cannot see what Ruby has over Python.

    The languages offer similar features (dynamic types, REPL, first-class functions and closures, OOP…) but …

    Python is faster, has great documentation, and has a huge number of libraries.

  30. Paul Prescod
    February 1st, 2009 at 12:17 | #30

    @Josh: when I build ruby-1.9.1-p0, I get only three encodings:

    ASCII-8BIT
    UTF-8
    US-ASCII

    I would have expected at least the Unicode encodings to be built by default but it seems not.
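
    For anyone wanting to reproduce this check, here is one way to ask a Ruby 1.9+ interpreter which encodings it reports. This is only a sketch: `encoding_known?` is a hypothetical helper, and the output differs between builds, so none is shown.

```ruby
# Hypothetical helper: true if this interpreter can resolve the encoding name.
def encoding_known?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  # Encoding.find raises ArgumentError for names it cannot resolve.
  false
end

# Print every encoding the running interpreter reports.
puts Encoding.list.map(&:name).sort

# Probe a few specific names.
%w[UTF-8 UTF-16LE UTF-32LE Shift_JIS].each do |name|
  puts "#{name}: #{encoding_known?(name)}"
end
```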

    With respect to programmer clarity: it is now the case that a Ruby programmer must know the details of Ruby’s implementation of a particular encoding to predict how their program will perform. In some cases, operations will be O(1) and in others O(N), depending on how clever the encoding implementer is. I know from experience that this is the kind of thing that Guido avoids like the plague. Python explicitly rejected some features of its unofficial predecessor language (ABC) because of issues like that.

    Whether or not you agree with his position, I think it is fair to say that the Ruby way is not clearly better. We’re talking about spending MULTIPLE MINUTES to process less than 100K of data. A programmer could EASILY embed that in a program and the first time it happens to load a non-ASCII file, the performance would grind to a halt. Like “tie up the Web server for an eternity” grind to a halt.

    With respect to “lazy decoding”: decoding is seldom a bottleneck in a data processing program. The data usually has to come from somewhere before it is decoded and often has to go somewhere else after it is re-encoded.

    In any case, I would not dispute (and have never disputed) that Ruby’s strategy is superior in some situations. I merely dispute that it is “better” in some platonic sense. It’s possible that a year from now we might have enough information to select one as “better in most circumstances”. Or, perhaps it will never be clear.

    BTW: your blog’s “preview” does not seem to actually preview. The post looks different in the preview and in the blog.

  31. Paul Prescod
    February 1st, 2009 at 14:05 | #31

    Curious: say I have two strings, one a UTF-8 string and one a Shift-JIS string, and I ask Ruby if they match.

    Which string is re-encoded? It matters for performance reasons.

    What if I do this matching over and over again in a loop with one of the strings varying? Does the unvarying one get re-encoded over and over again?

    Is it consistent across operators which string will be re-encoded? Or should I just encode everything to a neutral encoding “to be safe”?

    Furthermore, let’s say that a new character is standardized in a Japanese-only standard and also in Unicode and in (let’s say) two more character sets.

    How can I inform Ruby that they are really the same character? Do I have to modify the source code of four codec classes? And do I need to define 12 mappings?
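
    On the comparison question, a quick experiment is easy to write. The sketch below assumes a Ruby build with the Shift_JIS transcoder; the results noted in the comments are what recent Ruby releases produce, and 1.9.1 itself may behave differently.

```ruby
utf8 = "\u65E5\u672C\u8A9E"          # three Japanese characters, UTF-8
sjis = utf8.encode("Shift_JIS")      # same text, Shift_JIS bytes

# Strings in different encodings with non-ASCII content are not
# transcoded for comparison; they simply compare as unequal.
puts utf8 == sjis                    # false

# Transcoding explicitly, once, makes the comparison meaningful.
sjis_as_utf8 = sjis.encode("UTF-8")
puts utf8 == sjis_as_utf8            # true

# Encoding.compatible? reports whether two strings can be combined.
puts Encoding.compatible?(utf8, sjis).inspect   # nil
```

    Under that behavior neither operand is re-encoded implicitly, so in a loop the explicit `encode` of the unvarying string can be hoisted out and done once.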

  32. Leonard Chin
    February 1st, 2009 at 21:37 | #32

    You may be interested to know that there *is* an official, comprehensive documentation project for Ruby in Japanese.

    Project Home: http://doc.loveruby.net/
    Usable mirror: http://doc.okkez.net/

    1.8.x should be fully covered, though 1.9.1 is a work in progress. The project doesn’t yet have sufficient resources to do a ja -> en translation.

  33. dude
    February 2nd, 2009 at 08:36 | #33

    I would have stuck with Python were it not for the hideous syntax… I mean, __len__? Really? :’s? Really? I just can’t stomach it, I feel like I’m writing BASIC or something.

    Ruby is like grown up Python with a few issues to work through. It needs some work on the VM, primarily. Documentation I don’t find so useful, never really find the need to refer to it.

  34. Catherine
    February 2nd, 2009 at 11:19 | #34

    @thedarky: I would much rather see extensive Japanese documentation of Ruby, with volunteer translations in English, than no docs at all. Sure, documentation in any language is time-consuming, but it should be part of a “professional” release.

    If the core Ruby dev team’s problem really is lack of English fluency (and I have no reason to doubt you), then I would prefer they just wrote their docs in Japanese.

  35. Isaac Gouy
    February 2nd, 2009 at 11:32 | #35

    @Antonio Cangiano: The benchmarks you reference are quite old. For a recent comparison between Ruby VMs, take a look at the shootout I ran in December.

    The benchmarks Josh referenced are 6 months old.

    For the recent (31 Jan 2009) comparison take a look at these Ruby 1.9 :)

    For a comprehensive comparison of Ruby implementations take a look at the Ruby shootout.

  36. February 22nd, 2009 at 09:00 | #36

    Hi, I totally agree with the original comments about the lack of Ruby documentation, specification, and reference material. It has been so frustrating to learn this language.

    When you go to the Ruby doc site to download the core reference, you have to hunt around to find a downloadable version for offline perusal. Yes, there are still millions of people around the world who don’t have unlimited broadband connections to the internet!

    Compare that with other languages like PHP.

    There are always extra tedious steps you have to go through to find out more about the language.

    It really feels amateurish in terms of community or “existing infrastructure” when compared with the other “open source” languages.

    My belief is that without a drastic change in the culture and philosophy of the Ruby community, Ruby will become a marginal language some years down the road.

    Some time back, I posted some comments at the following site:

    http://www.sitepoint.com/blogs/2009/01/22/is-rubys-popularity-fading/

  37. rubyist_to_be_or_not_to_be
    February 22nd, 2009 at 09:12 | #37

    I have moved to PHP and found it a lot easier to do what I want to do with it, and also easier to find information about the language. That means less time wasted on the non-productive part and more time focusing on my goals.

  38. Nathan Youngman
    April 13th, 2009 at 01:13 | #38

    @Josh I’ve still been debating Python vs. Ruby, heh.

    re: “Python will always run the entire file through a transcoder first…”

    There is a bytes type in Python 3, so if you just wanted to grab the length or whatnot, you could use that and skip the transcoder.

  39. September 2nd, 2009 at 07:07 | #39

    I feel that doing an English doc project for Ruby from scratch would work out much better than translating all the docs from ja to en.
