Tuesday, December 21, 2010

Enter the nanojit

One of Mozilla's marquee features in Firefox 4 is ever-faster JavaScript. Of course, the real speed gains are only to be had on their officially supported architectures, which now only includes x86, x86_64 and ARM, because of modern JavaScript interpreters' heavy dependence on just-in-time compilation. (See also Google's work on this for Chrome, which naturally is also Intel and ARM ISAs only.) Since PowerPC is the old and busted and x86 is the old yet new hotness, x86 gets the goodies. This doesn't mean that the pure-C++ version of Fx 4's JavaScript interpreter is a slouch (and Fx 4 is faster than Fx 3.6, plus our own compiler juicing, of course), and there is JIT support in Firefox for other architectures such as SPARC even though Mozilla does not consider SPARC a tier 1 platform. Which brings us to PPC JIT.

Some terminology: the pure C (and now C++) JavaScript interpreter is properly called SpiderMonkey and descends directly from Brendan Eich's work when he was at Netscape. All Mozilla derivatives use it; Classilla uses the last C version which was in Firefox 3.0.19, and so does Camino, while SeaMonkey and Firefox use the C++ version which was introduced in 3.5 with TraceMonkey. TraceMonkey's key addition to SpiderMonkey is a clever piece of code called the nanojit, which watches execution through code paths and remembers the "trace" (hence the name). That trace is in fact generated as direct machine code, and run as such, so it is very fast. However, the nanojit is unable to effectively optimize certain types of code, notably code with evals in it and code with many type combinations, and it must run the program to a certain extent to determine what code is "hot" (so very branchy code causes the tracing JIT to give up, because it would use too much memory and CPU time to figure out where the hotness lays). To handle those cases, Mozilla added JägerMonkey to Firefox 4.0, which is a true method JIT, borrowing the assembler that powers it (Nitro) from the enemy WebKit. Naturally, it is also limited to the currently supported architectures.

So that's our year in review; now for the diplomatic cables. If you look in the nanojit, you see tons of files for lots of architectures, including x86 (of course), ARM, SPARC, even MIPS, and of course PPC. Great, you say, the nanojit in Firefox seems to generate PowerPC code. Then you look at the configuration scripts and your heart sinks:

PowerPC nanojit doesn't get compiled or included in Firefox 3.5 or 3.6.

But wait, I hear you cry! The code is there, right? What's NativePPC.cpp for? Well, it's truly part of the nanojit, but it is incomplete. While it works as part of Tamarin, which is Adobe's implementation of ActionScript (and is indeed part of PPC Flash Player), it does not support enough instructions to work as part of Firefox or in fact any Gecko browser, and this is why your red-hot quad 2.5GHz G5 still benches slower on SunSpider than a measly Pentium 4: the P4 gets native code, we don't.

Well, that's about to change. With big thanks to Edwin Smith at Adobe who put up with my constant queries, I'm proud to announce that we're making progress on doing the obvious: expanding the PowerPC nanojit so it can be used as part of TenFourFox. Ed helped me out with some of the fiddly aspects, and we're now full speed ahead on implementing the LIR instructions needed to fully support the length and breadth of JavaScript that the Mozilla framework requires. Already it is passing most of the LIR assembler tests, and when it passes all of the supported ones, I'll start work on integrating it with the main browser. At the end of this, we'll pass our work back to Mozilla so it can be a part of Firefox, if they want it.

When will you get to play with the new hotness, and how much faster will it get? Conservatively, you can expect up to 2-3x improvement on many JavaScript benchmarks, possibly more. This is still not enough to reach x86 parity because we don't implement the method JIT (yet: more on that later when that work actually commences, if it's possible -- TraceMonkey depends heavily on Nitro, so we'll have to see how feasible such a port would actually be). However, it is still dramatically faster and wrings that much more performance out of our beloved machines. More to the point, it will work on any supported TenFourFox processor architecture.

However, when you'll get it is another matter. While this might be ready for beta 9 or 10, such a massive sea change requires heavy testing and may have subtle bugs, so it may sit 4.0 out and appear in 4.1, or it may appear in a later beta. I'll post more about this later when I have enough testing data to decide, but I've got enough working code already that I couldn't resist a little tease for y'all. Great things are afoot!

1 comment:

  1. There's an effort started to get these changes in the tree, see bug 621175 ( https://bugzilla.mozilla.org/show_bug.cgi?id=621175 )

    Cheers.

    ReplyDelete

Due to an increased frequency of spam, comments are now subject to moderation.