Implementation Options for ECMAScript Internationalization API in SpiderMonkey

Norbert Lindenberg, 2013-01-07

What’s the ECMAScript Internationalization API?

The API is defined by the standard ECMA-402, developed by Ecma TC39, the committee that also maintains the ECMAScript Language Specification. Mozilla is a very active participant in TC39 (Brendan Eich, Allen Wirfs-Brock, David Herman) and sponsored the development of the Internationalization API (Norbert Lindenberg).

The API provides JavaScript applications with core internationalization features that let them better support the user’s language and culture and provide a more consistent localized experience. Until now, it has been very difficult for web application designers to do something as simple as sort names correctly according to the user’s language. The new standard changes this: it provides string comparison for sorting, number and currency formatting (such as “1.234,56 €” for German), and date and time formatting (such as “2012年12月12日” for Japanese).
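The three capabilities map onto the standard constructors Intl.Collator, Intl.NumberFormat, and Intl.DateTimeFormat. The following is a minimal TypeScript sketch, assuming an engine that ships full locale data (for example, a Node.js build with full ICU). The sample names are arbitrary, and details of the output, such as which kind of space precedes the euro sign, depend on the CLDR data the engine bundles.

```typescript
// Sorting: German collation places "Ärger" next to "Anna", whereas the
// default Array.prototype.sort compares UTF-16 code units and puts it last.
const names: string[] = ["Zoe", "Ärger", "Anna"];
const byCodeUnit = names.slice().sort();
const byGerman = names.slice().sort(new Intl.Collator("de").compare);

// Number and currency formatting for German (Germany).
const price = new Intl.NumberFormat("de-DE", { style: "currency", currency: "EUR" })
  .format(1234.56);

// Date formatting for Japanese. The time zone is pinned to UTC so that the
// calendar date does not shift with the machine's local time zone.
const japaneseDate = new Intl.DateTimeFormat("ja-JP", {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
}).format(new Date(Date.UTC(2012, 11, 12)));

console.log(byCodeUnit.join(", "));
console.log(byGerman.join(", "));
console.log(price);
console.log(japaneseDate);
```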

An important aspect of the API is that applications can choose the language and are not bound to the localization of the browser or the operating system (implementations of the API determine which languages are supported). This serves the large number of multilingual users who visit web sites in different languages, as well as users who have to use browsers that aren’t localized for their native language. In addition, the API lets applications tailor the results to their specific needs, e.g., specify the currency in which amounts are displayed, select the date-time components used in a date format, or ignore punctuation in sorting.
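Choosing the locale and tailoring results can be sketched in TypeScript as follows, again assuming an engine with full locale data. supportedLocalesOf reports which of the application's requested locales the implementation can actually serve, and each options object illustrates one of the three kinds of tailoring listed above. The locales, amounts, and strings are illustrative only.

```typescript
// The application, not the browser or OS, chooses the locales. Which ones
// are actually available is up to the implementation, so ask first.
const requested = ["fr-CA", "de"];
const available = Intl.NumberFormat.supportedLocalesOf(requested);

// Tailoring 1: specify the currency used for display.
const yen = new Intl.NumberFormat("en-US", { style: "currency", currency: "JPY" })
  .format(1234);

// Tailoring 2: select the date-time components, here only the weekday.
const weekday = new Intl.DateTimeFormat("en-US", { weekday: "long", timeZone: "UTC" })
  .format(new Date(Date.UTC(2012, 11, 12)));

// Tailoring 3: ignore punctuation when comparing strings.
const plain = new Intl.Collator("en");
const lenient = new Intl.Collator("en", { ignorePunctuation: true });
const differsPlain = plain.compare("co-op", "coop") !== 0;
const equalLenient = lenient.compare("co-op", "coop") === 0;

console.log(available, yen, weekday, differsPlain, equalLenient);
```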

Google Chrome is the first browser to ship an implementation of the API: it is prefixed in Chrome 23 and unprefixed in Chrome 24 (currently in beta). Microsoft demoed an implementation at the Unicode Conference in October 2012 but hasn’t announced release plans yet. Plans for Safari and Opera are unknown.

Implementation based on bundled ICU

The solution that’s easiest to implement and provides the highest level of functionality is to use ICU and bundle it with the Firefox download. ICU (International Components for Unicode) is a comprehensive open-source internationalization library supported primarily by IBM, Google, and Apple. An ICU-based implementation of the ECMAScript Internationalization API for Firefox is under development; a current build for Mac OS X is available.

Issues with bundling ICU

Concerns have been raised about the increase in download size, the increase in mass storage size, and the increase in RAM use caused by ICU.

Download size is seen as a problem for user acquisition, as users may cancel a download that takes too long. Mozilla doesn’t appear to have data on the correlation between download size and cancellations, though, and after going through the Firefox download experience on Windows and comparing it to the Chrome download experience, I suspect that the number of security warnings, dialogs, and cancel buttons along the way may also have a significant impact.

RAM use could be a problem on Firefox OS and Android, which have to run with very limited memory. However, operating systems typically don’t load complete libraries into RAM; they page them in as needed (possibly preloading some pages proactively). For desktop systems, RAM use was not seen as a big issue (Justin Lebar).

Mass storage size in itself wasn’t seen as a serious issue; it is, however, the easiest of the three to measure and can serve as a proxy for download size (the compression ratio appears to be about 3:1).

Steps taken to mitigate the issues

The following steps to reduce mass storage and download size have already been implemented (size numbers are for Mac OS X):

With these steps, the increase in download size for Mac OS X (which still includes each library twice) is 6.7 MB (from 47.4 MB to 54.1 MB).
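Here is the arithmetic behind these figures. The on-disk estimate applies the roughly 3:1 compression ratio mentioned earlier, so it is an approximation and not a measurement:

```latex
\begin{aligned}
\Delta_{\text{download}} &= 54.1\,\text{MB} - 47.4\,\text{MB} = 6.7\,\text{MB},
  \qquad \frac{6.7}{47.4} \approx 14\% \\
\Delta_{\text{disk}} &\approx 3 \times \Delta_{\text{download}} \approx 20\,\text{MB}
\end{aligned}
```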

Possible additional steps

The following steps could be taken to reduce the size further, but involve either product trade-offs or major engineering effort:

Steps proposed that don’t help

A few steps have been proposed, but will not help:

Implementation based on OS support

The ECMAScript Internationalization API can also be implemented on top of the internationalization APIs provided by the operating system. Some operating systems include ICU; for others, the implementation has to be adapted to different, platform-specific APIs.
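One way to picture such an adaptation layer is a small backend interface that the engine codes against, with each backend declaring which options it can honor. The TypeScript sketch below is hypothetical throughout. CollatorBackend, icuLikeBackend, limitedOsBackend, and sortWith are invented names, and both backends are simulated with the host's Intl.Collator. Neither calls a real ICU or OS function.

```typescript
// Hypothetical adaptation layer: the engine codes against one interface and
// plugs in whatever the platform provides. These names are illustrative only;
// they are not SpiderMonkey, ICU, or operating-system APIs.
interface CollatorBackend {
  readonly name: string;
  // Whether the backend can sort digit sequences by numeric value.
  readonly supportsNumeric: boolean;
  compare(locale: string, a: string, b: string, numeric: boolean): number;
}

// Simulates a full ICU backend by delegating to the host's Intl.Collator.
// (A real backend would cache the collator instead of creating one per call.)
const icuLikeBackend: CollatorBackend = {
  name: "icu",
  supportsNumeric: true,
  compare: (locale, a, b, numeric) => new Intl.Collator(locale, { numeric }).compare(a, b),
};

// Simulates a more limited OS API that has no numeric-sorting option.
const limitedOsBackend: CollatorBackend = {
  name: "os",
  supportsNumeric: false,
  compare: (locale, a, b) => new Intl.Collator(locale).compare(a, b),
};

// Sorts with numeric collation only if the backend supports it, and reports
// whether the option was actually applied so that callers can find out.
function sortWith(
  backend: CollatorBackend,
  locale: string,
  items: string[],
  numeric: boolean,
): { sorted: string[]; numericApplied: boolean } {
  const numericApplied = numeric && backend.supportsNumeric;
  const sorted = items.slice().sort((a, b) => backend.compare(locale, a, b, numericApplied));
  return { sorted, numericApplied };
}

const files = ["file10", "file2", "file1"];
const viaIcu = sortWith(icuLikeBackend, "en", files, true);
const viaOs = sortWith(limitedOsBackend, "en", files, true);
console.log(viaIcu, viaOs);
```

Reporting what was actually applied parallels resolvedOptions() in ECMA-402, which returns the effective settings. For that reason, an OS-based backend has to expose its capabilities as well as its results.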

Using OS implementations of ICU

The following caveats apply whenever ICU is used as a system library:

Firefox OS

Someone mentioned that the B2G sources include ICU, and after downloading all 9271 MB of those sources I can confirm: it’s there. I haven’t seen an actual build yet, but given that any OS needs some internationalization support, I assume ICU is actually used. It’s a somewhat old version, 4.6 from December 2010, but not too old for our purposes. Since Firefox OS targets low-end smartphones, it’s most important here not to waste resources, so we should use what’s there. SpiderMonkey is part of the OS here, so it might be possible to use ICU’s C++ APIs. We should look into upgrading to a more recent ICU version, though.

Android

Android includes ICU, but only for use by system applications. Mozilla could ask Google to add ICU to the NDK to make it available to Firefox when it is downloaded as a regular application. I hear Adobe is interested in this as well, and there’s a chance it could happen. On the other hand, Mozilla also wants OEMs and carriers to ship Firefox on devices. In that case, it may be possible to treat Firefox as a system application and give it access to ICU even without changes to the NDK.

Windows

Windows provides its own internationalization API, unrelated to ICU. The ECMAScript Internationalization API was designed to accommodate weaknesses in the Windows API, but building on it will lead to some loss in functionality compared to ICU. There is no way to implement full time zone support. Calendars are limited to the traditional calendar for each locale plus Gregorian, so, for example, there is no Islamic calendar combined with English. Only a minimal set of date and time formats is supported. It may also be difficult to obtain all the information about supported functionality that the ECMAScript Internationalization API requires.
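For comparison, the TypeScript sketch below shows what an application can request when the implementation sits on ICU, as in V8 running in a Node.js build with full ICU. The instant and time zone are arbitrary examples. islamic-civil is one of the Islamic calendar values of the Unicode "ca" locale extension key. resolvedOptions() is how the API reports which settings were actually honored, and that is the kind of information the paragraph above says may be hard to obtain from the Windows API.

```typescript
// 12:00 UTC on 2012-12-12, an arbitrary instant for illustration.
const instant = new Date(Date.UTC(2012, 11, 12, 12, 0));

// Full time zone support: any IANA zone name, not just UTC and local time.
const tokyo = new Intl.DateTimeFormat("en-US", {
  timeZone: "Asia/Tokyo",
  hour: "2-digit",
  minute: "2-digit",
  hour12: false,
});

// An Islamic calendar combined with English, selected through the
// "ca" Unicode extension key in the locale tag.
const islamicEnglish = new Intl.DateTimeFormat("en-u-ca-islamic-civil", {
  year: "numeric",
  month: "long",
  day: "numeric",
  timeZone: "UTC",
});

// resolvedOptions() reports what the implementation actually used.
console.log(tokyo.resolvedOptions().timeZone, tokyo.format(instant));
console.log(islamicEnglish.resolvedOptions().calendar, islamicEnglish.format(instant));
```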

Mac OS X

Mac OS X uses ICU internally, but doesn’t provide its interfaces, indicating that it’s not meant for applications to use. It’s technically possible to link against the library, but risky. In Mac OS X 10.7 the library is in /usr/lib, not in /usr/library as stated in the article. The recommended internationalization APIs for Mac OS X (UCCreateCollator, NSNumberFormatter, CFNumberFormatter, NSDateFormatter, CFDateFormatter) don’t seem to support some of the options specified in the ECMAScript Internationalization API, such as numeric sorting or selection of the numbering system.
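Both options can be requested through the API, as in the minimal TypeScript sketch below. It assumes an engine with full ICU data, and the sample strings are arbitrary. Numeric sorting is requested with the collator's numeric option. The numbering system is selected with the "nu" Unicode extension key in the locale tag.

```typescript
// Numeric sorting: digit sequences compare by value, so "Chapter 2"
// precedes "Chapter 10".
const numericCollator = new Intl.Collator("en", { numeric: true });
const chapters = ["Chapter 10", "Chapter 2", "Chapter 1"].sort(numericCollator.compare);

// Numbering system selection: English formatting rules, Thai digits.
const thaiFormat = new Intl.NumberFormat("en-u-nu-thai");
const thaiDigits = thaiFormat.format(123);

// Both settings are visible through resolvedOptions().
console.log(chapters, numericCollator.resolvedOptions().numeric);
console.log(thaiDigits, thaiFormat.resolvedOptions().numberingSystem);
```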

Desktop Linux

Desktop Linux distributions often include ICU, and some RPMs are available. The versions may be rather old, though: for example, CentOS 6.3 (released July 2012) includes ICU 4.2.1 (released July 2009). They may also be unsuitable for application use because of function renaming (as is the case in CentOS).

Implementation in Google Chrome

Google has decided to go with bundling ICU for all platforms. Dave Mandelin has collected some data:

The above certainly suggests to me that adding ICU is not a big deal. The biggest risk is probably still the increased download size. The Firefox stub installer, still in progress IIUC, will probably make download size matter less.

Tentative recommendations

Given available information, I recommend using different solutions for different platforms: