The challenge of Brahmic scripts
August 22, 2019
In this introduction to a series of articles on creating fonts for Brahmic scripts, we look at the features of Brahmic scripts that present some challenges in font development.
Brahmic scripts are a large family of scripts that are descendants of the ancient Brahmi script, which originated in India around 250 BC. Most of the scripts of India are part of the family: Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Tamil, Telugu, and numerous others. In neighboring countries, Sinhala and Tibetan are major additional Brahmic scripts. In Southeast Asia, Khmer, Lao, Myanmar, and Thai are used in everyday life, while Balinese, Javanese, and numerous others find occasional use. Finally, the Siddham script spread with Buddhism all the way to Japan.
Altogether, Brahmic scripts are the primary writing systems for the native languages of some 1½ billion people. Unfortunately, illiteracy is a problem in some of the countries using them, so the number of actual users of the scripts is somewhat lower.
Brahmic scripts have some characteristics that can make it a little difficult to develop fonts for them. We’ll look at these characteristics using a made-up font that shows them in a generic way. This font relies on fairly recent rendering technology – if your operating system and browser were released in late 2016 or later, it should display as intended; if they’re older, it may be time to upgrade.
At the core of a Brahmic script are consonants that have inherent vowels: default vowels that don’t need to be written, usually in the range from /ɑ/ to /ɔ/ or /ə/. Examples, with glyphs from our generic font, are က ka, သ sa, တ ta, ရ ra. To give the consonant a different vowel, or a long vowel, a vowel mark or matra is added, which can go on any side of the consonant, for example သု su, သာ sā, သီ sī, or သေ se. There are even split matras, which have two or three parts, such as သော so. Matras that appear to the left of the consonant even though the vowel is pronounced after it, and split matras with a part on the left, can be particularly challenging for font developers.
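In Unicode text, a matra is stored after its consonant in logical (pronunciation) order, even when it is displayed on the left, so the shaping engine and the font have to move it. The following Python sketch illustrates that reordering step for a few Myanmar vowel signs. The position table and the function `visual_order` are hypothetical and only handle one consonant followed by matras; real shaping engines derive positions from Unicode's Indic positional data plus script-specific rules.

```python
import unicodedata

# Hypothetical position table for a few Myanmar vowel signs (matras).
# It only illustrates the idea; it is not a complete or authoritative list.
MATRA_POSITION = {
    "\u102C": "right",  # MYANMAR VOWEL SIGN AA, as in sā
    "\u102E": "above",  # MYANMAR VOWEL SIGN II, as in sī
    "\u102F": "below",  # MYANMAR VOWEL SIGN U, as in su
    "\u1031": "left",   # MYANMAR VOWEL SIGN E, as in se
}


def visual_order(cluster: str) -> list[str]:
    """Return the characters of a consonant + matra cluster in left-to-right
    display order: left-side matras move in front of the consonant, all other
    marks keep their logical order (the font positions above/below marks)."""
    base, *marks = cluster
    left = [m for m in marks if MATRA_POSITION.get(m) == "left"]
    others = [m for m in marks if MATRA_POSITION.get(m) != "left"]
    return left + [base] + others


# sa + E: stored consonant first, displayed with the E sign on the left.
# sa + E + AA (so): the split matra's parts end up on both sides of sa.
for syllable in ["\u101E\u1031", "\u101E\u1031\u102C"]:
    print([unicodedata.name(ch) for ch in visual_order(syllable)])
```

A real shaper also has to handle conjuncts, medials, and other marks inside the same cluster, which is where most of the difficulty lies.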
Cases where a line or word ends with a vowel-less consonant are handled in one of two ways:
- A special mark, a vowel killer or virama, is added to the consonant: ஸ் s.
- The omission of the vowel isn’t indicated at all and has to be deduced from the context: ᨔ s or sa.
For cases where the inherent vowel should be omitted because two consonants follow each other without intervening vowel, Brahmic scripts use several different mechanisms:
- A special mark, a vowel killer or virama, is inserted between the consonants: ஸ்த sta.
- The first consonant is written in a reduced half-form: স্ত sta.
- The first and second consonants merge into a conjunct: স্ত sta.
- The second consonant is written below the first one as an (often reduced) conjunct form or subjoined consonant: သ္တ sta.
- The second consonant is written after the first one as an (often reduced) conjunct form: ᬓ᭄ᬲ ksa.
- The first consonant (most commonly ra) is written in a reduced reph form on top of the second: র্ত rta.
- The first consonant is omitted – whether that happened and which consonant was omitted have to be deduced from the context: ᨈ nta or tta or ta.
- The omission of the vowel isn’t indicated at all and has to be deduced from the context: ᨔᨈ sta or sata.
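Although these mechanisms look very different on screen, Unicode encodes the virama-based ones the same way: first consonant, virama, second consonant. Whether that sequence shows up as a visible virama, a half form, a conjunct, a subjoined or post-base form, or a reph is then largely up to the font and the shaping engine. The Python sketch below checks this for the examples above by looking for characters with canonical combining class 9, the class Unicode assigns to viramas; the function name `virama_positions` is our own. The Buginese example contains no virama at all, which is exactly why it is ambiguous.

```python
import unicodedata

# Examples from the list above, spelled out as code points.
EXAMPLES = {
    "Tamil sta (visible pulli)": "\u0BB8\u0BCD\u0BA4",
    "Bengali sta (half form or conjunct)": "\u09B8\u09CD\u09A4",
    "Bengali rta (reph)": "\u09B0\u09CD\u09A4",
    "Myanmar sta (subjoined)": "\u101E\u1039\u1010",
    "Balinese ksa (post-base form)": "\u1B13\u1B44\u1B32",
    "Buginese sta or sata (nothing written)": "\u1A14\u1A08",
}


def virama_positions(text: str) -> list[int]:
    """Indices of characters with canonical combining class 9 (virama)."""
    return [i for i, ch in enumerate(text) if unicodedata.combining(ch) == 9]


for label, text in EXAMPLES.items():
    names = [unicodedata.name(ch) for ch in text]
    print(f"{label}: virama at {virama_positions(text)}; {names}")
```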
Use of these mechanisms varies widely between scripts: Devanagari has a large number of conjuncts, especially for use when writing Sanskrit, while many other scripts have none. Tamil relies almost entirely on its virama, the pulli. Myanmar extends the reph idea to several consonants beyond ra; the primary use is for nga. Javanese uses conjunct forms and a virama, while Buginese can be somewhat ambiguous due to its reliance on the last two mechanisms.
In addition to consonants and vowels, text in Brahmic scripts can contain a number of other marks, such as anusvara or chandrabindu for nasalized sounds, nukta to represent sounds from other languages, visarga for final /h/, and tone marks for tonal languages in Southeast Asia. Such marks can occur above, below, or after the base consonant. Scripts may have special characters for medial consonants ya, wa, ra, or others. Medial ra in several scripts wraps around the base consonant, as in သြ sra.
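Unicode's character properties capture only part of this. A mark's general category says whether it is non-spacing (Mn, typically placed above or below the base) or spacing (Mc, taking up horizontal room), and its canonical combining class groups marks for normalization; the exact position comes from additional data, such as Unicode's Indic positional categories, and ultimately from the font. A quick Python look at a few of the marks mentioned above, using only the standard `unicodedata` module (the helper name `describe` is our own):

```python
import unicodedata

MARKS = [
    "\u0901",  # Devanagari candrabindu (nasalization)
    "\u0902",  # Devanagari anusvara (nasalization)
    "\u093C",  # Devanagari nukta (sounds from other languages)
    "\u0903",  # Devanagari visarga (final /h/)
    "\u0E48",  # Thai tone mark mai ek
    "\u103C",  # Myanmar medial ra, which wraps around the base
]


def describe(mark: str) -> tuple[str, str, int]:
    """Name, general category (Mn non-spacing, Mc spacing) and canonical
    combining class of a mark, straight from Unicode's character data."""
    return unicodedata.name(mark), unicodedata.category(mark), unicodedata.combining(mark)


for mark in MARKS:
    print(*describe(mark), sep=" | ")
```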
Most Brahmic scripts also have independent vowels, which can be used when a syllable starts with a vowel rather than a consonant. It’s not unusual to attach other marks to such vowels.
Overall, this means that text in a Brahmic script can’t be treated as a simple sequence of characters that flows in a single direction. Instead, it has to be treated as a sequence of clusters, each of which is a two-dimensional arrangement of glyphs. A cluster consists of a base, which could be a consonant, a conjunct, or an independent vowel, and various marks attached to it, including conjunct forms, reph forms, vowel marks, and other marks. Complex clusters such as သ္တြော stro are common. Clusters in many cases represent syllables, but this is not guaranteed – for example, final consonants of syllables may become separate clusters or even the base of a cluster that integrates the beginning of the following syllable.
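To make the cluster model concrete, here is a deliberately simplified Python segmenter. Its rules are an assumption of ours, not Unicode's grapheme cluster algorithm or any shaping engine's: a letter (consonant or independent vowel) starts a new cluster unless it directly follows a virama, and combining marks attach to the current cluster. It ignores, for example, that a visible virama such as the Tamil pulli normally ends a cluster.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Split text into rough clusters: a base plus attached marks (simplified)."""
    clusters: list[str] = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")
        # A letter right after a virama becomes a conjunct, subjoined or
        # post-base form in the same cluster instead of starting a new one.
        after_virama = (
            category.startswith("L")
            and bool(clusters)
            and unicodedata.combining(clusters[-1][-1]) == 9
        )
        if clusters and (is_mark or after_virama):
            clusters[-1] += ch
        else:
            clusters.append(ch)  # new base: consonant, independent vowel, or other
    return clusters


# Myanmar "stro": sa, virama, ta, medial ra, vowel signs E and AA -> one cluster.
print(split_clusters("\u101E\u1039\u1010\u103C\u1031\u102C"))
# Devanagari "namaste": na | ma | sa + virama + ta + vowel sign E.
print(split_clusters("\u0928\u092E\u0938\u094D\u0924\u0947"))
```

Note that the clusters come out in logical order; arranging their glyphs in two dimensions is still the font's job.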
Sequences of clusters usually flow from left to right. Spacing them horizontally is not always easy, though: below-base conjunct forms, groups of above-base marks written side by side, and other marks that are wider than the bases they attach to often mean that clusters need additional spacing to keep their above- or below-base marks from colliding with those of neighboring clusters.
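The collision check can be sketched in Python under a strongly simplified, hypothetical model: each cluster has an advance width and, for each vertical zone ("above", "below"), the horizontal extent of its ink in that zone, in font units relative to the cluster's origin. The names `Cluster` and `extra_spacing`, the metrics, and the 50-unit minimum gap are all assumptions for illustration; real fonts address this with contextual positioning rules or by adjusting glyph widths.

```python
from dataclasses import dataclass, field

# (left, right) horizontal ink extent in font units, relative to the cluster origin.
Extent = tuple[float, float]


@dataclass
class Cluster:
    """Hypothetical metrics for one cluster."""
    advance: float                                          # advance width of the base
    zones: dict[str, Extent] = field(default_factory=dict)  # ink per zone, e.g. "above"


def extra_spacing(left: Cluster, right: Cluster, min_gap: float = 50) -> float:
    """Extra space to insert between two adjacent clusters so that ink in the
    same zone stays at least min_gap apart (0 if nothing collides)."""
    extra = 0.0
    for zone, (_, left_end) in left.zones.items():
        if zone not in right.zones:
            continue
        right_start = right.zones[zone][0]
        # The right cluster starts at left.advance, so its ink in this zone
        # begins at left.advance + right_start; it must not start before
        # left_end + min_gap.
        extra = max(extra, left_end + min_gap - (left.advance + right_start))
    return extra


# A base with a wide subjoined consonant, followed by a cluster whose own
# below-base form overhangs to the left.
a = Cluster(advance=500, zones={"below": (100, 650)})
b = Cluster(advance=480, zones={"below": (-50, 400)})
print(extra_spacing(a, b))  # 250: 650 + 50 - (500 - 50)
```

Adding the space only when two clusters actually collide keeps ordinary text tight, at the cost of uneven spacing that a designer may still want to tune by hand.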
Traditionally, the bases of clusters in many Brahmic scripts were thought of as hanging from a top line. In Devanagari, Bengali, Gurmukhi, and Tibetan this top line is clearly visible as part of the base glyphs; in other scripts it may show in auxiliary lines in manuscripts, for example in some Javanese ones. Beneath the base may be one, two, or occasionally more subjoined consonants and other marks; above it one or sometimes two marks. This vertical stacking of glyphs means that some scripts need significant vertical space for each line – in manuscripts for such scripts it’s not unusual that the total line height is three times the height of typical base glyphs. This can cause difficulties when text in a Brahmic script is combined with text in a more linear script: Either the line height is set for the Brahmic script, leaving lots of unused space around the text in the other script, or it is set for the other script, and severe workarounds may be necessary to squeeze text in the Brahmic script into inadequate space.
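The arithmetic behind this trade-off is simple. The Python sketch below uses made-up metrics, not measurements from any real font or manuscript: a base height of 500 units and stacking levels of 250 units. Two stacked levels above and two below already give a line three times the base height, and text in another script that is 700 units tall would leave more than half of such a line empty. The function names are hypothetical.

```python
def stack_height(base: float, above: int, below: int, level: float) -> float:
    """Height of a cluster with `above` stacked levels of marks over the base
    and `below` levels of subjoined forms and marks under it, each `level` tall."""
    return base + (above + below) * level


def unused_fraction(line_height: float, text_height: float) -> float:
    """Share of the line height left empty by text of the given height."""
    return 1 - text_height / line_height


# Hypothetical metrics in font units.
BASE, LEVEL = 500, 250
brahmic_line = stack_height(BASE, above=2, below=2, level=LEVEL)  # 1500, three times BASE
latin_text = 700  # assumed ascender-to-descender height of the other script
print(brahmic_line / BASE, round(unused_fraction(brahmic_line, latin_text), 2))
```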
All these issues can make the development of fonts for Brahmic scripts quite challenging.
I’d like to thank Muthu Nedumaran and Menasse Zaudou for their feedback on a draft of this article.
Peter T. Daniels, William Bright (eds.): The world’s writing systems. Oxford University Press, 1996. Sections 30–45 cover the major Brahmic scripts.
The Unicode Consortium: The Unicode Standard, Version 12.0. The Unicode Consortium, 2019. Chapters 12–17 describe the 62 Brahmic scripts that are currently encoded (and a few non-Brahmic scripts). Chapter 12: Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam. Chapter 13: Sinhala, Newa, Tibetan, Limbu, Meetei Mayek, Chakma, Lepcha, Saurashtra, Masaram Gondi, Gunjala Gondi. Chapter 14: Brahmi, Bhaiksuki, Phags-pa (the only Brahmic script written from top to bottom), Marchen, Zanabazar Square, Soyombo. Chapter 15: Syloti Nagri, Kaithi, Sharada, Takri, Siddham, Mahajani, Khojki, Khudawadi, Multani, Tirhuta, Modi, Nandinagari, Grantha, Ahom, Sora Sompeng, Dogra. Chapter 16: Thai, Lao, Myanmar, Khmer, Tai Le, New Tai Lue, Tai Tham, Tai Viet, Kayah Li, Cham. Chapter 17: Tagalog, Hanunóo, Buhid, Tagbanwa, Buginese, Balinese, Javanese, Rejang, Batak, Sundanese, Makasar.
Ethnologue. SIL International, 2019. Includes the numbers of native speakers of languages that are normally written in Brahmic scripts. The largest populations, rounded to millions: Hindi 341, Bengali 228, Marathi 83, Telugu 82, Tamil 75, Gujarati 56, Bhojpuri 52, Kannada 44, Malayalam 37, Oriya 37, Maithili 34, Burmese 33, Eastern Punjabi 33, Magahi 21, Thai 21, Marwari 21, Nepali 17, Khmer 17, Chhattisgarhi 16, Assamese 15, Sinhala 15, Chittagonian 13, Sylheti 10.
J. Noorduyn: Variation in the Bugis/Makasarese script. In: Bijdragen tot de Taal-, Land- en Volkenkunde, Manuscripts of Indonesia 149 (1993). Mentions that Buginese generally does not express consonant gemination or syllable-final consonants.
John Hudson: Problems of adjacency. 2014. Includes a discussion of spacing issues in Telugu.
John Hudson: Constrained. Unconstrained. Variable. 2018. A discussion of the issues of fitting fonts for Indic scripts into pre-defined user interfaces.
Annabel Teh Gallop: Javanese manuscript art: Serat Selarasa. British Library, 2014.