Alphabets In Different Languages
A spoken language may be written in one or more scripts (or writing systems), many of which are based on an alphabet of letters. Of the dozens of alphabets in use today, the most popular is the Latin alphabet, which was derived from the Greek and which many languages extend by adding letters formed with diacritical marks. While most alphabets have letters composed of lines (linear writing), there are exceptions, such as the alphabets used in Braille. How many letters are there in the German alphabet? Despite some formatting differences and the presence of the umlaut, whose effect on pronunciation depends on the letter it is placed on, the German alphabet has the 26 letters that English speakers are used to. Of course, learning an alphabet and learning a language are two different things.
The beauty of typography has no borders. While most of us work with the familiar Latin alphabet, international projects usually require quite extensive knowledge about less familiar writing systems from around the world. The aesthetics and structure of such designs can be strongly related to the shape and legibility of the letterforms, so learning about international writing systems will certainly help you create more attractive and engaging Web designs.
Pick any language you like: Arabic, Chinese, Japanese, maybe Nepali? Each is based on a different writing system, which makes it interesting to figure out how they work. Today, we’ll cover five categories of writing systems. This may sound tedious and academic, but it’s not. If you take the time to understand them, you’ll find that they all give us something special. We’ve tried to present at least one special feature of each language from which you can draw inspiration and apply to your own typography work. We’ll cover: East Asian writing systems, Arabic and Indic scripts (Brahmic).
We will cover Cyrillic, Hebrew and other writing systems in the second part of this post.
East Asian Writing Systems
Obviously, Chinese is written with Chinese characters (known in Chinese as hanzi). But Chinese characters are also used in various forms in Japanese (where they are known as kanji) and Korean (hanja). In this section, we will look at four East Asian writing systems: Chinese, Japanese, Korean and Vietnamese.
Chinese Characters
Chinese characters are symbols that do not comprise an alphabet. This writing system, in which each character generally represents either a complete one-syllable word or a single-syllable part of a word, is called logo-syllabic. This also means that each character has its own pronunciation, and there is no way to guess it. Add to this the fact that being literate in Chinese requires memorizing about 4,000 characters and you’ve got quite a language to learn. Fortunately for us, we don’t need to learn Chinese in order to appreciate the beauty of its writing.
Because many commonly used Chinese characters have 10 to 30 strokes, certain stroke orders have been recommended to ensure speed, accuracy and legibility in composition. So, when learning a character, one has to learn the order in which it is written, and the sequence has general rules, such as: top to bottom, left to right, horizontal before vertical, middle before sides, left-falling before right-falling, outside before inside, inside before enclosing strokes.
The Eight Principles of Yong
The strokes in Chinese characters fall into eight main categories: horizontal (一), vertical (丨), left-falling (丿), right-falling (丶), rising, dot (、), hook (亅) and turning (乛, 乚, 乙, etc.). The “Eight Principles of Yong” outlines how to write these strokes, which are common in Chinese characters and can all be found in the character for “yǒng” (永, which translates as “forever” or “permanence”). It was believed that practicing these principles frequently as a budding calligrapher would ensure beauty in one’s writing.
Four Treasures of the Study
“Four treasures of the study” is an expression that refers to the brush, ink, paper and ink stone used in Chinese and other East Asian calligraphic traditions. The head of the brush can be made of the hair (or feather) of a variety of animals, including wolf, rabbit, deer, chicken, duck, goat, pig and tiger. The Chinese and Japanese also have a tradition of making a brush from the hair of a newborn, as a once-in-a-lifetime souvenir for the child.
Seal and Seal Paste
The artist usually completes their work of calligraphy by adding their seal at the very end, in red ink. The seal serves as a signature and is usually done in an old style.
Horizontal and Vertical Writing
Many East Asian scripts (such as Chinese, Japanese and Korean) can be written horizontally or vertically, because they consist mainly of disconnected syllabic units, each conforming to an imaginary square frame. Traditionally, Chinese is written in vertical columns from top to bottom, with the first column on the right side of the page and the last column on the left.
In modern times, using a Western layout of horizontal rows running from left to right and being read from top to bottom has become more popular. Signs are particularly challenging for written Chinese, because they can be written either left to right or right to left (the latter being more of a traditional layout, with each “column” being one character high), as well as top to bottom.
Different Styles
In Chinese calligraphy, Chinese characters can be written in five major styles. These styles are intrinsic to the history of Chinese script.
Seal script is the oldest style and continues to be widely practiced, although most people today cannot read it. It is considered an ancient script, generally not used outside of calligraphy or carved seals, hence the name.
In clerical script, characters are generally “flat” in appearance. They are wider than the seal script and the modern standard script, both of which tend to be taller than wide. Some versions of clerical are square, and others are wider. Compared to seal script, forms are strikingly rectilinear, but some curvature and influence from seal script remains.
The semi-cursive script approximates normal handwriting, in which strokes and (more rarely) characters are allowed to run into one another. In writing in the semi-cursive script, the brush leaves the paper less often than with the regular script. Characters appear less angular and rounder. The characters are also bolder.
The cursive script is a fully cursive script, with drastic simplifications and ligatures, requiring specialized knowledge to be read. Entire characters may be written without lifting the brush from the paper at all, and characters frequently flow into one another. Strokes are modified or eliminated completely to facilitate smooth writing and create a beautiful abstract appearance. Characters are highly rounded and soft in appearance, with a noticeable lack of angular lines.
The regular script is one of the last major calligraphic styles to develop from a neatly written early-period semi-cursive form of clerical script. As the name suggests, this script is “regular,” with each stroke written slowly and carefully, the brush being lifted from the paper and all strokes distinct from each other.
Japanese
A rather different writing system is Japanese, which is syllabic, meaning that each symbol represents (or approximates) a syllable, combining to form words. No full-fledged script for written Japanese existed until the development of Man’yōgana (万葉仮名), an ancient writing system that employs Chinese characters to represent the Japanese language. The Japanese appropriated Kanji (derived from their Chinese readings) for their phonetic value rather than semantic value.
The modern kana systems, Hiragana and Katakana, are simplifications and systemizations of Man’yōgana. Thus, the modern Japanese writing system uses three main scripts: Kanji, which is used for nouns and stems of adjectives and verbs; Hiragana, which is used for native Japanese words and written in the highly cursive flowing sōsho style; and Katakana, which is used for foreign borrowings and was developed by Buddhist monks as a shorthand. In Japan, cursive script has traditionally been considered suitable for women and was called women’s script (女手 or onnade), while clerical style has been considered suitable for men and was called men’s script (男手 or otokode).
The three scripts are often mixed in single sentences.
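To make the mixing concrete, here is a minimal Python sketch that labels each character of a short sentence by script. It assumes that the prefix of a character's Unicode name is a good enough classifier for illustration; the function name `script_of` and the sample sentence are our own, not part of the original article.

```python
import unicodedata
from collections import Counter

def script_of(ch: str) -> str:
    """Classify one character by the prefix of its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    return "other"

# Hypothetical sample: "I watch television", which uses all three scripts.
sentence = "私はテレビを見る"
counts = Counter(script_of(ch) for ch in sentence)
print(counts)
```

A real classifier would also need to handle punctuation, the prolonged sound mark and rarer ideograph blocks, which this sketch lumps together or ignores.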
As we can see, the modern kana systems are simplifications of Man’yōgana. It is interesting to see how they have been simplified.
Development of hiragana from man’yōgana.
Katakana, with man’yōgana equivalents. (The segments of man’yōgana adapted into katakana are highlighted.)
Korean Squares
Korean is itself a very different writing system. It uses Hangul, a “featural” writing system. The shapes of the letters are not arbitrary but encode phonological features of the phonemes they represent.
Hangul has existed since the middle of the 15th century (approximately 1440). But tradition prevailed, and scholars continued to use Classical Chinese as the literary language, and it was not until 1945 that Hangul became popular in Korea.
Jamo (자모; 字母), or natsori (낱소리), are the units that make up the Hangul alphabet. “Ja” means letter or character, and “mo” means mother, suggesting that the jamo are the building blocks of the script. When writing out words, signs are grouped by syllables into squares. The layout of signs inside the square depends greatly on the syllable structure as well as which vowels are used.
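The grouping of jamo into syllable squares is mirrored in how Unicode encodes Hangul: every precomposed syllable sits at a position computed from its leading consonant, vowel and optional trailing consonant. The sketch below uses the constants from the Unicode Standard's Hangul syllable arithmetic; the function names `decompose` and `compose` are our own.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition arithmetic.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28

def decompose(syllable: str) -> list:
    """Split one precomposed Hangul syllable into its conjoining jamo."""
    index = ord(syllable) - S_BASE
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:  # tail index 0 means "no trailing consonant"
        jamo.append(chr(T_BASE + tail))
    return jamo

def compose(lead: int, vowel: int, tail: int = 0) -> str:
    """Build a syllable from zero-based jamo indices."""
    return chr(S_BASE + (lead * V_COUNT + vowel) * T_COUNT + tail)

for jamo in decompose("한"):
    print(unicodedata.name(jamo))
```

The same decomposition is what `unicodedata.normalize("NFD", ...)` performs on Hangul text, so the arithmetic can be checked against the standard library.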
We won’t get into the detailed rules, but here is an example for inspiration:
Vietnamese Rotation
The Vietnamese writing system in use today (called Chữ Quốc Ngữ) is adapted from the Latin alphabet, with some digraphs (i.e. pairs of characters used to write individual phonemes) and nine additional diacritics (accent marks) for tones and certain letters. Over the course of several centuries—from 1527, when Portuguese Christian missionaries began using the Latin alphabet to transcribe the Vietnamese language, to the early 20th century, when the French colonial administration made the Latin-based alphabet official—the Chinese character-based writing systems for Vietnamese gradually became limited to a small number of scholars and specialists.
However, the Chinese philosophy still exerts a strong influence. The stylized work above is by painter Tran Dat, who introduced a harmony between the shapes of Chinese and Vietnamese characters. If you rotate the first image 90 degrees counter-clockwise, you can make out the Vietnamese words. It is meant to be displayed vertically so that it appears as ancient Chinese text at first.
Arabic
Here we’ll explore the beauty of Arabic, which has many styles and techniques. The Arabic alphabet was developed from the Nabataean script (which was itself derived from the Aramaic script) and contains a total of 28 letters. These 28 letters come from 18 basic shapes, to which one, two or three dots are added, above or below the letter. Arabic uses a kind of writing system that we haven’t seen yet: an abjad, which is basically an alphabet that doesn’t have any vowels, so the reader must supply them.
Contextual Shaping
The shape of these letters changes depending on their position in the word (isolated, initial, medial or final). Here, for example, is the letter kaaf:
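On computers, the four positional shapes are normally chosen by the font at rendering time, but Unicode also keeps them as separate compatibility characters, which makes them easy to inspect. The following Python sketch lists the four presentation forms of kaf and shows that compatibility normalization folds each of them back to the single letter U+0643; the dictionary name `KAF_FORMS` is our own.

```python
import unicodedata

# Arabic Presentation Forms-B code points for the letter kaf.
KAF_FORMS = {
    "isolated": "\uFED9",
    "final": "\uFEDA",
    "initial": "\uFEDB",
    "medial": "\uFEDC",
}

for position, form in KAF_FORMS.items():
    # NFKC maps each positional shape back to the plain letter.
    base = unicodedata.normalize("NFKC", form)
    print(position, form, unicodedata.name(form), "->", unicodedata.name(base))
```

In ordinary text you would store only the plain letter and let the rendering engine pick the shape; the presentation forms exist mainly for compatibility with older systems.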
Diacritics
The Arabic script is an impure abjad, though. Short consonants and long vowels are represented by letters, but short vowels and long consonants are not generally indicated in writing. The script includes numerous diacritics, which serve to point out consonants in modern Arabic. These are nice and worth taking a look at.
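Because the vowel marks are encoded as separate combining characters, fully vocalized text can be reduced to its everyday unvocalized form by dropping them. This is a minimal Python sketch under that assumption; the helper name `strip_marks` is our own, and it removes every combining mark, not only the short vowels.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove all characters with a non-zero combining class."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))

# "kataba" written with a fatha (U+064E) after each consonant.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_marks(vocalized)
print(vocalized, "->", plain)
```

The result keeps only the three consonant letters, which is how the word would usually appear in print.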
Alif as a Unit of Proportion
Geometric principles and rules of proportion play an essential role in Arabic calligraphy. They govern the first letter of the alphabet, the alif, which is basically a straight vertical stroke.
- The height of the alif varies from 3 to 12 dots, depending on the calligrapher and style of script.
- The width of the alif (the dot) is a square impression formed by pressing the tip of the reed pen to paper. Its appearance depends on how the pen was cut and the pressure exerted by the fingers.
- The imaginary circle, which uses alif as its diameter, is a circle within which all Arabic letters could fit.
Different Styles
Arabic script has many different styles, over 100 in fact. But there are six primary styles, which can generally be distinguished as being either geometric (basically Kufic and its variations) or cursive (Naskh, Ruq’ah, Thuluth, etc.).
Kufi (or Kufic) is noted for its proportional measurements, angularity and squareness.
Thuluth means “one third,” referring to the proportion of the pen relative to an earlier style called Tumaar. It is notable for its cursive letters and use as an ornamental script.
Nasakh, meaning “copy,” is one of the earliest scripts with a comprehensive system of proportion. It is notable for its clarity for reading and writing and was used to copy the Qur’an.
Ta’liq means “hanging,” in reference to the shape of the letters. It is a cursive script developed by the Persians in the early part of the 9th century AD. It is also called Farsi (or Persian).
Diwani was developed by the Ottomans from the Ta’liq style. This style became a favorite script in the Ottoman chancellery, and its name is derived from the word “Diwan,” which means “royal court.” Diwani is distinguished by the complexity of lines within letters and the close juxtaposition of letters within words.
Riq’a is a style that evolved from Nasakh and Thuluth. It is notable for the simplicity and small movements that are required to write in it, thanks to its short horizontal stems, which is why it is the most common script for everyday use. It is considered a step up from the Nasakh script, which children are taught first. In later grades, students are introduced to Riq’a.
Teardrop-Shaped Composition
Here is an animation showing the composition of the Al Jazeera logo:
Bidirectionality
When left-to-right text is mixed with right-to-left in the same paragraph, each text should be written in its own direction, known as “bi-directional text.”
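Software decides the direction of each run from a directional class that Unicode assigns to every character. The short Python sketch below looks up that class for a few sample characters; the sample set and the name `samples` are our own choices for illustration.

```python
import unicodedata

# Sample characters and a description of each (chosen for illustration).
samples = {
    "A": "Latin letter",
    "\u0628": "Arabic letter beh",
    "\u05D0": "Hebrew letter alef",
    "5": "European digit",
    "\u0665": "Arabic-Indic digit five",
}

# 'L' is left-to-right, 'R' and 'AL' are right-to-left,
# 'EN' and 'AN' are number classes that keep their own ordering.
bidi_classes = {ch: unicodedata.bidirectional(ch) for ch in samples}
for ch, cls in bidi_classes.items():
    print(repr(ch), samples[ch], cls)
```

The digit classes are why numbers keep their left-to-right order even inside right-to-left text.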
Material Used
In case you want to try, you’ll want to know what materials to use. There are many typical tools, such as brush pens, scissors, a knife to cut the pens and an ink pot. But the traditional instrument of the Arabic calligrapher is the qalam, a pen made of dried reed or bamboo. “The traditional way to hold the pen,” wrote Safadi in 1987, “is with middle finger, forefinger and thumb well spaced out along the [pen’s] shaft. Only the lightest possible pressure is applied.”
As for the ink, you have many options: black and brown (often used because their intensity and consistency can be varied greatly) as well as yellow, red, blue, white, silver and gold. The important thing is that the greater strokes of the composition be very dynamic in their effect.
A Few Techniques
The development of Arabic calligraphy led to several decorative styles that were intended to accommodate special needs or tastes and to please or impress others. Here are a few outstanding techniques and scripts.
Gulzar is defined by Safadi (1979) in Islamic calligraphy as the technique of filling the area within the outlines of relatively large letters with various ornamental devices, including floral designs, geometric patterns, hunting scenes, portraits, small script and other motifs. Gulzar is often used in composite calligraphy, where it is also surrounded by decorative units and calligraphic panels.
Maraya or muthanna is the technique of mirror writing, where the composition on the left reflects the composition on the right.
Tughra is a unique calligraphic device that is used as a royal seal. The nishanghi or tughrakesh is the only scribe trained to write tughra. The emblems became quite ornate and were particularly favored by Ottoman officialdom.
In zoomorphic calligraphy, the words are manipulated into the shape of a human figure, bird, animal or object.
Sini
Sini is a Chinese Islamic calligraphic form for the Arabic script. It can refer to any type of Chinese Islamic calligraphy but is commonly used to refer to one with thick tapered effects, much like Chinese calligraphy. It is used extensively in eastern China, where Hajji Noor Deen is one of the famous Sini calligraphers.
Perso-Arabic Script: Nasta’liq Script
The predominant style in Persian calligraphy has traditionally been the Nasta’liq script. Although it is sometimes used to write Arabic-language text (where it is known as Ta’liq or Farsi and is used mainly for titles and headings), it has always been more popular in Persian, Turkic and South Asian spheres. It is extensively practiced as a form of art in Iran, Pakistan and Afghanistan. Nasta’liq means “suspended,” which is a good way to describe the way each letter in a word is suspended from the previous one (i.e. lower, rather than on the same level).
The Perso-Arabic script is exclusively cursive. That is, the majority of letters in a word connect to each other. This feature is also included on computers. Unconnected letters are not widely accepted. In Perso-Arabic, as in Arabic, words are written from right to left, while numbers are written from left to right. To represent non-Arabic sounds, new letters were created by adding dots, lines and other shapes to existing letters.
Indic Scripts (Brahmic)
The Indic or Brahmic scripts are the most extensive family of a kind of writing system that we haven’t looked at yet: abugidas. An abugida is a segmental writing system which is based on consonants and in which vowel notation is obligatory but secondary. This contrasts with an alphabet proper (in which vowels have a status equal to that of consonants) and with an abjad (in which vowel marking is absent or optional).
Indic scripts are used throughout South Asia, Southeast Asia and parts of Central and East Asia (e.g. Hindi, Sanskrit, Konkani, Marathi, Nepali, Sindhi and Sherpa). They are so widespread that they vary a lot, but Devanagari is the most important one.
Devanagari Ligatures and Matra
Hindi and Nepali are both written in the Devanāgarī (देवनागरी) alphabet. Devanagari is a compound word with two roots: deva, meaning “deity,” and nagari, meaning “city.” Together, they imply a script that is both religious and urban or sophisticated.
To represent sounds that are foreign to Indic phonology, additional letters have been coined by choosing an existing Devanagari letter that represents a similar sound and adding a dot (called a nukta) beneath it. It is written from left to right, lacks distinct letter cases and is recognizable by a distinctive horizontal line running along the tops of the letters and linking them together.
In addition, a few other diacritics are used at the end of words, such as the dots illustrated below and the diagonal line, called virama, drawn under the last letter of a word if it is a consonant.
One interesting aspect of Brahmic and in particular of Devanagari here is the horizontal line used for successive consonants that lack a vowel between them. They may physically join together as a “conjunct,” or ligature, a process called samyoga (meaning “yoked together” in Sanskrit). Sometimes, the individual letters can still be discerned, while at other times the conjunction creates new shapes.
Here is a close-up of a nice ligature, the ddhrya ligature:
A letter in Devanagari has the default vowel of /a/. To indicate the same consonant followed by another vowel, additional strokes are added to the consonant letter. These strokes are called matras, or dependent forms of the vowel.
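In encoded text, each matra is a separate code point that follows its consonant, and the virama is the sign that cancels the inherent vowel. Here is a minimal Python sketch that attaches a few signs to the consonant ka; the names `KA`, `SIGNS` and `syllable` are our own.

```python
import unicodedata

KA = "\u0915"  # the consonant ka, read "ka" on its own

# A few dependent vowel signs plus the virama (vowel killer).
SIGNS = ["\u093E", "\u093F", "\u0940", "\u0941", "\u094D"]

def syllable(consonant: str, sign: str) -> str:
    """Attach a dependent sign to a consonant; the sign always follows it in memory."""
    return consonant + sign

for sign in SIGNS:
    print(syllable(KA, sign), unicodedata.name(sign))
```

Note that the sign for short i is stored after the consonant even though it is drawn to its left; the rendering engine reorders it for display.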
Thai Stacking Diacritics
The writing system of Thai is based on Pali, Sanskrit and Indian concepts, and many Mon and Khmer words have entered the language.
To represent a vowel other than the inherent one, extra strokes or marks are added around the basic letter. Thai has its own system of diacritics derived from Indian numerals, which denote different tones. Interestingly, like many non-Roman scripts, it has stacking diacritics.
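Stacking shows up in encoded text as several non-spacing marks following one base consonant. The Python sketch below inspects a three-code-point Thai syllable in which a vowel sign and a tone mark both sit above the consonant; the variable names are our own.

```python
import unicodedata

# One written syllable: consonant tho thahan + vowel sara ii + tone mark mai ek.
word = "\u0E17\u0E35\u0E48"

for ch in word:
    print(unicodedata.name(ch), unicodedata.category(ch))

# Non-spacing marks (category Mn) are drawn on the base letter, here stacked vertically.
marks = [ch for ch in word if unicodedata.category(ch) == "Mn"]
print(len(word), "code points,", len(marks), "stacked marks")
```

A font has to position the tone mark above the vowel sign, which is why Thai needs more vertical room per line than Latin text.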
Tibetan Mantras
The form of Tibetan letters is based on an Indic alphabet of the mid-7th century. The orthography has not altered since the most important orthographic standardization, which took place during the early 9th century. The spoken language continues to change. As a result, in all modern Tibetan dialects, there is a great divergence of reading from the spelling.
The Tibetan script has 30 consonants, otherwise known as radicals. Syllables are separated by a tseg ་, and because many Tibetan words are monosyllabic, this mark often functions almost as a space.
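Because the tseg plays the role of a separator, splitting on it is a reasonable first step when processing Tibetan text. This is a minimal Python sketch with a four-syllable greeting as hypothetical sample data; the names `TSEG`, `parts` and `split_syllables` are our own.

```python
import unicodedata

TSEG = "\u0F0B"  # the intersyllabic mark

# Hypothetical sample: the four syllables of a common greeting, joined by tseg.
parts = ["བཀྲ", "ཤིས", "བདེ", "ལེགས"]
text = TSEG.join(parts)

def split_syllables(tibetan: str) -> list:
    """Split on the tseg and drop empty pieces left by a trailing mark."""
    return [s for s in tibetan.split(TSEG) if s]

print(unicodedata.name(TSEG))
print(split_syllables(text))
```

Splitting on the tseg yields syllables, not words, so word segmentation still needs a dictionary or other linguistic knowledge.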
As in other parts of East Asia, nobles, high lamas and persons of high rank were expected to have strong abilities in calligraphy. But the Tibetan script was done using a reed pen instead of a brush. As for a mantra, it is a sound, syllable, word or group of words that is considered capable of “creating transformation.”
The use of mantras is widespread throughout spiritual movements that are based on or off-shoots of practices from earlier Eastern traditions and religions. The mantras used in Tibetan Buddhist practice are in Sanskrit, to preserve the original mantras. Visualizations and other practices are usually done in the Tibetan language.
Vajrasattva mantra in Tibetan.
Summary
So what should you take away from this article? We have seen that Arabic and Chinese calligraphy have many different script variations. From geometric to cursive to regular script, there is no such thing as one calligraphic style for a language.
Sometimes there is even no such thing as one script per language. This is why Japanese is interesting: it is written in three different scripts that mix nicely. The construction of the Korean language is also fascinating: characters are grouped into squares that create syllables. Writing systems are ultimately diverse in construction, which makes them so interesting.
Many languages also have various components that can be used in our typography. Arabic and Thai, among many others, have a large system of diacritics. Arabic has a decorative aspect. Ligatures are directly related to our Latin alphabet but can be quite elaborate in such scripts as Devanagari.
You could do a lot to spice up your own designs. Did you catch the red Chinese seal, which contrasts with the usual black ink? Have you thought of rotating your fonts to give them a whole new look, as Vietnamese calligraphers do? What about the Arabic teardrop-shaped writing? If you missed all of this, you have no choice but to scroll back up and take a closer look.
Bonus: How to Integrate These Languages on a Website?
Working with foreign languages in international design projects can get a bit tricky. Obviously, studying the specifics of the language that you are supposed to work with will help you better anticipate users’ needs and avoid embarrassing problems or misunderstandings. Tilt.its.psu.edu presents general guidelines for integration of various international languages in websites.
Licensing
This page is based on the copyrighted Wikipedia articles (“Hindi”, “Chinese Script Styles”, “Four Treasures of the Study”, “Hangul”); it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, provided that you comply with the terms of the CC-BY-SA.
This list is a Language recognition chart. It describes a variety of simple clues one can use to determine what language a document is written in with high accuracy.
Characters
The language of a foreign text can often be identified by looking up characters specific to that language.
- ABCDEFGHIJKLMNOPQRSTUVWXYZ (Latin alphabet)
- and no other – English, Indonesian, Latin, Malay, Swahili, Zulu
- àéëïij – Dutch (Except for the ligature ij, these letters are very rare in Dutch. Even fairly long Dutch texts often have no diacritics.)
- áêéèëïíîôóúû – Afrikaans
- êôúû – West Frisian
- ÆØÅæøå – Danish, Norwegian
- single diacritics, mostly umlauts
- ÄÖäö – Finnish (BCDFGQWXZÅbcfgqwxzå only found in names and loanwords, occasionally also ŠšŽž)
- ÅÄÖåäö – Swedish (occasionally é)
- ÄÖÕÜäöõü – Estonian
- ÄÖÜäöüß – German
- Circumflexes
- ÇÊÎŞÛçêîşû – Kurdish
- ĂÎÂŞŢăîâşţ – Romanian
- ÂÊÎÔÛŴŶÁÉÍÏâêîôûŵŷáéíï – Welsh; (ÓÚẂÝÀÈÌÒÙẀỲÄËÖÜẄŸóúẃýàèìòùẁỳäëöüẅÿ used also but much less commonly)
- ĈĜĤĴŜŬĉĝĥĵŝŭ – Esperanto
- Three or more types of diacritics
- ÇĞİÖŞÜğçıöşü – Turkish
- ÁÐÉÍÓÚÝÞÆÖáðéíóúýþæö – Icelandic
- ÁÉÍÓÖŐÚÜŰáéíóöőúüű – Hungarian
- ÀÇÉÈÍÓÒÚÜÏàçéèíóòúüï· – Catalan
- ÀÂÇÉÈÊËÎÏÔŒÙÛÜŸàâçéèêëîïôœùûüÿ – French; (diacritics on uppercase characters are often optional; Ÿ and ÿ are found only in certain proper names)
- ÁÀÇÉÈÍÓÒÚËÜÏáàçéèíóòúëüï (· only in Gascon dialect) – Occitan
- ÁÉÍÓÚÂÊÔÀãõçáéíóúâêôà (ü Brazilian and k, w and y not in native words) – Portuguese
- ÁÉÍÑÓÚÜáéíñóúü ¡¿ – Spanish
- ÀÉÈÌÒÙàéèìòù – Italian
- ÁÉÍÓÚÝÃẼĨÕŨỸÑG̃áéíóúýãẽĩõũỹñg̃ - Guarani (the only language to use g̃)
- ÁĄĄ́ÉĘĘ́ÍĮĮ́ŁŃ áąą́éęę́íįį́łń (FQRVfqrv not in native words) – Southern Athabaskan languages
- ’ÓǪǪ́ āą̄ēę̄īį̄óōǫǫ́ǭúū – Western Apache
- 'ÓǪǪ́ óǫǫ́ – Navajo
- ’ÚŲŲ́ úųų́ – Chiricahua/Mescalero
- ąłńóż – Lechitic languages
- ćęśź – Polish
- ćśůź – Silesian
- ãéëòôù – Kashubian
- A, Ą, Ã, B, C, D, E, É, Ë, F, G, H, I, J, K, L, Ł, M, N, Ń, O, Ò, Ó, Ô, P, R, S, T, U, Ù, W, Y, Z, Ż – Kashubian
- ČŠŽ
- and no other – Slovene
- ĆĐ – Bosnian, Croatian, Serbian (Latin)
- ÁĎÉĚŇÓŘŤÚŮÝáďéěňóřťúůý – Czech
- ÁÄĎÉÍĽĹŇÓÔŔŤÚÝáäďéíľĺňóôŕťúý – Slovak
- ĀĒĢĪĶĻŅŌŖŪāēģīķļņōŗū – Latvian; (ŌŖ and ōŗ no longer used in most modern day Latvian)
- ĄĘĖĮŲŪąęėįųū – Lithuanian
- ĐÀẢÃÁẠĂẰẲẴẮẶÂẦẨẪẤẬÈẺẼÉẸÊỀỂỄẾỆÌỈĨÍỊÒỎÕÓỌÔỒỔỖỐỘƠỜỞỠỚỢÙỦŨÚỤƯỪỬỮỨỰỲỶỸÝỴ đàảãáạăằẳẵắặâầẩẫấậèẻẽéẹêềểễếệìỉĩíịòỏõóọồổỗốơờởỡớợùủũúụưừửữứựỳỷỹýỵ – Vietnamese
- ꞗĕŏŭo᷄ơ᷄u᷄ – Middle Vietnamese
- ā ē ī ō ū – May be seen in some Japanese texts in Rōmaji or transcriptions (see below) or Hawaiian and Māori texts.
- é – Sundanese
- ñ - Basque
- ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي (Arabic script)
- Arabic, Malay (Jawi), Kurdish (Soranî), Panjabi / Punjabi, Pashto, Sindhi, Urdu, others.
- پ چ ژ گ – Persian (Farsi)
- Brahmic family of scripts
- Bengali script
- অ আ কা কি কী উ কু ঊ কূ ঋ কৃ এ কে ঐ কৈ ও কো ঔ কৌ ক্ কত্ কং কঃ কঁ ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ৰ ল ৱ শ ষ স হ য় ড় ঢ় ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
- used to write Bengali and Assamese.
- Devanāgarī
- अ प आ पा इ पि ई पी उ पु ऊ पू ऋ पृ ॠ पॄ ऌ पॢ ॡ पॣ ऍ पॅ ऎ पॆ ए पे ऐ पै ऑ पॉ ऒ पॊ ओ पो औ पौ क ख ग घ ङ च छ ज झ ञ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ल ळ व श ष स ह ० १ २ ३ ४ ५ ६ ७ ८ ९ प् पँ पं पः प़ पऽ
- used to write, either along with other scripts or exclusively, several Indian languages including Sanskrit, Hindi, Maithili, Magahi, Marathi, Kashmiri, Sindhi, Bhili, Konkani, Bhojpuri and Nepali from Nepal.
- Gurmukhi
- ਅਆਇਈਉਊਏਐਓਔਕਖਗਘਙਚਛਜਝਞਟਠਡਢਣਤਥਦਧਨਪਫਬਭਮਯਰਲਲ਼ਵਸ਼ਸਹ
- primarily used to write Punjabi as well as Braj Bhasha, Khariboli (and other Hindustani dialects), Sanskrit and Sindhi.
- Gujarati script
- અ આ ઇ ઈ ઉ ઊ ઋ ઌ ઍ એ ઐ ઑ ઓ ઔ ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ ટ ઠ ડ ઢ ણ ત થ દ ધ ન પ ફ બ ભ મ ય ર લ ળ વ શ ષ સ હ ૠ ૡૢૣ
- used to write Gujarati and Kachchi
- Tibetan script
- ཀ ཁ ག ང ཅ ཆ ཇ ཉ ཏ ཐ ད ན པ ཕ བ མ ཙ ཚ ཛ ཝ ཞ ཟ འ ཡ ར ལ ཤ ས ཧ ཨ
- used to write Standard Tibetan, Dzongkha (Bhutanese), and Sikkimese
- АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШ (Cyrillic alphabet)
- ЙЩЬЮЯ
- Ъ – Bulgarian
- ЁЫЭ
- Ў, no Щ, І instead of И (Ґ in some variants) – Belarusian
- rarely Ъ – Russian
- ҐЄІЇ – Ukrainian
- ЉЊЏ, Ј instead of Й (Vuk Karadžić's reform)
- ЃЌЅ – Macedonian
- ЋЂ – Serbian
- ЄꙂꙀЗІЇꙈОуꙊѠЩЪꙐЬѢЮꙖѤѦѨѪѬѮѰѲѴҀ – Old Church Slavonic, Church Slavonic
- Ӂ – Romanian in Transnistria (elsewhere in Latin)
- ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩ αβγδεζηθικλμνξοπρσςτυφχψω (Greek Alphabet) – Greek
- אבגדהוזחטיכלמנסעפצקרשת (Hebrew alphabet)
- and maybe some odd dots and lines above, below, or inside characters – Hebrew
- פֿ; dots/lines below letters appearing only with א,י, and ו – Yiddish
- no dots or lines around the letters, and more than a few words end with א (i.e., they have it at the leftmost position) – Aramaic
- 漢字文化圈 – Some East Asian Languages
- and no other – Chinese
- with あいうえおの Hiragana and/or アイウエオノ Katakana – Japanese
- 위키백과에 (note commonplace ellipses and circles) – Korean
- ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏ etc. -- ㄓㄨˋㄧㄣㄈㄨˊㄏㄠˋ (Bopomofo)
- ㄪㄫㄬ -- not Mandarin
- កខគឃងចឆជឈញដឋឌឍណតថទធនបផពភមសហយរលឡអវអ្កអ្ខអ្គអ្ឃអ្ងអ្ចអ្ឆអ្ឈអ្ញអ្ឌអ្ឋអ្ឌអ្ឃអ្ណអ្តអ្ថអ្ទអ្ធអ្នអ្បអ្ផអ្ពអ្ភអ្មអ្សអ្ហអ្យអ្រអ្យអ្លអ្អអ្វ អក្សរខ្មែរ (Khmer alphabet) - Khmer
- Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք Օ Ֆ (Armenian alphabet) – Armenian
- ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ (Georgian alphabet) – Georgian
- AEIOUHKLMNPW' Hawaiian alphabet - Hawaiian
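The character lookup described in this section can be sketched in a few lines of Python. The table below copies a handful of characteristic letters from the rows above and is deliberately tiny; the names `HINTS` and `guess_language` are our own, and a letter such as ñ also appears in other languages on the chart, so the result is only a hint.

```python
# Characteristic lowercase characters taken from a few rows of the chart.
# This is an illustrative subset, not a complete or authoritative table.
HINTS = {
    "German": "ß",
    "Spanish": "ñ¿¡",
    "Turkish": "ğı",
    "Hungarian": "őű",
    "Icelandic": "ðþ",
    "Czech": "ěř",
}

def guess_language(text: str):
    """Return the language whose hint characters occur most often, or None."""
    lowered = text.lower()
    scores = {
        lang: sum(lowered.count(ch) for ch in chars)
        for lang, chars in HINTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_language("Die Straße ist lang"))
print(guess_language("¿Dónde está el niño?"))
print(guess_language("plain ASCII text"))
```

Text with no distinctive characters returns `None`, which matches the chart's "and no other" rows: those languages need the word-level cues described further down.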
Latin alphabet (possibly extended)
Romance languages
Lots of Latin roots.
French (le français)
- Accented letters: â ç è é ê î ô û, rarely ë ï ; ù only in the word où, à only in the word à and at end of words ; never á í ì ó ò ú
- Angle quotation marks: « » (though 'curly-Q' quotation marks are also used); dialogue traditionally indicated by means of dashes
- Many apostrophised contractions, i.e. words beginning with l' or d', less often c', j', m', n', s', t' — only before vowels and h
- Common words: de, la, le, du, des, il, et;
- Letter w is rare and used only in loanwords (e.g. whisky).
- Ligatures œ and æ are conventional
- Words ending in -ux, especially -aux or -eux;
Jersey Norman / Jèrriais (Jèrriais)
- Common words: lé, dé, tchi, ès, i', ch'
- Tch, dg, th and în are common character combinations. ou is frequently followed by another vowel.
- Many apostrophised short forms, e.g. words beginning with l', d' or r'. é frequently alternates with an apostrophe e.g. c'mîn/quémîn.
Spanish (Español)
- Characters: ¿ ¡ (inverted question and exclamation marks), ñ
- All vowels (á, é, í, ó, ú) may take an acute accent
- The letter u can take a diaeresis (ü), but only after the letter g
- Some words frequently used: de, el, del, los, la(s), uno(s), una(s), y
- No apostrophised contractions
- Word beginnings: ll- (check not Welsh)
- Word endings: -o, -a, -ción, -miento, -dad
- Angle quotation marks: « » (though 'curly-Q' quotation marks are also used); dialogue often indicated by means of dashes
Italian (Italiano)
- Almost every word ends in a vowel. Exceptions include non, il, per, con, del.
- Common one-letter word: è.
- Common word: perché.
- Letter sequences: gli, gn, sci.
- Letters j, k, w, x and y are rare and used only in loanwords (e.g. whisky).
- Word endings: -o, -a, -zione, -mento, -tà, -aggio.
- Grave accent (e.g., on à) almost always occurs in the last letter of words.
- Double consonants (tt, zz, cc, ss, bb, pp, ll, etc.) are frequent.
Catalan (Català)
- Character combination l·l and tz
- Letter sequences: tx (check not Basque) and tg
- Letters k and w are rare and only used in loanwords (e.g. walkman)
- Word endings: -o, -a, -es, -ció, -tat
- Word beginning: ll-
Romanian (Română)
- Characters: ă â î ș ț
- Common words: și, de, la, a, ai, ale, alor, cu
- Word endings: -a, -ă, -u, -ul, -ului, -ție (or -țiune), -ment, -tate; names ending in -escu
- Double and triple i: copii, copiii
- Note that Romanian is sometimes written online with no diacritics, making it harder to identify. A cedilla is sometimes used on S (ş) and on T (ţ) instead of the correct diacritic, the comma below (ș, ț).
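The cedilla substitution noted above can be undone mechanically. The following is a minimal sketch, assuming the text is already known to be Romanian (ş is also a regular Turkish letter); the function name `normalize_romanian` is hypothetical.

```python
# Map the cedilla forms to the comma-below letters used in correct Romanian.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
})


def normalize_romanian(text: str) -> str:
    """Replace cedilla s/t with comma-below s/t; leave everything else as is."""
    return text.translate(CEDILLA_TO_COMMA)
```

Normalizing before matching lets a check for ș and ț succeed on text typed with either form.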
Portuguese (Português)
- Characters: ã, õ, â, ê, ô, á, é, í, ó, ú, à, ç
- Common one-letter words: a, à, e, é, o
- Common two-letter words: ao, as, às, da, de, do, em, os, ou, um
- Common three-letter words: aos, com, das, dos, ele, ela, mas, não, por, que, são, uma
- Common endings: -ção, -dade, -ismo, -mente
- Common digraphs: ch, nh, lh; examples: chave, galinha, baralho.
- The letters k, w and y are rare. They are found mostly in loanwords, e.g.: keynesianismo, walkie-talkie, nylon.
- Most singular words end in a vowel, l, m, r, or z.
- Plural words end in -s.
Walloon (Walon)
- Characters: å, é, è, ê, î, ô, û
- Common digraphs and trigraphs: ai, ae, én, -jh-, tch, oe, -nn-, -nnm-, xh, ou
- Common one-letter words: a, å, e, i, t', l', s', k'
- Common two-letter words: al, ås, li, el, vs, ki, si, pô, pa, po, ni, èn, dj'
- Common three-letter words: dji, nén, rén, bén, pol, mel
- Common endings: -aedje, -mint, -xhmint, -ès, -ou, -owe, -yî, -åcion
- Apostrophes are followed by a space (preferably a non-breaking one), e.g. l' ome instead of l'ome.
Galician (Galego)
- Similar to Portuguese; the indefinite article 'unha' (fem. singular), the suffix -çom and heavier use of the letter 'x' usually signal Galician.
- Definite articles o or ó (masc. sing.), os (masc. plural), a (fem. sing.), as (fem. plural)
- Common digraphs: nh (ningunha)
- The letters j, k, w and y are not in the alphabet, and appear only in loanwords
Germanic languages
English
- words: a, an, and, in, of, on, the, that, to, is, I (should always be a capital)
- letter sequences: th, ch, sh, ough, augh
- word endings: -ing, -tion, -ed, -age, -s, -’s, -’ve, -n’t, -’d
- diacritics or accents only in loanwords (piñata)
Dutch (Nederlands)
- letter sequences ij (capitalized as IJ, and also found as a ligature, Ĳ or ĳ), ei, doubled vowels (but not ii), kw, sch, oei, ooi, and uw (especially eeuw, ieuw, auw, and ouw).
- words: het, op, en, een, voor (and compounds of voor).
- word endings: -tje, -sje, -ing, -en, -lijk
- at the start of words: z-, v-, ge-
- t/m occasionally occurs between two points in time or between numbers (e.g. house numbers).
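The IJ capitalization mentioned in the list above is easy to get wrong with generic title-casing. Here is a minimal sketch, assuming the input is a single Dutch word; the function name `capitalize_dutch` is hypothetical.

```python
def capitalize_dutch(word: str) -> str:
    """Capitalize a Dutch word, treating a leading 'ij' as one unit."""
    if word[:2].lower() == "ij":
        # Both letters of the digraph are capitalized together.
        return "IJ" + word[2:]
    return word[:1].upper() + word[1:]
```

A word such as IJsselmeer with both initial letters capitalized is therefore a useful cue for Dutch.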
West Frisian (Frysk)
- letter sequences: ij, ei, oa
- words: yn
Afrikaans (Afrikaans)
- Words: 'n, as, vir, nie.
- Similar to Dutch, but:
- the common Dutch letters c and z are rare and used only in loanwords (e.g. chalet);
- the common Dutch vowel ij is not used; instead, i and y are used (e.g. -lik, sy);
- the common Dutch word ending -en is rare, being replaced by -e.
German (Deutsch)
- umlauts (ä, ö, ü), ess-zett (ß)
- letter sequences: ch, sch, tsch, tz, ss
- common words: der, die, das, den, dem, des, er, sie, es, ist, ich, du, aber
- common endings: -en, -er, -ern, -st, -ung, -chen, -tät
- rare letters: x, y (except in loanwords)
- letter c rarely used except in the sequences listed above and in loanwords
- long compound words
- a period (.) after ordinal numbers, e.g. 3. Oktober
- many capitalised words in the middle of sentences.
Swedish (Svenska)
- letters å, ä, ö, rarely é
- common words: och, i, att, det, en, som, är, av, den, på
- long compound words
- letter sequences: stj, sj, skj, tj, ck, än, and occasionally surnames ending in -qvist
- no use of characters w, z except for foreign proper nouns and some loanwords but x is used, unlike Danish and Norwegian, which replace it with ks
Danish (Dansk)
- letters æ, ø, å
- common words: af, og, til, er, på, med, det, den;
- common endings: -tion, -ing, -else, -hed;
- long compound words;
- no use of character q, w, x and z except for foreign proper nouns and some loanwords;
- to distinguish from Norwegian: uses letter combination øj; frequent use of æ; spellings of borrowed foreign words are retained (in particular use of c), such as centralstation.
Norwegian (Norsk)
- letters æ, ø, å
- common words: av, ble, er, og, en, et, men, i, å, for, eller;
- common endings: -sjon, -ing, -else, -het;
- long compound words;
- no use of character c, w, z and x except for foreign proper nouns and some loanwords;
- two versions of the language: Bokmål (much closer to Danish) and Nynorsk – for example ikke, lørdag, Norge (Bokmål) vs. ikkje, laurdag, Noreg (Nynorsk); Nynorsk uses the word òg; printed materials almost always published in Bokmål only;
- to distinguish from Danish: uses letter combination øy; less frequent use of æ; spellings of borrowed foreign words are ‘Norsified’ (in particular removing use of c), such as sentralstasjon.
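The Danish and Norwegian cues listed above (øj vs. øy, -tion vs. -sjon, -hed vs. -het, af vs. av) can be combined into a rough score. This is only a sketch under the assumption that the text is already known to be one of the two languages; the function name and the choice of cues are illustrative, not a tested classifier.

```python
import re

# Cues taken from the two lists above; each pattern counts as one vote.
DANISH_CUES = [r"øj", r"tion\b", r"hed\b", r"\baf\b"]
NORWEGIAN_CUES = [r"øy", r"sjon\b", r"het\b", r"\bav\b"]


def danish_or_norwegian(text: str) -> str:
    """Return 'Danish', 'Norwegian' or 'undecided' by counting cue matches."""
    lowered = text.lower()
    danish = sum(len(re.findall(p, lowered)) for p in DANISH_CUES)
    norwegian = sum(len(re.findall(p, lowered)) for p in NORWEGIAN_CUES)
    if danish == norwegian:
        return "undecided"
    return "Danish" if danish > norwegian else "Norwegian"
```

Short texts often contain none of the cues, which is why the sketch returns "undecided" instead of guessing.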
Icelandic (Íslenska)
- letters á, ð, é, í, ó, ú, ý, þ, æ, ö
- common beginnings: fj-, gj-, hj-, hl-, hr-, hv-, kj-, and sj-
- common endings: -ar (especially -nar), -ir (especially -nir), -ur, -nn (especially -inn)
- no use of character c, q, w, or z except for foreign proper nouns, some loanwords, and, in the case of z, older texts.
Faroese (Føroyskt)
- letters á, ð, í, ó, ú, ý, æ, ø
- letter combinations: ggj, oy, skt
- to distinguish from Icelandic: does not use é or þ, uses ø instead of ö (occasionally rendered as ö on road signs, or even ő).
Baltic languages
Latvian (Latviešu)
- uses diacritics: ā, č, ē, ģ, ī, ķ, ļ, ņ, ō, ŗ, š, ū, ž
- does not have letters: q, w, x, y
- no longer uses ō or ŗ in modern language
- extremely rare doubling of vowels
- rare doubling of consonants
- a period (.) after ordinal numbers, e.g. 2005. gads
- common words: ir, bija, tika, es, viņš
Lithuanian (Lietuvių)
- visual abundance of letters ą, č, ę, ė, į, š, ų, ū, ž
- does not have letters q, w, x
- extremely rare doubling of vowels and consonants
- many varying forms (usually endings) of the same word, e.g. namas, namo, namus, namams, etc.
- generally long words (absence of articles and fewer prepositions in comparison to Germanic languages)
- common words: ir, yra, kad, bet.
Slavic languages
Polish (Polski)
- consonant clusters rz, sz, cz, prz, trz
- includes: ą, ę, ć, ś, ł, ó, ż, ź
- words w, z, k, we, i, na (several one-letter words)
- words jest, się
- words beginning with był, będzie, jest (forms of the copula być, 'to be').
Czech (Čeština)
- visual abundance of letters ž š ů ě ř
- words je, v
- to distinguish from Slovak: does not use ä, ľ, ĺ, ŕ or ô; ú only appears at the beginning of words.
Slovak (Slovenčina)
- visual abundance of letters ž š č;
- uses: ä, ľ, and ô and (very rarely) ĺ and ŕ;
- typical suffixes: -cia, -ť;
- to distinguish from Czech: does not use ě, ř or ů.
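The Czech and Slovak lists above name letters that occur in one language but not the other, which suggests a simple set test. A minimal sketch, assuming the text is one of the two languages; the function name `czech_or_slovak` is hypothetical.

```python
import unicodedata

# Letters the lists above give as exclusive to one of the two languages.
CZECH_ONLY = set(unicodedata.normalize("NFC", "ěřů"))
SLOVAK_ONLY = set(unicodedata.normalize("NFC", "äľĺŕô"))


def czech_or_slovak(text: str) -> str:
    """Return 'Czech', 'Slovak' or 'undecided' from language-exclusive letters."""
    # NFC makes sure accented letters are single code points before comparing.
    letters = set(unicodedata.normalize("NFC", text).lower())
    has_czech = bool(letters & CZECH_ONLY)
    has_slovak = bool(letters & SLOVAK_ONLY)
    if has_czech and not has_slovak:
        return "Czech"
    if has_slovak and not has_czech:
        return "Slovak"
    return "undecided"
```

Text containing letters from both sets, or from neither, is left undecided.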
Croatian (Hrvatski)
- similar to Serbian
- letters-digraphs dž, lj, nj
- does not have q, w, x, y
- typical suffixes: -ti, -ći
- special letters: č, ć, š, ž, đ
- common words: a, i, u, je
- to distinguish from Serbian: infixes -ije- and -je- are common, verbs ending in -irati, -iran
Serbian (Srpski/Српски)
Serbian Latin
- similar to Croatian
- letters-digraphs dž, lj, nj (lj and nj are somewhat more common than dž, although not by much)
- no q, w, x, y
- typical verb suffixes -ti, -ći (infinitive is much less used than in Croatian)
- foreign words might end in -tija, -ovan, -ovati, -uje
- special letters: đ (rare), č, š (common), ć, ž (less common)
- common words: a, i, u, je, jeste
- future tense suffix -iće, -ićeš, -ićemo, -ićete (not found in Croatian)
- infixes -ije- and -je- are very common in the Serbian spoken in Bosnia and Herzegovina, Montenegro and Croatia (ijekavica), but do not appear in Serbia, where each of these infixes is replaced with -e- (ekavica).
Serbian Cyrillic
- uses Џ, Ј, Љ, Њ, Ђ, Ћ
- does not use Щ, Ъ, Ы, Ь, Э, Ю, Я, Ё, Є, Ґ, Ї, І, Ў
- to distinguish from Macedonian: does not use Ѕ, Ѓ, Ќ
Celtic languages
Welsh (Cymraeg)
- letters Ŵ, ŵ used in Welsh
- words y, yr, yn, a, ac, i, o
- letter sequences wy, ch, dd, ff, ll, mh, ngh, nh, ph, rh, th, si
- letters not used: k, q, v, x, z
- letter only used rarely, in loanwords: j
- commonly accented letters: â, ê, î, ô, û, ŵ, ŷ, although acute (´), grave (`), and dieresis (¨) accents can hypothetically occur on all vowels
- word endings: -ion, -au, -wr, -wyr
- y is the most common letter in the language
- w between consonants (w in fact represents a vowel in the Welsh language)
- circumflex accent (^) is by far the commonest diacritical mark, although diacritics are often omitted altogether
Irish (Gaeilge)
- vowels with acute accents: á é í ó ú
- words beginning with letter sequences bp dt gc bhf
- letter sequences sc cht
- no use of the letter J, K, Q, V, W.
- frequent bh, ch, dh, fh, gh, mh, th, sh
- to distinguish from (Scottish) Gaelic: there may be words or names with the second (or even third) letter capitalized instead of the first: hÉireann.
Scottish Gaelic (Gàidhlig)
- vowels with grave accents: à è ì ò ù (é and ó still occasionally seen but usage is now discouraged)
- letter sequences sg chd
- frequent bh, ch, dh, fh, gh, mh, th, sh
- to distinguish from Irish: prefixes are hyphenated, so capitals in the middle of words generally do not occur: an t-Oban.
Albanian (Shqip)
- unique letters: ë, ç.
- ë is the most common letter in the language.
- the letter w is not used except in loanwords.
- dh, gj, ll, nj, rr, sh, th, xh, and zh are considered one letter instead of two.
- common words: po, jo, dhe, i, të, me
Maltese (Malti)
- unique letters: ċ, ġ, għ, ħ, ż
- Semitic origin, fairly intelligible with Arabic
- uses il-xxx for the definite article
Iranian languages
Kurdish (Kurdî / كوردی)
- The word xwe (oneself, myself, yourself etc.) is highly specific (xw combination) and frequent.
- The most frequent letter is i (I, i), which is equivalent to a schwa.
- Uses the circumflex (^): ê, î, û.
- Uses the cedilla (¸): ç, ş.
- Has eight vowels (a, e, ê, i, î, o, u, û); it is impossible to find a word without a vowel.
- Has many compound words.
Finno-Ugric languages
Finnish (Suomi)
- distinct letters ä and ö; but never õ or ü (y takes the place of ü)
- b, f, z, š and ž appear in loanwords and proper names only; the last two are substituted with sh or zh in some texts
- c, q, w, x appear in (typically foreign) proper names only
- outside of loanwords, d appears only between vowels or in hd
- outside of loanwords, g only appears in ng
- outside of loanwords, words do not begin with two consonants; this is reflected in the general syllable structure, where consonant clusters only occur across syllable boundaries, except in some loanwords
- common words: sinä, on
- common endings: -nen, -ka/-kä, -in, -t (plural suffix)
- common vowel combinations: ai, uo, ei, ie, oi, yö, äi
- unusually high degree of letter duplication, both vowels and consonants will be geminated, for example aa, ee, ii, kk, ll, ss, yy, ää
- frequent long words
Estonian (Eesti)
- distinct letters: õ, ä, ö and ü; but never ß or å
- similar to Finnish, except:
- letter y is not used, except in loanwords (ü is the corresponding vowel)
- letters b and g (without preceding n) are found outside of loanwords
- occasional use of š and ž, mainly in loanwords (plus combination tš)
- loanwords more common generally than in Finnish, mainly loaned from German
- words end in consonants more frequently than in Finnish, word-final b, d, v being particularly typical
- letter d is much more common in Estonian than in Finnish, and in Estonian it is often the last letter of the word (plural suffix), which it never is in Finnish
- double öö more common than in Finnish; other doubles can include õõ, üü, rarely hh (for German ch) and even šš
- common words: ja, on, ei, ta, see, või.
Hungarian (Magyar)
- letters ő and ű (double acute accent) unique to Hungarian
- accented letters á and é frequent
- letter combinations: cs, gy, ly, ny, sz, ty, zs (all classed as separate letters), leg‐, ‐obb (note: sz also common in Polish)
- common words: a, az, ez, egy, és, van, hogy
- letter k very frequent (plural suffix)
Eskimo–Aleut languages
Greenlandic
- long polysynthetic words (a single word can number 30+ letters)
- relatively abundant n, q (not necessarily followed by u), u
- ubiquitous double consonants and vowels (aa, ii, qq, uu, more rarely ee, oo)
- vowels a, i, u conspicuously more frequent than e, o (which are only found before q and r)
- no diphthongs except occasional word-final ai, only consonant combinations besides double consonants and (n)ng consist of r + consonant
- old spellings (now abolished in spelling reform) sometimes included acute accent, circumflex and/or tilde: Qânâq vs. Qaanaaq.
Southern Athabaskan languages
- vowels with acute accent, ogonek (nasal hook), or both: á, ą, ą́
- doubled vowels: aa, áá, ąą, ą́ą́
- slashed l: ł (check not Polish!)
- n with acute accent: ń
- quotation mark: ' or ’
- sequences: dl, tł, tł’, dz, ts’, ií, áa, aá
- may have rather long words
Western Apache (Nnee biyáti’/Ndee biyáti’)
In addition to the above,
- may use: u or ú
- may use vowels with macron: ā ą̄
- does not use ų
Navajo (Diné bizaad)
In addition to the above,
- does not use u, ú, or ų
(Mescalero / Chiricahua) (Mashgaléń / Chidikáágo)
In addition to the above,
- uses: u, ú, ų
- does not use o, ó, or ǫ
Guaraní
- lots of tildes over vowels (including y) and n
- tilde over g: g̃—it's the only language in the world to use it. Example words: hagũa and g̃uahẽ.
- b, d, and g usually do not occur without m or n before (mb, nd, ng) unless they're Spanish loan words.
- f, l, q, w, x, z extremely rare outside loan words
- does not use c without h: ch
Japanese in Romaji (Nihongo/日本語)
- words: desu, aru, suru, esp. at end of sentences;
- word endings: -masu, -masen, -shita;
- letters: Japanese almost always alternates between a consonant and a vowel. Exceptions are the digraphs shi and chi, the affricate tsu, gemination (two of the same consonant in a row) and palatalization (a consonant followed by the letter y).
- a macron or circumflex may be used to indicate doubled vowels, e.g. Tōkyō
- common words: no, o, wa, de, ni
(Note: Romaji is not often used in everyday Japanese writing. It is most often used by foreigners learning the pronunciation of the Japanese language.)
Hmong (Hmoob) written in Romanized Popular Alphabet
- Almost all written words are quite short (one syllable).
- Syllables (unless they are pronounced with mid tone) end in a tone letter: one of b s j v m g d, leading to apparent 'consonant clusters' such as -wj
- w can be the main vowel of a syllable (e.g. tswv)
- Syllables can begin with sequences such as hm-, ntxh-, nq-.
- Syllables ending in double vowels (especially -oo, -ee), possibly followed by a tone letter (as in Hmoob 'Hmong').
Vietnamese (tiếng Việt)
- Roman characters with more than one diacritical mark on the same vowel. See above.
- Almost all written words are quite short (one syllable, mostly less than six characters long).
- Words beginning with ng or ngh
- Words ending with nh
- common words: cái, không, có, ở, của, và, tại, với, để, đã, sẽ, đang, tôi, bạn, chúng, là
Vietnamese Quoted-Readable (VIQR)
- The following characters (often in combination) after vowels: ^ ( + ' ` ? ~ .
- DD, Dd, or dd
- The following character before punctuation:
Vietnamese VNI encoding
- The digits 1-8 after vowels
- The digit 9 after a D or d
- The following character before numbers:
Vietnamese Telex
- The following characters after vowels: s f r x j
- The following vowels, doubled up: a e o
- The letter w after the following characters: a o u
- DD, Dd, or dd
Chinese, Romanized
Standard Mandarin (現代標準漢語)
- In general, Mandarin syllables end only in vowels or n, ng, r; never in p, t, k, m
Pinyin
- Words beginning with x, q, zh
- Tone marks on vowels, such as ā, á, ǎ, à
- For convenience while using a computer, these are sometimes substituted with numbers, e.g. a1, a2, a3, a4
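The digit substitution in the last item can be reversed with Unicode combining marks. The sketch below handles only the vowel-plus-digit form shown above (a1, a2, a3, a4); numbered Pinyin in the wild usually places the digit after the whole syllable, which this sketch does not attempt. The function name is hypothetical.

```python
import re
import unicodedata

# Combining marks for tones 1-4: macron, acute, caron, grave.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030C", "4": "\u0300"}


def numbered_vowel_to_mark(text: str) -> str:
    """Turn a vowel directly followed by a tone digit 1-4 into the accented vowel."""
    def replace(match: re.Match) -> str:
        vowel, digit = match.group(1), match.group(2)
        # NFC composes the vowel and the combining mark into one character.
        return unicodedata.normalize("NFC", vowel + TONE_MARKS[digit])

    return re.sub(r"([aeiouAEIOU])([1-4])", replace, text)
```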
Wade–Giles
- Words do not begin with b, d, g, z, q, x, r
- Words beginning with hs
- Many hyphenated words
- Apostrophes after initial letters or digraphs, e.g. t'a, ch'i
Gwoyeu Romatzyh
- Many unusual vowel combinations such as ae, eei, ii, iee, oou, yy, etc.
- Insertion of r, e.g. arn, erng, etc.
- Words ending in nn, nq
Standard Cantonese (粵語)
- In general, Cantonese syllables can end in p, t, k, m, n, ng; never r
- Double aa is common but double ee/ii/oo/uu is rare
Southern Min / Min-Nan (Bân-lâm-gí/Bân-lâm-gú) in Pe̍h-ōe-jī
- Many hyphenated words.
- Words can end in p, t, k, m, n, ng, h; never r
- Roman characters with many diacritical marks on vowels. Unlike Vietnamese, each character has at most one such mark.
- Unusual combining characters, namely · (middle dot, always after o) and (vertical bar). ¯ (macron) is also common.
Austronesian languages
Malay (bahasa Melayu) and Indonesian (bahasa Indonesia)
May contain the following:
Prefixes: me-, mem-, memper-, pe-, per-, di-, ke-
Suffixes: -kan, -an, -i
Others (these almost always written in lowercase): yang, dan, di, ke, oleh, itu
Malay and Indonesian are mutually intelligible to proficient speakers, although translators and interpreters will generally be specialists in one or other language. See Comparison of Standard Malay and Indonesian.
Frequent use of the letter 'a' (comparable to the frequency of the English 'e').
Turkic languages
Note that some Turkic languages like Azeri and Turkmen use a similar Latin alphabet (often Jaŋalif) and similar words, and might be confused with Turkish. Azeri has the letters Əə, Xx and Qq not present in the Turkish alphabet, and Turkmen has Ää, Žž, Ňň, Ýý and Ww. Latin characters uniquely (or nearly uniquely) used for Turkic languages: Əə, Ŋŋ, Ɵɵ, Ьь, Ƣƣ, Ğğ, İ, and ı. All Turkic languages can form long words by adding multiple suffixes.
Turkish (Türkçe/Türkiye_Türkçesi)
Turkish Alphabet
Lowercase: a b c ç d e f g ğ h ı i j k l m n o ö p r s ş t u ü v y z
Uppercase: A B C Ç D E F G Ğ H I İ J K L M N O Ö P R S Ş T U Ü V Y Z
Common words
- bir — one, a
- bu — this
- ancak — but
- oldu — was
- şu — that
Misc.
- Look for word endings. Tense changes in Turkish verbs are created by adding suffixes to the end of the verb. Pluralizations occur by adding -lar and -ler.
- Common tense changes: -yor, -mış, -muş, -sun
- Possessivity/person: -im, -un, -ın, -in, -iz, -dur, -tır
- Example: Yapmıştır, '[He] did it'; Yap is the verb stem meaning 'to do', -mış indicates the perfect tense, -tır indicates the third person (he/she/it).
- Example: Adalar, 'Islands'; Ada is a noun meaning 'island', -lar makes it plural.
- Example: Evimiz, 'Our house'; Ev is a noun meaning 'house', -im indicates the first-person possessor, which -iz then makes plural.
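The -lar/-ler pluralization above can be sketched in code. The list does not say how the variant is chosen; the rule used here (vowel harmony: -lar after a back vowel a, ı, o, u and -ler after a front vowel e, i, ö, ü) is an addition, and some loanwords do not follow it. Function names are hypothetical.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def turkish_lower(word: str) -> str:
    """Lowercase with Turkish casing: I -> ı and İ -> i."""
    # str.lower() alone would map "I" to "i", so handle the pair first.
    return word.replace("I", "ı").replace("İ", "i").lower()


def pluralize(noun: str) -> str:
    """Append -lar or -ler according to the last vowel of the noun."""
    for ch in reversed(turkish_lower(noun)):
        if ch in BACK_VOWELS:
            return noun + "lar"
        if ch in FRONT_VOWELS:
            return noun + "ler"
    raise ValueError("no vowel found in %r" % noun)
```

For identification, this means a text whose words end in both -lar and -ler, each matching the preceding vowel, is a reasonable candidate for Turkish or a related Turkic language.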
Azeri (Azərbaycanca)
Azeri can be easily recognized by the frequent use of ə. This letter is not used in any other officially recognized modern Latin alphabet. In addition, it uses the letters x and q, which are not used in Turkish.
- Common words: və, ki, ilə, bu, o, isə, görə, da, də
- Frequent use of diacritics: ç, ə, ğ, ı, İ, ö, ş, ü
- Words ending in -lar, -lər, -ın, -in, -da, -də, -dan, -dən
- Words never beginning with ğ or ı
- Words rarely beginning with two or more consonants
- Transliteration of foreign words and names, e.g. Audrey Hepburn = Odri Hepbern
Chinese (中文)
- No spaces, except between punctuation marks and (sometimes) foreign words.
- Arabic numerals (0-9) sometimes used
- Punctuation:
- Period 。(not .)
- Serial comma 、(distinguished from the regular comma ,)
- Ellipsis …… (six dots)
- No hiragana, katakana, or hangul
- May be written vertically
Simplified Chinese (简体) vs Traditional Chinese (繁體)
Note: Many characters were not simplified. As a result, it is common for a short word or phrase to be identical between Simplified and Traditional, but it is rare for an entire sentence to be identical as well.
Common radicals different between Traditional and Simplified:
- Simplified: 讠钅饣纟门(e.g. 语 银 饭 纪 问)
- Traditional: 訁釒飠糹門(e.g. 語 銀 飯 紀 問)
Common characters different between Traditional and Simplified:
- Simplified: 国 会 这 来 对 开 关 门 时 个 书 长 万 边 东 车 爱 儿
- Traditional: 國 會 這 來 對 開 關 門 時 個 書 長 萬 邊 東 車 愛 兒
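The two character lists above are enough for a rough variant check. A minimal sketch that counts hits from each list; the function name is hypothetical, and a short phrase may contain none of these characters, in which case the result is "undecided".

```python
# The paired characters listed above, one string per variant.
SIMPLIFIED = set("国会这来对开关门时个书长万边东车爱儿")
TRADITIONAL = set("國會這來對開關門時個書長萬邊東車愛兒")


def guess_chinese_variant(text: str) -> str:
    """Return 'Simplified', 'Traditional' or 'undecided' by counting listed characters."""
    simplified = sum(ch in SIMPLIFIED for ch in text)
    traditional = sum(ch in TRADITIONAL for ch in text)
    if simplified == traditional:
        return "undecided"
    return "Simplified" if simplified > traditional else "Traditional"
```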
Standard written Chinese (based on Mandarin) vs written Vernacular Cantonese
Note: Cantonese-speakers live in Mainland China, Hong Kong, Taiwan and Macau, so written Cantonese can be written in either Simplified or Traditional characters.
Common characters in Vernacular Cantonese that do not occur in Mandarin (only characters that are the same between Traditional and Simplified are chosen here):
- 嘅 咗 咁 嚟 啲 唔 佢 乜 嘢
Some of the above characters are not supported in all character encodings, so sometimes the 口 radical on the left is substituted with a 0 or o, e.g.
- o既 0既
Japanese (日本語)
- Katakana (カタカナ) and hiragana (ひらがな) characters mixed with kanji (漢字)
- Few or no spaces
- Arabic numerals (0-9) sometimes used
- Punctuation:
- Period 。
- Comma 、(,also used)
- Quotation marks 「」
- Occasional small characters beside large ones, e.g. しゃ りゅ しょ って シャ リュ ショ ッテ
- Double tick marks (dakuten) appearing at upper right of characters, e.g. で が ず デ ガ ズ
- Empty circles (handakuten, or maru) appearing at upper right of characters, e.g. ぱ ぴ パ ピ
- Frequent characters: の を は が
- May be written vertically
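The script cues for Chinese and Japanese (kana mixed with kanji, versus no hiragana, katakana, or hangul) can be checked by Unicode range. This is a sketch only: the ranges cover the main Hiragana, Katakana, Hangul and CJK Unified Ideographs blocks, not every extension block, and the function name is hypothetical.

```python
def guess_cjk_language(text: str) -> str:
    """Guess Japanese, Korean or Chinese from the scripts present in the text."""
    # Hiragana (U+3040-309F) and Katakana (U+30A0-30FF).
    has_kana = any("\u3040" <= ch <= "\u30ff" for ch in text)
    # Hangul syllables and Hangul compatibility jamo.
    has_hangul = any(
        "\uac00" <= ch <= "\ud7a3" or "\u3131" <= ch <= "\u318e" for ch in text
    )
    # CJK Unified Ideographs (hanzi / kanji).
    has_han = any("\u4e00" <= ch <= "\u9fff" for ch in text)

    if has_kana:
        return "Japanese"
    if has_hangul:
        return "Korean"
    if has_han:
        return "Chinese"
    return "unknown"
```

Kana is tested first because Japanese text normally contains Han characters as well, so their presence alone does not indicate Chinese.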
Korean (한국어/조선말)
- Western-style punctuation marks
- Western-style spacing
- Hangul letters, e.g. ㅎ h, ㅇ ng, ㅂ b, etc.
- Hangul letters used to form syllable blocks; e.g. ㅅ s + ㅓ eo + ㅇ ng = 성 seong
- Circles and ellipses are commonplace in Hangul but exceedingly rare in Chinese.
- General appearance has relatively-uniform complexity, as contrasted with Chinese or Japanese.
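The syllable-block example above (ㅅ + ㅓ + ㅇ = 성) follows the arithmetic Unicode uses for precomposed Hangul syllables: 19 initial consonants, 21 vowels and 28 final positions (including "no final"), starting at U+AC00. A minimal sketch; the function name `compose` is hypothetical.

```python
# Letters in Unicode's ordering for initial, medial and final positions.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def compose(initial: str, medial: str, final: str = "") -> str:
    """Combine Hangul letters into one precomposed syllable block."""
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    # Each initial spans 21 * 28 code points; each medial spans 28.
    return chr(0xAC00 + (i * 21 + m) * 28 + f)
```

Calling `compose("ㅅ", "ㅓ", "ㅇ")` gives the block from the example in the list.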
Khmer
Khmer is written using the distinctive Khmer alphabet.
- rarely uses spaces
- Letters have a distinctively 'taller' shape than other Brahmic scripts.
- Uses Khmer numerals in writing ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩.
- Has 'clusters' of letters stuffed upon each other.
- has 24 diacritics denoting syllable rhymes - ា ិ ី ឹ ឺ ុ ូ ួ ើ ឿ ៀ េ ែ ៃ េា ៅ ុំ ំ ាំ ះ ុះ េះ ោះ
- Uses ។ as a full stop.
Greek (Ελληνικά)
Modern Greek is written in the Greek alphabet in monotonic, polytonic or atonic form, following either Demotic (Triantafylidis) grammar or Katharevousa grammar. Some people write in Greeklish (Greek in Latin script), which may be visual-based, orthographic, phonetic, or mixed. The only official orthographic forms of the Greek language are monotonic and polytonic.
Normal Modern Greek (Greek Monotonic)
- words και, είναι;
- Each multi-syllable word has one accent/tone mark (oxia): ά έ ή ί ό ύ ώ
- The only other diacritic ever used is the tréma: ϊ/ΐ, ϋ/ΰ, etc.
Pre-1980s Greek (Greek Polytonic)
Katharevousa, Dimotiki (Triantafylidis' grammar)
- Diacritics: ά, ᾶ, ἀ, ἁ, and combinations, also with other vowels.
- Some texts, especially in Katharevousa, also have ὰ, ᾳ, in combination with other diacritics.
Ancient Greek
- Diacritics: ά, ὰ, ᾶ, ἀ, ἁ, ᾳ, and combinations, also with other vowels; ῥ; tilde (ᾶ) often appears more like a rounded circumflex
- some texts feature lunate sigma (looks like c) instead of σ/ς
Greek Atonic
- Was common in some Greek media (television);
- You will see Greek characters without accents/tones;
- words: και, ειναι, αυτο.
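Because atonic text is monotonic text without the accent mark, the two can be compared after stripping the tonos. A minimal sketch using Unicode normalization, assuming the accent decomposes to the combining acute accent (U+0301), as it does for the monotonic vowels listed earlier; the function name is hypothetical.

```python
import unicodedata


def strip_tonos(text: str) -> str:
    """Remove the tonos (acute accent) but keep the tréma (diaeresis)."""
    # NFD splits each accented vowel into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    without_tonos = decomposed.replace("\u0301", "")
    # NFC recombines whatever marks remain, such as the diaeresis.
    return unicodedata.normalize("NFC", without_tonos)
```

With this, είναι and ειναι compare equal, so a word list built from monotonic text can also match atonic text.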
Greek in Greeklish
- Automated conversion software for Greeklish->Greek conversion exists. If you notice a Greeklish text it may be useful for the Greek el.wikipedia (after conversion).
- Keep in mind: in Greeklish more than one character may be used for one letter. (example: th for Θ (theta)).
Orthographic Greeklish
- words kai, einai.
Phonetic Greeklish
- words ke, ine;
- omega appears as o;
- ei, oi appear as i;
- ai appears as e.
Visual-based Greeklish
- omega (Ω or ω) may appear as W or w;
- epsilon (E) may appear as 3;
- alpha (A) may appear as 4;
- theta (Θ) may appear as 8;
- upsilon (Y) may appear as /;
- gamma (γ) may appear as y
- More than one character may be used for one letter.
Messed-up (Mixed) Greeklish
- words kai, eine;
- combines principles of phonetic, visual-based and orthographic Greeklish according to writer's idiosyncrasy;
- The most commonly used form of Greeklish.
Armenian (Հայերեն)
Armenian can be recognized by its unique 39-letter alphabet:
Ա Բ Գ Դ Ե Զ Է Ը Թ Ժ Ի Լ Խ Ծ Կ Հ Ձ Ղ Ճ Մ Յ Ն Շ Ո Չ Պ Ջ Ռ Ս Վ Տ Ր Ց Ւ Փ Ք ԵՎ(և) Օ Ֆ
Georgian (ქართული)
Georgian can be recognised by its unique alphabet (note some characters have fallen out of use).
ა ბ გ დ ე ვ ზ ჱ თ ი კ ლ მ ნ ჲ ო პ ჟ რ ს ტ ჳ უ ფ ქ ღ ყ შ ჩ ც ძ წ ჭ ხ ჴ ჯ ჰ ჵ ჶ ჷ ჸ
Slavic languages using the Cyrillic alphabet
Bolding denotes letters unique to the language
Belarusian (беларуская)
- uses: ё, і, й, ў, ы, э, ’
- features: шч used instead of щ
- the only Cyrillic language not to feature и.
Bulgarian (български)
- uses: ъ, щ, я, ю, й
- words: със, в
- features: ъ is used as a vowel; many words end in definite article –ът, –ят, –та, –то, –те
Macedonian (македонски)
- uses: ј, љ, њ, џ, ѓ, ќ, ѕ
- words: во, со
- features: р is usually found between consonants, for example првин
Russian (русский)
- uses: ё, й, ъ, ы, э, щ
Serbian (српски)
- uses: ј, љ, њ, џ, ђ, ћ
- does not use: ъ, щ, я, ю, й
- words: је, у
- features: large consonant clusters, for example српски
Ukrainian (українська)
- uses: є, и, і, ї, й, ґ, щ
- does not use: ъ, ё, ы, э
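Several of the Cyrillic lists above contain letters that none of the other listed languages use, so a first pass can look for those alone. A minimal sketch covering only Belarusian, Ukrainian, Macedonian and Serbian; Russian and Bulgarian have no such single-letter giveaway in the lists and are returned as "undecided". The function name and the table layout are illustrative.

```python
import unicodedata

# Letters from the lists above that appear in only one of the six languages.
DISTINCTIVE = [
    ("Belarusian", "ў"),
    ("Ukrainian", "єїґ"),
    ("Macedonian", "ѓќѕ"),
    ("Serbian", "ђћ"),
]


def guess_cyrillic_language(text: str) -> str:
    """Return the first language whose distinctive letters occur in the text."""
    letters = set(unicodedata.normalize("NFC", text).lower())
    for language, marks in DISTINCTIVE:
        if letters & set(unicodedata.normalize("NFC", marks)):
            return language
    return "undecided"
```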
Arabic alphabet
- All languages using the Arabic alphabet are written right-to-left.
- A number of other languages have been written in the Arabic alphabet in the past, but now are more commonly written in Latin characters; examples include Turkish, Somali and Swahili.
Arabic (العربية)
- short vowels are not written so many words are written with no vowel at all
- common prefix: -ال
- common suffix: ة-
- words: إلى, من, على
Persian (فارسی)
- uses: پ, چ, ژ, گ
- words: که, به
Urdu (اردو)
- uses: ٹ, ڈ, ڑ, ں, ے
- many words ending in ے
- words: اور, ہے
- to distinguish from Arabic: in many texts, Urdu is written stylistically with words ‘slanting’ downwards from top-right to bottom-left (unlike the ‘linear’ style of Arabic, Persian etc).
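The letter lists for Persian and Urdu above can be turned into a rough test for Arabic-script text. This sketch checks the Urdu letters first, on the assumption (not stated in the lists) that Urdu also uses the four Persian letters; any other Arabic-script text is reported as Arabic, although it could be another language not covered here. Code points are written as escapes to avoid right-to-left display issues, and the function name is hypothetical.

```python
# Urdu letters from the list above: tte, ddal, rre, noon ghunna, yeh barree.
URDU_LETTERS = set("\u0679\u0688\u0691\u06ba\u06d2")
# Persian letters from the list above: peh, tcheh, jeh, gaf.
PERSIAN_LETTERS = set("\u067e\u0686\u0698\u06af")


def guess_arabic_script_language(text: str) -> str:
    """Return 'Urdu', 'Persian', 'Arabic' or 'unknown' from characteristic letters."""
    letters = set(text)
    if letters & URDU_LETTERS:
        return "Urdu"
    if letters & PERSIAN_LETTERS:
        return "Persian"
    # Fallback: any character from the basic Arabic block.
    if any("\u0600" <= ch <= "\u06ff" for ch in letters):
        return "Arabic"
    return "unknown"
```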
Syriac Alphabet
Syriac (ܐܬܘܪܝܐ)
- short vowels are not usually written so many words are written with no vowel at all
- three styles of writing (estrangela, serto, madnhaya) and two different ways of representing vowels
- basic alphabet in Estrangela style is: ܐ ܒ ܓ ܕ ܗ ܘ ܙ ܚ ܛ ܝ ܟ ܠ ܡ ܢ ܣ ܥ ܦ ܨ ܩ ܪ ܫ ܬ
- basic alphabet in Serto style is: ܬ, ܫ, ܪ, ܩ, ܨ, ܦ, ܥ, ܣ, ܢ, ܡ, ܠ, ܟ, ܝ, ܛ, ܚ, ܙ, ܘ, ܗ, ܕ, ܓ, ܒ, ܐ
- basic alphabet in Madnhaya style is: ܬ,ܫ,ܪ,ܩ,ܨ,ܦ,ܥ,ܣ,ܢ,ܡ,ܠ,ܟ,ܝ,ܛ,ܚ,ܙ,ܘ,ܗ, ܕ,ܓ,ܒ,ܐ
Dravidian languages
- All Dravidian languages are written from left to right.
- All Dravidian languages have different scripts, but similarities can be found in their orthography.
Tamil
- common word endings: ள்ளது, கிறது, கின்றன, ம்
- common words: தமிழ், அவர், உள்ள, சில
- Tamil has a unique 30-letter alphabet. With the help of diacritics, as many as 247 letters can be written.
அ ஆ இ ஈ உ ஊ எ ஏ ஐ ஒ ஓ ஔ க ங ச ஞ ட ண த ந ப ம ய
Bengali
The Bengali alphabet or Bangla alphabet (Bengali: বাংলা বর্ণমালা, bangla bôrnômala) or Bengali script (Bengali: বাংলা লিপি, bangla lipi) is the writing system, originating in the Indian subcontinent, for the Bengali language and is the fifth most widely used writing system in the world. The script is used for other languages like Assamese, Maithili, Meithei and Bishnupriya Manipuri, and has historically been used to write Sanskrit within Bengal.
Bengali has a unique 50-letter alphabet.
- The Bengali script has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ swôrôbôrnô 'vowel letter'. The swôrôbôrnôs represent six of the seven main vowel sounds of Bengali, along with two vowel diphthongs. All of them are used in both the Bengali and Assamese languages.
অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ
- The Bengali script has a total of 39 Consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjônbôrnô 'consonant letter' in Bengali. The names of the letters are typically just the consonant sound plus the inherent vowel অ ô. Since the inherent vowel is assumed and not written, most letters' names look identical to the letter itself (the name of the letter ঘ is itself ghô, not gh).
ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য র ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ
- has 10 diacritics denoting syllable rhymes -
া ি ী ু ূ ৃ ে ৈ ো ৌ
Assamese
- The Assamese script also has a total of 11 vowel graphemes, each of which is called a স্বরবর্ণ swôrôbôrnô 'vowel letter'.
অ আ ই ঈ উ ঊ ঋ এ ঐ ও ঔ
- has a total of 39 Consonants. Consonant letters are called ব্যঞ্জনবর্ণ bænjônbôrnô 'consonant letter' in Bengali.
ক খ গ ঘ ঙ চ ছ জ ঝ ঞ ট ঠ ড ঢ ণ ত থ দ ধ ন প ফ ব ভ ম য ৰ ল শ ষ স হ ড় ঢ় য় ৎ ঃ ং ঁ
- has 10 diacritics denoting syllable rhymes -
া ি ী ু ূ ৃ ে ৈ ো ৌ
Canadian Aboriginal syllabics
In modern writing, Canadian Aboriginal syllabics are indicative of Cree languages, Inuktitut, or Ojibwe, though the latter two are also written in alternative scripts. The basic glyph set is ᐁ ᐱ ᑌ ᑫ ᒉ ᒣ ᓀ ᓭ ᔦ, each of which may appear in any of four orientations, boldfaced, superscripted, and with diacritics including ᑊ ᐟ ᐠ ᐨ ᒼ ᐣ ᐢ ᐧ ᐤ ᐦ ᕽ ᓫ ᕑ. This abugida has also been used for Blackfoot.
Cree language
Inuktitut
Other North American syllabics
Blackfoot
Cherokee
Artificial languages
Esperanto (Esperanto)
- words: de, la, al, kaj
- Six accented letters: ĉ Ĉ ĝ Ĝ ĥ Ĥ ĵ Ĵ ŝ Ŝ ŭ Ŭ, their corresponding H-system representation ch Ch gh Gh hh Hh jh Jh sh Sh u U or their corresponding X-system representation cx Cx gx Gx hx Hx jx Jx sx Sx ux Ux
- words ending in o, a, oj, aj, on, an, ojn, ajn, as, os, is, us, u, i, aŭ
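The X-system listed above maps one-to-one onto the accented letters, so it can be converted before further checks. A minimal sketch; the function name is hypothetical. The H-system is not handled because its plain u is ambiguous.

```python
import re

# X-system digraphs and the accented letters they stand for.
X_SYSTEM = {
    "cx": "\u0109",  # c with circumflex
    "gx": "\u011d",  # g with circumflex
    "hx": "\u0125",  # h with circumflex
    "jx": "\u0135",  # j with circumflex
    "sx": "\u015d",  # s with circumflex
    "ux": "\u016d",  # u with breve
}


def x_system_to_unicode(text: str) -> str:
    """Replace X-system digraphs with the corresponding Esperanto letters."""
    def replace(match: re.Match) -> str:
        pair = match.group(0)
        letter = X_SYSTEM[pair.lower()]
        # Keep the case of the first letter of the digraph.
        return letter.upper() if pair[0].isupper() else letter

    return re.sub(r"[cghjsu]x", replace, text, flags=re.IGNORECASE)
```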
Klingon (tlhIngan Hol)
- When written in the Latin alphabet Klingon has the unusual property of a distinction in case; q and Q are different letters, and other letters are either always (e.g. D, I, S) or never (e.g. ch, tlh, v) written in upper case. This causes a large number of words that look quite strange to people who aren't used to it, for example: yIDoghQo', tlhIngan Hol (with mixed case).
- The apostrophe is fairly frequent, especially at the end of a word or syllable.
- Common suffixes: -be', -'a'
- Common words: 'oH, Qapla'
- May use one or more apostrophes in the middle of a word: SuvwI''a'
Lojban (lojban.)
- (almost) all lowercase;
- common words lo, mi, cu, la, nu, do, na, se;
- paragraphs delimited with ni'o and sentences delimited with .i (or i);
- many five-letter words in consonant-vowel shape CCVCV or CVCCV;
- many short words with apostrophes between vowels, like ko'a, pi'o etc.;
- usually no punctuation except for dots;
- may use commas in the middle of words (typically proper nouns).