Persian Language and Grammar
Persian (فارسی Fārsī) is an Indo-European language of the Iranian branch — sister to Pashto, Kurdish and Balochi. It has roughly 120 million speakers across three official varieties: Iranian Persian (Farsi), Dari (Afghanistan) and Tajiki (Tajikistan, written in Cyrillic). For roughly a thousand years (until the nineteenth century) Persian was the lingua franca of administration and high culture from the Ottoman frontier to the Bay of Bengal, making it the single most important non-Arabic literary language of the Islamicate world.
Ezafe: a short, normally unwritten vowel -e (-i in Dari/Tajiki) that joins a noun to its modifier: ketāb-e Hāfez ("the book of Hafez"), gul-e sorkh ("the red rose"). It is the single most distinctive syntactic feature of Persian.
Historical stages
Persian has three diachronic stages:
| Stage | Approximate dates | Examples |
|---|---|---|
| Old Persian | c. 600 BCE – 300 BCE | Achaemenid inscriptions (Behistun) |
| Middle Persian (Pahlavi) | c. 300 BCE – 900 CE | Zoroastrian texts (Bundahishn) |
| New Persian (Farsi) | c. 900 CE – present | Rudaki, Ferdowsi, Hafez to modern Tehran prose |
New Persian emerged in the eastern Iranian world (Khurasan, Transoxiana) after the Islamic conquests, written in a modified Arabic script (the Perso-Arabic script) of 32 letters.
Script and orthography
The modern Persian alphabet is the 28-letter Arabic alphabet plus four additional letters (پ، چ، ژ، گ) for sounds absent in Arabic, giving 32 letters in all. Dari and Iranian Farsi use the same script with minor differences in convention; Tajiki has been written in Cyrillic since 1939.
Phonology
Iranian Farsi has six vowels (a, e, o, ā, i, u) and 23 consonants. Several letters that represent distinct sounds in Arabic are pronounced identically in Persian (ث ص س are all /s/; ز ذ ض ظ are all /z/), although the spelling of loanwords still keeps them apart. Persian is non-tonal, and stress is largely predictable, usually falling on the final syllable of the word stem.
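The letter mergers just described can be written down as a small lookup table. This is only an illustrative sketch: the names `MERGED_LETTERS` and `same_sound` are made up for this example, and the table covers only the seven letters listed above, not the whole alphabet.

```python
# Letters that are distinct in Arabic but share one sound in Iranian Persian.
# Only the seven letters named in the text are included (illustrative table).
MERGED_LETTERS = {
    "ث": "s", "ص": "s", "س": "s",
    "ز": "z", "ذ": "z", "ض": "z", "ظ": "z",
}


def same_sound(letter_a: str, letter_b: str) -> bool:
    """Return True if both letters are in the table and share a phoneme."""
    sound_a = MERGED_LETTERS.get(letter_a)
    return sound_a is not None and sound_a == MERGED_LETTERS.get(letter_b)


print(same_sound("ث", "ص"))  # True
print(same_sound("س", "ز"))  # False
```

For a learner, the practical consequence is that spelling cannot be recovered from sound alone: a heard /s/ or /z/ may correspond to any letter in its group.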
- Family: Indo-European → Iranian → Western Iranian → New Persian.
- Varieties: Iranian Farsi, Afghan Dari, Tajiki (Cyrillic).
- Script: Perso-Arabic, 32 letters.
- Word order: SOV (Subject-Object-Verb).
- Distinctive feature: the ezafe linker (-e/-i).
- No grammatical gender; a general plural marker -hā, with -ān available for animates.
- Compound verbs: the dominant verbal pattern (e.g., kār kardan "to work").
Nominal morphology
Persian nouns are not inflected for gender or case. The plural is formed with -hā (usable with any noun) or -ān (animates, especially humans):
- ketāb — ketāb-hā "books"
- mard — mardān "men"
Definiteness is unmarked; indefiniteness is shown by enclitic -i: ketāb-i "a book". A definite direct object is marked by the postposition rā: ketāb rā khwāndam, "I read the book."
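The noun markers above are regular enough to sketch as string operations on romanized forms. The function names here (`plural`, `indefinite`, `definite_object`) are hypothetical, and the sketch assumes consonant-final nouns; vowel-final nouns, which need a linking consonant before -ān, are deliberately left out.

```python
def plural(noun: str, animate: bool = False) -> str:
    """General plural in -hā; optional -ān for animate nouns."""
    return f"{noun}ān" if animate else f"{noun}-hā"


def indefinite(noun: str) -> str:
    """Mark indefiniteness with enclitic -i."""
    return f"{noun}-i"


def definite_object(noun: str) -> str:
    """Mark a definite direct object with the postposition rā."""
    return f"{noun} rā"


print(plural("ketāb"))               # ketāb-hā
print(plural("mard", animate=True))  # mardān
print(indefinite("ketāb"))           # ketāb-i
print(definite_object("ketāb"))      # ketāb rā
```

The `animate` flag is a choice the caller makes, mirroring the fact that -hā is always available while -ān is restricted.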
The ezafe
The ezafe joins a noun to:
- Possessor: ketāb-e man — "my book".
- Adjective: gul-e sorkh — "the red rose".
- Apposition / further noun: shahr-e Tehrān — "the city (of) Tehran".
The ezafe is normally unwritten in standard Persian script but is pronounced as a short -e (Iranian) or -i (Dari/Tajiki). Because the reader must supply it from context, mastering the ezafe is the first threshold in learning to read Persian.
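As a rough illustration of how the ezafe chains words together, the sketch below links each word in a phrase to the next one. `ezafe_phrase` and `EZAFE_VOWEL` are names invented for this example. It works on romanized, consonant-final words only; vowel-final words, which take a glide before the linker, are not handled.

```python
# Ezafe vowel by variety, as described in the text.
EZAFE_VOWEL = {"iranian": "e", "dari": "i", "tajiki": "i"}


def ezafe_phrase(words: list[str], variety: str = "iranian") -> str:
    """Link every word to the next with the ezafe; the last word stays bare."""
    if not words:
        raise ValueError("need at least one word")
    linker = EZAFE_VOWEL[variety]
    linked = [f"{word}-{linker}" for word in words[:-1]]
    return " ".join(linked + [words[-1]])


print(ezafe_phrase(["ketāb", "man"]))          # ketāb-e man
print(ezafe_phrase(["shahr", "Tehrān"]))       # shahr-e Tehrān
print(ezafe_phrase(["gul", "sorkh"], "dari"))  # gul-i sorkh
print(ezafe_phrase(["ketāb", "dust", "man"]))  # ketāb-e dust-e man
```

The last call shows that the linker repeats on every non-final word, which is how longer possessive and adjectival chains are built.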
Verbal morphology
The Persian verb is built on two stems: a present stem and a past stem. From these, six basic finite forms derive:
| Form | Example (kardan "to do") |
|---|---|
| Present indicative | mi-konam "I do" |
| Subjunctive | bekonam "(that) I may do" |
| Past | kardam "I did" |
| Imperfect | mi-kardam "I used to do" |
| Present perfect | karde-am "I have done" |
| Pluperfect | karde budam "I had done" |
The future is formed with the auxiliary khwāstan ("to want") followed by the bare past stem: khwāham kard "I shall do".
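The two-stem system can be sketched as a function that builds the first-person singular forms from a present stem and a past stem. The name `conjugate_1sg` is hypothetical, and the sketch assumes consonant-final stems such as kon-/kard- (from kardan); verbs whose present stem ends in a vowel need extra adjustments that are not modelled here.

```python
def conjugate_1sg(present_stem: str, past_stem: str) -> dict[str, str]:
    """Build first-person singular forms from the two stems (romanized)."""
    participle = past_stem + "e"  # past participle, e.g. kard -> karde
    return {
        "present indicative": f"mi-{present_stem}am",
        "subjunctive": f"be{present_stem}am",
        "past": f"{past_stem}am",
        "imperfect": f"mi-{past_stem}am",
        "present perfect": f"{participle}-am",
        "pluperfect": f"{participle} budam",
        "future": f"khwāham {past_stem}",
    }


forms = conjugate_1sg("kon", "kard")
for name, form in forms.items():
    print(f"{name}: {form}")
```

Called with "kon" and "kard", this reproduces the kardan forms in the table and the future khwāham kard. Note that only the present indicative and subjunctive use the present stem; everything else is built on the past stem.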
Compound verbs
Modern Persian relies overwhelmingly on compound verbs, in which a noun (or other non-verbal element) is followed by a light verb such as kardan: kār kardan "to work", negāh kardan "to look", eshtebāh kardan "to err". This is one of its most striking syntactic features.
Pronouns and enclitics
Persian has full pronouns (man, to, u, mā, shomā, ishān) and possessive/object enclitics (-am, -at, -ash, -emān, -etān, -eshān): ketāb-am "my book"; did-am-ash "I saw him".
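The pairing between full pronouns and enclitics can be laid out as a lookup table. This is a minimal sketch with invented names (`ENCLITIC_FOR`, `attach_enclitic`); it assumes the two lists above correspond item by item and that the host word ends in a consonant.

```python
# Full pronouns paired with their enclitics, in the order given in the text.
ENCLITIC_FOR = {
    "man": "am", "to": "at", "u": "ash",
    "mā": "emān", "shomā": "etān", "ishān": "eshān",
}


def attach_enclitic(host: str, pronoun: str) -> str:
    """Attach the enclitic matching `pronoun` to a consonant-final host."""
    return f"{host}-{ENCLITIC_FOR[pronoun]}"


print(attach_enclitic("ketāb", "man"))    # ketāb-am
print(attach_enclitic("did-am", "u"))     # did-am-ash
print(attach_enclitic("ketāb", "ishān"))  # ketāb-eshān
```

The same enclitic reads as a possessor on a noun (ketāb-am) and as an object on a verb (did-am-ash), so the function does not need to know which kind of host it was given.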
Syntax
| Feature | Detail |
|---|---|
| Word order | SOV (Subject-Object-Verb) |
| Adpositions | Prepositions (e.g., az, be, bā); plus postposition rā |
| Relative clauses | Introduced by ke; the head noun of a restrictive clause takes enclitic -i |
| Negation | Prefix na- on the verb (ne- before mi-): nakardam "I did not do" |
| Questions | Yes/no questions use the particle āyā (formal) or rising intonation alone |
Varieties
| Variety | Region | Notes |
|---|---|---|
| Iranian Persian (Farsi) | Iran | Tehrani pronunciation as standard |
| Dari | Afghanistan | More conservative phonology; many archaic words |
| Tajiki | Tajikistan, Uzbekistan | Cyrillic script; heavy Turkic/Russian borrowing |
Written Farsi and Dari are mutually intelligible, since they share a script and a literary standard; Tajiki is set apart in writing by its Cyrillic script. In colloquial speech all three are increasingly divergent.
Loan vocabulary
Persian's vocabulary is layered with Arabic (massive after the Islamic conquests), Turkic (administrative), Mongolic (military), French (technical and political since the late nineteenth century) and English (modern technological). The Iranian Academy of Persian Language and Literature (Farhangestan) actively coins native equivalents for foreign technical terms.
For CSS, three facts recur almost every time: (1) Persian belongs to the Iranian branch of Indo-European and is sister to Pashto and Balochi; (2) its signature syntactic feature is the ezafe linker (-e/-i); (3) Dari, Farsi and Tajiki are three varieties of the same New Persian, the last written in Cyrillic.
Persian and Urdu
For CSS aspirants, the most practical fact is that Persian supplied the structural backbone of literary Urdu: the script (Nastaliq), the metres (aruz), the masnavi/ghazal forms, and roughly half of the Urdu literary lexicon. To read Iqbal, Ghalib or Mir without a working knowledge of Persian grammar is impossible.