Indo-European Proto-Dialects: an article by Cyril Babaev

1. Modern Linguistics about the Breakup of the Proto-Indo-European Community.

When discussing the problem of Proto-Indo-European areal dialects, we should note first of all that we will try not to touch on another, perhaps the most vital, matter in Indo-European linguistics: the question of the Indo-European homeland. The two problems are quite different, though they are closely connected with each other.

The only fact we consider beyond doubt is that the history of the Proto-Indo-European language had three main phases:

The first: the Common Proto-Indo-European language, i.e. the linguistic community which existed in a rather narrow area somewhere in Asia or in Europe. Perhaps this was not a single language but a variety of tribal speech forms with only slight differences between them.

The second: the Proto-dialects. As the number of speakers of Proto-Indo-European grew, the area of the language widened gradually and steadily, and since people lived in isolated tribes and had little contact with each other, dialectal differences appeared: differences in some phonetic features, morphological peculiarities and lexical variants.

The third: the separate Indo-European languages. The migrations which began at some point spread the dialects across the whole Eurasian continent, and the Indo-European dialects lost all contact with each other. Moreover, many of them came into contact with other language families in their new homelands and therefore acquired significant new features of phonetics, morphology and vocabulary.

The third phase, naturally, generated separate languages which we can definitely name: Germanic, Celtic, Anatolian, Indic, Iranian, Tocharic etc. Later each of them became a separate group within the family.

What we are interested in in this article is phase 2, the Proto-dialects. Already the first linguists, who researched Indo-European matters at the beginning of the 19th century, noticed that some language groups are more similar to each other than to other groups. That is how the term "community" emerged in comparative linguistics: the Balto-Slavic, Illyro-Venetic and Indo-Iranian communities denoted closer links between their members than between them and other neighbouring groups. The method for studying which groups and separate languages were more closely connected with each other is simple in principle: comparative linguistics allows us to study the similarities in phonetics and morphology which bring languages closer to each other or, conversely, set them apart in their development.

But this same method has led different scholars to completely different conclusions in distinguishing Proto-Indo-European dialects. The earliest theory of the original distribution of dialects was developed as early as the 19th century; according to it, all Indo-European languages were divided into two major groups: "satem" languages, which turned the Proto-Indo-European k' into s (Iranian, Indic, Slavic, Baltic, Albanian, Armenian, Thracian), and "centum" languages, where this palatal k' became k (Italic, Hellenic, Anatolian, Tocharic, Celtic, Germanic, Venetic). The words satem and centum themselves mean "a hundred" in Avestan and Latin respectively, and both derive from PIE *k'mtom. Yet the two groups which this division produces can hardly be described geographically as "western" and "eastern" dialects. Indeed, if we state that the centum languages were originally a "western" PIE dialect, then how do Tocharic and Anatolian come to be in this group?

The theory still exists and has supporters even nowadays, but it is no longer considered to reflect the original division of the Proto-Indo-European language. It became possible to refute it after the Hittite and Tocharic languages were discovered at the beginning of the 20th century. Moreover, practically none of the Indo-European groups follows the "centum - satem" rule consistently; for example, in the Slavic and Baltic languages the PIE k' and the other palatals (g', g'h) could become s, z as well as k, g.
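The centum - satem rule can be illustrated with a deliberately small sketch. It rewrites only the palatals of a reconstructed form and ignores every other sound change, so it does not produce the real Avestan or Latin words. The function name `palatal_reflex` and the exact outcomes chosen for g' and g'h are assumptions made for the illustration, not part of the theory as stated above.

```python
# Toy model of the centum/satem split: only the palatal reflexes are rewritten.
# The outcomes for g' and g'h are simplifying assumptions; all other
# sound changes are ignored.
REFLEXES = {
    "satem": {"g'h": "z", "k'": "s", "g'": "z"},
    "centum": {"g'h": "gh", "k'": "k", "g'": "g"},
}


def palatal_reflex(pie_form: str, branch: str) -> str:
    """Replace the PIE palatals in pie_form with their reflexes for branch."""
    result = pie_form
    # Longest symbols first, so that g'h is handled before g'.
    for palatal in sorted(REFLEXES[branch], key=len, reverse=True):
        result = result.replace(palatal, REFLEXES[branch][palatal])
    return result


for branch in ("satem", "centum"):
    print(branch, palatal_reflex("*k'mtom", branch))
```

A single mapping per branch is exactly what the Baltic and Slavic data contradict: a group in which the same palatal yields both s and k cannot be described by one row of this table.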

Many linguists have argued in favour of the so-called "European languages" theory, which opposes such groups as Baltic, Slavic, Germanic, Celtic and Italic (i.e. those spoken in Europe) to all the other Indo-European branches. This theory holds that all European languages are descendants of a "Proto-European" language, which in its turn used to be one of the two major dialects of Proto-Indo-European. The arguments of its supporters are well known: first of all the geographical position of the Europeans, and then the number of words found only in the Indo-European tongues of Europe. Among them the most famous are *mari- (sea), *sal- (salt, dirt), *teuta- (people, a tribe), *lac- (milk), *amlu- (an apple).

But again some faults can be found. The stem *lac- was present in Tocharic and was even borrowed from Tocharic into Old Chinese; on the other hand, this stem cannot be found in Balto-Slavic. So it is more likely to have been Celto-Italo-Tocharic originally than Proto-European. Another stem, *teuta-, exists in Italic, Baltic, Celtic and Germanic, but not in Slavic. It is, however, found in Greek, which is classed among the "Asiatic", not the "Proto-European" languages. And the word *amlu- has recently been identified as a European substratum (pre-Indo-European) term.
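The objection amounts to a simple test: a stem supports the "Proto-European" theory only if it is attested in every European group and in no group outside Europe. The sketch below applies that test to the two stems discussed above, recording only what the text states explicitly; groups about which nothing is said are left out rather than guessed. The names `EVIDENCE` and `counterexamples` are illustrative.

```python
# The five groups which the "European languages" theory treats as one dialect.
EUROPEAN = {"Baltic", "Slavic", "Germanic", "Celtic", "Italic"}

# Only attestations stated in the text are recorded; unmentioned groups are
# treated as unknown, not as absent.
EVIDENCE = {
    "*teuta-": {
        "present": {"Italic", "Baltic", "Celtic", "Germanic", "Greek"},
        "absent": {"Slavic"},
    },
    "*lac-": {
        "present": {"Tocharic"},
        "absent": {"Baltic", "Slavic"},
    },
}


def counterexamples(stem):
    """Return (European groups lacking the stem, non-European groups having it)."""
    record = EVIDENCE[stem]
    missing_in_europe = record["absent"] & EUROPEAN
    found_outside = record["present"] - EUROPEAN
    return missing_in_europe, found_outside


for stem in EVIDENCE:
    missing, outside = counterexamples(stem)
    print(stem, "| lacking in:", sorted(missing), "| also found in:", sorted(outside))
```

Both stems fail the test on both counts, which is the point made in the paragraph above.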

Still another version suggested in modern linguistics posits "internal" and "external" dialects within the PIE community. Researchers stressed the similar features of the Tocharic, Celtic and Italic languages and opposed them to certain similarities in Baltic, Slavic, Germanic and other groups. But it soon became clear that we cannot judge the Proto-IE dialects on the basis of either the modern or the ancient geographical distribution of the groups. It would be much more scientific to use linguistic material and to compile comparative data which help to reconstruct the structural features shared by different languages and language groups.

2. The Comparative Analysis.

Here we give a small table of basic morphological traits which differ within the Indo-European family but are common to certain groups. In all we have 16 characteristic features, each with the list of language groups which share it (we hope our abbreviations will be clear):

1.	feminine stems in -ā, -ī, -ū	all but Anatolian
2.	genitive singular -osyo	Indo-Iranian, Greek, Armenian
	gen.sg. -	Tocharic, Italic, Celtic, Venetic, Illyrian
	gen.sg. -	Baltic, Slavic
	gen.sg. -eso	Germanic, Baltic
3.	instrumental masculine sg. -	Indo-Iranian, Germanic, Baltic
4.	indirect cases sg. & pl. -m-	Baltic, Slavic, Germanic
	dative sg. -bhos	Italic, Celtic, Venetic, Illyrian
	indirect cases sg. & pl. -bhi-	Indo-Iranian, Greek, Armenian
5.	instr. pl. masc. -ys	all but Anatolian
6.	locative pl. -su / -si	Indo-Iranian, Baltic, Slavic, Germanic
7.	gen.-loc. dual -os	Indo-Iranian, Slavic
8.	1st person personal pronouns -em	Indo-Iranian, Slavic
9.	demonstrative pronouns so, sā, to	all but Anatolian
10.	relative pronouns kwis	Anatolian, Tocharic, Italic
	relative pronouns yos	Indo-Iranian, Greek, Slavic, Phrygian
11.	degrees of comparison in -tero-, -isto-	Indo-Iranian, Greek, Germanic
	degrees of comparison in -samo-	Celtic, Italic
12.	athematic and thematic aorist tense	Indo-Iranian, Greek, Armenian, Phrygian
13.	middle voice in -oi / -moi	Indo-Iranian, Greek, Slavic, Baltic, Germanic
	middle voice in -r	Anatolian, Tocharic, Italic, Celtic, Phrygian
14.	subjunctive mood with -- / --	Tocharic, Italic, Celtic
15.	modal forms in -l-	Anatolian, Tocharic, Armenian, Slavic
16.	present middle participle in -mo-	Anatolian, Baltic, Slavic

This is just the morphology, and no phonetic analysis is given here, but it already shows the relations between the groups of the Indo-European family. On the basis of this table, supplemented by many other important grammatical examples, comparativists try to discover the chronological sequence of the Proto-Indo-European dialectal division. A close look at the table leads us to the following conclusions.

1. We do not know which of the alternative variants of the grammatical forms above was in wider use in Proto-Indo-European itself; probably two or three were used side by side as synonyms. But perhaps only one form or ending was common to all Proto-IE speakers, and all the alternative forms arose later, within the dialects.

2. The first language to split off from the Proto-community was evidently Anatolian. Among the 16 features in the table above, numbers 1, 5 and 9 are present in all Indo-European groups except Anatolian. Neither Hittite, Lydian and Palaic nor the Late Anatolian languages show the feminine stems in long a-, i-, u-, and they therefore lack the feminine gender altogether, having only two genders: active (animate) and inactive (neuter, inanimate). Several noun case endings (including the instrumental plural masculine) and the demonstrative pronoun so, sā, to are likewise absent from all Anatolian languages. All this creates some distance between the Anatolian group and all the other groups of the family, and this gap is evidently a matter of time. The ancestors of the Hittites and Luwians moved away from the Proto-Indo-Europeans several centuries before the general breakup occurred.

Phonetic data confirm our supposition. Anatolian was the only group to preserve many phonetic traits that are quite archaic for the family: it keeps the laryngeals, which cannot be found anywhere else in the ancient Indo-European languages (Latin ante, Hittite hant- "forward"; Greek arktos, Hittite hartagga "a bear"). The special directive case, which is common in early Hittite and Luwian documents, is unique in the Indo-European family.
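The observation in conclusion 2, that features 1, 5 and 9 are shared by every group except Anatolian, can be checked mechanically once the table is written down as data. The following is a minimal sketch under two assumptions: that the groups named in the table make up the whole family, and that shortened labels are enough to identify each row. The names `TABLE`, `lacking_only` and `attested_in` are illustrative.

```python
# Language groups named in the table; treating them as the whole family
# is an assumption of this sketch.
ALL_GROUPS = frozenset({
    "Anatolian", "Tocharic", "Italic", "Celtic", "Venetic", "Illyrian", "Baltic",
    "Slavic", "Germanic", "Indo-Iranian", "Greek", "Armenian", "Phrygian",
})
ALL_BUT_ANATOLIAN = ALL_GROUPS - {"Anatolian"}

# (feature number, short label, groups sharing it), transcribed from the table.
# Labels are shortened; endings that are illegible in the table are not guessed.
TABLE = [
    (1, "feminine long-vowel stems", ALL_BUT_ANATOLIAN),
    (2, "gen.sg. -osyo", {"Indo-Iranian", "Greek", "Armenian"}),
    (2, "gen.sg., second variant", {"Tocharic", "Italic", "Celtic", "Venetic", "Illyrian"}),
    (2, "gen.sg., third variant", {"Baltic", "Slavic"}),
    (2, "gen.sg. -eso", {"Germanic", "Baltic"}),
    (3, "instr.sg. masculine", {"Indo-Iranian", "Germanic", "Baltic"}),
    (4, "indirect cases -m-", {"Baltic", "Slavic", "Germanic"}),
    (4, "dat.sg. -bhos", {"Italic", "Celtic", "Venetic", "Illyrian"}),
    (4, "indirect cases -bhi-", {"Indo-Iranian", "Greek", "Armenian"}),
    (5, "instr.pl. masculine -ys", ALL_BUT_ANATOLIAN),
    (6, "loc.pl. -su / -si", {"Indo-Iranian", "Baltic", "Slavic", "Germanic"}),
    (7, "gen.-loc. dual -os", {"Indo-Iranian", "Slavic"}),
    (8, "1st person pronouns -em", {"Indo-Iranian", "Slavic"}),
    (9, "demonstrative pronouns", ALL_BUT_ANATOLIAN),
    (10, "relative kwis", {"Anatolian", "Tocharic", "Italic"}),
    (10, "relative yos", {"Indo-Iranian", "Greek", "Slavic", "Phrygian"}),
    (11, "comparison -tero-, -isto-", {"Indo-Iranian", "Greek", "Germanic"}),
    (11, "comparison -samo-", {"Celtic", "Italic"}),
    (12, "athematic and thematic aorist", {"Indo-Iranian", "Greek", "Armenian", "Phrygian"}),
    (13, "middle voice -oi / -moi", {"Indo-Iranian", "Greek", "Slavic", "Baltic", "Germanic"}),
    (13, "middle voice -r", {"Anatolian", "Tocharic", "Italic", "Celtic", "Phrygian"}),
    (14, "subjunctive", {"Tocharic", "Italic", "Celtic"}),
    (15, "modal forms -l-", {"Anatolian", "Tocharic", "Armenian", "Slavic"}),
    (16, "middle participle -mo-", {"Anatolian", "Baltic", "Slavic"}),
]


def lacking_only(group):
    """Feature numbers shared by every group except the given one."""
    return [num for num, _, groups in TABLE if groups == ALL_GROUPS - {group}]


def attested_in(group):
    """Feature numbers for which the group appears in at least one variant."""
    return sorted({num for num, _, groups in TABLE if group in groups})


print(lacking_only("Anatolian"))  # [1, 5, 9]
print(attested_in("Anatolian"))   # [10, 13, 15, 16]
```

The second call lists the only features in which Anatolian takes part at all. A count of this kind is no more reliable than the table behind it, so it is an aid to reading the table rather than independent evidence.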

3. After the Anatolians left, heading for Asia Minor, where they would settle in about 2200 BC, the Indo-European community lived together for some time longer. But already in the middle of the 4th millennium BC the first differences in speech emerged, and this was the beginning of the dialectal diffusion.

We can assume that the common Proto-language first broke into two major dialects, whose traces are clearly visible in the comparative table. Features 10 and 13, together with the lexical and phonetic material, give us the approximate borders of the division: the first group included the Tocharic, Italic, Celtic, Illyrian and Venetic languages. We should not forget that at the time this division occurred they were a single language, or rather not yet a language but a dialect, still having just a few differences from the second dialectal group: Indic, Iranian, Greek, Armenian, Slavic, Baltic, Germanic, Phrygian, Thracian.

Phrygian is a special case: its relative pronoun ios brings it closer to the second group, yet it has -r as the middle voice ending. Here we should look at Phrygian phonetics, vocabulary and syntax, which show that it was as a whole much closer to Armenian, Greek and Thracian than to other groups. So its place is in the second dialectal group.
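The two-way division and the ambiguous position of Phrygian follow directly from the group lists given for features 10 and 13 in the table. The sketch below collects the groups attested for each pair of variants and reports any group that appears on both sides. The variable and function names are illustrative. Anatolian appears in the output only because the table lists it; in the account given here it had already left the community by this stage.

```python
# Variants of features 10 and 13, with the groups listed for each in the table.
FIRST_DIALECT_MARKERS = {
    "relative pronoun kwis": {"Anatolian", "Tocharic", "Italic"},
    "middle voice in -r": {"Anatolian", "Tocharic", "Italic", "Celtic", "Phrygian"},
}
SECOND_DIALECT_MARKERS = {
    "relative pronoun yos": {"Indo-Iranian", "Greek", "Slavic", "Phrygian"},
    "middle voice in -oi / -moi": {"Indo-Iranian", "Greek", "Slavic", "Baltic", "Germanic"},
}


def groups_with_any(markers):
    """Union of all groups attested for at least one of the given variants."""
    return set().union(*markers.values())


first = groups_with_any(FIRST_DIALECT_MARKERS)
second = groups_with_any(SECOND_DIALECT_MARKERS)
conflicting = first & second  # groups showing markers of both dialects

print("first dialect: ", sorted(first - conflicting))
print("second dialect:", sorted(second - conflicting))
print("conflicting:   ", sorted(conflicting))
```

Two features alone cannot place a group with conflicting markers, which is why the text turns to phonetics, vocabulary and syntax to decide the Phrygian case. Groups such as Armenian, Venetic and Illyrian are absent from the output simply because the table does not list them under features 10 and 13.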

Incidentally, such a division also suggests that Anatolian, before it drifted apart, was closer in morphology to the first group. It must have been located geographically nearer to the speakers of the future Tocharic, Celtic and Italic languages than to those of Indic and Greek.

3. The next division that took place in the Proto-Indo-European area was the departure of the Tocharic speakers from the community. Both Tocharic languages (Tocharic A, or Agnean, and Tocharic B, or Kuchean) show a number of forms and etymologies that differ from those of their neighbours, the Celtic, Italic and Illyrian languages. Tocharic must have left the Indo-European homeland even before the widest migrations of the Indo-Europeans began.

The destiny of the Tocharic group is quite interesting. In the Tocharic languages linguists find lexical borrowings from Iranian languages, which means that the Tocharians were moving north-east from Central Asia. Another large group of loanwords in Tocharic derives from Finno-Ugric languages and, importantly, even from Proto-Finno-Ugric. Evidently, Tocharic speakers were in contact with Finno-Ugric tribes before this Proto-language divided into two or more branches. Nevertheless, Tocharic also has a small number of lexical parallels with the Slavic, Baltic, Armenian, Germanic and Indic languages:

Tocharic walo "a prince", Slavic vladeti "to rule, to own";
Tocharic soy "a son", Greek uhios "a son" - all the other languages added the suffix -n- to this stem: Baltic sunus, Germanic sunu;
Tocharic wap- "to weave", Albanian venj "I weave".

But certainly such related stems shared by Tocharic and Celtic, or by Tocharic and Italic, are much more numerous and reflect not just temporary contacts but a common past.

4. The second group of dialects named above (let us schematically call it Indo-Thracian) could not survive for long and had to break up as well. Its descendants fell into two subgroups: Indic, Iranian, Armenian, Phrygian and Greek on the one hand, and Baltic, Germanic and Slavic on the other. Already in the 19th century linguists noticed that the Baltic, Germanic and Slavic languages have very much in common in morphology, vocabulary and phonetics. The comparative table above supports that. We need only look at some ancient languages of those three groups, Old Prussian, Gothic and Old Church Slavonic, to see how similar they still were a thousand years ago.

The same can be said about Indic, Iranian and Greek. Geographically this division could have taken place in the South Russian steppes, which are sometimes called the "secondary Indo-European homeland". Recently linguists have discovered some traces of an Indo-Aryan presence north of the Black Sea in toponyms and hydronyms, and this also supports the idea. Perhaps Greeks, Iranians and Aryans once lived here together, though not for long. The Aryan - Greek - Armenian lexical correspondences number about 30 in all:

Aryan haras "heat", Greek theros "summer", Armenian jer "warmth";
Aryan marta "mortal", Iranian marəta "a man", Greek mortos "mortal", Armenian mard "a man";
Aryan jarant "old", Iranian zarəta "old", Greek gerōn "an old man", Armenian cer "old";
Aryan stana "breast", Greek stēnion "breast", Armenian stin;
Iranian izaena "leather thing", Greek aig "a goat", Armenian aye "a goat".

As for Germano-Balto-Slavic lexical correspondences, material is even easier to find:
Baltic aldija "a ship", Slavic ladija, Germanic olda "a vessel";
Baltic draugas "a friend", Slavic drug, Germanic dringan "to serve the army";
Baltic dailyti "to divide", Slavic deliti, Germanic dailjan;
Baltic rugiai "rye", Slavic rozhi, Germanic rugr.

5. We cannot state exactly the time of all these divisions and subdivisions. They can be established not by means of archaeology or history, but only through comparative linguistics. We can only be sure that everything mentioned above happened between 4000 BC and 2200 BC. Nor can we determine the geographical borders of the dialectal groups which arose from all these diffusions.

The 5th period included several important events, which led on to the spread of Indo-European dialects and languages over Europe and part of Asia. At the beginning of this period we had three major branches, which divided in the following way:

Indic-Iranian-Armenian-Phrygian-Greek > Indic-Iranian-Armenian + Phrygian-Greek >
> Indic-Iranian + Armenian + Phrygian-Greek
Baltic-Slavic-Germanic > Baltic-Slavic + Germanic
Italic-Venetic-Illyrian-Celtic > Italic-Venetic-Illyrian + Celtic

This system of divisions can be stated with some confidence, because a good deal of linguistic, archaeological and historical material supports exactly this version. Such communities (or language alliances?) as Indo-Iranian, Balto-Slavic and Illyro-Venetic are well known even from historical documents. And the close similarities between such languages as, for example, Baltic and Slavic are obvious even to an ordinary person, not only to sophisticated linguists.
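The three division chains of the 5th period can also be written down as a small tree, which makes it easy to list the branches that remain at the end and to check that no group is lost or added along the way. This is only a sketch of the scheme as given in this article; the names `SPLITS`, `final_branches` and `split_is_consistent` are illustrative.

```python
# Each entry maps a branch to the branches it divides into,
# following the three chains given for the 5th period.
SPLITS = {
    "Indic-Iranian-Armenian-Phrygian-Greek": ["Indic-Iranian-Armenian", "Phrygian-Greek"],
    "Indic-Iranian-Armenian": ["Indic-Iranian", "Armenian"],
    "Baltic-Slavic-Germanic": ["Baltic-Slavic", "Germanic"],
    "Italic-Venetic-Illyrian-Celtic": ["Italic-Venetic-Illyrian", "Celtic"],
}
ROOTS = [
    "Indic-Iranian-Armenian-Phrygian-Greek",
    "Baltic-Slavic-Germanic",
    "Italic-Venetic-Illyrian-Celtic",
]


def final_branches(branch):
    """Return the branches left once every recorded division has been applied."""
    children = SPLITS.get(branch)
    if not children:
        return [branch]
    result = []
    for child in children:
        result.extend(final_branches(child))
    return result


def split_is_consistent(branch):
    """Check that a division neither loses nor adds a member group."""
    parent_members = set(branch.split("-"))
    child_members = set()
    for child in SPLITS[branch]:
        child_members |= set(child.split("-"))
    return parent_members == child_members


for root in ROOTS:
    print(root, "->", final_branches(root))
print("all divisions consistent:", all(split_is_consistent(b) for b in SPLITS))
```

The tree records only the order of the divisions; it says nothing about their dates, which, as noted above, cannot be fixed more precisely than the interval between 4000 BC and 2200 BC.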

Here are some basic Italo-Celtic lexical correspondences:
Latin terra < *tersa (land), Oscan teerum, Irish tír < *ters-;
Latin velum (cloth), Irish figim (I weave);
Latin trans (across), Umbrian traf, Welsh tra;
Latin deses (lazy), Irish deid.

We have roughly described the possible path from the Proto-Indo-European language to the separate language groups. Certainly, much material is yet to be found, and many corrections are to be made to this picture of the historical and linguistic development of the family, but we believe that the general trend is quite right.

If you would like to see a graphical representation of the theory given in this article, please refer to the Indo-European Clickable Tree, where a special table depicts the possible Indo-European history. Another source available for studying the matter is our Indo-European Chronology, which currently covers the period from 4000 BC to 550 BC.