Language Classification Working Group

 

Marianne Mithun

 

I appreciated the paper by Gary Simons and Peter Constable, "Language Identification and IT: Addressing Problems of Linguistic Diversity on a Global Scale," and found that it raised exactly the kinds of issues that had occurred to me.

 

Language codes

The Ethnologue system looks optimal to me. It is based on the most common criterion for delimiting languages: mutual intelligibility. We know that this is not a simple matter, but it is what we have. Languages known by different names in different areas are nevertheless represented by a single code. The system is exhaustive and expandable: there is room for many more distinct languages, both living and extinct, than we will ever recognize, even if some codes are retired when alternate divisions are made. I myself appreciate the mnemonic character of most of the existing Ethnologue codes. We all realize that not all codes can be perfectly mnemonic, simply because the names of many languages sound alike and single languages are often known by multiple names. What is most mnemonic for one user may not be for the next. But the principle of making the codes as mnemonic as possible has considerable value. The situation is much like airport codes: LAX for Los Angeles and SFO for San Francisco, even if imperfect, are far easier to learn and remember than arbitrary strings such as XQR and MBT. This user-friendliness should lead to wider acceptance and result in fewer mistakes.

 

Considerable work has already gone into establishing the Ethnologue codes. It would make little sense for others to start all over again. The team responsible consists of good linguists dedicated to representing current thinking among specialists. Furthermore, this team is in place to maintain the system, continually updating it as more becomes known.

 

Coding schemes for language classification

I find the kind of genetic information associated with languages in the Ethnologue lists, that is, family and subgroups, as well as alternate names, optimal and necessary. It would be a mistake, of course, to include this information in the language code itself, because any change in subgrouping would then require a change in the language code. I was at first intrigued by the LINGUIST coding scheme: it is good to see the hierarchical information and layering, and it is important to be able to pull together information from subgroups. I see two problems, however. First, I worry that the complete arbitrariness of the alphabetic labels beyond the family level (ATAACAB) will keep them from being used and will lead to errors of interpretation. A more serious problem is the principle of naming subgroups alphabetically. Every time a new subgroup is recognized, the subgrouping codes will have to be altered not only for all languages in the new subgroup but also for all languages in subgroups whose names occur later in the alphabet.
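The cascade of changes caused by alphabetic subgroup naming can be illustrated with a small sketch. It assumes a simplified, hypothetical labeling rule in which each subgroup of a family receives a letter according to the alphabetical position of its name; this is not the actual LINGUIST scheme, and the subgroup names are invented for illustration.

```python
from string import ascii_uppercase


def assign_labels(subgroup_names):
    """Hypothetical rule: label each subgroup with a letter taken from the
    alphabetical position of its name (first name -> A, second -> B, ...)."""
    return {name: ascii_uppercase[i] for i, name in enumerate(sorted(subgroup_names))}


def changed_labels(old, new):
    """Return the subgroups present in both labelings whose label differs."""
    return sorted(name for name in old if name in new and old[name] != new[name])


# Invented subgroup names, used only to show the effect of the rule.
before = assign_labels(["Northern", "Southern", "Western"])
# A newly recognized subgroup whose name falls in the middle of the alphabet.
after = assign_labels(["Northern", "Pacific", "Southern", "Western"])

print(before)
print(after)
print(changed_labels(before, after))
```

Adding the hypothetical "Pacific" subgroup leaves "Northern" labeled A but shifts "Southern" from B to C and "Western" from C to D, so every language in those two subgroups would need a new classification code even though nothing about them has changed. A stable language code kept separate from the classification record avoids this.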

 

Additional information

The locales in which languages are spoken and the estimated numbers of speakers given in the Ethnologue are important and should be accessible. This information would be sufficient for those interested in areal traits, without introducing and forcing premature conclusions about areal influences. I myself think that classifying languages by typological similarities at this stage of our knowledge would be a serious mistake. Most current typologies grossly oversimplify (verb-initial or verb-final, head- or dependent-marking, nominative/accusative or ergative/absolutive, pro-drop, ...), and such a specification would set those oversimplifications in stone.