The term "ontology" is one of those words that cause a lot of misunderstanding. It has been used for a long time in philosophy, where it refers to the study of what kinds of things exist, the study of first principles and the essence of things. Since most linguists have some philosophical background or knowledge, this is the definition which they are most familiar with.
If this were what we meant by "building an ontology of linguistics", then any attempt to formulate one would be a truly bad idea. Not only would it take a very long time indeed to do it properly, it's probably also beyond our abilities with our present state of linguistic knowledge. We would like to say immediately, then, that this is not what we are trying to do here.
Fortunately, the term ontology has a completely different meaning in information technology. An ontology here is essentially a formal statement of the relationships between terms: a working model of the entities in some particular domain of knowledge and the interactions between them. Its purpose is not to define meaning, but to allow computers to navigate human knowledge in a way that mimics intelligence. What it is, then, matters far less than what it allows a computer to do. Two of the most useful things it allows are responding usefully to linguistic queries and comparing linguistic data in a way linguists understand.
To take a much over-simplified example, suppose a linguistic ontology had the following hierarchy:
Using this hierarchy, a machine could answer queries such as "What numbers does a language have?" and "What languages have more than two numbers?" on any language data marked up according to the ontology. The machine does not need to know what terms like "singular" actually mean: it simply needs to know which terms it must interpret as numbers.
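The two queries above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hierarchy fragment (Singular, Dual and Plural as kinds of Number), the language names and the markup sets are all invented for the example and are not taken from GOLD or from any real data set.

```python
# Hypothetical ontology fragment: each term points to its parent concept.
# These terms are assumptions made for illustration only.
PARENT = {
    "Singular": "Number",
    "Dual": "Number",
    "Plural": "Number",
    "Number": "MorphosyntacticFeature",
}

# Invented markup: the ontology terms attested in each (made-up) language.
LANGUAGE_TAGS = {
    "LanguageA": {"Singular", "Plural"},
    "LanguageB": {"Singular", "Dual", "Plural"},
}


def is_a(term, concept):
    """Return True if `term` is `concept` or one of its descendants."""
    while term is not None:
        if term == concept:
            return True
        term = PARENT.get(term)  # walk up one level; None at the top
    return False


def numbers_of(language):
    """Answer 'What numbers does a language have?'"""
    return sorted(
        t for t in LANGUAGE_TAGS[language]
        if t != "Number" and is_a(t, "Number")
    )


def languages_with_more_than(n):
    """Answer 'What languages have more than n numbers?'"""
    return sorted(
        lang for lang in LANGUAGE_TAGS if len(numbers_of(lang)) > n
    )


print(numbers_of("LanguageB"))
print(languages_with_more_than(2))
```

Nothing in the code says what "Singular" means. The queries work purely because the hierarchy records that the term is a kind of Number.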
Referencing a single ontology supports the long-term intelligibility of data, since the concepts in the ontology are precisely defined and generally understood. This function "render[s] disparate markup terminologies transparent" (GOLD Community Website). Furthermore, because an ontology can link linguistic data to meanings, it contributes to the development of the semantic web. This will enable searches for data on the web that will return and compare forms from a wide variety of languages.
In order for these functions to be effective, data must be defined in a standardized way and the mapping between a term and the concept it represents must be unique. This does not mean that a single ontology must be used by all linguists but, instead, that linguists should relate their preferred terminology to concepts defined in a standard ontology. When a standard ontology is used as a reference point, data becomes interoperable and searches across "disparate data sets" are enabled.
Descriptive profiles link specialized markup to a standard ontology. In a descriptive profile, terms used in the data source are mapped to terms in the ontology. This is called terminology mapping. In a terminology mapping:
- each term is represented only once
- each term is defined by its relationship to one or more concepts
- the terminology mapping document must uniquely identify the resource or resources that contain the concepts that each term references
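As a rough sketch, the three constraints listed above can be checked mechanically. The record layout (`MappingEntry`), the function name `validate_mapping`, the sample terms and the `example.org` identifier are all hypothetical. They do not reflect the actual format of a descriptive profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingEntry:
    term: str        # term as it appears in the data source
    concepts: tuple  # one or more concept names in the reference ontology
    resource: str    # identifier of the resource that defines those concepts


def validate_mapping(entries):
    """Return a list of problems; an empty list means all constraints hold."""
    problems = []
    seen = set()
    for entry in entries:
        # Constraint 1: each term is represented only once.
        if entry.term in seen:
            problems.append(f"duplicate term: {entry.term}")
        seen.add(entry.term)
        # Constraint 2: each term relates to one or more concepts.
        if not entry.concepts:
            problems.append(f"no concept for term: {entry.term}")
        # Constraint 3: the resource holding the concepts is identified.
        if not entry.resource:
            problems.append(f"no resource for term: {entry.term}")
    return problems


# Placeholder identifier, not a real ontology location.
ONTOLOGY = "http://example.org/reference-ontology"

good = [
    MappingEntry("sg", ("Singular",), ONTOLOGY),
    MappingEntry("pl", ("Plural",), ONTOLOGY),
]
bad = good + [MappingEntry("sg", (), "")]

print(validate_mapping(good))
print(validate_mapping(bad))
```

A check of this kind only confirms that the mapping is well formed. Whether "sg" really corresponds to the ontology's concept of singular remains a judgment for the linguist writing the profile.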
General Ontology for Linguistic Description (GOLD)
A group of linguists has started developing a morphosyntactic ontology that will provide such a resource for linguists. The developers of GOLD endeavor to represent all naturally occurring linguistic phenomena while capturing the structure of linguistic universals. In this way, the developers aim to "represent the cumulative knowledge of well trained and broadly experienced professional linguists, who know about both individual languages and universals" (Farrar & Langendoen, 2003). This morphosyntactic ontology is part of a larger effort to build a collection of domain-specific ontologies under the Suggested Upper Merged Ontology (SUMO).
In order to represent the cumulative knowledge of various linguists, the ontology group requested the input of the linguistics community at the 2005 E-MELD workshop. At the workshop, participants were split into six working groups, each representing an area of expertise. For example, one group was composed primarily of experts in Australian languages. At the end of the workshop, each group presented an array of suggested modifications to the ontology, and the ontology group has since been at work implementing them.
FIELD allows users to input lexical data and exports an archival XML document that links to GOLD. FIELD provides a convenient interface for linguists to interact with language data in a fully searchable online database that is flexible enough to accommodate data from different language families and typological configurations.
GOLD Community Site