Example-2: Each vocabulary in the ontology should be linked with a “word sense” in a linguistic resource

Administrative Data
Recommendation:
Encouraged
Author:
Mustafa Jarrar

Description:

Each vocabulary in the ontology should be rooted in a word sense in a linguistic resource. If the vocabulary is specific or new (i.e. not found in linguistic resources), it should be described in the same way that entries in linguistic resources are built; see, e.g., guideline #1.

The idea of this guideline is to use lexical resources as a vocabulary space and as a consensus reference in which to root ontology definitions. For example, suppose you are building an ontology and you want to define a new concept called Book. If you access WordNet, you will find several distinct word meanings for the term Book: (G1021, A written work or composition that has been published, printed on pages bound together); (G1022, A record in which commercial accounts are recorded); (G1023, A collection of playing cards satisfying the rules of a card game), etc. Then ask yourself: which of these concepts do you mean, or which of them is your concept a special case of? Suppose you choose the concept G1021. Now, instead of defining your own new concept, you can link your concept (e.g. using URIs or namespaces) to the concept G1021. If another ontology is built in the same way (i.e. all of its vocabularies are rooted in WordNet), the two ontologies will be easier to integrate and to make interoperable, since they are at least free from problems of language ambiguity and multilingualism.
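The linking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real WordNet API: the sense identifiers and glosses are copied from the Book example, while the `SENSES` table, the `LinkedTerm` record, the `link_term` function, and the base URI `http://example.org/wordnet/` are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical base URI (an assumption for this sketch). A real project would
# use the URIs published with the RDFS/OWL version of WordNet it relies on.
BASE_URI = "http://example.org/wordnet/"

# Word senses for the term "Book", copied from the example in the text.
SENSES = {
    "Book": {
        "G1021": "A written work or composition that has been published, "
                 "printed on pages bound together",
        "G1022": "A record in which commercial accounts are recorded",
        "G1023": "A collection of playing cards satisfying the rules of a card game",
    }
}


@dataclass(frozen=True)
class LinkedTerm:
    term: str       # the label used in the ontology
    sense_id: str   # the chosen word sense in the lexical resource
    sense_uri: str  # machine-referable link to that sense
    gloss: str      # the agreed description of the sense


def link_term(term: str, sense_id: str) -> LinkedTerm:
    """Root an ontology term in one chosen word sense of the lexical resource."""
    candidates = SENSES.get(term, {})
    if sense_id not in candidates:
        raise KeyError(f"{sense_id!r} is not a known sense of {term!r}")
    return LinkedTerm(term, sense_id, BASE_URI + sense_id, candidates[sense_id])


# The ontology builder decides that "Book" means the published written work.
book = link_term("Book", "G1021")
print(book.sense_uri)  # → http://example.org/wordnet/G1021
```

The point of the sketch is that the ontology stores a reference to an existing, agreed sense rather than a freshly written definition; asking for a sense that the resource does not list fails instead of silently creating a new meaning.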

The importance of using linguistic resources in this way lies in the fact that a linguistic resource renders the intended meaning of a linguistic term as it is commonly "agreed" among the community of its language. The set of concepts that a language lexicalizes through its set of word-forms is generally an agreed conceptualization. For example, when we use the English word 'Book', we actually refer to the set of implicit rules that are common to all English-speaking people for distinguishing 'Books' from other objects. Such implicit rules (i.e. meanings) are learned and agreed upon through the repeated use of word-forms and their referents. Usually, lexicographers and lexicon developers investigate the repeated use of a word-form (e.g. based on a comprehensive corpus) to determine its underlying concept(s).

Although linguistic resources do not represent absolute agreement on, or correctness of, meanings (if such a thing exists), from a methodological viewpoint they do improve the quality of ontological definitions. In other words, one way of preventing ontology builders from imposing their personal viewpoints and usability perspectives at the ontology level is to investigate and root the ontology concepts at the level of an agreed community/human language conceptualization.

Some linguistic resources focus mainly on the morphological issues of terms, rather than on categorizing and clearly describing their intended meanings. Depending on its description of term meaning(s), its accuracy, and its conceptual structure (i.e. the discrimination of term meanings in a machine-referable manner, much like WordNet synsets), a lexical resource can play an important role in ontology engineering.

Remark: a new online version of WordNet (in RDFS format) can be found at the W3C-BestPractices website; an RDFS/OWL version will also be published at the official WordNet website (Princeton University). We recommend using URIs based on these versions.


Advantages:
  • Gain more consensus and quality. The consensus about the ontology vocabularies is gained and realized by rooting them into a linguistic resource, i.e. at the level of a community/human language conceptualization.
  • Enable compatibility and semantic interoperability with other neighbouring ontologies. The ontology will be easier to integrate and interoperate with other ontologies that are built in the same way.
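
The interoperability advantage can be illustrated with a small Python sketch. It assumes that each ontology is reduced to a mapping from its own local term to the word sense it is rooted in; the two toy ontologies (`library` and `bookshop`), their term labels, and the `shared_concepts` function are hypothetical, and only the sense identifiers come from the Book example in the description.

```python
def shared_concepts(ontology_a, ontology_b):
    """Return (term_a, term_b, sense_id) for terms rooted in the same word sense."""
    # Index the first ontology by sense, so lookups do not depend on labels.
    by_sense = {}
    for term_a, sense_id in ontology_a.items():
        by_sense.setdefault(sense_id, []).append(term_a)

    matches = []
    for term_b, sense_id in ontology_b.items():
        for term_a in by_sense.get(sense_id, []):
            matches.append((term_a, term_b, sense_id))
    return sorted(matches)


# Hypothetical ontologies: local term -> word sense it is rooted in.
library = {"Book": "G1021", "AccountBook": "G1022"}
bookshop = {"Livre": "G1021"}  # a French label rooted in the same sense

matches = shared_concepts(library, bookshop)
print(matches)  # → [('Book', 'Livre', 'G1021')]
```

Because the comparison is made on sense identifiers rather than on labels, the terms "Book" and "Livre" are recognised as the same concept despite the difference in language, while "AccountBook" is kept apart from both.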

References:

Jarrar, M.: Towards the notion of gloss, and the adoption of linguistic resources in formal ontology engineering. In: Proceedings of the 15th International World Wide Web Conference, WWW2006. Edinburgh, Scotland. May 2006. ACM, 2006