OntoDT - Ontology of Datatypes

While developing an ontology of data mining that is general enough to represent the mining of structured data, we developed a separate ontology module, named OntoDT, for representing knowledge about datatypes. In the preliminary OntoDT development phase, the classes used to represent datatypes were integrated in OntoDM-core. For generality and reuse, however, we later exported the datatype-specific OntoDM classes into a separate ontology module, OntoDT. This ontology can now be reused independently by any other ontology that requires representation of, and reasoning about, general-purpose datatypes.

Background

The content of the OntoDT ontology module is based on an ISO standard for the representation of datatypes in computer systems and programming languages. The first edition of the standard, named 'Language independent datatypes', was published in 1996. The revised version, named 'General-Purpose Datatypes', was published in 2007. The standard specifies both primitive datatypes, which are defined without reference to other datatypes, and non-primitive datatypes, which are defined in terms of other datatypes and occur commonly in programming languages. The definitions of datatypes are independent of any particular programming language or implementation. In addition, Meek (1994) discusses a proposal for a taxonomy of datatypes based on the first version of the ISO standard. His taxonomy follows the practice of starting with a number of primitive datatypes and using these to construct others. The proposed taxonomy is given only as an overview and a discussion of how things are done, without any formal representation (in a machine-processable language) that could be reused further.

Content

The OntoDT ontology defines:

  • datatype characterizing operation and a taxonomy of datatype characterizing operations,
  • datatype quality and a taxonomy of datatype qualities, and
  • a datatype taxonomy comprising classes and instances of
    • primitive datatypes,
    • generated datatypes (non-aggregate and aggregate datatypes),
    • subtypes, and
    • defined datatypes.

Datatype and value space

In the OntoDT ontology, the datatype class is modeled as a subclass of the OBI: data representational model class. It defines the type of data, together with the set of distinct values that the data can take, the properties of those values, and the operations on those values. The datatype class is connected by the has-member relation to the value space specification class and by the has-operation relation to the characterizing operation class. In addition, OntoDT models datatype properties as subclasses of the quality class and connects them to the datatype class using the has-quality relation.

The value space specification class is modeled in OntoDT as a subclass of the OntoDM: specification entity class. It specifies the collection of values for a given datatype. The value space of a given datatype can be defined in different ways: by enumerating the values; with axioms using a set of fundamental notions; as a subset of values defined in another value space with a given set of properties; or as a combination of arbitrary values from some other defined value space by specifying a construction procedure.
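
The relations described above can be illustrated with a minimal Python sketch. It is not part of OntoDT: the names `ValueSpace`, `Datatype`, `enumerated`, and `restricted`, as well as the `weekday` example and its qualities, are assumptions made for illustration. A value space is approximated by a membership test, and only two of the four ways of defining a value space are shown (enumeration and restriction to a subset of another value space).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class ValueSpace:
    """Value space specification, modelled here as a membership test."""
    description: str
    contains: Callable[[object], bool]


def enumerated(description: str, values: Iterable[object]) -> ValueSpace:
    """Value space defined by enumerating its values."""
    members = frozenset(values)
    return ValueSpace(description, lambda v: v in members)


def restricted(description: str, base: ValueSpace,
               has_property: Callable[[object], bool]) -> ValueSpace:
    """Value space defined as the subset of `base` whose values satisfy a property."""
    return ValueSpace(description, lambda v: base.contains(v) and has_property(v))


@dataclass
class Datatype:
    """Hypothetical stand-in for the datatype class and its three relations."""
    name: str
    value_space: ValueSpace                                   # has-member
    operations: List[str] = field(default_factory=list)       # has-operation
    qualities: Dict[str, str] = field(default_factory=dict)   # has-quality


# Enumerated value space (illustrative datatype, not taken from OntoDT).
weekday = Datatype(
    name="weekday",
    value_space=enumerated("working days", ["mon", "tue", "wed", "thu", "fri"]),
    operations=["equal"],
    qualities={"numericalness": "non-numeric"},
)

# Subset of another value space, selected by a property.
integers = ValueSpace(
    "all integers",
    lambda v: isinstance(v, int) and not isinstance(v, bool),
)
even_integers = restricted("even integers", integers, lambda v: v % 2 == 0)
```

Representing the value space as a predicate keeps infinite value spaces, such as the integers, expressible without listing their members.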

Characterizing operations

A characterizing operation is defined as an IAO: directive information entity that specifies those operations on the datatype that distinguish it from other datatypes having identical value spaces. The characterizing operation of a datatype can be niliadic, monadic, dyadic, or n-adic.

  • A niliadic operation specifies an operation that yields values of a given datatype.
  • A monadic operation specifies an operation that maps a value of a given datatype into a value of the given datatype, or into a value of the boolean datatype.
  • A dyadic operation specifies an operation that maps a pair of values of a given datatype into a value of the given datatype, or into a value of the boolean datatype.
  • An n-adic operation specifies an operation that maps an ordered n-tuple of values (n>2), each of which is of a specific datatype, into values of a given datatype.

Finally, all characterizing operation classes have defined subclasses, which represent datatype-specific operations.
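
The classification by number of operands can be sketched in Python as follows. This is an illustrative assumption rather than the OntoDT representation: the class `CharacterizingOperation`, the `kind` property, and the listed operations (including the hypothetical three-operand `select`) are invented for the example.

```python
import operator
from dataclasses import dataclass
from typing import Callable

# Names for the arities described in the list above; anything larger is n-adic.
ARITY_NAMES = {0: "niliadic", 1: "monadic", 2: "dyadic"}


@dataclass(frozen=True)
class CharacterizingOperation:
    """Hypothetical record of an operation and the number of values it takes."""
    name: str
    arity: int
    apply: Callable

    @property
    def kind(self) -> str:
        return ARITY_NAMES.get(self.arity, "n-adic")


# Illustrative operations on an integer datatype.
integer_operations = [
    CharacterizingOperation("zero", 0, lambda: 0),            # yields a value
    CharacterizingOperation("negate", 1, operator.neg),       # integer -> integer
    CharacterizingOperation("add", 2, operator.add),          # pair -> integer
    CharacterizingOperation("equal", 2, operator.eq),         # pair -> boolean
    CharacterizingOperation("select", 3,                      # 3-tuple -> integer
                            lambda cond, a, b: a if cond else b),
]

kinds = [op.kind for op in integer_operations]
```

In this sketch, `kinds` evaluates to `["niliadic", "monadic", "dyadic", "dyadic", "n-adic"]`.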

Datatype properties

A datatype property is defined as a quality that specifies the intrinsic properties of the data units represented by the datatype, regardless of the properties of their representations in computer systems. Each datatype has a set of unique datatype properties. These include property classes such as order, numericalness, cardinality, exactness, equality, and boundedness.

  • Order denotes whether an order relation is defined on the value space.
  • Numericalness denotes whether the values in the value space are quantities expressed in a mathematical numbering system.
  • Cardinality denotes the cardinality of the value space, that is, whether it is finite, countably infinite, or uncountably infinite.
  • Exactness denotes whether every value from the value space is distinguishable from every other value in the value space.
  • Boundedness denotes whether the value space has a lower and/or an upper bound.

All datatype property classes have defined subclasses. For example, the boundedness class has the following subclasses: bounded (bounded below, bounded above) and unbounded (unbounded below, unbounded above).
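
The boundedness example can be mirrored by a small Python class hierarchy. This is only a sketch of the taxonomy as described in the text: the Python class names and the helper `boundedness_of` are assumptions, and treating a missing bound as `None` is a modelling choice made for the example.

```python
from typing import Optional, Tuple, Type


class DatatypeQuality:
    """Root of the (hypothetical) quality hierarchy."""


class Boundedness(DatatypeQuality):
    pass


class Bounded(Boundedness):
    pass


class BoundedBelow(Bounded):
    pass


class BoundedAbove(Bounded):
    pass


class Unbounded(Boundedness):
    pass


class UnboundedBelow(Unbounded):
    pass


class UnboundedAbove(Unbounded):
    pass


def boundedness_of(lower: Optional[int],
                   upper: Optional[int]) -> Tuple[Type[Boundedness], Type[Boundedness]]:
    """Return the boundedness classes implied by optional lower and upper bounds."""
    below = BoundedBelow if lower is not None else UnboundedBelow
    above = BoundedAbove if upper is not None else UnboundedAbove
    return below, above


# A value space with a lower bound of 0 and no upper bound.
below, above = boundedness_of(0, None)
```

Here `below` is `BoundedBelow` and `above` is `UnboundedAbove`, and `issubclass(below, Boundedness)` holds because the subclass links follow the taxonomy.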

Extended datatype

In OntoDT, an extended datatype (named 'subtype' in the ISO standard) is defined as an IAO: data representational model that is derived from an existing datatype by restricting the value space to a subset of the value space of the base datatype, while maintaining all operations. The base type denotes the role of a datatype as a parametric datatype on which a generator operates to produce a new datatype. An extended datatype is defined by a subtype generator that represents the relationship between the value spaces of the base type and the extended datatype.

In OntoDT, we define the following classes of subtype generators: range generator, selection generator, exclusion generator, size generator, extension generator, and explicit subtype generator. Subtype generators can change the set of datatype properties valid for the base datatype, which is why we do not represent extended datatypes simply as subclasses of the datatype class. For example, applying the range generator to an unbounded datatype will make it bounded.

Using these notions, we can represent an extended datatype of any previously defined type. For example, by using a range subtype generator we can place a new upper and/or lower bound on the value space of a chosen base datatype. The positive integer datatype is an extended datatype of the integer datatype obtained by limiting the value space with a lower bound of zero.
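
A range subtype generator of this kind can be sketched in Python as below. The sketch is hypothetical and not taken from OntoDT: `Datatype`, `range_generator`, the operation names, and the two boolean boundedness flags are assumptions, and the bounds are treated as inclusive, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Datatype:
    """Hypothetical datatype record: value space test, operations, boundedness."""
    name: str
    contains: Callable[[object], bool]
    operations: Tuple[str, ...]
    bounded_below: bool
    bounded_above: bool


def range_generator(base: Datatype, name: str,
                    lower: Optional[int] = None,
                    upper: Optional[int] = None) -> Datatype:
    """Derive an extended datatype by bounding the value space of `base`.

    Bounds are inclusive (an assumption). Operations are kept unchanged,
    while the boundedness properties may change.
    """
    def contains(value: object) -> bool:
        if not base.contains(value):
            return False
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
        return True

    return Datatype(
        name=name,
        contains=contains,
        operations=base.operations,
        bounded_below=base.bounded_below or lower is not None,
        bounded_above=base.bounded_above or upper is not None,
    )


integer = Datatype(
    name="integer",
    contains=lambda v: isinstance(v, int) and not isinstance(v, bool),
    operations=("equal", "in_order", "negate", "add", "multiply"),
    bounded_below=False,
    bounded_above=False,
)

# Extended datatype obtained with a lower bound of zero, as in the example above.
lower_bounded = range_generator(integer, "integer with lower bound 0", lower=0)
```

The derived datatype keeps the operations of `integer` but is now bounded below, which shows why a generated subtype can carry a different set of properties than its base type.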

Versions and Download

Release version 1

Publications

OntoDT@Bioportal

