Generalization

We take generalization to denote the outcome of a data mining task. In OntoDM-core, we consider and model three different aspects of generalizations, each aligned with a different description layer:

  • the specification of a generalization,
  • a generalization as a realizable entity, and
  • the process of executing a generalization.

Many different types of generalizations have been considered in the data mining literature. The most fundamental types of generalizations, as proposed by Dzeroski (2006), are in line with the data mining tasks. These include clusterings, patterns, probability distributions, and predictive models.

Generalization specification

In OntoDM-core, the generalization specification class is a subclass of the OBI class data representational model. It specifies the type of the generalization and has two parts: the data specification, which describes the data used to produce the generalization, and the generalization language, which is the language in which the generalization is expressed. Examples of generalization languages for a predictive model include trees, rules, Bayesian networks, graphical models, and neural networks.

As in the case of datasets and data mining tasks, we can construct a taxonomy of generalizations. In OntoDM-core, at the first level, we distinguish between a single generalization specification and an ensemble specification. An ensemble of generalizations has single generalizations as parts. We can further extend this taxonomy by taking into account the data mining task and the generalization language.
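The first level of this taxonomy can be sketched as a small Python class hierarchy. This is only an illustration under stated assumptions, not part of OntoDM-core: the class names, the field names (`task`, `data_spec`, `language`, `parts`), and the string values are hypothetical stand-ins for the ontology classes and their has-part relations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneralizationSpecification:
    """Hypothetical stand-in for the generalization specification class."""
    task: str       # data mining task the generalization addresses
    data_spec: str  # specification of the data used to produce it
    language: str   # generalization language, e.g. "trees" or "rules"


@dataclass
class SingleGeneralizationSpecification(GeneralizationSpecification):
    """A specification of one generalization (no parts of its own)."""


@dataclass
class EnsembleSpecification(GeneralizationSpecification):
    # An ensemble has single generalizations as parts.
    parts: List[SingleGeneralizationSpecification] = field(default_factory=list)


# Illustrative values only; they are not terms taken from the ontology.
tree = SingleGeneralizationSpecification(
    task="predictive modelling", data_spec="example data", language="trees"
)
forest = EnsembleSpecification(
    task="predictive modelling",
    data_spec="example data",
    language="ensembles of trees",
    parts=[tree, tree],
)
```

Extending the taxonomy by data mining task or generalization language would, in this sketch, amount to adding further subclasses rather than new fields.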

Dual nature of generalizations

Generalizations have a dual nature. On the one hand, they can be treated as data structures and, as such, represented, stored, and manipulated. On the other hand, they act as functions and are executed, taking data examples as input and giving as output the result of applying the function to each data example. In OntoDM-core, we define a generalization as a subclass of the BFO class realizable entity. It is an output of a data mining algorithm execution.

The dual nature of generalizations in OntoDM-core is represented with two classes that belong to two different description layers: generalization representation, which is a subclass of information content entity and belongs to the specification layer, and generalization execution, which is a subclass of planned process and belongs to the application layer.

A generalization representation is a subclass of the IAO class information content entity. It represents a formalized description of the generalization, for instance in the form of a formula or text. For example, the output of a decision tree algorithm execution in data mining software usually includes a textual representation of the generated decision tree. A generalization execution is a subclass of the OBI class planned process that has a dataset as input and another dataset as output. The output dataset is the result of applying the generalization to the examples from the input dataset.
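The two views can be illustrated with a toy decision tree in Python. This is a minimal sketch, not an OntoDM-core artifact: the `Leaf` and `Node` classes, the `represent` and `execute` functions, and the feature name and threshold are all hypothetical. The tree is held as a data structure, `represent` yields a textual representation, and `execute` maps an input dataset to an output dataset.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Example = Dict[str, float]


@dataclass
class Leaf:
    prediction: str


@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # branch taken when the feature value <= threshold
    right: "Tree"  # branch taken otherwise


Tree = Union[Leaf, Node]


def represent(tree: Tree, indent: int = 0) -> str:
    """Generalization representation: a textual description of the tree."""
    pad = "  " * indent
    if isinstance(tree, Leaf):
        return f"{pad}predict {tree.prediction}"
    return "\n".join([
        f"{pad}if {tree.feature} <= {tree.threshold}:",
        represent(tree.left, indent + 1),
        f"{pad}else:",
        represent(tree.right, indent + 1),
    ])


def execute(tree: Tree, dataset: List[Example]) -> List[str]:
    """Generalization execution: apply the tree to each input example."""
    def predict(example: Example) -> str:
        node = tree
        while isinstance(node, Node):
            go_left = example[node.feature] <= node.threshold
            node = node.left if go_left else node.right
        return node.prediction

    return [predict(example) for example in dataset]


# Hypothetical tree and input dataset, for illustration only.
tree = Node("petal_length", 2.5, Leaf("setosa"), Leaf("other"))
text = represent(tree)
output = execute(tree, [{"petal_length": 1.4}, {"petal_length": 4.7}])
```

Here `text` plays the role of the generalization representation, while the call to `execute` corresponds to a generalization execution whose output dataset holds one prediction per input example.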

