The OntoDM ontology
The domain of data mining (DM) deals with analyzing different types of data. Data mining typically operates on data in the form of a single table whose attributes have primitive datatypes. However, structured (complex) data, such as graphs, sequences, networks, text, images, multimedia, and relational data, are receiving increasing interest. A major challenge is to treat and represent the mining of different types of structured data in a uniform fashion.
A theoretical framework that unifies different data mining tasks on different types of data can help formalize knowledge about the domain and provide a basis for future research, unification, and standardization. Automating and supporting the overall Knowledge Discovery in Databases (KDD) process is another important challenge in the domain. A formalization of the domain of data mining addresses these challenges: it can directly support the development of a general framework for data mining, support the representation of the process of mining structured data, and allow the representation of the complete process of knowledge discovery.
We propose OntoDM, a modular reference ontology for the domain of data mining, directly motivated by the need to formalize the domain. OntoDM is designed and implemented following ontology best practices and design principles. Its distinguishing feature is that it uses the Basic Formal Ontology (BFO) as an upper-level ontology and template, uses a set of formally defined relations from the Relation Ontology (RO) and other state-of-the-art ontologies, and reuses classes and relations from the Ontology for Biomedical Investigations (OBI), the Information Artifact Ontology (IAO), and the Software Ontology (SWO). This reuse ensures compatibility and connections with other ontologies and enables cross-domain reasoning.
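To illustrate why a shared upper-level ontology enables cross-domain reasoning, here is a minimal Python sketch. It is not the actual OntoDM axiomatization. The `IS_A` table is a simplified, hand-written subclass hierarchy: the upper part follows the general shape of IAO and BFO, while the placement of "data mining algorithm" under "algorithm" and the helper names `ancestors`, `is_subclass_of`, and `common_ancestor` are assumptions made for this example.

```python
# Simplified, illustrative is_a (subclass) edges: child -> parent.
# Assumption: this table is a sketch, not an export of OntoDM, IAO, OBI, or BFO.
IS_A = {
    "data mining algorithm": "algorithm",            # hypothetical placement
    "study design": "plan specification",            # example class from another domain
    "algorithm": "plan specification",
    "plan specification": "directive information entity",
    "directive information entity": "information content entity",
    "information content entity": "generically dependent continuant",
    "generically dependent continuant": "continuant",
    "continuant": "entity",
}


def ancestors(cls, is_a=IS_A):
    """Return the superclasses of cls, nearest first."""
    chain = []
    seen = {cls}
    while cls in is_a:
        cls = is_a[cls]
        if cls in seen:  # guard against accidental cycles in the table
            raise ValueError(f"cycle detected at {cls!r}")
        seen.add(cls)
        chain.append(cls)
    return chain


def is_subclass_of(cls, parent, is_a=IS_A):
    """True if cls equals parent or parent is reachable via is_a edges."""
    return cls == parent or parent in ancestors(cls, is_a)


def common_ancestor(a, b, is_a=IS_A):
    """Return the nearest class that subsumes both a and b, or None."""
    candidates = set([b] + ancestors(b, is_a))
    for cls in [a] + ancestors(a, is_a):
        if cls in candidates:
            return cls
    return None


if __name__ == "__main__":
    print(is_subclass_of("data mining algorithm", "information content entity"))
    print(common_ancestor("data mining algorithm", "study design"))
```

Because both domain-specific classes hang off the same upper-level chain, a query such as `common_ancestor("data mining algorithm", "study design")` can relate entities that were defined in different ontologies. A real deployment would load the OWL files and use a description logic reasoner rather than a hand-written table.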
The OntoDM ontology is composed of three sub-ontologies covering different aspects of data mining:
- Ontology of Datatypes (OntoDT), a generic ontology of datatypes;
- Ontology of Core Data Mining Entities (OntoDM-core), which formalizes the key data mining entities for representing the mining of structured data in the context of a general framework for data mining; and
- Ontology of Data Mining Investigations (OntoDM-KDD), which formalizes the knowledge discovery process based on the Cross Industry Standard Process for Data Mining (CRISP-DM) process model.
Related publications:
- OntoDT: Panče Panov, Larisa Soldatova, Sašo Džeroski. Generic Ontology of Datatypes. Information Sciences 329 (2016) 900–920
- OntoDM-KDD: Panče Panov, Larisa Soldatova, Sašo Džeroski. OntoDM-KDD: Ontology for Representing the Knowledge Discovery Process. Discovery Science 2013, Lecture Notes in Computer Science Volume 8140, pp 126-140, 2013
- Doctoral Thesis: Panče Panov. A Modular Ontology of Data Mining. Doctoral Thesis. Jožef Stefan International Postgraduate School. 2012
- Panče Panov, Larisa Soldatova, Sašo Džeroski. Representing Entities in the OntoDM Data Mining Ontology. In Sašo Džeroski, Bart Goethals and Panče Panov (Eds.), Inductive Databases and Constraint-Based Data Mining, pp. 27-55, Springer, 2010