Scalable Reasoning on Very Large Data Sets

Real Semantic Web applications manage large amounts of metadata expressed as ontology instances. Current description logic reasoners are not yet ready to handle large numbers of ontology instances, because the instances are not stored persistently and reasoning is performed in main memory. Our goal is to implement a persistent and scalable reasoner for OWL ontologies. Our work includes the design and implementation of a repository to store ontologies and their instances, the implementation of reasoning algorithms, the definition and implementation of query languages for ontologies, and the definition of mechanisms to optimize these queries and the reasoning process.
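To illustrate the general idea of evaluating reasoning inside a persistent store instead of in main memory, the following minimal Python sketch keeps a class hierarchy and instance assertions in SQLite and derives inferred types with a recursive SQL query. The tables `subclass_of` and `type_of` and the function `inferred_types` are hypothetical, and only subclass transitivity is covered: a small fragment of OWL, not the algorithms of our reasoner.

```python
import sqlite3

# Hypothetical schema: only subclass axioms and class assertions are stored.
# Reasoning covered here is subClassOf transitivity plus type inheritance,
# a tiny fragment of OWL used purely for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS subclass_of (sub TEXT, sup TEXT);
CREATE TABLE IF NOT EXISTS type_of (individual TEXT, cls TEXT);
"""

# The recursive CTE computes the transitive closure of subclass_of inside the
# database engine (UNION discards duplicates, so cycles terminate), then
# propagates each asserted type to all of its superclasses.
INFERRED_TYPES = """
WITH RECURSIVE ancestors(sub, sup) AS (
    SELECT sub, sup FROM subclass_of
    UNION
    SELECT a.sub, s.sup FROM ancestors a JOIN subclass_of s ON a.sup = s.sub
)
SELECT t.individual, t.cls FROM type_of t
UNION
SELECT t.individual, a.sup FROM type_of t JOIN ancestors a ON t.cls = a.sub
"""


def inferred_types(conn: sqlite3.Connection) -> set[tuple[str, str]]:
    """Return all asserted and inferred (individual, class) pairs."""
    return set(conn.execute(INFERRED_TYPES).fetchall())


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a file path would persist data.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subclass_of VALUES (?, ?)",
                     [("Enzyme", "Protein"), ("Protein", "Molecule")])
    conn.execute("INSERT INTO type_of VALUES (?, ?)", ("hexokinase", "Enzyme"))
    for row in sorted(inferred_types(conn)):
        print(row)
```

Because the closure is computed by the database engine, the instance data never has to be loaded wholesale into the application's memory; a real OWL reasoner would of course need far more than this single rule.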

Middleware based on ontologies

Our research aims to find solutions for developing semantics-based software applications. Producing the underlying components is therefore essential. As a result, we have developed a middleware based on a set of components known as SD-Core (the Semantic Directory Core). Semantic Directories offer the minimum elements needed to develop Semantic Web applications. SD-Core has been extended in two ways:

  • Adding application-specific methods. SD-Data is an extension developed to manage resources that generate data; it adds specific methods for dealing with this kind of resource.
  • Adding new interfaces to SD-Core:
    1. KOMF (the Khaos Ontology-based Mediation Framework) is a framework that provides interfaces for query planning and query resolution using an ontology-based mediator.
    2. OMAF (the Ontology Matching and Alignment Framework) is a framework that provides interfaces to facilitate communication between ontologies.

Discovery of semantic correspondences between ontologies

Ontology alignment consists of discovering the set of semantic correspondences (mappings) between the entities of two or more ontologies that belong to the same domain but have been developed independently.

Our work focuses on techniques and tools that support the discovery of semantic correspondences between ontologies. To this end, we have developed MaF, a framework for statistical matching that allows users to compose and weight basic matching algorithms. The framework also offers functions to develop and validate new matching algorithms. Moreover, we investigate both traditional techniques for obtaining correspondences and newer techniques, such as those that exploit implicit knowledge on the Web, in order to decide on the quality of the discovered correspondences.
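MaF's own API is not described here, so the following Python sketch only illustrates the general idea of composing and weighting basic matchers. The two basic matchers, the weighted-average combination, the weights and the threshold are all assumptions chosen for illustration; `difflib.SequenceMatcher` is from the Python standard library.

```python
from difflib import SequenceMatcher
from typing import Callable

# A basic matcher maps a pair of entity labels to a similarity in [0, 1].
Matcher = Callable[[str, str], float]


def edit_matcher(a: str, b: str) -> float:
    """Hypothetical string matcher based on difflib's similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_matcher(a: str, b: str) -> float:
    """Hypothetical matcher: Jaccard overlap of lower-cased word tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def compose(weighted: list[tuple[Matcher, float]]) -> Matcher:
    """Combine basic matchers into a single weighted-average matcher."""
    total = sum(w for _, w in weighted)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    def combined(a: str, b: str) -> float:
        return sum(w * m(a, b) for m, w in weighted) / total

    return combined


def align(source: list[str], target: list[str], matcher: Matcher,
          threshold: float) -> list[tuple[str, str, float]]:
    """Return candidate correspondences whose combined score >= threshold."""
    pairs = []
    for s in source:
        for t in target:
            score = matcher(s, t)
            if score >= threshold:
                pairs.append((s, t, score))
    return pairs
```

For example, `align(["Protein Structure", "Gene"], ["protein structure", "Enzyme"], compose([(edit_matcher, 0.6), (token_matcher, 0.4)]), 0.8)` keeps only the pair of identical labels. A weighted average is just one possible aggregation; validating a new matcher would amount to plugging another `Matcher` into `compose` and comparing its output against a reference alignment.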

Semantic Web Services Composition. Semantic ESB

Nowadays, researchers in the Semantic Web field are trying to solve the problem of data integration; to do so, they propose new languages to model the information to be represented (RDF, RDF Schema, OWL, etc.). Other researchers, however, are working on the problem of application integration. Consortia such as NESSI, EIC, the OASIS Semantic Execution Environment TC and SWS are leaders in the development of Semantic Web Services, where the concept of Semantically Enabled Service-oriented Architectures (SESA) has emerged. Our main goal is to automatically support the creation and lifecycle of SESAs, covering all service-related operations: discovery, selection, composition, mediation, invocation and execution. A further objective is to implement this platform as the basis for developing the logic of the SWS lifecycle. Finally, we aim to hide the semantic layer from developers to make their work easier.

Semantic Content Recommendation

Traditional recommender systems are based on collaborative filtering, that is, on user interactions with the items to be recommended. These systems take into account neither user profiles nor content characteristics; instead, they recommend items that have been chosen by a group of consumers. Content-based recommendation, by contrast, relies on the characteristics of the users and the content of the items. This line of research applies semantics by defining ontologies for user profiles and content, and these profiles are enriched by reasoning about user behaviour. Content-based recommenders can also be combined with collaborative filtering to create hybrid systems, which show satisfactory results in many scenarios.
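A hybrid recommender of this kind can be sketched in Python as a linear blend of two scores: a content-based score (cosine similarity between a user-profile vector and an item's feature vector, which in our setting would be derived from the profile and content ontologies) and a user-based collaborative score computed from which items similar users have chosen. The sparse-vector layout, the implicit "chosen = 1.0" data and the blending weight `alpha` are assumptions for illustration, not the design of our system.

```python
import math

Vector = dict[str, float]  # sparse vector: feature or item name -> weight


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def collaborative_score(user: str, item: str,
                        choices: dict[str, Vector]) -> float:
    """Similarity-weighted share of the other users who chose `item`."""
    num = den = 0.0
    for other, chosen in choices.items():
        if other == user:
            continue
        sim = cosine(choices[user], chosen)  # user-user similarity
        den += sim
        if item in chosen:
            num += sim
    return num / den if den else 0.0


def hybrid_score(user: str, item: str, profile: Vector,
                 item_features: dict[str, Vector],
                 choices: dict[str, Vector], alpha: float = 0.5) -> float:
    """Blend: alpha * content-based score + (1 - alpha) * collaborative score."""
    content = cosine(profile, item_features[item])
    return alpha * content + (1 - alpha) * collaborative_score(user, item, choices)
```

Both component scores lie in [0, 1], so `alpha` directly controls how much the recommendation relies on semantic content versus the behaviour of similar users; `alpha = 0` reduces to pure collaborative filtering.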

Semantic extension of databases

Within this line of research, we aim to annotate databases automatically. A great deal of research effort has gone into the Semantic Web and its associated technologies, and one of the most active lines is the annotation of web pages using a domain ontology. This process generates useful metadata in the form of instances. These instances can be loaded into a reasoner and queried using its reasoning capabilities, and the results can then be linked to the data in the web page. Our goal is to study how databases can be semantically annotated. To this end, we focus on the generation and management of annotations and on how to link them with the data stored in the database.

Applications: Systems Biology

Systems Biology is an area of scientific research concerned with the study of biological processes using a systemic approach. It relies on modelling, that is, on mathematical models that describe the behaviour of a system. These models make it possible to predict the behaviour of a process as a dynamic system, generally treated as a complex network. Systems Biology must also address the integration of existing information, whose sources are growing exponentially with the appearance of new analysis tools, which in turn have given rise to new theories. Consequently, Systems Biology is an archetypal application domain that makes intensive use of data and knowledge. Within this line of research, we have developed several applications that use the knowledge and systems produced by our basic research lines to solve specific problems faced by researchers in this area. The developed applications are:

  • BioBroker. This tool integrates different biological databases using XML as the data interchange model.

  • SB-KOM. A mediator built on the KOMF framework that uses optimized scheduling algorithms for biological databases.

  • ASP 3D Model Finder. An application for finding the three-dimensional structure of proteins involved in amine metabolism; the structure is generated computationally from existing information.

  • SBMM-Assistant. This assistant retrieves information about metabolic pathways (both the pathway itself and information about its components) and also allows users to edit these pathways.

Applications: Cultural Heritage and Tourism

In this line, we have developed techniques to support the management and dissemination of cultural heritage and tourism. We are interested in creating new technologies in a social context that facilitate content creation based on user profiles. In this sense, we have worked on GeoTrip, an online GIS that allows mobile device users to select a set of points of interest in real time, based on their geographic position, on recommendations made by other users who appear to have similar behaviour patterns, and on searches over the associated metadata. The set of points of interest returned by each query can be converted into routes, so that mobile devices become interactive guides for users who, for example, are visiting a city. In addition, the system allows users to attach complementary multimedia information to each point of interest.
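GeoTrip's internals are not described here; as a hedged illustration of selecting points of interest by geographic position and metadata and turning them into a route, the following Python sketch filters points within a radius using the haversine great-circle distance, optionally requiring metadata tags, and orders them with a greedy nearest-neighbour heuristic. The `PointOfInterest` class, the function names and the greedy strategy are assumptions, and recommendations from similar users are not modelled.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class PointOfInterest:
    """Hypothetical POI record: name, WGS84 coordinates and metadata tags."""
    name: str
    lat: float
    lon: float
    tags: frozenset[str] = frozenset()


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearby(pois: list[PointOfInterest], lat: float, lon: float,
           radius_km: float,
           required_tags: frozenset[str] = frozenset()) -> list[PointOfInterest]:
    """POIs within radius_km that carry all required tags, nearest first."""
    hits = [(haversine_km(lat, lon, p.lat, p.lon), p)
            for p in pois if required_tags <= p.tags]
    hits = [h for h in hits if h[0] <= radius_km]
    return [p for _, p in sorted(hits, key=lambda h: h[0])]


def greedy_route(lat: float, lon: float,
                 pois: list[PointOfInterest]) -> list[PointOfInterest]:
    """Order POIs by repeatedly moving to the nearest unvisited one."""
    route, remaining = [], list(pois)
    while remaining:
        nxt = min(remaining, key=lambda p: haversine_km(lat, lon, p.lat, p.lon))
        route.append(nxt)
        remaining.remove(nxt)
        lat, lon = nxt.lat, nxt.lon
    return route
```

A typical query would call `nearby` with the device's current position and then pass the result to `greedy_route`. The greedy ordering is cheap enough for real-time use on a handful of points but does not guarantee the shortest route.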

Semantic Web for E-Science

Given the complexity of studying biological systems, it is necessary to reduce the number of equations and variables in a model as much as possible without losing relevant information. The following research lines are currently open:

  1. Systems Modelling. Application to Systems Biology
    1. Reduction of Models
      Several techniques exist for reducing mathematical models; the most prominent are lumping, sensitivity analysis and timescale analysis.
    2. Structural Kinetic Modeling
      A technique used to obtain relevant information about the variables that describe the mathematical model over saturation ranges, and to study viability.
  2. Semantic Modeling. This technique obtains relevant information about the model parameters by introducing ontologies. Application to Systems Biology through biological ontologies, for example SBO (Systems Biology Ontology).
  3. Semantic Web: Semantic Web Technologies and application development. Application to Systems Biology.

Multi-objective optimization

Many real-world optimization problems are multi-objective in nature, which means that they comprise several objective functions that must be optimized (maximized or minimized) simultaneously. Such problems are frequent in engineering, economics and other fields. Currently, one of the most widely used approaches to this kind of problem is metaheuristics: approximate optimization techniques that do not guarantee finding optimal solutions but can produce high-quality solutions in a reasonable amount of time.
The open research lines we are working on are:

  1. Design of new algorithms, aimed at obtaining more efficient methods to solve complex problems.
  2. Applications to bioinformatics problems.
  3. Applications to civil engineering problems.
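The notion of optimality behind multi-objective problems is Pareto dominance: one solution dominates another if it is no worse in every objective and strictly better in at least one. As a minimal Python sketch, assuming every objective is minimized, the functions below test dominance, extract the non-dominated front of a population, and maintain an external archive of non-dominated solutions of the kind many multi-objective metaheuristics keep. The function names are illustrative and are not taken from any particular framework.

```python
from typing import Sequence

Objectives = Sequence[float]  # one value per objective, all to be minimized


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b: no worse everywhere, better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def non_dominated(points: list[Objectives]) -> list[Objectives]:
    """Filter a population down to its non-dominated (Pareto) front."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def update_archive(archive: list[Objectives],
                   candidate: Objectives) -> list[Objectives]:
    """Insert candidate into an unbounded archive of non-dominated solutions."""
    if any(dominates(a, candidate) or tuple(a) == tuple(candidate)
           for a in archive):
        return archive  # candidate is dominated or a duplicate: reject it
    # Drop the archive members the candidate dominates, then add it.
    return [a for a in archive if not dominates(candidate, a)] + [candidate]
```

Practical algorithms usually bound the archive size (for example by discarding crowded solutions), which this sketch omits for brevity.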

Big Data Analytics

The management, integration and analysis of data in the Life Sciences is one of the main research topics of Khaos Research (University of Málaga, Spain). Our experience includes reasoning on large amounts of data, ontology-based middleware, semantic relationship discovery, Semantic Web Services, semantics-based content recommendation, optimization algorithms, semantic extension of databases and the Semantic Web for e-Science. Our group is currently developing three research topics (data management, integration and analysis), together with the technologies that can be applied to them:

• Data Management (Relational Databases, NoSQL Databases, Linked Data)

• Data Integration (Linked Data)

• Data Analysis (Big Data Analytics, Data Mining, Text Mining, Information Retrieval and Optimization Algorithms)

These research topics are also being applied to Big Data (large amounts of data, data streams, heterogeneous data). In this context, we are applying the results of this research to combine the management, integration and analysis of Big Data in health, economy and tourism applications.

We are currently participating in the EU FP7 project Bioledge on protein production, and we have participated in a number of national and regional projects on the application of data management, integration and analysis in the Life Sciences. We are also starting an SME project on Big Data analytics of credit card transactions for fraud detection.