IR.WoT - Information Retrieval for the WoT

IR.WoT Architecture

Our approach, named IR.WoT, is an Information Retrieval system for the Web of Things. In the figure, we map the IR.WoT components to the IR data flow of the modular architecture proposed by [Tran et al., 2017], https://doi.org/10.1145/3092695, in which a generic Web of Things Search Engine (WoTSE) serves as a comparison framework to guide future IR developments. We cross-correlate IR.WoT, designed as a service architecture, with this modular counterpart: the former provides a functional, cloud-based resource perspective, while the latter provides a data-flow-oriented analysis. The following sections describe each functional module of IR.WoT and give an overview of its internal cloud architecture.

Query UI Module Description

Search for Things? Input text box that captures the user's Content-Only (CO) query, entered in natural language as in a traditional search engine.
Spatial Restrictions. By default the query is constrained to "Here", so results are scored and ordered to rank highest the WoT entities within about 10 meters of you.
Temporal Restrictions. By default the query is constrained to "Now", so results are scored and ordered to rank highest the WoT entities with actions or events in roughly the last 10 minutes.
Entity Restrictions. Filter the results by searching across all entities, or only things, only sensors, or only spaces.
Property Restrictions. Advanced options for building Content-and-Structure (CAS) queries: WoT entities whose properties match the input rank higher.
Action Restrictions. Advanced options for building CAS queries: WoT entities that expose the given actions or services rank higher.
Event Restrictions. Advanced options for building CAS queries: WoT entities whose event history matches the input rank higher.
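The default "Here" and "Now" restrictions can be read as ranking boosts rather than hard filters. The following is a minimal Python sketch under stated assumptions: the `Entity` record, the additive +1.0 boosts and the constant names are hypothetical, and only the 10-meter and 10-minute windows come from the description above. It is not the IR.WoT scoring function.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical constants mirroring the default "Here" (~10 m) and "Now" (~10 min) restrictions.
HERE_RADIUS_M = 10.0
NOW_WINDOW = timedelta(minutes=10)
EARTH_RADIUS_M = 6_371_000.0


@dataclass
class Entity:
    """Hypothetical WoT entity record; text_score is the CO match score, assumed in [0, 1]."""
    name: str
    lat: float
    lon: float
    last_event: datetime
    text_score: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def restricted_score(e: Entity, user_lat: float, user_lon: float, now: datetime) -> float:
    """Boost the CO score of entities that are 'Here' and/or had activity 'Now'."""
    score = e.text_score
    if haversine_m(user_lat, user_lon, e.lat, e.lon) <= HERE_RADIUS_M:
        score += 1.0  # hypothetical spatial boost
    if now - e.last_event <= NOW_WINDOW:
        score += 1.0  # hypothetical temporal boost
    return score


def rank(entities, user_lat: float, user_lon: float, now: datetime):
    """Return entities best-first by their restricted score."""
    return sorted(entities, key=lambda e: restricted_score(e, user_lat, user_lon, now), reverse=True)
```

Adding boosts instead of discarding non-matching entities keeps distant or idle entities in the result list, only ranked lower, which fits the "rank highest" wording above.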

Interpreter Module Description

NEXI (Narrowed Extended XPath I) is the query language developed by the INEX community (Trotman & Sigurbjörnsson, Narrowed Extended XPath I (NEXI), 2005), (Trotman & Sigurbjörnsson, NEXI, now and next, 2004). It is based on XPath expressions that access and navigate the components and elements of the XML-based IR test document collection. Because the exact containment of elements is less critical in IR applications, NEXI supports only the descendant-or-self ("//") notation for paths. To express ranked retrieval, NEXI replaces XPath's contains() function with about().
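To make the contains()/about() distinction concrete, the sketch below evaluates a query of the shape `//*[about(., terms)]` over a small XML document with Python's standard `xml.etree.ElementTree`. The term-count scorer is a hypothetical stand-in chosen for brevity; INEX systems use proper retrieval models for about(). The point is only that every element at any depth becomes a candidate and receives a graded score instead of a true/false match.

```python
import re
import xml.etree.ElementTree as ET


def tokens(text: str):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def about(elem: ET.Element, query: str) -> int:
    """Toy about() score: number of query-term occurrences in the element's full text.
    Unlike XPath contains(), this is a graded score rather than True/False."""
    words = tokens(" ".join(elem.itertext()))
    terms = set(tokens(query))
    return sum(1 for w in words if w in terms)


def eval_descendant_about(root: ET.Element, query: str):
    """Evaluate //*[about(., query)]: every element at any depth is a candidate;
    elements with a positive score are returned best-first (ties keep document order)."""
    scored = [(about(e, query), e) for e in root.iter()]
    hits = [(s, e) for s, e in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits


if __name__ == "__main__":
    doc = ET.fromstring(
        '<things><thing id="t1"><name>kitchen sensor</name>'
        "<property>temperature</property></thing>"
        '<thing id="t2"><name>garage door</name></thing></things>'
    )
    for score, elem in eval_descendant_about(doc, "kitchen temperature"):
        print(score, elem.tag, elem.attrib)
```

In the demo, the root `things` element and the `thing` element with id `t1` receive the same score because they contain the same matching text, a small instance of the nested-element overlap that XML indexing has to handle.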

In IR.WoT, the fields and restrictions entered by the user are captured by the Query UI Module and then translated from natural language into NEXI queries. This translation can be done in several ways, depending on the structure and content of the XML documents in the collection, in order to build IR-style queries over XML. The proposed model supports the following types of NEXI queries:

Simple Queries of the form //A[B]. Example NEXI = //*[about(.,co_query)]
Compound Queries of the form //A[B]//C[D]. Example NEXI = //*[about(.,co_query) and .//event/eventTime <= 1hour]
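A translation step of this kind could be as simple as assembling NEXI strings from the UI fields. The sketch below assumes hypothetical element names (`thing`, `sensor`, `space`, `property`, `action`, `event`) and a naive keyword extractor; the actual IR.WoT translator and collection schema may differ. Like the compound example above, it combines about() clauses with `and` inside a single predicate.

```python
import re
from typing import Optional

# Hypothetical mapping from the Entity Restrictions options to element names.
ENTITY_TAGS = {"all": "*", "things": "thing", "sensors": "sensor", "spaces": "space"}


def keywords(text: str) -> str:
    """Reduce free text to a space-separated keyword list that is safe inside about()."""
    return " ".join(re.findall(r"\w+", text.lower()))


def build_nexi(co_query: str, entity: str = "all", prop: Optional[str] = None,
               action: Optional[str] = None, event: Optional[str] = None) -> str:
    """Build //A[B] from the CO query alone, or a conjunctive CAS query when
    property, action or event restrictions are filled in."""
    target = ENTITY_TAGS[entity]
    clauses = [f"about(., {keywords(co_query)})"]
    for path, value in ((".//property", prop), (".//action", action), (".//event", event)):
        if value:  # empty or missing restrictions are skipped
            clauses.append(f"about({path}, {keywords(value)})")
    return f"//{target}[{' and '.join(clauses)}]"
```

For instance, `build_nexi("Where can I park?")` yields the simple form `//*[about(., where can i park)]`, while adding `entity="sensors", prop="temperature"` yields `//sensor[about(., where can i park) and about(.//property, temperature)]`.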

Dynamic Indexer Module Description

Our Dynamic Index proposal analyses combinations of three data structures with three combined dynamic XML indexing strategies. As for the data structures, the dictionary of the inverted index can be stored in a hash table or a similar structure, and the postings list of each term t can be stored in a fixed-length array or a similar structure.

Hash Map & Linked Lists
Hash Map & B+ Tree
Hash Map & Red-Black Tree
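The three combinations share a hash-map dictionary and differ in how each term's postings are kept ordered under updates. The sketch below is an assumption rather than the IR.WoT implementation: it uses a Python `dict` for the dictionary and a sorted list maintained with `bisect` for the postings. The sorted list is only a stand-in with O(log n) search but O(n) insertion, which is the cost that a B+ tree or red-black tree would reduce. Postings are hypothetical `(doc_id, element_path)` pairs.

```python
import bisect
from collections import defaultdict


class DynamicInvertedIndex:
    """Hash-map dictionary (Python dict) whose postings lists are kept sorted.
    The sorted list stands in for a linked list, B+ tree or red-black tree; it is none of them."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> sorted list of (doc_id, element_path)

    def add(self, doc_id: str, element_path: str, text: str) -> None:
        """Insert one posting per distinct term, keeping each list sorted."""
        entry = (doc_id, element_path)
        for term in set(text.lower().split()):
            plist = self.postings[term]
            i = bisect.bisect_left(plist, entry)
            if i == len(plist) or plist[i] != entry:  # skip duplicate postings
                plist.insert(i, entry)

    def remove_doc(self, doc_id: str) -> None:
        """Delete every posting of a document; drop terms left with no postings."""
        for term in list(self.postings):
            plist = self.postings[term]
            lo = bisect.bisect_left(plist, (doc_id,))  # first posting with this doc_id
            hi = lo
            while hi < len(plist) and plist[hi][0] == doc_id:
                hi += 1
            del plist[lo:hi]
            if not plist:
                del self.postings[term]

    def lookup(self, term: str):
        """Return a copy of the sorted postings for a term (empty if unknown)."""
        return list(self.postings.get(term.lower(), []))
```

Keeping postings sorted by document makes merging lists for multi-term queries a linear pass, which is why every combination listed above pairs the hash map with an ordered structure.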

Strategies for dynamic XML index maintenance:

Element-based: a highly redundant strategy that indexes every element, so retrieval is possible at any level, at the cost of the index's space complexity. Handling nested elements and computing ief (inverse element frequency) remain grey areas.
Leaf-Only: a non-redundant strategy that indexes only leaf elements, so retrieval happens at leaf level; the trade-off is the time needed to compute relevance from the leaf elements and to propagate scores efficiently upward.
Aggregation-based: indexes only leaf elements using an aggregated representation; the trade-off is setting each element's degree of influence in the aggregation.
Selective: indexes only elements above a given word-count threshold, elements of a given type, or disjoint fragments; the trade-off is combining strategies to compute term statistics.
Distributed: builds a separate index, with its own statistics, for each element type; the trade-off is space complexity against parallelism at retrieval time.
Structure: stores statistics for structure/term pairs to capture the significance of the structure.
Map-Reduce: a large-scale distributed data-processing framework and programming model that major search engines use to speed up and manage index creation and maintenance.
Compressed: compresses the dictionary and the index itself so they can be stored and transferred efficiently.
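As an illustration of the Leaf-Only strategy, combined with a simple aggregation rule in the spirit of the Aggregation-based one, the sketch below stores term counts for leaf elements only and computes inner-element scores at query time by propagating leaf scores upward. The decay factor `DECAY = 0.5` and the sum-of-children rule are hypothetical choices for illustration, not values taken from IR.WoT.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

DECAY = 0.5  # hypothetical propagation factor, not an IR.WoT parameter


def leaf_index(root: ET.Element):
    """Leaf-Only strategy: term counts are stored for leaf elements only."""
    return {leaf: Counter(re.findall(r"\w+", (leaf.text or "").lower()))
            for leaf in root.iter() if len(leaf) == 0}


def propagate_scores(root: ET.Element, index, query: str):
    """Score leaves by query-term frequency, then give each inner element
    the decayed sum of its children's scores (post-order walk)."""
    terms = re.findall(r"\w+", query.lower())
    scores = {leaf: float(sum(tf[t] for t in terms)) for leaf, tf in index.items()}

    def score(elem: ET.Element) -> float:
        if len(elem) == 0:
            return scores[elem]
        total = DECAY * sum(score(child) for child in elem)
        scores[elem] = total
        return total

    score(root)
    return scores


if __name__ == "__main__":
    doc = ET.fromstring("<thing><name>kitchen lamp</name>"
                        "<state><power>on</power><room>kitchen</room></state></thing>")
    for elem, s in propagate_scores(doc, leaf_index(doc), "kitchen").items():
        print(elem.tag, s)
```

Because only leaves are stored, the index carries no redundancy; the price is the query-time walk over ancestors, which is the time-complexity trade-off listed for Leaf-Only.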
