Indexing for semantic cache to reduce query matching complexity

role in the complexity of semantic matching is the indexing scheme. Existing indexing schemes store each disjoint query (on the basis of projection) in a different segment, and the number of segments can grow exponentially ($2^n - 1$, where $n$ is the number of queried attributes) in the worst case. This leads to a semantic matching scheme of exponential complexity. We propose a schema based on semantic indexing that enhances


INTRODUCTION
Faster information retrieval in a large distributed database system (DDBS) is a challenging task. Slow retrieval of data is a major problem in DDBS, especially when the network or server load is high (Chen & Roussopoulos, et al., 2009). The overall retrieval time can be reduced when the same data needs to be accessed frequently. A good caching near the response time, increase the throughput, and provide fault-tolerance (Chen & Roussopoulos, 1994; Dar et al., et al., 2009).
A cache system does not come without a cost and has management overheads. To keep the overheads at a minimum, one has to manage the cache system intelligently (Ahmed et al., 2005). There are many techniques to manage cache systems, such as adaptive database caching (Cluet et al., 1999; Altinel et al., 2003), semantic caching (Ren et al., 2003; Ahmed et al., 2005; 2008a; b; 2009; 2010; 2012; Ahmad & Qadir, 2009), page caching and tuple caching (Ren et al., 2003). A caching technique that stores data as well as the semantics of executed queries is referred to as a semantic cache. A semantic cache enhances the performance of a conventional cache by answering partially overlapped queries (Dar et al Ahmad et al., 2008b). In a semantic cache, the semantics of a newly posed query are matched with the stored semantics of already processed queries. On the basis of semantic matching, a user query is divided into two subqueries: probe (the portion available at the cache) and remainder (the portion not available at the cache) queries (Dar et al., et al., 2003). The probe query is processed locally and the remainder query is processed at the server side. There are two basic factors to evaluate the performance of a cache system: hit ratio ($h_r$) and query processing time ($Q_{pt}$) (Ahmad et al., 2008a; b).

March 2017 Journal of the National Science Foundation of Sri Lanka 45 (1)
A cache hit occurs if the requested data is found in the cache (Cai et al., 2005; Ahmad et al., 2008a), and the hit ratio is the percentage of user posed queries that can be answered locally (partially or fully) from the cache. Therefore, the cache system should be designed in such a way that it increases the of stored data (Ahmad et al reuse of stored data, lesser amount of data is required to be retrieved from remote locations. In this context it system. Query trimming ($Q_{tr}$) and query rebuilding ($Q_{rb}$) are two major activities in query processing (Roussopoulos, 1991). The query trimming process splits the user query into probe and remainder queries, whereas the query rebuilding process merges the results of the probe and remainder queries. The query trimming process is further divided into two basic steps.
First, the semantics of the newly posed query are matched with the stored semantics in the cache, which is called query matching ($Q_m$). The query matching process helps out to enquire about the availability of data in hidden semantics out of the stored semantics in cache. ratio. Optimum query matching ensures optimum query trimming. The query matching process depends on how the semantics of already processed queries are indexed ($S_{ind}$).
In the second step of query trimming, the user posed query is divided into two sub-queries: probe and remainder. The division process of a posed query depends upon the splitting algorithm as well as on how trimming depends on the indexing scheme.
The focus of this paper is to optimise the query trimming process. This goal is achieved by proposing an indexing scheme and a query matching algorithm. A critical analysis of existing indexing schemes has been time. On the basis of the critical analysis, limitations of the existing schemes, such as run time complexity and hit ratio, have been highlighted. A schema based hierarchical semantic indexing scheme has been proposed that proved useful to overcome the limitations of the existing schemes. An algorithm for query matching (sMatch) is designed for SELECT and PROJECT queries by using the query splitting algorithm (Sun et al., 1989; Guo et al et al., 2010). A complexity analysis of the proposed algorithm and the existing schemes has been done. On the basis of the complexity analysis, it has been found that the proposed indexing scheme is instrumental in decreasing the query matching complexity from $O(m \times n \times (2^n - 1))$ to $O(m \times n)$. sMatch is able to process SELECT * type queries, and due to this the hit ratio is increased. Finally, a comparison was done with the existing well-known algorithms proposed by Ren et al. (2003) and Ahmad et al. (2010) on the basis of hit ratio and query processing time. From the comparison we conclude that the proposed algorithm reduces the query relational algebraic notations that are used in the paper.
User query ($Q_U$) will be represented by

Given a database $D = \{R_i\}$ and its attributes set $A = A_{R_i}$, a cached query $Q_C$ will be the tuple $\langle D, S_A, S_P, S_R, S_{SA}, C \rangle$, where $D$ is the name of the database, $S_R$ is the name of the relation, $S_A$ is a set of attributes, $S_{SA}$ is the status of the attributes, $S_P$ is the predicate (condition) on which data has been retrieved and cached, and $C$ is the reference of contents.

Given a user query $Q_U$ and cached query $Q_C$, $D_U$ and $D_C$ will be the rows retrieved in the execution of $Q_U$ and $Q_C$, respectively.

Given a user query $Q_U$ and cached query $Q_C$, the probe query ($pq$) will be $Q_U \cap Q_C$ and the dataset against $pq$ will be $D_U \cap D_C$.

Given a user query $Q_U$ and cached query $Q_C$, the remainder query ($rq$) will be $Q_U - Q_C$ and the dataset against $rq$ will be $(D_U - D_C)$.
Given a user predicate $Q_P$ and cached predicate $S_P$, predicate implication ($Q_P \rightarrow S_P$) holds if and only if $(Q_P \cap S_P) = Q_P$.

Given a user predicate $Q_P$ and cached predicate $S_P$, holds if and only if $Q_P \cap S_P \neq \emptyset$.

Given a user predicate $Q_P$ and cached predicate $S_P$, holds if and only if $Q_P - S_P = Q_P$.

Given a user query $Q_U$ and cached query ) holds if and only if $Q_A$ $S_A$ and $Q_P \rightarrow S_P$.

Given a user query $Q_U$ and cached query $Q_C$, holds if and only if $Q_A$ $S_A$ and $Q_P \cap S_P \neq \emptyset$.

Given a user query $Q_U$ and cached query $Q_C$, holds either $Q_P$ $S_A$ or $Q_P \cap S_P = \emptyset$.
Given a user query $Q_U$ and cached query $Q_C$, common attributes ($C_A$) is the set of attributes that are common to the user and cached queries, and will be computed as $C_A = Q_A \cap S_A$.

Given a user query $Q_U$ and cached query $Q_C$, difference attributes ($D_A$) is the set of attributes that exist in the user query but not in the cached query, and will be computed as $D_A = Q_A - S_A$.

Semantic caching has been extensively studied by researchers in both relational and XML databases ( . They claimed and discussed some scenarios to prove that previous schemes were unable to trim the query in an optimal time. They improved the runtime complexity with the help of a query visualisation concept. Still, this scheme was expensive because its query matching process depended on the segment based indexing query trimming process that enhances semantic caching et al., 2000). This scheme was also expensive due to its dependency on the segment based indexing scheme. A 3-level indexing scheme (Sumalatha et al., 2007a; b) was proposed to overcome the limitation of the segment based scheme. It improved the query matching process, but query trimming was not clear in that scheme. Bashir and Qadir query matching. Although 4-HiSIS is a better approach 4-HiSIS that covers the query trimming process. A scheme based on content matching was presented by Bashir et al 4-HiSIS to split the query into probe and remainder queries (Bashir & Qadir, 2007). This work only covered simple predicates and failed to split complex queries (those having conjunct operators). 4-HiSIS was merged with Still this scheme was not appropriate to trim the query because it was designed for a single relation and was only applicable to a single predicate. To improve this, the semantic matching process has been enhanced by presenting the graph based semantic indexing scheme (Ahmad et al., 2010). This enhanced scheme achieved the required None of the previous schemes in the literature was able to match the semantics of this query with the stored semantics. From the above we conclude that there is a the required hit ratio and perform semantic matching semantic indexing that is able to achieve both goals.
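The probe and remainder datasets and the common and difference attribute sets defined above are plain set operations, which can be illustrated with a minimal sketch. The row identifiers and the `Price` attribute below are hypothetical values chosen only for illustration; they do not come from the paper's tables.

```python
# Hypothetical cached query (S_A, D_C) and user query (Q_A, D_U) on a Books relation.
# Rows are represented by made-up integer identifiers.
cached_attrs = {"Author", "Title"}            # S_A
user_attrs = {"Author", "Title", "Price"}     # Q_A ("Price" is an assumed attribute)
cached_rows = {1, 2, 3, 4}                    # D_C
user_rows = {3, 4, 5}                         # D_U

# Common attributes: present in both the user query and the cached query.
common_attrs = user_attrs & cached_attrs

# Difference attributes: requested by the user but not cached.
difference_attrs = user_attrs - cached_attrs

# Dataset of the probe query (answered from the cache).
probe_rows = user_rows & cached_rows

# Dataset of the remainder query (fetched from the server).
remainder_rows = user_rows - cached_rows
```

In this sketch the remainder covers only the missing rows; a complete remainder query would presumably also have to fetch the difference attributes for the rows that are already cached.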

METHODOLOGY
query matching. Therefore, to make the query processing strategies used for query matching, graph based semantic indexing (Ahmad et al., 2010) scheme is the most or not the required attributes are available at cache. After behaves like a segment based scheme for building probe and remainder queries, which is costly and time consuming.
The schema is required to be stored in the semantic cache to process a query (Ahmad et al., 2008b; 2009). Therefore, we have merged the query's semantics with the schema. The main advantage of this amalgamation is preserving However, in previous studies (Ahmad et al., 2008b; 2009) semantics and schema were stored separately in the cache, which demands extra memory. In this scheme the semantics are associated with the stored schema (semantic enabled schema). This is called a schema based hierarchical indexing scheme for semantic cache. Initially, the schema for each database is stored in the cache with database names, relation names and attribute names, with false status, in the form of a tree as given in Table 1. Due to keeping the schema of database its space complexity will be An example of a library database is given in Table 2. Now suppose that a user enters a query as given below:

SELECT Author, Title FROM Books WHERE

After execution of the above query, the retrieved contents will be stored (assume the contents will be stored with the name 1, just like a materialised view) and the semantics will be updated as given in Table 3. Author and Title of Books are requested in the query, so their status will be changed to true. The condition and content reference will also be updated accordingly. After managing semantics, the next step is to use the database names will be matched exactly; secondly if database name is matched then the relation names will be matched; otherwise processing will be stopped. After matching is given in Figure 1.

Due to the schema based hierarchical semantic indexing scheme we are able to perform query matching in linear fashion and to handle the SELECT * type queries.
There is a simple driver algorithm (sMatch) to perform the query matching as given in Figure 2.
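The matching order described in this section (database name, relation name, attribute names, attribute status, then condition) can be sketched with nested dictionaries. This is only an illustrative sketch under stated assumptions, not the sMatch algorithm of Figure 2: the function names `build_index`, `cache_query` and `match`, the attributes `Price` and `Year`, and the predicate `"Price < 50"` are all hypothetical, and condition matching is simplified to exact string equality rather than predicate implication or overlap.

```python
def build_index(schema):
    """Build the tree database -> relation -> attribute, every status False."""
    return {
        db: {
            rel: {attr: {"status": False, "predicate": None, "content": None}
                  for attr in attrs}
            for rel, attrs in relations.items()
        }
        for db, relations in schema.items()
    }


def cache_query(index, db, rel, attrs, predicate, content_ref):
    """Record an executed query: set status True, store condition and content reference."""
    for attr in attrs:
        index[db][rel][attr] = {"status": True, "predicate": predicate,
                                "content": content_ref}


def match(index, db, rel, attrs, predicate):
    """Return (verdict, probe_attrs, remainder_attrs) for a user query."""
    relations = index.get(db)
    if relations is None or rel not in relations:
        return "rejected", [], []            # unknown database or relation: stop
    entries = relations[rel]
    if attrs == ["*"]:
        attrs = list(entries)                # SELECT * is expanded from the schema
    if any(attr not in entries for attr in attrs):
        return "rejected", [], []            # incorrect query: attribute not in schema
    probe, remainder = [], []
    for attr in attrs:                       # one dictionary lookup per attribute
        entry = entries[attr]
        # Simplifying assumption: the condition matches only if it is identical.
        if entry["status"] and entry["predicate"] == predicate:
            probe.append(attr)
        else:
            remainder.append(attr)
    if not probe:
        return "miss", probe, remainder
    return ("partial" if remainder else "hit"), probe, remainder


# Hypothetical library schema; only Author, Title and Books appear in the text.
schema = {"Library": {"Books": ["Author", "Title", "Price", "Year"]}}
index = build_index(schema)
cache_query(index, "Library", "Books", ["Author", "Title"], "Price < 50", 1)

full = match(index, "Library", "Books", ["Author", "Title"], "Price < 50")
star = match(index, "Library", "Books", ["*"], "Price < 50")
bad = match(index, "Library", "Books", ["Publisher"], "Price < 50")
```

Because every level is a keyed lookup, the number of checks in this sketch grows with the number of queried attributes rather than with the number of possible attribute subsets, and a query that names an attribute outside the schema is rejected before any content is touched.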
exactly matching the relation name, attribute names are matched. When it is found that the attributes are part of the schema, their status is checked. If the status of an attribute is true, data for that attribute will be available in the cache; otherwise it will be retrieved from the server. If the status of the attribute is true, then the condition across a particular row is matched. Finally, the probe query is generated with the generated condition from the referenced content. Hierarchical schema based semantic

MATHEMATICAL PROOF AND RESULTS
A comparison of the proposed sMatch with previous studies by Ren et al. (2003) and Ahmad et al. (2010) was done. The comparison is given in different aspects: run time complexity, hit ratio, and handling of incorrect and SELECT * of the proposed semantic indexing scheme with the segment based semantic indexing scheme. We calculated the complexity of both as follows.
Theorem 1: For a relation $R$ having attribute set $A = \{A_i\}$, where $1 \le i \le n$; the query matching complexity for the segment based scheme is $m \times n \times (2^n - 1)$, where $m$ is the number and $n$ is the number of total attributes in a relation.

Proof: (Constructive)
The segment based query matching scheme depends on the number of segments and the number of attributes in each segment. The number of segments in the cache depends on the number of attributes in a relation. The possible number of segments for $n$ attributes is proved in lemma 1.1.
Lemma 1.1: For a relation $R$ having attribute set $A = \{A_i\}$, where $1 \le i \le n$, the maximum number of segments is $2^n - 1$.

Proof: Number of attributes in a relation $R = n$

The number of subsets, $P$, for $n$ attributes can be computed as follows:

$$P(n) = 2^n \qquad (1)$$

The number of disjoint queries ($D_{Qu}$) on the relation will be equal to the number of subsets except the empty set.

$$|D_{Qu}| = P(n) - 1 \qquad (2)$$

There will be only one empty subset in $P(n)$. By replacing values in equation 2 from equation 1 we get:

$$|D_{Qu}| = 2^n - 1$$

As we know (Ren et al., 2003)

Proof: Finding a single attribute $A_i$ over a number of segments $|S|$ requires visiting each and every attribute in each segment.
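For a single attribute in a relation, there can be only one segment (according to lemma 1.1) and the number of comparisons ($NoC_n$) required for query matching will also be one, which can be computed as follows:

$$NoC_1 = 2^1 - 1 = 1$$

For two attributes, there will be three segments (according to lemma 1.1). The number of comparisons ($NoC_n$) required for query matching in this case will be four and can be computed as:

$$NoC_2 = (2^2 - 1) + 2^0(2^{2-0-1} - 1) = 4$$

For three attributes, there will be seven segments and the number of comparisons ($NoC_n$) required for query matching is 12, which can be computed as follows:

$$NoC_3 = (2^3 - 1) + 2^0(2^{3-0-1} - 1) + 2^1(2^{3-1-1} - 1) = 12$$

For four attributes, there will be fifteen segments and the number of comparisons ($NoC_n$) required for query matching is 32, which can be computed as follows:

$$NoC_4 = (2^4 - 1) + 2^0(2^{4-0-1} - 1) + 2^1(2^{4-1-1} - 1) + 2^2(2^{4-2-1} - 1) = 32$$

For five attributes, the number of comparisons can be computed as follows:

$$NoC_5 = (2^5 - 1) + 2^0(2^{5-0-1} - 1) + 2^1(2^{5-1-1} - 1) + 2^2(2^{5-2-1} - 1) + 2^3(2^{5-3-1} - 1) = 80$$

For $n$ attributes the number of comparisons ($NoC_n$) can be computed as follows:

$$NoC_n = (2^n - 1) + \sum_{i=0}^{n-2} 2^i \left(2^{n-i-1} - 1\right)$$

We can write it as

$$NoC_n = (2^n - 1) + (n - 1)\,2^{n-1} - \sum_{i=0}^{n-2} 2^i$$

By using geometric series we know that

$$\sum_{i=0}^{n-2} 2^i = 2^{n-1} - 1$$

so that

$$NoC_n = (2^n - 1) + (n - 1)\,2^{n-1} - (2^{n-1} - 1) = n \times 2^{n-1}$$

Hence, the worst case complexity of the segment based scheme is exponential while the worst case complexity of the schema based scheme will be polynomial, which can be plotted as given in Figure 3.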
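The segment count of lemma 1.1 and the comparison counts quoted above (1, 4 and 12 for one, two and three attributes) can be checked numerically. The following is a small verification sketch, not part of the original analysis. It assumes that a segment corresponds to a non-empty subset of attributes and that matching visits every attribute of every segment; the series follows the pattern of the five-attribute expansion in the text, and the function names are hypothetical.

```python
from itertools import combinations


def segments(n):
    """All non-empty subsets of n attributes, one per possible segment."""
    attrs = range(n)
    return [subset for k in range(1, n + 1) for subset in combinations(attrs, k)]


def comparisons_bruteforce(n):
    """Visit every attribute in every segment and count the visits."""
    return sum(len(segment) for segment in segments(n))


def comparisons_series(n):
    """(2^n - 1) plus the sum over i = 0 .. n-2 of 2^i * (2^(n-i-1) - 1)."""
    return (2 ** n - 1) + sum(2 ** i * (2 ** (n - i - 1) - 1) for i in range(n - 1))


def comparisons_closed_form(n):
    """Closed form n * 2^(n-1)."""
    return n * 2 ** (n - 1)


for n in range(1, 11):
    assert len(segments(n)) == 2 ** n - 1
    assert comparisons_bruteforce(n) == comparisons_series(n) == comparisons_closed_form(n)

counts = [comparisons_closed_form(n) for n in range(1, 6)]
```

The three counts agree for every tested value of n, which supports reading the comparison count as n × 2^(n-1) for the segment based scheme, against a number of attribute checks that grows only linearly with n when attributes are looked up directly in a schema.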
Similarly, the best case complexity analysis of the schema based and segment based schemes is given in Figure 4. We assumed that each comparison will take one millisecond to compute. In Figure 5, the comparison between segment based (Ren et al., 2003), graph based (Ahmad et al., 2010) and

Best case complexity analysis
In this comparison we are assuming that all of the data queries), which indicates that the user needs all of the attributes.As soon as we increase this type of queries, the hit ratio for graph based and segment based will be (segment and graph based) techniques are unable to type of queries by using schema.Therefore, the hit ratio (Ren et al., 2003), graph based (Ahmad et al., 2010) and user is going to pose incorrect queries.As soon as we increase the number of incorrect queries, the computing time for graph based and segment based schemes will be techniques are unable to reject incorrect queries.In level.Due to this, the proposed techniques perform better.
In Figure 7, space complexity for segment based (Ren et al., 2003), graph based (Ahmad et al., 2010) and earlier, space complexity will be higher than the previous n n' distinct queries have been processed and their semantics are cached in semantic cache.

CONCLUSION
of the caching system.One of the major activities of query runtime complexity from exponential to polynomial.
improve hit ratio, because the graph based scheme was SELECT * If a user poses a new query as given below:

SELECT * FROM Books WHERE

attributes in relation

Time comparison on the basis of incorrect queries

Ren et al., 2003; Cai et al., 2005; , 2003) al., 2002; Sumalatha et al., 2007b; 2007c; Sanaullah et al., 2008). An effort has been made to build a semantic caching system on the basis of description logic (Tariq et al., 2010), but it failed to answer the overlapped queries locally. The related work is discussed on the basis of semantic indexing and query (SELECT and PROJECT) trimming in relational data semantic cache.

techniques were proposed by Dar et al structure based query trimming process proved to be expensive (Roussopoulos, 1991; Ahmad et al., 2009) in terms of runtime complexity due to no indexing strategy.

number of queries stored in it are a few tens. To overcome this limitation the cache was organised into chunks (Deshpande et al., 1998; Ren et al., 2003) and segments Rockey, 2010). The query trimming process improved to some extent due to indexing the semantics in the form of segments (Ahmad et al., 2008a; 2009). In the presence of segments, query matching (the basic step of query trimming) still has high runtime complexity (Ahmad et al., 2009) due to the large number of possible segments (there can $2^n - 1$ attributes). The segment based indexing scheme is used in 1997; Ren et al., 2003; Cai et al., 2005; Jonsson et al., et al & Rockey, 2010). All of these have higher runtime complexity due to the large number of possible segments.

that the number of segments $|S|$ on cache will be equal to the number of disjoint queries, i.e.

For a relation $R$ having attribute set $A = \{A_i\}$, where , $n \times 2^{n-1}$ number of comparisons ($NoC_n$) $A_i$ over a number of segments $|S|$.
number of segments $|S|$ in the worst case is $2^n - 1$, as proved in lemma 1.2.

Working algorithm sMatch