Research Articles

Chi-square based hierarchical agglomerative clustering for web sessionization

Authors:

Tasawar Hussain,

Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan.

Sohail Asghar

Department of Computer Science, COMSATS Institute of Information Technology, Islamabad, Pakistan

Abstract

Clustering is a fundamental technique in data mining, machine learning and pattern recognition for organising similar objects into groups based on their features. Objects within a cluster are more similar to one another, with respect to particular features, than to objects in other clusters. Clustering has numerous applications in domains such as information retrieval, data mining, machine learning, pattern recognition, mathematics, medicine and bioinformatics. Web-centric applications are expanding day by day, and the web has become one of the largest data repositories. During the last decade, information and knowledge retrieval from the web has become a challenging research area. Computing similarity among data objects (web sessions) is complex, yet it is a significant problem in unsupervised learning. This research attempts to address these challenges. The objective of this paper is to introduce a chi-square based similarity measure for computing the similarity among sessions. The chi-square approach is applied to test for a statistically significant relationship between the observed and expected frequencies of the number of pages visited and the time spent by a user during a session. Moreover, a chi-square based hierarchical agglomerative clustering (Chi-HAC) technique is proposed to extract useful knowledge from web logs. Chi-HAC helps to improve the visualisation of web logs and is equally important for website designers, developers and owners in improving websites at each level. Experimental results on two different log files indicate that the proposed similarity measure, combined with the Chi-HAC algorithm, significantly improves similarity computation among web sessions.
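
The general idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration only: each session is represented as a pair (pages visited, time spent), the dissimilarity between two sessions is taken to be the Pearson chi-square statistic of the 2 x 2 table they form, and clusters are merged with average linkage. The names `chi_square_distance` and `agglomerate` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def chi_square_distance(a, b):
    """Pearson chi-square statistic of the table whose rows are sessions a and b.

    Assumption: a session is a tuple of non-negative counts, here
    (pages_visited, time_spent). A value of 0 means the two sessions
    have identical proportions across the columns.
    """
    rows = [a, b]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand = sum(row_totals)
    if grand == 0:
        return 0.0
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected frequency under independence of row and column.
            expected = row_totals[i] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def agglomerate(sessions, n_clusters):
    """Bottom-up clustering of session indices with average linkage.

    Assumption: average linkage and a fixed target number of clusters;
    the paper may use a different linkage or stopping rule.
    """
    clusters = [[i] for i in range(len(sessions))]

    def linkage(c1, c2):
        dists = [chi_square_distance(sessions[i], sessions[j])
                 for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest average distance.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return clusters


if __name__ == "__main__":
    # Hypothetical sessions: (pages visited, minutes spent).
    sessions = [(5, 10), (10, 20), (20, 5), (40, 10)]
    print(agglomerate(sessions, 2))
```

In this sketch, sessions with the same ratio of pages to time are at distance zero regardless of their absolute size, which is a property of the chosen statistic and not necessarily of the measure proposed in the paper.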

How to Cite: Hussain, T. & Asghar, S., (2016). Chi-square based hierarchical agglomerative clustering for web sessionization. Journal of the National Science Foundation of Sri Lanka. 44(2), pp.211–222. DOI: http://doi.org/10.4038/jnsfsr.v44i2.8002
Published on 30 Jun 2016.
Peer Reviewed
