Development of Document Clustering Technique for Gurmukhi Script using Fuzzy Term Weight
Mukesh Kumar1, Amandeep Verma2
1Mukesh Kumar, PG Dept. of Computer Science, Mata Gujri College, Fatehgarh Sahib, Punjab, India.
2Amandeep Verma, Punjabi University Regional Centre for Information Technology & Management, Mohali, Punjab, India.
Manuscript received on 03 March 2019 | Revised Manuscript received on 08 March 2019 | Manuscript published on 30 July 2019 | PP: 1646-1653 | Volume-8 Issue-2, July 2019 | Retrieval Number: B2386078219/19©BEIESP | DOI: 10.35940/ijrte.B2386.078219
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Document clustering is an unsupervised machine learning technique that groups similar objects into classes without prior knowledge of the data set. These classes are known as clusters; each cluster consists of unlabeled data objects such that objects within the same cluster are maximally similar to one another and dissimilar to objects in other clusters. The purpose of this research is to develop a domain-independent clustering technique for Gurmukhi script. It is the first such effort, as no prior work has developed a domain-independent clustering technique for Gurmukhi script. This paper presents a hybrid algorithm for document clustering of Gurmukhi script. Experimental results show that the proposed hybrid technique performs better in determining the number of clusters, creating meaningful cluster titles, and assigning real-time unlabeled data sets to the relevant cluster. These gains result from pre-processing steps such as segmentation, stemming, and normalization, together with extraction of named/noun entities, creation of cluster titles, and placement of text documents into relevant clusters using fuzzy term weight.
Index Terms: Data Mining Techniques, Document Clustering, Gurmukhi Script Clustering Technique, Machine Learning, Punjabi Text Document Clustering, Unsupervised Learning.
Scope of the Article: Data Mining
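The abstract names fuzzy term weight as the mechanism for placing documents into clusters but does not give its formula. The following is a minimal illustrative sketch in Python, not the authors' algorithm. It assumes the documents are already segmented, stemmed, and normalized into token lists, that each cluster title is a set of noun/named-entity terms, and that a term's fuzzy weight is its count divided by the highest term count in the document. The function names, the weighting rule, and the sample Gurmukhi tokens are all hypothetical.

```python
from collections import Counter


def fuzzy_term_weights(tokens):
    """Map each term to a weight in (0, 1].

    Assumed rule (not from the paper): term count divided by the
    highest term count in the same document.
    """
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def cluster_memberships(tokens, cluster_titles):
    """Return one membership degree per cluster, normalized to sum to 1.

    If no title term occurs in the document, every degree is 0.0.
    """
    weights = fuzzy_term_weights(tokens)
    raw = {
        name: sum(weights.get(term, 0.0) for term in title_terms)
        for name, title_terms in cluster_titles.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in raw}
    return {name: score / total for name, score in raw.items()}


def assign(tokens, cluster_titles):
    """Pick the cluster with the highest membership, or None if none match."""
    memberships = cluster_memberships(tokens, cluster_titles)
    if not memberships:
        return None
    best = max(memberships, key=memberships.get)
    return best if memberships[best] > 0 else None


# Hypothetical cluster titles built from noun/named-entity terms.
titles = {
    "sports": {"ਕ੍ਰਿਕਟ", "ਹਾਕੀ"},      # cricket, hockey
    "politics": {"ਸਰਕਾਰ", "ਚੋਣ"},     # government, election
}

# Hypothetical pre-processed document (already segmented and stemmed).
doc = ["ਕ੍ਰਿਕਟ", "ਕ੍ਰਿਕਟ", "ਹਾਕੀ", "ਸਰਕਾਰ"]

memberships = cluster_memberships(doc, titles)
label = assign(doc, titles)
print(label, memberships)
```

In this sketch the sample document has a membership of 0.75 in the "sports" cluster and 0.25 in "politics", so it is assigned to "sports". Keeping graded memberships rather than a hard yes/no match lets a document that mentions terms from several cluster titles still be placed with the closest one.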