Lossless Tamil Compression using ASCII Substitution and Modified Huffman Encoding Technique
B. Vijayalakshmi¹, N. Sasirekha²
¹B. Vijayalakshmi, Ph.D. Research Scholar, Department of Computer Science, Vidyasagar College of Arts and Science, Udumalpet, Tamilnadu, India.
²Dr. N. Sasirekha, Associate Professor, Department of Computer Science, Vidyasagar College of Arts and Science, Udumalpet, Tamilnadu, India.
Manuscript received on March 12, 2020. | Revised Manuscript received on March 25, 2020. | Manuscript published on March 30, 2020. | PP: 2900-2906 | Volume-8 Issue-6, March 2020. | Retrieval Number: F8177038620/2020©BEIESP | DOI: 10.35940/ijrte.F8177.038620
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: Tamil is one of the longest-surviving classical languages in the world. It is one of the scheduled languages of India and an official language of several countries. Communication in Tamil has grown drastically with the spread of the internet, and the volume of stored Tamil documents has grown with it. There is therefore a strong need for data compression to make the storage of Tamil documents more efficient and their transmission faster. This paper presents a novel lossless compression technique designed specifically for Tamil documents. The compression process involves three major steps: separating the English letters that appear within the Tamil text, substituting ASCII characters for the Unicode Tamil characters using a static dictionary, and building a Huffman tree with a modified method to encode the Tamil document. The performance of the technique is measured by its space efficiency, that is, the storage needed for the compressed file, expressed through three parameters: compression ratio, compression factor, and percentage of compression. Time efficiency is measured as the time the algorithm takes to compress and decompress a file. The average compression achieved with this technique is 72.08%. The decompression process restores the original file without any loss of data.
Keywords: Text Compression, Dictionary, Unicode, ASCII, Huffman Encoding.
Scope of the Article: Routing, Switching and Addressing Techniques.
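Illustrative sketch: the abstract names the steps of the pipeline but gives neither the contents of the static dictionary nor the details of the Huffman variation. The Python code below is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical static mapping that sends each code point of the Unicode Tamil block (U+0B80 to U+0BFF) to a single byte in the range 0x80 to 0xFF, passes ASCII characters through unchanged instead of separating them, and uses a standard Huffman code in place of the modified one. The three space-efficiency parameters follow their commonly used definitions, which are assumed here to match the ones used in the paper. All function names are hypothetical.

```python
import heapq
from collections import Counter

TAMIL_START, TAMIL_END = 0x0B80, 0x0BFF  # Unicode Tamil block


def substitute(text):
    """Replace each Tamil code point with one byte (hypothetical static mapping)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:                       # ASCII passes through unchanged
            out.append(cp)
        elif TAMIL_START <= cp <= TAMIL_END:
            out.append(cp - TAMIL_START + 0x80)
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return bytes(out)


def restore(data):
    """Inverse of substitute(): map single bytes back to Unicode characters."""
    return "".join(chr(b) if b < 0x80 else chr(b - 0x80 + TAMIL_START) for b in data)


def huffman_codes(data):
    """Build a standard Huffman code table {byte value: bit string}."""
    freq = Counter(data)
    if len(freq) == 1:                      # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, partial code table)
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(data, codes):
    """Concatenate the code word of every byte into one bit string."""
    return "".join(codes[b] for b in data)


def decode(bits, codes):
    """Walk the bit string and emit a byte whenever a code word is completed."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)


def metrics(original_size, compressed_size):
    """Return (compression ratio, compression factor, percentage of compression)."""
    ratio = compressed_size / original_size
    factor = original_size / compressed_size
    percent = (1 - ratio) * 100
    return ratio, factor, percent


if __name__ == "__main__":
    text = "தமிழ் மொழி Tamil"
    original_size = len(text.encode("utf-8"))
    substituted = substitute(text)
    codes = huffman_codes(substituted)
    bits = encode(substituted, codes)
    compressed_size = (len(bits) + 7) // 8  # payload only; code table not counted
    assert restore(decode(bits, codes)) == text
    print(metrics(original_size, compressed_size))
```

In this sketch the substitution step alone shrinks each Tamil character from three bytes in UTF-8 to one byte, and the Huffman step then shortens frequent symbols further. The sizes it reports exclude the cost of storing the code table, so they are not comparable with the 72.08% average reported in the abstract.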