Retrieving and Saving Meaningful Keywords in Unstructured PDF Documents using Binary Decision Diagrams
Anuragini Sharma
Anuragini Sharma, McMaster University, Hamilton, Canada.
Manuscript received on 1 August 2019. | Revised Manuscript received on 8 August 2019. | Manuscript published on 30 September 2019. | PP: 1950-1955 | Volume-8 Issue-3 September 2019 | Retrieval Number: C4480098319/19©BEIESP | DOI: 10.35940/ijrte.C4480.098319
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: As the data generated and processed across platforms grows more complex, the need for consistency has grown with it. Structured data can be used for many purposes that unstructured data cannot serve. The purpose of this study was to convert unstructured data in Portable Document Format (PDF) files into a structured form, using a new framework based on Binary Decision Diagrams and Boolean operations. Binary decision diagrams are data structures for representing Boolean functions, which take Boolean values as input and produce a Boolean value as output; the resulting representation is a binary diagram. The research is carried out mainly to show how a large amount of data can be stored easily in the form of bits. The focus is on retrieving meaningful information from unstructured textual data in PDF documents using Boolean operations and the bag model, and on saving the meaningful keywords in the form of binary decision diagrams. The documents are then clustered based on the commonalities between them. This research presents a way to increase the efficiency of converting unstructured data in PDF documents to structured data, and of saving large amounts of data in the form of bits, using this novel framework.
Index Terms: Unstructured Data, Structured Data, Binary Decision Diagram, Bag Model, Clustering, PDF Data Retrieval.
Scope of the Article: Software Engineering Decision Support
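The idea summarized in the abstract can be illustrated with a small sketch. It is not the framework from the paper, whose details are not given on this page. It only assumes one plausible reading: each document is reduced to a bag of keywords, every keyword in a shared vocabulary receives a binary index, and the keyword set of a document is stored as a Boolean function over the index bits in a reduced ordered binary decision diagram. The Boolean AND of two diagrams then gives the keywords two documents have in common, which is one possible basis for clustering. The stop-word list, the sample texts, and all names (`keywords`, `BDD`, `from_indices`, `contains`, `conj`) are hypothetical, and the text is assumed to have already been extracted from the PDF files.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "to"}  # assumed toy stop list

def keywords(text):
    """Bag-model step: lowercase, tokenize, drop stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

class BDD:
    """Reduced ordered BDD over num_vars index bits; terminals are 0 and 1."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {}   # node id -> (var, low, high)
        self.unique = {}  # (var, low, high) -> node id

    def mk(self, var, low, high):
        if low == high:             # both branches agree: the test is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:  # share structurally identical nodes
            node_id = len(self.nodes) + 2
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def from_indices(self, indices, var=0, lo=0):
        """Build f(code) = 1 iff code is in indices, most significant bit first."""
        if var == self.num_vars:
            return 1 if lo in indices else 0
        half = 1 << (self.num_vars - var - 1)
        low = self.from_indices(indices, var + 1, lo)
        high = self.from_indices(indices, var + 1, lo + half)
        return self.mk(var, low, high)

    def contains(self, root, code):
        """Follow the bits of code from the root down to a terminal."""
        node = root
        while node > 1:
            var, low, high = self.nodes[node]
            bit = (code >> (self.num_vars - var - 1)) & 1
            node = high if bit else low
        return node == 1

    def conj(self, u, v, memo=None):
        """Boolean AND of two diagrams (keywords present in both documents)."""
        memo = {} if memo is None else memo
        if u == 0 or v == 0:
            return 0
        if u == 1:
            return v
        if v == 1:
            return u
        if (u, v) not in memo:
            (var_u, u0, u1), (var_v, v0, v1) = self.nodes[u], self.nodes[v]
            top = min(var_u, var_v)
            if var_u != top:        # u does not test the top variable
                u0 = u1 = u
            if var_v != top:        # v does not test the top variable
                v0 = v1 = v
            memo[(u, v)] = self.mk(top, self.conj(u0, v0, memo),
                                   self.conj(u1, v1, memo))
        return memo[(u, v)]

# Hypothetical texts, standing in for text already extracted from PDF files.
docs = {
    "d1": "The diagram stores keywords in bits",
    "d2": "Clustering of keywords and documents",
}
bags = {name: keywords(text) for name, text in docs.items()}
vocab = sorted(set().union(*bags.values()))
index = {word: i for i, word in enumerate(vocab)}

bdd = BDD(max(1, (len(vocab) - 1).bit_length()))
roots = {name: bdd.from_indices({index[w] for w in bag}) for name, bag in bags.items()}

common = bdd.conj(roots["d1"], roots["d2"])
shared = [w for w in vocab if bdd.contains(common, index[w])]
print(shared)  # ['keywords']
```

The sketch enumerates every index code while building a diagram, which is acceptable only for small vocabularies. A practical implementation would more likely build each diagram from per-keyword terms combined with OR, or use an existing BDD library.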