Information Extraction from Multifaceted Unstructured Big Data
Kiran Adnan1, Rehan Akbar2, Khor Siak Wang3
1Kiran Adnan, Department of Information & Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia.
2Rehan Akbar, Department of Information & Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia.
3Khor Siak Wang, Department of Information & Communication Technology, Universiti Tunku Abdul Rahman, Kampar, Malaysia.
Manuscript received on 21 August 2019 | Revised Manuscript received on 11 September 2019 | Manuscript Published on 17 September 2019 | PP: 1398-1404 | Volume-8 Issue-2S8 August 2019 | Retrieval Number: B10740882S819/2019©BEIESP | DOI: 10.35940/ijrte.B1074.0882S819
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: In the era of digital globalization, a huge volume and variety of data is being produced at a very high rate. Every day, the world produces around 2.5 quintillion bytes of data. According to IDC, over 40 zettabytes of data will be generated and reproduced by 2020. Digital data have become a deluge that overwhelms every field of information technology (IT), business, science, and engineering. These fields are shifting to smart and advanced technologies such as smart manufacturing, data-aware medical sciences, and other smart applications. These applications support industries in data-driven decision making, big data storage, and complex analysis of large data sets. They also contribute to the big data deluge, in which the variety of data requires industries to adopt advanced IT approaches. Unstructured data make up 95% of the digital universe. Such data are rich because they contain information that can play a vital role in improving big data analytics. The heterogeneity, complexity, lack of structured information, poor quality, and scalability of unstructured data make it difficult to adapt traditional information extraction techniques. Information extraction can play a vital role in transforming unstructured data into useful information. A multistep pipeline comprising data preprocessing, extraction methods, and representation of the extracted information is an essential requirement for improving unstructured data analytics. In this regard, this paper presents a short review of the information extraction process with respect to input data type, extraction methods with their corresponding techniques, and representation of extracted information. The issues with unstructured data, the challenges of information extraction from multifaceted unstructured big data, and future research directions are also discussed.
Keywords: Unstructured Data, Information Extraction, Big Data, Unstructured Data Analysis.
Scope of the Article: Big Data Analytics and Business Intelligence