Discovery of Knowledge by using Data Warehousing as well as ETL Processing
Arif Ali Wani¹, Bansi Lal Raina²

¹Arif Ali Wani, Department of Computer Science and Engineering, Glocal University, Saharanpur (U.P), India.
²Bansi Lal Raina, Department of Computer Science and Engineering, Glocal University, Saharanpur (U.P), India.
Manuscript received on 24 August 2019 | Revised Manuscript received on 05 September 2019 | Manuscript Published on 16 September 2019 | PP: 936-945 | Volume-8 Issue-2S6 July 2019 | Retrieval Number: B11800782S619/2019©BEIESP | DOI: 10.35940/ijrte.B1180.0782S619
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Testing is essential in data warehouse systems used for decision making, because the accuracy, validity and correctness of the data depend on it. Considering the characteristics and complexity of a data warehouse, in this paper we try to show the scope of automated testing in assuring the best data warehouse solutions. First, we developed a data set generator for creating synthetic but near-real data; then, with the help of a hand-coded Extraction, Transformation and Loading (ETL) routine, anomalies in the synthesized data were classified. To assure the quality of data for a data warehouse, and to convey how important Extraction, Transformation and Loading is, some very important test cases were identified. After that, to ensure the quality of data, automated testing procedures were embedded in the hand-coded ETL routine. Statistical analysis revealed a large improvement in data quality with the automated testing procedures, which supports the view that automated testing gives promising results for data warehouse quality. For effective and easy maintenance of distributed data, a novel architecture was proposed. Although the desired result of this research was achieved and the outcomes are promising, the results still need to be validated in a real-life environment, because this research was done in a simulated environment, which may not always give the desired results in real life. Hence, the overall potential of the proposed architecture can be seen only once it is deployed to manage real data that is distributed globally.
Keywords: Data Quality, Data Warehousing, ETL, Testing.
Scope of the Article: Data Mining and Warehousing
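
The workflow summarized in the abstract (generate synthetic data, classify anomalies, and embed automated checks in a hand-coded ETL routine) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`generate_rows`, `classify`, `etl`), the row fields, and the three anomaly types are all assumptions chosen for illustration only.

```python
import random


def generate_rows(n, anomaly_rate=0.2, seed=0):
    """Create synthetic rows and inject anomalies at random (hypothetical schema)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        row = {"id": i, "name": f"customer_{i}", "amount": round(rng.uniform(1, 500), 2)}
        if rng.random() < anomaly_rate:
            kind = rng.choice(["missing_name", "negative_amount", "duplicate_id"])
            if kind == "missing_name":
                row["name"] = None
            elif kind == "negative_amount":
                row["amount"] = -row["amount"]
            elif i > 0:
                row["id"] = i - 1  # reuse the previous id to mimic a duplicate
        rows.append(row)
    return rows


def classify(row, seen_ids):
    """Return the anomaly labels for one row; an empty list means the row is clean."""
    labels = []
    if row["name"] is None:
        labels.append("missing_name")
    if row["amount"] < 0:
        labels.append("negative_amount")
    if row["id"] in seen_ids:
        labels.append("duplicate_id")
    return labels


def etl(rows):
    """Extract rows, run the embedded checks, transform and load only clean rows."""
    loaded, rejected, seen_ids = [], [], set()
    for row in rows:  # extract
        labels = classify(row, seen_ids)  # automated test step (assumed checks)
        if labels:
            rejected.append((row, labels))
            continue
        seen_ids.add(row["id"])
        loaded.append(dict(row, name=row["name"].strip().title()))  # transform + load
    return loaded, rejected


if __name__ == "__main__":
    loaded, rejected = etl(generate_rows(200))
    print(f"loaded={len(loaded)} rejected={len(rejected)}")
```

In this sketch, rejected rows are kept together with their anomaly labels, so the share of each anomaly type can be counted before and after the checks are enabled; the statistical comparison mentioned in the abstract would rely on counts of this kind.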