A Novel Scalable and Effective Partitioning Approach for Big Data Reduction

Malhat, M. G. and Elmenshawy, M. and Mousa, Hamdy and Elsisi, A. B. (2019) A Novel Scalable and Effective Partitioning Approach for Big Data Reduction. IJCI. International Journal of Computers and Information, 6 (1). pp. 9-19. ISSN 1687-7853


Abstract

The continuous growth in data size makes traditional instance selection methods ineffective at reducing big training datasets on a single machine. Recent approaches to this problem partition the training dataset into subsets and then apply an instance selection method to each subset separately. However, the performance of instance selection applied to the subsets degrades, especially as the number of subsets increases. In this work, we propose a novel scalable and effective automated partitioning approach, called overlapped distance-based class-balance partitioning. This approach distributes the training instances to the subsets according to a given distance metric and ensures that the data classes are equally represented in every subset. Moreover, an instance may be assigned to two subsets when it satisfies a dynamic threshold. We implement the proposed approach and empirically test its scalability and effectiveness using the condensed nearest neighbor (CNN) method on eight standard datasets. The proposed approach is compared empirically and analytically with the stratification partitioning approach and with a non-overlapped version of our approach with respect to 1) reduction rate, classification accuracy, and effectiveness metrics, and 2) scalability as the number of subsets increases. The comparison results demonstrate that our approach is more scalable and effective than the other partitioning approaches on these standard datasets.
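The abstract does not specify the partitioning algorithm in detail, so the following is only a minimal Python sketch under stated assumptions, meant to illustrate the partition-then-reduce idea. It pairs Hart's condensed nearest neighbor rule, the instance selection method the abstract names, with a hypothetical distance-based class-balanced partitioner: for each class, instances are sorted by Euclidean distance to the global centroid and cut into k equal chunks, so every subset receives an equal share of each class. An instance whose centroid distance lies within a fixed `overlap` margin of the next chunk's first distance is also copied into that next subset. The centroid reference point, the Euclidean metric, the fixed margin standing in for the paper's dynamic threshold, and all function names are assumptions for illustration, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed here as the 'given distance metric')."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def condensed_nn(X: List[Point], y: List[int]) -> List[int]:
    """Hart's condensed nearest neighbor: grow a prototype store until 1-NN
    on the store classifies every instance correctly. Returns indices into X."""
    if not X:
        return []
    store = [0]  # seed the store with the first instance
    in_store = {0}
    changed = True
    while changed:  # repeat full passes until a pass adds nothing
        changed = False
        for i, x in enumerate(X):
            if i in in_store:
                continue
            nearest = min(store, key=lambda j: euclidean(x, X[j]))
            if y[nearest] != y[i]:  # misclassified: absorb into the store
                store.append(i)
                in_store.add(i)
                changed = True
    return store


def distance_class_balanced_partition(
    X: List[Point], y: List[int], k: int, overlap: float = 0.0
) -> List[List[int]]:
    """HYPOTHETICAL partitioner, not the paper's algorithm.

    Per class: sort instances by distance to the global centroid, cut the
    sorted list into k equal chunks, and send chunk j to subset j. An instance
    whose centroid distance is less than `overlap` below the first distance
    of the following chunk is also copied into subset j + 1, so every
    instance lands in at most two subsets."""
    if not X:
        return [[] for _ in range(k)]
    dim = len(X[0])
    centroid = tuple(sum(p[d] for p in X) / len(X) for d in range(dim))
    subsets = [set() for _ in range(k)]
    for label in sorted(set(y)):
        idx = sorted((i for i in range(len(X)) if y[i] == label),
                     key=lambda i: euclidean(X[i], centroid))
        dist = [euclidean(X[i], centroid) for i in idx]
        n = len(idx)
        bounds = [j * n // k for j in range(k + 1)]  # equal-size chunk edges
        for j in range(k):
            lo, hi = bounds[j], bounds[j + 1]
            for pos in range(lo, hi):
                subsets[j].add(idx[pos])
                # Near the upper chunk edge: also copy into the next subset.
                if j + 1 < k and hi < n and dist[hi] - dist[pos] < overlap:
                    subsets[j + 1].add(idx[pos])
    return [sorted(s) for s in subsets]


def partitioned_reduction(
    X: List[Point], y: List[int], k: int, overlap: float = 0.0
) -> List[int]:
    """Partition, run CNN on each subset independently, and union the kept
    indices (copies from overlapping subsets collapse in the set)."""
    kept = set()
    for subset in distance_class_balanced_partition(X, y, k, overlap):
        local = condensed_nn([X[i] for i in subset], [y[i] for i in subset])
        kept.update(subset[i] for i in local)
    return sorted(kept)


# Toy usage: two well-separated classes split into two subsets.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
y = [0, 0, 0, 1, 1, 1]
parts = distance_class_balanced_partition(X, y, k=2)
kept = partitioned_reduction(X, y, k=2)
print(parts, kept)  # prints [[1, 3], [0, 2, 4, 5]] [0, 1, 3, 4]
```

With `overlap=0.0` the sketch behaves like a non-overlapped variant; a positive margin duplicates boundary instances into the neighboring subset. Splitting each class separately is what gives every subset the same class mix, which is the property the abstract attributes to class-balance partitioning.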

Item Type: Article
Subjects: Academics Guard > Computer Science
Depositing User: Unnamed user with email support@academicsguard.com
Date Deposited: 14 Jul 2023 12:03
Last Modified: 08 Jun 2024 09:09
URI: http://science.oadigitallibraries.com/id/eprint/1362
