Design of data preprocessing recommendation algorithm for deep learning model learning
Type of Presentation | Oral ( 0 ) / Poster ( ) / Anything ( ) |
Scope and Interests | Machine Learning, Deep Learning |
Title of Paper | Design of data preprocessing recommendation algorithm for deep learning model learning |
Corresponding Author(s) |
Name: Yoosoo Oh e-mail: yoosoo.oh@daegu.ac.kr Tel: +82-53-850-6654 |
Author(s) name / Affiliation / e-mail |
1)Hyeonji Kim 2)Dept. of ICT Convergence, Dept. of AI, Daegu University 3)hyunji-k@daegu.ac.kr |
Abstract | Deep learning requires data preprocessing to ensure the integrity of the collected data. This paper proposes an algorithm that identifies the type of the training data and recommends a suitable preprocessing method. Preprocessing for deep learning models differs between image files (jpg, png, etc.) and numerical files (CSV, XML, etc.), so we first classify the collected data as numerical or image files according to their file extensions. We then use knowledge-based filtering to recommend the preprocessing steps for each file type. For numerical files, outliers usually reduce the accuracy of the trained model. We therefore use the mean and standard deviation of the training data to detect outliers, primarily with the z-score and z-test methods, and then remove them or replace them with other values. Depending on whether abnormal data are present, the proposed algorithm recommends techniques such as removing missing values, substituting missing values, and inserting predicted values. For image files, preprocessing changes properties such as image size and color. The proposed method takes the image size and data type required by the deep learning model as user input and recommends methods such as image resizing, transparency processing, color classification, and boundary detection.
To evaluate the proposed recommendation algorithm, we compare the preprocessing it recommends with preprocessing chosen through manual user analysis, using the Titanic dataset as training data. In our experiments, the recommended preprocessing achieved higher result values than the existing preprocessing performed by user analysis. |
Keywords | Preprocessing, Recommendation Algorithm, Knowledge-Based Filtering |
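The pipeline described in the abstract (classify by file extension, detect outliers with z-scores, then recommend preprocessing through rules) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the extension sets, the default threshold of 3.0, and the wording of the rules are all assumptions made for the example, and the rule table stands in for whatever knowledge base the paper actually uses.

```python
from pathlib import Path
from statistics import mean, pstdev

# Assumed extension sets; the abstract names only jpg, png, CSV, and XML.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
NUMERICAL_EXTENSIONS = {".csv", ".xml"}


def classify_file(filename):
    """Return 'image', 'numerical', or 'unknown' based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in NUMERICAL_EXTENSIONS:
        return "numerical"
    return "unknown"


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds the threshold.

    Missing entries (None) are skipped when computing the mean and the
    population standard deviation.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []
    mu = mean(present)
    sigma = pstdev(present)
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [
        i for i, v in enumerate(values)
        if v is not None and abs((v - mu) / sigma) > threshold
    ]


def recommend(filename, column=None, threshold=3.0):
    """Suggest preprocessing steps from hand-written (hypothetical) rules."""
    kind = classify_file(filename)
    if kind == "image":
        return [
            "resize image",
            "handle transparency",
            "color classification",
            "boundary detection",
        ]
    if kind == "numerical":
        column = column or []
        steps = []
        if any(v is None for v in column):
            steps.append("remove or substitute missing values")
        if zscore_outliers(column, threshold):
            steps.append("remove or replace outliers")
        return steps
    return []
```

For example, `recommend("train.csv", column=[22, None, 24, 23, 26, 24, 25, 500], threshold=2.0)` would suggest handling both the missing value and the outlier. With a population standard deviation, the largest possible z-score in a sample of n values is sqrt(n - 1), so a threshold of 3.0 can never flag anything in columns of ten or fewer values; a lower threshold is used here for that reason.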