ISIITA

ISIITA 2024

UNIVERSITY OF THE PHILIPPINES CEBU AND CRIMSON RESORT & SPA MACTAN, CEBU

Cebu island, Philippines

January 15 ~ 18

ISIITA 2021 Summer backup

Design of data preprocessing recommendation algorithm for deep learning model learning

Abstract
Author: Hyeonji
Date: 2021-07-08 17:55
Type of Presentation: Oral ( 0 ) / Poster (   ) / Anything (   )
Scope and Interests: Machine Learning, Deep Learning
Title of Paper: Design of data preprocessing recommendation algorithm for deep learning model learning

Corresponding Author(s)

Name: Yoosoo Oh
Affiliation: Dept. of ICT Convergence, Dept. of AI, Daegu University

e-mail: yoosoo.oh@daegu.ac.kr

Tel: +82-53-850-6654

Author(s)
Name: Hyeonji Kim
Affiliation: Dept. of ICT Convergence, Dept. of AI, Daegu University
e-mail: hyunji-k@daegu.ac.kr
Abstract

To ensure the integrity of collected data in deep learning, a data preprocessing step must be applied. This paper proposes an algorithm that identifies the type of the training data and recommends a data preprocessing method for deep learning. Deep learning models require different preprocessing depending on whether the input is an image file (jpg, png, etc.) or a numerical file (CSV, XML, etc.). We therefore use the file extension to classify collected data as numerical or image files. We then use knowledge-based filtering techniques to recommend the data preprocessing process for the deep learning model. In addition, we use the Titanic dataset as training data to evaluate the recommendation algorithm.
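The extension-based classification step could look roughly like the following minimal Python sketch. It is an illustration under assumptions, not the authors' implementation: the function name `classify_file` and the two extension sets are hypothetical, and `.jpeg` is added alongside the jpg, png, CSV, and XML formats named in the abstract.

```python
from pathlib import Path

# Hypothetical extension sets. The abstract names only jpg, png, CSV, and XML;
# ".jpeg" is an added assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
NUMERICAL_EXTENSIONS = {".csv", ".xml"}


def classify_file(filename: str) -> str:
    """Return 'image', 'numerical', or 'unknown' based on the file extension."""
    suffix = Path(filename).suffix.lower()  # case-insensitive comparison
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in NUMERICAL_EXTENSIONS:
        return "numerical"
    return "unknown"


if __name__ == "__main__":
    for name in ["titanic.csv", "photo.PNG", "notes.txt"]:
        print(name, "->", classify_file(name))
```

Returning an explicit "unknown" label, rather than raising an error, lets a later recommendation step decide how to handle unsupported files.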

Usually, a model trained on numerical files is less accurate when the data contain outliers. Accordingly, for a numerical file, the algorithm uses the mean and standard deviation of the training data to detect outliers, and then preprocesses the data by removing the outliers or replacing them with other values. Primarily, we apply the z-score and z-test methods to detect outliers. For image files, preprocessing changes properties such as the image's size and color. The proposed method also takes the image size and data type required by the deep learning model as user input and processes the images accordingly.
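As an illustration of z-score outlier handling, the sketch below computes $z = (x - \mu) / \sigma$ for each value and flags values whose absolute z-score exceeds a threshold. Several details are assumptions not stated in the abstract: the function names, the use of the population standard deviation, the default threshold of 3.0, and the choice to replace outliers with the mean of the remaining values. The z-test mentioned above is not shown.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value, using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def find_outliers(values, threshold=3.0):
    """Return the indices of values whose |z-score| exceeds the threshold."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


def replace_outliers(values, threshold=3.0):
    """Replace each outlier with the mean of the non-outlier values (an assumed policy)."""
    outliers = set(find_outliers(values, threshold))
    kept = [v for i, v in enumerate(values) if i not in outliers]
    if not outliers or not kept:
        return list(values)
    fill = mean(kept)
    return [fill if i in outliers else v for i, v in enumerate(values)]


if __name__ == "__main__":
    ages = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 500]
    print(find_outliers(ages, threshold=2.5))
    print(replace_outliers(ages, threshold=2.5))
```

On small samples the largest attainable z-score is bounded by the sample size, so a threshold of 3.0 may flag nothing; a lower threshold is used in the usage example for that reason.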

Moreover, this paper designs an algorithm that uses a knowledge-based filtering model to recommend suitable preprocessing methods for numerical files and image files. In particular, for a numerical file, the proposed algorithm recommends preprocessing techniques such as removing missing values, substituting missing values, and inserting predicted values, depending on the presence or absence of abnormal data. For an image file, it recommends preprocessing methods such as image size change, image transparency processing, color classification, and boundary detection.
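One simple way to picture a knowledge-based recommender is a rule table that maps observed conditions to preprocessing techniques. The sketch below is only that: the technique names come from the abstract, but the condition keys (`few_missing`, `has_alpha`, and so on), the pairing of each condition with a technique, and the `recommend` function are hypothetical and do not describe the authors' actual knowledge base.

```python
# Hypothetical rule base. Technique names follow the abstract; the condition
# keys and the condition-to-technique pairing are illustrative assumptions.
RULES = {
    "numerical": [
        ("few_missing", "remove missing values"),
        ("many_missing", "substitute missing values"),
        ("has_abnormal", "insert predicted values"),
    ],
    "image": [
        ("size_mismatch", "image size change"),
        ("has_alpha", "image transparency processing"),
        ("color_relevant", "color classification"),
        ("edges_relevant", "boundary detection"),
    ],
}


def recommend(file_type, facts):
    """Return the techniques whose condition is true in `facts`, in rule order."""
    return [
        technique
        for condition, technique in RULES.get(file_type, [])
        if facts.get(condition, False)
    ]


if __name__ == "__main__":
    print(recommend("numerical", {"many_missing": True, "has_abnormal": True}))
    print(recommend("image", {"size_mismatch": True}))
```

Keeping the rules as data rather than as nested conditionals makes it straightforward to add or revise a rule without touching the matching logic.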

Finally, to evaluate the proposed recommendation algorithm, we compare the recommended preprocessing method with a preprocessing method chosen through user analysis. For the evaluation, we use a dataset collected about the Titanic. In the experiments, the proposed preprocessing method produced higher result values than the existing preprocessing method performed by user analysis.

Keywords: Preprocessing, Recommendation Algorithm, Knowledge-Based Filtering