JRSSEM 2021, Vol. 02, No. 02, 159 166
E-ISSN: 2807 - 6311, P-ISSN: 2807 - 6494
TWO:
Comparative Analysis of Methods K-Nearest
Neighbor, Support Vector Machine and Decision
Tree on Prediction Model of Turnover Intention
Syarif Sagaf Adibaji
1
Onny Marleen
,2
1
Gunadarma University, Indonesia
2
Gunadarma University, Indonesia
*
e-mail: assegaf819@gmail.com
1,
onny_marleen@staff.gunadarma.ac.id
2
*Correspondence: assegaf819@gmail.com
Submitted
: 04 September 2022,
Revised
: 15 September 2022,
Accepted
: 28 September 2022
This study analyzed the comparison of methods on machine learning technique to predict turnover
intention, turnover intention refers to intention or possibility of an employee to leave a company
or the job that he is currently working on. The analysis with comparing the K-Nearest Neighbor,
Support Vector Machine and Decision Tree methods, in an effort to predict turnover intention and
reduce the risks of turnover intention in employee. The dataset used is taken from the Kaggle
dataset, the dataset file is in the form of human resource (HR) data records with 311 data records
with 24 features used out of 36 features. The dataset is obtained by using the K-Nearest Neighbor,
Support Vector Machine and Decision Tree methods to calculate the accuracy, precision and
sensitivity with a confusion matrix, the results of accuracy, precision and sensitivity from those three
methods are compared and the method with the highest average percentage of accuracy, precision
and sensitivity will be used as a prediction model.
Keywords: Prediction Model, Turnover Intention, K-Nearest Neighbor, Support Vector Machine,
Decision Tree.
Sharif Sagaf Adibaji, Onny Marleen| 160
TWO:
1. INTRODUCTION
Employees in a company are one
of the components that play an
important role in the sustainability of a
company. Recently, since the beginning
of 2020, at that time cases of the
COVID-19 virus have just emerged,
there is a new work dynamic that has
emerged, namely the number of
employees who quit their jobs. This was
stated by Professor Anthony Klotz from
the University of Texas A&M who
predicted that many employees would
want to move or leave their jobs in May
2021, which later in America became
known as the trend of the phenomenon
"The Great Resignation" or "Big Quit".
The desire to move or turnover
intention is a problem that is widely
highlighted because it has a negative
impact on the sustainability of projects
in the company, company productivity
and the sustainability of the company in
the long term. Turnover intention refers
to the desire or possibility of an
employee to leave a company or the
work he is doing (Balete, 2018).
Turnover intention consists of two
types, the first is voluntary, namely
employees who want to leave a
company or work that they do on their
own wishes while involuntary is based
on the wishes of the company or the
party where the employee works (Perez,
2008).
Predicting the risks that affect
turnover intention is one of the keys to
addressing this problem. by using the
implementation of machine learning
techniques to then be able to provide
insights for company leaders and
human resources (HR) teams. This study
was conducted to predict the risks that
can affect turnover using machine
learning techniques by comparing the
accuracy, precision and sensitivity of
several methods including K-Nearest
Neighbor, Support Vector Machine and
Decision Tree. Based on the percentage
of accuracy, precision and sensitivity of
several methods, the method with the
highest average percentage will then be
made as a prediction model so that it is
hoped that the predictive model can
minimize turnover intention and
maintain the sustainability of workers in
a company in the long term.
2. MATERIALS AND METHODS
Some of the theories that are
referenced in this study related to the
desire to move or
turnover intention are
described in this section. Penelitian
which is started by collecting datasets
in the form of human resource
data
records
sourced from the Kaggle
dataset. The dataset consists of 311
data records
and 36 features, then the
selection or selection of features into
24 features, then the selection or
selection of features into 24 features is
161 | Comparative Analysis of Methods K-Nearest Neighbor, Support Vector Machine and
Decision Tree on Prediction Model of Turnover Intention
carried out . Before being implemented
into model
machine learning
,
data
transformation
is carried out first by
doing
date encode
,
ordinal encode
and
one-hot
encode
. Machine
learning
methods
compared to create prediction
models include the
K-Nearest Neighbor
method,
Support Vector Machine
and
Decision Tree
. The last stage is to test
the test data by measuring the level of
accuracy, precision and sensitivity using
the confusion matrix
.
Materials
The explanation of some of the
theories is explored as follows.
Turnover Intention
Turnover intention
is the intention,
will or will of the individual himself to
exit by itself from the organizationi
(Sudarmawan & Suhariadi, 2014).
Turnover intention
is the intention of a
person to quit the company because of
a reason either voluntarily (originating
from within oneself) or not voluntarily
(termination of employment from the
company)(Sianipar & Haryanti, 2014) .
Turnover intention
is the desire of
employees to leave the company and
try to find another job that is better
than before(Waspodo, Handayani, &
Paramita, 2017).
Turnover intention
consists
of two types, the first is
voluntary
, namely employees who want
to leave a company or work that they
do on their own wishes while
involuntary
that is, based on the
wishes of the company or the party
where the employee works (Perez M. ,
2008)
Machine Learning
Machine learning
is a series of
techniques that can help in handling
and predicting very large data by
presenting these data with learning
algorithms(Danukusumo, 2017).
Machine learning can be
defined as an
experiential computational method to
improve performance or make accurate
predictions. The definition of
experience here is previous information
that is available and can be used as
learner data
K-Nearest Neighbor
The
K-Nearest Neighbor
or
KNN
algorithm is a method that uses a
supervised ed
algorithm where the
results of the new test sample are
classified based on the majority of the
categories in the KNN. The propriety of
the KNN algorithm is determined by
the presence and absence of irrelevant
data , or the weight of the feature is
equivalent to its relevance to the
classification(Nugroho & Wijana, 2015).
Support Vector Machine
The Support Vector Machine
(
SVM
) is a set of
supervised learning
methods that analyze data and
recognize patterns, used for
classification and regression analysis.
This classification is done by looking for
hyperplanes
or
decision boundaries
that separate one class from another.
SVM
seeks to find the best hyperplane
Sharif Sagaf Adibaji, Onny Marleen 162
by maximizing the margins/distances
between classes(Hadna, Santosa, &
Winarno, 2016) .
Decision Tree
Decision tree
or pohon decision is
a very powerful and well-known
method of classification and prediction.
The
decision tree
method can be
described as a decision tree because if
visualized the structure is similar to a
tree where the decision tree turns a very
large fact into a decision tree that
represents rules. Rules can be easily
understood in natural language. In
addition it can be expressed in the form
of a database language such as
Structure Query Language
(SQL) to
search for records in a specific
category(Nasrullah, 2021).
Data Transformation
Methods in
data mining
or
machine learning
often require special
data formats or structures before they
can be implemented. The
process of
data transformation
is the process of
changing existing data from one format
or structure to another format or
structure that is ready to be processed.
Through the transformation process, it
allows
data mining
or
machine learning
thatcan beobtained more effectively
and efficiently. Not only that, but the
patterns found are also easier to
understand (Leolianto, Thayf, &
Angriani, 2020).
Confusion Matrix
Confusion matrix
is a table
consisting of many rows of test data
that are predicted to be correct and
incorrect by a classification or
prediction model, this table is needed
to determine the performance of a
classification or prediction model
(Wijayanto, 2015) .
Table 1. Table
Confusion Matrix
Prediction Class
Class
Positiv
e
Negativ
e
Actua
l Class
Positive
TP
FP
Negativ
e
FN
TN
TP: The predicted class is positive while
the actual class is positive.
FP: The predicted class is positive while
the actual class is negative.
FN: The predicted class is negative while
the actual class is positive.
TN: The predicted class is negative while
the actual class is negative.
Based on
the confusion matrix
,
accuracy, precision and sensitivity of a
classification or prediction model can
be calculated. Accuracy determines how
accurately the model is in classifying
test data correctly, precision describes
between the positive correct prediction
results and the entire positive class in
the actual class while sensitivity
describes between the prediction
results true positive with the entire
positive class in the prediction class.
Theequations for calculating accuracy,
163 | Comparative Analysis of Methods K-Nearest Neighbor, Support Vector Machine and
Decision Tree on Prediction Model of Turnover Intention
precision and sensitivity using the
confusion matrix table are as follows:

 
   


 


 
2. MATERIALS AND METHODS
Thedata set collected is sourced
from Kaggle. Kaggle is an online
community that gathers experts in the
field of
data science
, Kaggle was built by
Goldbloom in 2010 and already has
more than 1000 datasets. human
power. The record dataset is 311 data
with 36 features, sourced from
https://www.kaggle.com/datasets/rhue
bner/human-resources-data-set. The
features of the collected dataset include
Employee_Name, EmpID, Married ID,
Marital Status ID, Gender ID, EmpStatus
ID, Dept ID, Perf Score ID, From Diversity
Job Fair ID, Salary, Termd , PsitionID,
Position, State, Zip, DOB, Sex, Marital
Desc, Citizen Desc, Hispanic Latino,
Race Desc, Dateof Hire, Date of
Termination, Term Reason, Employment
Status, Department, Manager Name,
ManagerID, Recruitment Source,
Performance Score, Engagement
Survey, Emp Satisfaction, Special
Projects Count, Last Performance
Review_Date, Days Late Last 30 and
Absences. Furthermore, selecting or
selecting features from the dataset
used. feature selection is carried out by
choosing which f-features have a major
effect on
turnover intention
and are
relevant. If there is a feature that has no
connection at all, it can be removed
from the feature. The features removed
in the feature selection process totaled
12 features including Employee_Name,
EmpID, Marital Status ID, Sex,
PositionID, DeptID, Perf ScoreID,
Employment Status , Emp Status ID,
Date of Termination, TermReason &
Manager ID. Data transformation is
carried out by encoding
date encode
,
ordinal encode
and
one-hot
encode
.
3. RESULTS AND DISCUSSION
The result of creating a prediction
model created using the
K-Nearest
Neighbor, Support Vector Machine
and
Decision Tree methods.
To see the
implementation process of each model,
a simulation of the calculations of each
method is carried out against the data
sample.
Table 2. Sample Data
Em
pI
D
Date
ofHir
e
Perform
anceSco
re
Ter
m
d
10
02
6
7/5/
2011
Exceeds
0
10
08
4
3/30
/201
5
Exceeds
1
10
19
6
7/5/
2011
Fully
Meets
1
From the data sample, processing
is then carried outwith the K-Nearest
Neighbor method, Support Vector
Sharif Sagaf Adibaji, Onny Marleen 164
Machine and Decision Tree. After that,
the trial is carried out by measuring the
level of accuracy, precision and
sensitivity to the dataset used so that
the method used can be compared and
tested to be concluded and specified in
the creation of the prediction model. By
utilizing
the confusion matrix
of
measuring the level of accuracy,
precision and sensitivity of
the K-
Nearest Neighbor
method,
the Support
Vector Machine
and
Decision Tree
are
as follows:
Acuration, precision and
sensitivity of the K-Nearest Neighbor
method in predicting turnover intention
with the following equations of
calculation of accuracy, precision and
sensitivity.


 
  


 
The result of the calculation of the
accuracy value shows a figure of 76%.






 
The result of the calculation of the
precision value shows a figure of 91%.



 


 
The result of the calculation of the
sensitivity value shows a figure of 76%.
Acuration, precision and sensitivity of
the
Support Vector Machine
method in
predicting
turnover intention
with the
following equations of accuracy,
precision and sensitivity calculations.


 
 


 
The result of the calculation of the
accuracy value shows a figure of 90%.






 
The result of the calculation of the
precision value shows a figure of 98%.








The result of the calculation of the
sensitivity value shows a figure of 88%.
the accuracy, precision and sensitivity of
the
Decision Tree
method in predicting
turnover intention
with the following
equations of accuracy, precision and
sensitivity calculations.


 
 


 
The result of the calculation of the
accuracy value shows a figure of 98%.







The result of the calculation of the
precision value shows a figure of 100%.








The result of the calculation of the
sensitivity value shows a figure of 98%.
165 | Comparative Analysis of Methods K-Nearest Neighbor, Support Vector Machine and
Decision Tree on Prediction Model of Turnover Intention
CONCLUSIONS
Basedonthe results of the trials
carried out, several conclusions can be
drawn, this research can analyze the
features needed and can be used to
make prediction models. This study
successfully implemented
machine
learning
to predict
turnover intention.
This study can determine and select the
best prediction method among
K-
Nearest Neighbor, Support Vector
Machine
and
Decision Tree
. The best
method is
Decision Tree
because the
accuracy value is 98%, then the
precision value is 100% and the
sensitivity value is 98%. The accuracy,
precision and sensitivity values of the
Decision Tree
method are the highest
compared to the other two methods.
REFERENCES
Balete, AK. (2018).
Turnover Intention
Influencing Factors of Employees:
An Empirical Work Review.
Journal of Enterpreneur &
Organization Management, 1-7.
Danukusumo, KP. (2017).
Deep
Learning Implementation Using
Convolutional Neural Networks
for GPU-Based Classification of
Temple Images.
Yogyakarta :
Atma Jaya University.
Hadna, NMS. , Santosa, PI. and
Winarno, WW. (2016).
A Literature
Study of Comparative Methods
for sentiment analysis processes
on Twitter.
National Seminar on
Information and Communication
Technology, 1-8.
Leolianto, I. , Thayf, MSS. and Angriani,
H. (2020).
Implementation of
Naive Bayes Theory in the
Classification of Prospective New
Students of STMIK Kharisma
Makassar.
Science and
Information Technology Journal,
110-117.
Mohri, M. , Rostamizadeh, A. and
Talwalkar, A. (2018).
Foundations
of Mechine Learning.
Cambridge :
MIT Press.
Nasrullah, AH. (2021).
Implementation
of the Decision Tree Algorithm for
the Classification of Best-Selling
Products.
Scientific Journal of
Computer Science, Faculty of
Computer Science, Al Asyariah
Mandar University, 45-51.
Nugroho, RS. and Wijana, K. (2015).
Auxiliary Program to Predict Sales
of Goods.
Journal of EXISTENCE,
83-93.
Perez, M. (2008).
Turnover Intent
Diploma Thesis.
Basılmamış
Yüksek Lisans Tezi : University of
Zurich.
Sianipar, ARB. and Haryanti, K. (2014).
The Relationship between
Sharif Sagaf Adibaji, Onny Marleen 166
Organizational Commitment and
Job Satisfaction with Turnover
Intentions in Employees in the
Production Sector of CV. X.
Jurnal
Psychodemensian, 98-114.
Sudarmawan, SH. and Suhariadi, F.
(2014).
The Effect of Employee
Perceptions of Organizational
Fairness on Turnover Intentions in
PT. ENG Gresik.
Surabaya :
Airlangga University.
Waspodo, AA. , Handayani, NC. and
Paramita, W. (2017).
The Effect of
Job Satisfaction and Work Stress
on Turnover Intention on PT.
Unitex in Bogor.
Indonesian
Journal of Science Management
Research, 97-115.
Wijayanto, H. (2015).
Batik
Classification Using the K-Nearest
Neighbour Method Based on Gray
Level Co-Occurrence Matrices
(GLCM).
Fik UDINUS Journal, 1-6.
© 2021 by the authors. Submitted
for possible open access
publication
under the terms and conditions of the Creative
Commons Attribution (CC BY SA) license
(https://creativecommons.org/licenses/by-sa/4.0/).