Forecasting prevalence of dengue hemorrhagic fever using ARIMA model in Sulawesi Tenggara Province, Indonesia

Background: Dengue hemorrhagic fever occurs through the bite of Aedes mosquitoes, primarily Aedes aegypti, carrying dengue viruses. In recent decades, the risk increased dramatically, not only in the tropics but also in subtropical regions. Objective: This study aimed to determine the best model for forecasting dengue hemorrhagic fever prevalence in Sulawesi Tenggara, Indonesia. Method: This was a retrospective analytical study using secondary data from the Sulawesi Tenggara Provincial Health Office from 2014 to 2019. ARIMA model was used for data analysis. Results: ARIMA (0.1.1)(0.1.1) was selected as the best-suited model. Based on the forecast, there would be an increase in dengue hemorrhagic fever prevalence over the next two years, with a mean absolute percentage error value of 4.41%. Conclusion: Forecasting results indicated that the peaks of dengue hemorrhagic fever cases would be in March, July, and November, and the increase will occur in the same months each year. Also, forecasting results were very good. Public health practitioners can use this model to prevent and eradicate dengue hemorrhagic fever. The ARIMA model would also be useful for nursing practice in caring for patients with dengue fever in the future.

It is also well-known as a mosquito-borne disease that develops most rapidly in the world (Yoshikawa, Kusriastuti, & Liew, 2019). It is characterized by sudden fever and bleeding, either in the skin or elsewhere in the body, which can cause shock and death (Rosid, Fitrani, Findawati, Winata, & Firmansyah, 2019). In 2016, there was a large dengue epidemic throughout the world. In 2015, DHF prevalence in Southeast Asia reached 451.442 cases (Incidence Rate (IR) = 24 per 100.000 population) claiming a total of 1.669 deaths (Case Fatality Rate (CFR) = 0.37%) (World Health Organization, 2019).
In Indonesia, since the beginning of January 2019, DHF prevalence has continued to grow until it reaches 13.683 cases. In 2017, the number of reported cases was 68.407, with 493 dead and an IR of 26 per 100.000 less than that in 2016, 204.171 cases with an IR of 79 per 100.000. During the last ten years, starting in 2008, it tended to be high until 2010, then decreased dramatically in 2011 with an IR of 28 per 100.000, followed by an upward in 2016 of 79 per 100.000. However, there was a drastic decline again in 2017 with an IR of 26 12 per 100.000 and in 2018 with 65.602 cases and an IR of 25 per 100.000 (Ministry of Health of the Republic of Indonesia, 2019).
The incidence of DHF in 2016 to August 2019 in Sulawesi Tenggara province was 3.599 cases claiming 34 deaths (CFR = 9.44%) and in 2017 decreased to 911 cases with 14 deaths (CFR = 1.53%). In 2018, there fell again as many as 811 cases with the number of fatalities was 6 (CFR = 0.73%). In 2019, there were 683 dengue cases until August, with a total of deaths of 4 (CFR = 0.58%) (Health office of the Southeast Sulawesi Province, 2019).
The ARIMA model is a time series forecasting technique based only on the behavior of the observed variable data (Idrees, Alam, & Agarwal, 2019). It completely ignores independent variables because it uses both present and past values of the dependent variable to produce accurate short-term forecasting. However, as is well-known, no forecasting method can accurately predict the state of data in the future. Each forecasting method must have an error. If the resulting error rate gets smaller, then the forecasting results will be closer to the right. The measuring instruments used in this study to calculate prediction errors were the Mean Squared Deviation (MSD), Mean Absolute Deviation (MAD), and Mean Absolute Percentage Error (MAPE) (Liu, Tang, & Zhang, 2019).
The ARIMA model has several advantages. It does not require stationary data patterns and applies to data containing seasonal elements. This study aimed to predict Dengue Hemorrhagic Fever in Sulawesi Tenggara, Indonesia, using the ARIMA model.

Design
The current study was a retrospective analytical study using secondary data from the Sulawesi Tenggara Provincial Health Office from 2014 to 2019 (Dinas Kesehatan Sulawesi Tenggara, 2019).

Setting
Sulawesi Tenggara Province is located on Sulawesi island, Indonesia. Astronomically, it lies south of the Equator, extending from North to South between 02°45' to 06°15' South Latitude and stretching from West to East between 120°45' to 124°45' East Longitude. The area of Southeast Sulawesi is in the land area of 38.067.7 km 2 (Biro Pusat Statistik, 2017).

Data Analysis
The ARIMA process was carried out with a data stationarity test, in which in terms of variances was carried out using the Box-Cox Transformation. If the rounded value or lambda (λ) is more than or equal to 1, then the data is said to be stationary in variance. Otherwise, the transformation must be carried out until the rounded value of Box-Cox is or more than 1. The stationarity test in mean was done by analyzing the ACF graph of data that is stationary both in variance and mean. Tentative model identification. However, if the data is not stationary in mean, then a seasonal differencing process is carried out. The level of difference will also determine the value (d) of the model. If there are seasonal patterns, a seasonal differencing process is also carried out as much as the repeating patterns. A pattern that occurs every 12 months requires the differencing process to be carried out 12 times, or it can be interpreted as the first level of the seasonal differencing process. The seasonal difference value will also determine the value (D) on the model. Data that has been stationary both in variance and mean. Establishing tentative ARIMA (p, d, q) (P, D, Q) models as appropriate. If the data does not undergo the differencing process, then the value (d) is 0. If the data is stationary after the first difference, then d = 1 and so on. Likewise, in seasonal data, if the data experiences a difference of 1, then the value (D) is 1, and so on. Determination of the orders p, q, P, and Q can be done by observing the pattern of Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) that have experienced seasonal differencing process.
Estimating model parameters is used to see whether or not the ARIMA model's parameters are significant. A model will be significant if the significance value is less than the alpha (α) value of 0.05. The next step is the diagnostic examination to prove whether the model is sufficient or suitable for forecasting. The tests performed are the white noise and the normality tests. The white noise test on the residuals, using the Ljung-Box test, aims to determine whether or not the residuals are already white noise. Residuals are white noise if the p-value is more than the alpha (α) value of 0.05. The normality test is done using the Kolmogorov-Smirnov statistical test to test is whether or not the residuals are normally distributed. Residuals are said to have normal distribution if the p-value is more than the alpha (α) value of 0.05. The model is said to be good and feasible if it can meet all three tests. The best ARIMA model that has been obtained can be used for the process of forecasting DHF cases for the next two years. The best model should have the smallest error value. Evaluation is done to determine the accuracy of forecasting results by looking at the percentage of MAPE values and MAD values.

Ethical Consideration
The research permit of this study was obtained from The Research Office of Sulawesi Tenggara Province, with number 070/03199/Balitbang/2019.

Results
Based on Figure 1, from 17 regions, only three districts, namely Konawe Kepulauan, Konawe Utara, and Muna Barat, were free of DHF. It means that the DHF transmission spread to almost all subdistricts in Sulawesi Tenggara. There were five subdistricts with the highest numbers of cases, namely Konawe (94), Bau-Bau (98), Kendari (111), Konawe Selatan (144), and Kolaka (213), all of which were designated as regions with extraordinary events of DHF in 2018.
In 2017, the number of deaths of DHF was 14. In 2018 it decreased to be 5 cases in 3 sub-districts. The highest fatality cases were reported in Konawe and Kolaka with 2 cases and in Kota Bau-bau 1 case. Fourteen regencies reported no deaths of DHF. Deaths of DHF are categorized high if CFR > 2%. Therefore, Sulawesi Tenggara was classified as medium.  Figure 2 shows that the plot increased and decreased in certain months. Significant improvement appears in 2016, where DHF cases increased in certain months. In Figure 2, the increasing number of dengue cases occurred in February and March. The rising prevalence could be caused by several factors, including natural factors, namely the rainy season with high intensity, so that some areas are experiencing flooding. It was supported by geographical location, a tropical region with low land with a fairly dense population.
In Figure 2, the data plot shows that the data is not stationary, both in variance and mean. A stationary data is that that has neither a significant increase nor a decrease. Data that has been stationary fluctuate around the average value and is constant over time. Data can be used in forecasting with the ARIMA method if it is stationary, both in variance and mean. Hence, the stationarity test needs to be done to meet the initial assumption requirements. Data that is not stationary in variance will be transformed with the Box-Cox transformation. Meanwhile, if not stationary in mean, it requires a differencing process.  Figure 3 shows that the data is not stationary in variance because the Box-Cox's rounded value shows the p-value of (0.00). Data that has not been stationary in variance needs to be transformed so that the value of lambda (λ) or the rounded value is greater than or equal to 1.  Figure 4 shows the rounded value on the Box-Cox plot after being transformed at 1.00. It indicates that the DHF case data is stationary in variance. The next step is to determine whether the data is stationary in the mean by looking at the ACF (Autocorrelation Function) plot and the PACF (Partial Autocorrelation Function) plot.  Figure 5 shows that in the ACF plot, only the first 2 lags have come out of the confidence interval, while the value in lag 3 is not close to zero. It shows that the data is not stationary in the mean. This can also be seen in Figure 6 of the PACF plot.  Figure 6 shows that in the ACF plot, the first 3 lags come out of the confidence interval, indicating that the data is not stationary in the mean. Based on the ACF and PACF plots, a differencing process is needed. In Figure 7, the ACF plot of DHF case data after treated by a differencing process shows the data is stationary in mean because there is no lag coming out of the confidence interval.

Identification of the ARIMA Model
The next step is to find the best estimated-value for the parameters of the ARIMA model. Tentative identification of the ARIMA model can be denoted in ARIMA (p,d,q)(P,D,Q)s where (s) is a model with seasonal elements. The tentative ARIMA model formed after the transformation and one differencing process is ARIMA (p,1,q)(P,1,Q) S. The ACF plot is used to read the moving average value (q and Q) while the PACF plot is used to read the autoregressive value (p and P). The ACF plot in Figure 7 is cut off after the second lag, so it is estimated that the tentative model is MA = 1 for the non-seasonal model. Figure 8 shows that the PACF plot is cut off after the second lag so that the nonseasonal model is MA = 1, and the PACF plot contains seasonal elements in lag 4 so that SMA = (1) for the seasonal model. Then, the identification results produce the estimated tentative model, namely ARIMA (0.1.1)(0.1.1) 4 .

Model Parameter Estimation
The alternative hypothesis (H1) is accepted if it is significant and enters the model. The rejection criterion is if the P-value (significance) is less than α with α = 0.05 at a 95% confidence level. Significance test results for each model parameter in the ARIMA interim model are the model (0.1.1) for non-seasonal and (0.1.1) 4 for the seasonal model. The results obtained are that the tentative seasonal ARIMA models are MA1 (0.000), MA1 (0.000), and SMA4 (0.000). This shows that the model fulfills the parameter estimation requirements. The significance value of each parameter in the model is less than α (0.05), so they are matched with the model.

Diagnostic Examination
The diagnostic examination consists of two kinds, namely the white noise test and the normality test. The white noise test fulfills the assumption if the significance value (p-value) on Ljung-Box is > α, with α value is 0.05. If the p-value is < α, then the model is said not to meet the assumption. Based on the results from Ljung-Box, the ARIMA model (0.1.1)(0.1.1) 4 shows lag 12 with p-value = 0.124, lag 24 with p-value = 0.479, lag 36 with p-value = 0.884, and lag 48 with p-value = 0.983. All the pvalues are more than α = (0.05), indicating that the ARIMA (0.1.1)(0.1.1) 4 model has fulfilled the assumption.
The next test is the normality test using the Kolmogorov-Smirnov test. The residual test is said to have normal distribution if the p-value is more than α with the α value of 0.05. If the p-value is less than or equal to α, then the residuals are said to be not normally distributed. The ARIMA (0.1.1)(0.1.1)4 residual normality test results show a p-value of (0.150), meaning that the residuals in the model are normally distributed because the p-value is > α. This can be seen in Figure 9 as follows:

Use of the Best Model for Forecasting
The ARIMA model with seasonal elements that has fulfilled the assumptions is the ARIMA model

Forecasting DHF Cases in Sulawesi Tenggara Province
Forecasting methods with time-series data depend on the data patterns that exist in the past period's actual data, which will determine the appropriate forecasting method to be used. Figure 1 shows that there was a recurring pattern in the DHF case plot in Sulawesi Tenggara Province from January 2004 to August 2019. Figure 1 shows that cases of DHF in February and March always increase every year compared to other months, which means that in the plot of DHF cases, there are seasonal or recurring patterns.
In the forecasting of DHF cases in Sulawesi Tenggara Province in September 2019 to September 2021 using the ARIMA method, the data plot contains seasonal elements, so the forecasting method is specific to the ARIMA model with seasonal elements. The ARIMA forecasting method is a time series forecasting method that completely ignores the independent variables and only uses the dependent variable, time data (Zhou, Han, Li, & Xia, 2019). The main requirement for forecasting a time series is that the data must be stationary in both variance and mean. In fact, the time-series data used as the dependent variable may not be stationary, so the stationarity process needs to be done. The stationarity process is carried out by transforming data so that the data becomes stationary in variance and by the process of difference so that the data becomes stationary in mean (Shi, Worden, & Cross, 2019).
The data is stationary in variance by looking at the Box-Cox Transformation (Wahyuningsih, Goejantoro, Siringoringo, Saputra, & Aminah, 2019). Data is said to be stationary in variance if the value of lambda (λ) or the rounded value in the Box-Cox plot is 1 or more than 1 (Hyndman et al., 2019). Both figures 3, 4, and 5 show that the data is not stationary in variance. In Figure 3, a transformation process is performed on the Box-Cox plot. It is also seen in Figure 4 that the lambda value (λ) is already 1.00, which means that the data has been stationary in variance.
Data that has been stationary in variance will then be checked whether the stationarity is in the mean. The stationarity test in the mean is done by looking at the ACF graph of the data. If the data is not stationary in the mean, then a differencing process will be carried out. The number of differences determines the value of I (integrated) with the orders (d) and (D) in the model (Rosyid, Aniendya, Herwanto, & Shi, 2019). Figure 5 shows the need for the differencing process once because the values of the first 3 lags in the ACF plot are not close to zero. The results of the differencing process from Figure 5 can be seen in Figure 7, in which in the ACF plot none of the first three lags is near zero, and no lag comes out of the confidence interval. This shows that the data has been stationary in the mean.
In Figure 8, in the PACF plot, after experiencing the differencing process, there are two lags, namely lags 4 and 8, coming out of the confidence interval. Based on this, it can be identified that the model has a seasonal pattern every 4 months. In the PACF plot, two seasonal lags come out, so a seasonal differencing process is needed to eliminate the seasonal elements.
After the data is stationary both in variance and mean, a tentative model identification is then performed. The ACF and PACF plots produce an ARIMA tentative model based on the identification results, namely (0.1.1)(0.1.1) 4 . The model's identification by looking at the p-value for each model parameter is done. The model is accepted if the significance value of each parameter in the model is <α, where α value used is 0.05. The parameter values of the model are MA 1=0.000 and SMA 4=0.00. This shows that the model has been significant.
After the identification of the model, a parameter significance test and a diagnostic examination are carried out. The parameter significance test by looking at the Ljung-Box value is performed on the model to determine whether the residuals follow the white noise process. Based on the white noise test that has been done on the model (0.1.1)(0.1.1) 4 , the significance values of lag 12, lag 24, lag 36, and lag 48 are more than 0.05, respectively, so the model meets the test requirements of the white noise test and the errors are random, and there is no autocorrelation in the data.
The final stage in the diagnostic examination is the residual normality test using the Kolmogorov-Smirnov test. Residuals are said to be normally distributed if the p-value is >α, with a value of α is 0.05, and are said to be not normally distributed if the p-value is <α. The Kolmogorov-Smirnov test results on the model (0.1.1)(0.1.1) 4 show that the p-value is ≤0.150. This indicates that the model is normally distributed because the P-value is ≥0.05. The results of the tests that have been done show that the ARIMA model (0.1.1) (0.1.1) 4 can predict the prevalence of DHF for the next 24 periods.

Forecasting DHF Cases in Sulawesi Tenggara Province in 2019-2021
Based on the epidemiological triangle, the disease is influenced by the host, agent, and the environment (Ghezzi, 2020). The high incidence of disease can be caused by a decrease in the environment's carrying capacity, causing the host stronger to transmit the disease from the agent (Barker & Reisen, 2019). In connection with the incidence of DHF, rainfall and number of rainy days cause environmental conditions to decline by preparing conditions conducive to the breeding of Aedes mosquitoes that carry the dengue virus from sufferers (Ehelepola & Ariyaratne, 2016;Sharmin, 2016). The physical environment related to DHF events includes climatic conditions (temperature, humidity, and rainfall), geographical conditions, and geological structures (Ishak & Kasman, 2018;Pakaya, 2017;Setiawati, 2019;. The physical environment includes abiotic or inanimate objects such as water, air, weather, home, heat, sunlight, wind (Ana, Agbu, & Anetor, 2020;Ibe, 2015).
Forecasting results obtained from the ARIMA model are expected to help prevent the transmission of DHF and Extraordinary Events by eradicating the DHF-causing vectors. This cannot be done by one party, but by various sectors involved in it. The forecasting is also expected to reduce the number of deaths due to DHF by making various prevention efforts as early as possible by considering its causes.

The Implication for Public Health
Forecasting dengue fever is early warning information for health workers. The findings of this study become a reference in dealing with the incidence of DHF. Public health workers and professionals will prepare various things in dealing with DHF patients. Readiness in providing health services will provide benefits for the recovery of DHF patients. Also, public health practitioners can use this data as part of health reporting on DHF. This data can also be used to support the DHF surveillance system. The forecasting model obtained will predict the number of DHF sufferers in the next period, so that decision-making regarding prevention and eradication measures can be carried out appropriately and continuously based on existing forecasts.

Conclusion
The ARIMA model (0.1.1)(0.1.1) 4 has fulfilled the assumption needed and was selected as the best suited to predict the incidence of DHF cases in Sulawesi Tenggara, Indonesia. The model shows an increase in the next two periods in Sulawesi Tenggara, which indicates that the peaks of DHF cases will be in March, July, and November, and the increase will occur in the same months each year. The results of the DHF case forecasting provided a MAPE value of 4.41%, which shows that the forecasting results are very good.

Declaration of Conflicting Interest
There is no conflict of interest to be declared.

Funding
The authors received no specific funding for this study.

Author Contributions
Concept generation, data collection, writing, and editing of the manuscript (M), concept generation, critically reviewed, writing, and revision (Y and HL).