Applied and Computational Engineering

- The Open Access Proceedings Series for Conferences

Volume Info.

  • Title

    Proceedings of the 2023 International Conference on Machine Learning and Automation

    Conference Date

    2023-10-18

    Website

    https://2023.confmla.org/

    ISBN

    978-1-83558-327-2 (Print)

    978-1-83558-328-9 (Online)

    Published Date

    2024-03-05

    Editors

    Mustafa İSTANBULLU, Cukurova University

Articles

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230134

    Application analysis of financial data mining in investment decision

    Amidst the escalating complexities that define the contemporary financial market and the rapid proliferation of information, traditional methods of formulating investment decisions confront increasingly formidable challenges. In response to these intricate dynamics, the realm of financial data mining has emerged as a prominent avenue of scholarly investigation within the investment domain. This paper's fundamental objective is to conduct a comprehensive retrospective analysis of the diverse applications of financial data mining in the context of investment decision-making. This scholarly pursuit entails a meticulous synthesis of existing academic inquiries, concurrently proposing potential avenues for future advancements in this field. By undertaking this academic endeavor, the paper strives to make substantive contributions to the refinement of methodologies essential for adeptly navigating the multifaceted landscape of modern investments. As the financial landscape continues to evolve, this study aspires to offer insights that not only enhance the efficacy of investment strategies but also foster a deeper understanding of the intricate interplay between data mining techniques and decision-making processes. Through the synthesis of empirical findings and theoretical perspectives, this paper seeks to underscore the pertinence of leveraging data-driven approaches in investment practices, thereby promoting a more informed and sophisticated investment landscape.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230074

    Data analysis based on COVID-19—Important factors in the COVID-19 outbreak

    With multiple industries around the world significantly affected by the 2020 pandemic outbreak, the COVID-19 pandemic has highlighted the importance of health care systems in managing and containing infectious diseases. This article examines the relationship between the level of health care services and the number of COVID-19 infections, taking into account factors such as detection and contact tracing, case treatment and management, and resource constraints. While countries with stronger health care systems may be better able to respond to a pandemic, resource constraints and other factors may also play a role in determining infection rates. Overall, the relationship between health care and COVID-19 infections is complex and influenced by multiple factors, highlighting the need for sustained investment in health care infrastructure and systems. In this article, we aim to analyze which factors influence the number of infections in the COVID-19 outbreak. Our study shows that some factors are significantly associated with the number of infections in the epidemic while others are not; these results are consistent with survey expectations, and we close with recommendations and an outlook.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230078

    Predictive model on detecting ChatGPT responses against human responses

    The paper investigates the critical differences between AI-generated text and human responses in terms of linguistic patterns, structure, and content. The research uses datasets from HC3, collected in 2023. Our results show that ChatGPT with GPT-3.5 systematically uses conjunctions and word combinations more often in conversations than humans do. Our model identifies AI-generated answers with high accuracy.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230079

    Sentiment analysis applied on Amazon reviews

    With the rapid growth of e-commerce, accurately capturing buyers' sentiments through their reviews is increasingly vital for online marketplaces. In this paper, we aim to deal with sentiment analysis in these reviews by exploring effective methods to analyze them. We use a review dataset containing user ratings and comments on Amazon products. Applying the two-step methodology of data preprocessing and model building, we intend to employ models like LSTM and SVM to analyze Amazon customer reviews and gain insights into their performance. The findings of this study may also allow e-commerce platforms to provide better service to sellers and buyers.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230093

    Applying self-attention model to learn both Empirical Risk Minimization and Invariant Risk Minimization for multimedia recommendation

    Multimedia recommendation systems have many applications in our daily life. However, accurately capturing a customer's preference is difficult. Invariant Risk Minimization (IRM) and Empirical Risk Minimization (ERM) are two ways to learn a customer's preference. Still, both frameworks show some limitations: although ERM performs excellently in a single environment, it fails to generalize well when faced with multiple and new domains. On the other hand, IRM learns invariant features across heterogeneous environments, but it lacks theoretical guarantees and performs less effectively where the invariants are unclear. This paper proposes an ERM and IRM Optimized Rating Framework (EIOR) as our final recommender model with direct rating scores. EIOR enhances the accuracy and functionality of multimedia recommendation systems by utilizing self-attention mechanisms to combine IRM and ERM with adjusted attention weights. Specifically, IRM learns invariant parts across different environments, while ERM learns variant parts. With self-attention, we can adaptively allocate attention weights for the two parts and seek the optimal pair of attention weights based on the loss function. We demonstrate EIOR on a cutting-edge recommender model, UltraGCN, and use the open multimedia dataset of TikTok for all the experiments. The results validate the effectiveness of EIOR by comparing it against operating on invariant representations alone under the IRM framework.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230097

    Principal Component Analysis variants for Parkinson datasets

    Principal Component Analysis (PCA) is one of the most fundamental dimension reduction methods and still merits further research. With the widespread popularity of machine learning and the arrival of the era of big data, dimension reduction, and PCA in particular, has become a hot topic. However, although many researchers focus on PCA methods, few studies have applied them to Parkinson datasets. The aim of our work is therefore to discuss PCA variants for Parkinson datasets. This paper first introduces the three most commonly used PCA methods (PCA, Sparse PCA and Kernel PCA) and then the Support Vector Machine (SVM) used to measure the dimension reduction effect. After that, we introduce the Parkinson's dataset and the meanings of root mean square error (RMSE), overall accuracy, Cohen’s kappa (Kappa) and computational time, the indicators used to measure the dimensionality reduction effect. Finally, we identify the differences among the PCA methods on the Parkinson dataset by comparing these indicators on the data obtained after dimensionality reduction with each method.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230155

    The comparison and analysis of Skip-gram and CBOW in creating financial sentimental dictionary

    Textual analysis is increasingly used in various fields due to data availability, computing power, and machine learning techniques. In finance, sentiment analysis is essential for obtaining excess returns, and building domain-specific lexicons using word2vec is a prevalent method. The CBOW and Skip-gram algorithms have different predictive methodologies and performances depending on the task and dataset. This paper reviews financial sentiment analysis using a dictionary method and compares the performance of the two algorithms. CBOW trains faster than Skip-gram when dealing with a small amount of text data, but as the amount of data increases, Skip-gram becomes more efficient. In addition, Skip-gram captures more synonyms of the selected words than CBOW.
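
The difference in predictive methodology between the two algorithms can be illustrated with a minimal sketch. It only shows how training examples are formed from a token window, not the paper's experimental setup; the function names, the example sentence, and the window size are assumptions made for illustration. Skip-gram emits one example per (center, context) pair, while CBOW emits one example per position, which is one commonly cited reason CBOW tends to train faster.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: the center word predicts each context word separately."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: the whole context window jointly predicts the center word."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            examples.append((context, center))
    return examples


# Hypothetical sentence, used only to compare the number of examples.
sentence = "profits rose while debt fell".split()
print(len(skipgram_pairs(sentence, window=1)))  # one example per pair
print(len(cbow_examples(sentence, window=1)))   # one example per position
```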

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230156

    A new approach based on machine learning to certain diseases

    Never before has the importance of data science been so emphasized as in modern society. Focusing on obtaining conclusive results from the implicit features concealed in huge amounts of data, data science plays a remarkable role in various fields, including modern medical practice. Although cutting-edge technology can cope with numerous diseases, certain diseases, such as breast cancer and Parkinson’s disease, still compromise people’s health because they are difficult to predict and prone to worsening. To address this problem, we apply three machine learning methods to two datasets and test their classification performance. In the paper, we first clarify the principle of each machine learning method (three different classifiers). We then conduct our experiment, in which the decisive parameters of the classifiers are set by specific search algorithms. We also introduce the metrics, along with their principles, used to evaluate the numerical results obtained by the different classifiers. Next, we discuss the results by comparing the metric values that represent the performance of each method, and thereby obtain optimal classifiers for the two datasets. Finally, we discuss our experiment’s limitations as well as its prospects, which include further application in other fields.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230241

    Incorporating emotional trend into multi-emotion analysis models for long-text sentiment analysis

    Sentiment analysis plays a vital role in natural language processing (NLP) and has garnered significant attention across different domains. However, multi-emotion analysis of long text is still a challenging task due to the intricate emotional nuances conveyed. In this paper, a novel approach for long-text multi-emotion analysis is proposed by integrating emotional trends. This integration aims to enhance the ability of the model to recognize emotions by including word-level sentiment scores as supplementary features. To achieve this, the ISEAR and IMDB datasets are leveraged to investigate the impact of sentiment scores with varying weights on three models: BiLSTM, CNN, and CNN+BiLSTM. The models are trained for 20 and 50 epochs and evaluated by accuracy, precision, recall, F1 score, ROC curve and AUC value. The experimental results indicate that the incorporation can improve the processing speed of the multi-emotion analysis task while maintaining performance with a 66.7% probability. The most notable improvement reduced the time by 33.42% relative to the baseline model. In the best case, the accuracy of the model increased by 2.26% and the F1 score increased by 2.16% without affecting the running speed.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230247

    Enhancing recommendation with causal embedding: Considering social network influence

    In the realm of recommendation models, we consistently rely on observational interaction data. This data encompasses a variety of aspects, such as user conformity and genuine user interests. The key challenge for recommender systems is to extract a user's authentic interests from this interaction data in order to provide accurate recommendations. The current method, DICE, attempts to separate conformity and interest by assigning distinct embeddings for each to users and items. The method ensures that each embedding captures only one causal factor through training with specific causal data. In our research, we've enhanced this existing method by incorporating social networks into the disentanglement of conformity and real interest from observational interaction data. The results from our proposed method surpass those of the prevailing baseline, demonstrating significant improvements across various backbone models using a real dataset. Furthermore, we conducted a sensitivity analysis and provided recommendations for scenarios in which our new model would be most effective.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230249

    Distributionally Robust Optimization methods on robust medical diagnosis systems

    In the medical field, modern recommendation systems face significant challenges due to distributional shifts in data. We propose utilizing Distributionally Robust Optimization (DRO) and Distributionally and Outlier Robust Optimization (DORO) methods to address this issue. This paper aims to develop suitable DRO and DORO frameworks for the medical domain and validate their effectiveness through extensive experiments. We employ the DDXPlus dataset for our investigations and cluster patients based on age, sex, and initial evidence to partition the data into distinct distributions. Using a simple three-layer neural network, we incorporate CVaR and CHISQ as DRO methods and their respective DORO forms. The experimental results show that the overall DRO approach demonstrates more significant enhancements while all four methods exhibit improvements over the original distributional scenarios. Our research contributes to optimizing deep learning models in the medical domain and enhancing their robustness. Furthermore, we intend to use these methods to estimate and provide best-fit patient therapies, addressing real-world medical challenges. The application of these approaches has the potential to enhance the performance and practicality of medical recommendation systems, offering improved medical services to patients.
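
As a rough illustration of the CVaR objective named in the abstract above, the sketch below averages only the worst fraction of per-sample losses instead of all of them. It is a simplified discrete approximation and not the authors' implementation. The second function assumes that the outlier-robust (DORO) form amounts to discarding the highest-loss fraction of samples before applying the same risk. The names `cvar_loss`, `doro_cvar_loss`, `alpha` and `eps` are chosen here for illustration.

```python
import math
from typing import Sequence


def cvar_loss(losses: Sequence[float], alpha: float) -> float:
    """Mean of the worst `alpha` fraction of losses (discrete CVaR approximation)."""
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def doro_cvar_loss(losses: Sequence[float], alpha: float, eps: float) -> float:
    """Assumed DORO-style variant: drop the top `eps` fraction as outliers, then CVaR."""
    if not 0 <= eps < 1:
        raise ValueError("eps must be in [0, 1)")
    n_drop = math.floor(eps * len(losses))
    kept = sorted(losses, reverse=True)[n_drop:]
    return cvar_loss(kept, alpha)


# Hypothetical per-sample losses; 50.0 plays the role of an outlier.
batch = [50.0, 2.0, 0.3, 0.1]
print(cvar_loss(batch, alpha=0.5))                  # dominated by the outlier
print(doro_cvar_loss(batch, alpha=0.5, eps=0.25))   # outlier removed first
```

With `alpha = 1` the CVaR term reduces to the ordinary average loss, which is the ERM baseline that a DRO method is usually compared against.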

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230280

    Parkinson’s disease diagnosis through electroencephalographic signal processing and neural network classification

    Parkinson's disease (PD) is the second most prevalent neurological disorder, following Alzheimer's. Despite this, there is currently no successful treatment for PD. Therefore, early detection of Parkinson's disease is crucial for preventing its progression. To address this, a computer-aided diagnosis system has been implemented to identify any abnormalities. Significant research has been conducted using speech and gait analysis. However, there is growing interest in using electroencephalographic (EEG) signals to diagnose Parkinson's disease at an early stage. This paper aims to use EEG to capture neural correlates of dysfunction in PD patients and compare them with those of healthy subjects to determine whether a person has PD. The method is to preprocess the EEG dataset using MATLAB and EEGLAB and to analyze and classify the preprocessed data using MLP neural networks, which have good expressiveness and adaptability. Our dataset contains 25 sets of data from 11 healthy people and 14 Parkinson's disease patients. Experiments show that the model has an average test accuracy of 96.8% and average test loss of 12.8%.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230390

    Exploring the relationship between user’s characteristics and movie recommendations using a KNN-based recommender system

    The central aim of this research is to elucidate the degree to which demographic variables, including but not limited to age and gender, bear on the performance of movie recommendation algorithms within film recommendation systems. Further, we endeavor to uncover any existent correlations between these user characteristics and the resultant outputs of such systems. Leveraging the expansive dataset available via MovieLens, we employ a linear regression model to ascertain the four critical variables (age, gender, occupation, and the average user rating for films previously watched) that have the most profound influence on movie recommendation algorithms. Once these salient factors have been determined, we assign their respective weights and incorporate these into a KNN algorithm. We then subject the resultant model to rigorous testing to verify the accuracy of our results and to ascertain whether the integration of these weighted elements enhances the overall precision of the movie recommendation system. While extant literature predominantly focuses on the amalgamation of KNN with other algorithms, our study charts a novel course by using linear regression. This methodology allows us to intuitively illustrate the relationship between user demographics and the movie recommendation system and enables us to evaluate whether emphasizing certain characteristics can augment the system's effectiveness. Our findings suggest that of all the user characteristics examined, the mean of users’ ratings for movies previously watched exerts the greatest influence on the outputs of the movie recommendation system. Moreover, incorporating weights reflective of the average user ratings across all movie features within the KNN algorithm can significantly bolster the accuracy of the resultant movie recommendations.
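
The idea of feeding feature weights into a KNN algorithm can be sketched as follows. This is a minimal illustration, not the authors' model. The numeric encoding of the four user features, the weight values, and the toy ratings are all hypothetical. Each feature's squared difference is scaled by its weight before the nearest neighbours are chosen.

```python
import math
from typing import List, Sequence, Tuple

User = Tuple[Sequence[float], float]  # (feature vector, rating to predict from)


def weighted_distance(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Euclidean distance with a per-feature weight on each squared difference."""
    return math.sqrt(sum(wi * (x - y) ** 2 for x, y, wi in zip(a, b, w)))


def knn_predict(query: Sequence[float], users: List[User], w: Sequence[float], k: int = 3) -> float:
    """Predict a rating as the mean rating of the k nearest users."""
    nearest = sorted(users, key=lambda u: weighted_distance(query, u[0], w))[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


# Hypothetical features scaled to [0, 1]: (age, gender, occupation, mean past rating).
users = [
    ((0.2, 0.0, 0.1, 0.90), 4.5),
    ((0.2, 0.0, 0.1, 0.20), 2.0),
    ((0.8, 1.0, 0.7, 0.85), 4.0),
]
query = (0.25, 0.0, 0.1, 0.88)

print(knn_predict(query, users, w=(1, 1, 1, 1), k=2))        # uniform weights
print(knn_predict(query, users, w=(0.1, 0.1, 0.1, 1), k=2))  # emphasise mean past rating
```

In this toy data, raising the relative weight of the mean past rating changes which users count as neighbours, and therefore the predicted rating.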

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230410

    Advanced approaches to prevent ARP attacks

    Nowadays, there exist various types of Address Resolution Protocol (ARP)-based attacks, such as ARP flood attacks, ARP host spoofing attacks, gateway spoofing attacks, man-in-the-middle attacks and Internet Protocol (IP) address collision attacks. Focusing on the prevention of ARP spoofing, this paper first introduces S-ARP, a secure version of ARP that uses asymmetric cryptography and focuses on message authentication rather than traffic confidentiality to mitigate such attacks. It then discusses a modular approach that uses databases, rather than relying on the ARP table cache, to detect and mitigate ARP cache poisoning. Finally, the paper discusses an approach that uses a Software Defined Network (SDN) to protect cloud computing from ARP poisoning. We then compare these methods from three aspects in the comparison section and give the advantages of each method. In the end, these approaches are summarized in the concluding section of the paper.
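
The database-backed detection idea mentioned in the abstract above can be sketched in a few lines. This is only a simplified illustration of checking an ARP reply against a trusted store of IP-to-MAC bindings, and does not reproduce the modules of the surveyed approach. The function name, the return labels, and the addresses (taken from documentation ranges) are assumptions.

```python
from typing import Dict


def check_arp_reply(trusted: Dict[str, str], ip: str, mac: str) -> str:
    """Compare a claimed IP-to-MAC binding with a trusted database entry."""
    known = trusted.get(ip)
    if known is None:
        return "unknown"  # no trusted binding yet; needs separate verification
    if known.lower() == mac.lower():
        return "ok"
    return "spoof-suspected"  # claimed MAC conflicts with the trusted binding


# Hypothetical trusted bindings, e.g. populated by an administrator.
trusted_bindings = {
    "192.0.2.1": "00:00:5e:00:53:01",   # gateway
    "192.0.2.10": "00:00:5e:00:53:0a",  # host
}

print(check_arp_reply(trusted_bindings, "192.0.2.1", "00:00:5E:00:53:01"))
print(check_arp_reply(trusted_bindings, "192.0.2.1", "00:00:5e:00:53:ff"))
print(check_arp_reply(trusted_bindings, "192.0.2.99", "00:00:5e:00:53:63"))
```

A check like this detects a conflicting binding but does not authenticate the sender. Authentication is the gap that a signature-based scheme such as S-ARP aims to close.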

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230456

    Prediction of syngas yield from biomass by gasification and related application

    This research focuses on the prediction of synthesis gas generation from biomass through gasification and specifically estimates the syngas yield from rice straw from 2018 to 2020. The data for 2020 is visualized in the form of a colored world map. A comprehensive literature review is conducted to explore previous studies on syngas yield models and gasification methods and the utilization of machine learning models. A machine learning model is built to predict the total syngas yield generated from biomass gasification. The inputs of the model include temperature, carbon content, hydrogen content, and oxygen content, with the latter three representing different types of biomass. The output of the model is the total synthesis gas yield per kilogram of biomass. Subsequently, this model is utilized to predict the amount of syngas obtained from rice straw, which has a carbon, hydrogen, and oxygen content of 43.9%, 5.6%, and 32.1% respectively. From the model, an optimal gasification temperature of 667 degrees Celsius and a maximum syngas yield of 4.71 Nm³/kg for rice straw are obtained. Based on available data on rice straw production worldwide from 2018 to 2020, the amount of rice straw utilized for biomass gasification is estimated. The syngas yield in different regions of the world is calculated based on the maximum syngas yield and the mass of available rice straw. The results of the calculation are visualized on a global map displaying the distribution of syngas yields, which provides valuable insights into the potential for syngas production from rice straw in different regions.
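
The regional calculation described in the abstract above is a product of the maximum yield per kilogram and the mass of available rice straw. The sketch below shows only that arithmetic. The 4.71 Nm³/kg figure comes from the abstract, while the region names and straw masses are placeholders rather than data from the paper.

```python
from typing import Dict

# Maximum syngas yield for rice straw reported in the abstract (Nm^3 per kg).
MAX_YIELD_NM3_PER_KG = 4.71


def syngas_by_region(
    straw_kg: Dict[str, float],
    yield_per_kg: float = MAX_YIELD_NM3_PER_KG,
) -> Dict[str, float]:
    """Total syngas (Nm^3) per region = available straw mass (kg) * yield (Nm^3/kg)."""
    return {region: mass * yield_per_kg for region, mass in straw_kg.items()}


# Placeholder masses in kilograms; real values would come from production data.
example = syngas_by_region({"region_a": 1000.0, "region_b": 250.0})
print(example)
```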

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230485

    Multi-source serialization cross-domain recommendation algorithm based on deep learning

    Cross-domain recommendation is an effective approach to solve the cold start and data sparsity problems in recommendation systems. Sequential recommendation can model user behavior sequences and improve the accuracy of recommendation. Currently, few recommendation algorithms consider both aspects together, and most of them do not utilize multi-source information sufficiently. In view of this, this paper proposes a multi-source serialization cross-domain recommendation model, which fully considers the temporal and contextual relationships in two domains, fuses multi-source information on the basis of achieving cross-domain recommendation tasks, and reinforces the embedding representation by fitting the interest forgetting function. Finally, a Multilayer Perceptron is used as the mapping function to learn the nonlinear mapping relationship between the source domain and the target domain, subsequently enabling recommendations for new users in the target domain. On the Amazon dataset, this model significantly enhances the accuracy of recommendation.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230578

    Efficient vehicular networks offloading using Hybrid Localization Algorithm and Deep Reinforcement Learning

    In the context of growing urbanization and increased vehicular traffic, the demand for efficient computation and location-based services is paramount. This paper proposes a pioneering solution to address the challenges of precise location services in Vehicular Networks within urban settings. The system combines a Hybrid Localization Algorithm (HLA) that integrates multiple methods for improved location accuracy with Deep Reinforcement Learning (DRL) for intelligent and adaptive offloading decisions based on real-time traffic conditions. Extensive simulations demonstrate the effectiveness of our approach in reducing response times, optimizing offloading strategies, and alleviating the burden of urban peak vehicle navigation pressure. This research paves the way for enhanced location-based services and intelligent transportation systems in urban areas.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230676

    Comparative analysis of VGG, ResNet, and GoogLeNet architectures evaluating performance, computational efficiency, and convergence rates

    This paper conducts an in-depth comparative analysis of three foundational machine learning architectures: VGG, ResNet, and GoogLeNet. The focus of the evaluation is their performance metrics on the CIFAR-100 dataset, a widely adopted benchmark in the field. Employing a comprehensive set of evaluation metrics, this investigation assesses not only testing accuracy but also the rate of training convergence and computational efficiency, providing a holistic perspective on the architectures' capabilities. Through rigorous experimentation, we elucidate the inherent advantages and drawbacks associated with each of these architectures. For instance, our findings delve into the nuances of how different architectures fare in terms of computational resources, which is vital for deployment in resource-constrained environments. Additionally, this study extends the analysis to explore the effect of hyperparameter settings, particularly learning rates, and the utility of data augmentation techniques in modulating the overall performance of each architecture. The ultimate objective is to furnish empirical insights that will assist researchers and practitioners in making well-informed choices when selecting a machine learning architecture for their specific application requirements.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230827

    Research on precise management and control technology of construction waste based on optimized PSO-ELM

    Turbulence during muck truck transportation and the aging of sensors cause the data from muck truck sensors to contain many noise points. This has a serious impact on the management of construction waste and can lead to a great deal of time and money being wasted by regulators and transport drivers. To solve this problem, this paper first analyses the fault diagnosis results of vehicle sensors. Based on this analysis, the paper then uses the fuzzy clustering method to build a fault credit system for the muck truck. This fault credit system analyses the past performance of truck sensors and presents the results in the form of reliability. Combining the data with the reliability of the sensors helps reduce the influence of noise on the discrimination of electronic bills. Finally, in the pattern recognition section, this paper improves the PSO-ELM method so that the fault credit system of the muck truck can adjust the weight matrix and the offset matrix in the neural network. The credit system can therefore directly adjust the result of electronic bill discrimination without wasting extra computing power. The effectiveness and superiority of the method are verified on a dataset collected from real trucks.

  • Open Access | Article 2024-03-05 Doi: 10.54254/2755-2721/44/20230835

    Enhancing plagiarism detection methodology with the DQN algorithm on an improved differential evolution foundation

    Plagiarism detection has become increasingly crucial in real-world applications, demanding precise identification of content similarity. This paper introduces a novel plagiarism detection approach. Building upon LSTM as the foundation, it employs an enhanced DE (Differential Evolution) algorithm together with reinforcement learning via the DQN algorithm for sample classification and training. Throughout the training process, gradual parameter adjustments are made with the aim of improving the model's efficiency and accuracy.

Copyright © 2023 EWA Publishing. Unless Otherwise Stated