Applied and Computational Engineering

- The Open Access Proceedings Series for Conferences

Volume Info.

  • Title: Proceedings of the 2023 International Conference on Software Engineering and Machine Learning

  • Conference Date: 2023-04-19

  • Website: http://www.confseml.org

  • ISBN: 978-1-83558-199-5 (Print); 978-1-83558-200-8 (Online)

  • Published Date: 2023-12-11

  • Editors: Anil Fernando, University of Strathclyde; Marwan Omar, Illinois Institute of Technology

Articles

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230065

    Review on audit data visualization method based on R language

    To stay competitive in today's market, companies must continually assess and improve their operations, and financial data analysis has become a crucial tool for this purpose. Data analysis helps executives understand the facts underlying their data and make better-informed decisions about their company and the market. R's multiple data models provide a fully functional language environment with tools for statistical analysis and data visualization. Because it improves the quality of statistical computation and graphical analysis, R is well suited to industrial data analysis. In this paper, we use a literature review methodology to examine domestic and international literature on big data visualization for auditing with R, analyze the areas where R-based visualization is applied in big data auditing, discuss the benefits and challenges of big data auditing, show how R can be adapted to real-world auditing scenarios, and offer reasonable recommendations and an optimistic outlook for the field's future.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230094

    The virtual face modeling method based on user facial recognition and Unreal Engine 5 MetaHuman Creator

    Demand for face models built from real faces has grown enormously in recent years. Modeling methods are still at an early stage and are used mainly in research and business. As demand grows, fields beyond business and research will also require this technique, yet professional modeling is unaffordable for individual users. Therefore, through theoretical analysis and a literature review, this paper illustrates the possibility of combining face recognition technology with MetaHuman Creator for face modeling, with the goal of reducing the cost of face scanning and making it accessible to individual users.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230101

    Application test of PCA based improved K-means clustering algorithm in analyzing NGO assistance needs in less developed countries

    Data volumes in today's society are growing by petabytes (PB) or even exabytes (EB). This is an era of big data explosion, but much of the data is unlabeled or unstructured. Compared with supervised learning, which depends on labels, unsupervised learning on unlabeled data has great potential and value for social development. K-means is one of the most commonly used unsupervised clustering algorithms. However, an examination of its shortcomings shows that the dimensional attributes of the data set must be converted to numeric values by arithmetic averaging before distances can be measured, and that different random initial selections affect the final clustering results to some degree, which can ultimately lead to excessively large decision deviations. This is especially true for noisy, high-dimensional, nonlinear social big data. To address this problem, this paper tests a PCA-based improved K-means clustering algorithm for analyzing NGO assistance needs in less developed countries. First, national data for 167 less developed countries are read and cleaned. Second, the data are visualized and prepared by rescaling. Principal component analysis (PCA) is then used to analyze and handle outliers. Clustering tendency is assessed with the Hopkins statistic, and a K-means model determined from the resulting scores yields the final list of countries in need of assistance. Finally, the tests show that PCA-based data cleaning can effectively reduce data noise and improve clustering.
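
    The abstract uses the Hopkins statistic to judge clustering tendency before fitting K-means. Below is a minimal pure-Python sketch of one common form of the statistic. The function name `hopkins`, the default of probing about 10% of the points, and the convention of drawing uniform probes over the data's bounding box are assumptions, not the paper's exact procedure. Under this convention, values near 0.5 suggest uniformly spread data and values near 1 suggest clustered data. Some libraries report 1 - H instead.

```python
import math
import random


def _nearest_distance(point, data, skip_index=None):
    """Euclidean distance from `point` to its nearest neighbour in `data`."""
    best = math.inf
    for j, other in enumerate(data):
        if j == skip_index:
            continue
        best = min(best, math.dist(point, other))
    return best


def hopkins(data, m=None, seed=0):
    """Hopkins statistic H (assumed convention: uniform probes in the numerator).

    data: list of equal-length numeric tuples, ideally already rescaled.
    m: number of probes; defaults to about 10% of the points (at least 1).
    """
    rng = random.Random(seed)
    n, dims = len(data), len(data[0])
    m = m or max(1, n // 10)
    lows = [min(p[d] for p in data) for d in range(dims)]
    highs = [max(p[d] for p in data) for d in range(dims)]

    # u: distances from uniform random probes (inside the bounding box) to the data.
    u = 0.0
    for _ in range(m):
        probe = tuple(rng.uniform(lows[d], highs[d]) for d in range(dims))
        u += _nearest_distance(probe, data)

    # w: distances from sampled real points to their nearest *other* real point.
    w = 0.0
    for i in rng.sample(range(n), m):
        w += _nearest_distance(data[i], data, skip_index=i)

    return u / (u + w)
```

    In a pipeline like the one described, the check would run on the rescaled features before K is chosen. A value well above 0.5 supports running K-means at all.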

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230111

    Performances evaluation of machine learning models on income forecasting

    Job seekers, especially those looking for their first job, often lack sufficient experience and guidance, which makes it difficult for them to obtain satisfactory salaries, so salary prediction is important. Individuals can use it to estimate their income range. Companies can use such estimates to guide salary adjustments, retain talented staff, increase revenue, and reduce operating costs. Governments can use them for a macro-level assessment of income across a large area, such as predicting a city's GDP per capita, which helps them make economic adjustments and track macro development trends. This article trains three models (decision trees, random forests, and neural networks) on the Adult Income Dataset from Kaggle, which covers 32,561 adults with 15 attributes, including age, education level, occupation, marital status, and weekly working hours. The data were split into training and test sets in a 7:3 ratio, and each model's predictions were evaluated by accuracy, recall, and F1 score. The random forest model performed best. Residents' income is closely tied to individual development and happiness, as well as to social stability.
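
    The evaluation described above, a 7:3 train/test split scored by accuracy, recall, and F1, can be sketched in plain Python as follows. The helper names are hypothetical and the models themselves are omitted. The sketch also assumes a binary target, for example the dataset's >50K versus <=50K income label, with 1 as the positive class.

```python
import random


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Shuffle rows and split them into train/test lists (7:3 by default)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}
```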

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230119

    Predicting consumer acceptance of automobiles based on deep learning and traditional machine learning algorithms

    Machine learning has made significant progress in recent years and can learn from and make predictions on large, complex data sets. Researchers divide machine learning algorithms into two categories, deep learning and traditional machine learning, and a given prediction problem can be approached either way. This paper uses the "Car Data" dataset to compare the two. To find an algorithm better suited to analyzing and predicting consumers' acceptance of different cars, it compares the prediction accuracy of three methods: neural networks, random forest, and support vector machine (SVM). We construct neural networks with 3 and 4 hidden layers. Testing shows that random forest gives the worst predictions, while the 3-hidden-layer neural network achieves accuracy similar to SVM. Adding a fourth hidden layer raised the accuracy above that of SVM. Thus, an extra hidden layer can improve prediction accuracy, and both SVM and neural networks can be used to analyze Car Data, although the methods do not all achieve similar accuracy.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230133

    Features of realized volatility analysis and return predicting based on LGBM and RNN model

    This paper proposes a method for predicting the realized volatility of financial assets using LGBM (LightGBM) and RNN models. The study uses convolutional neural networks to construct sub-indicators that capture the liquidity and volatility of financial assets, and these sub-indicators are combined into comprehensive measures of liquidity and volatility. Lognormal random walk theory is applied to each asset dimension to price volatility for multiple assets, and the path-independent value of European options is obtained via multiple integration. Because direct evaluation of this integral becomes inefficient in high-dimensional, orthogonal settings, it is solved with the Monte Carlo method. The study also leverages LGBM and other models to exploit the data efficiently, generate high returns, and achieve the highest Sharpe ratio. The dataset, provided by a recognized international market maker, contains stock market data relevant to trade execution, in particular order book snapshots and executed trades. The results show that the proposed method can accurately predict the realized volatility of financial assets.
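
    Realized volatility is conventionally computed as the square root of the sum of squared log returns over a window, RV = sqrt(Σ r_t²) with r_t = log(P_t / P_{t-1}). The Sharpe ratio is the mean excess return divided by its standard deviation. The sketch below computes both in plain Python. It assumes a list of prices, for example mid or weighted-average prices derived from order book snapshots, and per-period returns with no annualization. It is not the paper's LGBM/RNN pipeline.

```python
import math
import statistics


def log_returns(prices):
    """Log returns between consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def realized_volatility(prices):
    """Square root of the sum of squared log returns over the window."""
    return math.sqrt(sum(r * r for r in log_returns(prices)))


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio (not annualized); needs at least two returns."""
    excess = [r - risk_free for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)
```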

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230146

    Research on improvements of fraud detection system: basing on improved machine learning algorithms

    Commercial fraud is now common in many industries. However, obstacles such as concept drift, imbalanced datasets, and the uneven distribution of fraudulent entries prevent fraud detection systems (FDS) from identifying such behavior. Of these problems, most research focuses on skewed datasets. This paper first presents common FDS application scenarios: credit card fraud, insurance fraud, and supply chain fraud. It then introduces five representative methods for addressing these problems: K Nearest Neighbors-Synthetic Minority Oversampling Technique-Long Short-term Memory Networks (kNN-SMOTE-LSTM), Generative Adversarial Nets-AdaBoost-Decision tree (GAN-AdaBoost-DT), Wasserstein GAN-Kernel Density Estimation-Gradient Boosting DT (WGAN-KDE-GBDT), Time-LSTM (TLSTM), and Adaptive Synthetic Sampling-Sequential Forward Selection-Random Forest (ADASYN-SFS-RF). kNN-SMOTE-LSTM uses kNN as a classifier to retain only true samples. GAN-AdaBoost-DT generates new samples without referring to real transactions. WGAN-KDE-GBDT instead uses the Wasserstein distance as its distance measure, which improves training speed and guarantees successful generation. TLSTM considers the weights of different time intervals and measures the similarity between simulated and genuine behavior. ADASYN-SFS-RF employs the SFS algorithm, based on RF, to retain only the optimal feature subsets. Finally, the result metrics show that these improved algorithms do improve the overall performance of FDS, despite limitations on some indicators.
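
    The oversampling step in kNN-SMOTE-LSTM builds on SMOTE. SMOTE creates a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbours. A minimal pure-Python sketch of plain SMOTE is given below. The function name and defaults are hypothetical, and the kNN filtering and LSTM stages of the combined method are not shown.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic
```

    Because each new point lies on the segment between two real minority samples, SMOTE fills in the minority region rather than duplicating existing rows.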

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230148

    A review on machine learning methods for intrusion detection system

    With growing Internet access and the development of information technology, concerns about computer security have become widespread. Computer crimes use various methods to undermine information privacy and system integrity, causing losses ranging from millions to trillions in recent years. Security algorithms and models urgently need to be improved so that they form a comprehensive structure for preventing attacks. Within this structure, an intrusion detection system (IDS) plays a vital role in monitoring and detecting malicious behavior. However, because the variety of threats is growing rapidly, traditional algorithms are no longer sufficient, and new methods should be brought into IDS to improve its functionality. Deep learning (DL) and machine learning (ML) are recently developed techniques that can process data at a very large scale. They can also make decisions and predictions without being explicitly programmed, which makes them well suited to improving IDS. This article reviews ML methods used in IDS construction.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230164

    An AI-based ambulatory ankle brace with wearable sensor used for preventing ankle sprains

    Ankle sprains are among the most common injuries in basketball. A sprain can cost considerable time and money, and patients with a history of ankle sprains are susceptible to further ankle injuries. This paper proposes an AI-based ambulatory ankle brace with wearable sensors for preventing ankle sprains. The equipment consists of a sensor, a microcomputer, a Bluetooth module, and a muscle stimulator. Ten volunteers performed twelve basketball moves while wearing the brace, and the moves were labeled as high-risk or low-risk. The sensor on the brace measured the three-dimensional angular velocity and angular displacement of the subject's ankle in real time, and the data were fed to different machine learning algorithms to build models that predict upcoming ankle motions. The best-performing model, built with the Random Forest algorithm, was loaded onto the microcomputer. When the model predicts a high-risk move, the microcomputer sends a Bluetooth signal to the muscle stimulator. One end of the stimulator is a pair of electrodes attached to the peroneal muscles to restrict ankle motion. When the stimulator receives the “high-risk” signal, it is activated and the spraining motion is alleviated. In this way, the brace does not restrict normal ankle movement while still providing adequate protection against potential sprains.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230173

    Scheme for improving RAFT-based blockchain performance

    Recent studies have paid little attention to improving the performance of RAFT-based blockchain systems and to how that performance relates to environmental factors, although both are essential in production environments. Doing so requires analyzing performance, especially the system's throughput, latency, and robustness. The two most widely used RAFT-based platforms, etcd and Hyperledger Fabric, were evaluated to identify the factors that influence system performance and ways to improve it. The evaluation focused on throughput, latency, and robustness. It covered the read and write processes, varied the number of keys, connections, and clients in etcd, and compared the two platforms. Only the number of clients significantly affects etcd's performance, and etcd performs better than Hyperledger Fabric. In addition, reads outperform writes on both platforms. Therefore, the key to improving system performance is to control the number of clients and to focus on the write process.
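
    Throughput and latency are the two metrics the evaluation relies on, and both can be summarized from raw benchmark timings as sketched below. The input format (per-request latencies in seconds plus the run's wall-clock duration), the nearest-rank percentile, and the function name are assumptions for illustration. This is not the tooling used with etcd or Hyperledger Fabric in the paper.

```python
import math
import statistics


def summarize(latencies_s, wall_time_s):
    """Throughput and latency summary for one benchmark run (hypothetical format)."""
    ordered = sorted(latencies_s)

    def percentile(p):
        # Nearest-rank percentile: smallest value covering p% of requests.
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "throughput_rps": len(ordered) / wall_time_s,
        "mean_latency_s": statistics.mean(ordered),
        "p99_latency_s": percentile(99),
    }
```

    Reporting a tail percentile alongside the mean makes it easier to see when, for example, adding clients inflates a few slow requests without moving the average much.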

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230177

    An iterative loss correction method for deep learning with noisy labels

    Noisy labelling is a prevalent issue in real-world data and often causes deep neural networks (DNNs) to overfit. Prior research in this area relies mainly on accurately estimating noise transition matrices, which depends on identifying anchor points in the clean data domain. However, current methods typically estimate the anchor points from the noisy labels, which can lead to poor estimates. In contrast, our method aims to improve precision with an estimator that jointly learns the transition matrices and anchor points through iterative learning. Validated on the IMDB and MNIST datasets, our approach proves more precise and effective than previous methods.
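
    The transition-matrix approach this work builds on can be sketched as forward loss correction. The model's clean-class prediction p is mapped through T, where T[i][j] is the probability that clean class i is observed as noisy class j. The loss is then the negative log-likelihood of the observed label under (Tᵀp). The sketch also includes the simple max-based anchor-point estimate of T used by earlier methods. Function names are hypothetical, and the paper's iterative joint estimator of T and the anchor points is not reproduced here.

```python
import math


def forward_corrected_nll(clean_probs, noisy_label, T):
    """NLL of the observed noisy label under the corrected distribution T^T p.

    clean_probs: model's predicted distribution over clean classes, p(y|x).
    T[i][j]: probability that clean class i is observed as noisy class j.
    """
    noisy_prob = sum(p_i * T[i][noisy_label] for i, p_i in enumerate(clean_probs))
    return -math.log(noisy_prob)


def estimate_T_from_anchors(noisy_posteriors):
    """Max-based anchor-point estimate of T from p(noisy label | x) rows.

    For each class i, the sample with the highest p(noisy = i | x) is treated
    as an anchor point for clean class i, and its posterior becomes row T[i].
    """
    k = len(noisy_posteriors[0])
    return [list(max(noisy_posteriors, key=lambda row: row[i])) for i in range(k)]
```

    Because the anchors are chosen from noisy-label posteriors, a poor posterior estimate directly corrupts T. That is the weakness the abstract's joint, iterative estimator targets.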

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230179

    Unity3D-based conference room scene preparation and construction

    As society develops, the trend toward digitally enabled online teaching is becoming increasingly clear. To ensure the quality of online teaching and make education more engaging, teaching between teachers and students should be combined with virtual scenes. This paper presents a simple example of a virtual classroom for teachers and students. It introduces the basic operation of the Unity3D engine, the design and construction of a conference room scene, the implementation of drawing and interaction functions, porting to mobile devices, and script editors, in order to explore a new form of teaching in 3D virtual space. Test results show that the virtual scene can improve the interactive experience and give users a sense of immersion, which is of some practical significance.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230189

    Combine model fine-tuning freezing layers and adaptive filter modulation to implement transfer learning for GANs

    Generative adversarial networks (GANs) require more resources to train than other deep learning models, and their loss functions converge more slowly. For this reason, domestic and international researchers have proposed transfer-learning-based GAN algorithms that work with fewer samples and thereby improve GAN training. In this paper, we propose a new way to perform transfer learning for GANs that combines two approaches: fine-tuning with frozen layers and adaptive filter modulation. On this basis, we compare and analyze a variety of transfer learning algorithms to verify the feasibility and effectiveness of the combined approach.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230190

    Study on the human-brain confusion levels and corresponding EEG levels

    Confusion is a state in which an individual is unclear about the situation at hand, does not understand the logic of a matter, or cannot reach a reasoned conclusion. Confused people often make poor decisions or fail to decide at all. In this paper, we tested and validated two models on EEG signals collected from ten students while they watched online courses of varying difficulty. Both the LSTM model and the DNN model reached a validation accuracy close to 70%, but they showed different characteristics. Finally, we compare and summarize the results of the two models and discuss generalizing them to other domains in future work.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230200

    Rumor detection methodology based on sentiment analysis and the transformer model with decision-level fusion

    Transformer models have attracted significant attention in natural language processing (NLP) in recent years owing to their exceptional performance on a variety of language tasks. This paper explores the application of transformer models to rumor detection, reviews related research on rumor detection and on transformer models, and examines techniques used to boost model performance. Its ultimate purpose is to provide insight into the potential of transformer models for detecting rumors on social media. Unlike other rumor detection models, the author adds a sentiment analysis model as a supplement to rumor detection. In addition, to address the lack of information in early-stage comments on rumors, this paper proposes a decision-level fusion method before the output layer. This method makes effective use of information from different sources and minimizes the negative impact of insufficient data sources, and it greatly improves the model's early-stage rumor detection accuracy. The main contributions of the article are as follows. First, this paper proposes an aspect-level text sentiment analysis method that combines syntactic features, gated recurrent units, and a self-attention mechanism. Experiments show that, compared with the original model without sentiment analysis, the proposed network achieves better accuracy and Macro F1 scores. Second, a cross-text rumor detection method based on decision-level fusion is proposed. Its advantage is that when the cross-text data source is incomplete and one text is missing, another text can be used to continue the analysis. Experiments show that this method improves the accuracy of emotion recognition by integrating data from different modalities. Third, the performance of the Transformer-sentiment model is compared with that of related models such as Text-CNN and Bi-LSTM. The results show that the integrated Transformer-sentiment model not only detects rumors with higher accuracy but also overcomes the shortage of datasets, which makes it more robust and able to detect rumors early in the spreading process.
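
    Decision-level fusion combines per-source predictions just before the output and falls back to the remaining sources when one text is missing. It can be sketched as a weighted average of class probabilities. The source names, equal default weights, and renormalization rule below are assumptions for illustration. The paper's actual fusion layer may differ.

```python
def fuse_decisions(source_probs, weights=None):
    """Decision-level (late) fusion of per-source class probabilities.

    source_probs: dict mapping a source name (hypothetical, e.g. "post" or
        "comments") to a list of class probabilities, or None if missing.
    weights: optional per-source weights; every source defaults to 1.0.
    Missing sources are skipped and the remaining weights renormalized.
    """
    available = {name: p for name, p in source_probs.items() if p is not None}
    if not available:
        raise ValueError("at least one source is required")
    weights = weights or {}
    total = sum(weights.get(name, 1.0) for name in available)
    n_classes = len(next(iter(available.values())))
    fused = [0.0] * n_classes
    for name, probs in available.items():
        w = weights.get(name, 1.0) / total
        for c, p in enumerate(probs):
            fused[c] += w * p
    return fused
```

    For example, `fuse_decisions({"post": [0.2, 0.8], "comments": None})` falls back to the post's own prediction when no comments are available yet. This is the early-stage situation the method is meant to handle.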

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230202

    A study on how to improve the accuracy of auto error correction from voice to text

    Voice recognition applications are widely used in daily life. When it is inconvenient to play audio in public, or too noisy to listen to an audio message, people often use a voice recognition app on their mobile devices to convert the speech into text so that they can understand the message clearly. However, errors still occur in this common application. For instance, sounds are sometimes hard to transcribe into the correct words. This usually happens when the speaker talks too fast or pronounces words unclearly. Other factors, such as environmental noise, transmission channel quality, and the radio equipment, can also cause this problem. This paper analyzes the causes of inaccurate transcription and some potential improvements to the function. It concludes that a parity check matrix is a good way to address the problem, since it can check whether the digits representing each word are still correct. The digits can then be corrected so that they decode to syntactically correct words. However, words that are syntactically correct after applying the parity check matrix might still not be semantically correct. Therefore, Levenshtein distance and Latent Semantic Analysis can be used to analyze the hidden meanings of words and sentences and find the most suitable replacement words.
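
    The two correction stages described above can be illustrated in plain Python. The paper does not specify its code, so a Hamming(7,4) parity-check matrix is assumed for the bit-level step. With the matrix below, the syndrome H·r (mod 2), read as a binary number, gives the position of a single flipped bit. Levenshtein edit distance is then used to pick the closest word from a vocabulary. All names are hypothetical, and the Latent Semantic Analysis step is not shown.

```python
# Assumed Hamming(7,4) parity-check matrix: column j is the binary form of j+1.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def correct_single_bit(received):
    """Fix at most one flipped bit in a 7-bit word using the syndrome."""
    syndrome = [sum(h * r for h, r in zip(row, received)) % 2 for row in H]
    position = syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2]  # 0 means no error
    fixed = list(received)
    if position:
        fixed[position - 1] ^= 1
    return fixed


def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def closest_word(word, vocabulary):
    """Vocabulary entry with the smallest edit distance to `word`."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))
```

    Edit distance only measures spelling similarity. Choosing between equally close candidates still needs context, which is where a semantic method such as Latent Semantic Analysis comes in.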

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230297

    Language sense classification model based on neural network

    With the rapid development of artificial intelligence in recent years, English, the common international language, plays an increasingly important role in many fields. As a supplementary teaching tool, artificial intelligence can improve students' English abilities. This study therefore examines the English language sense of different types of sentences by analyzing them with Long Short-Term Memory (LSTM) and BERT models, and builds a model to distinguish the sentence types. First, sentences are crawled from British Broadcasting Corporation (BBC) documentaries, podcasts, and YouTube, and a data filter removes short or low-quality sentences. The data set is then analyzed with the BERT module and the LSTM model. The paper compares the differences between sentence types in a large-scale corpus to build a language model without long-term dependence. The resulting model is expected to analyze new input sentences and identify their types. This study can help English learners improve their sense of the English language and choose the types of sentences suited to different situations.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230300

    Lightweight food classification model based on MobileViT and ULSAM

    This paper presents a novel approach to improving image classification performance by incorporating the Ultra-Lightweight Subspace Attention Module (ULSAM) into the lightweight MobileViT architecture. The MobileViT+ULSAM model achieved high accuracy on the ISIA Food-500 dataset while keeping computational cost and parameter count similar to those of the original MobileViT model. It also outperforms other lightweight models such as MobileNetV2 and LCNet. Ablation experiments confirmed that ULSAM improves classification accuracy and can consistently optimize multiple model structures. In addition, this paper explores an even lighter model that substantially reduces FLOPs and parameter count while maintaining the same performance. Overall, this research provides a practical, resource-efficient approach to improving image classification performance in a variety of deep learning applications.

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230088

    Illegal, unreported and unregulated fishing detection with machine learning

    Illegal, unreported and unregulated (IUU) fishing is a worldwide problem that causes local and global economic losses, depletes natural resources, alters diverse ecosystems, and takes an undue toll on fisheries. This study describes a machine learning-based strategy for responding to it. Identifying the relevant data stores and processes has led to the initial development of a viable IUU fishing detection system. The system flags vessels for IUU fishing by combining three signals: (1) the likelihood that a vessel is fishing, estimated from geospatially referenced signal data; (2) a classification of whether the vessel is within the area of interest; and (3) a classification of whether the vessel is permitted to operate in that area of interest. In this paper, parts of the system were prototyped. These include logistic regression models that classify fishing versus non-fishing activity with high predictive power for longliners and trawlers, and a process for identifying whether a vessel is within an area of interest. In addition, a number of fishing vessel registries were identified that regulate the rights of specific vessels to fish in regulated areas of interest. The fishing models predict the probability of fishing with acceptable accuracy for vessels with longline or trawl gear, and they can also make predictions for vessels with seine gear, but additional research and analysis are needed. The "in ROI" classification model should be extended to score the likelihood of a vessel being in the ROI rather than outputting a true/false judgment. Using machine learning and data analysis, the project aims to improve IUU fishing prediction further, enabling law enforcement and ultimately significantly reducing or preventing IUU fishing.
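
    The "in ROI" step and the combination of the three signals can be sketched as follows. The even-odd ray-casting test treats longitude and latitude as planar coordinates, which is adequate for small areas away from the antimeridian. The 0.5 fishing-probability threshold and the function names are hypothetical. The fishing probability itself would come from the logistic regression models described above.

```python
def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Edge straddles the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def flag_possible_iuu(fishing_prob, lon, lat, roi, authorized, threshold=0.5):
    """Flag a vessel that is likely fishing inside the ROI without authorization.

    threshold is a hypothetical cut-off on the fishing probability.
    """
    return fishing_prob >= threshold and point_in_polygon(lon, lat, roi) and not authorized
```

    Replacing the boolean `point_in_polygon` result with a probability would match the abstract's recommendation that the "in ROI" model output a likelihood instead of a true/false judgment.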

  • Open Access | Article 2023-12-11 Doi: 10.54254/2755-2721/27/20230136

    A robust VGG network combined with Denoising Autoencoder module for human emotion recognition

    Human emotions can be divided into multiple categories, which makes automatic emotion recognition possible. One key approach to automated emotion recognition is to apply a convolutional neural network to classify emotions in images of human facial expressions, but performance degrades when the input is distorted. This paper introduces a hybrid neural network architecture that makes automated emotion recognition robust to distorted input images, so that it performs similarly to prediction on clean images. The hybrid network combines a Denoising Autoencoder (DAE) with a Visual Geometry Group (VGG) network. Multiple experiments with standalone VGG and hybrid networks were conducted using the control variable method. The FER-2013 data set from Kaggle was used for the experiments. Distorted input images were generated by adding random noise to clean images. As a result, the research produced a valid hybrid network architecture. The hybrid network raised emotion classification accuracy on the distorted data set from 16.70% to 57.73%, which is similar to the accuracy achieved on the clean data set.
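
    The distorted test images were produced by adding random noise to clean images. A minimal sketch of that step follows. It assumes zero-mean Gaussian noise on 8-bit grayscale pixels (FER-2013 images are 48×48 grayscale). The noise type, the default sigma, and the function name are assumptions, since the abstract only says "random noise".

```python
import random


def add_gaussian_noise(image, sigma=25.0, seed=0):
    """Return a noisy copy of a grayscale image given as rows of 0-255 ints.

    Zero-mean Gaussian noise with standard deviation `sigma` is an assumed
    noise model; results are rounded and clipped back to [0, 255].
    """
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(px + rng.gauss(0.0, sigma)))) for px in row]
        for row in image
    ]
```

    Pairs of noisy and clean images produced this way are also the usual training data for a denoising autoencoder, which learns to map the first back to the second.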

Copyright © 2023 EWA Publishing. Unless Otherwise Stated