Proceedings of the 5th International Conference on Computing and Data Science
Marwan Omar, Illinois Institute of Technology
Roman Bauer, University of Surrey
Alan Wang, University of Auckland
The lack of access to extensive and varied datasets remains one of the major issues facing the field of machine learning, despite recent advancements. This is especially true in the healthcare sector, where it can be challenging to gather and use patient data for research because it is frequently compartmentalized across many healthcare providers. By enabling secure and privacy-preserving access to distributed data, blockchain technology and federated learning have the potential to overcome these difficulties. In this article, we examine how federated learning and blockchain are used in the healthcare industry and discuss their benefits and drawbacks. We also examine the Hedera platform, which makes use of blockchain technology and a new algorithm called Gossip Degree to provide a novel method of federated learning. Finally, we consider the potential effects of federated learning on the healthcare sector and what it means for future research.
As a distributed ledger technology, blockchain has found widespread use in a variety of industries, including finance, the Internet of Things (IoT), healthcare, and manufacturing. This technology addresses the trust issue by converting a low-trust centralized ledger into a highly trusted distributed ledger maintained by various entities. Consensus algorithms are one of the fundamental building blocks of the blockchain, controlling how nodes cooperate and synchronize data to perform secure and reliable activities in a decentralized setting. This paper examines the extant mainstream consensus algorithms, introduces six representative consensus algorithms, analyses their benefits and drawbacks, and discusses the application scenarios and suitability of each consensus algorithm in various blockchain platforms.
The task of handwritten digit recognition is to recognize handwritten digits in images. Applying machine learning models to perform this task automatically can significantly improve efficiency. This paper applies two such models, a multi-layer perceptron and a residual neural network, to the task. It first introduces the basic concept of the simple multi-layer perceptron model and then presents the structure of the residual neural network model. The two models are then trained on the MNIST corpus, one of the classical datasets for handwritten digit recognition. The data pre-processing, such as the splitting into training and test sets, is described, and the training and testing processes of the two models are presented. In experiments on the MNIST test set, the residual neural network achieves the better performance, with an accuracy of 99.240%, while the multi-layer perceptron reaches an accuracy of 97.260%.
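To make the structural difference between the two model families concrete, the following is a minimal sketch rather than the architecture used in the paper. It assumes fully connected layers for both models (residual networks for images normally use convolutions), and the helper names `dense`, `mlp_layer`, and `residual_block` are illustrative. The only point it shows is that a residual block adds its input back to the layer output.

```python
def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def mlp_layer(values, weights, bias):
    """Plain perceptron layer: y = relu(W x + b)."""
    return relu(dense(values, weights, bias))


def residual_block(values, weights, bias):
    """Residual layer: y = relu(x + F(x)), where F is the affine map.

    The skip connection requires F(x) to have the same size as x.
    """
    fx = dense(values, weights, bias)
    return relu([v + f for v, f in zip(values, fx)])


if __name__ == "__main__":
    x = [1.0, 2.0]
    zero_w = [[0.0, 0.0], [0.0, 0.0]]
    zero_b = [0.0, 0.0]
    # With all-zero weights the plain layer loses its input,
    # while the residual block passes it through unchanged.
    print(mlp_layer(x, zero_w, zero_b))
    print(residual_block(x, zero_w, zero_b))
```

With all-zero weights the plain layer outputs zeros while the residual block reproduces its input, which is the property usually credited with making deep residual networks easier to train.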
Advanced technologies such as big data, artificial intelligence and machine learning are currently undergoing rapid development. However, the emergence of cybersecurity and privacy leakage problems has had serious implications. This paper comprehensively discusses the current state of privacy and security issues in the field of machine learning. During training, models often unintentionally extract and record private information from raw data, and third-party attackers may additionally attempt to maliciously extract private information from that data. The paper first gives a brief introduction to the validation criterion used in privacy-preserving strategies, on the basis of which algorithms can account for and validate privacy leakage during machine learning. It then describes different privacy-preserving strategies based mainly on federated learning, focusing on Differentially Private Federated Averaging and the Privacy-Preserving Asynchronous Federated Learning Mechanism, and analyses and discusses their advantages and disadvantages. By modifying the original machine learning methods, for example by adjusting parameter values and limiting the range of features, these strategies reduce the possibility of privacy leakage during machine learning. However, the strategies are mainly limited to changing the parameters of the original training method, which constrains that method, for example through reduced efficiency or difficulty in training under certain conditions.
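The idea behind Differentially Private Federated Averaging can be illustrated with a small sketch. It is a simplified illustration under stated assumptions, not the exact mechanism analysed in the paper: each client update is clipped to an L2 norm bound, the clipped updates are averaged, and Gaussian noise scaled to the bound is added. The function names and the parameters `clip_norm` and `noise_multiplier` are illustrative, and no privacy accounting is performed.

```python
import math
import random


def clip_update(update, clip_norm):
    """Scale an update down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def dp_fedavg_round(global_weights, client_updates,
                    clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One aggregation round: clip, average, add Gaussian noise, apply."""
    if rng is None:
        rng = random.Random(0)  # fixed seed only to keep the sketch reproducible
    n = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    averaged = [sum(column) / n for column in zip(*clipped)]
    # Noise standard deviation is proportional to the per-client bound
    # divided by the number of clients contributing to the average.
    sigma = noise_multiplier * clip_norm / n
    noisy = [a + rng.gauss(0.0, sigma) for a in averaged]
    return [w + d for w, d in zip(global_weights, noisy)]


if __name__ == "__main__":
    updates = [[3.0, 4.0], [0.0, 0.0]]
    print(dp_fedavg_round([0.0, 0.0], updates, noise_multiplier=0.5))
```

Clipping bounds how much any single client can influence the average, which is what allows the added noise to mask an individual contribution. It is also a source of the efficiency cost noted above, since both clipping and noise perturb the true update.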
Stock analysis is a challenging task that involves modelling the complex and nonlinear dynamics of stock prices and volumes. Long Short-Term Memory (LSTM) is a type of recurrent neural network that can capture long-term dependencies and temporal patterns in time series data. In this paper, a stock analysis method based on LSTM is proposed that can predict future stock prices and transactions using historical data. Yfinance is used to obtain stock data for four technology companies (Apple, Google, Microsoft, and Amazon), and LSTM is applied to extract features and forecast trends. Techniques such as moving averages, correlation analysis, and risk assessment are also used to evaluate the performance and risk of different stocks. When the method is compared with other neural network models such as RNN and GRU, the results show that LSTM achieves better accuracy and stability in stock prediction. This paper demonstrates the effectiveness and applicability of the LSTM method through experiments on real-world data sets.
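Two of the auxiliary techniques mentioned, the moving average and a return-based risk measure, can be sketched in a few lines. The sketch assumes a plain list of closing prices and uses the standard deviation of daily returns as the risk measure; the paper does not specify its exact definitions, so these choices and the function names are illustrative.

```python
import statistics


def moving_average(prices, window):
    """Simple moving average over a sliding window of closing prices."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]


def daily_returns(prices):
    """Fractional change from each day to the next."""
    return [(today - prev) / prev for prev, today in zip(prices, prices[1:])]


def risk(prices):
    """Risk proxy: sample standard deviation of daily returns."""
    return statistics.stdev(daily_returns(prices))


if __name__ == "__main__":
    closes = [100.0, 102.0, 101.0, 105.0, 107.0]  # made-up prices for illustration
    print(moving_average(closes, window=3))
    print(daily_returns(closes))
    print(risk(closes))
```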
Heart failure is a complex medical condition that arises from the heart's inability to adequately circulate blood throughout the body, and it is challenging to predict. This research investigates three distinct models, namely logistic regression, random forest and decision tree generation algorithms. Logistic regression essentially applies a logistic function to the output of a linear regression model. Whereas linear regression is typically trained with a mean-square-error loss, logistic regression uses cross-entropy loss. The derivative of the cross-entropy loss is proportional to the difference between the prediction and the label, so the parameters update rapidly when the error is large and slowly when it is small, which is a desirable trait for this task. Decision tree generation algorithms use tree structures in which internal nodes represent tests on attributes, branches represent the outcomes of those tests, and leaf nodes represent classification results. Random forest is an ensemble learning algorithm that employs decision trees as the base learner. In classification, the predictions of multiple decision trees are combined by voting, while in regression the results of multiple decision trees are averaged. Experimental results indicate that random forest outperforms the other two models, albeit by a marginal difference. Further studies should incorporate additional models to identify a more suitable model for predicting heart failure.
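The claim about the derivative of the cross-entropy loss can be checked with a short derivation. The notation here is introduced for illustration and is not taken from the paper: a single sample with features x and label y in {0, 1}, weights w, logit z = w^T x, and prediction \hat{y} = \sigma(z).

```latex
\begin{align}
\hat{y} &= \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^{\top} x,
  \qquad \sigma'(z) = \hat{y}\,(1 - \hat{y}) \\
L_{\mathrm{CE}} &= -\bigl[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\bigr],
  & \frac{\partial L_{\mathrm{CE}}}{\partial w} &= (\hat{y} - y)\, x \\
L_{\mathrm{MSE}} &= \tfrac{1}{2}\,(\hat{y} - y)^{2},
  & \frac{\partial L_{\mathrm{MSE}}}{\partial w} &= (\hat{y} - y)\,\hat{y}\,(1 - \hat{y})\, x
\end{align}
```

The cross-entropy gradient is the prediction error times the input, so a large error gives a large update. The mean-square-error gradient carries the extra factor \hat{y}(1 - \hat{y}), which approaches zero when the sigmoid saturates, so a confidently wrong prediction can still produce a very small update.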
Stroke is a leading cause of death and disability worldwide, and effective stroke management requires accurate and timely diagnosis. Using a Kaggle dataset, the study first carried out data preprocessing, which included addressing missing values, encoding categorical variables, and normalising numerical features. Next, the paper implemented three commonly used machine learning models: logistic regression, decision tree, and random forest. To assess the performance of the models, the paper used accuracy as the evaluation metric, which measures the proportion of correct predictions out of all predictions. The study also identified the most important features affecting stroke risk using the feature importance analysis provided by the machine learning models. According to the experimental findings, all three models achieved high accuracy, although random forest outperformed the other two. The accuracies of random forest, decision tree, and logistic regression were 0.963, 0.925, and 0.961, respectively. Feature importance analysis revealed that age, average glucose level, and work type were the most important predictors of stroke risk. The findings of this study suggest that machine learning algorithms, particularly the Logistic Regression model, can effectively predict the likelihood of stroke using the Stroke Prediction Dataset. These findings are in line with other research that also showed how machine learning has the potential to enhance stroke diagnosis. The identification of important features affecting stroke risk can provide valuable insights for clinicians and researchers in developing targeted interventions for stroke prevention and management.
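The preprocessing steps and the accuracy metric described above can be sketched as follows. The paper does not say which imputation, encoding, or scaling method it used, so mean imputation, one-hot encoding, and min-max scaling are assumptions chosen for illustration, as are the function names.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the known entries."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator rows."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    glucose = impute_mean([85.0, None, 105.0])  # made-up values
    print(min_max_scale(glucose))
    print(one_hot(["Private", "Govt", "Private"]))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```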
Artificial intelligence is a branch of computer science concerned with intelligent systems that can simulate human thinking, recognize complex situations, acquire learning abilities and knowledge, and solve problems. With the continuous development of information technology, artificial intelligence techniques are increasingly being improved and applied to large-scale genetics research and to image detection and classification in medicine. Predictive models for medical data can be built using a wide range of machine learning algorithms, such as decision trees, multilayer perceptrons, naive Bayes, random forests, and support vector machines, making it possible to process massive, high-dimensional data and conduct medical research. This paper addresses the specific applications of artificial intelligence in medical practice.
The article focuses on the application and development of the Stable Diffusion module in the field of artificial intelligence image generation. The article presents a comprehensive description, analysis and discussion of the module's overview, operating environment, usage methods and its instructions, and points out the corresponding advantages and disadvantages.
In the 21st century, there has been a growing importance placed on the "body" of artificial intelligence, particularly as it relates to language processing. Researchers have developed various machine learning models with a focus on language understanding, including Large Language Model (LLM), Bidirectional Encoder Representation from Transformers (BERT), and Natural Language Processing (NLP). These models have led to the development of numerous applications, such as ChatGPT-3.5, which has recently gained widespread attention. In addition to ChatGPT, other applications have also benefited from these language processing models, including Question Answering Systems (QAS). This paper will examine three QAS that have been enhanced by the context of ChatGPT, discuss the relevant applications, and analyze these different applications in order to predict future trends in this field. One notable QAS is OpenAI's GPT-3-powered AI that can answer questions about any topic. This application leverages the capabilities of GPT-3 to provide accurate and informative responses to a wide range of questions. Another QAS is IBM's Watson, which utilizes natural language processing and machine learning algorithms to understand and respond to user queries. Watson has been used in various industries, including healthcare, finance, and retail. A third QAS is Google's BERT-based system, which uses pre-trained language models to improve its responses to user queries. This system has been integrated into Google Search and other products, allowing users to receive more precise and relevant search results. Overall, the development of these QAS and other language processing applications marks an exciting period of progress in the field of artificial intelligence. As researchers continue to refine these models and explore new applications, we can expect to see even more advanced and sophisticated language processing systems emerge in the future.
Due to the ubiquity of the internet, cyber-attacks implemented through websites have become a severe issue with high frequency and appreciable overall financial damage. Detecting malicious URLs has become one of the most common solutions to tackle this threat, which is widely applied in the market and researched. Inspired by relevant work on URL classification using n-gram techniques and convolutional neural networks in other research areas, a method for detecting malicious websites using n-gram statistical features of URLs and a VGG-style neural network has been developed, which aims to provide classification for multiple website classes with arbitrary URL input lengths. Experimental results show that the method proposed in this paper provides an average accuracy of 96.60% on the 5-class ISCX-URL2016 dataset and 96.33% on the 4-class Malicious URLs dataset, which is 1.5 times larger. A further comparison reveals that the accuracies are competitive with similar methods for binary classifications that also use either n-gram features or a VGG-based network.
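A small sketch shows how character n-grams can turn a URL of arbitrary length into a fixed-size input for a neural network. It is an illustration under assumptions, not the feature pipeline of the paper: the n-gram counts are folded into a fixed number of buckets with a CRC32 hash and normalised to frequencies, and the names `char_ngrams` and `ngram_feature_vector` and the parameters `n` and `dim` are hypothetical.

```python
import zlib


def char_ngrams(url, n=3):
    """All contiguous character n-grams of the URL, in order."""
    return [url[i:i + n] for i in range(len(url) - n + 1)]


def ngram_feature_vector(url, n=3, dim=64):
    """Fixed-length vector of hashed n-gram frequencies.

    zlib.crc32 is used instead of the built-in hash() because the latter
    is randomised per process for strings and would not be reproducible.
    """
    vector = [0.0] * dim
    grams = char_ngrams(url, n)
    for gram in grams:
        bucket = zlib.crc32(gram.encode("utf-8")) % dim
        vector[bucket] += 1.0
    total = len(grams)
    return [v / total for v in vector] if total else vector


if __name__ == "__main__":
    short = ngram_feature_vector("http://a.io")
    long_ = ngram_feature_vector("http://example.com/path/to/page?query=1")
    print(len(short), len(long_))  # both vectors have the same length
```

Because the output length depends only on `dim`, URLs of any length map to the same input shape, at the cost of occasional collisions between different n-grams.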
The paper discussed data processing and algorithm selection for three different scenarios in order to improve accuracy in detecting DDoS attacks, spam emails, and malware. It provided detailed descriptions of each process involved in the simulation. For the DDoS attack detection simulation, three different datasets were used, and records with missing data were removed to ensure data quality. In addition, features were processed so that they could be used with the chosen algorithms. Both decision tree and random forest algorithms were selected and tuned to obtain maximum accuracy. Similarly, for the spam email detection simulation, a binary label was used to represent whether an email was spam or not, and the Count Vectorizer function was applied to convert mail contents into feature vectors. The KNN and decision tree algorithms were chosen, with emphasis on parameter adjustment to reduce overfitting and ensure optimal model accuracy. The paper also discussed the importance of considering multiple factors when selecting and tuning algorithms, such as accuracy, complexity, and computational efficiency. These factors must be balanced to achieve the best overall performance. Overall, the paper provided a comprehensive overview of the methods and processes involved in data processing and algorithm selection to improve detection accuracy for DDoS attacks, spam emails, and malware. This research can benefit organizations that are looking to enhance their security measures and minimize the risks associated with these cyber threats.
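The conversion of mail contents into count-based feature vectors can be sketched in plain Python. This is a simplified stand-in written for illustration, not the library function used in the paper, and its tokenisation rule (lowercase runs of letters and digits) is an assumption.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def count_vectorize(documents, vocabulary):
    """One row per document; each cell counts occurrences of a token."""
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok in tokenize(doc):
            if tok in vocabulary:  # tokens unseen at fit time are ignored
                row[vocabulary[tok]] += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    mails = ["Win cash now", "Meeting now"]  # made-up examples
    labels = [1, 0]                          # 1 = spam, 0 = not spam
    vocab = build_vocabulary(mails)
    print(vocab)
    print(count_vectorize(mails, vocab), labels)
```

The resulting rows, paired with the binary labels, are the kind of input that a KNN or decision tree classifier can consume directly.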
In the current world of constant cybersecurity threats, protecting valuable data from malicious activities has never been a higher priority. A network of infected computers that are under the control of bad actors is known as a botnet. These networks may be used for a variety of purposes, including spam distribution, distributed denial of service (DDoS) attacks, identity theft, and malware distribution. A botnet's constituent computers are frequently referred to as "bots" or "zombies". Statistics show an alarming 100% increase in DDoS attacks from 2021 to 2022, and attackers have been consistently evolving, implementing smaller yet more persistent attacks. Fortunately, the measures for protecting computers from botnet attacks have also been evolving. The very first step in defending against botnet attacks is to spot suspicious requests, and in this paper machine learning is used to help pinpoint potential attacks. First, a comprehensive dataset is identified and used to train the models. The dataset consists of source IPs, protocols, bidirectional flows, packets and a total of 33 features of internet flows, with a mix of normal and malicious traffic. Random forest and logistic regression were chosen as the models and run with 80 percent of the dataset as a training set and 20 percent as a testing set. Overall, the two models perform well on the given dataset. Although this is a basic study of botnet detection, it provides insights and contributions for further developments in cybersecurity.
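The 80/20 division of the dataset can be sketched as a shuffled split. The paper does not state whether the split was shuffled or stratified, so the seeded shuffle below is an assumption, and `split_train_test` is an illustrative helper rather than a library function.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split off a test set.

    Returns (train_rows, test_rows). The seed makes the split repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    flows = list(range(100))  # stand-ins for 100 network flow records
    train, test = split_train_test(flows)
    print(len(train), len(test))
```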
This paper introduces a novel machine learning-based approach for the detection of Distributed Denial of Service (DDoS) attacks. The proposed method employs three different classifiers, namely Support Vector Machine (SVM), Random Forest, and Multilayer Perceptron (MLP), to accurately classify network traffic as normal or malicious. The approach incorporates various features extracted from network traffic, such as packet length, packet inter-arrival time, and destination port number, to train the classifiers. The results demonstrate high accuracy rates, with Random Forest outperforming the other classifiers in terms of detection accuracy. This proposed method offers a promising solution for detecting DDoS attacks in real-time and has the potential to be integrated into existing network security systems. The issue of DDoS attacks is becoming increasingly critical, with the proliferation of connected devices and the growing dependence on the Internet. There is an urgent need for advanced techniques to detect and mitigate such attacks. Machine learning approaches have shown great potential in identifying anomalous behavior and detecting DDoS attacks in real-time. The proposed method is a promising step towards achieving this goal. Furthermore, the proposed approach has practical applications in the context of network security. By integrating this method into existing security systems, it will enhance the system's ability to detect and prevent DDoS attacks. The approach can be implemented in different network environments, making it versatile and applicable to a variety of settings. In conclusion, the proposed machine learning-based approach provides a robust and effective solution for the detection of DDoS attacks. It is capable of accurately classifying network traffic as normal or malicious in real-time, and has the potential to enhance the overall security of networking systems.
This paper explores the utilization of blockchain token voting technology in student course selection systems. The current course selection systems face various issues, which can be mitigated through the implementation of blockchain technology. The advantages of blockchain technology, including consensus mechanisms and smart contracts, are discussed in detail. The token voting mechanism, encompassing concepts, token issuance and distribution, and voting rules and procedures, is also explained. The system design takes into account the system architecture, user roles and permissions, course information on the blockchain, student course selection voting process, and course selection result statistics and public display. The technology offers advantages such as transparency, fairness, data security and privacy protection, and system efficiency improvement. However, it also poses several challenges, such as technological and regulatory hurdles. The prospects for the application of blockchain token voting technology in student course selection systems and its potential impact on other fields are summarized. Overall, the utilization of blockchain token voting technology in student course selection systems holds promising future implications, which could revolutionize the education sector.
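The token voting mechanism can be illustrated with an in-memory sketch. Everything in it is hypothetical and stands in for what a smart contract would enforce on chain: each registered student is issued a fixed number of tokens, spends them on courses, and the tallies are public. The class name `TokenVoting`, its methods, and the rule that tokens are consumed when cast are assumptions, not the design proposed in the paper.

```python
class TokenVoting:
    """Toy ledger for course selection by token voting (not a blockchain)."""

    def __init__(self, tokens_per_student):
        self.tokens_per_student = tokens_per_student
        self.balances = {}  # student -> remaining tokens
        self.tallies = {}   # course -> tokens received

    def register(self, student):
        """Issue the fixed token allocation to a new student."""
        if student in self.balances:
            raise ValueError("student already registered")
        self.balances[student] = self.tokens_per_student

    def vote(self, student, course, tokens):
        """Spend tokens on a course; reject overspending."""
        if tokens <= 0:
            raise ValueError("tokens must be positive")
        if self.balances.get(student, 0) < tokens:
            raise ValueError("insufficient tokens")
        self.balances[student] -= tokens
        self.tallies[course] = self.tallies.get(course, 0) + tokens

    def ranking(self):
        """Courses ordered by tokens received, ties broken by name."""
        return sorted(self.tallies.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    election = TokenVoting(tokens_per_student=10)
    for name in ("alice", "bob"):
        election.register(name)
    election.vote("alice", "Machine Learning", 7)
    election.vote("alice", "Databases", 3)
    election.vote("bob", "Databases", 5)
    print(election.ranking())
```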
The integration of instructional resources in colleges is gradually moving from the era of electronic and information technology into an era of innovation. A new issue that has emerged is the storage and retrieval of electronic instructional resources in colleges. Meanwhile, with the rapid development of blockchain technology, its use in numerous fields has burgeoned. In the information management field, confidential and reliable technologies are still needed to help manage resource storage. Blockchain, which is immutable and trustless, is well suited to the establishment of an educational resource sharing platform in universities. The paper has two main aims: first, to analyze and compare three educational resource sharing systems proposed by Chinese scholars, enumerating the main contents and functions of each system and examining the differences between them; and second, to put forward a construction framework for a new educational resource sharing platform that addresses the disadvantages still existing in the three systems. The paper also explores research on college instructional resource sharing platforms using blockchain. It uses blockchain techniques to build an educational resource sharing system, provides a reliable platform with high security and traceability through smart contract technology, and connects user groups through a "consortium chain + private chain" design. Its aim is to facilitate efficient use by users at different levels. Using blockchain technology to construct a sharing platform for university educational resources has practical significance for further promoting educational informatization.
The increasing popularity of GameFi, a combination of gaming and financial models, has led to the emergence of certain problems with current financial models that can only be overcome through a transition towards de-game, or full-on-chain games. This paper provides an overview of the history and financial models of GameFi, identifies pain points on the path of transition from GameFi to de-game, such as capacity and TPS, and presents a concept for a de-game with achievable performance levels. Both single and double currency models have limitations that prevent them from reaching their full potential, demonstrating the need for a transition to de-game, capable of offering unique gaming experiences and financial rewards. Overcoming pain points such as capacity and TPS will be essential in achieving this transition. Proposed solutions include using sharding technology and optimizing smart contract architecture to enhance system capacity and TPS. Ultimately, this paper highlights both the challenges and opportunities presented by GameFi, proposing the transition to de-game to overcome the limitations of current financial models. Addressing critical pain points such as capacity and TPS, our proposed solutions offer feasible approaches to building reliable on-chain gaming systems. The future vision of GameFi in the form of de-game offers a promising outlook for the gaming industry, where players can engage in immersive games with financial incentives.
Building on the fundamental principles of quantum computing, this paper sequentially presents the development of quantum computing and its associated application domains. We also analyze and compare the differences between quantum and classical computing, highlighting the benefits of quantum computing for specific problems. To comprehend the advantages of quantum computing in particular fields, extensive research has been conducted in three areas: cryptography, chemistry, and artificial intelligence. We also discuss the wider applications of this emerging technology for future investigation, considering the current state of quantum computing development and anticipated trends.
Transfer learning, and specifically the transferability of ImageNet pre-training, has been relatively under-researched in the context of galaxy datasets. This study seeks to address this gap by employing the Galaxy10 DECals dataset, which is a 10-class dataset. Three classic models, namely MobileNet, VGG, and ResNet, were developed and customized to the dataset by modifying the number of neurons in the last layer. The experimental phase is divided into three parts: a comparison between the use of ImageNet and non-ImageNet weights, a comparison of model performance, and a confusion matrix analysis. The results demonstrate that the use of ImageNet weights produces better outcomes, with the MobileNet model exhibiting the highest performance. Further analysis revealed that the inclusion of ImageNet weights can enhance the classification accuracy of some data. Although the present study was successful in achieving its objectives, future research should focus on exploring and elucidating the underlying mechanisms driving these findings.
Wildfires have emerged as a pressing issue in many regions of the world due to the ongoing impact of global warming on the planet. However, a reliable and high-performance detection system is currently lacking. This study introduces a wildfire detection system based on neural networks and image recognition. The study used Edge Impulse to train a neural network to identify wildfire occurrences in given pictures. To optimize the performance and adaptability of the model, an extensive dataset was compiled by collating images from two Kaggle projects, resulting in a final dataset of over 3000 images. The core technologies that Edge Impulse is based on are Auto Deep Learning (AutoDL) and Convolutional Neural Networks (CNN). By applying techniques such as Neural Architecture Search (NAS), hyperparameter optimization, and transfer learning, AutoDL enables people interested in machine learning to approach the technology without an extensive understanding of the mathematics or programming on which machine learning is built. CNN is a highly effective and efficient form of neural network that is popular for image classification. It consists of three types of layer: the convolutional layer, the pooling layer, and the fully connected layer. The result of this study is a fully functional wildfire detection model that is ready to be deployed, with a final testing accuracy of over 99%.
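The three layer types named above can each be written in a few lines of plain Python. This is a didactic sketch on a tiny single-channel image with made-up numbers, unrelated to the model produced by Edge Impulse; it assumes no padding, a stride of one for the convolution, and non-overlapping pooling windows.

```python
def conv2d(image, kernel):
    """Convolutional layer: slide the kernel and sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool2d(feature_map, size=2):
    """Pooling layer: keep the maximum of each non-overlapping window."""
    return [[max(feature_map[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


def fully_connected(features, weights, bias):
    """Fully connected layer: one weighted sum per output class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0],
              [0, 1]]
    fmap = conv2d(image, kernel)            # 3x3 feature map
    pooled = max_pool2d(fmap, size=2)       # 1x1 after pooling
    flat = [v for row in pooled for v in row]
    scores = fully_connected(flat, [[1.0], [-1.0]], [0.0, 1.0])
    print(fmap, pooled, scores)
```

In a trained network the kernel and the fully connected weights are learned from data, and the two output scores would correspond to the fire and no-fire classes.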