Proceedings of the 5th International Conference on Computing and Data Science
Alan Wang, University of Auckland
Marwan Omar, Illinois Institute of Technology
Roman Bauer, University of Surrey
Facial Emotion Recognition (FER) is an important task in computer vision and machine learning. This study aims to improve the accuracy of facial expression recognition by incorporating attention mechanisms into Convolutional Neural Networks (CNNs), using the FER2013 dataset, which consists of grayscale images categorized into seven expressions. The combination of the proposed CNN architecture and attention mechanisms is described in detail, with emphasis on the operations and interactions of their components. The effectiveness of the new model is evaluated through experiments that compare its accuracy with that of existing approaches. The results demonstrate that the CNN architecture with attention mechanisms outperforms the original CNN, achieving an accuracy of 69.07% compared with 68.04% for the original CNN. The study further discusses the confusion matrix analysis, which reveals the challenges of recognizing specific emotions due to limited training data and vague facial features. For future work, this study suggests addressing these limitations through data augmentation and reducing the gap between training and testing accuracy. Overall, this research highlights the potential of attention mechanisms in enhancing facial expression recognition systems, paving the way for advanced applications in various domains.
The potential of electric vehicles (EVs) to reduce greenhouse gas emissions in the transportation sector has given EV adoption a considerable boost in recent years. Concurrently, the field of big data analytics has witnessed exponential growth, providing unprecedented opportunities for extracting valuable insights and optimizing various industrial sectors. This paper presents a comprehensive overview of the intersection between electric vehicles and big data analysis. Various EV-related data sources are explored, along with a discussion of data computing platforms. The paper then analyzes different use cases of big data analysis in EVs, covering key areas such as energy management, charging infrastructure optimization, and vehicle condition monitoring, which demonstrate how big data can be crucial for the successful integration of EVs into green smart cities. Finally, the author outlines future research directions and opportunities for the use of big data techniques in EV adoption. In particular, this paper serves as a roadmap for future research in the area of data analytics in EV integration.
While digital meters can automatically communicate with a database to store reading values, many industries still use analog meters that are not economically or physically viable to replace. Instead, computer vision modules may be attached to cameras to read and record values automatically. This paper tests three implementations of gauge reading: two simple Hough line and circle transform methods and one lightweight naive line rotation method. A dataset was created for testing purposes, consisting of 46 images from various sources, with variations including pointer rotations, binarizations, and text or logo removal. Results showed the line rotation technique to be substantially more robust and accurate than both Hough line implementations. Two major obstructions were identified: pointer tails and dense text or logos. Removing them with Photoshop tools improved the average accuracy to within roughly 1 degree of ground truth. This is accurate enough to replace human readers in most situations that do not demand high precision, and the method is lightweight enough to function under nearly all circumstances. Future research seeks to validate these findings further by testing line rotation on more varied gauges.
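As a rough illustration of the line rotation idea in the abstract above, the following Python sketch sweeps a ray around an assumed dial center and keeps the angle whose ray covers the most dark pixels. The function names, the binary-image representation, and the linear angle-to-value mapping are assumptions made for illustration; the abstract does not give the authors' implementation.

```python
import math


def read_pointer_angle(image, center, radius, step_deg=1.0):
    """Return the angle in degrees (counterclockwise from the positive x axis)
    whose ray from `center` covers the most dark pixels.

    `image` is a list of rows of 0/1 values, where 1 marks a dark pixel.
    Ties are resolved in favor of the first angle found.
    """
    cx, cy = center
    best_angle, best_score = 0.0, -1
    steps = int(round(360 / step_deg))
    for i in range(steps):
        angle = i * step_deg
        theta = math.radians(angle)
        score = 0
        for r in range(1, radius + 1):
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy - r * math.sin(theta)))  # image y axis points down
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                score += image[y][x]
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def angle_to_value(angle, min_angle, max_angle, min_value, max_value):
    """Linearly map a pointer angle to a gauge value (assumes a linear scale)."""
    frac = (angle - min_angle) / (max_angle - min_angle)
    return min_value + frac * (max_value - min_value)
```

On a synthetic binarized image with a pointer drawn straight up from the center, `read_pointer_angle` returns 90 degrees. A pointer tail or dense text would add dark pixels along other rays, which is consistent with the obstructions the abstract reports.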
Facial emotion recognition systems often suffer from low accuracy in specific scenarios because of the wide range of environments and conditions in which they are deployed. In this study, the Facial Expression Recognition-2013 (FER-2013) dataset from Kaggle serves as the basis for training the models, with subsequent analysis conducted on the experimental outcomes. The dataset comprises a training set and a testing set, each annotated with labels representing seven distinct emotions, ranging from "angry" to "surprise". Two models are developed for facial emotion classification, that is, for automatically recognizing emotions from input images: a MobileNet-based model and a self-built model using convolutional neural networks. Both models achieve an accuracy of approximately 60%, yet show deficiencies in predicting the "neutral" label. Additionally, techniques such as the confusion matrix and saliency maps enable a comparative evaluation of model performance across different emotion labels and facilitate an analysis of the corresponding dominant facial regions. Based on a comparison of results obtained from representative cases, two potential factors contributing to these limitations are identified: a paucity of training data and the presence of ambiguous features. The findings of this study are expected to inform future improvements to facial emotion recognition models in order to enhance their applicability in diverse scenarios.
Brain tumors, recognized as among the most formidable and aggressive diseases globally, continue to pose significant challenges for medical practitioners in clinical diagnosis and treatment. Relieving the burden on doctors and coping with resource limitations in certain hospitals requires efficient and dependable alternative approaches. Convolutional Neural Networks (CNNs), known for their strength in image recognition, hold considerable potential for addressing this issue. With transfer learning, the capabilities of established models such as VGG-16 and MobileNet can be harnessed to construct strong models in a comparatively short time. This paper constructs and evaluates VGG-16 and MobileNet-based models using transfer learning, to explore the applicability of these two classical models to brain tumor diagnosis. The ultimate goal is to assist doctors and hospitals in alleviating the challenges associated with brain tumor diagnosis. The results demonstrate the effectiveness of CNN-based brain tumor recognition.
The accurate detection of emotions is of significant importance in psychology, which makes the careful selection of an appropriate model for facial expression classification necessary. In this study, emotion detection is used as the classification task for comparing the performance of MobileNet, ResNet, and DenseNet. Specifically, MobileNet, ResNet50, and DenseNet169 are selected for comparative analysis. The FER-2013 dataset from Kaggle is used; its training set and test set contain 29,709 and 3,589 samples, respectively, across seven facial expression categories. For preprocessing, normalization and data augmentation are applied: the whole dataset is normalized by dividing pixel values by 255 and augmented with a Keras image generator. In the model-building step, all test models are given the same structure. Each pre-trained model from Keras Applications is followed by one global average pooling layer and a final dense output layer with the softmax activation function. Moreover, all hyperparameters were kept the same during training. After training, the confusion matrix is used to show the relationships between classes, and the loss and accuracy of each model are plotted for analysis. Experimental results demonstrate that MobileNet achieves 56.08% accuracy on the test set, which is more competitive than DenseNet169 and ResNet50, and provides a relatively stable loss.
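The two numerical steps mentioned in the abstract above, scaling pixels by 255 and applying a softmax over the seven class outputs, can be sketched in plain Python as follows. This is only an illustrative sketch; the function names are hypothetical, and the study itself relies on Keras rather than hand-written code.

```python
import math


def normalize_pixels(image):
    """Scale 8-bit grayscale values (0-255) to the range [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(logits):
    """Return the index of the most probable class."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```

With seven logits, one per expression category, `softmax` yields a probability distribution that sums to one, and `predict_class` picks the category with the highest probability.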
Galaxy morphology classification is essential for studying the formation and evolution of galaxies. However, previous studies based on Convolutional Neural Networks (CNNs) mainly focused on the structure of convolutional layers without exploring the design of fully connected layers. This paper therefore trains and compares the performance of CNNs with 4 types of fully connected layers on the Galaxy10 DECaLS dataset. Each type of fully connected layer contains one dropout layer, and dropout rates of 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% are tested to investigate how dropout rates in fully connected layers affect the overall performance of CNNs. These models use EfficientNetB0 and DenseNet121 as their feature extraction networks. During training, feature-wise standardization, morphological operations, and data augmentation are used for preprocessing. Techniques including class weights, exponential learning rate decay, and early stopping are applied to improve model performance. Saliency maps and Grad-CAM are also used to interpret model behaviours. Results show that the architecture of the fully connected layers has a significant effect on the models' overall performance. With the same dropout rate and convolutional layers, models using global average pooling and an additional dense layer outperform the others in most cases. The best model obtained an accuracy of 85.23% on the test set. Meanwhile, the experimental results on dropout indicate that a dropout layer can reduce the effect of the fully connected layer architecture on the overall performance of some CNNs, leading to better performance with fewer parameters.
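Two of the training techniques named in the abstract above, class weights and exponential learning rate decay, can be sketched as follows. The inverse-frequency weighting formula and the decay parameters are common choices assumed here for illustration; the abstract does not state which formulas or values the authors used.

```python
from collections import Counter

# Dropout rates listed in the abstract: 0%, 10%, ..., 90%.
DROPOUT_RATES = [i / 10 for i in range(10)]


def class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare classes receive larger weights so they contribute more to the loss.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs: initial_lr * decay_rate ** epoch."""
    return initial_lr * decay_rate ** epoch
```

For example, with three samples of class 0 and one of class 1, the minority class receives a weight three times that of the majority class.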
The national income level has always been a topic of concern, and many factors affect income. This paper focuses on national work, age, education, marriage, gender, weekly working hours, and other dimensions to explore the types of people with an annual income above $50,000. Data collected from the U.S. Census is selected as the data set and divided into a training set and a test set, and logistic regression and decision tree models are then constructed to predict income. The experimental results show that the logistic regression model has an accuracy (ACC) of 0.773 and an AUC of 0.515, while the decision tree model has an ACC of 0.860 and an AUC of 0.900. This indicates that the decision tree performs better in predicting national income.
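The two metrics reported in the abstract above can be computed as in the following sketch. It uses the pairwise-ranking definition of AUC (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one); the function names are illustrative and the abstract does not say which library the authors used. Under this definition, an AUC close to 0.5 corresponds to ranking that is little better than chance.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Each (positive, negative) pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise. Labels are assumed to be 0 or 1.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; sorting-based implementations are preferable for large data sets.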
Due to recent growth in technology, machine learning has emerged as an effective auxiliary tool in the medical field. However, the effectiveness of transfer learning architectures trained on non-medical image data remains unclear. In this paper, two models based on VGG-16, a pre-trained Convolutional Neural Network architecture, were constructed to classify kidney CT images into four categories: normal, stone, cyst, and tumor. The two VGG-16 models have identical parameters except for the pre-trained weights: one uses weights trained on ImageNet, and the other uses weights trained on a random large-scale dataset. To gain more detailed insight into the models' performance, saliency maps and Grad-CAM are employed to assess the models' ability to extract relevant features from the CT images. The results demonstrate that the VGG-16 model trained on ImageNet achieves 98.96% accuracy, which is about 30% higher than the other VGG-16 model. The saliency maps and Grad-CAM also support the difference in test accuracy: the model pre-trained on the random dataset produces a saliency map that highlights the whole picture and a Grad-CAM image that does not highlight any part of the CT image, showing that it cannot correctly locate the key features. In contrast, the model pre-trained on ImageNet correctly highlights the principal features in both maps. In this study, ImageNet pre-training is shown to be effective for transfer learning on medical images. Future research should focus on further enhancing the application of transfer learning in the medical field.
With its potential to revolutionize a wide range of applications, including lie detection, social robotics, and driver fatigue detection, facial expression recognition is a rapidly expanding field. However, traditional machine learning methods have struggled with facial expression recognition due to limitations such as manual feature selection and limited representation capabilities. Additionally, these methods require large amounts of annotated data, which can be time-consuming and expensive to obtain. To overcome these difficulties, this paper proposes a novel method that builds recognition models using a multi-layer perceptron (MLP) and ResNet. This hybrid model offers improved performance over conventional CNN models, achieving an accuracy of 85.71% on the FER_2013 dataset. Additionally, transfer learning is used to increase the model's precision and avoid overfitting. The FER_2013 dataset is used to train and test the model. The experimental results show that the proposed model can recognize facial expressions while minimizing the overfitting problem typically associated with deep learning. In future work, a self-attention mechanism will be added to the model in an effort to improve its resolution. By applying the model to color images, the team also hopes to increase its capacity for generalization.
In recent decades, integrated circuits (ICs) have played an increasingly significant role in many areas of society, and silicon is one of the most commonly used materials in this industry. However, as technology develops, silicon can only partially meet people's requirements for device performance. Hence, new materials are needed to produce more effective IC equipment and devices such as chips. Here I present two materials, molybdenum disulfide and graphene, and discuss their properties along with a realistic assessment of the prospect that they will replace silicon in ICs.
This essay provides a detailed analysis of the evolution of microchip technology from its inception to its prospects. It starts by discussing the foundational work of pioneers like Jack Kilby and Robert Noyce and the transformative effect of Moore's Law on chip design and manufacturing. It then evaluates the current state of chip technology, including the leading industry players and challenges such as chip shortages and trade disputes. The essay then explores the anticipated trends in chip technology, such as quantum computing and innovative materials like graphene. It concludes by assessing the expected influence of these advancements on various sectors, including Artificial Intelligence (AI), the Internet of Things (IoT), and autonomous vehicles, and their broader social, economic, and environmental implications. This essay presents a comprehensive and thoughtful analysis of microchip technology's past, present, and future and its far-reaching impact on modern society.
The recognition of similar facial expressions presents a notable challenge, which calls for a focus on the parameters of the fundamental Convolutional Neural Network (CNN) architecture, a cornerstone of image classification. This research aims to enhance the model's capacity for facial expression recognition by using a controlled variable method to examine two specific parameters in a self-designed small CNN: the number of filters and the number of convolutional layers. More specifically, for each number of filters (3, 6, 12, and 24), the number of layers was varied from 3 to 6 to 9. Experimental results indicate that increasing either the number of filters or the number of convolutional layers can improve the performance of the model in facial expression recognition. Furthermore, increasing the number of filters has a more prominent influence on improving the accuracy of facial expression recognition.
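The controlled-variable design in the abstract above amounts to a grid of 4 filter counts by 3 layer counts, or 12 configurations. A minimal sketch of enumerating that grid is shown below; the dictionary keys and the function name are illustrative assumptions, and the training step itself is omitted.

```python
from itertools import product

FILTER_COUNTS = [3, 6, 12, 24]  # values stated in the abstract
LAYER_COUNTS = [3, 6, 9]        # values stated in the abstract


def experiment_grid():
    """Enumerate every (filters, conv_layers) configuration to be trained."""
    return [
        {"filters": f, "conv_layers": n}
        for f, n in product(FILTER_COUNTS, LAYER_COUNTS)
    ]
```

Holding one parameter fixed while the other varies, as each row or column of this grid does, is what allows the effect of filters and layers to be compared separately.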
There is a growing consensus on the need for an efficient and accurate adaptive method that can automatically classify galaxies. However, most current studies on galaxy classification use a single model for direct output, without considering combinations with other models to produce more satisfactory predictions. Using a convolutional neural network and a classifier, this study examines the possibility of applying deep learning models to classification of the Galaxy10 DECaLS dataset, and proposes a DenseNet-Random Forest model through comparative analysis. DenseNet-121 is adjusted and trained with appropriate hyperparameters: the input tensor is passed to the base model through an input layer of the appropriate shape, and GlobalAveragePooling2D is added to average each feature map, reducing its spatial dimensions to 1. A fully connected layer with 64 neurons and the ReLU activation function is then added, followed by a Dropout layer that randomly discards 20% of the neurons during training to prevent overfitting. In addition, a fully connected layer with 32 neurons and the ReLU activation function and an output layer with 10 neurons and the softmax activation function are added. The feature vectors from the trained model and the true labels of the validation set are assigned to x and y, respectively, and passed to the random forest classifier. The experimental results demonstrate that the model ultimately achieved a prediction accuracy of 68% on the Galaxy10 DECaLS dataset, with an improvement of nearly 30% in precision, recall, and F1 scores.
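The global average pooling step described in the abstract above can be illustrated without any deep learning library. The sketch below assumes a channels-first list-of-lists layout purely for readability (Keras's GlobalAveragePooling2D defaults to channels-last), and the function name is hypothetical.

```python
def global_average_pooling_2d(feature_maps):
    """Reduce each H x W feature map to a single value, its mean.

    `feature_maps` is a list of channels, each a list of rows of numbers.
    The result has one number per channel, which is the kind of feature
    vector that a downstream classifier could consume.
    """
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled
```

For two 2 x 2 feature maps `[[1, 2], [3, 4]]` and `[[0, 0], [0, 8]]`, the pooled vector has two entries, one mean per map.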
Federated learning, a machine learning technique that enables collaborative model training on decentralized data, has gained significant attention in recent years due to its potential to address privacy concerns. This paper explores the evolution, applications, and challenges of federated learning. The research topic focuses on providing a comprehensive understanding of federated learning, its advantages, and limitations. The purpose of the study is to highlight the importance of federated learning in preserving data privacy and enabling collaborative model training. The study conducted a literature review by systematically analyzing relevant papers from peer-reviewed journals, conference proceedings, and reputable sources. The results reveal that federated learning offers a promising solution for collaborative machine learning while addressing concerns related to data privacy and security. The study emphasizes the need for further research in optimizing communication protocols, scalability, and privacy-preserving techniques. Overall, this paper contributes to the understanding of federated learning and its potential for secure and efficient decentralized learning paradigms.
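To make the idea of collaborative training on decentralized data concrete, the sketch below shows federated averaging (FedAvg), a widely used aggregation rule in which a server combines client model parameters weighted by local dataset size. The abstract above does not name a specific algorithm, so the choice of FedAvg and the function name are assumptions made for illustration only.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors.

    `client_weights` is a list of equal-length parameter lists, one per
    client; `client_sizes` gives the number of local training samples for
    each client. Only parameters are shared, never the raw data.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

A client holding three times as much data pulls the global parameters three times as strongly, which is the usual motivation for weighting by dataset size.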
Precise and efficient product kitting evaluations are vital for improving assembly performance and customer satisfaction for varied, small-batch, and intricate electromechanical products. However, current enterprise resource planning (ERP) systems provide inadequate support for analyzing the kitting of products. A pressing problem for enterprises is how to evaluate the degree to which customers' demands for complete sets of multiple products can be met when the importance level of customer demands is adjusted, without changing the customer demand dates in the production system or affecting scheduled customer orders. In response, this article presents a method for evaluating product kitting that takes the maximum quantity of complete sets of products as the optimization objective, considers multiple constraints such as material inventory data and in-process business data, and designs a multi-level BOM decomposition algorithm. Finally, a case study of a company producing core components for standard high-speed trains demonstrates the practicality and efficacy of the proposed method.
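A simplified view of multi-level BOM decomposition and the maximum number of complete sets is sketched below. It handles a single product, considers only on-hand inventory, and ignores in-process data and demand priorities, so it is a much-reduced stand-in for the method in the abstract above; the data layout, function names, and part names are hypothetical.

```python
def explode_bom(bom, product, qty=1, totals=None):
    """Recursively flatten a multi-level BOM into leaf-material requirements.

    `bom` maps each assembly to a dict of {component: quantity per parent}.
    Items that do not appear as keys in `bom` are treated as raw materials.
    """
    if totals is None:
        totals = {}
    children = bom.get(product)
    if not children:  # leaf material: accumulate its requirement
        totals[product] = totals.get(product, 0) + qty
        return totals
    for child, per_parent in children.items():
        explode_bom(bom, child, qty * per_parent, totals)
    return totals


def max_complete_sets(bom, product, inventory):
    """Largest number of complete products buildable from leaf inventory."""
    needs = explode_bom(bom, product)
    return min(inventory.get(m, 0) // q for m, q in needs.items())
```

With a two-level BOM in which one bogie needs one frame and two wheelsets, and each wheelset needs two wheels and one axle, an inventory of 5 frames, 18 wheels, and 7 axles supports three complete bogies, limited by the axles.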
The development of digital technology has made non-linear landscape structures mainstream and has brought new challenges to design methods. The efficiency and accuracy of applying Rhino+Grasshopper parametric modelling to such structures have gained the attention of design practitioners and scholars. However, current research on this approach still has gaps. This study therefore uses the ‘Flying Goose’ viewing platform as a case study to address the lack of specific case studies on non-linear landscape structures. It analyses the modelling logic in three parts (morphological characteristics, structural system, and appendages and decoration) to address the lack of systematic knowledge on modelling logic. It also analyses and discusses the four core precision control methods used in the Rhino+Grasshopper parametric modelling process (unfolding control, isometric projection, function shaping, and linear interference) to address the lack of specific research on morphological precision control in Rhino+Grasshopper parametric modelling. The aim is to provide a systematic analysis of the modelling logic of non-linear landscape structures and a detailed analysis of the precision control methods used to realise their morphology, providing a theoretical basis and application examples for future research of the same type.
In this era of extraordinary accessibility to information, business faces both unprecedented obstacles and opportunities. Constantly accumulating data encompasses everything from consumer behavior to market trends. However, the question of how to extract useful information from this enormous quantity of data and apply it to economic decision-making becomes crucial. Complex non-linear relationships and high-dimensional data frequently render conventional statistical methods and economic models ineffective. The integration of data science and machine learning techniques has enabled economists to extract valuable insights from large-scale and complex economic data. This paper examines the role of data science and machine learning in economics and traces its historical development from the refinement of statistics to the era of big data with advanced computational power. It then discusses the significance of data-driven decision-making and forecasting in the economy, with specific algorithms in supervised and unsupervised learning, and considers future challenges and developments.
Because of its tremendous potential for financial advantage, the intelligent logistics system has begun to attract significant attention from the academic and business communities. To address the issue of quality supervision of fruit during transport, this study proposes an intelligent logistics system based on the detection of fruit freshness. The system's core detection methods are the physical, chemical, and image processing techniques used at various points in the logistics process. The ability of the system to monitor the freshness of the fruit in real time ensures that the fruit is transported in a manner that is both secure and effective. The results of the analysis show that this system plays an essential part in a number of steps, including fruit harvesting, fruit sorting, transportation planning, and fruit delivery right to consumers’ front doors.
We employed Neural Architecture Search (NAS) technology in this research paper and compared it with the classic convolutional neural network structures LeNet-15 and VGG (Visual Geometry Group)16. The objective was to optimize the performance of early warning forest fire image classification tasks. We adopted a public dataset "Forest Fire Early Warning 2 Classification" from the Baidu PaddlePaddle platform, which comprises images of fires and no-fires under various environmental conditions. A convolutional neural network (CNN) model was automatically designed through AutoKeras and underwent 20 training epochs. In the experimental results, NAS outperformed with an accuracy rate of 92%, surpassing LeNet-15 (83%) and VGG16 (49%). However, its training time was longer at 33 seconds, and GPU utilization was higher, ranging from 28% to 33%. Despite the room for improvement in training time and resource utilization, NAS has proven its superiority in complex image classification tasks due to its high accuracy. Although NAS has room for enhancement in terms of training time and resource utilization, its outstanding performance in the image recognition task of early forest fire warning shows great potential for future research and application.