Proceedings of the 3rd International Conference on Signal Processing and Machine Learning
Omer Burak Istanbullu, Eskisehir Osmangazi University
Image captioning, which combines natural language processing and computer vision, aims to have a machine automatically generate descriptions of photographs. The task can be separated into two parts, both of which depend on correctly understanding language and images from semantic and syntactic perspectives. As the body of work on the subject grows, it is getting harder to stay abreast of the most recent advances in image captioning, and the review papers now available do not cover those findings in enough detail. This work reviews the approaches, benchmarks, datasets, and evaluation metrics currently in use for image captioning. Most ongoing research in the field concentrates on robust learning-based techniques, with deep reinforcement learning, adversarial learning, and attention mechanisms appearing to be at its core. Image captioning is a relatively new field in computer vision research, and its fundamental problem is generating a comprehensive natural language description of a source image. This paper explores and evaluates earlier work on image captioning. The applications and task settings of image captioning are introduced. Image captioning algorithms based on template and encoder-decoder structures are then analyzed, and the merits and disadvantages of each approach are discussed. The evaluation metrics and benchmark datasets for image captioning are then presented. Finally, prospects for the progress of image captioning are discussed.
With the rapid development of the Internet and e-commerce, recommender systems have received great attention and found wide application. Because it is difficult for people to choose the items they like from the dazzling array available on the Internet, and because e-commerce sites also need to consider how to improve efficiency, recommender systems are an excellent solution. This paper reviews the development of recommender systems, focusing on research and experiments with a recommender system based on an item-based collaborative filtering algorithm. Based on the experimental results and previous studies, it summarizes the advantages and disadvantages of this method, proposes some solutions, and points out problems that future research on recommender systems will face.
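As a rough illustration of the item-based collaborative filtering idea named in the abstract above, the following minimal sketch predicts a missing rating from the user's ratings of similar items. It is not the paper's implementation: the cosine similarity measure, the toy `ratings` data, and the helper names `cosine`, `invert`, and `predict` are all assumptions made for this example.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse item vectors stored as {user: rating}."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def invert(ratings):
    """Turn {user: {item: rating}} into {item: {user: rating}}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            items.setdefault(item, {})[user] = rating
    return items

def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings on other items."""
    items = invert(ratings)
    target_vec = items.get(target, {})
    num = den = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        sim = cosine(target_vec, items[item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None  # None when no similar item exists

# Hypothetical toy data: item B is rated much like item A and unlike item C.
ratings = {
    "ann": {"A": 5, "B": 5, "C": 1},
    "bob": {"A": 1, "B": 1, "C": 5},
    "cat": {"A": 5, "C": 1},
}
print(predict(ratings, "cat", "B"))
```

Because "cat" rated A highly and B is rated similarly to A by the other users, the predicted rating for B leans toward the high end of the scale. Real systems usually precompute the item-item similarity matrix offline, which is one of the efficiency arguments commonly made for the item-based approach.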
Recent advancements in the study of posttraumatic stress disorder (PTSD) have led to innovative improvements to therapies that have already received empirical validation. Drawing on relevant studies, literature, and survey feedback from the relevant groups, this paper investigates the theoretical feasibility and expected effects of a new treatment approach for adolescent PTSD patients that combines virtual reality (VR) technology with traditional treatment modalities. The main focus is on using virtual reality technology to address the reluctance of adolescent patients to accept treatment and on exploring other possibilities for developing a relevant target population. The limitations and drawbacks of current VR systems in the treatment of psychological disorders are also discussed, and theoretical solutions are proposed. Finally, the specific role of the senses in the theoretical model is described, along with the model's role and usefulness for patients, physicians, and others.
The ant colony algorithm is a heuristic global optimization algorithm with several advantages, such as robustness, so it can be applied in many fields of daily life. This article briefly explains some of the principles of the basic ant colony algorithm and gives a detailed analysis of representative improved algorithm models. The research status of the ant colony algorithm in several fields, such as the travelling salesman problem, the path planning problem, and the routing problem, is also summarized.
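To make the basic ant colony principle mentioned above more concrete, here is a minimal sketch of it applied to a tiny travelling salesman instance. It is an illustration under assumptions rather than any of the improved models the article analyzes: the parameter names and values (`alpha`, `beta`, `rho`, `q`), the four-city layout, and the function names are all chosen for this example.

```python
import math
import random

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def ant_colony_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0,
                   rho=0.5, q=1.0, seed=0):
    rng = random.Random(seed)              # seeded for reproducible runs
    n = len(dist)
    pher = [[1.0] * n for _ in range(n)]   # uniform initial pheromone
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                cur = tour[-1]
                cand = sorted(unvisited)
                # Attractiveness: pheromone^alpha * (1 / distance)^beta
                weights = [pher[cur][j] ** alpha * (1.0 / dist[cur][j]) ** beta
                           for j in cand]
                nxt = rng.choices(cand, weights=weights)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            length = tour_length(tour, dist)
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Evaporate pheromone on every edge, then deposit along each ant's tour.
        for i in range(n):
            for j in range(n):
                pher[i][j] *= 1.0 - rho
        for tour, length in tours:
            for i in range(n):
                a, b = tour[i], tour[(i + 1) % n]
                pher[a][b] += q / length   # shorter tours deposit more
                pher[b][a] += q / length
    return best_tour, best_len

# Hypothetical instance: four cities on the corners of a unit square.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
best_tour, best_len = ant_colony_tsp(dist)
print(best_tour, best_len)
```

The two ingredients that characterize the method are visible here: a probabilistic choice of the next city biased by pheromone and inverse distance, and a pheromone update that combines evaporation with reinforcement of edges used by short tours.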
All industries employ machine learning extensively, and one of the most promising fields is computer vision. Computer vision simulates the human visual system, using cameras and computers in place of the human eye to find a target, follow it, and gather data from it so that a decision can be made on whether to take further action or provide recommendations. This paper covers the various uses of computer vision in sports. Currently, computer vision is mostly used for broadcast enhancement and for the detection and tracking of players and balls. Although this technology has substantially improved the visual presentation of games, several flaws remain. For instance, the technology is not suited to some settings, and players occluding one another is a problem in multiplayer sports. For broadcasters, computer vision has significant commercial value. For athletes, the technique can help improve performance.
As the gaming industry gradually expands, more and more people are taking notice of it. With growth, however, problems emerge. Many people do not want to take the time to master a video game, so some resort to cheating. This is terrible news for developers, as it damages their reputation and player base. Because little research has been conducted on the topic, this paper lays the groundwork for anyone interested in the industry by describing what has already been done to fight cheaters. It introduces modern online connection architectures, categorizes the most common cheats, and, most importantly, introduces modern anti-cheat methods. The anti-cheat methods are analyzed in terms of their effectiveness on different online connection architectures and the types of cheat they work against. Lastly, the paper introduces the idea of kernel-level access and its impact on anti-cheat.
With the popularization and development of the concept of artificial intelligence, its applications have begun to reach deeper into people's lives. While bringing convenience, this has also made some people worry about whether artificial intelligence will replace humans. To help people understand more intuitively the current development status and bottlenecks of artificial intelligence, as well as the difference between artificial intelligence and the human brain, this article discusses four aspects: speech recognition and natural language processing, human-computer dialogue, image recognition, and machine learning ability, that is, how machines listen, read, and think. It concludes by summarizing why artificial intelligence cannot completely surpass humans.
As humanity enters the era of big data, artificial intelligence is becoming dominant in almost every domain. As a branch of machine learning, reinforcement learning (RL) uses experience gained through interaction, together with evaluative feedback, to strengthen decision-making. Unlike traditional supervised learning, RL can handle sampling, evaluation, and sequential decision-making with delayed feedback at the same time. This characteristic makes RL powerful for exploring solutions in the medical field. This paper investigates the wide application of RL in the medical field. Covering two major parts of the field, artificial diagnosis and precision medicine, the paper first introduces several RL algorithms in each part and then states the inefficiencies and unsolved difficulties in this area, together with future research directions for RL. The paper provides researchers with multiple feasible algorithms, supporting methods, and theoretical analysis, which pave the way for the future development of reinforcement learning in the medical field.
Scientific and reasonable analysis and determination of operational capability requirements can not only optimize and improve the operational concept but also ensure its transformation and application. The development and construction of the traction force and the improvement of operational capability play a key role in the transformation of operational theory into actual combat capability. There is an urgent need for scientific and applicable methods of analyzing operational capability requirements to support the development of the operational concept. After defining the components of operational capability requirements, this paper reviews, summarizes, analyzes, and compares the main operational capability requirements analysis methods, points out the problems of existing methods in light of current research, and outlines future research directions.
With the ever-increasing use of social media such as Twitter, Facebook, and Instagram, automatic analysis of people’s conversations and language has become a problem of great significance for businesses and governments attempting to understand and analyze people’s habits, thoughts, and patterns regarding different subjects of interest. Within the field of natural language processing, sarcasm detection has always been a difficult challenge for sentiment analysis. In recent years, researchers have shown great interest in sarcasm detection. Neural networks have achieved great success on this topic, but reviews of the task are very limited, and a comprehensive review of the development of sarcasm detection is still lacking. This paper therefore aims to summarize and present the various methods for sarcasm detection and the progress they have made, and to examine potential problems and room for further improvement.
Beginning with Tesla, self-driving technology has become commercially available in recent decades. Target recognition and semantic segmentation remain significant obstacles for autonomous driving systems. Given that these two tasks are also part of the primary tasks of computer vision and that deep learning techniques based on convolutional neural networks have made advancements in the field of computer vision, a great deal of research has begun to apply convolutional neural networks to autonomous driving in the past few years. In this paper, we examine recent publications on CNN-based techniques for autonomous driving, classify them, and offer insights into future research directions.
There is wide debate about the development of artificial intelligence and whether it currently has emotions, which reflects concern about artificial intelligence in the future. While people have not reached a consensus about artificial intelligence, even in the field of technical development, this article explores a paradox in the common image of artificial intelligence, which mixes fiction and reality, the collective and the individual, by focusing on media that serve as extensions of the body and of perspective, and on the difference between the world in an artwork and reality. The article discusses the paradox between the conceptual image of artificial intelligence and existing technology by illustrating how media such as film and fiction have shaped that conceptual image in the human mind. Story and narrative layer theory from narratology and extension theory from philosophy are used to analyze excerpts from In Search of Lost Time, making more concrete and practical how the illusion created by a novel can be distinguished and understood. All elements, including time and character, should be distinguished, since the world of a fictional work is not the same as the real world. Moreover, what matters is how a fictional work can offer a different perspective on substance and inspiration for our future actions, rather than trapping us in anxiety about being replaced by machines in the future, a question that cannot be answered at this time.
The interaction between intelligent robots and humans has always been a hot issue, and researchers hope to make human-robot interaction as harmonious as human-human interaction. To achieve this, it is particularly important to enable robots to recognize human facial emotions automatically. Many intelligent robots can already understand people's emotions through vocal communication. However, some people do not like to express their feelings in words, so it would be more convenient if machines could automatically analyze people's facial emotions. This paper aims to make the machine recognize people's facial expressions and automatically analyze their emotions to make human-computer interaction more harmonious. The convolutional neural network (CNN) has proven highly effective for image feature extraction in machine learning. Therefore, this paper uses a CNN to train a model on the FER2013 dataset. Extensive experiments demonstrate that the final trained model achieves good accuracy in recognizing three emotions: happy, surprise, and neutral.
Music is a grouping of musical tones of various frequencies. While artists compose through a deliberate arrangement of different notes, A.I. programs now learn to automatically generate short pieces of music as a mechanical sequence of distinct notes. This paper compares the utility and efficiency of a traditional machine learning method (a regression model) and a deep learning method (LSTM). The research focuses only on instrumental classical music and uses the MusicNet collection as the primary dataset. Comprehensive experiments with the two models suggest two results. Firstly, the LSTM model generates melodies that better fit the training styles. Secondly, the models fit better when trained on a single piece of music than on the entire dataset.
Image inpainting is the repair of pixels in damaged areas of an image to make it look as much like the original image as possible. Deep learning-based image inpainting is a prominent area of current research interest. This paper presents a systematic and comprehensive study of GAN-based image inpainting together with an analytical summary. Firstly, the paper introduces GAN, including its principle and its mathematical expression. Secondly, recent GAN-based image inpainting algorithms are summarized, and the advantages and disadvantages of each algorithm are listed. After that, the evaluation metrics and common datasets of deep learning-based image inpainting are listed. Finally, the existing image inpainting methods are summarized, and ideas for key future research directions are presented.
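For reference, the mathematical expression of a GAN mentioned above is usually written as the following minimax objective. This is the standard formulation from the original GAN literature, given here as an assumption about what the paper introduces; the symbols $G$ (generator), $D$ (discriminator), $p_{\text{data}}$ (data distribution), and $p_z$ (noise prior) are the conventional ones and do not come from the abstract itself.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator is trained to assign high probability to real samples $x$ and low probability to generated samples $G(z)$, while the generator is trained to make $D(G(z))$ large. Inpainting methods built on GANs typically combine an adversarial term of this kind with a reconstruction loss on the known pixels, although the exact combination varies by algorithm.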
The Internet provides us with an abundance of useful tools and data. However, it also generates a vast quantity of data that may bewilder us, so a technique for automatically processing these data is needed. This is where text classification becomes useful. Text classification is the algorithm-based process of assigning data inputs to distinct labels. For instance, email software uses it to decide whether an email should be filtered into the spam folder, and social media forums use it to classify posts under labels relevant to their topic. Text classification is used in a variety of applications, including search engines, sentiment analysis, emergency response systems, and chatbots. Review websites have emerged in recent years where customers can share their opinions on a business or a product. Such reviews are often highly emotive but crucial to the company. Text classification makes it possible to assess accurately the sentiment that reviews express. This paper compares the efficacy of various text classification algorithms for sentiment analysis.
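The process of assigning text to labels described above can be sketched with a small multinomial Naive Bayes classifier. This is only one possible algorithm and is not necessarily among those the paper compares; the training sentences, the whitespace tokenizer, and the function names `train` and `classify` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][token] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical review snippets used as training data.
samples = [
    ("great food and friendly staff", "positive"),
    ("loved the service great value", "positive"),
    ("terrible food and rude staff", "negative"),
    ("awful service never again", "negative"),
]
model = train(samples)
print(classify(model, "great friendly service"))
print(classify(model, "rude staff terrible"))
```

A comparison like the one the paper describes would train several such classifiers on the same labeled reviews and evaluate them on held-out data.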
The Vision Transformer (ViT) has been a hot research topic for the past few years since it first emerged in the field. In image recognition, because of the amount of information ViT can retrieve from the source image, it can in some cases rival the traditionally prevailing Convolutional Neural Network (CNN). Various models based on ViT have since emerged, each built with a specific field in mind or to address a flaw of the original ViT. In this paper, these models are tested on the same dataset along with a standard CNN to compare their performance, and the best-performing ViT model is then modified to explore possible improvements.
In recent years, pre-training models have gained a lot of attention in summary generation and have demonstrated new possibilities for improving the sequence-to-sequence attention framework. This survey gives a comprehensive overview of BERT-based pre-training models that can be used for abstractive summarization. Firstly, the BERT model is introduced as a typical pre-training model, followed by baseline models inspired by it. Then the problems and developments of previous models are discussed, including some recent state-of-the-art (SOTA) approaches. In addition, some datasets used for these models are presented along with their main features, and the commonly used evaluation methods are introduced. Finally, several potential research directions are suggested.
Due to practical needs, fine-grained image classification (FGIC), which aims to subdivide images belonging to one coarse-grained category into multiple fine-grained classes, has been studied for many years as a direction in computer vision. Traditional fine-grained image classification algorithms rely heavily on annotations. Recently, with the popularity and development of deep learning, convolutional neural networks (CNNs) have opened up unprecedented opportunities for this research direction. This study first introduces the definition and research significance of the problem and the development history of various fine-grained image classification algorithms. It then compares and analyzes the different algorithms under strong supervision and weak supervision, respectively. The paper also compares the accuracy of these models on frequently used datasets. We conclude by summarizing and evaluating the different aspects of these algorithms and then discussing possible future research directions and challenges in this field.
In late 2019, the novel viral disease coronavirus disease 2019 (COVID-19) first appeared. On March 11, 2020, the World Health Organization (WHO) declared the COVID-19 outbreak a pandemic. The disease rapidly spread to every corner of the globe. This paper examines the practical application of big data in epidemic early warning. Based on an analysis of the value of big data epidemic early warning mechanisms, the paper divides current big data epidemic early warning systems into three main categories according to their channels of data acquisition: early warning systems based on the Internet and communication systems, early warning systems based on electronic medical information, and early warning mechanisms based on Internet of Things information collection.