Intelligent Classification and Personalized Recommendation of E-commerce Products Based on Machine Learning

With the rapid evolution of the Internet and the exponential proliferation of information, users encounter information overload and the conundrum of choice. Personalized recommendation systems play a pivotal role in alleviating this burden by aiding users in filtering and selecting information tailored to their preferences and requirements. Such systems not only enhance user experience and satisfaction but also furnish opportunities for businesses and platforms to augment user engagement, sales, and advertising efficacy.This paper undertakes a comparative analysis between the operational mechanisms of traditional e-commerce commodity classification systems and personalized recommendation systems. It delineates the significance and application of personalized recommendation systems across e-commerce, content information, and media domains. Furthermore, it delves into the challenges confronting personalized recommendation systems in e-commerce, including data privacy, algorithmic bias, scalability, and the cold start problem. Strategies to address these challenges are elucidated.Subsequently, the paper outlines a personalized recommendation system leveraging the BERT model and nearest neighbor algorithm, specifically tailored to address the exigencies of the eBay e-commerce platform. The efficacy of this recommendation system is substantiated through manual evaluation, and a practical application operational guide and structured output recommendation results are furnished to ensure the system's operability and scalability.


Introduction
The personalized recommendation systems employed by companies like Amazon, Netflix, and various online businesses are vigorously promoted through articles, competitions, and other means.These recommendation systems have become exceedingly popular in the e-commerce sector, significantly boosting metrics such as UV, [1]PV, GMV click conversion, and order conversion by 400%-500% after their implementation.Such technology not only enhances online commodity businesses but also widens the scope of personalized recommendation technology adoption.The distinct characteristics of commodity recommendation highlight significant differences between commodities and articles or news, with commodities possessing inherent transactional attributes.Inaccurate recommendations hinder user interaction, making it challenging for users to click, browse, and ultimately make purchases.[2]Given that the cost of conversion for purchases far exceeds that of news and articles, precision becomes paramount in product recommendations.Typically, product recommendations rely on user preference models derived from browsing history, orders, shopping cart additions, searches, likes, favorites, comments, and other behaviors.These models are trained offline using logistic regression (LR) and calculated via map reduce.To provide real-time feedback to users, user client reporting tracks their browsing, clicking, and shopping cart activities.Real-time user preferences are generated through storm or spark streaming calculations, despite the considerable challenge of managing both product and user relationships and the associated demands on storage space and access performance for online services.Therefore, the application of machine learning in e-commerce has broad prospects and provides huge development opportunities for e-commerce enterprises.First, machine learning can realize personalized recommendation systems by analyzing massive user data and product information, thereby improving users' shopping experience and purchase conversion rate.For example, by analyzing users' browsing history, purchase behavior and other data, [3]Amazon has realized highly personalized product recommendations, which has greatly improved users' purchase willingness and loyalty.Secondly, machine learning can also help e-commerce enterprises optimize marketing strategies, achieve precision marketing and pricing strategies, and increase sales and profits.For example, Alibaba uses machine learning technology to analyze user behavior and accurately push coupons and promotions, effectively increasing sales.To sum up, the application prospect of machine learning in e-commerce is very broad, which will bring more business opportunities and competitive advantages to e-commerce enterprises.

Traditional e-commerce commodity classification
The main goal of the traditional e-commerce commodity classification system is to divide and classify the goods according to certain rules or standards according to static information such as the attributes, categories and labels of the goods, so that users can easily find and browse the goods in different categories.This classification method is usually based on the characteristics of the product, such as the category of the product, price, brand and other information.With this classification, users can more easily locate the product category they are interested in and browse related products.The main core of its work is to rely on the user's behavioral data, such as purchase history, browsing history, etc., as well as the property information of the product, to recommend the product that the user may be interested in.Often, these recommendations are based on factors such as the user's historical behavior and the similarity or popularity of the product.For example, if a user regularly buys athletic shoes, the system may recommend other types of sports equipment or other shoes of the same brand to the user.
In general, the traditional e-commerce commodity classification system mainly classifies goods through static information, providing users with a way to display goods in accordance with fixed standards, while the traditional e-commerce recommendation system recommends products that may be of interest to users based on user behavior data and commodity attribute information.
Although the traditional e-commerce commodity classification system can help users quickly locate the category of required goods to a certain extent, it also has some shortcomings.One of them is its relatively low efficiency, especially in the face of large-scale goods and user data, the system's response speed can become slower.[4][5][6]This is mainly because traditional classification systems usually rely on static commodity attribute and category information for classification, and this information may need to be constantly updated and maintained, resulting in bottlenecks in the system when processing data.
For example, suppose that the number of products on a traditional e-commerce website is very large, and it also involves several different product categories and subcategories.When the user visits the website, the system needs to classify and display a large number of goods in real time according to the user's needs and browsing behavior.However, due to the updating and maintenance of commodity information may take a certain amount of time and resources, the system may not be able to reflect the latest commodity classification information in a timely manner, resulting in difficulties for users in the browsing process, and even affecting the user's shopping experience.
In addition, the traditional e-commerce commodity classification system may also have inaccurate or imperfect classification problems.[7]As the attributes and categories of goods may be more complex and diverse, the traditional classification system may not be able to accurately classify the goods, resulting in some goods being wrongly classified into inappropriate categories, which brings confusion and confusion to users.Although the traditional e-commerce commodity classification system provides the convenience of commodity display and browsing to a certain extent, its shortcomings such as low efficiency and inaccurate classification still exist, which need to be further optimized and improved to enhance user experience and system performance.

Personalized recommendation system
The Personalised Recommendation System is an advanced business intelligence platform designed to provide users with personalised information services and decision support by analysing users' interests, purchasing behaviour and other relevant data.The Personalised Recommendation System builds a user interest model by mining large amounts of data, and makes personalised recommendations to users based on the model, helping users to find the goods or information they may be interested in, thus improving shopping efficiency and user experience.Such systems typically include content-based recommendation approaches, where content-based recommendation systems use product or user metadata to generate recommendations, while collaborative filtering uses the behaviour of user groups to recommend products to other users.Personalised recommendation systems not only improve user satisfaction and company sales, but also help solve the problem of information overload and help users find those long-tail goods that may have been overlooked.
At present, personalized recommendation system has become a key technology in the fields of ecommerce, social media and digital entertainment.With the rapid development of the Internet and the emergence of big data, personalized recommendation systems can analyze users' behaviors and interests, provide users with customized recommendation content, and improve user satisfaction and interactive experience.
Personalized recommendation system can usually be applied to e-commerce, content information, video, advertising and other recommendation scenarios.
1)E-commerce: the biggest advantage of e-commerce is the convenience of shopping, but with tens of thousands of Internet users continue to promote the development of e-commerce, while leaving tens of thousands of information data on the Internet, users tend to lose themselves in this massive information space, have to spend more time to find their favorite goods.This is the problem of information overload.[8]At present, e-commerce platforms such as Amazon, Taobao and Jingdong have responded to this problem by establishing personalized recommendation systems.The personalized recommendation system can solve the three major problems of cold start for new users, cold start for new goods and cold start for new scenes of the platform, provide shopping convenience for users to the maximum extent, and improve the conversion rate for the platform.One of the most well-known is that 30% of e-commerce giant Amazon's annual revenue comes from personalized recommendations.
2)Content Information Scenario: Content information recommendation is an important application area of personalised recommendation system.
In recent years, the number of Internet news users has reached 771 million.At present, the most representative products in China now have more than 300 million monthly active users of daily headlines, 573 million monthly active users of Weibo, and 100 million monthly active users of Zhihu.
Therefore, it is of great significance to use the personalised recommendation system to solve the problem of information overload in the field of content information.Through the personalised recommendation system, it can solve the two major problems of low user activity and difficult retention of new users, and help users discover new content in time, reduce user reading fatigue, and increase user dwell time.Linkedin has seen ten times more consistent growth through personalised recommendations.
3) Media scene: Personalized recommendation of video scene is playing an increasingly important role in the Internet field, especially on short video platforms.According to the latest data, in 2024, the monthly active users of Internet integrated video have reached 842 million, while the monthly active users of popular products such as Douyin short video have reached 400 million, and the monthly active users of [9]Kuaishou short video have also reached 250 million.In this huge user group, personalized recommendation system plays a key role.
Personalized recommendation system can effectively solve the problem of 28 effect of short video platform.The so-called 28 effect means that a few head videos occupy the vast majority of the traffic, resulting in a low exposure rate of long-tail videos.With the personalized recommendation system, short video platforms can avoid this situation, while ensuring the efficient distribution of new videos, making the traffic distribution of video content more balanced and reasonable.
For example, YouTube is a typical short video platform that utilizes a personalized recommendation system.By analyzing a user's browsing history, viewing habits, likes and other behavioral data, YouTube is able to recommend the most relevant and interesting video content to each user.This personalized recommendation not only improves the user experience, but also drives the growth of the platform.According to statistics, YouTube can add hundreds of thousands of hours of viewing time per day, and the number of video views has increased by more than 50% per year.
Therefore, the emergence of personalized recommendation system has overcome the static information limitations of traditional recommendation system, and has brought great progress for ecommerce, content information and media industries.In the field of e-commerce, personalized recommendation system solves the problems of inefficient and inaccurate classification of traditional commodity classification system, provides users with a more personalized shopping experience, and greatly improves shopping efficiency and user satisfaction.In the field of content information, personalized recommendation system solves the problem of information overload and low user activity, helps users discover new content, reduces reading fatigue, and improves user retention and platform activity.In the media scene, the personalized recommendation system effectively solves the 28 effect of the short video platform, realizes the balanced and reasonable distribution of video content traffic, and promotes the rapid growth of the platform.In summary, the personalized recommendation system has brought more efficient and intelligent services to various industries, and has become an important driving force to promote the development of the industry.

E-commerce personalized recommendation system and challenges
In the big data environment, there may be problems with data quality, such as noise, missing values, inconsistencies, etc., which will affect the accuracy and reliability of data mining and analysis.In the process of data mining and big data analysis, privacy and security issues are an important consideration, and it is necessary to protect personal privacy and the security of sensitive information.Processing large-scale data sets and complex computing tasks requires powerful computing and storage capabilities, and for some application scenarios, the results of data mining and big data analysis need to be interpreted and interpreted, which poses challenges for black box models such as deep learning.Despite their benefits, e-commerce personalized recommendation systems face several challenges.One of the primary challenges is data privacy and security concerns.As these systems rely heavily on user data, ensuring the privacy and security of this data is crucial.With increasing regulations and consumer awareness regarding data privacy, e-commerce platforms must navigate this challenge carefully to maintain user trust and compliance with regulations.
Another challenge is algorithmic bias and fairness.Personalized recommendation algorithms may inadvertently reinforce biases or discrimination based on factors such as race, gender, or socioeconomic status.Addressing algorithmic bias requires careful algorithm design and continuous monitoring to ensure recommendations are fair and inclusive.Scalability is a significant challenge for e-commerce personalized recommendation systems.As e-commerce platforms grow and accumulate more users and products, the volume of data to process and the computational resources required for personalized recommendations increase exponentially.
Developing scalable algorithms and infrastructure to handle this growth is essential to maintain system performance and responsiveness.
Additionally, the cold start problem poses a challenge for e-commerce personalized recommendation systems, especially for new users, new products, and niche categories.Without sufficient historical data, it can be challenging to provide accurate and relevant recommendations, leading to a suboptimal user experience.Overcoming the cold start problem requires innovative approaches, such as hybrid recommendation techniques combining collaborative filtering and contentbased methods, as well as leveraging auxiliary data sources.
In summary, while e-commerce personalized recommendation systems offer significant benefits for both users and businesses, they also face various challenges, including data privacy, algorithmic bias, scalability, and the cold start problem.Addressing these challenges requires a multidisciplinary approach, combining expertise in data science, ethics, and engineering to build robust and effective recommendation systems in the e-commerce domain.

Methodology
BERT has been popular in the nlp field since google announced its excellent performance in 11 NLP tasks at the end of October 2018.In this article, we will examine the role and advantages of BERT model in e-commerce personalized recommendation system, and understand how it works through the model used by eBay.It also has great reference value for students in other fields.The advantage of BERT model in personalized recommendation system lies in its strong semantic understanding ability.

BERT Model Introduction
BERT is an algorithmic model that has broken records for a number of natural language processing tasks.Shortly after BERT's paper was published, the Google team also opened up the code for the model and provided some ways to download the algorithm model pre-trained on a large data set.Google open source this model and provide pre-trained models, which makes it available to anyone to build an algorithm model involving NLP, saving a lot of time, effort, knowledge, and resources needed to train language models.Because BERT can deeply understand the semantic information of the text, it can more accurately capture the semantic relationship between the user and the product, thus improving the accuracy and personalization of the recommendation system.The training of BERT Model can be divided into two parts: Masked Language Model and Next Sentence Prediction.In this algorithm, we migrate these tasks to e-commerce recommendation tasks: Masked Language Model on Item Titles and Next Purchase Prediction.

Figure 1. Training architecture diagram of BERT model
Masked Language Model on Item Titles: [10]The distribution of tokens in the project title on the ecommerce platform is very different from the distribution of tokens in the natural language corpus used for BERT model pre-training.In order to better understand the semantic information in the ecommerce recommendation environment, we take this task as one of the goals to retrain the model.In the training process, we follow Devlin et al. 's scheme and MASK 15% tokens in the project title, then the loss function of this part is as follows: Next Purchase Prediction: In the BERT model, the Next Sentence Prediction is used to predict whether A sentence A is the last sentence of another sentence [11]B.We turn this into A recommendation problem: given a seed item A, predict whether another item B is the next item the user will click to buy.Replace sentence A in the original model with the title of item A and sentence B in the original model with the title of item B, concatenating the titles as input to the model.We collect the training data of the model from the historical behavior data of users.For seed items, the items purchased in the same user session are taken as positive examples, and negative examples are randomly selected from inventory.Thus, given the set of positive cases and the set of negative cases, the loss function of this part is: The joint loss function of the two parts of the model is as follows:

Experimental design
In this notebook, we adhere to the CRISP-DM (Cross-Industry Standard Process for Data Mining) pipeline, aiming to develop a recommendation system tailored for eBay's customers.The objective stems from addressing the challenge posed by eBay's extensive product offerings, which could overwhelm users and potentially deter them from completing purchases.[12] By crafting an effective recommendation system, we strive to streamline the browsing experience, thereby increasing user satisfaction and potentially reducing environmental impact by minimizing unnecessary returns and associated transportation emissions.
In this notebook, to develop a recommendation system tailored for eBay's customers.The online store's extensive product offerings can overwhelm customers, potentially leading them to abandon their search.Addressing this challenge not only enhances user experience but also has positive environmental implications, such as reducing emissions from transportation due to fewer returns.The resources include images of [13]products, metadata for each product and customer, and transaction data.The transactions_train.csv file contains product-customer entries that are not unique, meaning there can be multiple transactions for the same product by the same customer.Success in this project will be measured by the Mean Average Precision (MAP@12) metric.The higher the MAP@12, the better the recommendation system performs.Missing customer_ids in transactions_train.csvwill be excluded from scoring.Additionally, we will manually test if the model recommendations make sense.

Data sets and preprocessing
The impact of missing values is further investigated, particularly in the `detail_desc`column, by examining the distribution of missing descriptions across different categories (`index_name`).

Figure 2.
Histogram of distribution of the original data set After comparing the histograms of descriptions with and without missing values, it's evident that the discrepancies in the counts of all index_names are minimal.The maximum number of missing entries per index_name is approximately 90, whereas even the smallest index_name (like "Sport") has around 2500 entries.Consequently, the missing descriptions are deemed insignificant.
Therefore, the decision has been made to utilize the entries without descriptions, despite constituting a small subset of the entire dataset.[14]This choice is grounded in the understanding that real-world scenarios may entail products without descriptions that could still be suitable for customers.Hence, relying on other categorical features is deemed sufficient.
In the preprocessing step, the lengths of sequences, such as product names and descriptions, were analyzed.The NaN values in the "detail_desc" column were filled with empty strings, and a function was devised to compute the length of text sequences.Multiple whitespaces were substituted with a single whitespace, and surrounding whitespace was removed before determining the sequence length.Subsequently, the distribution of description lengths was visualized using a histogram to gain insights into the data.

Figure 3. Data distribution model
In conclusion, the analysis conducted in this section has provided insights for the preprocessing pipeline to prepare the data for model input.The steps involved include filling [15]NaN values in the 'detail_desc' column with empty strings, merging selected textual columns into a new column named 'text', and converting the text to lowercase.Subsequently, the text values will be embedded into vectors using BERT.Then, a left join operation will be performed between the transactions dataframe and the articles dataframe to acquire the new vector column.

Modeling
This section explains how to convert text into vector representations using the [16] BERT model, and uses the nearest neighbor algorithm in Sklearn to recommend products to customers.It first uses BERT to embed the text as a vector, then uses the nearest neighbor algorithm to find other products similar to the customer's purchase behavior based on their previous purchases, and finally provides the customer with a list of recommended products.In this step, models are evaluated and compared through manual evaluation, and the results of manual evaluation show that the recommendation system performs well.By comparing the purchase record of a specific customer with the system recommendation, it is found that the recommended product type is consistent with the customer's purchase preference.The system recommends a variety of different types of clothing, such as dresses and skirts, which coincide with a particular customer's purchase history.This indicates that the recommendation system can provide personalized recommendations according to customers' previous purchase behaviors, thus increasing customers' satisfaction and purchase intention.
Combining the results of the manual evaluation, use guidelines, and submission preparation, the following research conclusions can be expanded: Through manual evaluation, the effectiveness of the recommendation system is confirmed, and it is found that the products recommended by the system are highly consistent with customers' purchase preferences, which provides strong support and verification for the recommendation algorithm.The user guide section describes in detail how to operate and use the system, providing clear guidance for practical applications, and users can easily use the system to provide personalized product recommendations for customers.[17]The submission preparation part shows the structured data output by the system, and provides the operation steps and sample code for submitting the recommendation results to the system, so that the research results can be successfully applied in practice and provide more intelligent and personalized recommendation services for e-commerce and other fields.Therefore, the recommendation system has not only been fully verified in theory, but also has operability and scalability in practical application, which provides users with better shopping experience and brings more business value to merchants.

Experimental conclusion
By using BERT model for text vector representation and nearest neighbor algorithm for product recommendation, we designed a personalized recommendation system to solve the problem of eBay ecommerce platform.[18]Through this recommendation system, we can effectively improve the browsing experience of users and reduce the situation that users may give up buying because of search difficulties.At the same time, the recommendation system also helps to reduce unnecessary returns and associated transport emissions, thus having a positive impact on the environment.Through manual evaluation, we confirmed the effectiveness of the recommendation system and found that the types of products recommended were highly consistent with eBay users' purchase preferences.Combined with the usage guide and submission preparation, we provide clear operational guidance and structured output of recommendation results for [19]practical applications, thus ensuring the operability and scalability of the recommendation system.Therefore, our research results not only provide an effective solution, but also have feasibility and practicability in practice, which can provide more intelligent and personalized shopping recommendation service for eBay platform users.

Conclusion
In our proposed algorithm, each project is represented not by a conventional identifier but by a vector representation derived from the semantic understanding of its title tokens.This enables the algorithm to gather and group together projects that share similar semantic attributes, facilitating more nuanced and accurate recommendations.By harnessing the semantic richness captured by the BERT model, our [20]algorithm transcends the limitations of traditional approaches, which often struggle to discern meaningful relationships between a large number of items.By doing so, we effectively capture and consolidate the information pertaining to similar projects, thereby addressing longstanding challenges such as long-tail recommendation and the cold start problem inherent in traditional collaborative filtering recommendation algorithms.
Through extensive experimentation conducted on real-world, large-scale datasets, we have empirically validated the efficacy of our algorithm.Our results demonstrate significant advantages in terms of understanding the nuanced semantic information inherent in items and effectively learning the intricate relationships between a multitude of projects.[21]By leveraging the semantic richness of item titles and historical user behavior data, our algorithm offers enhanced recommendation accuracy and relevance, thus enriching the user experience and bolstering engagement on e-commerce platforms.
Therefore, personalized recommendation system is expected to become the core technology in the field of e-commerce, bringing more business opportunities and competitive advantages to e-commerce enterprises.Personalized recommendation system also has important application value in the field of content information.With the increasing number of Internet news users, users are faced with the problem of information overload and difficulty in obtaining content.Personalized recommendation system can recommend content information in line with users' preferences according to their interests and behaviors, help users discover new content in time, reduce reading fatigue, and improve user retention and platform activity.LinkedIn, for example, has achieved tenfold consistent growth through a personalized recommendation system.Therefore, the application of personalized recommendation system in the field of content information will further enhance user experience and platform value.

Table 1 .
using all customer IDs.