Efficient vehicular networks offloading using Hybrid Localization Algorithm and Deep Reinforcement Learning

Abstract

In the context of growing urbanization and increased vehicular traffic, the demand for efficient computation and location-based services is paramount. This paper proposes a pioneering solution to address the challenges of precise location services in Vehicular Networks within urban settings. The system combines a Hybrid Localization Algorithm (HLA) that integrates multiple methods for improved location accuracy with Deep Reinforcement Learning (DRL) for intelligent and adaptive offloading decisions based on real-time traffic conditions. Extensive simulations demonstrate the effectiveness of our approach in reducing response times, optimizing offloading strategies, and alleviating peak-hour navigation pressure in urban vehicular networks. This research paves the way for enhanced location-based services and intelligent transportation systems in urban areas.


Introduction
The exponential growth of urbanization and the escalating vehicular traffic have resulted in a critical demand for efficient computation and location-based services. Urban settings present distinctive challenges for providing precise location services in Vehicular Networks. Accurate and reliable positioning is essential for diverse applications, including navigation, emergency services, and traffic management. This paper introduces an innovative solution designed to address these challenges and revolutionize location services within urban environments.
In response to the complexities of urban Vehicular Networks, we introduce a novel system that synergizes two cutting-edge technologies: the Hybrid Localization Algorithm and Deep Reinforcement Learning. The Hybrid Localization Algorithm integrates multiple methods, harnessing the unique advantages of each to achieve superior location accuracy compared to traditional approaches. Concurrently, Deep Reinforcement Learning facilitates intelligent and adaptive offloading decisions grounded in real-time traffic conditions, thereby ensuring resource efficiency and enhanced system performance.
The remainder of the paper is organized as follows. In Section II, we review related work. In Section III, we present the system model. In Section IV, we introduce our proposed deep reinforcement learning methods. In Section V and Section VI, we evaluate the performance of our methods, simulate the whole vehicular networking system, and then introduce our future work. Finally, the conclusion is drawn in Section VII.

Related work
With the development of wireless network technology, urban vehicular networks place higher demands on real-time, accurate vehicle positioning. In paper [1], the mean square error (MSE) was used to position vehicles under different dimensions and variances. To address limited communication bandwidth and computation resources, [2] proposes a cooperative localization algorithm based on vehicle-road cooperative communication exchange; the algorithm has low computational complexity, easy implementation, stable performance, and high reliability. Article [3] proposes a distributed cooperative vehicular localization framework with truth discovery, which assists vehicles in learning which neighboring nodes they should cooperate with and which to ignore. However, existing results do not take into account that different tasks have different requirements for positioning accuracy. For location information requiring different accuracies, matching algorithms can be used to make the most efficient use of computational resources. In a congested vehicle environment, each vehicle is an individual but can be located through contact with other vehicles. Paper [4] found that adjacent vehicles can cooperate to complete positioning tasks, and proposes an approach coined Team Channel-SLAM Evolution (TCSE) that exploits the interrelationships between virtual transmitter locations. Paper [5] argues that cooperative vehicles can adapt to the environment and orient themselves better than individual vehicles; to improve the accuracy of information connection, an improved centralized and cooperative monocular simultaneous localization and mapping (CCM-SLAM) method is proposed. To cope with GNSS (Global Navigation Satellite Systems) outages, paper [6] establishes a fusion localization framework of GNSS and on-board sensors that achieves accuracy below half a meter during a 5 s GNSS outage. Another article introduces high-precision self-localization, a basic foundation of intelligent vehicles; in [7] the author uses the Monte Carlo localization algorithm to obtain an optimal estimate of the vehicle position.
Because limited computing resources can no longer fully meet the needs of vehicle communication, information transmission suffers from high latency, high energy consumption, and low feasibility. With the explosive growth of global mobile traffic, traditional fixed cloud calculators may not deliver instructions to vehicles quickly enough. Therefore, deep reinforcement learning algorithms have been combined with a variety of technologies to form new models for the computation offloading problem. In article [8], a joint task offloading and task migration optimization (JCOTM) algorithm based on deep reinforcement learning is proposed to reduce task processing delay and optimize computation offloading. In [9], the authors propose a shared offloading strategy based on deep reinforcement learning to reduce task offloading delay and energy consumption in complex vehicular computing environments. For the problem of limited communication resources, [10] proposes a vehicular fog computing (VFC) model in which vehicles serve as communication and computing infrastructure, enriching communication resources and making better use of each vehicle's computation offloading. In [11], the authors prioritize experiences so as to achieve low task service time and high load balance, attaining high Quality of Experience (QoE). In paper [12], a hybrid task offloading scheme (HyTOS) based on deep reinforcement learning is proposed that considers delay constraints and resource requirements. To improve vehicle communication, a cooperative computation offloading and resource management approach is proposed in [13], where a deep reinforcement learning (DRL) algorithm, namely Asynchronous Advantage Actor-Critic (A3C), is used to optimize the system model. In [14], a virtual platform for vehicle trajectory prediction based on deep neural networks is developed to achieve reasonable allocation of computing resources.
To cope with tasks with different localization accuracies, we use a hybrid localization algorithm. To meet the demand for computation offloading under heavy traffic flow, we consider a decision-making method that guides each vehicle in choosing whether to communicate with another vehicle or with a base station, selecting the most efficient communication link. We deploy Deep Reinforcement Learning in a local data server to reasonably distribute computing and communication resources across the whole vehicular networking system. Experiments show that our proposed algorithm effectively optimizes these decisions, achieving rational utilization of computing resources and maximizing spectrum utilization; the improvement is demonstrated by simulation. The main contributions of our work are as follows:
• We present a comprehensive system model that integrates localization, offloading, and vehicular communication to optimize the performance of the localization and edge computing system.
• We propose a data-driven approach using a hybrid localization and DRL algorithm that considers both the localization and offloading strategies, enabling efficient and adaptive decision-making in a dynamic vehicular environment.
• We demonstrate the effectiveness of our approach through extensive simulations, showing significant improvements in system performance compared to existing methods.
System model

Suppose there are k vehicles, and the computational power of each vehicle can be expressed as f^veh_k. There are N RSUs in total, and the computation rate of each RSU assigned to a task is f^RSU_{n,l}. The computation rate of the system for a task assignment is the sum of the local computation rate and the transmission rate from the vehicle to the RSU. Let {C_1, C_2, C_3, ..., C_L} represent L types of tasks, where each content includes three features {s_l, τ_l, β_l}: s_l and τ_l are the size of the content and the maximum allowed access latency to obtain the content, and β_l is the popularity of the content. The contents can be computed at three different types of nodes: in the local vehicle, in other vehicles, or in RSUs.

Communication and computation model
In the vehicular communication system, there is uplink and downlink communication between vehicles and between vehicles and RSUs. For the communication between vehicle i and vehicle m, the uplink signal-to-noise ratio γ_{i,m} is calculated as

γ_{i,m} = p g ξ d^(−α) / σ²_{i,m},

where p is the transmission power between the two vehicles, g is the antenna gain at vehicle i, ξ is the transmission loss at a reference unit distance, d is the distance between the two vehicles, α is the path-loss exponent, and σ²_{i,m} is the energy of the Gaussian white noise introduced during transmission. The downlink signal-to-noise ratio γ_{m,i} satisfies the analogous equation. The uplink and downlink transmissions obey Shannon's formula, so the uplink transmission rate r_{i,m} is

r_{i,m} = B_{i,m} log₂(1 + γ_{i,m}),

where B_{i,m} is the uplink channel bandwidth between vehicle i and vehicle m. The downlink transmission rate satisfies

r_{m,i} = B_{m,i} log₂(1 + γ_{m,i}),

where B_{m,i} is the downlink channel bandwidth between vehicle i and vehicle m. The uplink and downlink communication between a vehicle and an RSU has a similar relationship with each parameter; the signal-to-noise ratios γ_{i,n} and γ_{n,i} of the uplink and downlink channels are

γ_{i,n} = p g ξ d^(−α) / σ²_{i,n},   γ_{n,i} = p g ξ d^(−α) / σ²_{n,i},

where p is the transmission power between the vehicle and the RSU, g is the antenna gain at the vehicle, ξ is the transmission loss at a reference unit distance, d is the distance between the vehicle and the RSU, and σ²_{i,n} is the energy of the Gaussian white noise introduced during transmission. The uplink and downlink transmission rates between the vehicle and the RSU are r_{i,n} and r_{n,i}, which satisfy

r_{i,n} = B_{i,n} log₂(1 + γ_{i,n}),   r_{n,i} = B_{n,i} log₂(1 + γ_{n,i}),

where B_{i,n} is the uplink channel bandwidth and B_{n,i} is the downlink channel bandwidth.

The task content to be computed in vehicular communication is divided into different types and offloaded to local computation in the vehicle, computation in other vehicles, or computation in an RSU. The delay incurred by completing the computation of task l locally in vehicle i is

d_{l,i,i} = η_l λ_{i,i} s_{l,i,i} / f^veh_k,

where f^veh_k is the computation speed of vehicle k, η_l obeys a two-point distribution Pr(η_l = 1) = β_L, Pr(η_l = 0) = 1 − β_L with β_L the probability of generation, λ_{i,i} is the proportion of task l that vehicle i offloads to local computation, and s_{l,i,i} is the size of task l offloaded by vehicle i. When computational task l is offloaded to another vehicle m, the delay d_{l,i,m} includes both transmission and computation:

d_{l,i,m} = η_l λ_{i,m} s_{l,i,m} / r_{i,m} + η_l λ_{i,m} s_{l,i,m} / f^veh_m.

When computational task l is offloaded to the RSU, with f^RSU_{n,l} the computation rate assigned to task l by the RSU, the resulting delay is

d_{l,i,n} = η_l λ_{i,n} s_{l,i,n} / r_{i,n} + η_l λ_{i,n} s_{l,i,n} / f^RSU_{n,l}.

The maximum delay among the three computational allocation methods is taken as the total delay generated by computational task l in vehicular communication:

d_l = max{d_{l,i,i}, d_{l,i,m}, d_{l,i,n}}.

The total system delay d_total is the sum of the total delays d_l over all tasks and all vehicles of the system:

d_total = Σ_i Σ_l d_l.

The tasks to be processed are fed into the deep reinforcement learning algorithm to obtain the computational offloading scheme with the lowest delay, and the resulting computational resource allocation must satisfy max{d_{l,i,i}, d_{l,i,m}, d_{l,i,n}} ⩽ τ_l, where τ_l is the maximum allowed delay. The proposed vehicular edge computing problem can therefore be expressed as minimizing the total delay of the system:

minimize d_total
subject to max{d_{l,i,i}, d_{l,i,m}, d_{l,i,n}} ⩽ τ_l, for all l.
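As an illustrative sketch of the delay model above, the following Python snippet computes Shannon-formula link rates and the per-task delay as the maximum over the three allocation paths (local, other vehicle, RSU). The function names and the transmit-then-compute form of the remote delay are our assumptions for demonstration, not the paper's code.

```python
import math

def shannon_rate(bandwidth_hz, snr):
    """Link rate from Shannon's formula: r = B * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1 + snr)

def local_delay(task_bits, share, f_local):
    """Delay of computing the locally kept share of a task."""
    return share * task_bits / f_local

def remote_delay(task_bits, share, rate, f_remote):
    """Transmit the offloaded share, then compute it at the remote node."""
    bits = share * task_bits
    return bits / rate + bits / f_remote

def task_delay(task_bits, shares, rates, speeds):
    """Total delay of one task = max over the three allocation paths,
    as in the system model. shares = (local, other vehicle, RSU);
    rates = (vehicle link, RSU link); speeds = (local, vehicle, RSU)."""
    d_local = local_delay(task_bits, shares[0], speeds[0])
    d_veh = remote_delay(task_bits, shares[1], rates[0], speeds[1])
    d_rsu = remote_delay(task_bits, shares[2], rates[1], speeds[2])
    return max(d_local, d_veh, d_rsu)
```

For example, with a 1 Mbit task split 50/25/25 across the three paths, the slowest path determines the task delay, which is what the DRL agent tries to minimize.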

Hybrid Localization algorithm
We propose a novel localization algorithm that incorporates both multilateration and Extended Kalman Filter techniques. This hybrid approach dynamically switches between the two methods based on a user-defined threshold. When the localization accuracy requirement falls below the threshold, the algorithm automatically switches to the multilateration algorithm, which excels at providing accurate position estimations under certain conditions. Conversely, when the accuracy requirement surpasses the threshold, the algorithm switches to the more sophisticated Extended Kalman Filter technique, which offers improved performance and robustness in challenging localization scenarios. This adaptive approach ensures good localization results in a wide range of environments and conditions, striking a balance between accuracy and computational efficiency.

Multilateration algorithm
The distance between anchor i and tag t can be expressed as

d_i = √((x_i − x_t)² + (y_i − y_t)²).

Squaring each range equation and subtracting the equation of a reference anchor organizes this nonlinear system into a linear equation of the form

A x = b,

where row i of A is [2(x_i − x_1), 2(y_i − y_1)] and b_i = d_1² − d_i² + x_i² − x_1² + y_i² − y_1². Solving the above equation by the least squares method gives

x = (AᵀA)⁻¹ Aᵀ b.
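A minimal NumPy sketch of the linearized least-squares solution described above; the anchor layout and the helper name `multilaterate` are illustrative.

```python
import numpy as np

def multilaterate(anchors, dists):
    """Least-squares tag position from anchor coordinates and measured
    ranges, linearized by subtracting the first anchor's range equation."""
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    x1, d1 = anchors[0], dists[0]
    A = 2.0 * (anchors[1:] - x1)
    b = (d1**2 - dists[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(x1**2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# Example: tag at (2, 3) with four anchors and exact ranges.
anchors = [(0, 0), (10, 0), (0, 10), (10, 10)]
tag = np.array([2.0, 3.0])
dists = [np.linalg.norm(tag - np.array(a)) for a in anchors]
estimate = multilaterate(anchors, dists)  # ≈ [2. 3.]
```

With noisy ranges, the same call returns the least-squares estimate, which is why the overdetermined system (more anchors than unknowns) is preferred.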

EKF algorithm
The Extended Kalman Filter (EKF) has emerged as an effective choice for localization, particularly when dealing with nonlinear motion models and sensor measurements. The EKF is an extension of the Kalman Filter, which is applicable only to linear systems. Its key idea is to linearize the nonlinear system at each time step using a Taylor series expansion. By doing so, the nonlinear system can be transformed into a linear system, allowing the use of the standard Kalman Filter for state estimation. This makes the EKF well-suited for estimating the state of a vehicle in real time, taking into account nonlinearities in the motion model and sensor measurements, and providing accurate localization even in challenging environments where GPS signals may be unreliable. The state equation of the EKF can be expressed as

x_k = f(x_{k−1}, u_k) + w_k,

where x_k is the state vector at time step k, f(·) is a nonlinear function representing the system's state transition, u_k is the control input vector at time step k, and w_k is the process noise, representing the uncertainty in the system model. The observation equation can be expressed as

z_k = h(x_k) + v_k,

where z_k is the measurement vector at time step k, h(·) is a nonlinear function representing the observation, and v_k is the measurement noise, representing the uncertainty in the measurement process. The EKF algorithm proceeds through two main steps: the prediction step (time update) and the update step (measurement update).
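As a hedged illustration before the detailed equations, the two steps can be implemented generically as follows. The constant-velocity state model, the position-only observation, and the noise covariances are assumptions for demonstration only, and the control input u_k is omitted for simplicity.

```python
import numpy as np

def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One EKF cycle: predict with model f (Jacobian F), then update
    with measurement z via observation model h (Jacobian H)."""
    # Prediction (time update)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update (measurement update)
    y = z - h(x_pred)                      # innovation
    S = H @ P_pred @ H.T + R               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)    # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Illustrative constant-velocity model: state [px, py, vx, vy].
dt = 1.0
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1.0]])
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0.0]])   # observe position only
f = lambda x: F @ x                            # linear f, so Jacobian = F
h = lambda x: H @ x
x, P = np.zeros(4), np.eye(4)
Q, R = 0.01 * np.eye(4), 0.1 * np.eye(2)
x, P = ekf_step(x, P, np.array([1.0, 1.0]), f, F, h, H, Q, R)
```

For a genuinely nonlinear f or h, the Jacobians F and H would be re-evaluated at each predicted state, which is exactly the linearization described below.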
At each time step k, the EKF predicts the next state estimate and error covariance based on the state transition model:

x̂_{k|k−1} = f(x̂_{k−1|k−1}, u_k).

The linearization process involves calculating the Jacobian matrix F_k of the function f(·) with respect to the state vector x, evaluated at the predicted state x̂_{k|k−1}. This matrix is used to update the error covariance P_{k|k−1} as follows:

P_{k|k−1} = F_k P_{k−1|k−1} F_kᵀ + Q_k,

where P_{k−1|k−1} is the error covariance matrix at the previous time step k − 1 and Q_k is the process noise covariance matrix. After obtaining a new measurement z_k at time step k, the EKF uses the observation model to compute the predicted measurement ẑ_{k|k−1} = h(x̂_{k|k−1}). The Jacobian matrix H_k of the function h(·) is evaluated at x̂_{k|k−1} and used to calculate the Kalman gain K_k and update the state estimate:

K_k = P_{k|k−1} H_kᵀ (H_k P_{k|k−1} H_kᵀ + R_k)⁻¹,
x̂_{k|k} = x̂_{k|k−1} + K_k (z_k − ẑ_{k|k−1}).

Finally, the error covariance matrix is updated using the Kalman gain and the observation model:

P_{k|k} = (I − K_k H_k) P_{k|k−1},

where R_k is the measurement noise covariance matrix. The hybrid localization procedure for each task is then:

for each task do
    if Priority > Upper-threshold then
        Implement the EKF algorithm and predict the position
    end if
    if Lower-threshold < Priority < Upper-threshold then
        Implement the multilateration algorithm and predict the position
    end if
    if 0 < Priority < Lower-threshold then
        Remain the same
    end if
end for

Deep reinforcement learning algorithm

In the online stage, a two-dimensional state matrix S of a timeframe is obtained from the dataset. The required delay of each type of task, τ_1, τ_2, ..., τ_L, is included in the matrix. It is then expanded into a state vector s_t as the input of the policy network (PN). The PN predicts a best action vector x̂_t for the timeframe. Then x̂_t is quantized into R nearby action vectors for the Q-calculator to compute. The vector with the highest Q-value Q*_t is selected and regarded as x*_t. The corresponding vector pair {s_t, x*_t} is combined and stored in the replay memory as the input and target output of the PN.
After a number of timeframes pass, the online stage pauses and the offline stage starts. A batch of combined vectors is extracted from the replay memory to update the PN coefficients. The loss function is the mean square error (MSE).
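The online/offline interplay with the replay memory can be sketched as below. This toy version substitutes a linear map for the PN and a synthetic target for the RAG/Q-value search, purely to show the store-then-batch-fit structure with an MSE loss; all names and dimensions are illustrative assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
state_dim, act_dim = 8, 4
W = 0.01 * rng.standard_normal((act_dim, state_dim))  # toy linear "PN"
memory = deque(maxlen=1000)                           # replay memory
lr = 0.01

# --- online stage: observe a state, pick the best nearby action, store ---
for t in range(200):
    s = rng.standard_normal(state_dim)
    x_star = 0.5 * s[:act_dim]       # stand-in for the RAG/Q-value search
    memory.append((s, x_star))

# --- offline stage: sample batches from memory, fit PN by MSE gradient ---
for _ in range(500):
    idx = rng.integers(len(memory), size=32)
    S = np.array([memory[i][0] for i in idx])
    X = np.array([memory[i][1] for i in idx])
    pred = S @ W.T
    grad = 2 * (pred - X).T @ S / len(idx)  # d(MSE)/dW
    W -= lr * grad

mse = np.mean((np.array([m[0] for m in memory]) @ W.T
               - np.array([m[1] for m in memory]))**2)
```

After training, the stand-in PN reproduces the stored targets closely, mirroring how the real PN learns the {s_t, x*_t} pairs accumulated online.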

Policy Network
In our RG-DRL, the predicted action vector x̂_t is obtained by the policy π(θ_i), which is a deep neural network. It consists of one input layer, two linear hidden layers, and an output layer, all fully connected. The input layer corresponds to the state vector s_t, and the output layer corresponds to the predicted vector x̂_t. The hidden layers are of size 1024 × 1, with ReLU as the activation function. The output layer is instead activated by a Sigmoid in order to be normalized.

Random Generator and Replay Memory

Conventional DRL algorithms are designed for binary-choice policies. In offloading, however, without a foreseeable action space, continuous policies are difficult to construct. To create an action space for Q-value calculation, we apply a random action generator (RAG) in the algorithm.
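The PN architecture described above (fully connected, two 1024-unit ReLU hidden layers, sigmoid-normalized output) can be sketched in plain NumPy; the input/output sizes and the He-style initialization are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PolicyNetwork:
    """PN sketch: input -> 1024 -> 1024 -> output, fully connected,
    ReLU on hidden layers, Sigmoid on the output (normalized proportions)."""
    def __init__(self, in_dim, out_dim, hidden=1024, seed=0):
        rng = np.random.default_rng(seed)
        def layer(n_in, n_out):
            w = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
            return w, np.zeros(n_out)
        self.W1, self.b1 = layer(in_dim, hidden)
        self.W2, self.b2 = layer(hidden, hidden)
        self.W3, self.b3 = layer(hidden, out_dim)

    def forward(self, s):
        h1 = relu(self.W1 @ s + self.b1)
        h2 = relu(self.W2 @ h1 + self.b2)
        return sigmoid(self.W3 @ h2 + self.b3)

pn = PolicyNetwork(in_dim=16, out_dim=4)
x_hat = pn.forward(np.ones(16))   # each entry lies in (0, 1)
```

The sigmoid output keeps every predicted proportion in (0, 1), which is why it is used here rather than ReLU.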
When an action vector x̂_t is given by the policy network, it is copied into the RAG as the first element x_{t,1} of the action space. Then the RAG randomly generates a series of new vectors x_{t,i} near the former vector; each proportion in the action vector is changed within a maximum range w. As shown in Fig. 2, vectors at the top of the RAG are similar to x_{t,1}, whereas those at the bottom are different. After the action space is filled, the vectors are sent to the Q-value calculator. The Q-value is defined as

Q = Q_0 − λ_d P_delay − λ_m P_miss − λ_o P_overflow.

It consists of one initializing value Q_0 and three penalties: a delay penalty P_delay reflecting the delay caused by the action vector, a miss penalty P_miss for missing the required delay, and an overflow penalty P_overflow for allocating too many resources. Each penalty has a weight λ_d, λ_m, λ_o denoting its importance. Thus the best nearby action in the RAG is

x*_t = argmax_{x_{t,i}} Q(s_t, x_{t,i}).

The vector x*_t indicates that a better proportion exists; hence it is regarded as the target proportion in the timeframe for the PN to train and update the coefficients θ_i.
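The RAG and the Q-value selection can be sketched as follows. The penalty terms below are stand-ins (the paper's exact penalty formulas are not reproduced); only the structure, perturbing each proportion within range w, subtracting weighted penalties from Q_0, and selecting by argmax, follows the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def rag(x_hat, R=16, w=0.1):
    """Random action generator: x_hat plus R-1 perturbed copies, each
    proportion changed within +/- w and clipped to [0, 1]."""
    space = [np.clip(x_hat + rng.uniform(-w, w, x_hat.shape), 0, 1)
             for _ in range(R - 1)]
    return [x_hat] + space

def q_value(x, q0=1.0, lam=(0.4, 0.3, 0.3)):
    """Illustrative Q: initial value minus weighted delay/miss/overflow
    penalties. The penalty formulas here are placeholders."""
    p_delay = float(np.mean(x))                          # stand-in
    p_miss = float(np.mean(x > 0.9))                     # stand-in
    p_over = float(max(x.sum() - 1.0, 0.0))              # stand-in
    ld, lm, lo = lam
    return q0 - ld * p_delay - lm * p_miss - lo * p_over

x_hat = np.full(4, 0.25)
space = rag(x_hat)
best = max(space, key=q_value)   # x*_t = argmax over the action space
```

Because x̂_t itself is the first element of the space, the selected x*_t is never worse than the PN's own prediction under the chosen Q.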

Adaptive Settings
In the first third of the timeframes, the learning rate is α = 0.01 by default and the randomize range is w = 0.1; both can be set manually. Then, at each subsequent third of the timeframes, the learning rate and randomize range are both divided by 2. In addition, the RAG size R_t adapts based on the history {R_1, R_2, ..., R_{t−1}} to avoid unnecessary choices. Consequently, the nearby actions in the RAG differ less and less from the predicted one, and the PN is trained more finely as training goes on.

Evaluation

Note that new occasions may always appear in new timeframes, bringing unforeseen proportions to RG-DRL, and because of the random processes involved the real best action may be missed. This means conventional evaluations such as MSE no longer fit our algorithm.
Hence, we use the Q-ratio instead of a loss to evaluate our RG-DRL, where Q̂_t refers to the Q-value corresponding to x̂_t and the Q-ratio measures the normalized gap between Q̂_t and the best Q-value Q*_t found in the action space. The training procedure of RG-DRL is summarized as follows:

for each timeframe i do
    Select the optimal action x*_t by x*_t = argmax(Q)
    Store {s_t, x*_t} in the replay memory
    if i mod ∆ = 0 then
        Randomly sample a batch of data from the replay memory
        Train the policy network π_θ with the batch, updating θ_i with the Adam optimizer
    end if
end for

Numerical results
We validate our hybrid localization algorithm on a real dataset; the algorithm is implemented in Matlab. We randomly generate a series of tasks and assign each a value between zero and one. A value greater than 0.3 means that the task needs a real-time position: when the value is between 0.3 and 0.6, the task's location is obtained by the multilateration algorithm, and when the value is greater than 0.6, it is obtained by the EKF algorithm. A value less than 0.3 means that the task does not need location information. We then send the results to the DRL algorithm for task offloading. Fig. 3-4 show the location maps of the vehicles requiring location information after randomly generating 10 and 20 tasks, respectively. For the task offloading results, the Q-ratio over 100,000 and 1,000,000 timeframes is shown in Fig. 6 and Fig. 7, and the MSE loss over 1,000,000 timeframes is shown in Fig. 5. Outliers always appear due to the RAG, most likely because the maximum Q-value among the RAG vectors happens to be close to 0. Additionally, the MSE loss oscillates even at the end of training, supporting our earlier analysis. However, after 1,000,000 timeframes, despite the outliers, most of the Q-ratios given by the trained PN are less than 0.2 in the testing results.
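The priority-to-algorithm mapping used in the simulation (thresholds 0.3 and 0.6) can be sketched as below; the function name is illustrative.

```python
import random

def assign_algorithm(priority):
    """Map a task's random priority value to a localization method,
    following the thresholds used in the simulation."""
    if priority > 0.6:
        return "EKF"
    if priority > 0.3:
        return "multilateration"
    return "none"   # task does not need location information

random.seed(0)
tasks = [random.random() for _ in range(10)]
plan = [assign_algorithm(p) for p in tasks]
```

The resulting plan feeds directly into the hybrid localization step before the tasks are handed to the DRL offloading algorithm.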

Future work
In the future, we hope to collect more accurate and comprehensive data to further validate our methods. For the localization algorithm, we hope to design a fusion localization algorithm that combines both GPS and UWB technologies. For our DRL algorithm, one apparent problem is that the Q-ratio in RG-DRL does not converge quickly or robustly, owing to the instability introduced by the RAG as the price of continuous proportion prediction. A practical solution might be an improved RAG generating mechanism, such as storing some of the nearby actions generated by the RAG in the replay memory. Another path to better results is a change in the algorithm architecture: replacing the RAG and Q-value calculator with another DNN equipped with a random generator, thus constructing an actor-critic-like DRL. This actor-critic RG-DRL should be more robust to outliers, as well as allowing more appropriate weights in the Q-value calculation.

Conclusion
This research proposes an innovative solution to address the challenges of precise location services in Vehicular Networks within urban settings. The system combines a Hybrid Localization Algorithm that integrates multiple methods for improved location accuracy with Deep Reinforcement Learning for intelligent and adaptive offloading decisions based on real-time traffic conditions. Extensive simulations demonstrate the effectiveness of our approach in reducing response times, optimizing offloading strategies, and alleviating peak-hour navigation pressure in urban vehicular networks. This research paves the way for enhanced location-based services and intelligent transportation systems in urban areas, providing a foundation for future advancements in smart mobility and urban management.

Figure 2: The framework of deep reinforcement learning algorithm

Figure 3: The position of vehicles when 10 tasks were randomly generated

Table 1: Distribution of output.