The moose test is an evasive-maneuver test used to qualify how well a vehicle can avoid an object that suddenly appears on the road. The test was first performed in Sweden in the 1970s, when it was simply called "the evasive maneuver test"; the name "moose test" was first used in 1997 by the Swedish magazine Teknikens Värld. Today, the moose test is one of the most commonly used stability tests for new vehicles. During a moose test, the driver first releases the throttle, then performs a sharp lane-change maneuver and returns to the original lane. The test is successful if the car can avoid the obstacle – for example, the cones – without skidding or drifting. The performance indicator is the maximum speed at which the vehicle can complete the test; an average D-segment sedan manages 72–73 km/h. The record holder since 1999 is the Citroën Xantia Activa V6, which completed the test at 85 km/h, beating cars such as the 2008 Porsche 911 GT3 RS and the Audi R8 V10 Plus.
Machine learning, deep learning, and artificial intelligence have advanced rapidly since the beginning of the last decade. In parallel with classical control, planning, and decision algorithms, they are increasingly used to solve control tasks, particularly various vehicle control tasks. It is also clear that combining AI-based developments with control techniques can be an effective approach for autonomous vehicle control.
This is true for all vehicles, but especially for the currently popular SUV category with its high center of gravity, which needs to follow the most optimal route when performing a critical maneuver such as a double lane change. Several optimization criteria can be defined, such as minimizing jerk and lateral acceleration to increase safety and passenger comfort. The moose test defined by ISO 3888-2 is a good tool for testing the stability of a vehicle at its dynamic limits.
In the field of local trajectory planner design, an agent for determining the path of a double-lane-change maneuver has been developed using reinforcement learning. Classical control techniques were integrated into the reward function during the training process. The system was first tested in simulation and then at the ZalaZONE proving ground.
In reinforcement learning, an agent is placed in a simulation environment where, taking the state-space into account, it learns to act optimally. The usually long learning process consists of steps repeated over episodes to maximize the expected reward.
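This episodic loop can be sketched in a few lines. The environment and policy below are toy placeholders (the real simulation and agent are far more involved); only the reset/step/reward structure reflects the process described above.

```python
import random

class DummyEnv:
    """Toy stand-in for the lane-change simulation (hypothetical)."""
    def reset(self):
        return [random.random() for _ in range(3)]   # toy state
    def step(self, action):
        reward = -abs(action - 0.5)                  # toy reward in [-0.5, 0]
        return self.reset(), reward, True            # next state, reward, done

def train(episodes=100):
    """Repeat episodes, accumulating reward the agent tries to maximize."""
    env = DummyEnv()
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = random.random()                 # placeholder policy
            state, reward, done = env.step(action)
            total += reward
    return total / episodes

avg = train()
```

In the real setup, the random policy is replaced by a neural network that is updated from the collected rewards.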
The state-space contains the width, length, and position of the lane that the vehicle must traverse without touching the boundaries; in a field test, the lane boundaries can be marked with cones. The action-space consists of the arc-length and curvature parameters of the path, which is built from polynomial and straight sections. The state-space consists of 11 and the action-space of 10 continuous values.
We developed a learning agent for generating the optimal path for a given state, i.e., lane layout. This required a simulation environment in which a balance must always be found between model complexity and runtime, since an agent must be taught over hundreds of thousands, often millions, of iterations. In each iteration, the agent receives a different random but feasible or near-feasible state-space and predicts an action-space for it. Based on the action-space, a path generator determines the x, y coordinates of the route, and then feedback is needed on the quality of the planned route. This is the value of the reward.
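The path-generator step can be illustrated with a simplified parametrization. The actual action-space uses polynomial and straight sections; the sketch below assumes piecewise-constant curvature segments instead (a hypothetical simplification), integrating curvature along the arc length to obtain the x, y coordinates.

```python
import math

def path_from_segments(segments, ds=0.1):
    """Integrate (arc_length, curvature) segments into x, y points.

    segments: list of (arc_length, curvature) pairs -- a simplified
    stand-in for the polynomial and straight sections of the action-space.
    """
    x, y, heading = 0.0, 0.0, 0.0
    xs, ys = [x], [y]
    for length, kappa in segments:
        n = max(1, int(length / ds))
        step = length / n
        for _ in range(n):
            heading += kappa * step          # curvature = d(heading)/ds
            x += step * math.cos(heading)
            y += step * math.sin(heading)
            xs.append(x)
            ys.append(y)
    return xs, ys

# straight - left arc - straight: a crude half of a double lane change
xs, ys = path_from_segments([(10.0, 0.0), (8.0, 0.05), (10.0, 0.0)])
```

A real planner would also enforce curvature continuity between sections, which is why polynomial segments are used in the actual system.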
To train the agent, an MPC controller responsible for lateral control drives a modeled vehicle along the track. Longitudinal control is only applied up to the throttle release point because, according to the ISO standard, no acceleration is allowed during the moose test. The slip values, lateral acceleration, and the angle and distance deviations from the planned path are evaluated along the course; based on these, the value of the reward or penalty is determined. A TD3 (Twin-Delayed Deep Deterministic Policy Gradient) agent was implemented for training in a Python environment. The vehicle model and the solver, which are critical in terms of runtime, are called from a C library.
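A reward built from these quantities might look like the sketch below. The text only states which quantities enter the reward, not how they are weighted; all weights, the corridor threshold, and the penalty value here are hypothetical.

```python
def reward(lat_acc, slip, lateral_dev, heading_dev,
           w_acc=1.0, w_slip=5.0, w_lat=2.0, w_head=1.0,
           off_track_penalty=-100.0, max_dev=1.0):
    """Illustrative reward: penalize vehicle dynamics and tracking errors.

    lat_acc: lateral acceleration, slip: tire slip,
    lateral_dev / heading_dev: distance and angle deviation from the path.
    Weights and thresholds are hypothetical.
    """
    if abs(lateral_dev) > max_dev:           # left the corridor: large penalty
        return off_track_penalty
    return -(w_acc * lat_acc**2 + w_slip * slip**2
             + w_lat * lateral_dev**2 + w_head * heading_dev**2)
```

Squared terms keep the reward smooth near the optimum, which generally helps gradient-based agents such as TD3.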
The trained agent works with a neural network of 3 hidden layers and a few hundred neurons, suitable for fast prediction with low computational requirements. The agent has been implemented on a Jetson AGX Xavier running Robot Operating System 2, which receives the state-space elements over a CAN network and predicts the action-space. Based on the action-space, a dSPACE MicroAutoBox generates the track and executes the MPC controller. An RTK GNSS-based system is used for localization. A steering robot has been installed, and with the accelerator pedal removed, the engine control electronics receive the analog signal directly. All signals are logged for later evaluation.
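For scale, a forward pass through a network of the described shape (11 inputs, 3 hidden layers, 10 outputs) is only a handful of matrix products, which is why it runs comfortably on the embedded computer. The sketch below uses untrained random weights and assumed tanh activations; 256 neurons per layer is a guess within "a few hundred".

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP with the described shape (weights untrained)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def predict(layers, x):
    """Forward pass: a few matrix products and activations."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.tanh(x)          # hidden activation (assumed)
    return np.tanh(x)               # bound actions to [-1, 1] (assumed)

# 11 state inputs -> 3 hidden layers of 256 -> 10 action outputs
policy = mlp([11, 256, 256, 256, 10])
action = predict(policy, np.zeros(11))
```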
A series-production BMW M2 with the Competition package was used for the tests, which, after the modifications mentioned above, is suitable for throttle- and steer-by-wire operation. The installation of the instruments is shown in Figure 5.
The longitudinal and lateral control of the vehicle are separated: an MPC is responsible for lateral control, while the velocity is controlled by a PI controller. The application of MPC for path following of ground vehicles is becoming more popular because this control technique includes all the features necessary for accurate path tracking at high speed. The MPC predicts the behavior of the vehicle based on a vehicle model and has information about the path ahead of the vehicle. Furthermore, the MPC can handle actuator dynamics; in this case, the dynamics of the steering system are modeled and included in the MPC. In the MPC algorithm, a cost function is formulated based on the current and predicted motion of the vehicle, and the optimum of this function is sought, which yields the optimal steering command for accurate path following. In the cost function, the deviation from the reference states and the magnitude of the steering command are penalized. In our research, the MPC is applied to follow the path determined by the reinforcement learning algorithm.
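The cost function described above is commonly a quadratic form over the prediction horizon. The sketch below evaluates such a cost for given predicted states, reference states, and a steering sequence; the weighting matrices Q and R are hypothetical, and the actual formulation (including the steering-actuator model) is not reproduced here.

```python
import numpy as np

def mpc_cost(predicted_states, reference_states, steering_sequence,
             Q=None, R=0.1):
    """Quadratic MPC cost over the horizon (illustrative).

    predicted_states, reference_states: (N, n_states) arrays.
    Penalizes deviation from the reference and the steering effort,
    as described in the text. Q and R weights are hypothetical.
    """
    predicted_states = np.asarray(predicted_states, dtype=float)
    reference_states = np.asarray(reference_states, dtype=float)
    err = predicted_states - reference_states
    if Q is None:
        Q = np.eye(err.shape[1])                  # unit state weighting
    state_cost = np.einsum('ti,ij,tj->', err, Q, err)   # sum of e_t' Q e_t
    input_cost = R * float(np.sum(np.square(steering_sequence)))
    return state_cost + input_cost
```

The MPC solver then searches the steering sequence that minimizes this cost subject to the vehicle-model dynamics.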
The PI cruise control is applied until the vehicle reaches the throttle release point, which is at the beginning of the cone section. The vehicle starts from standstill and accelerates to a reference speed, which is held by the PI controller. When the vehicle reaches the throttle release point, the speed demand becomes zero, and the vehicle runs along the path with the gear selector in the "drive" position and the throttle command at zero.
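A discrete PI speed controller of this kind can be sketched as follows. The gains, sample time, and throttle saturation limits are hypothetical; clamping the output at zero reproduces the coasting behavior after the release point, where the speed demand drops to zero.

```python
class PIController:
    """Discrete-time PI speed controller (gains hypothetical)."""

    def __init__(self, kp=0.5, ki=0.1, dt=0.01, u_max=1.0):
        self.kp, self.ki, self.dt, self.u_max = kp, ki, dt, u_max
        self.integral = 0.0

    def update(self, v_ref, v):
        """Return a throttle command in [0, 1] for speed v and demand v_ref."""
        error = v_ref - v
        self.integral += error * self.dt          # integral of the speed error
        u = self.kp * error + self.ki * self.integral
        return max(0.0, min(self.u_max, u))       # saturate throttle

pi = PIController()
throttle = pi.update(v_ref=20.0, v=15.0)   # below reference: throttle saturates
coast = pi.update(v_ref=0.0, v=18.0)       # after release point: zero throttle
```

A production implementation would also add anti-windup so the integrator does not accumulate while the output is saturated.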
A scientific publication about the results has been submitted; it is summarized in the following video.
It was a genuine pleasure for the Vehicle Dynamics and Control Team to work on and achieve the above results in cooperation with BME KJIT. Special thanks go to MouldTech Systems and the ZalaZONE Automotive Proving Ground.