PhD Project – Jindi Wang

Application of gesture recognition technology in immersive virtual reality system

In this project, we will explore the application of machine learning to gesture recognition in immersive virtual reality systems (IVRS).


In recent years, virtual reality, augmented reality, artificial intelligence and related technologies have become widely used in people’s daily lives. To give users a more immersive experience, hardware for virtual reality interaction has developed rapidly; the most important interactive devices are head-mounted displays and gesture controllers. However, gesture interaction still has many problems, such as noisy data caused by involuntary hand shaking, interference with the interactive equipment from magnetic fields in the environment, and the question of whether gesture interaction devices are easy for users to use.

Research Outline and Goals

As virtual reality and artificial intelligence enter people’s daily lives, the problems described in the background above need to be solved soon. Popular gesture recognition devices such as the camera-based Leap Motion are easily disturbed by the environment and do not provide the tactile feedback that interaction requires. We therefore expect gesture controllers based on data gloves to become the development trend in immersive interactive systems. At present the project lacks the gesture data needed as system input, so our first task is to collect data from users of different ages and genders. The project aims to build a gesture data set covering different scenarios, to use this data set to build a gesture recognition model that can be applied across those scenarios, and to improve the model’s accuracy, latency, resistance to interference and robustness.

Feasibility analysis

A gesture recognition system is easily affected by slight variations in a gesture. Some movements, such as casual shaking of the fingers, are noise to the classification model, so an effective filtering method can help improve performance. This project uses a data glove as the data acquisition device, and the gesture data from the glove come mainly from its gyroscope and accelerometer. A first step in filtering is therefore to determine whether each sensor’s useful signal lies at high or low frequencies. For the accelerometer, the change during a typical gesture should be low-frequency, so high-frequency acceleration values indicate noise. The gyroscope, by contrast, should produce high-frequency data during a typical gesture, and it often shows low-frequency drift because it lacks a fixed reference frame; low-frequency gyroscope values can therefore be treated as candidates for filtering out. Based on these two conditions, we can experiment with different filtering methods for each sensor.

For a lightweight model, the number of network parameters directly affects the response speed of the network, because more parameters occupy more computing resources. Designing a lightweight network is therefore an essential part of gesture recognition. For a model that has already been trained, it may be feasible to reduce its depth and width gradually while keeping accuracy acceptable. As the depth is reduced, there will be a critical point beyond which removing further layers is no longer acceptable; from that point the depth is fixed and the width is reduced instead. Similarly, when the width reaches its own critical point, the network has been scaled down to the smallest structure with acceptable accuracy. We can use this method to slim down the model.
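The frequency argument above can be illustrated with two first-order filters: a low-pass filter for the accelerometer (suppressing high-frequency jitter) and a high-pass filter for the gyroscope (suppressing slow drift). This is only a minimal sketch under stated assumptions: the function names, the smoothing factors and the synthetic signals are illustrative and are not taken from the project, whose actual filtering method is still to be chosen experimentally.

```python
import math


def low_pass(samples, alpha):
    """First-order IIR low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).

    Expects a non-empty sequence; a smaller alpha smooths more strongly.
    """
    y = samples[0]
    out = []
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def high_pass(samples, alpha):
    """First-order IIR high-pass: y[n] = alpha * (y[n-1] + x[n] - x[n-1]).

    Expects a non-empty sequence; alpha close to 1 keeps more low frequencies.
    """
    y = 0.0
    prev_x = samples[0]
    out = []
    for x in samples:
        y = alpha * (y + x - prev_x)
        prev_x = x
        out.append(y)
    return out


# Synthetic signals (assumed for illustration, not measured glove data).
n = 500
# Accelerometer: slow gesture motion plus sample-to-sample jitter.
accel = [math.sin(2 * math.pi * i / 250) + 0.2 * (-1) ** i for i in range(n)]
# Gyroscope: fast gesture motion plus a slow linear drift.
gyro = [math.sin(2 * math.pi * i / 10) + 0.01 * i for i in range(n)]

accel_clean = low_pass(accel, alpha=0.1)   # jitter attenuated
gyro_clean = high_pass(gyro, alpha=0.9)    # drift attenuated
```

In practice the cut-off behaviour (here set through `alpha`) would have to be tuned against recorded glove data, since the boundary between gesture motion and noise depends on the gestures and the sampling rate.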
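The depth-then-width slimming procedure described above can be sketched as a greedy search. The sketch assumes a hypothetical `evaluate(depth, width)` callback that would rebuild or fine-tune the network at the given size and return its validation accuracy; the toy accuracy function, the thresholds and the step size below are placeholders, not results from this project.

```python
def slim_model(depth, width, evaluate, min_accuracy,
               min_depth=1, min_width=1, width_step=8):
    """Greedily shrink depth first, then width, while accuracy stays acceptable.

    `evaluate(depth, width)` is a hypothetical callback returning the
    validation accuracy of a network of that size.
    """
    # Phase 1: remove one layer at a time until the next removal
    # would push accuracy below the threshold (the depth critical point).
    while depth > min_depth and evaluate(depth - 1, width) >= min_accuracy:
        depth -= 1
    # Phase 2: with depth fixed, narrow the layers step by step
    # until the width critical point is reached.
    while (width - width_step >= min_width
           and evaluate(depth, width - width_step) >= min_accuracy):
        width -= width_step
    return depth, width


def toy_evaluate(depth, width):
    """Stand-in accuracy curve (assumption): smaller networks score lower."""
    return 1.0 - 0.5 / depth - 2.0 / width


final_depth, final_width = slim_model(
    depth=8, width=64, evaluate=toy_evaluate, min_accuracy=0.86
)
print(final_depth, final_width)
```

With a real network, each call to `evaluate` involves training or fine-tuning, so the number of candidate sizes visited is the main cost of this search.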

Future work 

The project involves four stages:

  1. Data preparation stage: design gestures for users’ daily interaction, invite users of different ages and genders to take part in data collection, and preprocess the data.
  2. Improve a filtering method based on the Kalman filter to strengthen the model’s resistance to noise, and design a finite state machine to make switching between gesture states smoother and more accurate.
  3. Build a deep neural network, considering CapsNet with fine-tuning.
  4. Invite about 30 users of different ages and genders, survey their user experience, and analyse which factors users care about most in an immersive virtual reality experience.
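To make the second stage more concrete, the following sketch combines a textbook scalar Kalman filter with a simple debouncing state machine. It is an illustration under assumptions, not the improved filter or the state machine the project will design: the class names, the random-walk signal model, the noise parameters `q` and `r`, the gesture labels and the `hold_frames` rule are all hypothetical.

```python
class ScalarKalman:
    """Textbook 1-D Kalman filter with a random-walk state model."""

    def __init__(self, q, r, x0=0.0, p0=1.0):
        self.q = q    # process noise variance (assumed)
        self.r = r    # measurement noise variance (assumed)
        self.x = x0   # state estimate
        self.p = p0   # estimate variance

    def update(self, z):
        self.p += self.q                  # predict: uncertainty grows
        k = self.p / (self.p + self.r)    # Kalman gain
        self.x += k * (z - self.x)        # correct with measurement z
        self.p *= 1.0 - k
        return self.x


class GestureStateMachine:
    """Switch state only after `hold_frames` consecutive frames agree."""

    def __init__(self, initial="idle", hold_frames=3):
        self.state = initial
        self.hold_frames = hold_frames
        self._candidate = initial
        self._count = 0

    def step(self, label):
        if label == self.state:
            # Classifier agrees with the current state: drop any candidate.
            self._candidate, self._count = label, 0
        elif label == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = label, 1
        if self._count >= self.hold_frames:
            self.state, self._count = self._candidate, 0
        return self.state


# Smooth one noisy sensor channel.
kf = ScalarKalman(q=1e-3, r=0.1)
smoothed = [kf.update(z) for z in [5.2, 4.9, 5.1, 4.8, 5.0]]

# Per-frame classifier labels containing a one-frame glitch.
fsm = GestureStateMachine(hold_frames=3)
labels = ["idle", "fist", "idle", "fist", "fist", "fist", "fist"]
states = [fsm.step(label) for label in labels]
```

The state machine trades a short delay (`hold_frames` frames) for stability, so the value would need to be balanced against the latency goal stated in the research outline.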