TY - JOUR
T1 - EmMixformer
T2 - Mix Transformer for Eye Movement Recognition
AU - Qin, Huafeng
AU - Zhu, Hongyu
AU - Jin, Xin
AU - Song, Qun
AU - El-Yacoubi, Mounim A.
AU - Gao, Xinbo
N1 - Publisher Copyright:
© 1963-2012 IEEE.
PY - 2025/1/1
Y1 - 2025/1/1
N2 - Eye movement is a new, highly secure biometric behavioral modality that has received increasing attention in recent years. Although deep neural networks, such as convolutional neural networks (CNNs), have recently achieved promising performance (e.g., achieving the highest recognition accuracy on the GazeBase database), current solutions fail to capture local and global temporal dependencies within eye movement data. To overcome this problem, we propose a mixed Transformer termed EmMixformer to extract time- and frequency-domain information for eye movement recognition in this article. To this end, we propose a mixed block consisting of three modules: a Transformer, attention long short-term memory (LSTM), and a Fourier Transformer. We are the first to attempt leveraging Transformers to learn long temporal dependencies in eye movement. Second, we incorporate the attention mechanism into the LSTM to propose attention LSTM (attLSTM) to learn short temporal dependencies. Third, we perform self-attention in the frequency domain to learn global dependencies and understand the underlying principles of periodicity. As the three modules provide complementary feature representations regarding local and global dependencies, the proposed EmMixformer can improve recognition accuracy. The experimental results on our eye movement dataset and two public eye movement datasets show that the proposed EmMixformer outperforms the state-of-the-art (SOTA) by achieving the lowest verification error. The EMglasses database is available at https://github.com/HonyuZhu-s/CTBU-EMglasses-database.
AB - Eye movement is a new, highly secure biometric behavioral modality that has received increasing attention in recent years. Although deep neural networks, such as convolutional neural networks (CNNs), have recently achieved promising performance (e.g., achieving the highest recognition accuracy on the GazeBase database), current solutions fail to capture local and global temporal dependencies within eye movement data. To overcome this problem, we propose a mixed Transformer termed EmMixformer to extract time- and frequency-domain information for eye movement recognition in this article. To this end, we propose a mixed block consisting of three modules: a Transformer, attention long short-term memory (LSTM), and a Fourier Transformer. We are the first to attempt leveraging Transformers to learn long temporal dependencies in eye movement. Second, we incorporate the attention mechanism into the LSTM to propose attention LSTM (attLSTM) to learn short temporal dependencies. Third, we perform self-attention in the frequency domain to learn global dependencies and understand the underlying principles of periodicity. As the three modules provide complementary feature representations regarding local and global dependencies, the proposed EmMixformer can improve recognition accuracy. The experimental results on our eye movement dataset and two public eye movement datasets show that the proposed EmMixformer outperforms the state-of-the-art (SOTA) by achieving the lowest verification error. The EMglasses database is available at https://github.com/HonyuZhu-s/CTBU-EMglasses-database.
KW - Biometrics
KW - Fourier transform
KW - Transformer
KW - eye movements
KW - long short-term memory (LSTM)
UR - https://www.scopus.com/pages/publications/105002583511
U2 - 10.1109/TIM.2025.3551452
DO - 10.1109/TIM.2025.3551452
M3 - Article
AN - SCOPUS:105002583511
SN - 0018-9456
VL - 74
JO - IEEE Transactions on Instrumentation and Measurement
JF - IEEE Transactions on Instrumentation and Measurement
M1 - 5021514
ER -