The paper contributes to scanpath bundling methods. We propose an analytical approach for statistical comparisons of aggregated scanpath visualizations by means of second-order gaze analysis metrics. The present study explores differences in attention distribution and cognitive processing over architectural objects between architects, art historians, and non-experts. The results show between-group differences in attention dynamics of the aggregated scanpaths. The aggregated scanpaths of both expert groups were focal, while non-experts' scanpaths were ambient. Experts also paid more attention to architectural details and their locations, and tended to remember them better. The discussion explores the scalability of the proposed approach for Human-Computer Interaction and accessibility technologies designed to enhance the experience of cultural heritage.
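The focal versus ambient distinction above can be illustrated with a small sketch. The abstract does not name the metric it uses; the code below assumes the commonly used coefficient K, which subtracts the standardized amplitude of the following saccade from the standardized fixation duration. The function names and the toy numbers are hypothetical, not taken from the paper.

```python
from statistics import mean, pstdev


def pooled_stats(values):
    """Mean and population standard deviation over all groups pooled together."""
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("values must not be constant")
    return mean(values), sd


def coefficient_k(durations, next_amplitudes, duration_stats, amplitude_stats):
    """Per-fixation K: z(fixation duration) minus z(amplitude of the next saccade).

    The statistics must come from the pooled data. Standardizing within a
    single scanpath would force the mean K of that scanpath to zero.
    """
    mu_d, sd_d = duration_stats
    mu_a, sd_a = amplitude_stats
    return [
        (d - mu_d) / sd_d - (a - mu_a) / sd_a
        for d, a in zip(durations, next_amplitudes)
    ]


# Hypothetical toy data: fixation durations (ms) and following saccade amplitudes (deg).
expert_durations, expert_amplitudes = [400, 500], [2, 3]
novice_durations, novice_amplitudes = [150, 200], [8, 10]

duration_stats = pooled_stats(expert_durations + novice_durations)
amplitude_stats = pooled_stats(expert_amplitudes + novice_amplitudes)

expert_k = mean(coefficient_k(expert_durations, expert_amplitudes,
                              duration_stats, amplitude_stats))
novice_k = mean(coefficient_k(novice_durations, novice_amplitudes,
                              duration_stats, amplitude_stats))
```

Under this assumed metric, a positive mean K (long fixations followed by short saccades) is read as focal viewing and a negative mean K as ambient viewing.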
There has been increasing interest in non-invasive predictors of Alzheimer's disease (AD) as an initial screen for this condition. Previous successful attempts leveraged eye-tracking and language data generated during picture narration and reading tasks. These results were obtained with high-end, expensive eye-trackers. Instead, we explore classification using eye-tracking data collected with a webcam, where our classifiers are built using a deep-learning approach. Our results show that the webcam gaze classifier is not as good as the classifier based on high-end eye-tracking data. However, the webcam-based classifier still beats the majority-class baseline classifier in terms of AU-ROC, indicating that predictive signals can be extracted from webcam gaze tracking. Hence, although our results indicate that there is a long way to go before webcam gaze tracking can reach practical relevance, they provide an encouraging proof of concept that this technology should be further explored as an affordable alternative to high-end eye-trackers for the detection of AD.
Recent work in XAI for eye tracking data has evaluated the suitability of feature attribution methods to explain the output of deep neural sequence models for the task of oculomotoric biometric identification. These methods provide saliency maps to highlight important input features of a specific eye gaze sequence. However, to date, their localization analysis has lacked a quantitative approach across entire datasets. In this work, we employ established gaze event detection algorithms for fixations and saccades and quantitatively evaluate the impact of these events by determining their concept influence. Input features that belong to saccades are shown to be substantially more important than features that belong to fixations. By dissecting saccade events into sub-events, we are able to show that gaze samples that are close to the saccadic peak velocity are most influential. We further investigate the effect of event properties like saccadic amplitude or fixational dispersion on the resulting concept influence.
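As an illustration of the event detection step, here is a minimal sketch of a velocity-threshold (I-VT style) detector. The abstract does not say which algorithms were used, so the choice of I-VT, the 30 deg/s threshold, and the function names are assumptions made for this example only.

```python
import math


def ivt_labels(x_deg, y_deg, sampling_rate_hz, velocity_threshold=30.0):
    """Label every gaze sample as 'saccade' or 'fixation' by its velocity.

    Coordinates are in degrees of visual angle; the threshold (deg/s) is an
    assumed default, not a value from the paper.
    """
    if len(x_deg) != len(y_deg) or len(x_deg) < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    velocity = [
        math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1]) * sampling_rate_hz
        for i in range(1, len(x_deg))
    ]
    # The first sample has no predecessor, so it reuses the next sample's velocity.
    velocity.insert(0, velocity[0])
    labels = ["saccade" if v > velocity_threshold else "fixation" for v in velocity]
    return velocity, labels


def group_events(labels):
    """Collapse per-sample labels into (label, start, end_exclusive) runs."""
    events = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            events.append((labels[start], start, i))
            start = i
    return events


def peak_velocity_index(velocity, event):
    """Index of the fastest sample inside one event, e.g. the saccadic peak."""
    _, start, end = event
    return max(range(start, end), key=lambda i: velocity[i])
```

With events and the peak index available, per-sample attribution scores could then be averaged per event type or by distance to the saccadic peak, which is the kind of aggregation the abstract describes.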
Eye tracking can serve as a gateway to studying the mind. For this reason it has been adopted by a diverse range of scientific communities. With the improvement of the quality of head-mounted virtual reality devices (HMDs) over the past 10 years, eye tracking has been added to capture gaze in immersive environments. The use of HMDs with eye tracking is increasing significantly and so is the need for a toolbox enabling consensus about eye tracking methods in 3D. We present the Salient360! toolbox: it implements functions to identify saccades and fixations and output gaze characteristics (e.g., fixation duration or saccade directions), and to generate saliency maps, fixation maps, and scanpath data. It also implements routines for comparing gaze data, adapted to 3D. We hope that this toolbox will spark discussions about the methodology of 3D gaze processing, facilitate running experiments, and improve the study of gaze in 3D. https://github.com/David-Ef/salient360Toolbox
In this paper, we evaluate different small neural network models for eye movement classification and present the improved model architecture we have developed so far. For evaluation, we used a subset (1.5 million sequences) of the TEyeDS annotations, since to our knowledge it contains in-the-wild recordings and has the most eye movement annotations. We classified fixations, saccades, and smooth pursuits with four different network architectures. The proposed model improves the equally weighted accuracy by 3.8% over the best competitor while using only 6% of the learnable weights.
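The abstract does not define "equally weighted accuracy". A minimal sketch, assuming it means the mean of per-class recall (often called balanced accuracy), so that rare classes such as saccades count as much as frequent ones such as fixations:

```python
from collections import defaultdict


def equally_weighted_accuracy(y_true, y_pred):
    """Mean of per-class recall; each class contributes equally.

    This interpretation of the metric is an assumption, not the paper's
    stated definition.
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not total:
        raise ValueError("no samples given")
    return sum(correct[label] / total[label] for label in total) / len(total)


# Hypothetical imbalanced example: a model that always predicts "fixation".
y_true = ["fixation"] * 8 + ["saccade"] * 2
y_pred = ["fixation"] * 10
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
weighted_accuracy = equally_weighted_accuracy(y_true, y_pred)
```

In this toy case the plain accuracy is 0.8 while the equally weighted accuracy is 0.5, which shows why a class-balanced metric is the more informative choice when event classes are imbalanced.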
In this work we apply a graph-theoretical analysis approach to eye tracking data recorded in virtual reality to investigate the underlying patterns of visual attention during spatial navigation. Based on the eye tracking data recorded in one virtual city, our graph-theoretical analysis identifies a subset of houses that stand out in their graph-theoretical properties, which we define as gaze-graph-defined landmarks. Moreover, we are able to replicate these results with a different eye tracking data set recorded in a different virtual city. Finally, the initial model selection process for the participants’ performance in a point-to-building task in the second city suggests that graph-theoretical predictors have a stronger influence on performance than the non-graph-related measures. However, more research will be necessary to determine their relationship.
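A gaze graph of this kind can be sketched as follows. The abstract does not specify how the graph is built or which properties mark a landmark; the sketch assumes that houses are nodes, that two houses are linked when they are viewed in direct succession, and that a house stands out when its node degree exceeds the mean by two standard deviations. All names and the cutoff are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_gaze_graph(viewed_houses):
    """Undirected graph linking houses that were looked at in direct succession."""
    neighbors = defaultdict(set)
    for current, following in zip(viewed_houses, viewed_houses[1:]):
        if current != following:  # ignore repeated gaze on the same house
            neighbors[current].add(following)
            neighbors[following].add(current)
    return neighbors


def node_degrees(neighbors):
    """Number of distinct neighbours per house."""
    return {house: len(adjacent) for house, adjacent in neighbors.items()}


def high_degree_houses(degrees, n_sigma=2.0):
    """Houses whose degree exceeds mean + n_sigma * SD (assumed cutoff)."""
    values = list(degrees.values())
    cutoff = mean(values) + n_sigma * pstdev(values)
    return sorted(house for house, degree in degrees.items() if degree > cutoff)


# Hypothetical viewing sequence in which house "H" is revisited between other houses.
sequence = ["H", "A", "H", "B", "H", "C", "H", "D", "H", "E", "H", "F"]
degrees = node_degrees(build_gaze_graph(sequence))
candidates = high_degree_houses(degrees)
```

Node degree is only one possible property; other centrality measures could be substituted in `high_degree_houses` without changing the graph construction.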
Gaze is promising for natural and spontaneous interaction with public displays, but current gaze-enabled displays require movement-hindering stationary eye trackers or cumbersome head-mounted eye trackers. We propose and evaluate GazeCast – a novel system that leverages users’ handheld mobile devices to allow gaze-based interaction with surrounding displays. In a user study (N = 20), we compared GazeCast to a standard webcam for gaze-based interaction using Pursuits. We found that while selection using GazeCast takes more time and is more physically demanding, participants value GazeCast’s high accuracy and flexible positioning. We conclude by discussing how mobile computing can facilitate the adoption of gaze interaction with pervasive displays.
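Pursuits-style selection matches the user's eye movement to the trajectory of a moving on-screen target, which is why it needs no calibration. The sketch below is a generic illustration of that idea using Pearson correlation, not GazeCast's implementation; the 0.8 threshold and the function names are assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    covariance = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant signal carries no correlation information
    return covariance / math.sqrt(var_a * var_b)


def select_pursuit_target(gaze_xy, targets_xy, threshold=0.8):
    """Return the target whose x and y motion both correlate best with the gaze.

    gaze_xy is a list of (x, y) samples; targets_xy maps a target name to its
    (x, y) positions at the same time stamps. Returns None if no target
    exceeds the threshold on both axes.
    """
    gaze_x = [p[0] for p in gaze_xy]
    gaze_y = [p[1] for p in gaze_xy]
    best_name, best_score = None, threshold
    for name, path in targets_xy.items():
        score = min(
            pearson(gaze_x, [p[0] for p in path]),
            pearson(gaze_y, [p[1] for p in path]),
        )
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical window of 30 samples: two targets circling in opposite directions.
angles = [0.2 * t for t in range(30)]
targets = {
    "clockwise": [(math.cos(a), -math.sin(a)) for a in angles],
    "counterclockwise": [(math.cos(a), math.sin(a)) for a in angles],
}
# Uncalibrated gaze: offset and scaled, but following the clockwise target.
gaze = [(100 + 50 * math.cos(a), 200 - 50 * math.sin(a)) for a in angles]
selected = select_pursuit_target(gaze, targets)
```

Because correlation is invariant to offset and scale, the uncalibrated gaze in the example still matches the clockwise target.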
Gaze and head tracking, or pointing, in head-mounted displays enables new input modalities for point-select tasks. We conducted a Fitts' law experiment with 41 subjects comparing head pointing and gaze pointing using a 300 ms dwell (n = 22) or click (n = 19) activation, with mouse input providing a baseline for both conditions. Gaze and head pointing were equally fast but slower than the mouse; dwell activation was faster than click activation. Throughput was highest for the mouse (2.75 bits/s), followed by head pointing (2.04 bits/s) and gaze pointing (1.85 bits/s). With dwell activation, however, throughput for gaze and head pointing was almost identical, as was the effective target width (≈ 55 pixels; about 2°) for all three input methods. Subjective feedback rated the physical workload lower for gaze pointing than for head pointing.
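Throughput and effective target width are typically derived from selection endpoints and movement times. The abstract does not give its formulas; the sketch below assumes the widely used effective-measure convention, in which the effective width is 4.133 times the standard deviation of the endpoint errors and throughput is the effective index of difficulty divided by the mean movement time. Function and parameter names are hypothetical.

```python
import math
from statistics import mean, stdev


def effective_throughput(amplitudes_px, endpoint_errors_px, movement_times_s):
    """Return (throughput in bits/s, effective width in px) for one condition.

    amplitudes_px: movement distances actually covered in each trial.
    endpoint_errors_px: signed deviations of the selection point from the
        target centre along the movement axis.
    movement_times_s: time per trial in seconds.
    """
    effective_width = 4.133 * stdev(endpoint_errors_px)
    effective_amplitude = mean(amplitudes_px)
    # Shannon formulation of the index of difficulty, in bits.
    index_of_difficulty = math.log2(effective_amplitude / effective_width + 1)
    return index_of_difficulty / mean(movement_times_s), effective_width


# Hypothetical trials chosen so the arithmetic is easy to follow:
# SD of errors = 10 px, so the effective width is 41.33 px;
# 289.31 / 41.33 + 1 = 8, so the index of difficulty is 3 bits.
throughput, width = effective_throughput(
    amplitudes_px=[289.31, 289.31, 289.31],
    endpoint_errors_px=[-10, 0, 10],
    movement_times_s=[1.4, 1.5, 1.6],
)
```

With a mean movement time of 1.5 s, these made-up trials give a throughput of about 2 bits/s, which is in the same range as the values reported above but is not derived from the study's data.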