Making Paths for the Blind with Machine Learning
New research from Germany proposes a portable, GPU-powered system to help visually impaired people navigate the real world. The system addresses one of the persistent challenges for real-time computer vision frameworks: identifying glass and other transparent obstacles.
The paper, from the Karlsruhe Institute of Technology, details the construction of a wearable system, titled Trans4Trans, consisting of a pair of smart glasses connected to a portable GPU enclosure roughly the size of a lightweight laptop. The glasses capture RGB and depth images at 640 × 480 pixels in a continuous stream, which is then run through a semantic segmentation framework.
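A minimal sketch of this capture-and-segment loop may help fix the data flow in mind. All function names and bodies below are hypothetical stand-ins (the paper does not publish driver code); NumPy arrays take the place of the live RGB-D stream:

```python
import numpy as np

FRAME_W, FRAME_H = 640, 480  # capture resolution reported in the paper

def grab_frame():
    """Stand-in for the smart-glasses camera: returns an RGB image and a
    depth map (in metres) at 640 x 480. A real system would read both
    from the RGB-D sensor driver."""
    rgb = np.zeros((FRAME_H, FRAME_W, 3), dtype=np.uint8)
    depth = np.full((FRAME_H, FRAME_W), 5.0, dtype=np.float32)
    return rgb, depth

def segment(rgb):
    """Stand-in for the semantic segmentation model: returns one class
    label per pixel (0 = background)."""
    return np.zeros(rgb.shape[:2], dtype=np.int64)

# One iteration of the continuous capture -> segment loop.
rgb, depth = grab_frame()
labels = segment(rgb)
```

The key point is that every frame yields two aligned 640 × 480 arrays, one of colour and one of depth, and the per-pixel class labels produced by segmentation can later be combined with the depth map for obstacle warnings.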
The system’s sensory feedback capabilities are enhanced by a pair of bone conduction headphones, which emit acoustic feedback in response to environmental obstacles.
The Trans4Trans system was also tested on the Microsoft HoloLens 2 augmented reality platform, achieving complete and consistent segmentation (i.e. recognition) of potentially dangerous obstructions such as glass doors.
Trans4Trans uses a dual transformer approach, pairing a transformer-based encoder with a transformer-based decoder, and relies on a Transformer Parsing Module (TPM) capable of assembling the feature maps generated during dense prediction, while the transformer-based decoder consistently interprets the feature maps received from its paired encoder.
Each TPM consists of a single transformer layer, essential for keeping resource consumption low and the system portable. The decoder contains four stages, symmetrical with the encoder's, with one TPM assigned to each. The system saves resources by integrating the functionality of multiple approaches into one cohesive model, rather than deploying two separate models in a linear workflow.
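The paper's exact TPM internals are not reproduced here, but the core idea, a single transformer (self-attention) layer re-weighting the tokens of one encoder stage's feature map, can be illustrated with a toy NumPy pass. Sizes and weights are arbitrary assumptions for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_transformer_layer(feature_map, w_q, w_k, w_v):
    """One scaled dot-product self-attention pass over a flattened
    feature map of shape (tokens, channels), i.e. (H*W, C)."""
    q, k, v = feature_map @ w_q, feature_map @ w_k, feature_map @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # (tokens, tokens)
    return attn @ v  # re-weighted features, same shape as the input

rng = np.random.default_rng(0)
tokens, channels = 16, 8  # toy sizes; real encoder stages are far larger
fmap = rng.normal(size=(tokens, channels))
weights = [rng.normal(size=(channels, channels)) for _ in range(3)]
out = single_transformer_layer(fmap, *weights)
```

Because each TPM is just one such layer rather than a deep stack, the per-stage compute cost stays small, which is what makes the four-stage decoder viable on an embedded GPU.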
The glasses used in the system incorporate a RealSense R200 RGB-D sensor, while the host machine houses an NVIDIA Jetson AGX Xavier GPU, designed for embedded systems and comprising 384 CUDA cores and 48 Tensor cores.
The R200 offers speckle projection and passive stereo matching, making it suitable for both indoor and outdoor environments. The speckle feature is especially useful for evaluating transparent surfaces, as it augments and clarifies incoming visual data without being blinded by extreme light sources. The sensor's infrared capabilities also help resolve distinct geometry and form actionable depth maps, which are essential for obstacle avoidance in the context of the project's objectives.
Preventing cognitive overload for the user
The system must strike a balance between an adequate data rate and an excess of information, since the wearer must be able to interpret the environment consistently through audio and vibration feedback.
Therefore, rather than forcing the user to learn a range of vibration patterns corresponding to different distances from objects and barriers, Trans4Trans deliberately limits the amount of feedback data, with a single default threshold set at one meter from an imminent obstacle.
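A sketch of this single-threshold policy follows; the function name, the boolean obstacle mask, and the constant name are assumptions for illustration, not the paper's code:

```python
import numpy as np

WARN_DISTANCE_M = 1.0  # the single default threshold described above

def should_warn(depth_map, obstacle_mask, threshold=WARN_DISTANCE_M):
    """Trigger feedback only when a segmented obstacle pixel lies closer
    than the threshold. depth_map is in metres; obstacle_mask is a
    boolean array marking pixels the segmentation labelled as obstacles."""
    obstacle_depths = depth_map[obstacle_mask]
    return bool(obstacle_depths.size) and float(obstacle_depths.min()) < threshold

depth = np.full((480, 640), 3.0, dtype=np.float32)  # everything far away
mask = np.zeros_like(depth, dtype=bool)
mask[200:220, 300:320] = True       # a detected transparent obstacle
assert not should_warn(depth, mask) # 3 m away: stay silent
depth[210, 310] = 0.8               # part of it now within 1 m
assert should_warn(depth, mask)     # feedback fires
```

A single binary cue of this kind keeps the cognitive load low: the wearer only has to react to "something is within a meter", not decode a graded vibration scale.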
The Trans4Trans system was tested on two datasets dealing with the segmentation of transparent objects: Trans10K-V2, from the University of Hong Kong and collaborators, which contains 10,428 images of transparent objects for training, validation and testing; and the Stanford2D3D dataset, which contains 70,496 images of mixed-transparency objects captured at 1080 × 1080 resolution.
During testing, Trans4Trans was also able to segment transparent objects misclassified by Trans2Seg, published in early 2021 by the same researchers, while requiring fewer GFLOPs to compute and segment surfaces.
Unlike Trans2Seg, which pairs a CNN-based encoder with a transformer-based decoder, Trans4Trans uses an entirely transformer-based encoder-decoder architecture, surpassing the previous approach and also significantly improving on PVT (Pyramid Vision Transformer).
The algorithm also achieved peak results for a number of transparent classes, including jar, window, door, cup, box and bottle.