Deep imitation learning lets robots accomplish manipulation tasks without predefined control rules. However, a major limitation remains: current architectures infer only a reactive action from the current state, whereas real-world robots may also need to draw on memory of past observations.
To achieve memory-based robot manipulation, recent research proposes gaze control driven by sequential data. According to the researchers, a memory-based gaze generation system allows the robot to fixate on the correct location, inferred solely from preceding time steps' data. The study proposes a Transformer-based self-attention architecture for gaze prediction.
Experiments on a multi-object manipulation task indicate that Transformer self-attention is a viable approach for such memory-dependent tasks.
Deep imitation learning is a promising method for autonomous robot manipulation because it requires no hard-coded control rules. Its current applicability to robot manipulation, however, is confined to reactive control based on the current time step's state. Future robots will need to complete tasks using memory gained from experience in complex contexts: for instance, a robot tasked with retrieving a previously used object that was placed on a shelf. Simple deep imitation learning may fail in this case because of distractions generated by the complex setting. Hence, the present study argues that predicting gaze from sequential visual input allows the robot to complete such memory-intensive manipulation tasks.
The proposed technique leverages a Transformer-based self-attention architecture that estimates gaze from sequential input, thereby constructing a form of memory. It was tested on a real-robot multi-object manipulation task requiring recollection of prior states and was found to be both reliable and practical.
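The core idea of attending over a sequence of past visual features to predict a gaze point can be sketched in a few lines. The following is a minimal single-head self-attention example in NumPy; the feature dimension, sequence length, weight matrices, and the final linear head mapping to a 2D gaze coordinate are all illustrative assumptions, not the architecture from the study.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(features, w_q, w_k, w_v):
    """Single-head self-attention over a sequence of per-step features.

    Each output step is a weighted mix of all steps, so information
    from earlier time steps can influence the current prediction.
    """
    q, k, v = features @ w_q, features @ w_k, features @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (T, T) attention logits
    return softmax(scores, axis=-1) @ v        # (T, d) attended features

rng = np.random.default_rng(0)
T, d = 8, 16  # 8 past time steps, 16-dim visual features (illustrative)
features = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

attended = self_attention(features, w_q, w_k, w_v)

# Hypothetical linear head: map the last step's attended features
# to a 2D gaze point on the image plane.
w_gaze = rng.normal(size=(d, 2)) * 0.1
gaze_xy = attended[-1] @ w_gaze
print(gaze_xy.shape)  # (2,)
```

Because the attention weights span all past time steps, the predicted gaze can depend on where a relevant object appeared earlier in the sequence, which is the property the study relies on for memory-based manipulation.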