- Xiaomi’s humanoid robot hit a 90.2% success rate for simultaneous dual-side self-tapping nut installation in a three-hour factory pilot.
- The system met the production line’s fastest cycle-time target of 76 seconds while running the Xiaomi-Robotics-0 VLA (vision-language-action) model with TacRefineNet tactile fine-tuning.
- Xiaomi combines multi-modal perception, hybrid whole-body control, and open-source models to accelerate real-world factory robotics deployment.
Xiaomi Auto announced today that its humanoid robot has completed its first real-world test at the Xiaomi Auto Factory.
According to official data, during a continuous three-hour autonomous operation test, the robot was required to complete the entire workflow from grasping to installing self-tapping nuts at a workpiece station.

The difficulty lies in the spline structure inside the self-tapping nuts: the nut sits in the robot’s gripper at a different orientation on every grasp. On top of that, the magnetic force from the positioning pin adds pulling interference, yet the robot must still align the nut precisely and seat it reliably on the pin.
Test results show that the success rate for simultaneous dual-side installation reached 90.2%, while also meeting the production line’s fastest cycle time requirement of 76 seconds.

Supporting this performance are two core technological advancements from Xiaomi in the field of embodied AI: the Xiaomi-Robotics-0 VLA (Vision-Language-Action) large model and the TacRefineNet tactile-based grasping fine-tuning model.
During this deployment, Xiaomi implemented three key technological solutions:
End-to-End Data-Driven Control: Building upon the VLA foundation model and incorporating reinforcement learning, this enables the robot to quickly adapt to different downstream tasks and continuously learn from interactive experiences in real physical environments. This framework effectively reduces reliance on teleoperation data and enhances the model’s generalization ability across different embodiments and scenarios.
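The "pretrain, then keep learning from interaction" idea can be sketched in miniature: a softmax policy with pretrained-style initial weights is fine-tuned with a REINFORCE update on a made-up task. The environment, reward, and policy below are all illustrative assumptions; Xiaomi has not published the fine-tuning details of Xiaomi-Robotics-0.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_ACT = 4, 3
W = rng.normal(scale=0.1, size=(N_ACT, N_OBS))  # "pretrained" weights

def policy_probs(obs):
    """Softmax policy over discrete actions (toy stand-in for a VLA head)."""
    logits = W @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

def toy_reward(obs, action):
    # Toy downstream task: action 0 is correct exactly when obs[0] > 0.
    return 1.0 if (action == 0) == (obs[0] > 0) else 0.0

lr = 0.5
for _ in range(500):                        # interaction loop in the "real" environment
    obs = rng.normal(size=N_OBS)
    p = policy_probs(obs)
    a = int(rng.choice(N_ACT, p=p))
    r = toy_reward(obs, a)
    grad = -p[:, None] * obs[None, :]       # ∇ log π(a|obs) ...
    grad[a] += obs                          # ... for the sampled action
    W += lr * r * grad                      # REINFORCE update, no demonstrations needed
```

The point of the sketch is the data flow, not the scale: rewards from interaction replace teleoperated demonstrations as the learning signal for the downstream task.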

Multi-Modal Perception Fusion: The robot integrates multi-modal information, including vision, touch, and joint proprioception, for coordinated perception and comprehensive judgment during operations. Relying solely on vision can lead to uncertainty under changing lighting or partial occlusion. Introducing tactile feedback significantly improves task execution stability and robustness.
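A minimal late-fusion sketch makes the robustness argument concrete. The modalities (vision, touch, joint proprioception) come from the article; the embedding sizes, confidence gate, and weighting rule are illustrative assumptions, not Xiaomi’s actual architecture.

```python
import numpy as np

def fuse(vision_emb, tactile_emb, proprio_emb, vision_conf):
    """Concatenate modality embeddings into one state vector,
    down-weighting vision when its confidence is low (e.g. under
    occlusion or changed lighting), so tactile and proprioceptive
    signals dominate the downstream decision."""
    v = np.asarray(vision_emb, dtype=float) * vision_conf
    return np.concatenate([v, tactile_emb, proprio_emb])

vision = np.ones(8)          # stand-in vision feature vector
tactile = np.full(4, 0.5)    # stand-in tactile feature vector
proprio = np.zeros(6)        # stand-in joint-state feature vector

clear_view = fuse(vision, tactile, proprio, vision_conf=1.0)
occluded = fuse(vision, tactile, proprio, vision_conf=0.1)
# Tactile and proprioceptive components are identical in both cases,
# which is exactly why the fused state degrades gracefully.
```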

Whole-Body Motion Control Hybrid Architecture: This architecture combines optimization-based control and reinforcement learning. The optimization controller, based on quadratic programming, achieves four levels of strictly prioritized control with a solution time of less than 1ms. The reinforcement learning controller, trained on a large-scale parallel simulation platform, enables the robot to learn balance strategies under extreme disturbances, allowing for zero-shot deployment in the real world.
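Strict prioritization can be illustrated with a null-space least-squares scheme, a simpler stand-in for the four-level prioritized QP described above (the actual controller, task definitions, and dimensions are not public; everything below is assumed).

```python
import numpy as np

def hierarchical_solve(tasks, n):
    """Solve linear tasks (A, b) in strict priority order: each task is
    resolved only in the null space of all higher-priority tasks, so a
    lower-priority objective can never disturb a higher-priority one."""
    x = np.zeros(n)
    N = np.eye(n)  # projector onto the null space of tasks solved so far
    for A, b in tasks:
        AN = A @ N
        AN_pinv = np.linalg.pinv(AN)
        x = x + AN_pinv @ (b - A @ x)   # least-squares step inside the null space
        N = N - AN_pinv @ AN            # shrink the remaining null space
    return x

# Toy three-level stack: priority 1 fixes x0 = 1, priority 2 asks for
# x0 + x1 = 0, priority 3 fixes x2 = 2. Priority 2 must achieve its goal
# by setting x1 = -1 without moving x0.
tasks = [
    (np.array([[1.0, 0.0, 0.0]]), np.array([1.0])),
    (np.array([[1.0, 1.0, 0.0]]), np.array([0.0])),
    (np.array([[0.0, 0.0, 1.0]]), np.array([2.0])),
]
x = hierarchical_solve(tasks, n=3)  # → approximately [1., -1., 2.]
```

A production controller would instead solve a constrained QP per level (handling torque limits and contact forces), but the strict-priority structure is the same.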

The self-tapping nut workpiece station represents the first step in the large-scale application of Xiaomi’s humanoid robot in automotive manufacturing scenarios. To this end, Xiaomi is also conducting deployment and validation work at other typical stations, including a tote handling station and a front emblem installation station.
It is noteworthy that alongside advancing its robotics technology, Xiaomi is also building an open-source ecosystem. Details of the preliminary technical solutions and experimental videos have been made public. The relevant code and models can be accessed through the following channels:
- Project Page: https://sites.google.com/view/hil-daft/
- Arxiv: https://arxiv.org/abs/2509.13774
- TacRefineNet: https://sites.google.com/view/tacrefinenet
- Xiaomi-Robotics-0: https://github.com/XiaomiRobotics/Xiaomi-Robotics-0