- Xiaomi’s humanoid robot hit a 90.2% success rate for simultaneous dual-side self-tapping nut installation in a three-hour factory pilot.
- The system met the production line’s fastest cycle-time target of 76 seconds while running the Xiaomi-Robotics-0 VLA (vision-language-action) model with TacRefineNet tactile fine-tuning.
- Xiaomi combines multi-modal perception, hybrid whole-body control, and open-source models to accelerate real-world factory robotics deployment.
Xiaomi Auto announced today that its humanoid robot has completed its first real-world test at the Xiaomi Auto Factory.
According to official data, during a continuous three-hour autonomous operation test, the robot was required to complete the entire workflow from grasping to installing self-tapping nuts at a workpiece station.

The difficulty lies in the spline structure inside the self-tapping nuts: the nut sits in the robot’s gripper at a different orientation on every grasp. On top of that, the magnetic force from the positioning pin adds pulling interference, yet the robot must still align the nut precisely and seat it reliably on the pin.
Test results show that the success rate for simultaneous dual-side installation reached 90.2%, while also meeting the production line’s fastest cycle time requirement of 76 seconds.

Supporting this performance are two core technological advancements from Xiaomi in the field of embodied AI: the Xiaomi-Robotics-0 VLA (Vision-Language-Action) large model and the TacRefineNet tactile-based grasping fine-tuning model.
During this deployment, Xiaomi implemented three key technological solutions:
End-to-End Data-Driven Control: Building upon the VLA foundation model and incorporating reinforcement learning, this enables the robot to quickly adapt to different downstream tasks and continuously learn from interactive experiences in real physical environments. This framework effectively reduces reliance on teleoperation data and enhances the model’s generalization ability across different embodiments and scenarios.
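The "pretrain, then keep learning from interaction" idea can be sketched in miniature: a softmax policy with pretrained-style initial weights is fine-tuned with a REINFORCE update on a made-up task. The environment, reward, and policy below are all illustrative assumptions; Xiaomi has not published the fine-tuning details of Xiaomi-Robotics-0.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_ACT = 4, 3
W = rng.normal(scale=0.1, size=(N_ACT, N_OBS))  # "pretrained" weights

def policy_probs(obs):
    """Softmax policy over discrete actions (toy stand-in for a VLA head)."""
    logits = W @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

def toy_reward(obs, action):
    # Toy downstream task: action 0 is correct exactly when obs[0] > 0.
    return 1.0 if (action == 0) == (obs[0] > 0) else 0.0

lr = 0.5
for _ in range(500):                        # interaction loop in the "real" environment
    obs = rng.normal(size=N_OBS)
    p = policy_probs(obs)
    a = int(rng.choice(N_ACT, p=p))
    r = toy_reward(obs, a)
    grad = -p[:, None] * obs[None, :]       # ∇ log π(a|obs) ...
    grad[a] += obs                          # ... for the sampled action
    W += lr * r * grad                      # REINFORCE update, no demonstrations needed
```

The point of the sketch is the data flow, not the scale: rewards from interaction replace teleoperated demonstrations as the learning signal for the downstream task.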

Multi-Modal Perception Fusion: The robot integrates multi-modal information, including vision, touch, and joint proprioception, for coordinated perception and comprehensive judgment during operations. Relying solely on vision can lead to uncertainty under changing lighting or partial occlusion. Introducing tactile feedback significantly improves task execution stability and robustness.
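A minimal late-fusion sketch makes the robustness argument concrete. The modalities (vision, touch, joint proprioception) come from the article; the embedding sizes, confidence gate, and weighting rule are illustrative assumptions, not Xiaomi’s actual architecture.

```python
import numpy as np

def fuse(vision_emb, tactile_emb, proprio_emb, vision_conf):
    """Concatenate modality embeddings into one state vector,
    down-weighting vision when its confidence is low (e.g. under
    occlusion or changed lighting), so tactile and proprioceptive
    signals dominate the downstream decision."""
    v = np.asarray(vision_emb, dtype=float) * vision_conf
    return np.concatenate([v, tactile_emb, proprio_emb])

vision = np.ones(8)          # stand-in vision feature vector
tactile = np.full(4, 0.5)    # stand-in tactile feature vector
proprio = np.zeros(6)        # stand-in joint-state feature vector

clear_view = fuse(vision, tactile, proprio, vision_conf=1.0)
occluded = fuse(vision, tactile, proprio, vision_conf=0.1)
# Tactile and proprioceptive components are identical in both cases,
# which is exactly why the fused state degrades gracefully.
```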

Whole-Body Motion Control Hybrid Architecture: This architecture combines optimization-based control and reinforcement learning. The optimization controller, based on quadratic programming, achieves four levels of strictly prioritized control with a solution time of less than 1ms. The reinforcement learning controller, trained on a large-scale parallel simulation platform, enables the robot to learn balance strategies under extreme disturbances, allowing for zero-shot deployment in the real world.
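Strict prioritization can be illustrated with a null-space least-squares scheme, a simpler stand-in for the four-level prioritized QP described above (the actual controller, task definitions, and dimensions are not public; everything below is assumed).

```python
import numpy as np

def hierarchical_solve(tasks, n):
    """Solve linear tasks (A, b) in strict priority order: each task is
    resolved only in the null space of all higher-priority tasks, so a
    lower-priority objective can never disturb a higher-priority one."""
    x = np.zeros(n)
    N = np.eye(n)  # projector onto the null space of tasks solved so far
    for A, b in tasks:
        AN = A @ N
        AN_pinv = np.linalg.pinv(AN)
        x = x + AN_pinv @ (b - A @ x)   # least-squares step inside the null space
        N = N - AN_pinv @ AN            # shrink the remaining null space
    return x

# Toy three-level stack: priority 1 fixes x0 = 1, priority 2 asks for
# x0 + x1 = 0, priority 3 fixes x2 = 2. Priority 2 must achieve its goal
# by setting x1 = -1 without moving x0.
tasks = [
    (np.array([[1.0, 0.0, 0.0]]), np.array([1.0])),
    (np.array([[1.0, 1.0, 0.0]]), np.array([0.0])),
    (np.array([[0.0, 0.0, 1.0]]), np.array([2.0])),
]
x = hierarchical_solve(tasks, n=3)  # → approximately [1., -1., 2.]
```

A production controller would instead solve a constrained QP per level (handling torque limits and contact forces), but the strict-priority structure is the same.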

The self-tapping nut workpiece station represents the first step in the large-scale application of Xiaomi’s humanoid robot in automotive manufacturing scenarios. To this end, Xiaomi is also conducting deployment and validation work at other typical stations, including a tote handling station and a front emblem installation station.
It is noteworthy that alongside advancing its robotics technology, Xiaomi is also building an open-source ecosystem. Details of the preliminary technical solutions and experimental videos have been made public. The relevant code and models can be accessed through the following channels:
- Project Page: https://sites.google.com/view/hil-daft/
- Arxiv: https://arxiv.org/abs/2509.13774
- TacRefineNet: https://sites.google.com/view/tacrefinenet
- Xiaomi-Robotics-0: https://github.com/XiaomiRobotics/Xiaomi-Robotics-0