The Candy Man Can
How AI improves automation and motion control with reinforcement learning
A sneak peek into the factory of the future has a sweet spot, so to speak. Several conveyor belts transport chocolate bars as part of a demonstrator machine that shows how artificial intelligence can be used for motion control. In a real factory, the next step would be to pack the chocolate bars automatically. In this Intelligent Infeed Demonstrator from Siemens Digital Industries, the chocolate bars must be placed in evenly spaced slots on the outfeed belt.
“The bars arrive on the infeed belt at random intervals,” says Martin Bischoff, an expert in virtual mechatronics at Technology, the research division of Siemens. “The system controller achieves the even spacing by altering the speeds of the conveyor belts. A line of three conveyor belts can be accelerated or slowed down to ensure that each chocolate bar is positioned correctly on the outfeed belt. Developing an optimized control algorithm for this application is a tricky programming task; if you don’t believe it, just try it yourself. Using reinforcement learning, we trained an artificial intelligence controller to accomplish this task.”
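To get a feel for why this is tricky, consider a heavily simplified version of the task: a single bar on one belt that must land on a passing outfeed slot. All numbers, speed limits, and names below are hypothetical illustrations, not taken from the Siemens demonstrator.

```python
# Single-bar sketch of the infeed problem (all numbers hypothetical).
# A bar detected at time t0 must traverse the belt and arrive exactly
# when a slot on the outfeed belt passes the transfer point.
import math
import random

BELT_LEN = 1.0           # metres from the infeed sensor to the transfer point
SLOT_PERIOD = 0.5        # a slot passes the transfer point every 0.5 s
V_MIN, V_MAX = 1.0, 3.0  # admissible belt speeds in m/s

def plan_speed(t0):
    """Constant belt speed that carries a bar detected at time t0 onto
    the earliest outfeed slot reachable within the speed limits."""
    earliest = t0 + BELT_LEN / V_MAX   # arrival time at full speed
    latest = t0 + BELT_LEN / V_MIN     # arrival time at minimum speed
    slot_t = math.ceil(earliest / SLOT_PERIOD) * SLOT_PERIOD
    assert slot_t <= latest            # with these numbers a slot always fits
    return BELT_LEN / (slot_t - t0)

random.seed(1)
t0 = random.uniform(0.0, 5.0)          # bar detected at a random time
v = plan_speed(t0)
arrival = t0 + BELT_LEN / v
phase = arrival % SLOT_PERIOD
miss = min(phase, SLOT_PERIOD - phase)  # time offset from the nearest slot
```

With one bar, the problem reduces to picking a reachable slot and a speed. The catch in the real machine is that many bars share the same three belts, so any speed change shifts every bar on that belt at once; this coupling is what makes the hand-written algorithm hard and a learned controller attractive.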
Reinforcement learning, according to Siemens Digital Industries, is an artificial intelligence method that works much the way most people learn to ride a bicycle: by trial and error, without any knowledge of the underlying physics. The novice cyclist experiences directly, during each ride, whether his or her technique works, and so gradually gets better and better.
“This is exactly how reinforcement learning works,” explains Michel Tokic, a fellow expert at Technology and a lecturer in applied reinforcement learning at Munich’s Ludwig Maximilian University. “The AI is given a target specification, such as ‘the candy bars may only be placed in the target slots, and the system should work as quickly as possible.’ The AI then makes control attempts on the simulation model, initially completely at random, and receives feedback, triggered by light-barrier signals, on how good each attempt was. With this feedback, a goal-directed control algorithm emerges after many automated training cycles.”
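The trial-and-error cycle Tokic describes can be sketched with tabular Q-learning, one of the simplest reinforcement-learning algorithms. The toy task below is a stand-in of our own, not the demonstrator or its simulation model: a bar sits at a discrete offset from its target slot, each control step nudges it one position, and reward arrives only when the bar is aligned.

```python
# Minimal tabular Q-learning loop: random attempts, feedback, and a
# goal-directed policy emerging after many automated training episodes.
import random

N_STATES = 11              # offset 0 means "aligned with the slot"
ACTIONS = (-1, +1)         # nudge toward / away from alignment
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == 0 else -0.01   # success signal, small time penalty
    return nxt, reward, nxt == 0

random.seed(0)
for _ in range(2000):                     # automated training episodes
    s = random.randrange(1, N_STATES)
    for _ in range(50):
        # Sometimes explore at random, otherwise take the best-known action.
        a = random.randrange(2) if random.random() < EPS \
            else max((0, 1), key=lambda i: Q[s][i])
        nxt, r, done = step(s, ACTIONS[a])
        target = r if done else r + GAMMA * max(Q[nxt])
        Q[s][a] += ALPHA * (target - Q[s][a])   # temporal-difference update
        s = nxt
        if done:
            break

# The greedy policy that emerges moves every offset toward alignment.
policy = [ACTIONS[max((0, 1), key=lambda i: Q[s][i])] for s in range(1, N_STATES)]
```

The structure mirrors the description in the quote: the reward function plays the role of the target specification, the `step` function stands in for the simulation model with its light-barrier feedback, and the policy is the control algorithm that emerges from the repeated cycles.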
Errors in a plant control system can have expensive or even dangerous consequences. For this reason, controllers are, as standard practice, developed and tested without risk on digital twins of the plants (Siemens calls this virtual commissioning). The same digital twin can also be used to train the AI.
“After about 72 hours of training with the digital twin on a standard computer, or about 24 hours on compute clusters in the cloud, the AI is ready to control the real machine. That’s definitely much faster than having humans develop these control algorithms,” Bischoff says.