TL;DR Usually in reinforcement learning, the body designs of robot agents are fixed. Here, the agent can learn a task, and also learn a better design of its body at the same time.
Link to the blog article:
(There is a blooper section at the very bottom of the article so check it out!)
There's also some active discussion about this article on Twitter:
I have been meaning to write up this fun little experiment for a while now, after working on previous articles related to Reinforcement Learning (RL) and Evolution Strategies (ES), which you may have seen in a previous HN discussion.
While trying to solve a few RL/ES problems, I found the BipedalWalkerHardcore task extremely frustrating (actually much harder than all of the standard MuJoCo tasks that most papers are based on), but I was eventually able to crack it after some effort. I suspected that the agent's body was not really suited to the task, and that even minor tweaks to the body might make it easier for an RL algorithm to learn a good set of parameters for the agent's controller neural network.
There has been an exciting line of work on Passive Robotics, where researchers such as Tad McGeer and Steve Collins made robots that walk naturally on their own without using any external power. This is unlike complicated, inefficient robots like the Asimo, which has motors at every joint, all managed by a central computer. In some ways, many standard RL tasks are similar to the Asimo model: we train a neural network to control a fixed, pre-determined robot. I thought it might be an interesting little experiment to let the RL algorithm learn not only the parameters of the neural network controller, but also, at the same time, a set of parameters that describe the structure of the agent's body.
We also see work done using Evolution, such as Strandbeests, virtual creatures by Karl Sims, and Soft Robots, where novel morphology is being discovered (there is also an excellent course on evolutionary robotics by Josh Bongard). While RL is great at many problems, I feel that one limitation of RL is in discovering novel structures, although there have been recent attempts. At the same time, RL is much more sample efficient at learning within the search space of a pre-defined design, which is what this article tries to explore, starting with only the simplest of all RL algorithms. Hopefully it will spark more life and discussion in the area of morphology learning and generative design in the RL community.
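To make the idea of learning the controller and the body together a bit more concrete, here is a minimal sketch. It is not the code from the article: the fitness function is a made-up stand-in for an environment rollout, the parameter counts and names (`N_CONTROLLER`, `N_BODY`, `toy_fitness`, `optimize`) are hypothetical, and the update rule is a basic score-function (REINFORCE-style) estimate over a Gaussian in parameter space, which I am assuming is close in spirit to "the simplest" RL approach mentioned above. The only point is that body parameters are appended to the controller weights and optimized as one vector.

```python
import numpy as np

N_CONTROLLER = 4  # hypothetical number of controller weights
N_BODY = 2        # hypothetical number of body parameters (e.g. leg segment scales)


def split_params(theta):
    """Split one flat parameter vector into (controller weights, body scales)."""
    weights = theta[:N_CONTROLLER]
    # Body entries are offsets from the original design, which has scale 1.0.
    # (A real setup would likely bound these; this toy version does not.)
    body_scale = 1.0 + theta[N_CONTROLLER:]
    return weights, body_scale


def toy_fitness(theta):
    """Made-up stand-in for an episode return (NOT a physics simulation).

    The best controller weights depend on the body, so the two parts of the
    parameter vector have to be adapted together.
    """
    weights, body_scale = split_params(theta)
    ideal_body = np.array([1.5, 0.5])
    target_weights = np.full(N_CONTROLLER, body_scale.sum())
    controller_cost = np.sum((weights - target_weights) ** 2)
    body_cost = np.sum((body_scale - ideal_body) ** 2)
    return -(controller_cost + body_cost)


def optimize(n_iters=500, pop_size=64, sigma=0.1, lr=0.05, seed=0):
    """Optimize controller and body parameters jointly as one vector."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(N_CONTROLLER + N_BODY)  # zero offsets = the original body design
    for _ in range(n_iters):
        # Sample a population of perturbed (controller + body) parameter vectors.
        noise = rng.standard_normal((pop_size, mu.size))
        rewards = np.array([toy_fitness(mu + sigma * n) for n in noise])
        # Normalize rewards so the step size does not depend on the reward scale.
        advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Move the mean toward perturbations that scored above average.
        mu = mu + lr * (noise.T @ advantages) / pop_size
    return mu


if __name__ == "__main__":
    best = optimize()
    weights, body_scale = split_params(best)
    print("controller weights:", weights)
    print("body scales:", body_scale)
    print("fitness:", toy_fitness(best))
```

Because the body parameters are just extra entries in the same vector, freezing them (always using the original scale of 1.0) gives you the usual fixed-body setup for comparison.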
Any feedback welcome!
It looks like just 3 or 4. Do you think it could scale to thousands or millions of parameters?