This article is an essay I submitted to university this year. I thought it might be of benefit to whoever is looking for some viewpoints.
In his article, Brooks describes his ideas for a new way of approaching the design of Artificial Intelligence (AI). His approach differs sharply from the then-traditional AI research in two key respects:
- He states that “the world should be its own model” and goes on to explain that basing an AI on anything else (meaning a mere representation of the world in the form of a defined model) is error-prone from the start.
- His theory is that a more effective AI can be achieved by constructing different, independent “layers” of actions, in contrast to the then-traditional approach of building a central “decision” unit that tells the AI how to act.
In this essay, we will study the reach of this theory and the obstacles that may stand in its way.
“The world as its own model”
Traditionally, robots have used (and still often do use) some way of representing the world that surrounds them. This ranges from recognising an object (a football, a table or even a human being) to recognising relations between those objects. This process has been referred to as shaping a “model” of the robot’s world. The model can be defined as a set of objects and the relations between them, which the robot can use to analyse (and eventually act upon) its surroundings.
However, the world is vast and complex in many ways. We humans are far from knowing everything there is to know about it, and we learn more every day. Brooks argues that any AI built on a mere “representation model” of the world will have problems for that exact reason: how are we supposed to build a correct model of a world we do not yet fully understand ourselves? The dangers of building an AI on top of an imperfect model are numerous. First of all, it will not know how to handle and analyse situations that have been omitted from its model. Secondly, it can make bad judgements based on a flawed model (human-made errors in the representation of the world). For these reasons, Brooks states that “explicit representations and models of the world simply get in the way” and that “it turns out to be better to use the world as its own model”.
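To make the idea of a hand-built model concrete, here is a minimal sketch in Python. The class name `WorldModel`, the relation names and the example objects are all hypothetical and chosen only for illustration; nothing here is taken from Brooks’ article. It shows the first danger described above: a query about anything the designer left out simply comes back empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorldModel:
    """A hand-written model: known objects plus (subject, relation, object) facts."""
    objects: set[str] = field(default_factory=set)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        """Record one relation and register both objects as known."""
        self.objects.update({subject, obj})
        self.relations.add((subject, relation, obj))

    def related(self, subject: str, relation: str) -> set[str]:
        """Return every object linked to `subject` by `relation`."""
        return {o for s, r, o in self.relations if s == subject and r == relation}


# Hypothetical facts a designer might have written down in advance.
model = WorldModel()
model.add_fact("football", "is_on", "table")
model.add_fact("table", "is_in", "office")

print(model.related("football", "is_on"))  # a case the designer modelled
print(model.related("soda can", "is_on"))  # never modelled: an empty set
```

The second query does not fail loudly; the model just has nothing to say, which is the situation an agent relying on it would have to cope with.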
Doing away with the model has its own cost, however: the absence of a clearly defined model brings with it another category of problems. For example, it means that Brooks’ AI can only react to whatever its sensors read in real time. If the AI can only act on what its sensors read, it cannot deal with anything that lies beyond their reach, which makes planning beyond that reach extremely difficult, if not impossible. How could an AI based on this principle plan a trip to Spain if its sensors only reach as far as the tree that is blocking its view 30 meters from its current position?
The distributed approach
Brooks’ idea of constructing an AI without a central control unit (he calls this the subsumption architecture) offers many advantages. His theory states that the best way to create AI is in layers, with each layer representing a specific action and groups of layers (modules) representing specific behaviours. This means we can focus on one single activity at a time (like avoiding obstacles), test that in detail and, only once it’s completely finished, add a second layer on top (like walking). These layers act in parallel, completely independently of each other. However, a higher-level layer can suppress the actions of a lower-level one, leaving space for some “direction” (walking to a certain goal while avoiding obstacles, for example). Brooks argues that to an external observer, this looks like a low level of intelligence while in reality, it’s simply the sum of the two layers acting independently.
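The layering described above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not Brooks’ implementation: the function names `avoid_obstacles`, `walk_to_goal` and `control_step`, the sensor keys, the command strings and the 0.5 m threshold are all hypothetical. Each layer maps sensor readings directly to a command, and the higher layer influences the lower one only by suppressing (replacing) the signal on its input.

```python
from typing import Optional

SAFE_DISTANCE_M = 0.5  # assumed reflex threshold, not a value from Brooks


def avoid_obstacles(sensors: dict, heading: str) -> str:
    """Layer 0: turn away from close obstacles, otherwise pass `heading` through."""
    if sensors["obstacle_distance_m"] < SAFE_DISTANCE_M:
        return "turn_away"
    return heading


def walk_to_goal(sensors: dict) -> Optional[str]:
    """Layer 1: propose a heading towards the goal, or None if no goal is sensed."""
    bearing = sensors.get("goal_bearing_deg")
    if bearing is None:
        return None
    if bearing > 10:
        return "steer_right"
    if bearing < -10:
        return "steer_left"
    return "forward"


def control_step(sensors: dict, use_layer_1: bool = True) -> str:
    """One sense-act cycle with no world model and no central planner."""
    heading = "stay_put"  # default signal on layer 0's input
    if use_layer_1:
        proposal = walk_to_goal(sensors)
        if proposal is not None:
            heading = proposal  # layer 1 suppresses the default signal
    return avoid_obstacles(sensors, heading)


clear_path = {"obstacle_distance_m": 3.0, "goal_bearing_deg": 25.0}
blocked = {"obstacle_distance_m": 0.2, "goal_bearing_deg": 25.0}

print(control_step(clear_path, use_layer_1=False))  # layer 0 alone
print(control_step(clear_path))                     # both layers, path clear
print(control_step(blocked))                        # both layers, obstacle close
```

With only layer 0 enabled the robot is already a complete system that can be tested on its own; enabling layer 1 adds direction without layer 0 being rewritten.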
However, even if this approach brings with it some advantages, there are also numerous problems.
Making multiple levels work together
One of the problems is that, with a distributed approach, the layers act independently. This means that a crucial part of AI, analysing a situation from multiple angles (in this case, across many layers) and coming to some sort of global conclusion, does not seem achievable through this architecture alone.
On top of that, as the distributed architecture has no central control unit, making multiple levels work together at all seems to be a very tough challenge, if not completely impossible. We can take natural language processing and acting upon it as an example. If all layers really are independent of each other, how would an AI hear on one level, process whatever it hears on the next and then decide how to act from there (knowing that the action can be “movement”, “attention”, “listening” or any other type of action)?
For a machine to be considered truly intelligent, it should have some kind of mechanism comparable to learning (if it couldn’t learn, the machine would only be an agglomeration of facts in a big knowledge-base). However, due to the nature of the subsumption architecture, each layer is independent of the next, which rules out any form of communication between them (other than suppressing signals). In turn, this means that if the subsumption architecture were able to find any way of learning, it would be restricted to a per-level basis and thus only be able to optimize each layer independently. This would be equivalent to a human trying to optimize his “walking” action without being able to take into account the country he is in (with cars driving on either the left or the right side of the road).
There also seems to be a major problem with implementing the subsumption architecture on a large scale. In his article, Brooks talks about an AI with up to 14 different layers that will be able to:
- wander around an office
- find open doors
- retrieve empty soda cans from cluttered desks in crowded offices and
- return them to a central repository
This is far from what one would consider a truly intelligent machine (it would not pass the Turing test, for example, because it cannot even process natural language). In this architecture, the way levels interact with each other (only allowing or suppressing signals) means that every additional layer adds complexity to the final product as a whole. In order to create an AI that is deemed truly intelligent, one would need a large number of layers. It follows that the creation of a truly intelligent machine on this basis is a very complex, if not completely impossible, task from an engineering point of view.
On top of that, the fact that every layer has to be tested intensively in combination with its lower-level layers makes it immensely time-consuming to build a machine with many layers. As every layer makes the resulting AI more complex, the time needed to build a new layer on top of the AI can be expected to grow exponentially with each added layer.
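One way to make this growth concrete is a rough count, under an assumption that is mine rather than Brooks’: suppose each existing layer can, at any moment, be either active or suppressed, and that a new layer has to be tested against every such combination. A new layer placed on top of n existing ones then faces 2^n configurations. The function name `configurations_to_test` is hypothetical.

```python
def configurations_to_test(existing_layers: int) -> int:
    """Active/suppressed combinations among the layers below the new one.

    Assumes every layer is independently either active or suppressed,
    which is an upper bound for illustration, not a measurement.
    """
    return 2 ** existing_layers


# Adding the 2nd, 3rd, 4th and 14th layer respectively.
for existing in (1, 2, 3, 13):
    print(existing, configurations_to_test(existing))
```

In practice many of these combinations cannot occur, so the count overstates the real testing effort, but it illustrates why the cost per layer rises so quickly.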
Loosely related to my last point, the absence of a central control unit will prove very hard to deal with in more complex AI machines. An important aspect of true intelligence is the ability to choose the best action from a pool of many possible actions (also known as action selection), where the choice has a more complex structure than “A suppresses B and B suppresses C”. A good example of this issue is “buying food”. Normally, buying food is something one can do at any time and it does not take long (in most cities, a small store is just around the corner). This means that if one is having a nice walk and comes by a supermarket, buying food is less important than enjoying the walk (one can always buy food later). However, I am currently writing this essay in the British countryside and, without a car at my disposal, it takes about 40 minutes one way to reach the closest shop selling food (which is actually a post office with some extra products). For this reason, I had to interrupt my walk today in order to buy much-needed groceries, knowing that if I did not do it then, it would take 80 minutes to do it later (a round trip to the post office after coming back home). This illustrates that layer A (buying food) sometimes suppresses layer B (enjoying my walk), while in other situations it does not.
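The difference between a hard-wired suppression order and a choice that depends on context can be sketched as follows, using the walk and the groceries as the example. Both functions, their names and the 30-minute `patience_minutes` threshold are hypothetical illustrations of the argument, not part of Brooks’ architecture.

```python
def fixed_priority(passing_shop: bool, later_trip_minutes: int) -> str:
    """Hard-wired order: 'enjoy_walk' always suppresses 'buy_food'.

    The arguments are accepted but ignored, which is exactly the limitation.
    """
    return "enjoy_walk"


def context_dependent(passing_shop: bool, later_trip_minutes: int,
                      patience_minutes: int = 30) -> str:
    """Buy food now only if postponing it would cost more than we tolerate."""
    if passing_shop and later_trip_minutes > patience_minutes:
        return "buy_food"
    return "enjoy_walk"


# City: a shop is around the corner, so a later trip is cheap (assumed 5 minutes).
# Countryside: a later round trip takes 80 minutes, as in the example above.
for place, minutes in (("city", 5), ("countryside", 80)):
    print(place, fixed_priority(True, minutes), context_dependent(True, minutes))
```

The fixed order gives the same answer in both places, while the context-dependent version changes its choice once the cost of postponing becomes high.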
Action selection is an important part of any AI and the subsumption architecture seems fundamentally flawed in this area. Brooks recognized this problem and proposed an extension of his architecture called “hormonal activation”, based on animal hormone systems. The idea of this extension is that some layers only become active once a certain threshold value is reached. The value is computed through analysis of the environment (like anything else in the subsumption architecture) and enables the machine to “choose” different actions under different circumstances. This extension covers part of the problem, but still isn’t flexible enough for the situation I described (it merely activates or deactivates certain behaviours under specific circumstances). For these reasons, it seems to me that the subsumption architecture by itself does not leave room for the kind of run-time flexibility needed for a machine to be considered truly intelligent.
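The threshold idea can be illustrated with a small Python sketch. It is only my reading of the description above, not Brooks’ actual mechanism: the “hormone” `hunger_level`, its linear rise over 8 hours, the layer names and the 0.5 threshold are all assumptions made for the example.

```python
from __future__ import annotations


def hunger_level(hours_since_meal: float) -> float:
    """Hypothetical 'hormone': rises linearly and saturates at 1.0 after 8 hours."""
    return min(hours_since_meal / 8.0, 1.0)


def active_layers(hours_since_meal: float, threshold: float = 0.5) -> list[str]:
    """'seek_food' switches on only once the hormone level reaches the threshold."""
    layers = ["avoid_obstacles", "walk"]
    if hunger_level(hours_since_meal) >= threshold:
        layers.append("seek_food")
    return layers


print(active_layers(2.0))  # below the threshold
print(active_layers(6.0))  # above the threshold
```

The gate is purely on or off: it switches a behaviour in when the level is high enough, but it does not weigh that behaviour against others, such as how far away the next shop is.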
A non-symbolic architecture
The subsumption architecture does not provide a way to represent symbols in the system (it is, by definition, non-symbolic). For this reason, agents running on this architecture have not (to date) been able to demonstrate any natural language processing, analogy usage or even naïve physics, as these all require the use of symbols.
Natural language & the web
The biggest resource of information today is, without a doubt, the internet, and there has been a lot of discussion of its possible use for teaching AI (as a vast resource from which AI agents could learn things such as object relations, knowledge-bases, etc.). However, as the subsumption architecture does not provide a way of processing natural language, or even of representing knowledge-bases for that matter, this is a mountain it does not seem to be able to climb (and a valuable asset lost).
AI has made a lot of progress through the theories of Rodney Brooks. He undeniably helped the industry with his views and work, and the mere fact that he (and his company) took the first step towards commercialising low-level AI, both for consumers (the best-known example being the Roomba robots) and for government, illustrates the success he has had. I also agree that truly intelligent AI should use the world as its own model, because we humans simply do not understand the world well enough ourselves, let alone well enough to write it down in code for a robot to understand.
However, I am convinced that the subsumption architecture alone is not enough to design a truly intelligent machine, mainly because of the absence of any form of central control unit. In order to be considered truly intelligent, a robot should be able to analyse a situation using all of its senses, come to some global conclusion, act upon it and store the result for future reference (therefore being able to learn from its actions).
Brooks himself, when describing the actions of his three-layer AI (avoiding obstacles, walking and exploring the world), explains that to an external observer, the actions are perceived as intelligent. To me, this is exactly the problem with the subsumption architecture: the result of multiple layers running in parallel might seem intelligent, but it really isn’t. It is merely the logical sum of all of its layers. He goes on to say that “we will never understand how to decompose human level intelligence until we’ve had a lot of practice with simpler level intelligence”. On those premises, Brooks himself admits that his AI (simpler level intelligence) is not truly intelligent.
The Turing test, which is another way of assessing the intelligence of a machine, holds that a machine can be judged truly intelligent if it is indistinguishable from a human being on a conversational level. In the test, a human talks to either an AI or another human being without being able to see which of the two it is. If he cannot clearly distinguish when he is talking to the AI and when he is talking to another human being, the AI is considered to have passed the test. Again, the subsumption architecture, with its inability to process natural language, would fail this test.
I am not saying that the idea of the subsumption architecture and its layered intelligence is to be thrown away as a whole (it has definitely proved its merits, as I described earlier), but a machine will never be able to make intelligent decisions in complex situations on this architecture alone.
1: Rodney Brooks, Artificial Intelligence, 1987, edition: 47, MIT
2: Rodney A. Brooks, Integrated systems based on behaviors, SIGART Bulletin, 46-50
3: Errol Morris, Rodney A. Brooks – Subsumption Architecture, http://video.google.com/videoplay?docid=-5563890805802735704# , Consulted on: 19th March 2011