I see the convergence of the semantic web and robotic technology as the next big breakthrough in human-computer interaction.
Mankind's first forays into AI were not quite as successful as we would have liked them to be. But it’s highly probable that robotics will receive its IQ component not from some super-complex AI algorithmic breakthrough, but rather from an unlikely quarter: the Semantic Web.
What we see in the development of the Semantic Web today might well be the precursor of AI capabilities waiting to converge into the brain (read microchip) of a robot, perhaps a next-generation Mindstorms NXT.
One of the key barriers to making intelligent robots is the problem of giving them human-like capabilities of knowing what they see and hear, or where they go. This unique human ability, cognition, has been hard to bring into the robotic world in spite of complicated and expensive algorithms and years of research and development. But that might change with the onset of the Semantic Web and the range of technologies that it brings.
In this post, I will take a "theoretical" stab at building an intelligent robot using existing semantic web technologies. We will take some liberties with our imagination and make some technical assumptions, but I promise you, none of it will be too far-fetched.
I recently bought a Lego Mindstorms NXT, which ships with an NXT brick, sensors and motors, and which lets you assemble and program a basic humanoid.
Our task is to give this humanoid the ability to understand a verbal command, take basic decisions based on the command, recognise humans and places, and be location- and direction-aware.
Sounds like a tall order, but as you will see, it's quite doable, or at least will be in the coming years.
We will begin with the problem of understanding verbal commands and communicating with humans in a reasonably intelligent manner. Using any standard speech-to-text API, like SpinVox, we can write an application that captures the human voice and converts it into text.
Next we build an application that plugs into the OpenCalais API, WordNet and Wikipedia, and uses known NLP techniques to parse this text and obtain keywords, context and intention.
Our NLP application, together with the contextual information received from Wikipedia and similar sources, will be the speech recognition and input system for our humanoid, making it capable of deciding what action to perform next.
The next problem to tackle is eyesight, or image recognition. To keep things simple, let's just give our humanoid the ability to recognize and remember human faces and places, identify its master's face, and remember the things it sees with specific associations.
We will use a photo recognition API such as TagCow or Riya, along with photo tagging and storage through the Flickr API and Picasa, to capture images, recognize them, and store them online with a specific tag. At any given time, our image recognition application will be able to go online into its Picasa or Flickr account and retrieve image information for anything it sees, either from its previously stored images or from other images.
The third problem is giving our robot a sense of direction, geography and distance. That’s perhaps the easiest of all the abilities that we have given it so far: simply attach a GPS device and build an application that takes the latitude/longitude information and queries Google Maps and other geodata APIs in real time. The robot then always 'knows' where it is and what is happening around it.
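To make the "distance and direction" part concrete, here is a minimal sketch of what the robot could compute locally from two GPS fixes before it ever calls a mapping service. It uses the standard haversine formula and the initial great-circle bearing; the function names and the mean Earth radius constant are my own choices for illustration, not part of any mapping API.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Compass heading (0 = north, 90 = east) to start moving from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    x = math.sin(dlambda) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlambda))
    return (math.degrees(math.atan2(x, y)) + 360) % 360
```

Straight-line distance and heading are only a rough guide, of course; for actual street routes the robot would still have to ask a mapping service.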
We would also want to give this robot a thinking and knowing brain. The simplest way to achieve this could be to plug all these individual systems into the robot's central processor, and then give this central processor the ability to query the web for any information, based on a set of rules, by plugging into Google, Powerset or Wolfram Alpha.
Now let’s take this entire code, burn it onto a microchip and place it inside an NXT brick. Let’s give this brick an ultrasonic sensor, cameras, a GPS, onboard Wi-Fi, and motorized arms and legs. We assemble this guy and lo! Our intelligent humanoid, powered by the Semantic Web, is ready to go.
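As a rough illustration of what "based on a set of rules" could look like, here is a small rule-based router. Everything in it is hypothetical: the three backend functions are stubs that only label the query, standing in for a maps service, a computational engine and a general search engine, and the matching rules are deliberately naive.

```python
from typing import Callable, List, Tuple


# Hypothetical stand-ins for real web services; they only label the query.
def ask_geo(query: str) -> str:
    return f"[geo stub] {query}"


def ask_compute(query: str) -> str:
    return f"[compute stub] {query}"


def ask_search(query: str) -> str:
    return f"[search stub] {query}"


# Each rule pairs a predicate on the lowercased query with a backend.
RULES: List[Tuple[Callable[[str], bool], Callable[[str], str]]] = [
    (lambda q: q.startswith(("where", "how far")), ask_geo),
    (lambda q: any(ch.isdigit() for ch in q), ask_compute),
]


def route(query: str) -> str:
    """Send the query to the first backend whose rule matches, else to search."""
    for matches, backend in RULES:
        if matches(query.lower()):
            return backend(query)
    return ask_search(query)
```

Calling `route("Where is the nearest bookstore?")` goes to the geo stub, while a question with no matching rule falls through to general search. The order of the rules is the whole design here: the first match wins.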
This humanoid might only be able to do such simple things as take a direct command from me and perform those actions, under very constrained scenarios. For example, if I have programmed it right, and I ask it to go buy me a book (by specific name) from the store at (specific address), it should be able to parse this command as:
- Command given by master (based on my face when I speak)
- buy -> Part of Speech: verb -> action
- book -> Part of Speech: noun -> search Wikipedia and get information and image of book
- store -> Part of Speech: noun -> go to Google Maps and get address/directions
and then perform these steps sequentially with success. While this might sound a bit oversimplified or glossed over at this moment, the bottom line is that the Semantic Web is, for the first time, bringing together an immense AI opportunity.
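The breakdown above can be sketched as a toy parser. This is only an illustration under heavy assumptions: the three-word lexicon is hand-written and hypothetical (a real system would get parts of speech from an NLP toolkit or WordNet), and the `speaker_is_master` flag stands in for the face recognition step.

```python
# Toy, hand-written lexicon (hypothetical); a real system would use a POS tagger.
LEXICON = {"buy": "verb", "book": "noun", "store": "noun"}


def parse_command(text, speaker_is_master):
    """Turn a spoken command into an ordered list of (step, word) pairs."""
    if not speaker_is_master:
        return []  # ignore commands from anyone but the master
    steps = []
    for raw in text.lower().split():
        word = raw.strip(".,!?")  # drop trailing punctuation
        pos = LEXICON.get(word)
        if pos == "verb":
            steps.append(("action", word))
        elif pos == "noun":
            steps.append(("lookup", word))
    return steps


plan = parse_command("Buy me a book from the store.", speaker_is_master=True)
# plan == [("action", "buy"), ("lookup", "book"), ("lookup", "store")]
```

Words that are not in the lexicon are simply skipped, which is exactly the kind of constrained scenario described above.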
In the next decade, the Semantic Web will gradually turn into a living, thinking, growing sentient being. This is not going to happen overnight, and it is not going to happen in a centralized way. Thousands of different advances in technology, lifestyle, and social and economic participation will contribute to this advancement, directly or indirectly.
And since this sentient web's information will be machine-consumable, intelligent humanoids feeding off this web ecosystem will become the order of the day.
Saturday, July 4, 2009