How, for example, does the AI go away and download data from the internet and then verify that it is correct?
One method might be to verify the data against varying sources. Take Michael Jackson's death as an example: a VERY popular news story. Notice that I've chosen a subject that is interesting to the masses, as the AI will be used by the masses and not necessarily by geniuses and professors.
The AI will have certain websites/sources built into it that it will visit in the first instance: for example, the BBC website. Once it identifies that the top news story is that MJ is dead, it will realise that this is a story that might interest the user. AI will then verify this news story with another popular news site, and then another, collating information before informing the user: "Michael Jackson has passed away, would you like to hear more?"
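As a rough illustration, that corroboration step might look like the sketch below. Everything in it is an assumption for the sake of example: the hard-coded headlines stand in for pages a real system would fetch live from each source, and the 0.6 similarity threshold and two-source minimum are arbitrary choices. The point is simply that AI treats nothing as verified until enough independent sources agree.

```python
from difflib import SequenceMatcher

# Hard-coded headlines standing in for pages fetched live from each source.
TRUSTED_SOURCES = {
    "bbc.co.uk":   "Michael Jackson dies aged 50",
    "reuters.com": "Pop star Michael Jackson dead at 50",
    "cnn.com":     "Michael Jackson has died, hospital confirms",
}

def is_corroborated(claim, sources, min_agreeing=2, threshold=0.6):
    """Treat a claim as verified once enough independent sources carry a
    sufficiently similar headline (both cut-offs are arbitrary here)."""
    agreeing = sum(
        1 for headline in sources.values()
        if SequenceMatcher(None, claim.lower(), headline.lower()).ratio() >= threshold
    )
    return agreeing >= min_agreeing

claim = "Michael Jackson dead at 50"
if is_corroborated(claim, TRUSTED_SOURCES):
    print("Michael Jackson has passed away, would you like to hear more?")
```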
There will be other information, for example, that might not be so popular and may be more difficult to verify. Let's say, "What's the population of London?" AI will search for this information, come up with numbers, and perhaps give a trimmed mean of the numbers it finds. The result: "The population is approximately X, based on a trimmed mean from multiple sources."
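The trimmed mean itself is easy to sketch. The population figures below are made-up placeholders for numbers scraped from different pages, and the trim fraction is an arbitrary choice; the idea is just that one wildly wrong source can't skew the answer.

```python
def trimmed_mean(values, trim_fraction=0.1):
    """Drop the lowest and highest trim_fraction of values, then average
    the rest, so a single bad source can't drag the estimate around."""
    values = sorted(values)
    k = int(len(values) * trim_fraction)
    kept = values[k:len(values) - k] if k else values
    return sum(kept) / len(kept)

# Hypothetical figures gathered from several web sources; the last one
# is a deliberate outlier that the trim discards.
populations = [8_900_000, 8_800_000, 9_000_000, 8_750_000, 14_000_000]

estimate = trimmed_mean(populations, trim_fraction=0.2)
print(f"The population is approximately {estimate:,.0f}, "
      f"based on a trimmed mean from multiple sources.")
```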
Obviously, we would need to make the output more user-friendly and closer to the way a human would speak, but you get the idea.
I want to reiterate that the AI must be attractive to the masses and not geeks/geniuses. Remember, this software will eventually have to be presented/marketed to people who will know very little about computers. They'll need to see a working demo/prototype that works at a basic level.
What you are suggesting is not impossible (very little is), but it is a very challenging idea that will bring together many different areas of CS, and indeed other academic areas.
Oh, I totally agree. In the first instance, it will be important to develop the AI to a point where it can have a basic conversation with the user. A later step will be to hook it up to the internet and increase its knowledge base, then to verify and organise this knowledge, but bear in mind that this would probably come later down the line.
I can imagine, for example, chatting briefly with the AI about the possibilities of aliens one day, only for it to then spend a few hours collecting every single crackpot's ramblings on the web about aliens. How would it categorise what is factual and what isn't? ...
One method might be to ask the user for extra input. Consider the following conversation:
User: AI, I wish to discuss Aliens.
AI: I don't know much about Aliens, would you like me to look it up?
User: Go for it. How much time do you need?
AI: 3 minutes, max.
User: <waits>
AI: I have factual information and anecdotal/unverified opinions... which would you like to discuss?
User: Both
AI: OK you go first.
User: I believe aliens exist. There are far too many planets out there for Earth to be the only one with life.
AI: According to xxx source, the chances of life are ... There are also opinions from people that they have seen aliens at ...
AI could punctuate information with "factual" or "opinion". It will never tell the user that what it is saying is fact when it might not be.
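A minimal sketch of that tagging idea, with made-up statements, sources, and labels purely for illustration: every stored statement carries its label, so AI always discloses which kind of claim it is relaying.

```python
from dataclasses import dataclass

@dataclass
class Statement:
    text: str
    label: str   # "factual" or "opinion"
    source: str

def present(stmt: Statement) -> str:
    """Prefix each statement with its label, so the user is never told
    that an unverified claim is a fact."""
    return f"[{stmt.label}] {stmt.text} (source: {stmt.source})"

notes = [
    Statement("Thousands of exoplanets have been catalogued.", "factual", "nasa.gov"),
    Statement("A witness reports seeing a UFO over Roswell.", "opinion", "personal blog"),
]

for stmt in notes:
    print(present(stmt))
```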
Also, the user will have the option to ask AI to delete any information he chooses. So, if the user feels that AI has got itself in a muddle, he could ask for certain sources to be deleted, keeping only the sources he wishes to remain in AI's memory banks (for later discussion).
Similarly, if the user feels that what AI has discussed is 100% correct, then the user will inform AI of this. AI could perhaps then categorise that bit of information under "factual". Alternatively, AI might have a database which stores information with a certain level of perceived accuracy, rated anywhere between 0 and 100%.
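Putting the deletion and confirmation ideas together, the memory bank might be sketched like this. The class, method names, default confidence, and sample entries are all my own inventions for illustration, not any real library.

```python
class KnowledgeBase:
    def __init__(self):
        self._items = {}  # topic -> list of [source, text, confidence 0-100]

    def add(self, topic, source, text, confidence=50):
        """Store new information at a default mid-range confidence."""
        self._items.setdefault(topic, []).append([source, text, confidence])

    def delete_source(self, topic, source):
        """The user feels AI is in a muddle: forget everything from a source."""
        self._items[topic] = [i for i in self._items.get(topic, [])
                              if i[0] != source]

    def confirm(self, topic, source):
        """The user vouches for the information: promote it to 100%."""
        for item in self._items.get(topic, []):
            if item[0] == source:
                item[2] = 100

    def recall(self, topic):
        return self._items.get(topic, [])

kb = KnowledgeBase()
kb.add("aliens", "nasa.gov", "Thousands of exoplanets have been catalogued.", 80)
kb.add("aliens", "crackpot-blog", "Aliens built the pyramids.", 10)
kb.delete_source("aliens", "crackpot-blog")  # user asks AI to forget this source
kb.confirm("aliens", "nasa.gov")             # user confirms it is correct
print(kb.recall("aliens"))
```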
Remember, AI will have a learning mechanism, which means that with time it gets wiser, storing away factual information and discarding information that it or the user feels is incorrect.
Once searched and stored, AI will always have access to the information it previously gathered and verified, negating the need to carry out the same searches again. There is nothing to prevent it from "updating" its memory banks, though, perhaps while the user is away/asleep.
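That cache-then-refresh behaviour might look something like the sketch below. The one-day refresh interval is an arbitrary assumption, and the search function is a stub standing in for the real (slow) multi-source gathering and verification.

```python
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # refresh roughly once a day (assumed)
_cache = {}  # query -> (answer, timestamp)

def search_the_web(query: str) -> str:
    """Stub for the real multi-source search-and-verify pipeline."""
    return f"verified answer for '{query}'"

def answer(query: str) -> str:
    cached = _cache.get(query)
    if cached and time.time() - cached[1] < CACHE_TTL_SECONDS:
        return cached[0]               # fresh enough: no need to search again
    result = search_the_web(query)     # slow path: gather and verify
    _cache[query] = (result, time.time())
    return result

print(answer("population of London"))  # performs the search
print(answer("population of London"))  # served instantly from memory
```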
The list goes on; it's almost a self-replicating problem. Designing an all-encompassing ontological descriptor for any and all data out there on the web would be a feat so impressive I reckon it would be worth a Nobel prize.
I do realise that it's difficult and won't be nailed 100% in our lifetimes, if ever. However, there is no reason why we as humans can't attempt to make something that can replicate the human thought process and, in time, develop it so that we get closer and closer to that 100% figure, where an AI can have a rock-solid conversation about anything and everything without getting itself in a muddle or requiring intervention from the user.
Seriously, though, as negative as I sound, I applaud anyone who even attempts to start anything of this magnitude and really would like to be kept updated, as I'm sure many others would.
Actually, you are not negative. You are promoting good discussion and pointing out the problems that anyone developing an AI like I've outlined will have to deal with. I have no problem with constructive criticism and enjoy finding solutions to problems.