Storing data in a database VS .txt file

Wow, would love updates, is this a commercial idea, an eventual website or a massive uni project?

Would love to help out if it needs input, as I am guessing would most of OCuk! I can be a good beta tester as I love to find faults ;)
 
Mr DJ.
You wrote plenty of stuff.
But I'm sorry to say, you were no help whatsoever.
I'm bemused as to why you even bothered replying.

:p

Because I wanted to help, actually. I couldn't see what it was you were doing (it appears the only part of the entire thread I missed was the post where you explained it), and you are making some rather naive statements (I do mean that in the polite sense of the word), like saying you are keeping a cache in RAM but will also have hundreds to thousands of clients connected. That's a lot of RAM. More RAM than anyone has.

I take it you've looked into mapping the phrases (and their associated counterparts, e.g. "The Queen is the British Monarch" and "The Monarch of Britain is The Queen") to reduce duplication as well, but that's somewhat of a digression from the topic and I am probably teaching you to suck eggs. :)
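That phrase mapping could be sketched something like the following. The paraphrase table here is hand-written and the normalisation deliberately crude; detecting paraphrases automatically is the genuinely hard part, and none of these names come from the actual project:

```python
# Map equivalent phrasings to one canonical fact id, so each fact is
# stored once no matter how it is worded. (Illustrative only.)
PARAPHRASES = {
    "the queen is the british monarch": "fact:uk-monarch",
    "the monarch of britain is the queen": "fact:uk-monarch",
}

facts = {}  # canonical fact id -> stored data


def store(phrase, data):
    """Store data under the canonical id for this phrase."""
    key = phrase.lower().strip()
    fact_id = PARAPHRASES.get(key, key)  # fall back to the phrase itself
    facts[fact_id] = data
    return fact_id
```

Both wordings then resolve to the same entry, so only one copy of the fact ever exists.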

It sounds to me like you might want a multidimensional table, or what some call an OLAP cube. Something I don't have a great deal of experience with, admittedly.
 
Thanks for the advice.

My design avoids storing duplicate data.

Pretty much every piece of information is given a unique id, and the relationships are made using those ids.

The AI has the ability to sort data in such a way that if a piece of data is rarely accessed, used, written to or read from, it eventually falls to the bottom of the table or gets deleted.

The idea is that the AICore behaves much like a human so that if we regularly use information, we tend to remember it. However, the less we use it, the more likely we are to completely forget it. At this point we would need to revise or re-learn the information. AICore will operate in a similar way which will help avoid the situation of getting bogged down with superfluous data/information.


This is where a properly designed database will work wonders for you. The database handles the organisation of regularly used data for you, and (through its indexes and caches) tends to return the most frequently accessed data more quickly than the least accessed.

However, I'm a little surprised at your size requirements if you have no duplication of data. If you store every word in the English language in a database (around 750,000 words, and that's being generous) then you are looking at a database size of about 50MB (again, being generous). You could have a table (called 'Words') which stored every word against a unique identifier, and another table (called 'WordCount' or something) which contained the number of times a specific word had been entered. This would just contain the 'Words' id and a count value; if an entry didn't exist in this table, the word had never been entered. You could also have a 'Sentence' table, which contained each line a user entered, but built up as word ids.

If you structure your database something like this then you can store all the information that is entered while keeping the database size down. Even with 100s of people using the system it would be a long time before you got to 1GB, let alone 100GB! It also lets the database do what it does best: organising, sorting and working out the best way of getting your data back to you as quickly as possible.
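The layout above could be sketched in SQLite roughly as follows. The table names are the ones suggested in the post; the exact columns (and the position column used to keep word order within a sentence) are my own guesses at one workable shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Words (
        id   INTEGER PRIMARY KEY,
        word TEXT UNIQUE NOT NULL
    );
    CREATE TABLE WordCount (
        word_id INTEGER PRIMARY KEY REFERENCES Words(id),
        count   INTEGER NOT NULL
    );
    -- Each sentence is a sequence of word ids, not repeated text.
    CREATE TABLE Sentence (
        sentence_id INTEGER NOT NULL,
        position    INTEGER NOT NULL,
        word_id     INTEGER NOT NULL REFERENCES Words(id),
        PRIMARY KEY (sentence_id, position)
    );
""")


def store_sentence(conn, sentence_id, text):
    """Store each word once, bump its count, and record the sentence as ids."""
    for pos, w in enumerate(text.lower().split()):
        conn.execute("INSERT OR IGNORE INTO Words(word) VALUES (?)", (w,))
        (wid,) = conn.execute(
            "SELECT id FROM Words WHERE word = ?", (w,)
        ).fetchone()
        conn.execute(
            "INSERT INTO WordCount(word_id, count) VALUES (?, 1) "
            "ON CONFLICT(word_id) DO UPDATE SET count = count + 1",
            (wid,),
        )
        conn.execute(
            "INSERT INTO Sentence(sentence_id, position, word_id) "
            "VALUES (?, ?, ?)",
            (sentence_id, pos, wid),
        )
```

Each distinct word lives in the database exactly once, however many sentences it appears in, which is what keeps the total size small.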
 