Storing data in a database vs a .txt file

Hi,

I am creating a program which is constantly saving and accessing data.

At present I am using simple .txt files to save this data.

However, I would like to know: what advantage would I gain by setting up a database to save the data to and retrieve it from?

Speed is VERY important to me.

Would it be worth me changing from a .txt file data storage system to a full-blown database?

Thanks
 
I'm using VB.NET to program my software.
Would it not be easier for me to use the inbuilt database offered in Visual Studio?
 
It all depends on how big the DB is going to get and what it is used for.

Flat-file DBs have their uses, as they cost nothing, but they are slow once you are storing anything more than a few KB.

If the site is expected to get a good bit of traffic, with a few queries over the course of each user session, then a dedicated DB solution is probably best.

If you are deploying your app on a client system you can bundle the free Express edition of MS SQL Server; if it's hosted online, it obviously depends on the host.
 
I've been reading around.

I like the look of MS SQL Express, which is free. The only problem is that it has a 10GB size limit...which is quite frankly laughable.

I fully expect the size of the database to exceed 100GB... easily. The server (program) will be directly connected to the database (initially, both will run on the same machine). At no stage will the client connect directly to the database... only the server will ever connect to the database itself.

What is the maximum database size that SQLite can handle?

I'm thinking of using SQLite, with a GUI tool for it, to create the tables. I shall then use the server (programmed in .NET) to connect to the database and read from and write to the database file. Does this sound like a decent plan?
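
On the .NET side, I'm imagining something like this rough sketch, using the System.Data.SQLite ADO.NET provider (the file name, table and column names are just placeholders):

Code:
Imports System.Data.SQLite

Module SqliteDemo
    Sub Main()
        ' Open (or create) the database file; an SQLite DB is a single file on disk.
        Using conn As New SQLiteConnection("Data Source=ai_data.db")
            conn.Open()

            ' Create a table if it doesn't exist yet.
            Using cmd As New SQLiteCommand(
                "CREATE TABLE IF NOT EXISTS phrases (id INTEGER PRIMARY KEY, body TEXT)", conn)
                cmd.ExecuteNonQuery()
            End Using

            ' Insert a row, using a parameter to avoid quoting problems.
            Using cmd As New SQLiteCommand("INSERT INTO phrases (body) VALUES (@b)", conn)
                cmd.Parameters.AddWithValue("@b", "hello world")
                cmd.ExecuteNonQuery()
            End Using

            ' Read the rows back.
            Using cmd As New SQLiteCommand("SELECT id, body FROM phrases", conn)
                Using reader = cmd.ExecuteReader()
                    While reader.Read()
                        Console.WriteLine(reader.GetInt64(0).ToString() & ": " & reader.GetString(1))
                    End While
                End Using
            End Using
        End Using
    End Sub
End Module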
 
Yeah, I'd be careful if you are storing that much data, although it depends on what you are storing.

Be careful that you are only storing data you need, and that the schema is carefully designed, otherwise you will find database operations become hugely expensive.
 
It's going to be all text. At this point, the data contained in the database will be text only. Lots and lots of text.

In time (maybe in a few years), I plan to be storing other file types (most certainly binary).

I'm creating a natural language processing AI, which is able to learn English natural language. To give you an idea of the amount of text that can be generated: in a 10 minute conversation with the AI, it can quite easily produce text files of size > 2MB. This is with a single client, storing/processing text only. No images yet.

The plan is to have hundreds of thousands of clients hooked up to this thing. 100GB will be filled with ease. I shall also be creating an algorithm which discards the least used data, so that only the top x most-used files are stored in the database.
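
For what it's worth, that discard idea is basically least-recently-used (LRU) eviction. A minimal sketch of the policy, with a made-up capacity and string key/value types purely for illustration:

Code:
Imports System.Collections.Generic

' Minimal LRU cache: keeps the _capacity most recently used entries
' and silently drops the least recently used one when full.
Public Class LruCache
    Private ReadOnly _capacity As Integer
    Private ReadOnly _map As New Dictionary(Of String, LinkedListNode(Of KeyValuePair(Of String, String)))
    Private ReadOnly _order As New LinkedList(Of KeyValuePair(Of String, String))

    Public Sub New(capacity As Integer)
        _capacity = capacity
    End Sub

    Public Function TryGet(key As String, ByRef value As String) As Boolean
        Dim node As LinkedListNode(Of KeyValuePair(Of String, String)) = Nothing
        If Not _map.TryGetValue(key, node) Then Return False
        ' Touch the entry: move it to the front as most recently used.
        _order.Remove(node)
        _order.AddFirst(node)
        value = node.Value.Value
        Return True
    End Function

    Public Sub Put(key As String, value As String)
        Dim existing As LinkedListNode(Of KeyValuePair(Of String, String)) = Nothing
        If _map.TryGetValue(key, existing) Then
            _order.Remove(existing)
            _map.Remove(key)
        End If
        _map(key) = _order.AddFirst(New KeyValuePair(Of String, String)(key, value))
        ' Evict the least recently used entry once over capacity.
        If _map.Count > _capacity Then
            _map.Remove(_order.Last.Value.Key)
            _order.RemoveLast()
        End If
    End Sub
End Class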
 
Be careful that you are only storing data you need, and that the schema is carefully designed, otherwise you will find database operations become hugely expensive.

Yep. During testing, I shall iron out any issues I have with slowness. I am currently storing all data in multiple text files. The most regularly used info is cached in RAM. I spent a month developing the caching algorithm and it works beautifully.

It was only yesterday, while I was talking to someone, that they suggested I use a database.

What I like about using text files (instead of a database) is that it gives me a lot of flexibility. I can use threads to read each file; e.g. if I have 100 text files of 100MB each, I can have 100 separate threads all reading data at the same time. In practice, I have got this working nicely. However, I was wondering whether a formal database might increase speed further, especially when I start reaching file sizes which are very large.
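
Roughly, the reading pattern looks like this. It's a simplified sketch using the .NET 4 Task Parallel Library rather than 100 literal threads, and the directory path and per-file work are placeholders:

Code:
Imports System.IO
Imports System.Threading.Tasks

Module ParallelReadDemo
    Sub Main()
        ' Gather every text file in the (made-up) data directory.
        Dim files = Directory.GetFiles("C:\ai_data", "*.txt")

        ' Read all files concurrently; the TPL schedules the work
        ' across a thread pool instead of one OS thread per file.
        Parallel.ForEach(files,
            Sub(filePath)
                Dim text = File.ReadAllText(filePath)
                Console.WriteLine(filePath & ": " & text.Length.ToString() & " chars")
            End Sub)
    End Sub
End Module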

I've just read that SQLite is fine with smaller databases, but when you hit, say, 3M+ records, there are issues.

I'm now looking into MySQL.

Here's a question: Google use databases and they are able to search masses of data within milliseconds. How do they achieve this, if people are scoffing at 100GB of text? I mean, seriously, 100GB is child's play. It's quite possible that my server will have text files totalling over 1TB within a year or so... especially if I decide to deal with the Spanish language.
 
For an A.I. training algorithm, I would recommend something like Firebird: http://www.firebirdfaq.org/faq59/; as outlined in its FAQ, and as is true for most modern, robust RDBMSs, you will hit the file size limit of the file system in use well before you exhaust the database's own constraints. The good thing about Firebird is that it is lightweight and performs well, and you still get nice perks like advanced data compression, backups and indexing. Check it out.
 
Here's a question: Google use databases and they are able to search masses of data within milliseconds. How do they achieve this, if people are scoffing at 100GB of text?

You could have a look into search engines such as Lucene and see if you could adapt one to suit your needs. It may well have a lot of the functionality you need already.

It uses text-based indexes, similar to Google, I believe.
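
Something along these lines, perhaps. This is only a rough sketch against the Lucene.Net 3.x API, so double-check the names against whatever version you download; the index path and field name are invented:

Code:
Imports System.IO
Imports Lucene.Net.Analysis.Standard
Imports Lucene.Net.Documents
Imports Lucene.Net.Index
Imports Lucene.Net.Store

Module LuceneDemo
    Sub Main()
        ' Open a file-system index directory (the path is made up).
        Dim dir = FSDirectory.Open(New DirectoryInfo("C:\ai_index"))
        Dim analyzer As New StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30)

        Using writer As New IndexWriter(dir, analyzer, True, IndexWriter.MaxFieldLength.UNLIMITED)
            ' One Lucene Document per chunk of text you want searchable.
            Dim doc As New Document()
            doc.Add(New Field("body", "some conversation text", Field.Store.YES, Field.Index.ANALYZED))
            writer.AddDocument(doc)
            writer.Commit()
        End Using
    End Sub
End Module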
 
You could have a look into search engines such as Lucene and see if you could adapt one to suit your needs.

Thank you.
 
For an A.I. training algorithm, I would recommend something like Firebird: http://www.firebirdfaq.org/faq59/

I've done a few hours of reading on Firebird. It seems pretty good. Its performance is decent and it scales well.

I downloaded all the files. I spent an hour trying to register the server... unfortunately it was throwing up all sorts of errors, so I shall be giving MySQL a go. MySQL has a bigger community, so if I do run into problems, support should be easier to find.

Given that I've had this problem, the other collaborators who are working with me are also likely to have similar problems, so this is definitely a no-go.

Thank you for suggesting Firebird... it would've been great if it had worked, but unfortunately it didn't.
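
As a first connectivity test, I'll try something like this with Connector/NET (the MySql.Data provider); the server, database and credentials below are obviously placeholders:

Code:
Imports MySql.Data.MySqlClient

Module MySqlDemo
    Sub Main()
        ' Placeholder connection details; substitute real credentials.
        Dim connStr As String = "Server=localhost;Database=ai;Uid=someuser;Pwd=somepass"

        Using conn As New MySqlConnection(connStr)
            conn.Open()
            ' Trivial round-trip query to prove the connection works.
            Using cmd As New MySqlCommand("SELECT VERSION()", conn)
                Console.WriteLine("Connected to MySQL " & cmd.ExecuteScalar().ToString())
            End Using
        End Using
    End Sub
End Module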
 
Okay, I've had a thought here.

I was looking at some comparisons of SQLite vs MySQL. It would seem that for smaller DBs, SQLite is faster.

Also, apparently an SQLite database is just a single file. This is why it is so lightweight (a good thing).

So, I was thinking: is it possible to run multiple SQLite databases on a single machine? That way, I could run a separate database for each table. Of course, I would lose the ability to relate these tables to one another; however, I can easily handle that in my AI program itself.

If I arrange the DBs in this fashion, my server could use multiple threads to access each table simultaneously, without needing to queue up while an SQLite DB is being written to, as the locking will be minimal.

So, for example, we have one database called DB1, another called DB2, and so on up to DB1000, say. Each database is a separate table. Could I run multiple databases in this fashion using SQLite?
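
In code, I'm picturing something along these lines (again assuming the System.Data.SQLite provider; the file and table names are invented, and each task opens its own connection rather than sharing one across threads):

Code:
Imports System.Collections.Generic
Imports System.Data.SQLite
Imports System.Threading.Tasks

Module ShardedSqliteDemo
    Sub Main()
        ' One SQLite file per "table"; three shown, but it could be DB1..DB1000.
        Dim dbFiles = {"DB1.db", "DB2.db", "DB3.db"}

        Dim tasks As New List(Of Task)
        For Each dbFile In dbFiles
            Dim fileName = dbFile ' capture a fresh variable per iteration
            tasks.Add(Task.Factory.StartNew(
                Sub()
                    ' Writes to different files never contend for the same SQLite lock.
                    Using conn As New SQLiteConnection("Data Source=" & fileName)
                        conn.Open()
                        Using cmd As New SQLiteCommand(
                            "CREATE TABLE IF NOT EXISTS data (id INTEGER PRIMARY KEY, body TEXT)", conn)
                            cmd.ExecuteNonQuery()
                        End Using
                    End Using
                End Sub))
        Next
        Task.WaitAll(tasks.ToArray())
    End Sub
End Module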
 
That's a hell of a project you've got there, sunama. :p

Way above anything I'll do, but choose your DB wisely and good luck with it. :cool:

Thanks roger.

I was writing down the concepts for the program 18 months ago, but only started coding in February (I think). So it's been a tad over 6 months. Summer 2011 will be the make-or-break time for this.

The direction I take now, regarding the database will dictate how this program ultimately moves forward.
 
100GB of data and you weren't even convinced you might need a real database!?

Nope.

Multiple text files are able to do the job. It's just a question of how fast/slow they would be.

Even now, I am deliberating over whether to go with a proper database, as I am currently getting the job done pretty well using caching algorithms and multiple text files. If I use a database, then my caching algorithm will be bypassed and performance may decrease.
 