C strings

Associate
Joined
18 Mar 2007
Posts
291
Hi guys, just after a bit of advice on how to go about this problem, not a solution.

What I have to do is read in a text file and put the characters in it into strings. However, whenever there is a comma I need to start a new string, and I need to ignore whitespace and punctuation.

Probably badly explained, so here's an example:

The dog,sat, on, the.

Would be:

string1: "The"
string2: "dog"
string3: "sat"
string4: "on"
string5: "the"

However, what I don't know is how many characters each string will be.

To do the string allocation, would I have to allocate it once I know how many characters there are, i.e. char string1[NO_OF_CHARACTERS]?
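In C89 an array size has to be a constant expression, so `char string1[NO_OF_CHARACTERS]` only works if that count is a constant (C99 variable-length arrays relax this, but they disappear when the enclosing block ends). A common alternative is to malloc the string once the length is known, remembering one extra byte for the terminating '\0'. A minimal sketch; `copy_word` is a hypothetical helper name, not a library function:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` characters starting at `src` into a
   freshly allocated, null-terminated string. Returns NULL if malloc fails.
   The caller must free() the result when done with it. */
char *copy_word(const char *src, size_t len)
{
    char *word = malloc(len + 1);   /* +1 for the terminating '\0' */
    if (word == NULL)
        return NULL;
    memcpy(word, src, len);         /* copy exactly len characters */
    word[len] = '\0';               /* terminate so it is a valid C string */
    return word;
}
```

For example, `copy_word("dog,sat", 3)` returns a new string "dog" that stays valid until it is freed.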

I'm assuming my best option would be to use fgetc inside a while loop, e.g. while(!ispunct(fgetc(file))) to ignore the punctuation, and a similar loop with isspace() for the whitespace, etc.?
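The fgetc idea can work with a single loop instead of one per character class: treat anything that is not a letter or digit as a separator. A minimal sketch, assuming a "word" is a run of characters for which isalnum() is true (so commas, spaces and full stops all end a word), with a buffer that grows via realloc so the word length does not need to be known up front. `read_word` is a hypothetical name:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the next word from fp, where a word is a run of
   characters for which isalnum() is true. Everything else (spaces, commas,
   full stops, newlines) is skipped. Returns a malloc'd string the caller
   must free, or NULL at end of file (or if memory runs out). */
char *read_word(FILE *fp)
{
    int c;                      /* int, not char, so EOF can be told apart */
    size_t len = 0, cap = 16;
    char *buf, *tmp;

    /* Skip separators until the first character of a word (or EOF). */
    while ((c = fgetc(fp)) != EOF && !isalnum(c))
        ;                       /* nothing to do: just skip */
    if (c == EOF)
        return NULL;

    if ((buf = malloc(cap)) == NULL)
        return NULL;

    /* Collect characters until the next separator or EOF. */
    do {
        if (len + 1 == cap) {   /* always keep room for the '\0' */
            cap *= 2;
            if ((tmp = realloc(buf, cap)) == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = (char)c;
    } while ((c = fgetc(fp)) != EOF && isalnum(c));

    buf[len] = '\0';
    return buf;
}
```

Calling it repeatedly on a file containing "The dog,sat, on, the." returns "The", "dog", "sat", "on" and "the", then NULL.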

Just trying to brainstorm a bit here. Any advice would be very helpful.

Cheers.
 
I'd look at the fscanf function. It takes a 'format' string much like printf, and with the POSIX m modifier (e.g. %ms) it can allocate a buffer of the correct size for a string internally and just set a pointer you provide to point to that buffer.

Not sure it would be able to handle the parsing as well, but you could use it to read a line and then parse that yourself using the functions you mentioned already. 'man fscanf' will give more detail on what it can do.
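A sketch of the read-a-line-then-parse idea. The allocate-for-me behaviour comes from the `m` modifier, which is POSIX (2008) rather than ISO C, so this assumes a C library that supports it (glibc does). `%m[^\n]` reads everything up to the next newline into a buffer that fscanf allocates; the leading space in the format skips the newline left over from the previous line, plus any blank lines and leading whitespace. `read_line` is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return the next non-blank line from fp (without its
   '\n') in a buffer allocated by fscanf's POSIX 'm' modifier, or NULL at
   end of file. Leading whitespace is skipped by the " " in the format.
   The caller must free() the result. */
char *read_line(FILE *fp)
{
    char *line = NULL;
    if (fscanf(fp, " %m[^\n]", &line) != 1)   /* &line is a char ** */
        return NULL;
    return line;
}
```

Each returned line can then be split with strtok or the character functions mentioned above, and freed afterwards.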
 
Hey guys,

Using strtok, can I ignore whitespace?

Thanks

Not sure, to be honest. Worst case, you can trim out the whitespace after the tokenisation.

In other words, if you split "token1 ,token2 , token3" on commas only, you get:

"token1 "
"token2 "
" token3"

...which you can trim individually.
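Trimming each token in place takes only a few lines with isspace(). A minimal sketch, assuming the token is a writable, null-terminated string such as one returned by strtok; `trim` is a hypothetical name. It returns a pointer to the first non-space character and writes a new '\0' after the last one:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: trim leading and trailing whitespace from s in place.
   Returns a pointer into s at the first non-whitespace character. */
char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))   /* skip leading whitespace */
        s++;

    end = s + strlen(s);                 /* one past the last character */
    while (end > s && isspace((unsigned char)end[-1]))
        end--;                           /* back up over trailing whitespace */
    *end = '\0';

    return s;
}
```

Applied to "token1 ", "token2 " and " token3" it gives "token1", "token2" and "token3". Alternatively, putting ' ' (and '\t') in strtok's delimiter string lets strtok skip the whitespace itself, because it treats any run of delimiter characters as a single separator.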
 
You also need to think about how you're going to set up buffers to hold the strings you're going to be tokenising. If the text file has multiple rows, the lines may be of differing lengths, so you could either set a fixed buffer size of, say, 100 characters, or create the buffer dynamically once the length of the row is known. For the latter, note the position with ftell, call getc until a '\n' is found, incrementing a counter on every getc, then seek back to the saved position and read the line into a buffer of counter + 1 characters (the extra one is for the terminating '\0'). With that method, though, you're trading processor time for memory, so it depends on which resource is more abundant on your system.
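A sketch of the count-then-read option. Rather than seeking back by the counter, it saves the start position with ftell and returns to it with fseek(..., SEEK_SET), which ISO C guarantees to work for text streams as well as binary ones. `read_line_exact` is a hypothetical name:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one line of any length from fp by counting its
   characters, seeking back, and then reading it into a buffer of exactly
   the right size. The '\n' is consumed but not stored.
   Returns a malloc'd string (caller frees) or NULL at EOF or on error. */
char *read_line_exact(FILE *fp)
{
    long start = ftell(fp);   /* remember where the line begins */
    size_t count = 0, i;
    int c;
    char *line;

    if (start == -1L)
        return NULL;

    /* Pass 1: count characters up to the newline or end of file. */
    while ((c = fgetc(fp)) != EOF && c != '\n')
        count++;
    if (c == EOF && count == 0)
        return NULL;                          /* nothing left to read */

    /* Pass 2: go back to the start of the line and read it for real. */
    if (fseek(fp, start, SEEK_SET) != 0)
        return NULL;
    if ((line = malloc(count + 1)) == NULL)   /* +1 for the '\0' */
        return NULL;
    for (i = 0; i < count; i++)
        line[i] = (char)fgetc(fp);
    line[count] = '\0';
    fgetc(fp);                                /* consume the '\n' (harmless at EOF) */

    return line;
}
```

Every character is read twice, which is the processor-time cost mentioned above; in exchange no byte of the buffer is wasted.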
 
Hey guys, here's what I've got so far:

Code:
#include <string.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {

    FILE *fp;
    char delimiters[] = " ,";
    char string[10], *string1, *string2, *string3, *string4;
    int i=0,j;

    if ((fp=fopen("test.txt", "r"))==NULL) {
        printf("Error opening file");
        exit(1);
    }

    while(1) {
        string[i]=fgetc(fp);

        if(feof(fp)){
            break;
        }
        i++;
    }

    for(j=0; j<i; j++) {
        printf("%c", string[j]);
    }

    printf("\n");


    string1 = strtok (string, delimiters);
    printf("%s\n", string1);
    string2 = strtok (NULL, delimiters);
    printf("%s\n", string2);
    string3 = strtok (NULL, delimiters);
    printf("%s\n", string3);
    string4 = strtok (NULL, delimiters);
    printf("%s\n", string4);

    return 0;
}

However, I know there will be at most 4 strings per line, but there could be fewer than 4.

The output for a file containing:
Code:
"The,        cat, is"

is:

Code:
The
cat
is
,


i.e. it has a ',' at the end. Any ideas?

I realise this would only be a solution for one line, and it will need to work with more lines in the future. Any tips?
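A likely cause of the stray ',' in the code above: `string` only has room for 10 characters, but the line is longer than that, so the loop writes past the end of the array (undefined behaviour), and the buffer is never given a terminating '\0', so strtok can run on into whatever memory follows. A fourth strtok call on a three-word line also returns NULL, and printing NULL with %s is undefined too. A sketch that avoids all three and handles any number of lines, assuming lines are shorter than 256 characters; `split_line`, `print_tokens`, `MAX_LINE` and `MAX_TOKENS` are hypothetical names, and '\r' is in the delimiter set on the assumption that the file might have Windows line endings:

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE   256   /* assumed maximum line length, including '\n' and '\0' */
#define MAX_TOKENS 4     /* at most 4 strings per line */

/* Hypothetical helper: split `line` in place on spaces, tabs, commas and
   line endings, storing up to max_tokens pointers in tokens[].
   Returns how many were found, which may be fewer than max_tokens. */
int split_line(char *line, char *tokens[], int max_tokens)
{
    const char delimiters[] = " \t,\r\n";
    int n = 0;
    char *tok = strtok(line, delimiters);

    while (tok != NULL && n < max_tokens) {
        tokens[n++] = tok;
        tok = strtok(NULL, delimiters);   /* NULL means "carry on with the same string" */
    }
    return n;
}

/* Read fp line by line with fgets (which never writes more than
   MAX_LINE bytes and always adds a '\0') and print each line's tokens. */
void print_tokens(FILE *fp)
{
    char line[MAX_LINE];
    char *tokens[MAX_TOKENS];
    int n, j;

    while (fgets(line, sizeof line, fp) != NULL) {
        n = split_line(line, tokens, MAX_TOKENS);
        for (j = 0; j < n; j++)
            printf("%s\n", tokens[j]);
    }
}
```

Calling `print_tokens(fp)` after `fp = fopen("test.txt", "r")` prints The, cat and is on separate lines for the file above. The pointers in `tokens` point into `line`, which the next fgets overwrites, so copy the strings if they need to outlive their line.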
 
Code:
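#include <stdio.h>
#include <string.h>

int main(void)
{
    char string[] = "The,        cat, is";  /* example line; strtok writes into it, so it must be a modifiable array */
    char *result = strtok(string, " ,");    /* first call: pass the string to split */
    while (NULL != result)
    {
        printf("%s\n", result);
        result = strtok(NULL, " ,");        /* later calls: pass NULL to carry on where it left off */
    }
    return 0;
}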

This would be more efficient if you didn't need to store the strings. It would also work for a line with any number of strings, not just 4.
 
I do need to store the strings, but just from one line at a time. What I'm planning on doing is storing the strings line by line and performing checks on each string: for example, if string1 contains a number, print an error message and quit, etc. However, that's for later on.

Thanks.
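For the "if string1 contains a number" check, one way is to scan the string with isdigit(). A minimal sketch; `contains_digit` is a hypothetical name, and it only looks for decimal digits, so something like "1e5" counts but a word such as "five" does not:

```c
#include <ctype.h>

/* Hypothetical helper: return 1 if s contains at least one decimal digit, else 0. */
int contains_digit(const char *s)
{
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char)*s))   /* cast avoids undefined behaviour for negative chars */
            return 1;
    }
    return 0;
}
```

Printing the error and quitting then stays with the caller, e.g. `if (contains_digit(string1)) { fprintf(stderr, "string1 contains a number\n"); exit(EXIT_FAILURE); }`.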
 
A better way than your current method would be to use an array of char pointers, then. That way you don't need to declare each individual char * like string1, string2 etc., and you can move on to the next slot with the ++ operator on a pointer into the array in each iteration of the loop. Let me know how you get on. With your current method, which doesn't loop until strtok returns NULL, if a line only has 3 strings and you tokenise a fourth time, strtok returns NULL, and passing that NULL to printf with %s is undefined behaviour that can easily end in a seg fault.
 
Basically, instead of having string1 and string2, you could have:

char **strings;

strings = malloc(2 * sizeof(char *));

That gives you room for 2 pointers to strings, so it can be used much like a 2-dimensional array.

So strings[0] would be the first string, and strings[1] would be the second.
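Because strtok returns pointers into the line buffer, those pointers end up pointing at different text once the next line is read into the same buffer. If the strings need to be kept, one option is to copy each one and store the copies in an array of char * that grows with realloc. A sketch under those assumptions; `StringList`, `list_add` and `list_free` are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a growable array of pointers to heap-allocated strings. */
typedef struct {
    char **items;   /* items[i] is the i-th stored string */
    size_t count;   /* number of strings stored */
    size_t cap;     /* number of slots allocated in items */
} StringList;

/* Append a copy of s to the list. Returns 0 on success, -1 if out of memory. */
int list_add(StringList *list, const char *s)
{
    size_t len = strlen(s);
    char *copy;

    if (list->count == list->cap) {              /* out of slots: double them */
        size_t new_cap = list->cap ? list->cap * 2 : 4;
        char **tmp = realloc(list->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        list->items = tmp;
        list->cap = new_cap;
    }

    if ((copy = malloc(len + 1)) == NULL)        /* +1 for the '\0' */
        return -1;
    memcpy(copy, s, len + 1);                    /* copies the '\0' too */
    list->items[list->count++] = copy;
    return 0;
}

/* Free every stored string and the array of pointers itself. */
void list_free(StringList *list)
{
    size_t i;
    for (i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->cap = 0;
}
```

Start with `StringList list = {NULL, 0, 0};`, call `list_add(&list, token)` for each token, read the copies back as `list.items[0]`, `list.items[1]` and so on, and call `list_free(&list)` when finished with that line.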
 
In context, basically something like this (sorry, it's a bit rushed, I'm just leaving work :/)

Code:
#include <stdio.h>
#include <string.h>

#define MAX_STRINGS 10

int main(void)
{
    char myLongStringToTokenise[] = "This, can, now, be, variable, length";
    char *charPointerArray[MAX_STRINGS];//We can hold 10 strings here
    char **ptrChar = charPointerArray;//Points to char* 1 in array now
    int count = 0, i;

    char *result = NULL;
    result = strtok (myLongStringToTokenise, " ,");
    while ( NULL != result && count < MAX_STRINGS )  // stop before running off the end of the array
    {
        *(ptrChar++) = result;   // store the pointer, then move on to the next slot
        count++;
        result = strtok( NULL, " ,");
    }

    //Print charPointerArray here if you want to check it
    for (i = 0; i < count; i++)
    {
        printf("%s\n", charPointerArray[i]);
    }

    return 0;

}
 
Ahh, that's great, thanks lads.

Also, does anyone have a good tutorial on pointers in C? I start getting a bit lost at pointers to pointers, i.e. **, and just pointers for storing strings etc.

Thanks
 
Pointers don't store strings; they point to a location in memory where the string resides. That's all a pointer is: a pointer to where something is found. Take a look at this little guide, I think it's pretty useful (although it's aimed at C++, most of it applies to C).
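A small sketch of both ideas. `skip_spaces` is a hypothetical helper, not a library function: a `char *` only holds the address of a character, and passing `&p` (a `char **`) is what lets a function move the caller's own pointer rather than a local copy of it.

```c
#include <ctype.h>

/* Hypothetical helper: advance the caller's pointer past leading whitespace.
   It takes a char ** (the address of the caller's char *) so that moving the
   pointer is visible after the function returns. With a plain char *
   parameter, only the function's local copy of the pointer would move. */
void skip_spaces(char **pp)
{
    while (isspace((unsigned char)**pp))   /* **pp is the character being pointed at */
        (*pp)++;                            /* *pp is the caller's pointer; move it on */
}
```

For example, with `char text[] = "   cat"; char *p = text;`, calling `skip_spaces(&p)` leaves `text` itself unchanged but makes `p` point at the 'c', so `printf("%s\n", p)` prints cat.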

http://www.cplusplus.com/doc/tutorial/pointers.html

Edit - Sorry, in my program above I wrote a comment saying we could store 10 strings here; I meant 10 pointers to strings.
 
Ahh OK, brilliant mate. I'm going to have a good read over that and a tutorial over at gamedev tomorrow and see what I can come up with.

Thanks for your help.

Expect me back tomorrow!
 