Best way to remove Unicode characters from text input

Hi guys,

Just wondering what the best way is to remove/strip out Unicode characters. The reason I ask is that they are causing some issues with saving information in a particular way.

Currently I am using this code:

char[] stringconvertor = new char[TextBox1.Text.Length];
string returnstring = "";
stringconvertor = TextBox1.Text.ToCharArray();

for (int loop = 0; loop < TextBox1.Text.Length; loop++)
{
    if (stringconvertor[loop] <= 127)
    {
        returnstring = returnstring + stringconvertor[loop].ToString();
    }
}

TextBox2.Text = returnstring;

I have found that this code takes around 4 minutes to go through about 200,000 characters, which is obviously unacceptable for users. Is there a better way?
 
 
Please don't do successive string concatenation like that :(

Use a StringBuilder:
Code:
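// Requires: using System.Text;
string input = TextBox1.Text;
// Pre-size the builder so it never has to grow while appending.
StringBuilder stringBuilder = new StringBuilder(input.Length);

for (int i = 0; i < input.Length; i++)
{
    // Keep only ASCII characters (code points 0 to 127 inclusive),
    // matching the <= 127 test in the original code.
    if (input[i] <= 127)
    {
        stringBuilder.Append(input[i]);
    }
}

TextBox2.Text = stringBuilder.ToString();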
This should fix the long execution times.

Alternatively, get .NET to do it for you:
Code:
// Requires: using System.Text;
Encoding originalEncoding = Encoding.Unicode;   // UTF-16, the in-memory format of .NET strings
Encoding targetEncoding = Encoding.ASCII;

byte[] inputBuffer = originalEncoding.GetBytes(TextBox1.Text);
// Convert re-encodes the bytes; anything ASCII cannot represent comes out as '?'.
byte[] outputBuffer = Encoding.Convert(originalEncoding, targetEncoding, inputBuffer);
TextBox2.Text = targetEncoding.GetString(outputBuffer);
Note that this replaces non-ASCII characters with question marks rather than removing them.
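If the characters should be dropped instead of turned into question marks, one possible variation is to ask for an ASCII encoding with an empty replacement fallback. This is only a sketch: the AsciiFilter class and StripNonAscii method are names made up for illustration, and the sample string is arbitrary.

```csharp
using System;
using System.Text;

public static class AsciiFilter
{
    // ASCII encoding whose encoder fallback substitutes an empty string,
    // so characters it cannot encode are dropped instead of becoming '?'.
    private static readonly Encoding StrippingAscii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(string.Empty),
        new DecoderExceptionFallback());

    // Hypothetical helper: returns the input with non-ASCII characters removed.
    public static string StripNonAscii(string input)
    {
        byte[] asciiBytes = StrippingAscii.GetBytes(input);
        return StrippingAscii.GetString(asciiBytes);
    }

    public static void Main()
    {
        // \u00e9 is e-acute and \u20ac is the euro sign; both are outside ASCII.
        Console.WriteLine(StripNonAscii("Caf\u00e9 \u20ac5")); // prints "Caf 5"
    }
}
```

With the forum example, that would be TextBox2.Text = AsciiFilter.StripNonAscii(TextBox1.Text);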
 
I wasn't aware that string concatenation was bad. I am still learning C#, and although I have learnt a lot, I am still doing things the way I probably would have when I started out learning other languages such as VB and C++.

I shall try them out and see how much they affect performance.

Edit: well, I just tried the first example you gave me and it flew through a 13-million-character string in just over 90 seconds. :D
 
The reason your original algorithm is so slow is that strings are immutable (i.e. they can't be changed). So when you concatenate two strings, .NET has to create an entirely new string to store the result, copying everything accumulated so far, and the old one gets thrown out of the window. This is fine when you're doing it just a few times, but when you do it once per character the total amount of copying grows roughly with the square of the output length. A StringBuilder avoids this by appending into a buffer it can modify in place ;)
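To see the difference for yourself, here is a rough timing sketch that runs both approaches over the same input. The ConcatTiming class, its method names and the 50,000-character test string are all made up for illustration, and the timings will vary from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class ConcatTiming
{
    // Repeated concatenation: every += allocates a new string and copies
    // everything accumulated so far.
    public static string WithConcatenation(string input)
    {
        string result = "";
        foreach (char c in input)
        {
            if (c <= 127)
            {
                result += c;
            }
        }
        return result;
    }

    // StringBuilder: characters are appended into a buffer that is
    // modified in place, so nothing already written is copied again here.
    public static string WithStringBuilder(string input)
    {
        StringBuilder builder = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 127)
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        string input = new string('a', 50000); // arbitrary test size

        Stopwatch stopwatch = Stopwatch.StartNew();
        WithConcatenation(input);
        Console.WriteLine("Concatenation: " + stopwatch.ElapsedMilliseconds + " ms");

        stopwatch.Restart();
        WithStringBuilder(input);
        Console.WriteLine("StringBuilder: " + stopwatch.ElapsedMilliseconds + " ms");
    }
}
```

Both methods return the same string; only the time taken differs, and the gap should widen quickly as the input gets longer.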

More info here:
http://www.yoda.arachsys.com/csharp/stringbuilder.html

And lots more very useful and interesting articles:
http://www.yoda.arachsys.com/csharp/index.html
 