SQL - problem storing Irish names

I've been Googling this for the past hour and can't seem to find a very simple solution to this prob.

I have a field in SQL, VARCHAR(255) with latin1_swedish_ci collation (the default). I get odd characters when storing strings with apostrophes or accents.

Example. O'Brien is stored as O\'Brien
Siobhán is stored as SiobhÃn
Éire is stored as Ãire.

I've tried changing the collation for the field but nothing seems to work.

Do I need to format the strings in PHP before inserting to SQL?

Thanks guys.
 
Example. O'Brien is stored as O\'Brien

That's the default way that PHP will store the ' character. If left unescaped, the SQL query would fail at that point as it's used to define fields.

E.g. SELECT * FROM `names` WHERE name='O'Brien'

If you pair up the ' characters, you'll see that there's an odd number of them, so the string ends after the O and the rest of the query won't parse.
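For what it's worth, a prepared statement avoids the quoting question altogether, because the value travels separately from the SQL text. This is only a rough sketch: it assumes PDO is available, and it uses an in-memory SQLite database with a made-up `names` table purely so it runs on its own. With a real database you would swap in your own DSN and credentials.

```php
<?php
// Assumption: in-memory SQLite stands in for the real database here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE names (name VARCHAR(255))');

// The ? placeholder is filled in by the driver, so the apostrophe
// never becomes part of the SQL text and needs no escaping.
$insert = $pdo->prepare('INSERT INTO names (name) VALUES (?)');
$insert->execute(["O'Brien"]);

$select = $pdo->prepare('SELECT name FROM names WHERE name = ?');
$select->execute(["O'Brien"]);
$storedName = $select->fetchColumn();

echo $storedName, "\n";
```

The name should come back exactly as it went in, with no backslash.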
 
Siobhán is stored as SiobhÃn
Éire is stored as Ãire.

These are PHP encoding problems I believe. Have a read of this:

http://www.phpwact.org/php/i18n/charsets
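To see where characters like Ã come from, here is a small sketch (assuming the mbstring extension is loaded) that takes the UTF-8 bytes of "Siobhán", deliberately treats them as ISO-8859-1, and re-encodes them. The variable names are just for illustration.

```php
<?php
// "Siobhán" as UTF-8 bytes: the "á" is the two bytes C3 A1.
$utf8Name = "Siobh\xC3\xA1n";

// Pretend those bytes are ISO-8859-1 and convert them to UTF-8.
// Each of the two bytes of "á" is turned into a character of its own.
$misread = mb_convert_encoding($utf8Name, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8Name), "\n"; // 53696f6268c3a16e
echo bin2hex($misread), "\n";  // 53696f6268c383c2a16e
```

The second string displays as "Siobh", then Ã and ¡, then "n". That is the usual symptom when one part of the chain (page, PHP, connection or table) thinks the text is Latin-1 while another is sending UTF-8.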

That's the default way that PHP will store the ' character. If left unescaped, the SQL query would fail at that point as it's used to define fields.

Not quite. The slash comes from PHP's old and horrible (and deprecated) Magic Quotes feature.

You can just use stripslashes blindly on your input, but if Magic Quotes gets disabled at some point in the future (or if you move to a server on which it's already disabled), you'll be removing slashes that don't need removing. A better solution is to test for Magic Quotes first, and then apply stripslashes recursively to each GPC superglobal if it's enabled:

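PHP:
// Magic Quotes was removed in PHP 5.4 and get_magic_quotes_gpc() in PHP 8.0,
// so check that the function exists before calling it.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
{
	// Recursively strip the slashes that Magic Quotes added.
	function cleanInput($input)
	{
		if (is_array($input))
		{
			foreach ($input as &$value)
			{
				$value = cleanInput($value);
			}
			unset($value); // break the reference left over from the loop

			return $input;
		}
		else
		{
			return stripslashes($input);
		}
	}

	$_GET = cleanInput($_GET);
	$_POST = cleanInput($_POST);
	$_COOKIE = cleanInput($_COOKIE);
}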

You can then put this code at the top of each script and it'll automatically undo Magic Quotes where necessary. Values still need escaping (or a prepared statement) when they go into a query.
 
Have to do something similar to this on Monday.

Code saved for later, muwaahaha! :)

Thanks
 
The problem I'm having is actually the responding server (Payment Gateway)

I send POST data to the Payment Gateway

<input type=hidden name="firstName" value="Siobhán">

and it sends it back as SiobhÃn in the response.

Also, é comes back as Ã©, and so on.

Is it possible to decode the garbled string back to UTF-8 without having to use str_replace?
 
It doesn't work for some characters.

É, for example, is sent back from the payment gateway as Ã‰

echo utf8_decode("Ã‰") outputs as �? (two question marks, not É)


What I'd really like to know is what type of encoding (or whatever) the server is sending back to me. There must be an easy way to convert back to UTF8 without having to manually str_replace everything.

I just don't know what I should be searching for on Google - searching for these weird characters is proving difficult.
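My guess, and it is only a guess from the symptoms, is that the gateway reads the UTF-8 bytes as Windows-1252 and then re-encodes the result as UTF-8 (often called double encoding or "mojibake", which may help with searching). If so, converting from UTF-8 back to Windows-1252 should recover the original bytes. The sketch below assumes mbstring is loaded, and `fixDoubleEncoded` is a made-up helper name.

```php
<?php
// Hypothetical helper: undo one round of
// "UTF-8 read as Windows-1252, then re-encoded as UTF-8".
function fixDoubleEncoded(string $garbled): string
{
	// Map each character back to the single byte it had in Windows-1252.
	// The resulting byte sequence is the original UTF-8 text.
	return mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
}

$garbledLower = "\xC3\x83\xC2\xA9";     // "Ã©", the garbled form of é
$garbledUpper = "\xC3\x83\xE2\x80\xB0"; // "Ã‰", the garbled form of É

echo fixDoubleEncoded($garbledLower), "\n";
echo fixDoubleEncoded($garbledUpper), "\n";
```

utf8_decode only targets ISO-8859-1, which has no ‰ character, which would explain why it fails on É. Be aware that letters whose second UTF-8 byte is 81, 8D, 8F, 90 or 9D (Á and Í among them) have no Windows-1252 character at all, so they may already be damaged by the time they come back and might not be recoverable this way.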
 
I don't think this is going to work.

í, Á and Í are sent back as Ã, at least, I can't see any difference between them in Firefox.

á = Ã¡
é = Ã©
í = Ã*
ó = Ã³
ú = Ãº

Á = Ã
É = Ã‰
Í = Ã
Ó = Ã“
Ú = Ãš

How odd. I just don't understand it. :confused:
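One way to see what is going on is to look at the raw bytes rather than at what the browser draws. A small sketch, assuming the script file itself is saved as UTF-8:

```php
<?php
// Assumes this file is saved as UTF-8.
$letters = ['á', 'é', 'í', 'ó', 'ú', 'Á', 'É', 'Í', 'Ó', 'Ú'];

$bytes = [];
foreach ($letters as $letter)
{
	// bin2hex shows every byte, including ones a browser renders as nothing.
	$bytes[$letter] = bin2hex($letter);
	echo $letter, ' = ', $bytes[$letter], "\n";
}
```

Every one of these letters starts with the byte C3, which is Ã when read as a single-byte character. The second byte is what differs: for í it is AD (a soft hyphen, normally invisible), and for Á and Í it is 81 and 8D, which are control codes in ISO-8859-1 and unassigned in Windows-1252. That would explain why all three look like a lone Ã in Firefox even though the underlying bytes are different.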
 