ChatGPT made an email validity function - what do you think?

I've edited this slightly, but in general this was made by ChatGPT.

I wanted a PHP function that checked not just whether the format of an email address was valid, but also looked for any signs the email was fake, such as being too short, having a fake extension, etc.

I thought I'd share it to see what others think, and whether others can see room for improvement.

Ignore any place it echoes. That is just for testing, so I can see how the emails are getting their rating.

PHP:
function checkEmailValidity($email) {
    $validityRating = 10;
    
    $isShortUsername = 0;
    $isFreeDomain = 0;
    $isUnknownDomain = 0;
    $isShortDomain = 0;
    $isSuperShortUsername = 0;
    $isNonExtension = 0;


    // Remove whitespace and sanitize the email address
    $email = filter_var(trim($email), FILTER_SANITIZE_EMAIL);

    // Validate the email address
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        return 0; // Invalid email address: lowest rating, not the starting 10
    }

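    // Get the username, domain and extension from the email address.
    // Lowercase first so that e.g. 'Gmail.COM' still matches the lists below.
    list($username, $fullurl) = explode('@', strtolower($email));
    // Only the first two labels are used: 'yahoo.co.uk' gives 'yahoo' and 'co'
    list($domain, $extension) = explode('.', $fullurl);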

    // Popular free email domains list
    $popularFreeEmailDomains = [
        'gmail',
        'yahoo',
        'ymail',
        'hotmail',
        'gmx',
        'googlemail',
        'btinternet',
        'live',
        'outlook',
        'mail',
        'protonmail',
        'mac',
        'me',
        'yandex',
        'aol',
        'icloud',
        'fastmail',
        'qq',
        'rocketmail',
        'zoho',
        'inbox',
        'ntlworld',
        'btopenworld',
        'sky',
    ];
    
    // Popular extensions list
    $popularExtensions = [
        'com',
        'co',
        'uk',
        'net',
        'it',
        'org',
        'info',
        'me',
        'io',
        'gov',
        'edu',
        'biz',
        'dk',
    ];   

    // Check if the domain is from a popular free email service
    if (in_array($domain, $popularFreeEmailDomains)) {
        $isFreeDomain = 1;
    } else {
        $validityRating -= 3;
        $isUnknownDomain = 1;
        echo "<br />unknown domain $validityRating";
    }
    // Check if the extension is not a common one
    if (!in_array($extension, $popularExtensions)) {
        $validityRating -= 2;
        $isNonExtension = 1;
        echo "<br />extension is not common one $validityRating";
    }


    // Check if the username length is shorter than 4 characters
    if (strlen($username) < 4) {
        $validityRating -= 1;
        $isShortUsername = 1;
        echo "<br />short username $validityRating";
    }
    
    // Check if the username length is super short, less than 3 characters
    if (strlen($username) < 3) {
        $validityRating -= 3;
        $isSuperShortUsername = 1;
        echo "<br />super short username $validityRating";
    }
    
    // Check if the domain length is shorter than 5 characters (non-free domains only)
    if (strlen($domain) < 5 && $isFreeDomain != 1) {
        $validityRating -= 3;
        $isShortDomain = 1;
        echo "<br />domain short and not free domain $validityRating";
    }
    
    // Check if the username is short and free domain
    if ($isShortUsername == 1 && $isFreeDomain == 1) {
        $validityRating -= 5;
        echo "<br />username short and is free domain $validityRating";
    }
    
    // Check if the username is super short and free domain
    if ($isSuperShortUsername == 1 && $isFreeDomain == 1) {
        $validityRating -= 9;
        echo "<br />username supershort and is free domain $validityRating";
    }
    
    // Check if the username is super short and not free domain
    if ($isSuperShortUsername == 1 && $isFreeDomain != 1) {
        $validityRating -= 3;
        echo "<br />username super short and not free domain $validityRating";
    }
    
    
    
    // Check if the username is only digits
    if (ctype_digit($username)) {
        $validityRating -= 5;
        echo "<br />username is just numbers $validityRating";
    }

    return max(0, $validityRating);
}
 
Not to pee on your bonfire, but the conditions are a bit arbitrary and don't mean much when they penalise addresses that are within spec. For example, the local-part ("username" in your code) can be 1-64 characters, so checking whether it's shorter than 4 characters, or whether it is made entirely of digits (which is completely valid), is a bit meaningless.
I've got an email address along the lines of '[email protected]' and that function would suggest it's 'fake'.
It's based on the last 5 years of information I've collected for each customer that gets a quote with us.

I put in a script to show me the stats of all the unique email addresses, the shortest usernames etc, and this ChatGPT function does a good job of giving a correct trust rating for almost all the emails we have.

[email protected] would have a low trust rating as it's unusual, but it then comes down to admin to decide whether it appears real. Very, very few of our customers use an email address that isn't one of the free ones, so an address at a free provider is more likely to be legit than one that isn't @gmail or @hotmail.

The issue we had was that lots of people just type [email protected]

Or something like [email protected]

This function doesn't do anything other than tell a human what it thinks of the email address, so that the human can then make a better and quicker decision.

Another addition I was thinking of was to count the frequency of each character in the username: if the address is at a "free email provider", the username is made up entirely of the same letter, and there are more than x letters, give it a low trust rating. That would mark something like [email protected] as low trust.
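
A rough sketch of that repeated-letter idea, in case it helps. The function name isRepeatedCharUsername and the $minLength default of 4 are made up for illustration (they are not part of the function above), and it treats "more than x letters" as "at least $minLength", so adjust to taste.

```php
<?php
// Hypothetical helper, not part of the original checkEmailValidity().
// Returns true when the username is at least $minLength characters long
// and is made up of one single repeated character (e.g. "aaaaaa").
function isRepeatedCharUsername(string $username, int $minLength = 4): bool
{
    $username = strtolower($username);

    // Too short to judge: short usernames are already handled elsewhere
    if (strlen($username) < $minLength) {
        return false;
    }

    // count_chars() in mode 3 returns a string of the distinct bytes used,
    // so a length of 1 means every character is the same
    return strlen(count_chars($username, 3)) === 1;
}
```

Inside the main function this could be combined with the existing flag, e.g. subtracting from $validityRating only when $isFreeDomain == 1 and isRepeatedCharUsername($username) is true. How many points to subtract is a judgment call.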
 
As a customer, I do not mind the extra steps when dealing online.

I'd likely be questioning why there aren't barriers in place.

I'd be careful it isn't more about what you dislike.

Food for thought.
People leave websites because of the cookie consent form, or the "disable ads" pop up, if they are not that bothered about being there.

The fact that someone puts in a fake email address is proof that people may not feel comfortable giving an email address to get a quote.

I know I would not give my email address to see prices of a physical item. As soon as I see that "give us your email" form I go to another site, as I'm just looking to buy some toilet cleaner ffs.
 
1) Performed a regex/pattern validation for nonsense email addresses. i.e asdfghj, jkjkjkjk etc

It is something I'd like to add, but I felt that perhaps there are just endless possibilities of nonsense? Funny though that the 2 you've given are variations of the 2 most popular nonsense addresses that we get.

2) Opened an SMTP connection to the mailserver and performed a VRFY against the supplied address
Apparently this isn't a reliable option as domains will block this to stop spam companies checking emails to send junk to.
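
On the first point, a minimal sketch of what a pattern check might look like. It only covers the two families mentioned (keyboard-row runs like asdfghj and short repeated chunks like jkjkjkjk), so it is nowhere near catching every kind of nonsense. The name looksLikeNonsense, the QWERTY layout, the run length of 5 and the "three or more repeats" rule are all assumptions picked for illustration.

```php
<?php
// Hypothetical helper: flags two common families of "nonsense" usernames.
function looksLikeNonsense(string $username): bool
{
    $u = strtolower($username);

    // 1) A chunk of 1-3 characters repeated three or more times,
    //    making up the whole username: "jkjkjkjk", "abcabcabc"
    if (preg_match('/^(.{1,3})\1{2,}$/', $u) === 1) {
        return true;
    }

    // 2) Any run of 5 adjacent keys from one QWERTY row: "asdfghj", "qwerty"
    $rows = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm'];
    foreach ($rows as $row) {
        for ($i = 0; $i + 5 <= strlen($row); $i++) {
            if (strpos($u, substr($row, $i, 5)) !== false) {
                return true;
            }
        }
    }

    return false;
}
```

As with the rest of the function, a hit here would only lower the rating for a human to review. Real names can occasionally trip simple patterns, so it shouldn't reject anything on its own.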
 
So if you think it's fake, you don't follow up?
If I think it's fake then yeah, I'd not send a follow up. If this function thinks it's fake, I'd look at the email address and decide myself. The idea is that a 0/10 rating would mean we'd not have to double check as it must be fake, e.g. [email protected]; 1-3 is most likely fake, 4-6 a human should check, and 7-10 assume real.

In time it may be that the scoring needs tweaking, but so far, using it on existing quotes, it's working out well.

It could also be that the system auto sends follow up emails unless the rating is below, say, 4, and then a human has to check.
Do you use a different method to try and contact them?
Not yet, as if the email is fake then the phone number probably is too, but maybe in the future if it was felt to be beneficial.
Does a follow up email take you a long time?
Nope, a couple of clicks.
Is entering an email mandatory?
yes
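
For anyone wanting to wire up the rating bands mentioned above (0 fake, 1-3 most likely fake, 4-6 human check, 7-10 assume real), here is a small sketch. The function names, the returned labels and the default threshold of 4 are placeholders chosen for illustration, not anything from the actual system.

```php
<?php
// Hypothetical helper: maps the 0-10 rating to the bands described in the thread.
function triageByRating(int $rating): string
{
    if ($rating <= 0) {
        return 'fake';        // 0: treated as fake, no double check needed
    }
    if ($rating <= 3) {
        return 'likely_fake'; // 1-3: most likely fake
    }
    if ($rating <= 6) {
        return 'human_check'; // 4-6: a human should look at it
    }
    return 'assume_real';     // 7-10: assume real
}

// Hypothetical helper: auto-send the follow up unless the rating is below
// the threshold, in which case a human has to check first.
function shouldAutoSendFollowUp(int $rating, int $threshold = 4): bool
{
    return $rating >= $threshold;
}
```

Keeping the cut-offs in one place like this should make it easier to tweak the scoring later without touching the checks themselves.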
 