ChatGPT made an email validity function - what do you think?

Joined
12 Feb 2006
Posts
17,227
Location
Surrey
I've edited this slightly, but in general this was written by ChatGPT.

I wanted a PHP function that checked not just whether the format of an email address was valid, but also for any signs the email was fake, such as a username that's too short, an unusual extension, etc.

I thought I'd share it to see what others think, and whether anyone can see room for improvement.

Ignore the echo statements; they're just for testing so I can see how each email gets its rating.

PHP:
function checkEmailValidity($email) {
    $validityRating = 10;
    
    $isShortUsername = 0;
    $isFreeDomain = 0;
    $isUnknownDomain = 0;
    $isShortDomain = 0;
    $isSuperShortUsername = 0;
    $isNonExtension = 0;


    // Remove whitespace and sanitize the email address
    $email = filter_var(trim($email), FILTER_SANITIZE_EMAIL);

    // Validate the email address
    if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
        return 0; // Invalid email address: lowest possible rating, not the full 10
    }

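    // Get the username, domain and extension from the email address.
    // Split at the last '@' and lowercase the host so "Gmail.com" still matches the lists.
    $atPos = strrpos($email, '@');
    $username = substr($email, 0, $atPos);
    $fullurl = strtolower(substr($email, $atPos + 1));
    $labels = explode('.', $fullurl);
    $domain = $labels[0];              // e.g. "hotmail" in hotmail.co.uk
    $extension = $labels[1] ?? '';     // e.g. "co" in hotmail.co.uk, "com" in gmail.com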

    // Popular free email domains list
    $popularFreeEmailDomains = [
        'gmail',
        'yahoo',
        'ymail',
        'hotmail',
        'gmx',
        'googlemail',
        'btinternet',
        'live',
        'outlook',
        'mail',
        'protonmail',
        'mac',
        'me',
        'yandex',
        'aol',
        'icloud',
        'fastmail',
        'qq',
        'rocketmail',
        'zoho',
        'inbox',
        'ntlworld',
        'btopenworld',
        'sky',
    ];
    
    // Popular extensions list
    $popularExtensions = [
        'com',
        'co',
        'uk',
        'net',
        'it',
        'org',
        'info',
        'me',
        'io',
        'gov',
        'edu',
        'biz',
        'dk',
    ];   

    // Check if the domain is from a popular free email service
    if (in_array($domain, $popularFreeEmailDomains)) {
        $isFreeDomain = 1;
    } else {
        $validityRating -= 3;
        $isUnknownDomain = 1;
        echo "<br />unknown domain $validityRating";
    }
    // Check if the extension is not a common one
    if (!in_array($extension, $popularExtensions)) {
        $validityRating -= 2;
        $isNonExtension = 1;
        echo "<br />extension is not common one $validityRating";
    }


    // Check if the username length is shorter than 4 characters
    if (strlen($username) < 4) {
        $validityRating -= 1;
        $isShortUsername = 1;
        echo "<br />short username $validityRating";
    }
    
    // Check if the username length is super short, less than 3 characters
    if (strlen($username) < 3) {
        $validityRating -= 3;
        $isSuperShortUsername = 1;
        echo "<br />super short username $validityRating";
    }
    
    // Check if the domain is shorter than 5 characters and not a known free provider
    if (strlen($domain) < 5 && $isFreeDomain != 1) {
        $validityRating -= 3;
        $isShortDomain = 1;
        echo "<br />domain short and not free domain $validityRating";
    }
    
    // Check if the username is short and free domain
    if ($isShortUsername == 1 && $isFreeDomain == 1) {
        $validityRating -= 5;
        echo "<br />username short and is free domain $validityRating";
    }
    
    // Check if the username is super short and free domain
    if ($isSuperShortUsername == 1 && $isFreeDomain == 1) {
        $validityRating -= 9;
        echo "<br />username supershort and is free domain $validityRating";
    }
    
    // Check if the username is super short and not free domain
    if ($isSuperShortUsername == 1 && $isFreeDomain != 1) {
        $validityRating -= 3;
        echo "<br />username super short and not free domain $validityRating";
    }

    // Check if the username is only numbers
    if (ctype_digit($username)) {
        $validityRating -= 5;
        echo "<br />username is just numbers $validityRating";
    }

    return max(0, $validityRating);
}
 
Soldato
Joined
3 Jun 2005
Posts
3,068
Location
The South
If you deem that those conditions meet your requirements, then that's OK.
But, not to pee on your bonfire, the conditions are a bit arbitrary and don't mean much when they fall outside the spec. For example, the local part ("username" in your code) can be 1-64 characters, so checking whether it's shorter than 4 characters, or whether it's made entirely of digits (which is completely valid), is a bit meaningless.
I've got an email address along the lines of '[email protected]' and that function would suggest it's 'fake'.

You're better off querying the mail server attached to the domain for validity and/or checking the address/domain/mail server IP against known spam lists. Alternatively, use a service like Hunter.
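The spam-list check could be sketched in PHP as a DNS blocklist (DNSBL) lookup on the domain's mail servers. This is a sketch under assumptions: `dnsbl.example` is a placeholder zone, only IPv4 is handled, the function names are hypothetical, and real blocklists have their own usage policies (some won't answer queries sent through public DNS resolvers).

```php
<?php
// Hypothetical helpers for checking a domain's mail servers against a DNS
// blocklist (DNSBL). "dnsbl.example" in the usage note is a placeholder zone.

// Build the DNSBL query name: reverse the IPv4 octets and append the zone,
// e.g. 192.0.2.1 + "dnsbl.example" -> "1.2.0.192.dnsbl.example".
function dnsblQueryName(string $ipv4, string $zone): ?string
{
    if (filter_var($ipv4, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null; // this sketch only handles IPv4
    }
    return implode('.', array_reverse(explode('.', $ipv4))) . '.' . $zone;
}

// A listed IP resolves (usually to a 127.0.0.x address); an unlisted one doesn't.
function isListedOnDnsbl(string $ipv4, string $zone): bool
{
    $query = dnsblQueryName($ipv4, $zone);
    return $query !== null && checkdnsrr($query, 'A');
}

// Look up the domain's MX hosts and report whether any of their IPv4
// addresses is listed on the given zone.
function mailServersListed(string $domain, string $zone): bool
{
    $hosts = [];
    if (!getmxrr($domain, $hosts)) {
        return false;
    }
    foreach ($hosts as $host) {
        $ip = gethostbyname($host); // returns $host unchanged on failure
        if ($ip !== $host && isListedOnDnsbl($ip, $zone)) {
            return true;
        }
    }
    return false;
}
```

Something like `mailServersListed('example.com', 'dnsbl.example')` would look up the MX hosts for example.com and query each one's IPv4 address against the zone.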
 
Associate
Joined
29 Oct 2008
Posts
1,005
As suggested above, use an existing service if you need to prove its validity. Better still, send an email asking the user to confirm ownership of the address, at which point it has to be real. Use a basic regex check on the address alongside this too.
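A minimal sketch of the confirmation-email approach in PHP, assuming you store a hash of a random token against the quote and email the user a link containing the raw token. The function names and the `https://shop.example/confirm` URL are hypothetical, and actually sending the email is left out.

```php
<?php
// Hypothetical confirmation-link helpers: the raw token goes in the email,
// only its SHA-256 hash is stored against the quote.

function createConfirmationToken(): array
{
    $token = bin2hex(random_bytes(32));   // 64 hex characters
    return ['token' => $token, 'hash' => hash('sha256', $token)];
}

// $baseUrl is an assumption, e.g. "https://shop.example/confirm".
function buildConfirmationLink(string $baseUrl, string $token): string
{
    return $baseUrl . '?' . http_build_query(['token' => $token]);
}

// Compare in constant time so response timing doesn't leak the stored hash.
function tokenMatches(string $submittedToken, string $storedHash): bool
{
    return hash_equals($storedHash, hash('sha256', $submittedToken));
}
```

When the link is clicked, you'd look up the stored hash for that quote, call `tokenMatches()`, and mark the address as confirmed. In practice you'd also want the token to expire after a day or so.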
 
Soldato
OP
Joined
12 Feb 2006
Posts
17,227
Location
Surrey
But, not to pee on your bonfire, the conditions are a bit arbitrary and don't mean much when they fall outside the spec. For example, the local part ("username" in your code) can be 1-64 characters, so checking whether it's shorter than 4 characters, or whether it's made entirely of digits (which is completely valid), is a bit meaningless.
I've got an email address along the lines of '[email protected]' and that function would suggest it's 'fake'.
It's based on the last 5 years of information I've collected for each customer that gets a quote with us.

I put together a script to show me stats on all the unique email addresses (the shortest usernames, etc.), and this ChatGPT function does a good job of giving a correct trust rating for almost all the emails we have.
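The stats script itself isn't shown in the thread; a hypothetical PHP version along these lines could produce the kind of numbers described (unique addresses, most common domains, shortest usernames). `emailStats()` and its `$top` parameter are made up for illustration.

```php
<?php
// Hypothetical stats over a list of quote email addresses: number of unique
// entries, most common domains, and the shortest local parts (usernames).
function emailStats(array $emails, int $top = 5): array
{
    $unique = array_values(array_unique(array_map(
        fn(string $e): string => strtolower(trim($e)),
        $emails
    )));

    $domainCounts = [];
    $usernames = [];
    foreach ($unique as $email) {
        $at = strrpos($email, '@');
        if ($at === false) {
            continue; // skip entries with no '@' at all
        }
        $usernames[] = substr($email, 0, $at);
        $domain = substr($email, $at + 1);
        $domainCounts[$domain] = ($domainCounts[$domain] ?? 0) + 1;
    }

    arsort($domainCounts);   // most common domains first
    usort($usernames, fn(string $a, string $b): int => strlen($a) <=> strlen($b));

    return [
        'unique'   => count($unique),
        'domains'  => array_slice($domainCounts, 0, $top, true),
        'shortest' => array_slice($usernames, 0, $top),
    ];
}
```

Running it over past quotes gives a quick baseline for tuning thresholds such as the "shorter than 4 characters" check.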

[email protected] would get a low trust rating as it's unusual, but it then comes down to an admin to decide whether it looks real. Very, very few of our customers use an email address that isn't from one of the free providers, so a free address such as @gmail or @hotmail is more likely to be legit than one that isn't.

The issue we had was that lots of people just type [email protected]

Or something like [email protected]

This function doesn't do anything other than tell a human what it thinks of the email address, so that the human can make a better decision, and make it more quickly.

Another addition I was thinking of was to count the frequency of each character in the username: if the address is from a "free email provider", the username is made up entirely of the same letter, and it's longer than x characters, give it a low trust rating. That would mark something like [email protected] as low trust.
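The character-frequency idea can be sketched with PHP's `count_chars()`. The function names and the default minimum length of 4 (standing in for "x") are assumptions. The free-provider condition would be checked by the caller, for example with `$isFreeDomain` from the original function.

```php
<?php
// Hypothetical check for usernames made entirely of one repeated character,
// e.g. "aaaaaa". $minLength is an assumed stand-in for "more than x letters".
function isSingleCharacterRun(string $username, int $minLength = 4): bool
{
    $username = strtolower($username);
    if (strlen($username) < $minLength) {
        return false;
    }
    // count_chars() mode 3 returns a string of the unique bytes used.
    return strlen(count_chars($username, 3)) === 1;
}

// Share of the username taken up by its most frequent character, between
// 0.0 and 1.0; a softer signal than the all-one-letter test above.
function dominantCharacterShare(string $username): float
{
    if ($username === '') {
        return 0.0;
    }
    $counts = count_chars(strtolower($username), 1); // byte value => occurrences
    return max($counts) / strlen($username);
}
```

For example, `isSingleCharacterRun('aaaaaa')` returns true while `isSingleCharacterRun('anna')` returns false. `dominantCharacterShare()` is there if you'd rather dock points gradually instead of all at once.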
 
Associate
Joined
24 Jun 2022
Posts
117
Location
Manchester
As a customer, I don't mind the extra steps when dealing with businesses online.

I'd likely be questioning why there aren't barriers in place.

I'd be careful that it isn't more about what you dislike.

Food for thought.
 
Joined
1 Oct 2006
Posts
13,900
If you really wanted to validate an email address I'd have a script that:

1) Performed a regex/pattern validation for nonsense email addresses, e.g. asdfghj, jkjkjkjk, etc.

Or if you wanted to get fancy:

1) Did an MX record lookup for the domain
2) Opened an SMTP connection to the mailserver and performed a VRFY against the supplied address
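A rough PHP sketch of those two steps, assuming outbound port 25 is open from your server (many hosting providers block it). `getMxHosts()` and `smtpVrfy()` are hypothetical names, the HELO hostname is a placeholder, and `getmxrr()`/`fsockopen()` are the standard PHP functions. Many servers disable VRFY or answer 252 for everything, so the reply code is a hint, not proof.

```php
<?php
// Step 1: fetch the domain's MX hosts, most preferred (lowest value) first.
function getMxHosts(string $domain): array
{
    $hosts = [];
    $weights = [];
    if (!getmxrr($domain, $hosts, $weights)) {
        return [];
    }
    // Sort $weights ascending and reorder $hosts to match.
    array_multisort($weights, SORT_ASC, SORT_NUMERIC, $hosts);
    return $hosts;
}

// Step 2: ask one MX host about an address with SMTP VRFY.
// Returns the 3-digit reply code (250/251 = recognised, 252 = can't verify
// but will try to deliver, 4xx/5xx = temporary failure, refused or unknown),
// 0 if no reply could be read, or null if no connection could be made.
// Assumes $address already passed FILTER_VALIDATE_EMAIL (no CR/LF in it).
function smtpVrfy(string $mxHost, string $address, int $timeout = 5): ?int
{
    $fp = @fsockopen($mxHost, 25, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, $timeout);

    // Read a reply; multi-line replies use "250-" on every line but the last.
    $readCode = function () use ($fp): int {
        do {
            $line = fgets($fp, 512);
        } while ($line !== false && isset($line[3]) && $line[3] === '-');
        return $line === false ? 0 : (int) substr($line, 0, 3);
    };

    $readCode();                                   // 220 greeting
    fwrite($fp, "HELO mail.example.invalid\r\n");  // placeholder: use your own hostname
    $readCode();
    fwrite($fp, "VRFY $address\r\n");
    $code = $readCode();
    fwrite($fp, "QUIT\r\n");
    fclose($fp);
    return $code;
}
```

Note that `getmxrr()` only returns MX records actually published in DNS; if a domain has none, SMTP falls back to the domain's own address record.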

You should be able to code something that watches the email address field for a complete email address, and goes off to validate it behind the scenes whilst the customer fills out the rest of the form.
 
Soldato
OP
Joined
12 Feb 2006
Posts
17,227
Location
Surrey
As a customer, I don't mind the extra steps when dealing with businesses online.

I'd likely be questioning why there aren't barriers in place.

I'd be careful that it isn't more about what you dislike.

Food for thought.
People leave websites because of the cookie consent form, or the "disable ads" pop-up, if they aren't that bothered about being there.

The fact that someone puts in a fake email address is proof that people may not feel comfortable giving an email address to get a quote in the first place.

I know I wouldn't give my email address just to see the price of a physical item. As soon as I see a "give us your email" form I go to another site, as I'm just looking to buy some toilet cleaner ffs.
 
Soldato
OP
Joined
12 Feb 2006
Posts
17,227
Location
Surrey
1) Performed a regex/pattern validation for nonsense email addresses, e.g. asdfghj, jkjkjkjk, etc.

It is something I'd like to add, but I felt that perhaps there are just too many possible kinds of nonsense? Funny, though, that the two you've given are variations of the two most popular nonsense addresses we get.
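The two patterns mentioned (keyboard runs like asdfghj and repeats like jkjkjkjk) can be caught without listing every possible nonsense string. A sketch in PHP, assuming a QWERTY layout; the function names and thresholds are hypothetical and would need tuning against real quotes, since a genuine username can occasionally contain a short keyboard run.

```php
<?php
// Hypothetical nonsense-username heuristics. The QWERTY rows and the
// thresholds ($minRun, $maxUnit, $minRepeats) are assumptions to tune.

// True if the username contains $minRun or more adjacent keys from one
// keyboard row, typed in either direction (e.g. "asdfghj", "poiuy").
function hasKeyboardRun(string $username, int $minRun = 4): bool
{
    $rows = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm'];
    $s = strtolower($username);
    for ($i = 0; $i + $minRun <= strlen($s); $i++) {
        $chunk = substr($s, $i, $minRun);
        foreach ($rows as $row) {
            if (strpos($row, $chunk) !== false || strpos(strrev($row), $chunk) !== false) {
                return true;
            }
        }
    }
    return false;
}

// True if the whole username is a 1 to $maxUnit character unit repeated at
// least $minRepeats times (e.g. "jkjkjkjk", "aaaa", "abcabcabc").
function isRepeatedPattern(string $username, int $maxUnit = 3, int $minRepeats = 3): bool
{
    // \1 is a backreference to the captured unit.
    $pattern = '/^(.{1,' . $maxUnit . '})\1{' . ($minRepeats - 1) . ',}$/';
    return preg_match($pattern, strtolower($username)) === 1;
}
```

Either check returning true could knock a few points off the rating rather than marking the address as fake outright.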

2) Opened an SMTP connection to the mailserver and performed a VRFY against the supplied address
Apparently this isn't a reliable option, as domains will block it to stop spam companies from checking which addresses exist so they can send junk to them.
 
Soldato
OP
Joined
12 Feb 2006
Posts
17,227
Location
Surrey
So if you think it's fake, you don't follow up?
If I think it's fake then yeah, I'd not send a follow-up. If this function thinks it's fake, I'd look at the email address and decide myself, but the idea is that a 0/10 rating would mean we don't have to double-check as it must be fake (e.g. [email protected]), 1-3 most likely fake, 4-6 a human should check, and 7-10 assume real.

In time the scoring may need tweaking, but so far, using it on existing quotes, it's working out well.

It could also be that the system automatically sends follow-up emails unless the rating is below, say, 4, in which case a human has to check.
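The rating bands and the auto-send rule described above could map onto something like this PHP 8 sketch. The band names are hypothetical labels, and the default threshold of 4 is the "say 4" from the post.

```php
<?php
// Hypothetical mapping of the 0-10 rating onto the bands described above.
function ratingBand(int $rating): string
{
    return match (true) {
        $rating <= 0 => 'fake',          // 0: no need to double-check
        $rating <= 3 => 'likely_fake',   // 1-3
        $rating <= 6 => 'human_check',   // 4-6
        default      => 'assume_real',   // 7-10
    };
}

// Auto-send the follow-up unless the rating is below the threshold,
// in which case it waits for a human to review the address.
function shouldAutoSendFollowUp(int $rating, int $threshold = 4): bool
{
    return $rating >= $threshold;
}
```

With these defaults, a 4-6 rating is still auto-sent; raising `$threshold` to 7 would send everything in the "human should check" band to a person first.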
Do you use a different method to try and contact them?
Not yet, as if the email is fake then the phone number probably is too, but maybe in the future if it was felt to be beneficial.
Does a follow-up email take you a long time?
Nope, a couple of clicks.
Is entering an email mandatory?
Yes.
 
Man of Honour
Joined
19 Oct 2002
Posts
29,525
Location
Surrey
Because it doesn't work for customers. They won't do that. It's a barrier at the end of the day. It creates an extra step for legit customers and I don't like that. I don't have the luxury of being that required
The flip side of this is that if I were a customer signing up for a service that didn't send me a validation email, I would question what other security your company had skipped.
 
Soldato
Joined
3 Jun 2005
Posts
3,068
Location
The South
Apparently this isn't a reliable option, as domains will block it to stop spam companies from checking which addresses exist so they can send junk to them.
Greylisting and blacklisting are your main issues with poking mail servers, but as you say, it still doesn't guarantee that the email address/mailbox exists and is valid.
I think grabbing the MX record, or at least the A/AAAA records, would help determine whether the domain is legit (perhaps only for ones not on your free provider list), and then you could always manually check (in the office, against email address checkers) the addresses you're not 100% sure about.
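A minimal PHP sketch of that suggestion, assuming the same bare provider names as the `$popularFreeEmailDomains` list in the original function. It checks for an MX record first and falls back to A/AAAA, since SMTP allows mail to go to a domain's own address when no MX record exists. `domainLooksLegit()` is a hypothetical name.

```php
<?php
// Hypothetical domain check: skip known free providers, otherwise require an
// MX record, falling back to A or AAAA records.
function domainLooksLegit(string $emailDomain, array $freeDomains): bool
{
    $emailDomain = strtolower(rtrim($emailDomain, '.'));
    $firstLabel = explode('.', $emailDomain)[0];
    if (in_array($firstLabel, $freeDomains, true)) {
        return true; // known free provider, no lookup needed
    }
    return checkdnsrr($emailDomain, 'MX')
        || checkdnsrr($emailDomain, 'A')
        || checkdnsrr($emailDomain, 'AAAA');
}
```

Matching on the first label (as the original function does) means an address like x@gmail.example would skip the lookup; a list of full domains such as gmail.com would be stricter.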

Alternatively, follow up with generic templated emails (try to make them a bit more personal with the customer's details); if you get a reply, you know it's more likely to be legit. It could work, and may help avoid wrongly discarding a legitimate quote.
 