
Is my regular expression right?
eagleknight 10-13-2004, 04:06 PM Is this regular expression to check an email address right?
"^[a-zA-Z0-9][\w\.-][a-zA-Z0-9]@[a-zA-Z0-9][\w\.-][a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$"
BigBison 10-13-2004, 06:37 PM $email !~ /^.+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$/
There are many existing regular expressions that do this and have been tested. Try a site like scriptsearch.internet.com -- search for "is valid email address" or "email address validator" and you'll come up with plenty of choices, many of which are rated by others.
The perl script I got the above regular expression from is here:
http://www.cgiscript.net/cgi-script/csNews/csNews.cgi?database=perl%2edb&command=viewone&id=4&op=t
There are a couple handy utilities I know of which assist in developing regular expressions:
http://www.weitz.de/regex-coach/
http://www.regular-expressions.info/regexbuddy.html
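For comparison, here is a minimal Python sketch that runs both patterns from this thread against a few sample addresses. The sample addresses and the names `POSTED`, `QUOTED` and `check` are made up for illustration. One thing it shows is that a bracketed class without a quantifier matches exactly one character, so the pattern from the first post only accepts a three-character local part.

```python
import re

# eagleknight's pattern exactly as posted. Each bracketed class without a
# quantifier matches exactly one character.
POSTED = re.compile(
    r"^[a-zA-Z0-9][\w\.-][a-zA-Z0-9]@[a-zA-Z0-9][\w\.-][a-zA-Z0-9]"
    r"\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$"
)

# The pattern from BigBison's Perl snippet; \@ is a literal @ in Python too.
QUOTED = re.compile(
    r"^.+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$"
)


def check(address):
    """Return (matches POSTED, matches QUOTED) for one address."""
    return (
        POSTED.match(address) is not None,
        QUOTED.match(address) is not None,
    )


# Hypothetical sample addresses, chosen only to exercise the patterns.
for address in (
    "bob@abc.com",
    "alice@example.com",
    "user@[192.168.0.1]",
    "user@example.info",
):
    print(address, check(address))
```

Note that the Perl test uses `!~`, so it is true when the address does NOT match. The last sample is rejected by both patterns because neither allows a four-letter top-level domain.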
Dan L 10-13-2004, 07:14 PM if (!checkdnsrr(substr(strrchr($email, "@"), 1), "MX")) { die('Invalid e-mail address.'); }
:)
eagleknight 10-13-2004, 08:34 PM Yes, but this is also for a web class so I had to write it hehe.
Originally posted by eagleknight
Yes, but this is also for a web class so I had to write it hehe.
Then score a high grade by pointing out to the teacher that the language of email addresses is not regular. I.e. you cannot write a single regular expression that can check whether or not an email address is valid (i.e. conforms to the syntax).
BigBison 10-13-2004, 09:52 PM Originally posted by jks
Then score a high grade by pointing out to the teacher that the language of email addresses is not regular. I.e. you cannot write a single regular expression that can check whether or not an email address is valid (i.e. conforms to the syntax).
Want extra credit? Figure out the maximum allowed length, in characters, of an e-mail address. If I were teaching, I'd assign that problem at the beginning of the semester and see if anyone gets it right by the end! :D
Originally posted by BigBison
Want extra credit? Figure out the maximum allowed length, in characters, of an e-mail address. If I were teaching, I'd assign that problem at the beginning of the semester and see if anyone gets it right by the end! :D
There's no real "maximum allowed length" of e-mail addresses.
However, according to RFC 2821 (SMTP), mail servers should support at least a length of 64 + 255 + 1 = 320 characters. That's a local part, a domain and an @ character.
Email addresses longer than that should be avoided. However, as the RFC states, this is not always possible.
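The arithmetic above could be turned into a check along these lines. This is only a sketch: the function and constant names are hypothetical, and it counts characters with `len()`, which equals the octet count only for plain ASCII addresses.

```python
MAX_LOCAL = 64     # local part, per the RFC 2821 figure quoted above
MAX_DOMAIN = 255   # domain, same source
MAX_ADDRESS = MAX_LOCAL + 1 + MAX_DOMAIN  # 64 + 1 (the "@") + 255 = 320


def within_quoted_limits(address):
    """Check local part and domain against the 64/255 figures.

    Splits on the LAST "@" so that an "@" inside the local part does not
    confuse the split. Assumes an ASCII address (characters == octets).
    """
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False  # no "@" at all
    return 0 < len(local) <= MAX_LOCAL and 0 < len(domain) <= MAX_DOMAIN
```

An address that passes is at most `MAX_ADDRESS` characters long. Whether to treat longer ones as invalid or merely risky is a policy choice, since the RFC wording is about what servers must at least support.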
BigBison 10-13-2004, 10:44 PM Every time I look at an email-address validator script, I wonder about setting an upper limit. I gave up before determining whether there is or isn't a theoretical maximum length. RFC 2821 (http://www.faqs.org/rfcs/rfc2821.html) isn't the only place to look, though -- check out sections 2.3.4 - 3.1 of RFC 1035 (http://www.faqs.org/rfcs/rfc1035.html), particularly:
To simplify implementations, the total length of a domain name (i.e., label octets and label length octets) is restricted to 255 octets or less.
Now, there are two things I've never been able to figure out. Before I get to those, I'd like to mention that I vaguely understand the octet notion. I do know this length-in-octets issue is why domain names are case-insensitive.
Mystery #1: Is that 255octets-total.com, or 255octets.255octets.255octets.255octets.255octets.com, ad infinitum?
Mystery #2: Is that 64 characters to the left of the @ sign a hard-and-fast rule?
The reason I bring all this up, is it's always seemed to me that a webform could be used as a DOS attack, by sending ridiculously long email addresses to the server (which then may attempt to look them up) which are nonetheless deemed valid by my website's scripts.
Originally posted by BigBison
Every time I look at an email-address validator script, I wonder about setting an upper limit. I gave up before determining whether there is or isn't a theoretical maximum length.
This is computers. There's always a theoretical maximum length. Email addresses are finite in length :-)
RFC 2821 isn't the only place to look, though -- check out sections 2.3.4 - 3.1 of RFC 1035
I don't see that in conflict with what I wrote before in any way.
They say that a domain name is restricted to 255 octets. The calculation I used in my previous reply was also 255.
You must note that an email address is not _just_ the domain name. The part before the @ can be much larger than the domain name in extreme cases. Normally it is not longer than 64 octets, however.
Now, there are two things I've never been able to figure out. Before I get to those, I'd like to mention that I vaguely understand the octet notion. I do know this length-in-octets issue is why domain names are case-insensitive.
That is incorrect. "Octets" have nothing to do with domain names being case sensitive or not.
An octet is simply 8 bits.
Sometimes people talk about bytes. Sometimes they talk about characters (or chars). These are normally also just 8 bits.
However, there are some exceptions (for example a character in a multi-byte character-set would be more than 8 bits; similarly a byte on some ancient computer architectures could be 9 bits, etc.). This is why some RFC writers stick with the notion of "octets", so that everyone agrees on the fact that they are 8 bits.
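A quick way to see the character/octet difference is to count both for the same string. The sketch below assumes UTF-8 as the multi-byte encoding (the thread does not name one), and `octet_length` is a hypothetical helper name.

```python
def octet_length(text, encoding="utf-8"):
    """Number of 8-bit octets the text occupies in the given encoding."""
    return len(text.encode(encoding))


# Escapes are used so the samples do not depend on how this file is saved.
samples = ("abc", "\u00e9", "\u65e5\u672c")  # ASCII, e-acute, two kanji
for text in samples:
    print(repr(text), "characters:", len(text), "octets:", octet_length(text))
```

For plain ASCII the two counts agree, which is why the distinction rarely shows up with ordinary domain names.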
Mystery #1: Is that 255octets-total.com, or 255octets.255octets.255octets.255octets.255octets.com, ad infinitum?
That is the 255 octets in total. Not ad infinitum.
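To make "255 octets in total" concrete, here is a rough sketch of how the RFC 1035 count works: every label costs its own octets plus one length octet, and the name ends with a zero-length root label. The 63-octet per-label limit comes from RFC 1035 rather than from this thread, the function names are hypothetical, and ASCII labels are assumed.

```python
MAX_NAME_OCTETS = 255   # RFC 1035 limit on the whole name
MAX_LABEL_OCTETS = 63   # RFC 1035 limit on a single label


def _labels(domain):
    """Split a domain into labels, ignoring one trailing dot if present."""
    if domain.endswith("."):
        domain = domain[:-1]
    return domain.split(".")


def wire_length(domain):
    """Octets used in a DNS message: length octet + label octets for each
    label, plus one octet for the terminating root label (ASCII assumed)."""
    return sum(1 + len(label) for label in _labels(domain)) + 1


def fits_rfc1035(domain):
    """True when every label and the whole name are within the limits."""
    labels = _labels(domain)
    if any(not 1 <= len(label) <= MAX_LABEL_OCTETS for label in labels):
        return False
    return wire_length(domain) <= MAX_NAME_OCTETS
```

So four labels of 63 octets each are individually fine, but together they come to 257 octets on the wire and the name as a whole is too long.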
Mystery #2: Is that 64 characters to the left of the @ sign a hard-and-fast rule?
As I explained in my previous reply, it's not a hard-and-fast rule. The RFC simply states that systems should be able to support _at least_ 64 characters. If someone wants to use more than 64, that's fine. They just have to accept that some mail servers may reject the email.
The reason I bring all this up, is it's always seemed to me that a webform could be used as a DOS attack, by sending ridiculously long email addresses to the server (which then may attempt to look them up) which are nonetheless deemed valid by my website's scripts.
Then you have bad programming.
Set your limit at, for example, 320 bytes, and you should be reasonably safe. If you want to be nice to everyone, set it to 1 KB.
Even at 1 KB, you're going to have to submit a _lot_ of web forms at once to be able to take up a substantial amount of memory, now that web servers commonly have hundreds of megabytes of memory.
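The idea of capping the field before doing anything else with it might look roughly like this. The names are hypothetical, and the 100,000-form figure in the usage note is just an illustrative number, not a measurement.

```python
LIMIT_BYTES = 320     # the tighter cap suggested above
GENEROUS_LIMIT = 1024  # the "nice to everyone" 1 KB variant


def accept_email_field(raw, limit=LIMIT_BYTES):
    """Reject empty or over-long input before any regex or DNS lookup.

    `raw` is the submitted field as bytes, so the cap is in octets.
    """
    return 0 < len(raw) <= limit


def worst_case_bytes(concurrent_forms, limit=GENEROUS_LIMIT):
    """Upper bound on memory held by the address fields alone."""
    return concurrent_forms * limit
```

For example, `worst_case_bytes(100_000)` is 102,400,000 bytes, so even 100,000 simultaneous submissions at the 1 KB cap hold only about 100 MB of address data.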
BigBison 10-14-2004, 12:36 AM Thanks for the elaboration, jks!