

It's also important to understand that validating it per the RFC tells you absolutely nothing about whether that address actually exists at the supplied domain, or whether the person entering the address is its true owner. People sign others up to mailing lists this way all the time. Fixing that requires a fancier kind of validation that involves sending that address a message that includes a confirmation token meant to be entered on the same web page as was the address.

Confirmation tokens are the only way to know you got the address of the person entering it. This is why most mailing lists now use that mechanism to confirm sign-ups. After all, anybody can put down an address that is not their own, and that will even parse as legal, but it isn't likely to be the person at the other end.

For PHP, you should not use the pattern given in Validate an E-Mail Address with PHP, the Right Way, from which I quote: "There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard." That is no better than all the other non-RFC patterns. It isn't even smart enough to handle RFC 822, let alone RFC 5322.

If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false negative) because your regular expression can't handle it is just rude and impolite from the user's perspective. A state engine built for the purpose can both validate and even correct e-mail addresses that would otherwise be considered invalid, because it disassembles the e-mail address according to each RFC. This allows for a potentially more pleasing experience, like a "Did you mean ...?" suggestion.

See also Validating Email Addresses, including the comments.
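The confirmation-token mechanism described above can be sketched roughly as follows. This is only a minimal illustration, not how any particular mailing list does it: the in-memory `pending` store and the function names `start_signup` and `confirm_signup` are hypothetical, and actually sending the e-mail is left out.

```python
import hmac
import secrets

# Hypothetical in-memory store mapping address -> outstanding token.
# A real service would persist this and expire old entries.
pending = {}


def start_signup(address):
    """Create a random token for the address; the caller e-mails it there."""
    token = secrets.token_urlsafe(16)
    pending[address] = token
    return token


def confirm_signup(address, token):
    """Return True only if the token typed on the web page matches the one sent."""
    expected = pending.get(address)
    if expected is None:
        return False
    # Constant-time comparison; encode to bytes so non-ASCII input cannot raise.
    if not hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8")):
        return False
    del pending[address]  # each token can be used once
    return True
```

Only someone who can read mail sent to the address can supply the token, which is exactly what syntax validation alone cannot establish.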
The more sophisticated patterns in Perl and PCRE (the regex library used e.g. in PHP) can correctly parse RFC 5322 without a hitch. Python and C# can do that too, but they use a different syntax from those first two. However, if you are forced to use one of the many less powerful pattern-matching languages, then it's best to use a real parser.
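As one illustration of leaning on a real parser instead of a hand-written pattern, Python's standard library can take an addr-spec apart via `email.headerregistry.Address`. The wrapper name `parse_addr_spec` is made up for this sketch, and the standard-library parser is fairly lenient, so treat this as a structural check under that assumption rather than a strict RFC 5322 validator.

```python
from email.errors import HeaderParseError
from email.headerregistry import Address


def parse_addr_spec(text):
    """Return (local_part, domain) if the stdlib parser accepts text, else None."""
    try:
        addr = Address(addr_spec=text)
    except (ValueError, HeaderParseError):
        # The parser reports problems either as defects (ValueError subclasses)
        # or as HeaderParseError, so both are treated as "not an address".
        return None
    return addr.username, addr.domain
```

For example, `parse_addr_spec("user@example.com")` splits the address into its local part and domain, while a string with no `@` is rejected.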

The fully RFC 822 compliant regex is inefficient and obscure because of its length. Fortunately, RFC 822 was superseded twice and the current specification for email addresses is RFC 5322. RFC 5322 leads to a regex that can be understood if studied for a few minutes and is efficient enough for actual use.

One RFC 5322 compliant regex found online uses the IP address pattern that is floating around the internet, which has a bug that allows 00 for any of the unsigned byte decimal values in a dot-delimited address, which is illegal. The rest of it appears to be consistent with the RFC 5322 grammar and passes several tests using grep -Po, including cases with domain names, IP addresses, bad ones, and account names with and without quotes.

Correcting the 00 bug in the IP pattern, we obtain a working and fairly fast regex. (Scrape the rendered version, not the markdown, for the actual code.) A diagram of the finite state machine for that regex is clearer than the regex itself.
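To make the 00 bug concrete, here is a small sketch comparing two octet patterns. The "buggy" pattern below is one commonly seen form and is assumed for illustration; it may not be character-for-character the one referred to above, and the names `BUGGY_OCTET`, `FIXED_OCTET`, and `ipv4_literal` are hypothetical.

```python
import re

# Assumed example of the circulating octet pattern: the last branch
# accepts "00" (and other leading-zero forms).
BUGGY_OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Stricter octet: 250-255, 200-249, 100-199, then 0-99 without leading zeros.
FIXED_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"


def ipv4_literal(octet):
    """Build a pattern for a bracketed dotted-quad domain literal like [192.168.0.1]."""
    return re.compile(r"\[(?:%s\.){3}%s\]" % (octet, octet))


buggy = ipv4_literal(BUGGY_OCTET)
fixed = ipv4_literal(FIXED_OCTET)

print(bool(buggy.fullmatch("[192.168.00.1]")))  # True: the bug lets 00 through
print(bool(fixed.fullmatch("[192.168.00.1]")))  # False
print(bool(fixed.fullmatch("[192.168.0.1]")))   # True
```

The fix works by ordering the alternatives so that a two-digit octet must start with 1-9, which rules out 00 while still accepting a lone 0.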
