Commit 416aeb6

Parse domain-literal addresses, give nice error messages, and add an option to permit them, but reject them by default
1 parent 18106ca commit 416aeb6

9 files changed: 175 additions and 40 deletions

CHANGELOG.md

Lines changed: 2 additions & 2 deletions
@@ -3,13 +3,13 @@
 
 This is a pre-release for version 2.0.0.
 
-There are no significant changes to which email addresses are considered valid/invalid, but there are many changes in error messages and internal improvements to the library including the addition of type annotations, and Python 3.7+ is now required.
+There are no significant changes to which email addresses are considered valid/invalid with default options, but there are many changes in error messages and internal improvements to the library including the addition of type annotations. New options are added to allow quoted-string local parts and domain-literal addresses, but they are off by default. And Python 3.7+ is now required.
 
 * Python 2.x and 3.x versions through 3.6, and dnspython 1.x, are no longer supported. Python 3.7+ with dnspython 2.x are now required.
 * The dnspython package is no longer required if DNS checks are not used, although it will install automatically.
 * NoNameservers and NXDOMAIN DNS errors are now handled differently: NoNameservers no longer fails validation, and NXDOMAIN now skips checking for an A/AAAA fallback and goes straight to failing validation.
 * Some syntax error messages have changed because they are now checked explicitly rather than as a part of other checks.
-* The quoted-string local part syntax (e.g. multiple @-signs, spaces, etc. if surrounded by quotes) is now parsed but not considered valid by default. Better error messages are now given for quoted-string syntax since it can be confusing for a technically valid address to be rejected, and a new allow_quoted_local option is added to allow these addresses if you really need them.
+* The quoted-string local part syntax (e.g. multiple @-signs, spaces, etc. if surrounded by quotes) and domain-literal addresses (e.g. @[192.XXX...] or @[IPv6:...]) are now parsed but not considered valid by default. Better error messages are now given for these addresses since it can be confusing for a technically valid address to be rejected, and new allow_quoted_local and allow_domain_literal options are added to allow these addresses if you really need them.
 * Some other error messages have changed to not repeat the email address in the error message.
 * The library has been reorganized internally into smaller modules.
 * The tests have been reorganized and expanded. Deliverability tests now mostly use captured DNS responses so they can be run off-line.

README.md

Lines changed: 17 additions & 14 deletions
@@ -14,13 +14,12 @@ Key features:
 
 * Checks that an email address has the correct syntax --- good for
   registration/login forms or other uses related to identifying users.
-  Rejects obsolete email address syntax that you'd find unexpected.
+  By default, rejects obsolete email address syntax that you'd find unexpected.
 * Gives friendly English error messages when validation fails that you
   can display to end-users.
 * Checks deliverability (optional): Does the domain name resolve?
   (You can override the default DNS resolver to add query caching.)
-* Supports internationalized domain names and internationalized local parts,
-  and with an option deprecated quoted-string local parts.
+* Supports internationalized domain names and internationalized local parts.
   Blocks unsafe characters for your safety.
 * Normalizes email addresses (important for internationalized
   and quoted-string addresses! see below).
@@ -107,7 +106,7 @@ other information.
 The validator doesn't, by default, permit obsoleted forms of email addresses
 that no one uses anymore even though they are still valid and deliverable, since
 they will probably give you grief if you're using email for login. (See
-later in the document about that.)
+later in the document about how to allow some obsolete forms.)
 
 The validator checks that the domain name in the email address has a
 DNS MX record (except a NULL MX record) indicating that it can receive
@@ -137,6 +136,8 @@ The `validate_email` function also accepts the following keyword arguments
 
 `allow_quoted_local=False`: Set to `True` to allow obscure and potentially problematic email addresses in which the part of the address before the @-sign contains spaces, @-signs, or other surprising characters when the local part is surrounded in quotes (so-called quoted-string local parts). In the object returned by `validate_email`, the normalized local part removes any unnecessary backslash-escaping and even removes the surrounding quotes if the address would be valid without them. You can also set `email_validator.ALLOW_QUOTED_LOCAL` to `True` to turn this on for all calls by default.
 
+`allow_domain_literal=False`: Set to `True` to allow bracketed IPv4 and "IPv6:"-prefixed IPv6 addresses in the domain part of the email address. No deliverability checks are performed for these addresses. In the object returned by `validate_email`, the normalized domain will use the condensed IPv6 format, if applicable. The object's `domain_address` attribute will hold the parsed `ipaddress.IPv4Address` or `ipaddress.IPv6Address` object if applicable. You can also set `email_validator.ALLOW_DOMAIN_LITERAL` to `True` to turn this on for all calls by default.
+
 `allow_empty_local=False`: Set to `True` to allow an empty local part (i.e.
 `@example.com`), e.g. for validating Postfix aliases.
 
@@ -291,10 +292,12 @@ and conversion from Punycode to Unicode characters.
 3.1](https://tools.ietf.org/html/rfc6532#section-3.1) and [RFC 5895
 (IDNA 2008) section 2](http://www.ietf.org/rfc/rfc5895.txt).)
 
-Normalization is also applied to quoted-string local parts if you have
-allowed them by the `allow_quoted_local` option. Unnecessary backslash
-escaping is removed and even the surrounding quotes are removed if they
-are unnecessary.
+Normalization is also applied to quoted-string local parts and domain
+literal IPv6 addresses if you have allowed them by the `allow_quoted_local`
+and `allow_domain_literal` options. In quoted-string local parts, unnecessary
+backslash escaping is removed and even the surrounding quotes are removed if
+they are unnecessary. For IPv6 domain literals, the IPv6 address is
+normalized to condensed form.
 
 Examples
 --------
@@ -369,6 +372,7 @@ are:
 | `ascii_local_part` | If set, the local part, which is composed of ASCII characters only. |
 | `domain` | The canonical internationalized Unicode form of the domain part of the email address. If the returned string contains non-ASCII characters, either the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your mail relay will be required to transmit the message or else the email address's domain part must be converted to IDNA ASCII first: Use `ascii_domain` field instead. |
 | `ascii_domain` | The [IDNA](https://tools.ietf.org/html/rfc5891) [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt)-encoded form of the domain part of the given email address, as it would be transmitted on the wire. |
+| `domain_address` | If domain literals are allowed and if the email address contains one, an `ipaddress.IPv4Address` or `ipaddress.IPv6Address` object. |
 | `smtputf8` | A boolean indicating that the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your mail relay will be required to transmit messages to this address because the local part of the address has non-ASCII characters (the local part cannot be IDNA-encoded). If `allow_smtputf8=False` is passed as an argument, this flag will always be false because an exception is raised if it would have been true. |
 | `mx` | A list of (priority, domain) tuples of MX records specified in the DNS for the domain (see [RFC 5321 section 5](https://tools.ietf.org/html/rfc5321#section-5)). May be `None` if the deliverability check could not be completed because of a temporary issue like a timeout. |
 | `mx_fallback_type` | `None` if an `MX` record is found. If no MX records are actually specified in DNS and instead are inferred, through an obsolete mechanism, from A or AAAA records, the value is the type of DNS record used instead (`A` or `AAAA`). May be `None` if the deliverability check could not be completed because of a temporary issue like a timeout. |
@@ -390,13 +394,12 @@ or likely to cause trouble:
   domain names without a `.`, are rejected as a syntax error
   (except see the `test_environment` parameter above).
 * Obsolete email syntaxes are rejected:
-  The "quoted string" form of the local part of the email address (RFC
-  5321 4.1.2) is not permitted unless `allow_quoted_local=True` is given
-  (see above).
   The unusual ["(comment)" syntax](https://github.com/JoshData/python-email-validator/issues/77)
-  is also rejected. The "literal" form for the domain part of an email address (an
-  IP address in brackets) is rejected. Other obsolete and deprecated syntaxes are
-  rejected. No one uses these forms anymore.
+  is rejected. Extremely old obsolete syntaxes are
+  rejected. Quoted-string local parts and domain-literal addresses
+  are rejected by default, but there are options to allow them (see above).
+  No one uses these forms anymore, and I can't think of any reason why anyone
+  using this library would need to accept them.
 
 
 Testing
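
The README change above says that IPv6 domain literals are normalized to condensed form. A minimal sketch of that idea, using only the standard library `ipaddress` module. The helper name `normalize_ipv6_literal` is hypothetical and is not part of the library's API:

```python
import ipaddress


def normalize_ipv6_literal(domain: str) -> str:
    """Hypothetical helper: condense the address inside "[IPv6:...]"."""
    prefix = "[IPv6:"
    if not (domain.startswith(prefix) and domain.endswith("]")):
        raise ValueError("not an IPv6 domain literal")
    # IPv6Address raises ValueError (AddressValueError) on malformed input.
    addr = ipaddress.IPv6Address(domain[len(prefix):-1])
    return f"[IPv6:{addr.compressed}]"


# Two spellings of the same address normalize to the same string.
a = normalize_ipv6_literal("[IPv6:2001:0db8:0000:0000:0000:0000:0000:0001]")
b = normalize_ipv6_literal("[IPv6:2001:db8::1]")
print(a)
print(a == b)
```

Comparing normalized strings is what makes the condensed form useful for login-style lookups, where two spellings of one address should map to one account.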

email_validator/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ def caching_resolver(*args, **kwargs):
 
 ALLOW_SMTPUTF8 = True
 ALLOW_QUOTED_LOCAL = False
+ALLOW_DOMAIN_LITERAL = False
 GLOBALLY_DELIVERABLE = True
 CHECK_DELIVERABILITY = True
 TEST_ENVIRONMENT = False

email_validator/__main__.py

Lines changed: 2 additions & 2 deletions
@@ -28,8 +28,8 @@ def main(dns_resolver=None):
 
     # Set options from environment variables.
     options = {}
-    for varname in ('ALLOW_SMTPUTF8', 'ALLOW_QUOTED_LOCAL', 'GLOBALLY_DELIVERABLE',
-                    'CHECK_DELIVERABILITY', 'TEST_ENVIRONMENT'):
+    for varname in ('ALLOW_SMTPUTF8', 'ALLOW_QUOTED_LOCAL', 'ALLOW_DOMAIN_LITERAL',
+                    'GLOBALLY_DELIVERABLE', 'CHECK_DELIVERABILITY', 'TEST_ENVIRONMENT'):
         if varname in os.environ:
             options[varname.lower()] = bool(os.environ[varname])
     for varname in ('DEFAULT_TIMEOUT',):

email_validator/exceptions_types.py

Lines changed: 7 additions & 1 deletion
@@ -36,6 +36,9 @@ class ValidatedEmail(object):
     Unicode from IDNA ascii."""
     domain: str
 
+    """If the domain part is a domain literal, the IPv4Address or IPv6Address object."""
+    domain_address: object
+
     """If not None, a form of the email address that uses 7-bit ASCII characters only."""
     ascii_email: Optional[str]
 
@@ -118,4 +121,7 @@ def as_constructor(self):
 
     """Convenience method for accessing ValidatedEmail as a dict"""
     def as_dict(self):
-        return self.__dict__
+        d = self.__dict__
+        if d.get('domain_address'):
+            d['domain_address'] = repr(d['domain_address'])
+        return d
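
One thing worth noting about `as_dict` as written in this diff: `d = self.__dict__` binds the instance's own attribute dictionary, so assigning `d['domain_address']` also replaces the attribute on the object with its `repr` string. Whether that is intended is not stated in the commit. A minimal sketch of a copying variant, using a hypothetical stand-in class `Result` (not the library's `ValidatedEmail`):

```python
import ipaddress


class Result:
    """Hypothetical stand-in for a validation result object."""

    def __init__(self, domain, domain_address=None):
        self.domain = domain
        self.domain_address = domain_address

    def as_dict(self):
        # Shallow copy, so changes to the returned dict do not
        # alter the attributes stored on the instance.
        d = dict(self.__dict__)
        if d.get("domain_address"):
            d["domain_address"] = repr(d["domain_address"])
        return d


r = Result("[127.0.0.1]", ipaddress.IPv4Address("127.0.0.1"))
d = r.as_dict()
print(d["domain_address"])     # a string suitable for serialization
print(type(r.domain_address))  # still an IPv4Address on the instance
```

With the copy, calling `as_dict()` repeatedly keeps returning the same result, and `domain_address` remains usable as an address object afterwards.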

email_validator/rfc_constants.py

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@
 DOT_ATOM_TEXT_HOSTNAME = re.compile(HOSTNAME_LABEL + r'(?:\.' + HOSTNAME_LABEL + r')*\Z')
 DOMAIN_NAME_REGEX = re.compile(r"[A-Za-z]\Z")  # all TLDs currently end with a letter
 
+# Domain literal (RFC 5322 3.4.1)
+DOMAIN_LITERAL_CHARS = re.compile(r"[\u0021-\u00FA\u005E-\u007E]")
+
 # Quoted-string local part (RFC 5321 4.1.2, internationalized by RFC 6531 section 3.3)
 # The permitted characters in a quoted string are the characters in the range
 # 32-126, except that quotes and (literal) backslashes can only appear when escaped
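
For reference, RFC 5322 section 3.4.1 defines `dtext` as the printable US-ASCII characters 33-90 and 94-126, which excludes `[`, `\` and `]`. The first range in the constant added above ends at `\u00FA` rather than `\u005A`, so it also covers the brackets, the backslash, and some non-ASCII characters. The commit does not say whether that is deliberate. A minimal sketch using the RFC's ranges, where the names `DTEXT` and `bad_dtext_chars` are hypothetical:

```python
import re

# dtext per RFC 5322 section 3.4.1: %d33-90 / %d94-126
# (obs-dtext is deliberately left out of this sketch).
DTEXT = re.compile(r"[\u0021-\u005A\u005E-\u007E]")


def bad_dtext_chars(text):
    """Return the sorted list of characters in text that are not dtext."""
    return sorted({c for c in text if not DTEXT.match(c)})


print(bad_dtext_chars("IPv6:::1"))
print(bad_dtext_chars("a[b]\\"))
```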

email_validator/syntax.py

Lines changed: 63 additions & 6 deletions
@@ -1,11 +1,12 @@
 from .exceptions_types import EmailSyntaxError
 from .rfc_constants import EMAIL_MAX_LENGTH, LOCAL_PART_MAX_LENGTH, DOMAIN_MAX_LENGTH, \
     DOT_ATOM_TEXT, DOT_ATOM_TEXT_INTL, ATEXT_RE, ATEXT_INTL_RE, ATEXT_HOSTNAME_INTL, QTEXT_INTL, \
-    DNS_LABEL_LENGTH_LIMIT, DOT_ATOM_TEXT_HOSTNAME, DOMAIN_NAME_REGEX
+    DNS_LABEL_LENGTH_LIMIT, DOT_ATOM_TEXT_HOSTNAME, DOMAIN_NAME_REGEX, DOMAIN_LITERAL_CHARS
 
 import re
 import unicodedata
 import idna  # implements IDNA 2008; Python's codec is only IDNA 2003
+import ipaddress
 from typing import Optional
 
 
@@ -272,13 +273,9 @@ def check_dot_atom(label, start_descr, end_descr, is_hostname):
             raise EmailSyntaxError("An email address cannot have a period and a hyphen next to each other.")
 
 
-def validate_email_domain_part(domain, test_environment=False, globally_deliverable=True):
+def validate_email_domain_name(domain, test_environment=False, globally_deliverable=True):
     """Validates the syntax of the domain part of an email address."""
 
-    # Empty?
-    if len(domain) == 0:
-        raise EmailSyntaxError("There must be something after the @-sign.")
-
     # Check for invalid characters before normalization.
     # (RFC 952 plus RFC 6531 section 3.3 for internationalized addresses)
     bad_chars = set(
@@ -432,3 +429,63 @@ def validate_email_domain_part(domain, test_environment=False, globally_delivera
         "ascii_domain": ascii_domain,
         "domain": domain_i18n,
     }
+
+
+def validate_email_domain_literal(domain_literal, allow_domain_literal=False):
+    # This is obscure domain-literal syntax. Parse it and return
+    # a compressed/normalized address.
+    # RFC 5321 4.1.3 and RFC 5322 3.4.1.
+
+    # Try to parse the domain literal as an IPv4 address.
+    # There is no tag for IPv4 addresses, so we can never
+    # be sure if the user intends an IPv4 address.
+    if re.match(r"^[0-9\.]+$", domain_literal):
+        try:
+            addr = ipaddress.IPv4Address(domain_literal)
+        except ValueError as e:
+            raise EmailSyntaxError(f"The address in brackets after the @-sign is not valid: It is not an IPv4 address ({e}) or is missing an address literal tag.")
+        if not allow_domain_literal:
+            raise EmailSyntaxError("A bracketed IPv4 address after the @-sign is not allowed here.")
+
+        # Return the IPv4Address object and the domain back unchanged.
+        return {
+            "domain_address": addr,
+            "domain": f"[{addr}]",
+        }
+
+    # If it begins with "IPv6:" it's an IPv6 address.
+    if domain_literal.startswith("IPv6:"):
+        try:
+            addr = ipaddress.IPv6Address(domain_literal[5:])
+        except ValueError as e:
+            raise EmailSyntaxError(f"The IPv6 address in brackets after the @-sign is not valid ({e}).")
+        if not allow_domain_literal:
+            raise EmailSyntaxError("A bracketed IPv6 address after the @-sign is not allowed here.")
+
+        # Return the IPv6Address object and construct a normalized
+        # domain literal.
+        return {
+            "domain_address": addr,
+            "domain": f"[IPv6:{addr.compressed}]",
+        }
+
+    if ":" not in domain_literal:
+        raise EmailSyntaxError("The part after the @-sign in brackets is not an IPv4 address and has no address literal tag.")
+
+    # The tag (the part before the colon) has character restrictions,
+    # but since it must come from a registry of tags (in which only "IPv6" is defined),
+    # there's no need to check the syntax of the tag. See RFC 5321 4.1.2.
+
+    # Check for permitted ASCII characters. This actually doesn't matter
+    # since there will be an exception after anyway.
+    bad_chars = set(
+        safe_character_display(c)
+        for c in domain_literal
+        if not DOMAIN_LITERAL_CHARS.match(c)
+    )
+    if bad_chars:
+        raise EmailSyntaxError("The part after the @-sign contains invalid characters in brackets: " + ", ".join(sorted(bad_chars)) + ".")
+
+    # There are no other domain literal tags.
+    # https://www.iana.org/assignments/address-literal-tags/address-literal-tags.xhtml
+    raise EmailSyntaxError("The part after the @-sign contains an invalid address literal tag in brackets.")
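
The order of checks in `validate_email_domain_literal` decides which error a caller sees. A minimal stdlib-only sketch of that ordering, which returns a label instead of raising. The function name `classify_domain_literal` and the labels are hypothetical, and the character check is omitted:

```python
import ipaddress
import re


def classify_domain_literal(text: str) -> str:
    """Hypothetical helper: label the text found between the brackets."""
    # Digits and dots only: treated as an attempted IPv4 address.
    if re.match(r"^[0-9\.]+$", text):
        try:
            ipaddress.IPv4Address(text)
        except ValueError:
            return "invalid IPv4"
        return "IPv4"
    # The "IPv6:" tag selects IPv6 parsing of the remainder.
    if text.startswith("IPv6:"):
        try:
            ipaddress.IPv6Address(text[5:])
        except ValueError:
            return "invalid IPv6"
        return "IPv6"
    # No colon means there is no tag at all.
    if ":" not in text:
        return "no tag"
    return "unknown tag"


for sample in ("127.0.0.1", "1.2.3", "IPv6:::1", "IPv6:zzz", "example", "::1"):
    print(sample, "->", classify_domain_literal(sample))
```

Note that a bare `::1` without the `IPv6:` prefix falls through to the unknown-tag case, since its colon is read as a tag separator.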

email_validator/validate_email.py

Lines changed: 28 additions & 6 deletions
@@ -1,7 +1,7 @@
 from typing import Optional, Union
 
 from .exceptions_types import EmailSyntaxError, ValidatedEmail
-from .syntax import validate_email_local_part, validate_email_domain_part, get_length_reason
+from .syntax import validate_email_local_part, validate_email_domain_name, validate_email_domain_literal, get_length_reason
 from .rfc_constants import EMAIL_MAX_LENGTH, QUOTED_LOCAL_PART_ADDR
 
 
@@ -12,6 +12,7 @@ def validate_email(
     allow_smtputf8: Optional[bool] = None,
     allow_empty_local: bool = False,
     allow_quoted_local: Optional[bool] = None,
+    allow_domain_literal: Optional[bool] = None,
     check_deliverability: Optional[bool] = None,
     test_environment: Optional[bool] = None,
     globally_deliverable: Optional[bool] = None,
@@ -25,12 +26,14 @@ def validate_email(
     """
 
     # Fill in default values of arguments.
-    from . import ALLOW_SMTPUTF8, ALLOW_QUOTED_LOCAL, GLOBALLY_DELIVERABLE, \
-        CHECK_DELIVERABILITY, TEST_ENVIRONMENT, DEFAULT_TIMEOUT
+    from . import ALLOW_SMTPUTF8, ALLOW_QUOTED_LOCAL, ALLOW_DOMAIN_LITERAL, \
+        GLOBALLY_DELIVERABLE, CHECK_DELIVERABILITY, TEST_ENVIRONMENT, DEFAULT_TIMEOUT
     if allow_smtputf8 is None:
         allow_smtputf8 = ALLOW_SMTPUTF8
     if allow_quoted_local is None:
         allow_quoted_local = ALLOW_QUOTED_LOCAL
+    if allow_domain_literal is None:
+        allow_domain_literal = ALLOW_DOMAIN_LITERAL
     if check_deliverability is None:
         check_deliverability = CHECK_DELIVERABILITY
     if test_environment is None:
@@ -90,9 +93,24 @@ def validate_email(
     ret.smtputf8 = local_part_info["smtputf8"]
 
     # Validate the email address's domain part syntax and get a normalized form.
-    domain_part_info = validate_email_domain_part(domain_part, test_environment=test_environment, globally_deliverable=globally_deliverable)
-    ret.domain = domain_part_info["domain"]
-    ret.ascii_domain = domain_part_info["ascii_domain"]
+    is_domain_literal = False
+    if len(domain_part) == 0:
+        raise EmailSyntaxError("There must be something after the @-sign.")
+
+    elif domain_part.startswith("[") and domain_part.endswith("]"):
+        # Parse the address in the domain literal and get back a normalized domain.
+        domain_part_info = validate_email_domain_literal(domain_part[1:-1], allow_domain_literal=allow_domain_literal)
+        ret.domain = domain_part_info["domain"]
+        ret.ascii_domain = domain_part_info["domain"]  # Domain literals are always ASCII.
+        ret.domain_address = domain_part_info["domain_address"]
+        is_domain_literal = True  # Prevent deliverability checks.
+
+    else:
+        # Check the syntax of the domain and get back a normalized
+        # internationalized and ASCII form.
+        domain_part_info = validate_email_domain_name(domain_part, test_environment=test_environment, globally_deliverable=globally_deliverable)
+        ret.domain = domain_part_info["domain"]
+        ret.ascii_domain = domain_part_info["ascii_domain"]
 
     # Construct the complete normalized form.
     ret.email = ret.local_part + "@" + ret.domain
@@ -148,6 +166,10 @@ def validate_email(
         # Validate the email address's deliverability using DNS
         # and update the return dict with metadata.
 
+        if is_domain_literal:
+            # There is nothing to check --- skip deliverability checks.
+            return ret
+
         # Lazy load `deliverability` as it is slow to import (due to dns.resolver)
         from .deliverability import validate_email_deliverability
         deliverability_info = validate_email_deliverability(
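
The three-way branch on the domain part in `validate_email` can be read as a small dispatch step: empty input is an error, a bracketed value is handed to the domain-literal parser with the brackets stripped, and anything else goes to the domain-name validator. A minimal sketch of just that dispatch, with the hypothetical name `split_domain_part` and `ValueError` standing in for `EmailSyntaxError`:

```python
def split_domain_part(domain_part: str):
    """Hypothetical helper: return (kind, payload) for a domain part."""
    if len(domain_part) == 0:
        # Stand-in for the library's EmailSyntaxError.
        raise ValueError("There must be something after the @-sign.")
    if domain_part.startswith("[") and domain_part.endswith("]"):
        # Strip the brackets, as the [1:-1] slice in the diff does.
        return ("literal", domain_part[1:-1])
    return ("name", domain_part)


print(split_domain_part("[127.0.0.1]"))
print(split_domain_part("example.com"))
print(split_domain_part("[]"))
```

An empty pair of brackets still takes the literal path with an empty payload, so rejecting it is left to the literal parser rather than to this branch.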
