
Commit c1c7924

Some README improvements, remove references to obsoleted RFCs, add a test for an obsoleted quoted-string syntax
1 parent 416aeb6 commit c1c7924

4 files changed: +60 -48 lines changed


README.md

Lines changed: 47 additions & 33 deletions
@@ -14,22 +14,23 @@ Key features:
 
 * Checks that an email address has the correct syntax --- good for
   registration/login forms or other uses related to identifying users.
-  By default, rejects obsolete email address syntax that you'd find unexpected.
 * Gives friendly English error messages when validation fails that you
   can display to end-users.
 * Checks deliverability (optional): Does the domain name resolve?
   (You can override the default DNS resolver to add query caching.)
 * Supports internationalized domain names and internationalized local parts.
-  Blocks unsafe characters for your safety.
+* Rejects addresses with unsafe Unicode characters, obsolete email address
+  syntax that you'd find unexpected, special use domain names like
+  `@localhost`, and domains without a dot by default. This is an
+  opinionated library!
 * Normalizes email addresses (important for internationalized
   and quoted-string addresses! see below).
 * Python type annotations are used.
 
-This library does NOT permit obsolete forms of email addresses by default,
-so if you need strict validation against the email specs exactly, use
-[pyIsEmail](https://github.com/michaelherold/pyIsEmail) or try
-[flanker](https://github.com/mailgun/flanker) if you are parsing the
-"To:" line of an email.
+This is an opinionated library. You should definitely also consider using
+the less-opinionated [pyIsEmail](https://github.com/michaelherold/pyIsEmail) and
+[flanker](https://github.com/mailgun/flanker) if they are better for your
+use case.
 
 [![Build Status](https://github.com/JoshData/python-email-validator/actions/workflows/test_and_build.yaml/badge.svg)](https://github.com/JoshData/python-email-validator/actions/workflows/test_and_build.yaml)
 
@@ -57,21 +58,23 @@ account in your application, you might do this:
 ```python
 from email_validator import validate_email, EmailNotValidError
 
-email = "my+address@mydomain.tld"
-is_new_account = True # False for login pages
+email = "my+address@example.org"
 
 try:
-    # Check that the email address is valid.
-    validation = validate_email(email, check_deliverability=is_new_account)
 
-    # Take the normalized form of the email address
-    # for all logic beyond this point (especially
-    # before going to a database query where equality
-    # may not take into account Unicode normalization).
+    # Check that the email address is valid. Turn on check_deliverability
+    # for first-time validations like on account creation pages (but not
+    # login pages).
+    validation = validate_email(email, check_deliverability=False)
+
+    # After this point, use only the normalized form of the email address,
+    # especially before going to a database query.
     email = validation.email
+
 except EmailNotValidError as e:
-    # Email is not valid.
-    # The exception message is human-readable.
+
+    # The exception message is a human-readable explanation of why it's
+    # not a valid (or deliverable) email address.
     print(str(e))
 ```
 
@@ -108,12 +111,15 @@ that no one uses anymore even though they are still valid and deliverable, since
 they will probably give you grief if you're using email for login. (See
 later in the document about how to allow some obsolete forms.)
 
-The validator checks that the domain name in the email address has a
-DNS MX record (except a NULL MX record) indicating that it can receive
-email (or a fallback A-record, see below).
-There is nothing to be gained by trying to actually contact an SMTP
-server, so that's not done here. For privacy, security, and practicality
-reasons servers are good at not giving away whether an address is
+The validator optionally checks that the domain name in the email address has
+a DNS MX record indicating that it can receive email. (Except a Null MX record.
+If there is no MX record, a fallback A/AAAA-record is permitted, unless
+a reject-all SPF record is present.) DNS is slow and sometimes unavailable or
+unreliable, so consider whether these checks are useful for your use case and
+turn them off if they aren't.
+There is nothing to be gained by trying to actually contact an SMTP server, so
+that's not done here. For privacy, security, and practicality reasons, servers
+are good at not giving away whether an address is
 deliverable or not: email addresses that appear to accept mail at first
 can bounce mail after a delay, and bounced mail may indicate a temporary
 failure of a good email address (sometimes an intentional failure, like
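The deliverability rule added to the README in the hunk above (an MX record, but not a Null MX record; otherwise an A/AAAA fallback unless a reject-all SPF record is present) can be sketched as a pure decision function. This is only an illustration of the rule as worded in the README, not the library's implementation: the function name `can_receive_mail` and its parameters are hypothetical, the DNS answers are assumed to have been fetched already, and "reject-all SPF" is assumed to mean a TXT record equal to `v=spf1 -all`.

```python
from typing import Optional, Sequence


def can_receive_mail(
    mx_hosts: Sequence[str],
    has_address_record: bool,
    spf_record: Optional[str] = None,
) -> bool:
    """Apply the README's deliverability rule to already-fetched DNS answers.

    Hypothetical helper: mx_hosts holds the MX targets, has_address_record
    says whether an A or AAAA record exists, and spf_record is the SPF TXT
    record text, if there is one.
    """
    if mx_hosts:
        # A Null MX record (RFC 7505) is a single MX whose target is the
        # root name ".", declaring that the domain accepts no mail.
        is_null_mx = len(mx_hosts) == 1 and mx_hosts[0] in (".", "")
        return not is_null_mx

    # No MX record at all: an A/AAAA record is accepted as a fallback ...
    if not has_address_record:
        return False

    # ... unless a reject-all SPF record is present (assumed to be exactly
    # "v=spf1 -all" after collapsing whitespace and case).
    if spf_record is not None:
        normalized = " ".join(spf_record.split()).lower()
        if normalized == "v=spf1 -all":
            return False
    return True


print(can_receive_mail(["mail.example.net."], has_address_record=False))
print(can_receive_mail(["."], has_address_record=True))
print(can_receive_mail([], has_address_record=True, spf_record="v=spf1 -all"))
```

Keeping the decision separate from the DNS lookups makes the rule testable without network access, which matters because the README itself notes that DNS is slow and sometimes unavailable.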
@@ -124,11 +130,11 @@ greylisting).
 The `validate_email` function also accepts the following keyword arguments
 (defaults are as shown below):
 
-`check_deliverability=True`: If true, a DNS query is made to check that a non-null MX record is present for the domain-part of the email address (or if not, an A/AAAA record as an MX fallback can be present but in that case a reject-all SPF record must not be present). Set to `False` to skip this DNS-based check. DNS is slow and sometimes unavailable, so consider whether these checks are useful for your use case. It is recommended to pass `False` when performing validation for login pages (but not account creation pages) since re-validation of a previously validated domain in your database by querying DNS at every login is probably undesirable. You can also set `email_validator.CHECK_DELIVERABILITY` to `False` to turn this off for all calls by default.
+`check_deliverability=True`: If true, DNS queries are made to check that the domain name in the email address (the part after the @-sign) can receive mail, as described above. Set to `False` to skip this DNS-based check. It is recommended to pass `False` when performing validation for login pages (but not account creation pages) since re-validation of a previously validated domain in your database by querying DNS at every login is probably undesirable. You can also set `email_validator.CHECK_DELIVERABILITY` to `False` to turn this off for all calls by default.
 
-`dns_resolver=None`: Pass an instance of [dns.resolver.Resolver](https://dnspython.readthedocs.io/en/latest/resolver-class.html) to control the DNS resolver including setting a timeout and [a cache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html). The `caching_resolver` function shown above is a helper function to construct a dns.resolver.Resolver with a [LRUCache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html#dns.resolver.LRUCache). Reuse the same resolver instance across calls to `validate_email` to make use of the cache.
+`dns_resolver=None`: Pass an instance of [dns.resolver.Resolver](https://dnspython.readthedocs.io/en/latest/resolver-class.html) to control the DNS resolver including setting a timeout and [a cache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html). The `caching_resolver` function shown below is a helper function to construct a dns.resolver.Resolver with a [LRUCache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html#dns.resolver.LRUCache). Reuse the same resolver instance across calls to `validate_email` to make use of the cache.
 
-`test_environment=False`: DNS-based deliverability checks are disabled and `test` and `subdomain.test` domain names are permitted (see below). You can also set `email_validator.TEST_ENVIRONMENT` to `True` to turn it on for all calls by default.
+`test_environment=False`: If `True`, DNS-based deliverability checks are disabled and `test` and `**.test` domain names are permitted (see below). You can also set `email_validator.TEST_ENVIRONMENT` to `True` to turn it on for all calls by default.
 
 `allow_smtputf8=True`: Set to `False` to prohibit internationalized addresses that would
 require the
@@ -160,15 +166,15 @@ while True:
 This library rejects email addresses that use the [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) `invalid`, `localhost`, `test`, and some others by raising `EmailSyntaxError`. This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to `localhost`. However, in your non-production test environments you may want to use `@test` or `@myname.test` email addresses. There are three ways you can allow this:
 
 1. Add `test_environment=True` to the call to `validate_email` (see above).
-2. Set `email_validator.TEST_ENVIRONMENT` to `True`.
-3. Remove the special-use domain name that you want to use from `email_validator.SPECIAL_USE_DOMAIN_NAMES`:
+2. Set `email_validator.TEST_ENVIRONMENT` to `True` globally.
+3. Remove the special-use domain name that you want to use from `email_validator.SPECIAL_USE_DOMAIN_NAMES`, e.g.:
 
 ```python
 import email_validator
 email_validator.SPECIAL_USE_DOMAIN_NAMES.remove("test")
 ```
 
-It is tempting to use `@example.com/net/org` in tests. These domains are reserved to IANA for use in documentation so there is no risk of accidentally emailing someone at those domains. But beware that this library will reject these domain names if DNS-based deliverability checks are not disabled because these domains do not resolve to domains that accept email. In tests, consider using your own domain name or `@test` or `@myname.test` instead.
+It is tempting to use `@example.com/net/org` in tests. They are *not* in this library's `SPECIAL_USE_DOMAIN_NAMES` list so you can, but shouldn't, use them. These domains are reserved to IANA for use in documentation so there is no risk of accidentally emailing someone at those domains. But beware that this library will nevertheless reject these domain names if DNS-based deliverability checks are not disabled because these domains do not resolve to domains that accept email. In tests, consider using your own domain name or `@test` or `@myname.test` instead.
 
 Internationalized email addresses
 ---------------------------------
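The special-use domain rule described in the hunk above (names such as `invalid`, `localhost` and `test` are rejected, with `test` and `**.test` allowed in a test environment) can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not the library's code: `SPECIAL_USE_SKETCH` is a hypothetical three-entry subset of the real `email_validator.SPECIAL_USE_DOMAIN_NAMES` list, and `is_special_use` is a made-up helper name.

```python
from typing import Sequence

# Hypothetical subset used only for this illustration; the library's real
# list lives in email_validator.SPECIAL_USE_DOMAIN_NAMES and is longer.
SPECIAL_USE_SKETCH = ("invalid", "localhost", "test")


def is_special_use(
    domain: str,
    special_use: Sequence[str] = SPECIAL_USE_SKETCH,
    test_environment: bool = False,
) -> bool:
    """Return True if domain is, or is a subdomain of, a special-use name."""
    domain = domain.lower().rstrip(".")
    for name in special_use:
        # In a test environment, "test" and "**.test" are permitted.
        if test_environment and name == "test":
            continue
        if domain == name or domain.endswith("." + name):
            return True
    return False


print(is_special_use("localhost"))
print(is_special_use("myname.test"))
print(is_special_use("myname.test", test_environment=True))
print(is_special_use("example.org"))
```

Matching on a leading dot (`"." + name`) keeps unrelated names such as `contest` from being treated as a `test` subdomain. Note that `example.org` passes this check, which matches the README's point that the `example.*` domains are not in the list and are only rejected by the DNS-based deliverability check.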
@@ -255,17 +261,23 @@ change the user's login information without telling them.)
 Normalization
 -------------
 
+### Unicode Normalization
+
 The use of Unicode in email addresses introduced a normalization
 problem. Different Unicode strings can look identical and have the same
 semantic meaning to the user. The `email` field returned on successful
 validation provides the correctly normalized form of the given email
-address:
+address.
+
+For example, the CJK fullwidth Latin letters are considered semantically
+equivalent in domain names to their ASCII counterparts. This library
+normalizes them to their ASCII counterparts:
 
 ```python
 valid = validate_email("me@Ｄｏｍａｉｎ.com")
-email = valid.ascii_email
-print(email)
-# prints: me@domain.com
+print(valid.email)
+print(valid.ascii_email)
+# prints "me@domain.com" twice
 ```
 
 Because an end-user might type their email address in different (but
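The fullwidth-letter example in the hunk above can be reproduced with the standard library alone. This is a minimal sketch, assuming NFKC compatibility normalization plus lowercasing as a stand-in: the library's own comments refer to UTS-46 normalization and IDNA encoding for domains, which this does not implement. For fullwidth Latin letters, both approaches map to the same ASCII letters. The function name `normalize_domain_sketch` is hypothetical.

```python
import unicodedata


def normalize_domain_sketch(domain: str) -> str:
    """Fold compatibility characters (such as fullwidth letters) and case.

    Stand-in only: NFKC plus lower() is not a full UTS-46/IDNA mapping.
    """
    return unicodedata.normalize("NFKC", domain).lower()


# "Domain" written with CJK fullwidth Latin letters (U+FF24, U+FF4F, ...).
fullwidth_domain = "\uff24\uff4f\uff4d\uff41\uff49\uff4e.com"

print(fullwidth_domain == "Domain.com")           # False: different code points
print(normalize_domain_sketch(fullwidth_domain))  # domain.com
```

Comparing or storing only the normalized form avoids treating two visually identical addresses as different accounts.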
@@ -292,6 +304,8 @@ and conversion from Punycode to Unicode characters.
 3.1](https://tools.ietf.org/html/rfc6532#section-3.1) and [RFC 5895
 (IDNA 2008) section 2](http://www.ietf.org/rfc/rfc5895.txt).)
 
+### Other Normalization
+
 Normalization is also applied to quoted-string local parts and domain
 literal IPv6 addresses if you have allowed them by the `allow_quoted_local`
 and `allow_domain_literal` options. In quoted-string local parts, unnecessary

email_validator/rfc_constants.py

Lines changed: 5 additions & 8 deletions
@@ -2,16 +2,13 @@
 
 import re
 
-# Based on RFC 2822 section 3.2.4 / RFC 5322 section 3.2.3, these
-# characters are permitted in email addresses (not taking into
-# account internationalization):
+# Based on RFC 5322 3.2.3, these characters are permitted in email
+# addresses (not taking into account internationalization) separated by dots:
 ATEXT = r'a-zA-Z0-9_!#\$%&\'\*\+\-/=\?\^`\{\|\}~'
 ATEXT_RE = re.compile('[.' + ATEXT + ']')  # ATEXT plus dots
-
-# A "dot atom text", per RFC 2822 3.2.4:
 DOT_ATOM_TEXT = re.compile('[' + ATEXT + ']+(?:\\.[' + ATEXT + r']+)*\Z')
 
-# RFC 6531 section 3.3 extends the allowed characters in internationalized
+# RFC 6531 3.3 extends the allowed characters in internationalized
 # addresses to also include three specific ranges of UTF8 defined in
 # RFC 3629 section 4, which appear to be the Unicode code points from
 # U+0080 to U+10FFFF.
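To see what the `DOT_ATOM_TEXT` pattern in the hunk above accepts, the two constants can be copied into a stand-alone script and tried against a few local parts. The `ATEXT` and `DOT_ATOM_TEXT` definitions are taken verbatim from the diff; the `is_dot_atom` wrapper and the sample inputs are illustrative additions, not part of the library.

```python
import re

# Copied from the diff above (email_validator/rfc_constants.py).
ATEXT = r'a-zA-Z0-9_!#\$%&\'\*\+\-/=\?\^`\{\|\}~'
DOT_ATOM_TEXT = re.compile('[' + ATEXT + ']+(?:\\.[' + ATEXT + r']+)*\Z')


def is_dot_atom(local: str) -> bool:
    """Hypothetical wrapper: True if local is one or more atoms joined by single dots."""
    return DOT_ATOM_TEXT.match(local) is not None


samples = [
    "my+address",              # plain atom with a permitted symbol
    "first.last",              # atoms separated by a single dot
    ".leading",                # cannot start with a dot
    "trailing.",               # cannot end with a dot
    "double..dot",             # cannot contain consecutive dots
    'obsolete."quoted".atom',  # the quote character is not in ATEXT
]
for local in samples:
    print(repr(local), is_dot_atom(local))
```

The last sample is the local part used by the test case added in this commit: the obsolete mixed quoted-string form fails the dot-atom pattern because `"` is not a permitted atom character.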
@@ -20,7 +17,7 @@
 DOT_ATOM_TEXT_INTL = re.compile('[' + ATEXT_INTL + ']+(?:\\.[' + ATEXT_INTL + r']+)*\Z')
 
 # The domain part of the email address, after IDNA (ASCII) encoding,
-# must also satisfy the requirements of RFC 952/RFC 1123 Section 2.1 which
+# must also satisfy the requirements of RFC 952/RFC 1123 2.1 which
 # restrict the allowed characters of hostnames further.
 ATEXT_HOSTNAME_INTL = re.compile(r"[a-zA-Z0-9\-\." + "\u0080-\U0010FFFF" + "]")
 HOSTNAME_LABEL = r'(?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]*)?[a-zA-Z0-9])'
@@ -30,7 +27,7 @@
 # Domain literal (RFC 5322 3.4.1)
 DOMAIN_LITERAL_CHARS = re.compile(r"[\u0021-\u00FA\u005E-\u007E]")
 
-# Quoted-string local part (RFC 5321 4.1.2, internationalized by RFC 6531 section 3.3)
+# Quoted-string local part (RFC 5321 4.1.2, internationalized by RFC 6531 3.3)
 # The permitted characters in a quoted string are the characters in the range
 # 32-126, except that quotes and (literal) backslashes can only appear when escaped
 # by a backslash. When internationalized, UTF8 strings are also permitted except

email_validator/syntax.py

Lines changed: 7 additions & 7 deletions
@@ -64,11 +64,11 @@ def validate_email_local_part(local: str, allow_smtputf8: bool = True, allow_emp
 
     # Check the local part against the non-internationalized regular expression.
     # Most email addresses match this regex so it's probably fastest to check this first.
-    # (RFC 2822 3.2.4)
+    # (RFC 5322 3.2.3)
     # All local parts matching the dot-atom rule are also valid as a quoted string
     # so if it was originally quoted (quoted_local_part is True) and this regex matches,
     # it's ok.
-    # (RFC 5321 4.1.2).
+    # (RFC 5321 4.1.2 / RFC 5322 3.2.4).
     m = DOT_ATOM_TEXT.match(local)
     if m:
         # It's valid. And since it's just the permitted ASCII characters,
@@ -95,7 +95,7 @@ def validate_email_local_part(local: str, allow_smtputf8: bool = True, allow_emp
         if not allow_smtputf8:
             # Check for invalid characters against the non-internationalized
             # permitted character set.
-            # (RFC 2822 Section 3.2.4 / RFC 5322 Section 3.2.3)
+            # (RFC 5322 3.2.3)
             bad_chars = set(
                 safe_character_display(c)
                 for c in local
@@ -184,7 +184,7 @@ def validate_email_local_part(local: str, allow_smtputf8: bool = True, allow_emp
     # don't apply in those cases.)
 
     # Check for invalid characters.
-    # (RFC 2822 Section 3.2.4 / RFC 5322 Section 3.2.3, plus RFC 6531 section 3.3)
+    # (RFC 5322 3.2.3, plus RFC 6531 3.3)
     bad_chars = set(
         safe_character_display(c)
         for c in local
@@ -194,7 +194,7 @@ def validate_email_local_part(local: str, allow_smtputf8: bool = True, allow_emp
         raise EmailSyntaxError("The email address contains invalid characters before the @-sign: " + ", ".join(sorted(bad_chars)) + ".")
 
     # Check for dot errors imposed by the dot-atom rule.
-    # (RFC 2822 3.2.4)
+    # (RFC 5322 3.2.3)
     check_dot_atom(local, 'An email address cannot start with a {}.', 'An email address cannot have a {} immediately before the @-sign.', is_hostname=False)
 
     # All of the reasons should already have been checked, but just in case
@@ -255,7 +255,7 @@ def check_unsafe_chars(s, allow_space=False):
 
 
 def check_dot_atom(label, start_descr, end_descr, is_hostname):
-    # RFC 2822 3.2.4
+    # RFC 5322 3.2.3
     if label.endswith("."):
         raise EmailSyntaxError(end_descr.format("period"))
     if label.startswith("."):
@@ -308,7 +308,7 @@ def validate_email_domain_name(domain, test_environment=False, globally_delivera
     # Check that before we do IDNA encoding because the IDNA library gives
     # unfriendly errors for these cases, but after UTS-46 normalization because
     # it can insert periods and hyphens (from fullwidth characters).
-    # (RFC 952, RFC 2822 3.2.4)
+    # (RFC 952, RFC 1123 2.1, RFC 5322 3.2.3)
     check_dot_atom(domain, 'An email address cannot have a {} immediately after the @-sign.', 'An email address cannot end with a {}.', is_hostname=True)
 
     # Check for RFC 5890's invalid R-LDH labels, which are labels that start
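The `check_dot_atom` calls in the hunks above pass a pair of message templates that are filled in with the offending character. A simplified, stand-alone sketch of that pattern follows. It is not the library's function: it raises `ValueError` instead of `EmailSyntaxError`, drops the `is_hostname` parameter, assumes that the start check mirrors the end check shown in the diff, and uses a made-up message for consecutive periods. The `describe` helper is also hypothetical.

```python
from typing import Optional


def check_dot_atom_sketch(label: str, start_descr: str, end_descr: str) -> None:
    """Raise ValueError for the period errors that the dot-atom rule forbids."""
    if label.endswith("."):
        raise ValueError(end_descr.format("period"))
    if label.startswith("."):
        # Assumed to mirror the end check; the diff cuts off before this body.
        raise ValueError(start_descr.format("period"))
    if ".." in label:
        # Hypothetical wording; the library's own message is not shown in the diff.
        raise ValueError("The label cannot contain two periods in a row.")


def describe(local: str) -> Optional[str]:
    """Hypothetical helper: return the error text for a local part, or None."""
    try:
        check_dot_atom_sketch(
            local,
            "An email address cannot start with a {}.",
            "An email address cannot have a {} immediately before the @-sign.",
        )
    except ValueError as e:
        return str(e)
    return None


print(describe(".me"))
print(describe("me."))
print(describe("me"))
```

Passing the templates in as arguments lets the same check produce different wording for the local part and the domain, which is how the two call sites in the diff use it.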

tests/test_syntax.py

Lines changed: 1 addition & 0 deletions
@@ -306,6 +306,7 @@ def test_domain_literal():
         ('my\n@example.com', 'The email address contains invalid characters before the @-sign: U+000A.'),
         ('test@\n', 'The part after the @-sign contains invalid characters: U+000A.'),
         ('bad"quotes"@example.com', 'The email address contains invalid characters before the @-sign: \'"\'.'),
+        ('obsolete."quoted".atom@example.com', 'The email address contains invalid characters before the @-sign: \'"\'.'),
         ('11111111112222222222333333333344444444445555555555666666666677777@example.com', 'The email address is too long before the @-sign (1 character too many).'),
         ('111111111122222222223333333333444444444455555555556666666666777777@example.com', 'The email address is too long before the @-sign (2 characters too many).'),
         ('me@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.111111111122222222223333333333444444444455555555556.com', 'The email address is too long (4 characters too many).'),
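The two "too long before the @-sign" cases in the test data above use local parts of 65 and 66 characters. A sketch of how such a message could be produced follows, assuming the 64-octet local-part limit from RFC 5321 section 4.5.3.1.1 and counting characters, which equals octets for these ASCII inputs. The function name and constant are hypothetical, and only the message wording is taken from the tests.

```python
from typing import Optional

# Assumed limit: RFC 5321 4.5.3.1.1 caps the local part at 64 octets.
LOCAL_PART_MAX_LENGTH = 64


def local_part_length_error(local: str) -> Optional[str]:
    """Return the test suite's error text if local is too long, else None."""
    excess = len(local) - LOCAL_PART_MAX_LENGTH
    if excess <= 0:
        return None
    noun = "character" if excess == 1 else "characters"
    return f"The email address is too long before the @-sign ({excess} {noun} too many)."


# Ten each of the digits 1-6, then five 7s: 65 characters, as in the first test case.
local_65 = "".join(d * 10 for d in "123456") + "7" * 5
print(len(local_65), local_part_length_error(local_65))
```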
