Commit 90b8b20

Add overall length checks and improve the error messages for the local-part and domain length checks
Fixes #35.
1 parent 8243bd2 commit 90b8b20

3 files changed: +94 -13 lines changed

README.md

Lines changed: 8 additions & 5 deletions
@@ -161,7 +161,7 @@ route to the destination, including your own outbound mail server, all
 support the [SMTPUTF8 (RFC 6531)](https://tools.ietf.org/html/rfc6531)
 extension. Support for SMTPUTF8 varies.

-### How this module works
+### If you know ahead of time that SMTPUTF8 is not supported by your mail submission stack

 By default all internationalized forms are accepted by the validator.
 But if you know ahead of time that SMTPUTF8 is not supported by your
@@ -214,10 +214,13 @@ print(email)
 # prints: me@domain.com
 ```

-Because you may get an email address in a variety of forms, you ought to
-replace it with its normalized form immediately prior to going into your
-database (during account creation), querying your database (during
-login), or sending outbound mail.
+Because an end-user might type their email address in different (but
+equivalent) un-normalized forms at different times, you ought to
+replace what they enter with the normalized form immediately prior to
+going into your database (during account creation), querying your database
+(during login), or sending outbound mail. Normalization may also change
+the length of an email address, and this may affect whether it is valid
+and acceptable by your SMTP provider.

 The normalizations include lowercasing the domain part of the email
 address (domain names are case-insensitive), [Unicode "NFC"

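As a rough illustration of the README's point that normalization may change the length of an address, the sketch below uses only the Python standard library rather than this package. NFC composition can shorten a string, while converting a domain to IDNA ASCII lengthens it. One assumption to note: the built-in `idna` codec implements the older IDNA 2003 rules, whereas the package relies on the third-party `idna` module, so this is an approximation of what the validator does.

```python
import unicodedata

# "e" followed by a combining acute accent: two code points for one letter.
decomposed = "cafe\u0301"
composed = unicodedata.normalize("NFC", decomposed)
# NFC merges the pair into the single code point U+00E9, so the string shrinks.
nfc_lengths = (len(decomposed), len(composed))

# Converting an internationalized domain to IDNA ASCII makes it longer.
unicode_domain = "\u03bb.example"  # Greek small letter lambda
ascii_domain = unicode_domain.encode("idna").decode("ascii")
idna_lengths = (len(unicode_domain), len(ascii_domain))

print(nfc_lengths, idna_lengths, ascii_domain)
```

An address that fits the limits as typed can therefore exceed them once normalized or converted, and the reverse is also possible.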
email_validator/__init__.py

Lines changed: 78 additions & 8 deletions
@@ -201,20 +201,71 @@ def validate_email(
     ret.domain = domain_part_info["domain"]
     ret.ascii_domain = domain_part_info["ascii_domain"]

-    if check_deliverability:
-        # Validate the email address's deliverability and update the
-        # return dict with metadata.
-        deliverability_info = validate_email_deliverability(ret["domain"], ret["domain_i18n"], timeout)
-        ret.mx = deliverability_info["mx"]
-        ret.mx_fallback_type = deliverability_info["mx-fallback"]
-
     # Construct the complete normalized form.
     ret.email = ret.local_part + "@" + ret.domain

     # If the email address has an ASCII form, add it.
     if not ret.smtputf8:
         ret.ascii_email = ret.ascii_local_part + "@" + ret.ascii_domain

+    # RFC 3696 + errata 1003 + errata 1690 (https://www.rfc-editor.org/errata_search.php?rfc=3696&eid=1690)
+    # explains the maximum length of an email address is 254 octets.
+    #
+    # If the email address has an ASCII representation, then we assume it may be
+    # transmitted in ASCII (we can't assume SMTPUTF8 will be used on all hops to
+    # the destination) and the length limit applies to ASCII characters (which is
+    # the same as octets). The number of characters in the internationalized form
+    # may be many fewer (because IDNA ASCII is verbose) and could be less than 254
+    # Unicode characters, and of course the number of octets over the limit may
+    # not be the number of characters over the limit, so if the email address is
+    # internationalized, we can't give any simple information about why the address
+    # is too long.
+    #
+    # In addition, check that the UTF-8 encoding (i.e. not IDNA ASCII and not
+    # Unicode characters) is at most 254 octets. If the address is transmitted using
+    # SMTPUTF8, then the length limit probably applies to the UTF-8 encoded octets.
+    # If the email address has an ASCII form that differs from its internationalized
+    # form, I don't think the internationalized form can be longer, and so the ASCII
+    # form length check would be sufficient. If there is no ASCII form, then we have
+    # to check the UTF-8 encoding. The UTF-8 encoding could be up to about four times
+    # longer than the number of characters.
+    #
+    # See the length checks on the local part and the domain.
+    if ret.ascii_email and len(ret.ascii_email) > 254:
+        if ret.ascii_email == ret.email:
+            reason = " ({} character{} too many)".format(
+                len(ret.ascii_email) - 254,
+                "s" if (len(ret.ascii_email) - 254 != 1) else ""
+            )
+        elif len(ret.email) > 254:
+            # If there are more than 254 characters, then the ASCII
+            # form is definitely going to be too long.
+            reason = " (at least {} character{} too many)".format(
+                len(ret.email) - 254,
+                "s" if (len(ret.email) - 254 != 1) else ""
+            )
+        else:
+            reason = " (when converted to IDNA ASCII)"
+        raise EmailSyntaxError("The email address is too long{}.".format(reason))
+    if len(ret.email.encode("utf8")) > 254:
+        if len(ret.email) > 254:
+            # If there are more than 254 characters, then the UTF-8
+            # encoding is definitely going to be too long.
+            reason = " (at least {} character{} too many)".format(
+                len(ret.email) - 254,
+                "s" if (len(ret.email) - 254 != 1) else ""
+            )
+        else:
+            reason = " (when encoded in bytes)"
+        raise EmailSyntaxError("The email address is too long{}.".format(reason))
+
+    if check_deliverability:
+        # Validate the email address's deliverability and update the
+        # return dict with metadata.
+        deliverability_info = validate_email_deliverability(ret["domain"], ret["domain_i18n"], timeout)
+        ret.mx = deliverability_info["mx"]
+        ret.mx_fallback_type = deliverability_info["mx-fallback"]
+
     return ret

@@ -234,8 +285,16 @@ def validate_email_local_part(local, allow_smtputf8=True, allow_empty_local=Fals
             }

     # RFC 5321 4.5.3.1.1
+    # We're checking the number of characters here. If the local part
+    # is ASCII-only, then that's the same as bytes (octets). If it's
+    # internationalized, then the UTF-8 encoding may be longer, but
+    # that may not be relevant. We will check the total address length
+    # instead.
     if len(local) > 64:
-        raise EmailSyntaxError("The email address is too long before the @-sign.")
+        raise EmailSyntaxError("The email address is too long before the @-sign ({} character{} too many).".format(
+            len(local) - 64,
+            "s" if (len(local) - 64 != 1) else ""
+        ))

     # Check the local part against the regular expression for the older ASCII requirements.
     m = re.match(DOT_ATOM_TEXT + "\\Z", local)
@@ -314,6 +373,12 @@ def validate_email_domain_part(domain):
     try:
         ascii_domain = idna.encode(domain, uts46=False).decode("ascii")
     except idna.IDNAError as e:
+        if "Domain too long" in str(e):
+            # We can't really be more specific because UTS-46 normalization means
+            # the length check is applied to a string that is different from the
+            # one the user supplied. Also I'm not sure if the length check applies
+            # to the internationalized form, the IDNA ASCII form, or even both!
+            raise EmailSyntaxError("The email address is too long after the @-sign.")
         raise EmailSyntaxError("The domain name %s contains invalid characters (%s)." % (domain, str(e)))

     # We may have been given an IDNA ASCII domain to begin with. Check
@@ -329,6 +394,11 @@ def validate_email_domain_part(domain):
         raise EmailSyntaxError("The domain name %s is not valid IDNA (%s)." % (ascii_domain, str(e)))

     # RFC 5321 4.5.3.1.2
+    # We're checking the number of bytes (octets) here, which can be much
+    # higher than the number of characters in internationalized domains,
+    # on the assumption that the domain may be transmitted without SMTPUTF8
+    # as IDNA ASCII. This is also checked by idna.encode, so this exception
+    # is never reached.
     if len(ascii_domain) > 255:
         raise EmailSyntaxError("The email address is too long after the @-sign.")

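The overall length check added to `validate_email` can be read in isolation as the following minimal sketch. It mirrors the branch structure of the diff above but is not the package's code: `check_overall_length`, `too_many`, `error_for`, and the local `EmailSyntaxError` class are hypothetical stand-ins so the example runs without the package installed.

```python
MAX_ADDRESS_LENGTH = 254  # octets, per RFC 3696 errata 1690 as cited in the diff


class EmailSyntaxError(ValueError):
    """Hypothetical stand-in for the package's exception of the same name."""


def too_many(count):
    # Pluralize "character" unless exactly one is over the limit.
    return "{} character{} too many".format(count, "" if count == 1 else "s")


def check_overall_length(email, ascii_email=None):
    """Raise EmailSyntaxError if the address exceeds 254 octets.

    email is the normalized (possibly internationalized) address and
    ascii_email is its ASCII form, or None when it has no ASCII form.
    """
    if ascii_email and len(ascii_email) > MAX_ADDRESS_LENGTH:
        if ascii_email == email:
            # Pure ASCII: characters and octets coincide, so be exact.
            reason = " ({})".format(too_many(len(ascii_email) - MAX_ADDRESS_LENGTH))
        elif len(email) > MAX_ADDRESS_LENGTH:
            # Too many characters even before IDNA conversion.
            reason = " (at least {})".format(too_many(len(email) - MAX_ADDRESS_LENGTH))
        else:
            reason = " (when converted to IDNA ASCII)"
        raise EmailSyntaxError("The email address is too long{}.".format(reason))
    if len(email.encode("utf8")) > MAX_ADDRESS_LENGTH:
        if len(email) > MAX_ADDRESS_LENGTH:
            reason = " (at least {})".format(too_many(len(email) - MAX_ADDRESS_LENGTH))
        else:
            reason = " (when encoded in bytes)"
        raise EmailSyntaxError("The email address is too long{}.".format(reason))


def error_for(email, ascii_email=None):
    # Convenience wrapper: return the error message, or None if the length is fine.
    try:
        check_overall_length(email, ascii_email)
    except EmailSyntaxError as e:
        return str(e)
    return None


ascii_255 = "a@" + "b" * 253           # 255 ASCII characters
utf8_only = "\u03bb" * 127 + "@x"      # 129 characters, 256 UTF-8 octets
print(error_for(ascii_255, ascii_255))
print(error_for(utf8_only))
```

The exact excess is reported only when the ASCII and normalized forms are identical, because that is the one case where the number of characters over the limit equals the number of octets over the limit.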
tests/test_main.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -233,6 +233,14 @@ def test_email_valid(email_input, output):
233233
('\nmy@example.com', 'The email address contains invalid characters before the @-sign: \n.'),
234234
('m\ny@example.com', 'The email address contains invalid characters before the @-sign: \n.'),
235235
('my\n@example.com', 'The email address contains invalid characters before the @-sign: \n.'),
236+
('11111111112222222222333333333344444444445555555555666666666677777@example.com', 'The email address is too long before the @-sign (1 character too many).'),
237+
('111111111122222222223333333333444444444455555555556666666666777777@example.com', 'The email address is too long before the @-sign (2 characters too many).'),
238+
('me@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.111111111122222222223333333333444444444455555555556.com', 'The email address is too long after the @-sign.'),
239+
('my.long.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333333344444.info', 'The email address is too long (2 characters too many).'),
240+
('my.long.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333.info', 'The email address is too long (when converted to IDNA ASCII).'),
241+
('my.long.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (at least 1 character too many).'),
242+
('my.λong.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.111111111122222222223333333333444.info', 'The email address is too long (when encoded in bytes).'),
243+
('my.λong.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (at least 1 character too many).'),
236244
],
237245
)
238246
def test_email_invalid(email_input, error_msg):

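The first two new test cases exercise the local-part limit of 64 characters. A minimal standalone sketch of that check, using the local parts from those test cases, might look like this. `local_part_error` is a hypothetical helper that returns the message instead of raising, and it is not part of the package.

```python
MAX_LOCAL_PART_LENGTH = 64  # RFC 5321 4.5.3.1.1, counted in characters


def local_part_error(local):
    """Return the error message for an over-long local part, or None if it fits."""
    excess = len(local) - MAX_LOCAL_PART_LENGTH
    if excess <= 0:
        return None
    return "The email address is too long before the @-sign ({} character{} too many).".format(
        excess,
        "s" if excess != 1 else ""
    )


# Local parts taken from the two new test inputs (65 and 66 characters).
one_over = "11111111112222222222333333333344444444445555555555666666666677777"
two_over = "111111111122222222223333333333444444444455555555556666666666777777"
print(local_part_error(one_over))
print(local_part_error(two_over))
```

As the comment in the diff notes, this counts characters rather than octets; an internationalized local part may be longer in UTF-8, which is left to the overall address length check.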