
Commit 6a048fd

Add an asynchronous method so DNS queries can be run asynchronously

1 parent 5cf49cf

12 files changed, 459 additions, 76 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -1,6 +1,7 @@
 In Development
 --------------

+* The library now includes an asynchronous version of the main method, named validate_email_async, which must be called with await and runs DNS-based deliverability checks asynchronously.
 * A new option to parse `My Name <address@domain>` strings, i.e. a display name plus an email address in angle brackets, is now available. It is off by default.
 * When a domain name has no MX record but does have an A or AAAA record, if none of the IP addresses in the response are globally reachable (i.e. not Private-Use, Loopback, etc.), the response is treated as if there was no A/AAAA response and the email address will fail the deliverability check.
 * When a domain name has no MX record but does have an A or AAAA record, the mx field in the object returned by validate_email incorrectly held the IP addresses rather than the domain itself.
```
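The third CHANGELOG entry describes filtering the A/AAAA fallback by global reachability: if no returned address is globally reachable, the domain is treated as if it had no A/AAAA record. Below is a minimal sketch of that check using only the standard `ipaddress` module. `is_global_addr` mirrors the helper named in the `deliverability.py` hunk headers, while `has_global_address` is a hypothetical helper that, unlike the library, takes plain address strings rather than dnspython answer records.

```python
import ipaddress
from typing import Iterable


def is_global_addr(address: str) -> bool:
    # Unparseable addresses are treated as not globally reachable.
    try:
        return ipaddress.ip_address(address).is_global
    except ValueError:
        return False


def has_global_address(addresses: Iterable[str]) -> bool:
    # Hypothetical helper: True if at least one address is globally
    # reachable, i.e. not Private-Use, Loopback, Link-Local, etc.
    return any(is_global_addr(a) for a in addresses)


print(has_global_address(["127.0.0.1", "10.0.0.5"]))  # False: loopback + private-use
print(has_global_address(["10.0.0.5", "8.8.8.8"]))    # True: 8.8.8.8 is global
```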

README.md

Lines changed: 28 additions & 0 deletions
````diff
@@ -17,6 +17,7 @@ Key features:
   can display to end-users.
 * Checks deliverability (optional): Does the domain name resolve?
   (You can override the default DNS resolver to add query caching.)
+* Can be called asynchronously with `await`.
 * Supports internationalized domain names (like `@ツ.life`),
   internationalized local parts (like `ツ@example.com`),
   and optionally parses display names (e.g. `"My Name" <me@example.com>`).
@@ -83,6 +84,9 @@ This validates the address and gives you its normalized form. You should
 checking if an address is in your database. When using this in a login form,
 set `check_deliverability` to `False` to avoid unnecessary DNS queries.

+See below for examples of caching DNS queries and calling the library
+asynchronously with `await`.
+
 Usage
 -----

@@ -163,6 +167,30 @@ while True:
     validate_email(email, dns_resolver=resolver)
 ```

+### Asynchronous call
+
+The library has an alternative, asynchronous method named `validate_email_async` which must be called with `await`. This method uses an [asynchronous DNS resolver](https://dnspython.readthedocs.io/en/latest/async.html) so that multiple DNS-based deliverability checks can be performed in parallel.
+
+Here is how to use it. In this example, `import ... as` is used to alias the async method to the usual method name `validate_email`.
+
+```python
+from email_validator import validate_email_async as validate_email, \
+    EmailNotValidError, caching_async_resolver
+
+resolver = caching_async_resolver(timeout=10)
+
+email = "my+address@example.org"
+try:
+    emailinfo = await validate_email(email, dns_resolver=resolver)
+    email = emailinfo.normalized
+except EmailNotValidError as e:
+    print(str(e))
+```
+
+Note that to create a caching asynchronous resolver, use `caching_async_resolver`. As with the synchronous version, creating a resolver is optional.
+
+When processing batches of email addresses, I found that validating around 25 addresses at a time (e.g. with `asyncio.gather()`) gave the highest throughput. I tested on a residential Internet connection with valid addresses.
+
 ### Test addresses

 This library rejects email addresses that use the [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) `invalid`, `localhost`, `test`, and some others by raising `EmailSyntaxError`. This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to `localhost` (although they might be able to still do so via a malicious MX record). However, in your non-production test environments you may want to use `@test` or `@myname.test` email addresses. There are three ways you can allow this:
````
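The README's batching advice (about 25 addresses per `asyncio.gather()` round) can be sketched as follows. This is a minimal sketch under stated assumptions: `chunked`, `validate_in_chunks`, and `fake_validate` are hypothetical names, and `fake_validate` stands in for `validate_email_async` so the example runs without network access. With the real library you would pass a coroutine function that awaits `validate_email_async(email, dns_resolver=resolver)` and catch `EmailNotValidError` instead of `ValueError`.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Iterable, List, Optional, Tuple

CHUNK_SIZE = 25  # the README reports ~25 per chunk performed best in the author's tests


def chunked(iterable: Iterable, size: int):
    # Yield successive lists of at most `size` items.
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


async def validate_in_chunks(
    emails: Iterable[str],
    check: Callable[[str], Awaitable[None]],
    size: int = CHUNK_SIZE,
) -> List[Tuple[str, str]]:
    # Run `check` concurrently within a chunk and wait for the whole chunk
    # before starting the next. Returns (email, error message) pairs.
    failures: List[Tuple[str, str]] = []

    async def one(email: str) -> Optional[Tuple[str, str]]:
        try:
            await check(email)
            return None
        except ValueError as e:  # with the real library, catch EmailNotValidError
            return (email, str(e))

    for chunk in chunked(emails, size):
        results = await asyncio.gather(*(one(e) for e in chunk))
        failures.extend(r for r in results if r is not None)
    return failures


async def fake_validate(email: str) -> None:
    # Hypothetical stand-in for a DNS-backed check.
    await asyncio.sleep(0)
    if "@" not in email:
        raise ValueError("missing @-sign")


emails = [f"user{i}@example.org" for i in range(60)] + ["no-at-sign"]
print(asyncio.run(validate_in_chunks(emails, fake_validate)))
```

Waiting for each chunk to finish before starting the next caps the number of in-flight DNS queries, which is the trade-off the README's measurement is tuning.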

email_validator/__init__.py

Lines changed: 11 additions & 3 deletions
```diff
@@ -3,13 +3,14 @@
 # Export the main method, helper methods, and the public data types.
 from .exceptions_types import ValidatedEmail, EmailNotValidError, \
                               EmailSyntaxError, EmailUndeliverableError
-from .validate_email import validate_email
+from .validate_email import validate_email_sync as validate_email, validate_email_async
 from .version import __version__

-__all__ = ["validate_email",
+__all__ = ["validate_email", "validate_email_async",
            "ValidatedEmail", "EmailNotValidError",
            "EmailSyntaxError", "EmailUndeliverableError",
-           "caching_resolver", "__version__"]
+           "caching_resolver", "caching_async_resolver",
+           "__version__"]

 if TYPE_CHECKING:
     from .deliverability import caching_resolver
@@ -21,6 +22,13 @@ def caching_resolver(*args, **kwargs):
     return caching_resolver(*args, **kwargs)


+def caching_async_resolver(*args, **kwargs):
+    # Lazy load `deliverability` as it is slow to import (due to dns.resolver)
+    from .deliverability import caching_async_resolver
+
+    return caching_async_resolver(*args, **kwargs)
+
+
 # These global attributes are a part of the library's API and can be
 # changed by library users.

```
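The new `caching_async_resolver` wrapper in `__init__.py` defers importing `deliverability` (and therefore `dns.resolver`) until the function is first called. The same lazy-import pattern can be sketched generically with `importlib`. `lazy_function` is a hypothetical helper, not part of the library, and the standard-library `json.dumps` stands in for the slow-to-import target.

```python
import importlib
from typing import Any, Callable


def lazy_function(module_name: str, attr: str) -> Callable[..., Any]:
    # Return a wrapper that imports `module_name` only when first called,
    # then forwards all arguments to `module_name.attr`.
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        # After the first call the module is cached in sys.modules,
        # so later calls pay only a dictionary lookup.
        module = importlib.import_module(module_name)
        return getattr(module, attr)(*args, **kwargs)

    wrapper.__name__ = attr
    return wrapper


dumps = lazy_function("json", "dumps")
print(dumps({"b": 1, "a": 2}, sort_keys=True))  # {"a": 2, "b": 1}
```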
email_validator/__main__.py

Lines changed: 78 additions & 26 deletions
```diff
@@ -5,26 +5,88 @@
 # python -m email_validator test@example.org
 # python -m email_validator < LIST_OF_ADDRESSES.TXT
 #
-# Provide email addresses to validate either as a command-line argument
-# or in STDIN separated by newlines. Validation errors will be printed for
-# invalid email addresses. When passing an email address on the command
-# line, if the email address is valid, information about it will be printed.
-# When using STDIN, no output will be given for valid email addresses.
+# Provide email addresses to validate either as a single command-line argument
+# or on STDIN separated by newlines.
+#
+# When passing an email address on the command line, if the email address
+# is valid, information about it will be printed to STDOUT. If the email
+# address is invalid, an error message will be printed to STDOUT and
+# the exit code will be set to 1.
+#
+# When passing email addresses on STDIN, validation errors will be printed
+# for invalid email addresses. No output is given for valid email addresses.
+# Validation errors are preceded by the email address that failed and a tab
+# character. It is the user's responsibility to ensure email addresses
+# do not contain tab or newline characters.
 #
 # Keyword arguments to validate_email can be set in environment variables
 # of the same name but uppercase (see below).

+import itertools
 import json
 import os
 import sys
-from typing import Any, Dict, Optional
+from typing import Any, Dict

-from .validate_email import validate_email, _Resolver
-from .deliverability import caching_resolver
+from .deliverability import caching_async_resolver
 from .exceptions_types import EmailNotValidError


-def main(dns_resolver: Optional[_Resolver] = None) -> None:
+def main_command_line(email_address, options, dns_resolver):
+    # Validate the email address passed on the command line.
+
+    from . import validate_email
+
+    try:
+        result = validate_email(email_address, dns_resolver=dns_resolver, **options)
+        print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False))
+        return True
+    except EmailNotValidError as e:
+        print(e)
+        return False
+
+
+async def main_stdin(options, dns_resolver):
+    # Validate the email addresses passed line-by-line on STDIN.
+    # Chunk the addresses and call the async version of validate_email
+    # for all the addresses in the chunk, and wait for the chunk
+    # to complete.
+
+    import asyncio
+
+    from . import validate_email_async as validate_email
+
+    dns_resolver = dns_resolver or caching_async_resolver()
+
+    # https://stackoverflow.com/a/312467
+    def split_seq(iterable, size):
+        it = iter(iterable)
+        item = list(itertools.islice(it, size))
+        while item:
+            yield item
+            item = list(itertools.islice(it, size))
+
+    CHUNK_SIZE = 25
+
+    async def process_line(line):
+        email = line.strip()
+        try:
+            await validate_email(email, dns_resolver=dns_resolver, **options)
+            # If the email was valid, do nothing.
+            return None
+        except EmailNotValidError as e:
+            return (email, e)
+
+    chunks = split_seq(sys.stdin, CHUNK_SIZE)
+    for chunk in chunks:
+        awaitables = [process_line(line) for line in chunk]
+        errors = await asyncio.gather(*awaitables)
+        for error in errors:
+            if error is not None:
+                print(*error, sep='\t')
+
+
+def main(dns_resolver=None):
     # The dns_resolver argument is for tests.

     # Set options from environment variables.
@@ -37,24 +99,14 @@ def main(dns_resolver: Optional[_Resolver] = None) -> None:
         if varname in os.environ:
             options[varname.lower()] = float(os.environ[varname])

-    if len(sys.argv) == 1:
-        # Validate the email addresses passed line-by-line on STDIN.
-        dns_resolver = dns_resolver or caching_resolver()
-        for line in sys.stdin:
-            email = line.strip()
-            try:
-                validate_email(email, dns_resolver=dns_resolver, **options)
-            except EmailNotValidError as e:
-                print(f"{email} {e}")
+    if len(sys.argv) == 2:
+        return main_command_line(sys.argv[1], options, dns_resolver)
     else:
-        # Validate the email address passed on the command line.
-        email = sys.argv[1]
-        try:
-            result = validate_email(email, dns_resolver=dns_resolver, **options)
-            print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False))
-        except EmailNotValidError as e:
-            print(e)
+        import asyncio
+        asyncio.run(main_stdin(options, dns_resolver))
+        return True


 if __name__ == "__main__":
-    main()
+    if not main():
+        sys.exit(1)
```
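In STDIN mode the new `__main__.py` prints one line per invalid address in the form `address<TAB>error` (via `print(*error, sep='\t')`) and prints nothing for valid addresses. A downstream script could consume that report as sketched below. `parse_stdin_report` is a hypothetical helper, and the sample text is made up for illustration rather than captured from the tool.

```python
from typing import Dict


def parse_stdin_report(output: str) -> Dict[str, str]:
    # Each non-empty line is "<address>\t<error message>"; valid addresses
    # produce no line at all. Split on the first tab only: the CLI leaves it
    # to the user to keep tabs out of addresses, but a message might contain one.
    failures: Dict[str, str] = {}
    for line in output.splitlines():
        if not line:
            continue
        address, _, message = line.partition("\t")
        failures[address] = message
    return failures


# Made-up sample output for two invalid addresses.
sample = "bad-address\tmessage one\nother@bad\tmessage two\n"
print(parse_stdin_report(sample))
```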

email_validator/deliverability.py

Lines changed: 53 additions & 17 deletions
```diff
@@ -5,6 +5,7 @@
 from .exceptions_types import EmailUndeliverableError

 import dns.resolver
+import dns.asyncresolver
 import dns.exception


@@ -25,30 +26,73 @@ def caching_resolver(*, timeout: Optional[int] = None, cache: Any = None, dns_re
 }, total=False)


-def validate_email_deliverability(domain: str, domain_i18n: str, timeout: Optional[int] = None, dns_resolver: Optional[dns.resolver.Resolver] = None) -> DeliverabilityInfo:
+def caching_async_resolver(*, timeout: Optional[int] = None, cache=None, dns_resolver=None):
+    if timeout is None:
+        from . import DEFAULT_TIMEOUT
+        timeout = DEFAULT_TIMEOUT
+    resolver = dns_resolver or dns.asyncresolver.Resolver()
+    resolver.cache = cache or dns.resolver.LRUCache()  # type: ignore
+    resolver.lifetime = timeout  # type: ignore # timeout, in seconds
+    return resolver
+
+
+async def validate_email_deliverability(
+    domain: str,
+    domain_i18n: str,
+    timeout: Optional[int] = None,
+    dns_resolver: Optional[dns.resolver.Resolver] = None,
+    async_loop: Optional[bool] = None
+) -> DeliverabilityInfo:
     # Check that the domain resolves to an MX record. If there is no MX record,
     # try an A or AAAA record which is a deprecated fallback for deliverability.
     # Raises an EmailUndeliverableError on failure. On success, returns a dict
     # with deliverability information.

+    # When async_loop is None, the caller drives the coroutine manually to get
+    # the result synchronously, and consequently this call must not yield execution.
+    # It can use 'await' so long as the callee does not yield execution either.
+    # Otherwise, if async_loop is not None, there is no restriction on 'await' calls.
+
     # If no dns.resolver.Resolver was given, get dnspython's default resolver.
-    # Override the default resolver's timeout. This may affect other uses of
-    # dnspython in this process.
+    # Use the asyncresolver if async_loop is not None.
     if dns_resolver is None:
+        if not async_loop:
+            dns_resolver = dns.resolver.get_default_resolver()
+        else:
+            dns_resolver = dns.asyncresolver.get_default_resolver()
+
+        # Override the default resolver's timeout. This may affect other uses of
+        # dnspython in this process.
         from . import DEFAULT_TIMEOUT
         if timeout is None:
             timeout = DEFAULT_TIMEOUT
-        dns_resolver = dns.resolver.get_default_resolver()
         dns_resolver.lifetime = timeout
+
     elif timeout is not None:
         raise ValueError("It's not valid to pass both timeout and dns_resolver.")

-    deliverability_info: DeliverabilityInfo = {}
+    # Define a resolve function that works with a regular or
+    # asynchronous dns.resolver.Resolver instance.
+    async def resolve(qname, rtype):
+        # When called non-asynchronously, expect a regular
+        # resolver that returns synchronously. Or if async_loop
+        # is not None but the caller didn't pass a
+        # dns.asyncresolver.Resolver, call it synchronously.
+        if not async_loop or not isinstance(dns_resolver, dns.asyncresolver.Resolver):
+            return dns_resolver.resolve(qname, rtype)
+
+        # When async_loop is not None and if given a
+        # dns.asyncresolver.Resolver, call it asynchronously.
+        else:
+            return await dns_resolver.resolve(qname, rtype)
+
+    # Collect successful deliverability information here.
+    deliverability_info = DeliverabilityInfo()

     try:
         try:
             # Try resolving for MX records (RFC 5321 Section 5).
-            response = dns_resolver.resolve(domain, "MX")
+            response = await resolve(domain, "MX")

             # For reporting, put them in priority order and remove the trailing dot in the qnames.
             mtas = sorted([(r.preference, str(r.exchange).rstrip('.')) for r in response])
@@ -84,11 +128,7 @@ def is_global_addr(address: Any) -> bool:
                 return ipaddr.is_global

             try:
-                response = dns_resolver.resolve(domain, "A")
-
-                if not any(is_global_addr(r.address) for r in response):
-                    raise dns.resolver.NoAnswer  # fall back to AAAA
-
+                response = await resolve(domain, "A")
                 deliverability_info["mx"] = [(0, domain)]
                 deliverability_info["mx_fallback_type"] = "A"

@@ -97,11 +137,7 @@ def is_global_addr(address: Any) -> bool:
                 # If there was no A record, fall back to an AAAA record.
                 # (It's unclear if SMTP servers actually do this.)
                 try:
-                    response = dns_resolver.resolve(domain, "AAAA")
-
-                    if not any(is_global_addr(r.address) for r in response):
-                        raise dns.resolver.NoAnswer
-
+                    response = await resolve(domain, "AAAA")
                     deliverability_info["mx"] = [(0, domain)]
                     deliverability_info["mx_fallback_type"] = "AAAA"

@@ -118,7 +154,7 @@ def is_global_addr(address: Any) -> bool:
             # absence of an MX record, this is probably a good sign that the
             # domain is not used for email.
             try:
-                response = dns_resolver.resolve(domain, "TXT")
+                response = await resolve(domain, "TXT")
                 for rec in response:
                     value = b"".join(rec.strings)
                     if value.startswith(b"v=spf1 "):
```
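The new comment in `validate_email_deliverability` says that when `async_loop` is None, the caller drives the coroutine by hand, so the coroutine must finish without ever yielding. The sketch below illustrates what that means; it is not the library's code. `run_without_loop` is a hypothetical helper that sends `None` into the coroutine once and expects it to finish in that single step. `SyncResolver`, `AsyncResolver`, and `resolve` are made-up stand-ins for the dispatch between a regular and an asynchronous dnspython resolver: the real code checks `isinstance(dns_resolver, dns.asyncresolver.Resolver)`, whereas this sketch swaps in `inspect.isawaitable` so it runs without dnspython.

```python
import asyncio
import inspect
from typing import Any, Coroutine


def run_without_loop(coro: Coroutine[Any, Any, Any]) -> Any:
    # Drive a coroutine one step with no event loop. If it finishes in that
    # step, StopIteration carries its return value.
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value
    # The coroutine suspended (e.g. awaited real async I/O), so it cannot
    # complete without an event loop: clean it up and fail loudly.
    coro.close()
    raise RuntimeError("coroutine yielded; it needs an event loop")


class SyncResolver:
    # Stand-in for dns.resolver.Resolver: resolve() returns immediately.
    def resolve(self, qname: str, rtype: str) -> str:
        return f"{rtype} answer for {qname}"


class AsyncResolver:
    # Stand-in for dns.asyncresolver.Resolver: resolve() must be awaited.
    async def resolve(self, qname: str, rtype: str) -> str:
        await asyncio.sleep(0)  # suspends once, like real network I/O would
        return f"{rtype} answer for {qname}"


async def resolve(resolver: Any, qname: str, rtype: str) -> str:
    result = resolver.resolve(qname, rtype)
    # Await only when the resolver handed back an awaitable.
    return await result if inspect.isawaitable(result) else result


print(run_without_loop(resolve(SyncResolver(), "example.org", "MX")))  # no event loop needed
print(asyncio.run(resolve(AsyncResolver(), "example.org", "MX")))      # needs an event loop
```

Passing an asynchronous resolver to `run_without_loop` raises `RuntimeError`, which is why the synchronous path must only ever see a resolver that returns without suspending.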
