Simulate Bot Traffic #162
base: master
max_active_runs=1,
catchup=False,
)
def PYCONTW_ETL_BOT_v1():
probably a good idea to make the dag id more readable
def PYCONTW_ETL_BOT_v1():

@task
def GET_TOP_WEBSITES() -> list[str]:
probably a good idea to start using lower case
- def GET_TOP_WEBSITES() -> list[str]:
+ def get_top_websites() -> list[str]:
"""
token = Variable.get("CLOUDFLARE_RADAR_API_TOKEN")

url = "https://api.cloudflare.com/client/v4/radar/ranking/top"
I would suggest we use https://airflow.apache.org/docs/apache-airflow-providers-http/stable/operators.html instead
@dag(
default_args=DEFAULT_ARGS,
schedule="@hourly",
just want to make sure we really want to do it hourly
According to the bot policy, the minimum traffic should exceed 1,000 requests per day across multiple domains, so a higher frequency is required.
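The arithmetic behind that reply can be checked with a small sketch. It assumes one request per domain per run and 100 domains per run (the figure given in the PR description); the constant and function names are illustrative and not taken from the DAG.

```python
# Assumptions: one GET per domain per DAG run, and every run covers 100 domains.
RUNS_PER_DAY_HOURLY = 24       # schedule="@hourly"
RUNS_PER_DAY_DAILY = 1         # what a daily schedule would give
DOMAINS_PER_RUN = 100          # top 100 domains, per the PR description
MIN_REQUESTS_PER_DAY = 1_000   # threshold quoted from the bot policy


def requests_per_day(runs_per_day: int, domains_per_run: int) -> int:
    """Total requests sent in one day for a given schedule."""
    return runs_per_day * domains_per_run


hourly_total = requests_per_day(RUNS_PER_DAY_HOURLY, DOMAINS_PER_RUN)
daily_total = requests_per_day(RUNS_PER_DAY_DAILY, DOMAINS_PER_RUN)

print(hourly_total, hourly_total > MIN_REQUESTS_PER_DAY)  # 2400 True
print(daily_total, daily_total > MIN_REQUESTS_PER_DAY)    # 100 False
```

Under these assumptions a daily run falls well short of the threshold, while an hourly run clears it with some margin for failed requests.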
return domains

@task
def REQUEST_EACH_WEBSITE(domains: list[str]):
- def REQUEST_EACH_WEBSITE(domains: list[str]):
+ def ping_each_website(domains: list[str]) -> None:
looks like we're pinging these sites?
}
resp = requests.get(site_url, headers=headers, timeout=5, allow_redirects=True)
logger.info("GET %s -> %s", site_url, resp.status_code)
except Exception as exc:
- except Exception as exc:
+ except Exception as exc:
We probably should catch a narrower exception.
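One way to narrow the catch, sketched under the assumption that the task keeps calling `requests.get` as in the hunk above. `ping` is a hypothetical helper name and the default timeout simply mirrors the value in the diff; neither is the PR's actual code. `requests.RequestException` is the base class of the library's connection, timeout, redirect, and invalid-URL errors.

```python
import logging
from typing import Optional

import requests

logger = logging.getLogger(__name__)


def ping(site_url: str, timeout: float = 5.0) -> Optional[int]:
    """Hypothetical helper: GET one site; return its status code, or None on a request failure."""
    try:
        resp = requests.get(site_url, timeout=timeout, allow_redirects=True)
    except requests.RequestException as exc:
        # Covers ConnectionError, Timeout, TooManyRedirects, MissingSchema, and similar.
        # Unrelated bugs (TypeError, KeyError, ...) still propagate and fail the task.
        logger.warning("GET %s failed: %s", site_url, exc)
        return None
    logger.info("GET %s -> %s", site_url, resp.status_code)
    return resp.status_code
```

With this shape, a single unreachable domain is logged and skipped, while a genuine programming error surfaces in the Airflow task log instead of being swallowed.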
Types of changes
Description
To qualify as a Cloudflare Verified Bot, we need to simulate bot network traffic.
In this flow, we request the top 100 domains returned by the Cloudflare Radar API.
Follow these steps to create a Cloudflare API token: