Simulate Bot Traffic #162

Open
wants to merge 2 commits into
base: master

henry410213028
Collaborator

Types of changes

  • New feature

Description

To qualify as a Cloudflare Verified Bot, we need to simulate bot network traffic.

In this flow, we request each of the top 100 domains returned by the Cloudflare Radar API.

Follow these steps to create a Cloudflare API token:

  1. Create a Cloudflare account and log in.
  2. Visit https://dash.cloudflare.com/profile/api-tokens
  3. Click "Create Token", choose the "Read Cloudflare Radar data" template, and add your email to "Account Resources".
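
The flow described above (read the token, call the Radar ranking endpoint, collect domain names) could be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: it uses the standard library's `urllib` instead of `requests` so that it runs without extra dependencies, and the `limit` query parameter, the Bearer `Authorization` header, and the `result.top_0[].domain` response shape are assumptions that should be checked against the Radar API reference. The helper names `build_request` and `extract_domains` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the PR diff.
RADAR_TOP_URL = "https://api.cloudflare.com/client/v4/radar/ranking/top"


def build_request(token: str, limit: int = 100) -> urllib.request.Request:
    """Build the GET request for the top-domains ranking (assumed parameters)."""
    query = urllib.parse.urlencode({"limit": limit})
    return urllib.request.Request(
        f"{RADAR_TOP_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def extract_domains(payload: dict) -> list[str]:
    """Pull domain names out of the response body.

    Assumed shape: {"result": {"top_0": [{"domain": "..."}, ...]}}.
    Missing keys yield an empty list instead of raising.
    """
    entries = payload.get("result", {}).get("top_0", [])
    return [entry["domain"] for entry in entries if "domain" in entry]


def get_top_websites(token: str, limit: int = 100) -> list[str]:
    """Fetch the ranking and return the domain names (performs a network call)."""
    with urllib.request.urlopen(build_request(token, limit), timeout=10) as resp:
        return extract_domains(json.load(resp))
```

Keeping the parsing in `extract_domains` separate from the network call makes it possible to unit-test the response handling with a canned payload.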

max_active_runs=1,
catchup=False,
)
def PYCONTW_ETL_BOT_v1():
Member

probably a good idea to make the dag id more readable

def PYCONTW_ETL_BOT_v1():

@task
def GET_TOP_WEBSITES() -> list[str]:
Member

probably a good idea to start using lower case

Suggested change
def GET_TOP_WEBSITES() -> list[str]:
def get_top_websites() -> list[str]:

"""
token = Variable.get("CLOUDFLARE_RADAR_API_TOKEN")

url = "https://api.cloudflare.com/client/v4/radar/ranking/top"

@dag(
default_args=DEFAULT_ARGS,
schedule="@hourly",
Member

just want to make sure we really want to do it hourly

Collaborator Author

According to the bot policy, the minimum traffic should exceed 1,000 requests per day across multiple domains, so a higher frequency is required.
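
The arithmetic behind that answer can be checked with a small sketch. It assumes one request per domain per run and that every run covers all 100 domains; the function names are hypothetical and only illustrate the calculation.

```python
MIN_DAILY_REQUESTS = 1_000  # threshold quoted from the bot policy above
DOMAINS_PER_RUN = 100       # top 100 domains requested on each run


def requests_per_day(domains_per_run: int, runs_per_day: int) -> int:
    """Total requests issued per day, assuming one request per domain per run."""
    return domains_per_run * runs_per_day


def min_runs_to_exceed(threshold: int, domains_per_run: int) -> int:
    """Smallest number of daily runs whose total strictly exceeds the threshold."""
    return threshold // domains_per_run + 1


hourly_total = requests_per_day(DOMAINS_PER_RUN, 24)  # "@hourly" schedule
daily_total = requests_per_day(DOMAINS_PER_RUN, 1)    # a once-a-day schedule
needed_runs = min_runs_to_exceed(MIN_DAILY_REQUESTS, DOMAINS_PER_RUN)
```

Under these assumptions an hourly schedule issues 2,400 requests per day, a daily schedule only 100, and at least 11 runs per day are needed to exceed 1,000.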

return domains

@task
def REQUEST_EACH_WEBSITE(domains: list[str]):
Member

Suggested change
def REQUEST_EACH_WEBSITE(domains: list[str]):
def ping_each_website(domains: list[str]) -> None:

looks like we're pinging these sites?

}
resp = requests.get(site_url, headers=headers, timeout=5, allow_redirects=True)
logger.info("GET %s -> %s", site_url, resp.status_code)
except Exception as exc:
Member

Suggested change
except Exception as exc:
except Exception as exc:

We probably should catch a narrower exception.
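
One possible shape for a narrower handler is sketched below. It is not the code in this PR: it uses the standard library's `urllib` so that it runs without dependencies, and the `https://{domain}` URL scheme, the injectable `fetch` parameter, and the returned dictionary are assumptions added for illustration. With `requests`, as used in the diff, the analogous narrow class would be `requests.RequestException`, the base class for the exceptions that library raises.

```python
import logging
import socket
import urllib.error
import urllib.request
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Network-level failures that are expected and tolerated.
# Anything else (for example a programming error) should propagate.
NETWORK_ERRORS = (urllib.error.URLError, socket.timeout)


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # A 4xx/5xx response still means the request reached the site.
        return exc.code


def ping_each_website(
    domains: list[str],
    fetch: Callable[[str], int] = fetch_status,
) -> dict[str, Optional[int]]:
    """Request each domain once; record the status code, or None on network failure."""
    results: dict[str, Optional[int]] = {}
    for domain in domains:
        site_url = f"https://{domain}"  # assumed URL scheme
        try:
            status = fetch(site_url)
        except NETWORK_ERRORS as exc:
            logger.warning("GET %s failed: %s", site_url, exc)
            results[domain] = None
        else:
            logger.info("GET %s -> %s", site_url, status)
            results[domain] = status
    return results
```

Catching only network errors keeps one unreachable site from failing the whole task while still letting unexpected exceptions surface in the Airflow logs.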
