Stabilized v1.3.1 #105

Merged
merged 46 commits into from
Jun 14, 2025
46 commits
4ea4dd5
Update README.md
OSINT-TECHNOLOGIES May 14, 2025
04a61dd
Bumped version
OSINT-TECHNOLOGIES May 14, 2025
a22a12c
Added long query for demonstrating PS process in HTML report
OSINT-TECHNOLOGIES May 14, 2025
f64abac
Added support for PageSearch process listing in HTML report
OSINT-TECHNOLOGIES May 14, 2025
f9455ea
Added support for PageSearch process listing in HTML report
OSINT-TECHNOLOGIES May 14, 2025
9b478c1
Added support for PageSearch process listing
OSINT-TECHNOLOGIES May 14, 2025
df7b197
Updated default report template visual
OSINT-TECHNOLOGIES May 15, 2025
5eb4c95
Fixed error with UnboundVariable (ps_string) when PS is not activated
OSINT-TECHNOLOGIES May 15, 2025
f1813ad
Reworked Dorking function, moved from MechanicalSoup to Selenium
OSINT-TECHNOLOGIES May 16, 2025
d8b4191
Added new field for Dorking browser's path
OSINT-TECHNOLOGIES May 16, 2025
ba7ee8e
Delete service/pdf_report_templates/paragraph_report_template.html
OSINT-TECHNOLOGIES May 20, 2025
f0f72c4
Delete service/pdf_report_templates/compromise_report_template.html
OSINT-TECHNOLOGIES May 20, 2025
c214d49
Delete service/pdf_report_templates/monospaced_report_template.html
OSINT-TECHNOLOGIES May 20, 2025
a008fd9
Added config values support for HTML template selection
OSINT-TECHNOLOGIES May 20, 2025
babe4ae
Added values for HTML report style selection
OSINT-TECHNOLOGIES May 20, 2025
a39dfb0
Reworked Default report template into Legacy
OSINT-TECHNOLOGIES May 20, 2025
57d4198
Delete service/pdf_report_templates/default_report_temp.html
OSINT-TECHNOLOGIES May 20, 2025
d06e55f
Added Modern version (default now)
OSINT-TECHNOLOGIES May 20, 2025
ec63c81
Added "Print" button (using printer and PDF generation)
OSINT-TECHNOLOGIES May 22, 2025
6e4a525
Update README.md
OSINT-TECHNOLOGIES May 22, 2025
3eaa6e7
Added extended statistics variables for HTML reporting
OSINT-TECHNOLOGIES May 28, 2025
bb2358b
Added extended statistics variables for HTML reporting
OSINT-TECHNOLOGIES May 28, 2025
5a9d3b5
Added extended statistics variables for HTML reporting
OSINT-TECHNOLOGIES May 28, 2025
4f4bb73
Improved statistics paragraph, added statistics diagram
OSINT-TECHNOLOGIES May 29, 2025
e7c8a30
Added browser mode config value for Dorking
OSINT-TECHNOLOGIES May 31, 2025
565e237
Added browser mode value support
OSINT-TECHNOLOGIES May 31, 2025
11cfb03
Improved visuals, added "Website technical files" paragraph
OSINT-TECHNOLOGIES Jun 1, 2025
415be8b
Added sitemap/robots files content transferring in HTML report
OSINT-TECHNOLOGIES Jun 1, 2025
c70bfde
Improved config parameters handling
OSINT-TECHNOLOGIES Jun 1, 2025
09fbf2b
Added support for delete_txt_files config parameter
OSINT-TECHNOLOGIES Jun 1, 2025
0f7b457
Added config parameter to handle sitemap/robots file deletion when c…
OSINT-TECHNOLOGIES Jun 1, 2025
6a7f640
Added report comparison function, improved visuals
OSINT-TECHNOLOGIES Jun 1, 2025
97bb1c4
Cleaned-up imports
OSINT-TECHNOLOGIES Jun 1, 2025
ce3d598
Cleaned-up imports
OSINT-TECHNOLOGIES Jun 1, 2025
da94fe7
Bumped version & cleaned imports
OSINT-TECHNOLOGIES Jun 1, 2025
8a481ad
Update poetry.lock
OSINT-TECHNOLOGIES Jun 1, 2025
02dfc59
Cleaned-up imports
OSINT-TECHNOLOGIES Jun 1, 2025
8dddc96
Cleaned-up imports
OSINT-TECHNOLOGIES Jun 1, 2025
a7ea4b8
Added "Show/Hide ToC" button
OSINT-TECHNOLOGIES Jun 12, 2025
66298bb
Added context help, progress bar, minimap and separate columns for li…
OSINT-TECHNOLOGIES Jun 12, 2025
d4a3e9a
Removed empty space in statistics part; removed some placeholders
OSINT-TECHNOLOGIES Jun 13, 2025
30b86ec
Removed placeholders
OSINT-TECHNOLOGIES Jun 13, 2025
9cd1a72
Added undetected_chromedriver requirement
OSINT-TECHNOLOGIES Jun 13, 2025
8f6257c
Update poetry.lock
OSINT-TECHNOLOGIES Jun 13, 2025
25c0b31
Added undetected_chromedriver requirement
OSINT-TECHNOLOGIES Jun 13, 2025
0994f8c
Merge branch 'main' into rolling
OSINT-TECHNOLOGIES Jun 14, 2025
6 changes: 3 additions & 3 deletions README.md
@@ -170,9 +170,9 @@ If you have problems with starting installer.sh, you should try to use `dos2unix`

# Tasks to complete before new release
- [ ] CLI rework (more fancy and user-friendly)
- [ ] Report storage database rework
- [ ] HTML report rework
- [ ] Report storage database rework (more information to store)
- [ ] HTML report rework (modern style and look; functionality expansion)

# DPULSE mentions in social medias

## Honorable mentions:
25 changes: 10 additions & 15 deletions datagather_modules/crawl_processor.py
@@ -1,22 +1,17 @@
import sys
import socket
import re
import urllib
from collections import defaultdict
from urllib.parse import urlparse
import whois
import requests
from bs4 import BeautifulSoup
from colorama import Fore, Style

sys.path.append('service')
from logs_processing import logging

try:
import socket
import whois
import re
import requests
import urllib.parse
from colorama import Fore, Style
from urllib.parse import urlparse
from collections import defaultdict
from bs4 import BeautifulSoup
import random
except ImportError as e:
print(Fore.RED + "Import error appeared. Reason: {}".format(e) + Style.RESET_ALL)
sys.exit()

def ip_gather(short_domain):
ip_address = socket.gethostbyname(short_domain)
return ip_address
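The hunk above flattens crawl_processor.py's guarded try/except imports into plain top-level imports. For readers unfamiliar with the idiom being removed, here is a minimal sketch of it; the `guarded_imports` helper is purely illustrative and is not part of DPULSE:

```python
# Sketch of the guarded-import pattern this PR removes: import each module,
# collecting failures instead of crashing on the first missing dependency.
def guarded_imports(module_names):
    """Return (loaded, failed) lists; failed entries carry the ImportError text."""
    loaded, failed = [], []
    for name in module_names:
        try:
            __import__(name)
            loaded.append(name)
        except ImportError as e:
            failed.append((name, str(e)))
    return loaded, failed

loaded, failed = guarded_imports(["socket", "re", "definitely_not_installed_xyz"])
```

The PR's flat-import style is simpler and lets failures surface as normal tracebacks instead of a generic printed message.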
46 changes: 26 additions & 20 deletions datagather_modules/data_assembler.py
@@ -1,34 +1,24 @@
import sys
sys.path.append('service')
sys.path.append('pagesearch')
sys.path.append('dorking')
sys.path.append('snapshotting')
from datetime import datetime
import os
from colorama import Fore, Style

sys.path.extend(['service', 'pagesearch', 'dorking', 'snapshotting'])

from logs_processing import logging
from config_processing import read_config
from db_creator import get_dorking_query
import crawl_processor as cp
import dorking_handler as dp
import networking_processor as np
from pagesearch_parsers import subdomains_parser
from logs_processing import logging
from api_virustotal import api_virustotal_check
from api_securitytrails import api_securitytrails_check
from api_hudsonrock import api_hudsonrock_check
from db_creator import get_dorking_query
from screen_snapshotting import take_screenshot
from config_processing import read_config
from html_snapshotting import save_page_as_html
from archive_snapshotting import download_snapshot

try:
import requests
from datetime import datetime
import os
from colorama import Fore, Style
import sqlite3
import configparser
except ImportError as e:
print(Fore.RED + "Import error appeared. Reason: {}".format(e) + Style.RESET_ALL)
sys.exit()

def establishing_dork_db_connection(dorking_flag):
dorking_db_paths = {
'basic': 'dorking//basic_dorking.db',
@@ -118,6 +108,10 @@ def data_gathering(self, short_domain, url, report_file_type, pagesearch_flag, k
for key in common_socials:
common_socials[key] = list(set(common_socials[key]))
total_socials = sum(len(values) for values in common_socials.values())
total_ports = len(ports)
total_ips = len(subdomain_ip) + 1
total_vulns = len(vulns)

print(Fore.LIGHTMAGENTA_EX + "\n[BASIC SCAN END]\n" + Style.RESET_ALL)
if report_file_type == 'xlsx':
if pagesearch_flag.lower() == 'y':
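The new `total_ports` / `total_ips` / `total_vulns` counters added in the hunk above are plain `len()` aggregations over data the basic scan already collects. A self-contained sketch with invented sample values (none of this is real scan output):

```python
# Illustrative recreation of the new extended-statistics counters.
ports = [80, 443, 8080]
subdomain_ip = {"blog.example.com": "203.0.113.5", "mail.example.com": "203.0.113.6"}
vulns = ["CVE-2024-0001"]
common_socials = {"twitter": ["a", "b"], "github": ["c"]}

total_ports = len(ports)
total_ips = len(subdomain_ip) + 1   # +1 accounts for the main domain's own IP
total_vulns = len(vulns)
total_socials = sum(len(values) for values in common_socials.values())
```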
@@ -206,7 +200,17 @@ def data_gathering(self, short_domain, url, report_file_type, pagesearch_flag, k
if subdomains[0] != 'No subdomains were found':
to_search_array = [subdomains, social_medias, sd_socials]
print(Fore.LIGHTMAGENTA_EX + "\n[EXTENDED SCAN START: PAGESEARCH]\n" + Style.RESET_ALL)
ps_emails_return, accessible_subdomains, emails_amount, files_counter, cookies_counter, api_keys_counter, website_elements_counter, exposed_passwords_counter, keywords_messages_list = subdomains_parser(to_search_array[0], report_folder, keywords, keywords_flag)
(
ps_emails_return,
accessible_subdomains,
emails_amount,
files_counter,
cookies_counter,
api_keys_counter,
website_elements_counter,
exposed_passwords_counter,
keywords_messages_list
), ps_string = subdomains_parser(to_search_array[0], report_folder, keywords, keywords_flag)
total_links_counter = accessed_links_counter = "No results because PageSearch does not gather these categories"
if len(keywords_messages_list) == 0:
keywords_messages_list = ['No keywords were found']
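The hunk above changes the `subdomains_parser` call: it now returns a tuple of counters plus a separate `ps_string` for the HTML report's process listing, unpacked in one statement. A sketch of that return shape; `stub_parser` is a stand-in for the real parser and its values are invented:

```python
# Demonstrates unpacking a (counters_tuple, extra_value) return in one statement,
# mirroring the new subdomains_parser call shape.
def stub_parser():
    counters = ("emails", 3, 5, 2, 1, 4, 7, 0, ["kw"])
    ps_string = "<process listing for the HTML report>"
    return counters, ps_string

(ps_emails_return, accessible_subdomains, emails_amount, files_counter,
 cookies_counter, api_keys_counter, website_elements_counter,
 exposed_passwords_counter, keywords_messages_list), ps_string = stub_parser()
```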
@@ -215,11 +219,13 @@
print(Fore.RED + "Cant start PageSearch because no subdomains were detected")
ps_emails_return = ""
accessible_subdomains = files_counter = cookies_counter = api_keys_counter = website_elements_counter = exposed_passwords_counter = total_links_counter = accessed_links_counter = emails_amount = 'No results because no subdomains were found'
ps_string = 'No PageSearch listing provided because no subdomains were found'
keywords_messages_list = ['No data was gathered because no subdomains were found']
pass
elif pagesearch_flag.lower() == 'n':
accessible_subdomains = files_counter = cookies_counter = api_keys_counter = website_elements_counter = exposed_passwords_counter = total_links_counter = accessed_links_counter = emails_amount = keywords_messages_list = "No results because user did not selected PageSearch for this scan"
ps_emails_return = ""
ps_string = 'No PageSearch listing provided because user did not selected PageSearch mode for this scan'
pass

if dorking_flag == 'n':
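Both fallback branches above rely on Python's chained assignment to set a long run of counters to a single sentinel string in one statement. A minimal sketch of the idiom (the sentinel text here is illustrative, lightly reworded from the code):

```python
# Chained assignment binds every name on the left to the same object,
# which is how the fallback branches mark all counters at once.
sentinel = "No results because PageSearch was not selected for this scan"
files_counter = cookies_counter = api_keys_counter = emails_amount = sentinel
```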
@@ -282,7 +288,7 @@ def data_gathering(self, short_domain, url, report_file_type, pagesearch_flag, k
hostnames, cpes, tags, vulns, common_socials, total_socials, ps_emails_return,
accessible_subdomains, emails_amount, files_counter, cookies_counter, api_keys_counter,
website_elements_counter, exposed_passwords_counter, total_links_counter, accessed_links_counter, keywords_messages_list, dorking_status, dorking_file_path,
virustotal_output, securitytrails_output, hudsonrock_output]
virustotal_output, securitytrails_output, hudsonrock_output, ps_string, total_ports, total_ips, total_vulns]

report_info_array = [casename, db_casename, db_creation_date, report_folder, ctime, report_file_type, report_ctime, api_scan_db, used_api_flag]
logging.info(f'### THIS LOG PART FOR {casename} CASE, TIME: {ctime} ENDS HERE')
134 changes: 91 additions & 43 deletions dorking/dorking_handler.py
@@ -1,22 +1,18 @@
import sys
import random
import time
import os
import logging
from colorama import Fore, Style
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

sys.path.append('service')
from config_processing import read_config
from logs_processing import logging
from ua_rotator import user_agent_rotator
from proxies_rotator import proxies_rotator

try:
import requests.exceptions
from colorama import Fore, Style
import mechanicalsoup
import re
import requests
import sqlite3
import time
import os
except ImportError as e:
print(Fore.RED + "Import error appeared. Reason: {}".format(e) + Style.RESET_ALL)
sys.exit()
from config_processing import read_config

def proxy_transfer():
proxy_flag, proxies_list = proxies_rotator.get_proxies()
@@ -27,44 +23,96 @@ def proxy_transfer():
working_proxies = proxies_rotator.check_proxies(proxies_list)
return proxy_flag, working_proxies

def solid_google_dorking(query, dorking_delay, delay_step, proxy_flag, proxies_list, pages=100):
def solid_google_dorking(query, proxy_flag, proxies_list, pages=1):
result_query = []
request_count = 0
try:
browser = mechanicalsoup.StatefulBrowser()
if proxy_flag == 1:
browser.session.proxies = proxies_rotator.get_random_proxy(proxies_list)
else:
config_values = read_config()
options = uc.ChromeOptions()
options.binary_location = r"{}".format(config_values['dorking_browser'])
dorking_browser_mode = config_values['dorking_browser_mode']
if dorking_browser_mode.lower() == 'headless':
options.add_argument("--headless=new")
elif dorking_browser_mode.lower() == 'nonheadless':
pass
browser.open("https://www.google.com/")
browser.select_form('form[action="/search"]')
browser["q"] = str(query)
browser.submit_selected(btnName="btnG")
result_query = []
request_count = 0
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-blink-features=AutomationControlled")
options.add_argument("--disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument(f"user-agent={user_agent_rotator.get_random_user_agent()}")
if proxy_flag == 1:
proxy = proxies_rotator.get_random_proxy(proxies_list)
options.add_argument(f'--proxy-server={proxy["http"]}')
driver = uc.Chrome(options=options)
for page in range(pages):
try:
for link in browser.links():
target = link.attrs['href']
if (target.startswith('/url?') and not target.startswith("/url?q=http://webcache.googleusercontent.com")):
target = re.sub(r"^/url\?q=([^&]*)&.*", r"\1", target)
result_query.append(target)
driver.get("https://www.google.com")
time.sleep(random.uniform(2, 4))
try:
accepted = False
try:
accept_btn = driver.find_element(By.XPATH, '//button[contains(text(), "Принять все") or contains(text(), "Accept all")]')
driver.execute_script("arguments[0].click();", accept_btn)
print(Fore.GREEN + 'Pressed "Accept all" button!' + Style.RESET_ALL)
accepted = True
time.sleep(random.uniform(2, 3))
except:
pass
if not accepted:
iframes = driver.find_elements(By.TAG_NAME, "iframe")
for iframe in iframes:
driver.switch_to.frame(iframe)
try:
accept_btn = driver.find_element(By.XPATH, '//button[contains(text(), "Принять все") or contains(text(), "Accept all")]')
driver.execute_script("arguments[0].click();", accept_btn)
print(Fore.GREEN + 'Pressed "Accept all" button!' + Style.RESET_ALL)
accepted = True
driver.switch_to.default_content()
time.sleep(random.uniform(2, 3))
break
except:
driver.switch_to.default_content()
continue
driver.switch_to.default_content()
if not accepted:
print(Fore.GREEN + "Google TOS button was not found. Seems good..." + Style.RESET_ALL)
except Exception:
print(Fore.RED + f'Error with pressing "Accept all" button. Closing...' + Style.RESET_ALL)
driver.save_screenshot("consent_error.png")
driver.switch_to.default_content()
search_box = driver.find_element(By.NAME, "q")
for char in query:
search_box.send_keys(char)
time.sleep(random.uniform(0.05, 0.2))
time.sleep(random.uniform(0.5, 1.2))
search_box.send_keys(Keys.RETURN)
time.sleep(random.uniform(2.5, 4))
links = driver.find_elements(By.CSS_SELECTOR, 'a')
for link in links:
href = link.get_attribute('href')
if href and href.startswith('http') and 'google.' not in href and 'webcache.googleusercontent.com' not in href:
result_query.append(href)
request_count += 1
if request_count % delay_step == 0:
time.sleep(dorking_delay)
browser.session.headers['User-Agent'] = user_agent_rotator.get_random_user_agent()
browser.follow_link(nr=page + 1)
except mechanicalsoup.LinkNotFoundError:
break
try:
next_button = driver.find_element(By.ID, 'pnnext')
next_button.click()
time.sleep(random.uniform(2, 3))
except:
break
except Exception as e:
logging.error(f'DORKING PROCESSING: ERROR. REASON: {e}')
del result_query[-2:]
logging.error(f'DORKING PROCESSING (SELENIUM): ERROR. REASON: {e}')
continue
driver.quit()
if len(result_query) >= 2:
del result_query[-2:]
return result_query
except requests.exceptions.ConnectionError as e:
print(Fore.RED + "Error while establishing connection with domain. No results will appear. See journal for details" + Style.RESET_ALL)
logging.error(f'DORKING PROCESSING: ERROR. REASON: {e}')
except Exception as e:
logging.error(f'DORKING PROCESSING: ERROR. REASON: {e}')
print(Fore.RED + "Error while running Selenium dorking. See journal for details." + Style.RESET_ALL)
return []
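The Selenium rewrite above replaces MechanicalSoup's `/url?q=` parsing with a plain attribute filter over every `<a>` element. Extracted as a standalone helper (hypothetical; the PR keeps this logic inline), the filter looks like this:

```python
def filter_result_links(hrefs):
    """Keep only external http(s) links, mirroring the inline Selenium filter:
    drop empty hrefs, relative paths, Google-owned hosts, and webcache links."""
    return [h for h in hrefs
            if h and h.startswith('http')
            and 'google.' not in h
            and 'webcache.googleusercontent.com' not in h]

links = [None, '/search?q=x', 'https://www.google.com/maps',
         'http://webcache.googleusercontent.com/x', 'https://example.com/page']
```

Note the substring checks are deliberately broad: any URL containing `google.` is treated as a navigation link rather than a result.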

def save_results_to_txt(folderpath, table, queries, pages=10):
def save_results_to_txt(folderpath, table, queries, pages=1):
try:
config_values = read_config()
dorking_delay = int(config_values['dorking_delay (secs)'])
@@ -80,7 +128,7 @@ def save_results_to_txt(folderpath, table, queries, pages=1):
for i, query in enumerate(queries, start=1):
f.write(f"QUERY #{i}: {query}\n")
try:
results = solid_google_dorking(query, dorking_delay, delay_step, proxy_flag, proxies_list, pages)
results = solid_google_dorking(query, proxy_flag, proxies_list, pages)
if not results:
f.write("=> NO RESULT FOUND\n")
total_results.append((query, 0))
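`save_results_to_txt` numbers each dork with `enumerate(..., start=1)` and writes a `=> NO RESULT FOUND` marker for empty queries. A sketch of that file layout; `format_results` is a hypothetical helper that only mimics the visible lines, not the real function:

```python
# Builds the QUERY #n report text the way save_results_to_txt lays it out.
def format_results(queries_with_results):
    lines = []
    for i, (query, results) in enumerate(queries_with_results, start=1):
        lines.append(f"QUERY #{i}: {query}")
        if not results:
            lines.append("=> NO RESULT FOUND")
        else:
            lines.extend(f"=> {r}" for r in results)
    return "\n".join(lines)

text = format_results([("site:example.com filetype:pdf", []),
                       ("inurl:admin", ["https://example.com/admin"])])
```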