v1.1

rafabd1 · rafabd1 · commit 4c2b3d5ab99f · 2025-01-24T12:39:04.000-03:00
- Improved report format
diff --git a/README.md b/README.md
@@ -96,7 +96,7 @@ python main.py -F --output-dir ./reports
 
 Here is an example of a generated report:
 
-[Example Report](./example/all_report.md)
+[Example Report](./example/)
 
 ## Contributing
 
diff --git a/example/all_report.md b/example/all_report.md
diff --git a/example/all_report_trends.md b/example/all_report_trends.md
@@ -0,0 +1,41 @@
+# Security Trends Report
+
+This report summarizes the key trends and security notes extracted from the collected vulnerability data:
+
+**Overall Trend: Widespread Vulnerabilities and Systemic Issues**
+
+The data reveals a landscape of widespread vulnerabilities affecting a variety of technologies and vendors, including major players like IBM, Siemens, and Xerox. This highlights a critical need for consistent security practices across the board. The vulnerabilities are not isolated incidents but point to underlying, potentially systemic, issues in software development practices.
+
+**Key Vulnerability Types and Patterns:**
+
+*   **WordPress Plugin Vulnerabilities:**  A significant number of vulnerabilities are concentrated within WordPress plugins. These commonly stem from inadequate input sanitization and output escaping, leading to:
+    *   **Stored Cross-Site Scripting (XSS):**  Malicious scripts injected into a website and executed by users.
+    *   **Local File Inclusion (LFI):**  Attackers gaining access to sensitive files on the server.
+    *   **Cross-Site Request Forgery (CSRF):** Forcing authenticated users to perform actions they didn't intend.
+    *   **SQL Injection:** Injecting malicious SQL code to manipulate or access data.
+
+*   **Input Validation and Sanitization Failures:** Across the board, insufficient input validation and sanitization are primary causes of vulnerabilities. This leads to issues like:
+    *   **Path Traversal:** Attackers accessing files and directories outside of intended paths.
+    *   **Insecure Handling of Inputs:**  Improper processing of user-supplied data, opening doors for various exploits.
+
+*   **Privilege Escalation:** A recurring issue, where attackers exploit vulnerabilities to gain higher access levels than intended.
+
+*   **Cross-Site Scripting (XSS):**  A pervasive issue, often due to inadequate output escaping.
+
+*   **Remote Code Execution (RCE):**  A significant security risk, where attackers can execute arbitrary code on a target system.
+
+**Systemic Issues and Development Practices:**
+
+*   **Recurring Vulnerability Patterns:**  The presence of similar vulnerability types across different products and vendors strongly suggests systemic weaknesses in development methodologies, such as a lack of consistent secure coding practices.
+*   **Lack of Regular Updates and Thorough Code Reviews:** The vulnerabilities found in established software highlight the critical need for regular security updates and thorough code reviews as part of the development process.
+*   **Insufficient Validation or Permission Controls:** A common theme is the failure to adequately validate user inputs and enforce proper access control mechanisms.
+
+**Security Notes:**
+
+*   **Importance of Regular Patching:**  The prevalence of vulnerabilities underscores the necessity for organizations to promptly install security updates and patches.
+*   **Need for Secure Development Practices:**  Vendors and developers should adopt secure coding practices, emphasizing input validation, sanitization, and proper access control mechanisms.
+*   **Code Reviews are Essential:** Thorough code reviews can help identify and mitigate potential vulnerabilities early in the development lifecycle.
+*   **Security Training for Developers:** Investing in security training for developers is crucial to ensure they are aware of common vulnerabilities and know how to prevent them.
+*   **Regular Vulnerability Scanning:** Continuously scanning systems for vulnerabilities is necessary for identifying and fixing weaknesses before they can be exploited.
+
+**In conclusion, the data paints a picture of a vulnerable software ecosystem. Addressing these widespread and systemic issues requires a multi-faceted approach that involves improving security practices, prioritizing updates, and actively monitoring for potential threats.**
diff --git a/example/all_vulnerabilities.json b/example/all_vulnerabilities.json
diff --git a/src/ai_prompts.py b/src/ai_prompts.py
@@ -10,50 +10,28 @@
 """
 
 summarization_prompt = """Analyze the vulnerabilities in this batch and group them by their affected technology/product.
-For each vulnerability, provide a concise technical description.
+For each vulnerability:
+1) Provide a concise technical description (max 200 chars).
+2) Indicate the index of the item in the input batch; no other JSON references are needed.
+
+Also, produce a short free-text summary of observed trends or relevant security notes (call it "trendSummary").
 
 Input data:
 THIS_JSON
 
-Required format:
-{
-    "technologies": [
-        {
-            "name": str,          # Name of the technology/product
-            "items": [
-                {
-                    "index": int,           # Index in the input batch
-                    "description": str      # Technical description (max 200 chars)
-                }
-            ]
-        }
-    ],
-    "trends": [                  # List of observed security trends
-        {
-            "trend": str,        # Description of the trend
-            "impact": str        # Potential security impact
-        }
-    ]
-}
-
-Example response:
+Expected response format (JSON):
 {
     "technologies": [
         {
-            "name": "Apache Server",
+            "name": str,
             "items": [
                 {
-                    "index": 0,
-                    "description": "Memory corruption vulnerability in mod_proxy allows remote attackers to execute arbitrary code via crafted HTTP requests"
+                    "index": int,
+                    "description": str
                 }
             ]
         }
     ],
-    "trends": [
-        {
-            "trend": "Increase in HTTP request smuggling vulnerabilities",
-            "impact": "Allows bypass of security controls and potential RCE"
-        }
-    ]
+    "trendSummary": str
 }
 """
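
Review note: the new prompt drops the example response and only lists field types. Below is a minimal sketch of how a reply in that shape could be consumed, assuming the model returns valid JSON. The sample reply, the batch contents, and the helper name `join_items` are all hypothetical and are not part of this commit.

```python
import json

# Hypothetical model reply following the new format (content invented for illustration).
sample_response = """
{
    "technologies": [
        {
            "name": "ExampleCMS",
            "items": [
                {"index": 1, "description": "Stored XSS via unsanitized comment field"}
            ]
        }
    ],
    "trendSummary": "Input sanitization failures dominate this batch."
}
"""

# Hypothetical input batch; "index" in the reply refers to positions in this list.
batch = [
    {"title": "Item A", "link": "https://example.com/a"},
    {"title": "Item B", "link": "https://example.com/b"},
]


def join_items(reply_text, batch):
    """Pair each described item with its source entry by index."""
    reply = json.loads(reply_text)
    pairs = []
    for tech in reply.get("technologies", []):
        for item in tech.get("items", []):
            idx = item.get("index")
            # Skip missing, non-integer, or out-of-range indexes from the model.
            if isinstance(idx, int) and 0 <= idx < len(batch):
                pairs.append((tech["name"], batch[idx]["title"], item["description"]))
    return pairs, reply.get("trendSummary", "")


pairs, trend = join_items(sample_response, batch)
```

The range check matters because the report generator in this commit indexes `vulns[item['index']]` directly, so an out-of-range index from the model would raise `IndexError` there.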
diff --git a/src/scrapers/sources.py b/src/scrapers/sources.py
@@ -109,7 +109,6 @@ def get_full_disclosure_latest(start_date, end_date, use_ai=True, max_items=None
         else:
             current_date = datetime(current_date.year, current_date.month + 1, 1)
 
-    # print_final_progress(Fore.GREEN + f"Collected {len(vulns)}/{max_items if max_items else 'unlimited'} items from [SecLists] full disclosure" + Style.RESET_ALL)
     return vulns
 
 # Exploit-DB source
diff --git a/src/summarizer.py b/src/summarizer.py
@@ -51,40 +51,40 @@ def format_vulnerability_entry(vuln: dict, tech_item: dict) -> str:
     else:
         return f"- [{vuln['title']}]({vuln['link']}) ({vuln['date']}) [{vuln['source']}]\n    - {desc}"
 
-def generate_markdown_report(vulns: List[dict], all_classifications: List[dict], report_type: str) -> str:
+def format_date_str(date_str: str) -> str:
+    try:
+        parsed_date = parse(date_str)
+        return parsed_date.strftime('%d %b %Y')
+    except Exception:
+        return date_str
+
+def generate_markdown_report(vulns: List[dict], all_classifications: List[dict]) -> str:
     report = f"""# Vulnerability Analysis Report
 Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
 Total Vulnerabilities Analyzed: {len(vulns)}
 
 ## Vulnerabilities by Technology
 
 """
-    # Add vulnerabilities grouped by technology
-    all_trends = []
-    for classification in all_classifications:
-        # Add vulnerabilities
+    for idx, classification in enumerate(all_classifications):
         tech_sections = classification.get('technologies', [])
-        for tech in tech_sections:
+        for t_idx, tech in enumerate(tech_sections):
             tech_name = tech['name']
             report += f"### {tech_name}\n\n"
             for item in tech['items']:
                 vuln = vulns[item['index']]
-                entry = format_vulnerability_entry(vuln, item)
-                report += f"{entry}\n\n"
-        
-        # Collect trends
-        all_trends.extend(classification.get('trends', []))
-    
-    # Add trends section
-    if all_trends:
-        report += "\n## Security Trends Analysis\n\n"
-        for trend in all_trends:
-            report += f"### {trend['trend']}\n"
-            report += f"**Impact**: {trend['impact']}\n\n"
-    
+                date_formatted = format_date_str(vuln['date'])
+                source = vuln['source']
+                report += f"{date_formatted} [{source}]\n"
+                report += f"- [{vuln['title']}]({vuln['link']})\n"
+                report += f"    - {item['description']}\n\n"
+            report += "---\n"
     return report
 
-def summarize_vulnerabilities(input_file: str = "./output/all_vulnerabilities.json", output_file: str = "./output/vulnerability_report.md"):
+def summarize_vulnerabilities(
+    input_file: str = "./output/all_vulnerabilities.json",
+    output_file: str = "./output/vulnerability_report.md"
+):
     print(Fore.BLUE + f"[theWatcher] Loading vulnerabilities from {input_file}" + Style.RESET_ALL)
     if not api_key:
         print(Fore.YELLOW + "[theWatcher] No API key found. Skipping summarization." + Style.RESET_ALL)
@@ -96,10 +96,10 @@ def summarize_vulnerabilities(input_file: str = "./output/all_vulnerabilities.js
     batches = batch_vulnerabilities(all_vulns)
     total_batches = len(batches)
     all_classifications = []
+    trends_summaries = []
     requests_count = 0
     current_batch = 0
 
-    # Process all batches first
     for i, batch in enumerate(batches):
         print(Fore.BLUE + f"[theWatcher] Summarizing items in batch {i+1}/{total_batches}" + Style.RESET_ALL)
         if current_batch != i + 1:
@@ -133,9 +133,9 @@ def summarize_vulnerabilities(input_file: str = "./output/all_vulnerabilities.js
                     classification = json.loads(response.text)
                     if isinstance(classification, dict) and 'technologies' in classification:
                         all_classifications.append({
-                            'technologies': classification['technologies'],
-                            'trends': classification.get('trends', [])
+                            'technologies': classification['technologies']
                         })
+                        trends_summaries.append(classification.get('trendSummary', ''))
                         break
                 print(Fore.YELLOW + f"[theWatcher] Retrying batch {i+1}/{total_batches}..." + Style.RESET_ALL)
             except Exception as e:
@@ -144,16 +144,33 @@ def summarize_vulnerabilities(input_file: str = "./output/all_vulnerabilities.js
 
         requests_count += 1
 
-    # Generate final report only once
-    report = generate_markdown_report(all_vulns, all_classifications, 
-                                   'nist' if 'nist' in input_file else 'sources')
-    
-    # Write complete report
+    report = generate_markdown_report(all_vulns, all_classifications)
     os.makedirs(os.path.dirname(output_file), exist_ok=True)
     with open(output_file, 'w', encoding='utf-8') as f:
         f.write(report)
 
+    trends_file = os.path.splitext(output_file)[0] + "_trends.md"
+    print(Fore.BLUE + "[theWatcher] Generating trends report..." + Style.RESET_ALL)
+    final_trends_prompt = (
+        "Summarize the main trends and security notes from these partial summaries:\n\n"
+        + "\n\n".join(trends_summaries) +
+        "\n\nCreate a cohesive final explanation of key insights."
+    )
+    try:
+        response2 = model.generate_content(
+            final_trends_prompt,
+            generation_config=genai.GenerationConfig(response_mime_type="text/plain")
+        )
+        final_trends = response2.text if response2 else "No trend info."
+    except Exception:
+        final_trends = "No trend info."
+
+    with open(trends_file, 'w', encoding='utf-8') as f:
+        f.write("# Security Trends Report\n\n")
+        f.write(final_trends)
+
     print(Fore.GREEN + f"[theWatcher] Report saved in {output_file}" + Style.RESET_ALL)
+    print(Fore.GREEN + f"[theWatcher] Trends saved in {trends_file}" + Style.RESET_ALL)
 
 def validate_summary_format(summary: dict) -> bool:
     """Validate if summary follows the required format"""
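
Review note: the body of `validate_summary_format` falls outside this hunk, so it is unclear whether it was updated for the new `trendSummary` field. Below is a minimal sketch of what a check against the new format might look like. The function name `looks_like_summary` is hypothetical, and treating `trendSummary` as optional is an assumption that mirrors the `.get('trendSummary', '')` call in the diff.

```python
def looks_like_summary(summary) -> bool:
    """Return True if `summary` matches the shape requested in the prompt."""
    if not isinstance(summary, dict):
        return False
    techs = summary.get("technologies")
    if not isinstance(techs, list):
        return False
    for tech in techs:
        if not isinstance(tech, dict) or not isinstance(tech.get("name"), str):
            return False
        items = tech.get("items")
        if not isinstance(items, list):
            return False
        for item in items:
            if not isinstance(item, dict):
                return False
            idx = item.get("index")
            # bool is a subclass of int, so exclude it explicitly.
            if not isinstance(idx, int) or isinstance(idx, bool):
                return False
            if not isinstance(item.get("description"), str):
                return False
    # Assumption: trendSummary may be absent, but must be a string when present.
    return isinstance(summary.get("trendSummary", ""), str)


ok = looks_like_summary({
    "technologies": [{"name": "ExampleCMS", "items": [{"index": 0, "description": "XSS"}]}],
    "trendSummary": "Mostly input validation issues.",
})
```

A stricter variant could also enforce the 200-character description limit stated in the prompt.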