|
| 1 | +# Cloud Provider Configuration Audit Report |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +This report identifies field name mismatches, validation issues, and inconsistencies across all cloud providers in the Clustrix widget system. As with the Lambda Cloud issue found earlier, several providers show discrepancies between the widget field names, the ClusterConfig field names, and the field names the cloud provider implementations expect. |
| 6 | + |
| 7 | +## Key Findings |
| 8 | + |
| 9 | +### 1. AWS Provider Issues |
| 10 | + |
| 11 | +**Field Name Mismatches:** |
| 12 | +- **Widget uses:** `aws_access_key` and `aws_secret_key` |
| 13 | +- **ClusterConfig has:** Both `aws_access_key_id`/`aws_secret_access_key` (standard AWS naming) AND `aws_access_key`/`aws_secret_key` (widget naming) |
| 14 | +- **Provider expects:** `access_key_id` and `secret_access_key` (standard boto3 naming) |
| 15 | + |
| 16 | +**Validation Issues:** |
| 17 | +- The `_test_aws_connectivity()` method uses `config.get("aws_profile")` but the widget has no profile field |
| 18 | +- Test method expects credentials from config but doesn't map widget field names correctly |
| 19 | +- No validation of required credential fields, and no check of which variant (`aws_access_key_id` or `aws_access_key`) is actually populated |
| 20 | + |
| 21 | +**DEFAULT_CONFIGS Issues:** |
| 22 | +- Uses `aws_region`, `aws_instance_type`, `aws_cluster_type` - matches widget |
| 23 | +- Missing any credential field examples in defaults |
| 24 | + |
| 25 | +### 2. Azure Provider Issues |
| 26 | + |
| 27 | +**Field Name Mismatches:** |
| 28 | +- **Widget uses:** `azure_subscription_id`, `azure_client_id`, `azure_client_secret` |
| 29 | +- **ClusterConfig has:** Same names - GOOD |
| 30 | +- **Provider expects:** `subscription_id`, `client_id`, `client_secret` (without azure_ prefix) |
| 31 | + |
| 32 | +**Validation Issues:** |
| 33 | +- The `_test_azure_connectivity()` method tries to use `DefaultAzureCredential()` but should use the widget-provided credentials |
| 34 | +- Test method doesn't properly map `azure_subscription_id` -> `subscription_id` etc. |
| 35 | +- Missing `tenant_id` in widget (provider requires it) |
| 36 | + |
| 37 | +**DEFAULT_CONFIGS Issues:** |
| 38 | +- Uses `azure_region`, `azure_instance_type` - matches widget and config |
| 39 | +- Missing credential fields in defaults |
| 40 | + |
| 41 | +### 3. GCP Provider Issues |
| 42 | + |
| 43 | +**Field Name Mismatches:** |
| 44 | +- **Widget uses:** `gcp_project_id`, `gcp_region`, `gcp_instance_type`, `gcp_service_account_key` |
| 45 | +- **ClusterConfig has:** Same names - GOOD |
| 46 | +- **Provider expects:** `project_id`, `service_account_key`, `region` (without gcp_ prefix) |
| 47 | + |
| 48 | +**Validation Issues:** |
| 49 | +- The `_test_gcp_connectivity()` method tries to use default credentials instead of widget-provided service account key |
| 50 | +- No proper validation of service account JSON format in widget |
| 51 | +- Test method doesn't map field names correctly |
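The missing service account JSON check could be sketched as follows. This is a minimal illustration, not the widget's actual code: `validate_service_account_key` and `REQUIRED_KEYS` are hypothetical names, and the list of required keys is an assumption based on the usual layout of a Google Cloud service account key file.

```python
import json
from typing import List

# Keys normally present in a Google Cloud service account key file.
# Assumption: the widget receives the key as a raw JSON string.
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def validate_service_account_key(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the key looks usable."""
    try:
        data = json.loads(raw)
    except (TypeError, ValueError) as exc:
        # json.JSONDecodeError is a ValueError; None input raises TypeError.
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]

    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if not data.get(key)]
    if data.get("type") and data["type"] != "service_account":
        problems.append(f"unexpected type: {data['type']}")
    return problems
```

A check like this only confirms the shape of the key. Whether the credentials are accepted still has to be established by the connectivity test.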
| 52 | + |
| 53 | +**DEFAULT_CONFIGS Issues:** |
| 54 | +- Uses `gcp_region`, `gcp_instance_type` - matches widget |
| 55 | +- Missing `gcp_project_id` and credentials in defaults |
| 56 | + |
| 57 | +### 4. Lambda Cloud Provider Issues |
| 58 | + |
| 59 | +**Field Name Mismatches:** |
| 60 | +- **Widget uses:** `lambda_api_key`, `lambda_instance_type` |
| 61 | +- **ClusterConfig has:** Same names - GOOD |
| 62 | +- **Provider expects:** `api_key` (without lambda_ prefix) |
| 63 | + |
| 64 | +**Validation Issues:** |
| 65 | +- The `_test_lambda_connectivity()` method correctly maps `lambda_api_key` to `api_key` - GOOD |
| 66 | +- This is the one provider that was fixed! |
| 67 | + |
| 68 | +**DEFAULT_CONFIGS Issues:** |
| 69 | +- Uses `lambda_instance_type` - matches widget and config |
| 70 | +- Missing `lambda_api_key` in defaults (expected for security) |
| 71 | + |
| 72 | +### 5. HuggingFace Provider Issues |
| 73 | + |
| 74 | +**Field Name Mismatches:** |
| 75 | +- **Widget uses:** `hf_token`, `hf_username`, `hf_hardware`, `hf_sdk` |
| 76 | +- **ClusterConfig has:** Same names - GOOD |
| 77 | +- **Provider expects:** `token`, `username` (without hf_ prefix) |
| 78 | + |
| 79 | +**Validation Issues:** |
| 80 | +- No test connectivity method implemented in widget for HuggingFace |
| 81 | +- Provider expects credentials with different names than ClusterConfig |
| 82 | + |
| 83 | +**DEFAULT_CONFIGS Issues:** |
| 84 | +- Uses `hf_hardware`, `hf_sdk` - matches widget and config |
| 85 | +- Missing credential fields in defaults |
| 86 | + |
| 87 | +## Detailed Analysis |
| 88 | + |
| 89 | +### Widget Test Configuration Methods |
| 90 | + |
| 91 | +The widget has test connectivity methods that attempt to validate cloud provider configurations: |
| 92 | + |
| 93 | +1. `_test_aws_connectivity()` - Uses wrong credential field names |
| 94 | +2. `_test_azure_connectivity()` - Uses DefaultAzureCredential instead of provided credentials |
| 95 | +3. `_test_gcp_connectivity()` - Uses default credentials instead of service account key |
| 96 | +4. `_test_lambda_connectivity()` - Works correctly (recently fixed) |
| 97 | +5. `_test_huggingface_connectivity()` - Not implemented |
| 98 | + |
| 99 | +### ClusterConfig Field Mapping Issues |
| 100 | + |
| 101 | +The ClusterConfig class supports both standard and widget naming for AWS only: |
| 102 | +- AWS: Has both `aws_access_key_id` and `aws_access_key` fields |
| 103 | +- Other providers: Only have widget-style naming |
| 104 | + |
| 105 | +### Provider Implementation Expectations |
| 106 | + |
| 107 | +Each provider's `authenticate()` method expects specific field names: |
| 108 | +- **AWS:** `access_key_id`, `secret_access_key`, `region`, `session_token` |
| 109 | +- **Azure:** `subscription_id`, `client_id`, `client_secret`, `tenant_id`, `region`, `resource_group` |
| 110 | +- **GCP:** `project_id`, `service_account_key`, `region` |
| 111 | +- **Lambda:** `api_key` |
| 112 | +- **HuggingFace:** `token`, `username` |
| 113 | + |
| 114 | +## Recommended Fixes |
| 115 | + |
| 116 | +### 1. Standardize Field Name Mapping |
| 117 | + |
| 118 | +Create a consistent mapping system between widget fields, ClusterConfig fields, and provider expectations: |
| 119 | + |
| 120 | +```python |
| 121 | +PROVIDER_FIELD_MAPPING = { |
| 122 | +    "aws": { |
| 123 | +        "aws_access_key": "access_key_id", |
| 124 | +        "aws_secret_key": "secret_access_key", |
| 125 | +        "aws_region": "region" |
| 126 | +    }, |
| 127 | +    "azure": { |
| 128 | +        "azure_subscription_id": "subscription_id", |
| 129 | +        "azure_client_id": "client_id", |
| 130 | +        "azure_client_secret": "client_secret", |
| 131 | +        "azure_region": "region" |
| 132 | +    }, |
| 133 | +    "gcp": { |
| 134 | +        "gcp_project_id": "project_id", |
| 135 | +        "gcp_service_account_key": "service_account_key", |
| 136 | +        "gcp_region": "region" |
| 137 | +    }, |
| 138 | +    "lambda": { |
| 139 | +        "lambda_api_key": "api_key" |
| 140 | +    }, |
| 141 | +    "huggingface": { |
| 142 | +        "hf_token": "token", |
| 143 | +        "hf_username": "username" |
| 144 | +    } |
| 145 | +} |
| 146 | +``` |
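One way such a mapping might be applied is a small translation helper, sketched below. `map_widget_fields` is a hypothetical name, not an existing Clustrix function, and the mapping is abbreviated here only so the example runs on its own. Dropping unset values is an assumption, chosen so that missing credentials show up in validation instead of being passed on as empty strings.

```python
from typing import Any, Dict

# Abbreviated copy of the mapping above, repeated so the sketch is self-contained.
PROVIDER_FIELD_MAPPING: Dict[str, Dict[str, str]] = {
    "aws": {
        "aws_access_key": "access_key_id",
        "aws_secret_key": "secret_access_key",
        "aws_region": "region",
    },
    "lambda": {"lambda_api_key": "api_key"},
}


def map_widget_fields(provider: str, widget_config: Dict[str, Any]) -> Dict[str, Any]:
    """Translate widget-style keys into the names a provider expects.

    Keys that are not in the mapping are ignored; unset values (None or "")
    are dropped rather than forwarded.
    """
    try:
        mapping = PROVIDER_FIELD_MAPPING[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    return {
        provider_key: widget_config[widget_key]
        for widget_key, provider_key in mapping.items()
        if widget_config.get(widget_key) not in (None, "")
    }
```

For example, `map_widget_fields("lambda", {"lambda_api_key": "k"})` yields `{"api_key": "k"}`, which matches the translation the fixed `_test_lambda_connectivity()` is described as performing.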
| 147 | + |
| 148 | +### 2. Fix Test Connectivity Methods |
| 149 | + |
| 150 | +Update each `_test_*_connectivity()` method to: |
| 151 | +1. Use provided credentials from widget fields |
| 152 | +2. Map field names correctly to provider expectations |
| 153 | +3. Handle authentication errors properly |
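The three steps above might share one wrapper so that no test method can fail silently. The sketch below is illustrative only: `check_connectivity` is a hypothetical name, and it assumes a provider's `authenticate()` accepts the mapped credentials as keyword arguments and returns a truthy value on success, which the audit does not confirm.

```python
from typing import Any, Callable, Dict, Tuple


def check_connectivity(
    authenticate: Callable[..., Any], credentials: Dict[str, Any]
) -> Tuple[bool, str]:
    """Run a provider's authenticate callable and report the outcome explicitly.

    `credentials` is expected to already use provider-style field names.
    """
    if not credentials:
        return False, "No credentials provided"
    try:
        ok = authenticate(**credentials)
    except Exception as exc:  # surface every failure instead of swallowing it
        return False, f"Authentication error: {exc}"
    if not ok:
        return False, "Authentication rejected"
    return True, "Connection OK"
```

Returning a `(success, message)` pair keeps the widget free to display the reason for a failure rather than a bare pass or fail.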
| 154 | + |
| 155 | +### 3. Add Missing Required Fields |
| 156 | + |
| 157 | +- **Azure:** Add `azure_tenant_id` field to widget |
| 158 | +- **AWS:** Consider adding `aws_session_token` for temporary credentials |
| 159 | +- **All:** Add validation for required vs optional fields |
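Required-field validation could be as simple as a per-provider table of widget field names. The sketch below is an assumption-laden illustration: `REQUIRED_WIDGET_FIELDS` and `missing_required_fields` are hypothetical names, `azure_tenant_id` is the proposed field that does not exist in the widget yet, and which fields count as required is a guess that should be checked against each provider's `authenticate()`.

```python
from typing import Any, Dict, List, Tuple

# Assumed set of required widget fields per provider (not confirmed by the code).
REQUIRED_WIDGET_FIELDS: Dict[str, Tuple[str, ...]] = {
    "aws": ("aws_access_key", "aws_secret_key"),
    "azure": (
        "azure_subscription_id",
        "azure_client_id",
        "azure_client_secret",
        "azure_tenant_id",  # proposed new field
    ),
    "gcp": ("gcp_project_id", "gcp_service_account_key"),
    "lambda": ("lambda_api_key",),
    "huggingface": ("hf_token",),
}


def missing_required_fields(provider: str, config: Dict[str, Any]) -> List[str]:
    """Return required widget fields that are absent or blank in `config`."""
    required = REQUIRED_WIDGET_FIELDS.get(provider, ())
    return [name for name in required if not str(config.get(name) or "").strip()]
```

Running this check before a configuration is saved or tested would let the widget name the missing fields instead of failing later at runtime.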
| 160 | + |
| 161 | +### 4. Implement Missing Test Methods |
| 162 | + |
| 163 | +- Add `_test_huggingface_connectivity()` method |
| 164 | +- Ensure all test methods are actually called from `_on_test_config()` |
| 165 | + |
| 166 | +### 5. Update Configuration Saving/Loading |
| 167 | + |
| 168 | +Ensure the `_save_config_from_widgets()` method properly maps field names when saving configurations. |
| 169 | + |
| 170 | +## Critical Issues |
| 171 | + |
| 172 | +1. **Security Risk:** Some test methods may fail silently, giving users false confidence in invalid configurations |
| 173 | +2. **User Experience:** Configuration that appears to save successfully may fail at runtime due to field name mismatches |
| 174 | +3. **Inconsistency:** Each provider handles field mapping differently, making the system unpredictable |
| 175 | + |
| 176 | +## Priority Recommendations |
| 177 | + |
| 178 | +1. **High Priority:** Fix AWS, Azure, and GCP test connectivity methods |
| 179 | +2. **Medium Priority:** Implement HuggingFace test connectivity |
| 180 | +3. **Medium Priority:** Add missing required fields (`azure_tenant_id`) |
| 181 | +4. **Low Priority:** Standardize field naming across all providers |
| 182 | + |
| 183 | +This audit shows that the Lambda Cloud fix addressed only one instance of a wider problem: similar issues exist in the AWS, Azure, GCP, and HuggingFace providers and need systematic resolution. |