|
2 | 2 |
|
3 | 3 | == Authentication
|
4 | 4 | Currently the only supported authentication mechanism is Kerberos, which is disabled by default.
|
5 |
| -For Kerberos to work a Kerberos KDC is needed, which the users needs to provide. |
| 5 | +For Kerberos to work a Kerberos KDC is needed, which the user needs to provide. |
6 | 6 | The xref:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation] states which kind of Kerberos servers are supported and how they can be configured.
|
7 | 7 |
|
8 | 8 | IMPORTANT: Kerberos is supported starting from HDFS version 3.3.x
|
9 | 9 |
|
10 | 10 | === 1. Prepare Kerberos server
|
11 | 11 | To configure HDFS to use Kerberos you first need to collect information about your Kerberos server, e.g. hostname and port.
|
12 |
| -Additionally you need a service-user, which the secret-operator uses to create create principals for the HDFS services. |
| 12 | +Additionally you need a service-user, which the secret-operator uses to create principals for the HDFS services. |
13 | 13 |
|
14 | 14 | === 2. Create Kerberos SecretClass
|
15 | 15 | Afterwards you need to enter all the needed information into a SecretClass, as described in xref:secret-operator:secretclass.adoc#backend-kerberoskeytab[secret-operator documentation].
|
@@ -69,7 +69,9 @@ include::example$usage-guide/hdfs-regorules.yaml[]
|
69 | 69 | ----
|
70 | 70 |
|
71 | 71 | This rego rule is intended for demonstration purposes and allows every operation.
|
72 |
| -For a production setup you probably want to take a look at our integration tests for a more secure set of rego rules. |
| 72 | +For a production setup you will probably need much more granular rules. |
| 73 | +We provide a more representative rego rule in our integration tests and in the aforementioned hdfs-utils repository. |
| 74 | +Details can be found below in the <<fine-granular-rego-rules, fine-granular rego rules>> section. |
73 | 75 | Reference the rego rule as follows in your HdfsCluster:
|
74 | 76 |
|
75 | 77 | [source,yaml]
|
@@ -109,6 +111,140 @@ The implication is thus that you cannot add users to the `superuser` group, whic
|
109 | 111 | We have decided that this is an acceptable approach as normal operations will not be affected.
|
110 | 112 | In case you really need users to be part of the `superusers` group, you can use a configOverride on `hadoop.user.group.static.mapping.overrides` for that.
|
111 | 113 |
|
| 114 | +[#fine-granular-rego-rules] |
| 115 | +=== Fine-granular rego rules |
| 116 | + |
| 117 | +The hdfs-utils repository contains a more production-ready rego-rule https://github.com/stackabletech/hdfs-utils/blob/main/rego/hdfs.rego[here]. |
| 118 | +With a few minor differences (e.g. Pod names) it is the same rego rule that is used in this https://github.com/stackabletech/hdfs-operator/blob/main/tests/templates/kuttl/kerberos/12-rego-rules.txt.j2[integration test]. |
| 119 | + |
| 120 | +Access is granted by looking at three bits of information that must be supplied for every rego-rule callout: |
| 121 | + |
| 122 | +* the *identity* of the user |
| 123 | +* the *resource* requested by the user |
| 124 | +* the *operation* which the user wants to perform on the resource |
| 125 | + |
| 126 | +Each operation has an implicit action-level attribute; for example, `create` requires at least read-write permissions. |
| 127 | +This action attribute is then checked against the permissions assigned to the user by an ACL and the operation is permitted if this check is fulfilled. |
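The action check described above can be sketched outside of Rego as follows. This is only an illustrative Python mirror of the idea, not code from hdfs-utils: the names `ACTION_HIERARCHY`, `ACTION_FOR_OPERATION` and the Python function are assumptions made for this example, and the operation mapping is a two-entry excerpt based on the operations mentioned in this section.

```python
# Illustrative Python mirror of the action check; not part of hdfs-utils.
# Each ACL action implies itself and every weaker action.
ACTION_HIERARCHY = {
    "full": ["full", "rw", "ro"],
    "rw": ["rw", "ro"],
    "ro": ["ro"],
}

# Excerpt only: the real mapping must list every HDFS operation relevant
# to the application. Both entries are taken from this section.
ACTION_FOR_OPERATION = {
    "abandonBlock": "rw",
    "create": "rw",
}


def action_sufficient_for_operation(action: str, operation: str) -> bool:
    """Return True if the ACL action covers the action the operation needs."""
    required = ACTION_FOR_OPERATION.get(operation)
    if required is None:
        # Unknown operations are never permitted (deny by default).
        return False
    return required in ACTION_HIERARCHY.get(action, [])
```

With this sketch, an ACL entry granting `full` or `rw` permits `create`, while an entry granting only `ro` does not, and an operation missing from the mapping is denied.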
| 128 | + |
| 129 | +The basic structure of this rego rule is shown below (the full rule is available https://github.com/stackabletech/hdfs-utils/blob/main/rego/hdfs.rego[here]). |
| 130 | + |
| 131 | +.Rego rule outline |
| 132 | +[source] |
| 133 | +---- |
| 134 | +package hdfs |
| 135 | +
|
| 136 | +import rego.v1 |
| 137 | +
|
| 138 | +# Turn off access by default. |
| 139 | +default allow := false |
| 140 | +default matches_identity(identity) := false |
| 141 | +
|
| 142 | +# Check access in order of increasing specificity (i.e. identity first). |
| 143 | +# Deny access as "early" as possible. |
| 144 | +allow if { |
| 145 | + some acl in acls |
| 146 | + matches_identity(acl.identity) |
| 147 | + matches_resource(input.path, acl.resource) |
| 148 | + action_sufficient_for_operation(acl.action, input.operationName) |
| 149 | +} |
| 150 | +
|
| 151 | +# Identity checks based on e.g. |
| 152 | +# - explicit matches on the (long) userName or shortUsername |
| 153 | +# - regex matches |
| 154 | +# - the group membership (simple- or regex-matches on long-or short-username) |
| 155 | +matches_identity(identity) if { |
| 156 | + ... |
| 157 | +} |
| 158 | +
|
| 159 | +# Resource checks on e.g. |
| 160 | +# - explicit file- or directory-mentions |
| 161 | +# - inclusion of the file in recursively applied access rights |
| 162 | +matches_resource(file, resource) if { |
| 163 | + ... |
| 164 | +} |
| 165 | +
|
| 166 | +# Check the operation and its implicit action against an ACL |
| 167 | +action_sufficient_for_operation(action, operation) if { |
| 168 | + action_hierarchy[action][_] == action_for_operation[operation] |
| 169 | +} |
| 170 | +
|
| 171 | +action_hierarchy := { |
| 172 | + "full": ["full", "rw", "ro"], |
| 173 | + "rw": ["rw", "ro"], |
| 174 | + "ro": ["ro"], |
| 175 | +} |
| 176 | +
|
| 177 | +
|
| 178 | +# This should contain a list of all HDFS actions relevant to the application |
| 179 | +action_for_operation := { |
| 180 | + "abandonBlock": "rw", |
| 181 | + ... |
| 182 | +} |
| 183 | +
|
| 184 | +acls := [ |
| 185 | + { |
| 186 | + "identity": "group:admins", |
| 187 | + "action": "full", |
| 188 | + "resource": "hdfs:dir:/", |
| 189 | + }, |
| 190 | + ... |
| 191 | +] |
| 192 | +---- |
| 193 | + |
| 194 | +The full file in the hdfs-utils repository contains extra documentary information such as a https://github.com/stackabletech/hdfs-utils/blob/main/rego/hdfs.rego#L186-L204[listing] of HDFS actions that would not typically be subject to an ACL. |
| 195 | +In hdfs-utils there is also a https://github.com/stackabletech/hdfs-utils/blob/main/rego/hdfs_test.rego[test file] to verify the rules, where different asserts are applied to the rules. |
| 196 | +Take the test case below as an example: |
| 197 | + |
| 198 | +[source] |
| 199 | +---- |
| 200 | +test_admin_access_to_developers if { |
| 201 | + allow with input as { |
| 202 | + "callerUgi": { |
| 203 | + "shortUserName": "admin", |
| 204 | + "userName": "admin/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL", |
| 205 | + }, |
| 206 | + "path": "/developers/file", |
| 207 | + "operationName": "create", |
| 208 | + } |
| 209 | +} |
| 210 | +---- |
| 211 | + |
| 212 | +This test passes through the following steps: |
| 213 | + |
| 214 | +==== 1. Does the user or group exist in the ACL? |
| 215 | + |
| 216 | +Yes, a match is found on userName via the corresponding group (`admins`, yielded by the mapping `groups_for_user`). |
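To make the group lookup in this step more concrete, here is a minimal Python sketch of a group-based identity check. It is an assumption-laden illustration, not the hdfs-utils implementation: the contents and key format of `GROUPS_FOR_USER` are hypothetical, and only `group:` identities are handled, whereas the real rule also supports other kinds of matches.

```python
# Hypothetical user-to-group mapping, standing in for `groups_for_user`.
# The key format (long Kerberos userName) is an assumption for this sketch.
GROUPS_FOR_USER = {
    "admin/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL": ["admins"],
}


def matches_identity(identity: str, caller_ugi: dict) -> bool:
    """Return True if the caller belongs to the group named in the ACL identity."""
    kind, _, name = identity.partition(":")
    if kind != "group":
        # Other identity kinds (explicit users, regex matches) are omitted here.
        return False
    # Look the caller up by both the short and the long user name.
    candidates = (caller_ugi.get("shortUserName"), caller_ugi.get("userName"))
    return any(name in GROUPS_FOR_USER.get(user, []) for user in candidates)
```

Called with the `callerUgi` object from the test case and the identity `group:admins`, this sketch returns `True`; for a group the user is not mapped to, it returns `False`.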
| 217 | + |
| 218 | +==== 2. Does this user/group have permission to fulfill the specified operation on the given path? |
| 219 | + |
| 220 | +Yes, as this ACL item |
| 221 | + |
| 222 | +[source] |
| 223 | +---- |
| 224 | +{ |
| 225 | + "identity": "group:admins", |
| 226 | + "action": "full", |
| 227 | + "resource": "hdfs:dir:/", |
| 228 | +}, |
| 229 | +---- |
| 230 | + |
| 231 | +matches the resource on |
| 232 | + |
| 233 | +[source] |
| 234 | +---- |
| 235 | +# Resource mentions a folder higher up the tree, which will grant access recursively
| 236 | +matches_resource(file, resource) if { |
| 237 | + startswith(resource, "hdfs:dir:/") |
| 238 | + # directories need to have a trailing slash |
| 239 | + endswith(resource, "/") |
| 240 | + startswith(file, trim_prefix(resource, "hdfs:dir:")) |
| 241 | +} |
| 242 | +---- |
| 243 | + |
| 244 | +and the action permission required for the operation `create` (`rw`) is a subset of the ACL grant (`full`). |
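The recursive directory match shown above can be mirrored in a few lines of Python to see why the trailing slash matters. This is a sketch of that single rule only (the full rego file contains further `matches_resource` variants), and the directory names in the usage note are made up for illustration.

```python
DIR_PREFIX = "hdfs:dir:"


def matches_resource(file: str, resource: str) -> bool:
    """Return True if `file` lies under the directory named by `resource`."""
    # The resource must be a directory entry with an absolute path ...
    if not resource.startswith(DIR_PREFIX + "/"):
        return False
    # ... and directories need a trailing slash.
    if not resource.endswith("/"):
        return False
    # Strip the scheme-like prefix and compare path prefixes.
    directory = resource[len(DIR_PREFIX):]
    return file.startswith(directory)
```

With this sketch, `/developers/file` matches both `hdfs:dir:/` and `hdfs:dir:/developers/`. Because of the required trailing slash, a hypothetical sibling such as `/developers-secret/file` does not match `hdfs:dir:/developers/`, and a resource written without the trailing slash matches nothing.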
| 245 | + |
| 246 | +NOTE: The various checks for `matches_identity` and `matches_resource` are generic, given that the internal list of HDFS actions is comprehensive and the `input` structure is an internal implementation. This means that only the ACL needs to be adapted to specific customer needs. |
| 247 | + |
112 | 248 | == Wire encryption
|
113 | 249 | In case Kerberos is enabled, `Privacy` mode is used for best security.
|
114 | 250 | Wire encryption without Kerberos as well as https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Data_confidentiality[other wire encryption modes] are *not* supported.
|