Commit 2e5cfd7

nightkr and maltesander authored
Document AD principal conflicts (#408)
* Document AD isolation requirements

  See #406

* Add AD conflict troubleshooting guide
* Link to the troubleshooting guide on LDAP user conflicts
* Detect UPN conflicts as user conflicts
* Fix link label
* Update docs/modules/secret-operator/pages/troubleshooting.adoc

  Co-authored-by: Malte Sander <malte.sander.it@gmail.com>

---------

Co-authored-by: Malte Sander <malte.sander.it@gmail.com>
1 parent 5af1414 commit 2e5cfd7

3 files changed: +57 -2 lines changed


docs/modules/secret-operator/pages/secretclass.adoc

Lines changed: 15 additions & 0 deletions
@@ -107,6 +107,21 @@ Principals will be created dynamically if they do not already exist.

 The administrator keytab must have permission to add principals and get their keys. This corresponds to the flags `ae` in `kadm5.acl`.

+==== Active Directory
+
+[#ad-principal-conflicts]
+===== Principal Conflicts
+
+We recommend that each Active Directory domain should only be used by a single Kubernetes cluster.
+
+This is because each pod, service, and node may be provisioned a principal matching its hostname, and principal names must be unique within a single AD domain.
+The Stackable Secret Operator will cache and reuse these credentials within a single Kubernetes cluster, but will not share them across multiple clusters.
+
+If the same AD domain _is_ shared between multiple Kubernetes clusters, the following _must_ be unique across the AD domain:
+
+- The Kubernetes Nodes' names and fully qualified domain names
+- The Kubernetes Namespaces' names (only Namespaces that use Kerberos)
+
 ==== Reference

 [source,yaml]
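
Reviewer note: the uniqueness requirement documented above could be checked before pointing a second cluster at a shared AD domain. The following is a minimal sketch, not part of this commit. The `ClusterInventory` type and both helper functions are hypothetical, and gathering the names (for example from each cluster's API) is assumed to happen elsewhere.

```rust
use std::collections::BTreeSet;

/// Hypothetical inventory of the names one Kubernetes cluster would
/// contribute to a shared AD domain (not a type from the Secret Operator).
struct ClusterInventory {
    /// Node names and fully qualified domain names.
    node_names: BTreeSet<String>,
    /// Names of the Namespaces that use Kerberos.
    kerberos_namespaces: BTreeSet<String>,
}

/// Convenience constructor for the sketch.
fn inventory(nodes: &[&str], namespaces: &[&str]) -> ClusterInventory {
    ClusterInventory {
        node_names: nodes.iter().map(|s| s.to_string()).collect(),
        kerberos_namespaces: namespaces.iter().map(|s| s.to_string()).collect(),
    }
}

/// Returns the node names and Namespace names that occur in both clusters.
/// Any non-empty result violates the uniqueness requirement described above.
fn shared_names(a: &ClusterInventory, b: &ClusterInventory) -> (Vec<String>, Vec<String>) {
    let nodes = a.node_names.intersection(&b.node_names).cloned().collect();
    let namespaces = a
        .kerberos_namespaces
        .intersection(&b.kerberos_namespaces)
        .cloned()
        .collect();
    (nodes, namespaces)
}
```

Two clusters that both run a Kerberos-enabled Namespace with the same name would show up in the second element of the returned tuple.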

docs/modules/secret-operator/pages/troubleshooting.adoc

Lines changed: 25 additions & 1 deletion
@@ -1,6 +1,10 @@
 = Troubleshooting

-== My secret-consuming Pods get stuck `Pending`!
+[#general]
+== General
+
+[#pod-stuck-pending]
+=== My secret-consuming Pods get stuck `Pending`!

 . Does the Pod have any events relating to scheduling? (`kubectl describe pod/$POD_NAME`)
 . Is the PersistentVolumeClaim being created? It should have the name `$POD_NAME-$VOLUME_NAME`.
@@ -16,3 +20,23 @@
 . Does the secret-operator sidecar container named `node-driver-registrar` have any relevant log entries?
 . Does the kubelet have any relevant log entries?
 . When running on OpenShift also have a look at the xref:openshift.adoc[OpenShift documentation].
+
+[#active-directory]
+== Active Directory
+
+[#active-directory-ldap-user-conflict]
+=== LDAP user already exists
+
+The Stackable Secret Operator maintains a cache of Active Directory user credentials. This error occurs when a required user is missing from the cache but exists in AD.
+
+This can be caused by a few different root issues:
+
+1. A race condition where multiple Pods require the same identity at the same time, leading to a stale cache.
+   This is transient, and should resolve itself within a few seconds as the cache is updated and any failed attempts are retried.
+2. Trying to reuse a single AD domain across multiple Kubernetes clusters.
+   This takes care to do safely, please see xref:secretclass.adoc#ad-principal-conflicts[Principal Conflicts] for more information.
+3. Deleting a user from the cache but not from AD (including deleting the whole cache).
+   When deleting users from the cache, make sure to also delete the corresponding AD users. The Secret Operator should then automatically recreate them.
+4. The to-be-created principal conflicts with an existing unrelated principal.
+   This has to be resolved manually by an administrator.
+
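
Reviewer note: the condition the new troubleshooting section describes ("missing from the cache but exists in AD") can be summarized as a small decision table. This is an illustrative sketch only. The `Action` enum and `next_action` function are hypothetical names, and the handling of a user that is cached but no longer present in AD is an assumption, since the guide does not cover that case.

```rust
/// Hypothetical summary of what should happen for one required AD user.
#[derive(Debug, PartialEq, Eq)]
enum Action {
    /// The credentials are in the cache and can be reused.
    ReuseCachedCredentials,
    /// The user is unknown to both the cache and AD, so it can be created.
    CreateUser,
    /// The user exists in AD but not in the cache: the "LDAP user already exists" error.
    ReportConflict,
}

/// Maps the two observations from the troubleshooting guide to an action.
fn next_action(in_cache: bool, exists_in_ad: bool) -> Action {
    match (in_cache, exists_in_ad) {
        // Assumption: a cached entry is trusted without re-checking AD.
        (true, _) => Action::ReuseCachedCredentials,
        (false, false) => Action::CreateUser,
        (false, true) => Action::ReportConflict,
    }
}
```

All four root causes listed in the guide end up in the `(false, true)` arm; they differ only in how the cache and AD got out of sync.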

rust/krb5-provision-keytab/src/active_directory.rs

Lines changed: 17 additions & 1 deletion
@@ -60,7 +60,8 @@ pub enum Error {
     CreateLdapUser { source: ldap3::LdapError },

     #[snafu(display(
-        "LDAP user already exists, either delete it manually or add it to the password cache ({password_cache_ref})"
+        "LDAP user already exists but is missing from the password cache ({password_cache_ref}) (hint: see {link})",
+        link = "https://docs.stackable.tech/home/nightly/secret-operator/troubleshooting.html#active-directory-ldap-user-conflict"
     ))]
     CreateLdapUserConflict {
         source: ldap3::LdapError,
@@ -76,8 +77,14 @@ pub enum Error {
 pub type Result<T, E = Error> = std::result::Result<T, E>;

 // Result codes are defined by https://www.rfc-editor.org/rfc/rfc4511#appendix-A.1
+const LDAP_RESULT_CODE_CONSTRAINT_VIOLATION: u32 = 19;
 const LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS: u32 = 68;

+// Error codes from https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/component-updates/spn-and-upn-uniqueness#symptoms.
+// Rendered in LDAP error messages as 8 zero-padded hex digits.
+// BEST-EFFORT ONLY. THE SPECIFIC FORMAT IS NOT DOCUMENTED.
+const AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE: &str = "000021C8:";
+
 pub struct AdAdmin<'a> {
     ldap: Ldap,
     krb: &'a KrbContext,
@@ -293,6 +300,15 @@ async fn create_ad_user(
         LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS => create_user_result
             .success()
             .context(CreateLdapUserConflictSnafu { password_cache_ref })?,
+        LDAP_RESULT_CODE_CONSTRAINT_VIOLATION
+            if create_user_result
+                .text
+                .starts_with(AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE) =>
+        {
+            create_user_result
+                .success()
+                .context(CreateLdapUserConflictSnafu { password_cache_ref })?
+        }
         _ => create_user_result.success().context(CreateLdapUserSnafu)?,
     };
     Ok(())
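
Reviewer note: the new match arm can be exercised without an LDAP server by separating the classification from the `ldap3` result type. The sketch below is a standalone approximation, not the code in this commit. `CreateOutcome`, `ad_error_prefix`, and `classify` are hypothetical names, the explicit success arm for result code 0 is added for illustration, and the diagnostic strings in any usage are made up. It also shows how the `"000021C8:"` prefix follows from the numeric error code 0x21C8 under the "8 zero-padded hex digits" rendering that the diff itself calls best-effort.

```rust
// Result codes from RFC 4511, appendix A.1 (same values as in the diff above).
const LDAP_RESULT_CODE_SUCCESS: u32 = 0;
const LDAP_RESULT_CODE_CONSTRAINT_VIOLATION: u32 = 19;
const LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS: u32 = 68;

// Numeric form of the error code behind the "000021C8:" prefix in the diff.
const AD_ERROR_UPN_VALUE_NOT_UNIQUE: u32 = 0x21C8;

/// Hypothetical outcome type for this sketch; the real code returns snafu errors.
#[derive(Debug, PartialEq, Eq)]
enum CreateOutcome {
    Created,
    UserConflict,
    OtherError,
}

/// Renders an AD error code as 8 zero-padded uppercase hex digits plus a colon,
/// which is the prefix format the diff matches against.
fn ad_error_prefix(code: u32) -> String {
    format!("{code:08X}:")
}

/// Classifies a (result code, diagnostic text) pair in the same spirit as the
/// match in `create_ad_user`.
fn classify(rc: u32, text: &str) -> CreateOutcome {
    match rc {
        LDAP_RESULT_CODE_SUCCESS => CreateOutcome::Created,
        LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS => CreateOutcome::UserConflict,
        // Only a UPN uniqueness violation counts as a user conflict; other
        // constraint violations fall through to the generic error arm.
        LDAP_RESULT_CODE_CONSTRAINT_VIOLATION
            if text.starts_with(ad_error_prefix(AD_ERROR_UPN_VALUE_NOT_UNIQUE).as_str()) =>
        {
            CreateOutcome::UserConflict
        }
        _ => CreateOutcome::OtherError,
    }
}
```

Because the match is on a string prefix of an undocumented message format, a constraint violation whose text does not start with that prefix is treated as a generic error rather than a conflict.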
