
Document AD principal conflicts #408


Merged
merged 6 commits into from
Apr 23, 2024
15 changes: 15 additions & 0 deletions docs/modules/secret-operator/pages/secretclass.adoc
@@ -107,6 +107,21 @@ Principals will be created dynamically if they do not already exist.

The administrator keytab must have permission to add principals and get their keys. This corresponds to the flags `ae` in `kadm5.acl`.

==== Active Directory

[#ad-principal-conflicts]
===== Principal Conflicts

We recommend that each Active Directory domain be used by only a single Kubernetes cluster.

This is because each Pod, Service, and Node may be provisioned a principal matching its hostname, and principal names must be unique within a single AD domain.
The Stackable Secret Operator will cache and reuse these credentials within a single Kubernetes cluster, but will not share them across multiple clusters.

If the same AD domain _is_ shared between multiple Kubernetes clusters, the following _must_ be unique across the AD domain:

- The Kubernetes Nodes' names and fully qualified domain names
- The Kubernetes Namespaces' names (only Namespaces that use Kerberos)
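To illustrate why Namespace names matter, here is a minimal Rust sketch. It assumes, hypothetically, that each principal is named `HTTP/<hostname>@<REALM>` and that Service hostnames follow the default `<service>.<namespace>.svc.cluster.local` pattern; the helpers `service_principal` and `find_conflicts` are illustrative only and not part of the Secret Operator. Two clusters that both run a Service `web` in the Namespace `default` end up requesting the same principal, which cannot coexist in one AD domain.

```rust
use std::collections::HashSet;

/// Hypothetical naming scheme: `HTTP/<service>.<namespace>.svc.cluster.local@<REALM>`.
fn service_principal(service: &str, namespace: &str, realm: &str) -> String {
    format!("HTTP/{service}.{namespace}.svc.cluster.local@{realm}")
}

/// Returns the principals that both clusters would try to create (sorted, deduplicated).
fn find_conflicts(cluster_a: &[String], cluster_b: &[String]) -> Vec<String> {
    let a: HashSet<&String> = cluster_a.iter().collect();
    let mut conflicts: Vec<String> = cluster_b
        .iter()
        .filter(|p| a.contains(p))
        .cloned()
        .collect();
    conflicts.sort();
    conflicts.dedup();
    conflicts
}

fn main() {
    let realm = "EXAMPLE.COM";
    // Both clusters use the Namespace `default`, so their `web` Services collide.
    let cluster_a = vec![
        service_principal("web", "default", realm),
        service_principal("db", "team-a", realm),
    ];
    let cluster_b = vec![
        service_principal("web", "default", realm),
        service_principal("db", "team-b", realm),
    ];
    let conflicts = find_conflicts(&cluster_a, &cluster_b);
    assert_eq!(conflicts, vec![service_principal("web", "default", realm)]);
    println!("{conflicts:?}");
}
```

Under these assumptions, keeping the Kerberos-using Namespace names (and the Node names) distinct per cluster makes the hostnames, and therefore the principals, differ.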

==== Reference

[source,yaml]
26 changes: 25 additions & 1 deletion docs/modules/secret-operator/pages/troubleshooting.adoc
@@ -1,6 +1,10 @@
= Troubleshooting

== My secret-consuming Pods get stuck `Pending`!
[#general]
== General

[#pod-stuck-pending]
=== My secret-consuming Pods get stuck `Pending`!

. Does the Pod have any events relating to scheduling? (`kubectl describe pod/$POD_NAME`)
. Is the PersistentVolumeClaim being created? It should have the name `$POD_NAME-$VOLUME_NAME`.
@@ -16,3 +20,23 @@
. Does the secret-operator sidecar container named `node-driver-registrar` have any relevant log entries?
. Does the kubelet have any relevant log entries?
. When running on OpenShift, also have a look at the xref:openshift.adoc[OpenShift documentation].

[#active-directory]
== Active Directory

[#active-directory-ldap-user-conflict]
=== LDAP user already exists

The Stackable Secret Operator maintains a cache of Active Directory user credentials. This error occurs when a required user is missing from the cache but exists in AD.

This can have several root causes:

1. A race condition where multiple Pods require the same identity at the same time, leading to a stale cache.
This is transient and should resolve itself within a few seconds, once the cache is updated and any failed attempts are retried.

2. Trying to reuse a single AD domain across multiple Kubernetes clusters.
Doing this safely requires care; see xref:secretclass.adoc#ad-principal-conflicts[Principal Conflicts] for more information.
3. Deleting a user from the cache but not from AD (including deleting the whole cache).
When deleting users from the cache, make sure to also delete the corresponding AD users. The Secret Operator should then automatically recreate them.
4. The to-be-created principal conflicts with an existing, unrelated principal.
This has to be resolved manually by an administrator.
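The cache-versus-AD mismatch behind this error can be modelled with a minimal Rust sketch. The in-memory `cache` and `ad_users` sets and the `provision` function are hypothetical stand-ins for the Secret Operator's password cache and the AD directory, not its actual implementation; the sketch only shows why a user that exists in AD but is missing from the cache yields a conflict rather than being reused, and why deleting the AD user as well lets it be recreated.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// The user was found in the cache, so its credentials are reused.
    Reused,
    /// The user was missing everywhere, so it was created in AD and cached.
    Created,
    /// The user exists in AD but not in the cache ("LDAP user already exists").
    Conflict,
}

/// Hypothetical model of provisioning a single user.
fn provision(user: &str, cache: &mut HashSet<String>, ad_users: &mut HashSet<String>) -> Outcome {
    if cache.contains(user) {
        return Outcome::Reused;
    }
    // Cache miss: try to create the user in AD. `insert` returns false if it already existed.
    if !ad_users.insert(user.to_string()) {
        return Outcome::Conflict;
    }
    cache.insert(user.to_string());
    Outcome::Created
}

fn main() {
    let mut cache = HashSet::new();
    let mut ad_users = HashSet::new();
    assert_eq!(provision("web", &mut cache, &mut ad_users), Outcome::Created);
    assert_eq!(provision("web", &mut cache, &mut ad_users), Outcome::Reused);

    // Root cause 3: the cache is wiped but the AD user is left behind.
    cache.clear();
    assert_eq!(provision("web", &mut cache, &mut ad_users), Outcome::Conflict);

    // Remedy: delete the AD user too, after which it is recreated.
    ad_users.remove("web");
    assert_eq!(provision("web", &mut cache, &mut ad_users), Outcome::Created);
    println!("ok");
}
```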

18 changes: 17 additions & 1 deletion rust/krb5-provision-keytab/src/active_directory.rs
@@ -60,7 +60,8 @@ pub enum Error {
CreateLdapUser { source: ldap3::LdapError },

#[snafu(display(
"LDAP user already exists, either delete it manually or add it to the password cache ({password_cache_ref})"
"LDAP user already exists but is missing from the password cache ({password_cache_ref}) (hint: see {link})",
link = "https://docs.stackable.tech/home/nightly/secret-operator/troubleshooting.html#active-directory-ldap-user-conflict"
))]
CreateLdapUserConflict {
source: ldap3::LdapError,
@@ -76,8 +77,14 @@ pub enum Error {
pub type Result<T, E = Error> = std::result::Result<T, E>;

// Result codes are defined by https://www.rfc-editor.org/rfc/rfc4511#appendix-A.1
const LDAP_RESULT_CODE_CONSTRAINT_VIOLATION: u32 = 19;
const LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS: u32 = 68;

// Error codes from https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/manage/component-updates/spn-and-upn-uniqueness#symptoms.
// Rendered in LDAP error messages as 8 zero-padded hex digits.
// BEST-EFFORT ONLY. THE SPECIFIC FORMAT IS NOT DOCUMENTED.
const AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE: &str = "000021C8:";

pub struct AdAdmin<'a> {
ldap: Ldap,
krb: &'a KrbContext,
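The `AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE` constant is the AD error code 0x21C8 (8648 in decimal) rendered, as the comment describes, as eight zero-padded hex digits followed by a colon. A minimal sketch of deriving and checking such a prefix; `ad_error_prefix` and `has_ad_error` are hypothetical helpers, not part of the codebase, and the upper-case formatting is an assumption taken from the constant's spelling.

```rust
/// Hypothetical helper: renders an AD error code as 8 zero-padded, upper-case
/// hex digits followed by a colon, matching the spelling of the constant.
fn ad_error_prefix(code: u32) -> String {
    format!("{code:08X}:")
}

/// Best-effort check mirroring `text.starts_with(AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE)`.
/// `starts_with` is case-sensitive, so the hex casing must match exactly.
fn has_ad_error(text: &str, code: u32) -> bool {
    text.starts_with(&ad_error_prefix(code))
}

fn main() {
    assert_eq!(0x21C8_u32, 8648);
    assert_eq!(ad_error_prefix(0x21C8), "000021C8:");

    // Placeholder message text for illustration only.
    assert!(has_ad_error("000021C8: <rest of diagnostic>", 0x21C8));
    assert!(!has_ad_error("000021c8: lower-case hex does not match", 0x21C8));
    println!("ok");
}
```

Because the message format is undocumented, a prefix check like this is inherently best-effort: if AD ever rendered the code differently, the guard would simply not match and the error would fall through to the generic case.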
@@ -293,6 +300,15 @@ async fn create_ad_user(
LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS => create_user_result
.success()
.context(CreateLdapUserConflictSnafu { password_cache_ref })?,
LDAP_RESULT_CODE_CONSTRAINT_VIOLATION
if create_user_result
.text
.starts_with(AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE) =>
{
create_user_result
.success()
.context(CreateLdapUserConflictSnafu { password_cache_ref })?
}
_ => create_user_result.success().context(CreateLdapUserSnafu)?,
};
Ok(())
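The new arm relies on a match guard: a constraint violation (result code 19) is treated as a user conflict only when its text carries the UPN-uniqueness prefix, while every other constraint violation falls through to the generic error. A self-contained sketch of that dispatch, using a hypothetical `LdapOutcome` struct (with `rc` and `text` fields) in place of the `ldap3` result type and a hypothetical `Classification` enum in place of the real error variants:

```rust
const LDAP_RESULT_CODE_SUCCESS: u32 = 0;
const LDAP_RESULT_CODE_CONSTRAINT_VIOLATION: u32 = 19;
const LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS: u32 = 68;
const AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE: &str = "000021C8:";

/// Hypothetical stand-in for an LDAP result: numeric result code plus diagnostic text.
struct LdapOutcome {
    rc: u32,
    text: String,
}

#[derive(Debug, PartialEq)]
enum Classification {
    Success,
    /// Corresponds to `CreateLdapUserConflict` in the diff.
    Conflict,
    /// Corresponds to the generic `CreateLdapUser` error.
    OtherError,
}

fn classify(result: &LdapOutcome) -> Classification {
    match result.rc {
        LDAP_RESULT_CODE_SUCCESS => Classification::Success,
        LDAP_RESULT_CODE_ENTRY_ALREADY_EXISTS => Classification::Conflict,
        // Guarded arm: only UPN-uniqueness violations count as conflicts.
        LDAP_RESULT_CODE_CONSTRAINT_VIOLATION
            if result.text.starts_with(AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE) =>
        {
            Classification::Conflict
        }
        // Any other code, including other constraint violations.
        _ => Classification::OtherError,
    }
}

fn main() {
    let upn_clash = LdapOutcome {
        rc: LDAP_RESULT_CODE_CONSTRAINT_VIOLATION,
        text: format!("{AD_CONSTRAINT_PREFIX_UPN_VALUE_NOT_UNIQUE} placeholder"),
    };
    let other_violation = LdapOutcome {
        rc: LDAP_RESULT_CODE_CONSTRAINT_VIOLATION,
        text: "some other constraint violation".to_string(),
    };
    assert_eq!(classify(&upn_clash), Classification::Conflict);
    assert_eq!(classify(&other_violation), Classification::OtherError);
    println!("ok");
}
```

Placing the guarded arm before the `_` catch-all is what keeps unrelated constraint violations reported as ordinary creation failures instead of pointing users at the cache-conflict troubleshooting entry.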