Rolling HDFS upgrade #571
Merged
Changes from 28 commits
30 commits
a5e9547
Add upgrade mode with serialized deployments
nightkr fc6cc0d
Use deployedProductVersion to decide upgrade mode (but do not automat…
nightkr 2eb38a8
Upgrade docs
nightkr 38809e2
Remove dummy log message
nightkr a36de0f
Move upgrade readiness check into utils module
nightkr acffa82
Fix test build issue
nightkr 98baaad
Regenerate CRDs
nightkr 5a552d3
Docs
nightkr c1e13a2
s/terminal/shell/g
nightkr e1476a2
Update rust/operator-binary/src/hdfs_controller.rs
nightkr 8af1db6
Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
nightkr 44b5e59
Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
nightkr 947931e
Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
nightkr 5970585
Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
nightkr 13129b5
Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
nightkr eb19010
Move upgrade_args to a separate variable
nightkr d5a092a
Merge branch 'feature/upgrade' of github.com:stackabletech/hdfs-opera…
nightkr f0df2b7
Upgrade mode -> compatibility mode
nightkr 49cf9d9
Move rollout tracker into operator-rs
nightkr c582a3a
Update docs/modules/hdfs/pages/usage-guide/upgrading.adoc
nightkr b24c25f
Add note on downgrades
nightkr 1e68f1d
Merge branch 'feature/upgrade' of github.com:stackabletech/hdfs-opera…
nightkr 10e5220
Perform downgrades in order
nightkr 808f926
Add note about status subresource
nightkr a9809ba
Update CRDs
nightkr 0604aa6
s/upgrading_product_version/upgrade_target_product_version/g
nightkr c142421
Switch to main operator-rs
nightkr 6ae8e0b
Update rust/crd/src/lib.rs
nightkr 46eedee
Merge branch 'main' into feature/upgrade
nightkr 2a25ff4
Add guardrail against trying to crossgrade in the middle of another u…
nightkr
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
= Upgrading HDFS | ||
|
||
IMPORTANT: HDFS upgrades are experimental, and details may change at any time. | ||
|
||
Upgrading HDFS currently requires a manual process. This guide walks through an example: upgrading the cluster from our xref:getting_started/index.adoc[Getting Started] guide from HDFS 3.3.6 to 3.4.0. | ||
|
||
|
||
== Preparing for the worst | ||
|
||
Upgrades can fail, and it is important to prepare for when that happens. Apache HDFS supports https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#Downgrade_and_Rollback[two ways to revert an upgrade]: | ||
|
||
Rollback:: Reverts all user data to the pre-upgrade state. Requires taking the cluster offline. | ||
Downgrade:: Downgrades the HDFS software but preserves all changes made by users. Can be performed as a rolling change, keeping the cluster online. | ||
|
||
The Stackable Operator for HDFS supports downgrading but not rollbacks. | ||
|
||
To downgrade, revert the `.spec.image.productVersion` field to the previous version, then proceed to xref:#finalize[finalizing] once the cluster is downgraded: | ||
|
||
[source,shell] | ||
---- | ||
$ kubectl patch hdfs/simple-hdfs --patch '{"spec": {"image": {"productVersion": "3.3.6"}}}' --type=merge | ||
hdfscluster.hdfs.stackable.tech/simple-hdfs patched | ||
---- | ||
|
||
== Preparing HDFS | ||
|
||
HDFS must be configured to initiate the upgrade process. To do this, put the cluster into upgrade mode by running the following commands in an HDFS superuser environment | ||
(either a client configured with a superuser account, or from inside a NameNode pod): | ||
|
||
// This could be automated by the operator, but dfsadmin does not have good machine-readable output. | ||
// It *can* be queried over JMX, but we're not so lucky for finalization. | ||
|
||
|
||
[source,shell] | ||
---- | ||
$ hdfs dfsadmin -rollingUpgrade prepare | ||
PREPARE rolling upgrade ... | ||
Preparing for upgrade. Data is being saved for rollback. | ||
Run "dfsadmin -rollingUpgrade query" to check the status | ||
for proceeding with rolling upgrade | ||
Block Pool ID: BP-841432641-10.244.0.29-1722612757853 | ||
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341) | ||
Finalize Time: <NOT FINALIZED> | ||
|
||
$ # Then run query until HDFS is ready to proceed | ||
$ hdfs dfsadmin -rollingUpgrade query | ||
QUERY rolling upgrade ... | ||
Preparing for upgrade. Data is being saved for rollback. | ||
Run "dfsadmin -rollingUpgrade query" to check the status | ||
for proceeding with rolling upgrade | ||
Block Pool ID: BP-841432641-10.244.0.29-1722612757853 | ||
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341) | ||
Finalize Time: <NOT FINALIZED> | ||
|
||
$ # It is safe to proceed when the output indicates so, like this: | ||
$ hdfs dfsadmin -rollingUpgrade query | ||
QUERY rolling upgrade ... | ||
Proceed with rolling upgrade: | ||
Block Pool ID: BP-841432641-10.244.0.29-1722612757853 | ||
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341) | ||
Finalize Time: <NOT FINALIZED> | ||
---- | ||
|
||
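The repeated query step can be scripted as a polling loop. The following is a minimal sketch, not something the operator provides: the helper names `ready_to_proceed` and `wait_until_ready` and the variables `QUERY_CMD` and `POLL_INTERVAL` are hypothetical, and the readiness check relies only on the `Proceed with rolling upgrade` line shown in the example output above. Run `wait_until_ready` from the same HDFS superuser environment; it takes an optional maximum number of attempts.

```shell
#!/bin/sh
# Sketch only: helper names and variables below are illustrative assumptions.

# Command used to query the rolling upgrade status; override for testing.
QUERY_CMD="${QUERY_CMD:-hdfs dfsadmin -rollingUpgrade query}"
# Seconds to wait between queries.
POLL_INTERVAL="${POLL_INTERVAL:-10}"

# Succeeds if the query output read from stdin says it is safe to proceed.
ready_to_proceed() {
    grep -q 'Proceed with rolling upgrade'
}

# Polls the query command until it reports readiness, or gives up after
# the given number of attempts (default: 60).
wait_until_ready() {
    attempts="${1:-60}"
    while [ "$attempts" -gt 0 ]; do
        if $QUERY_CMD | ready_to_proceed; then
            echo "HDFS is ready to proceed with the rolling upgrade"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$POLL_INTERVAL"
    done
    echo "Timed out waiting for HDFS to become ready" >&2
    return 1
}
```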
== Starting the upgrade | ||
|
||
Once HDFS is ready to upgrade, the HdfsCluster can be updated with the new product version: | ||
|
||
[source,shell] | ||
---- | ||
$ kubectl patch hdfs/simple-hdfs --patch '{"spec": {"image": {"productVersion": "3.4.0"}}}' --type=merge | ||
hdfscluster.hdfs.stackable.tech/simple-hdfs patched | ||
---- | ||
|
||
Then wait until all pods have restarted, are in the Ready state, and are running the new HDFS version. | ||
|
||
NOTE: This will automatically enable the NameNodes' compatibility mode, allowing them to start despite the fsImage version mismatch. | ||
|
||
NOTE: Services will be upgraded in order: JournalNodes, then NameNodes, then DataNodes. | ||
|
||
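One way to check that the rollout has completed is to list the image of every pod and confirm that each one references the new version. This is a rough sketch under two assumptions that are not stated in this guide: that the cluster's pods carry the label `app.kubernetes.io/instance=simple-hdfs`, and that the product version appears in the image reference. The helper names `list_pod_images` and `all_pods_on_version` are hypothetical.

```shell
#!/bin/sh
# Sketch only: the label selector and the assumption that the product version
# appears in the image reference are illustrative, not taken from the operator.

SELECTOR="${SELECTOR:-app.kubernetes.io/instance=simple-hdfs}"

# Prints "<pod name><TAB><image> [<image>...]" for every matching pod.
list_pod_images() {
    kubectl get pods -l "$SELECTOR" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Reads list_pod_images output from stdin and succeeds only if at least one
# pod is listed and every listed line mentions the expected version.
all_pods_on_version() {
    awk -v version="$1" '
        NF > 0 {
            seen++
            if (index($0, version) == 0) {
                stale++
                print "not upgraded: " $1
            }
        }
        END { exit (seen == 0 || stale > 0) }
    '
}

# Usage:
#   kubectl wait --for=condition=Ready pod -l "$SELECTOR" --timeout=600s
#   list_pod_images | all_pods_on_version 3.4.0
```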
[#finalize] | ||
== Finalizing the upgrade | ||
|
||
Once all HDFS pods are running the new version, the HDFS upgrade can be finalized (from the HDFS superuser environment as described in the preparation step): | ||
|
||
[source,shell] | ||
---- | ||
$ hdfs dfsadmin -rollingUpgrade finalize | ||
FINALIZE rolling upgrade ... | ||
Rolling upgrade is finalized. | ||
Block Pool ID: BP-841432641-10.244.0.29-1722612757853 | ||
Start Time: Fri Aug 02 15:49:12 GMT 2024 (=1722613752341) | ||
Finalize Time: Fri Aug 02 15:58:39 GMT 2024 (=1722614319854) | ||
---- | ||
|
||
// We can't safely automate this, because finalize is asynchronous and doesn't tell us whether all NameNodes have even received the request to finalize. | ||
|
||
|
||
WARNING: Please ensure that all NameNodes are running and available before proceeding. NameNodes that have not finalized yet will crash on launch when taken out of compatibility mode. | ||
|
||
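NameNode availability can be checked from the HDFS superuser environment before continuing. The sketch below assumes that `hdfs haadmin -getAllServiceState` prints one `<host:port> <state>` line per NameNode; the helper name `all_namenodes_available` is hypothetical. It only confirms that every NameNode responds with a regular HA state, not that each one has processed the finalize request.

```shell
#!/bin/sh
# Sketch only: assumes `hdfs haadmin -getAllServiceState` prints one
# "<host:port> <state>" line per NameNode; the function name is illustrative.

# Reads the service state listing from stdin. Succeeds only if at least one
# NameNode is listed and every one reports active, standby, or observer.
all_namenodes_available() {
    awk '
        NF > 0 {
            seen++
            if ($2 != "active" && $2 != "standby" && $2 != "observer") {
                unavailable++
                print "unavailable: " $1
            }
        }
        END { exit (seen == 0 || unavailable > 0) }
    '
}

# Usage:
#   hdfs haadmin -getAllServiceState | all_namenodes_available
```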
Finally, mark the cluster as upgraded: | ||
|
||
[source,shell] | ||
---- | ||
$ kubectl patch hdfs/simple-hdfs --subresource=status --patch '{"status": {"deployedProductVersion": "3.4.0"}}' --type=merge | ||
hdfscluster.hdfs.stackable.tech/simple-hdfs patched | ||
---- | ||
|
||
NOTE: `deployedProductVersion` is located in the _status_ subresource, which most graphical editors cannot modify. With `kubectl`, the `--subresource=status` flag is required to change it. | ||
|
||
The NameNodes will then be restarted a final time, taking them out of compatibility mode. |
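To confirm that the cluster is marked as upgraded, the two version fields used in this guide can be read back and compared. This is a minimal sketch: the helper names `hdfs_field`, `versions_match`, and `check_upgrade_complete` are hypothetical, and it assumes that matching values of `.spec.image.productVersion` and `.status.deployedProductVersion` mean no upgrade is pending.

```shell
#!/bin/sh
# Sketch only: function names are illustrative; the field paths come from the
# patch commands in this guide.

# Prints the value at the given JSONPath of the HdfsCluster.
hdfs_field() {
    kubectl get hdfs/simple-hdfs -o jsonpath="$1"
}

# Succeeds if the two versions passed in are non-empty and equal.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Reports whether the cluster is marked as fully upgraded.
check_upgrade_complete() {
    spec_version=$(hdfs_field '{.spec.image.productVersion}')
    deployed_version=$(hdfs_field '{.status.deployedProductVersion}')
    if versions_match "$spec_version" "$deployed_version"; then
        echo "upgrade complete: running $deployed_version"
    else
        echo "upgrade pending: spec=$spec_version deployed=$deployed_version"
        return 1
    fi
}
```

Calling `check_upgrade_complete` after the final patch should report the new version once the status has been updated.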