/mnt/smart is filling up FAST! Clean up!
#143
---
98%, people!
---
🚨 Urgent –
| What you might notice | Why it happens |
|---|---|
| Jobs stop, hang, or crash. | Lustre needs “scratch room” to record every change. With zero room, it can’t finish even tiny writes. |
| Even simple reads become slow. | The system keeps trying (and failing) to free space. |
| Possible downtime for repairs. | If Lustre can’t update its own “index”, we might have to take it offline eventually. |
Think of Lustre as a giant shared whiteboard: when every square is filled, nobody can add anything until space is cleared.
How Lustre differs from our HDisk (`/mnt/cold02`):
| | Lustre (`/mnt/smart`) | HDisk (`/mnt/cold02`) |
|---|---|---|
| Purpose | Fast, temporary workspace for running jobs. | Slower, long-term storage for completed results or files. |
| When full | Cluster-wide freeze of reads and writes. | Shared slowdown, but jobs can keep running. |
| Ideal free-space margin | Keep ≥ 15 % free (max. 85 % used; we are at 100 %!!). | Keep ≥ 10 % free (max. 90 % used). |
Both are shared, but Lustre is much more sensitive because it must constantly update its internal table of contents while jobs stream data.
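If you want to see the cluster-wide picture yourself, the standard Lustre client tools report per-target usage (a quick sketch; it assumes `lfs` is available on the login nodes, as it is on most Lustre clients):

```bash
lfs df -h /mnt/smart   # human-readable usage per MDT/OST; a full OST is what blocks new writes
```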
What to do right now ✅
- Check your space:

  ```bash
  lfs quota -u $USER /mnt/smart   # usage on Lustre
  df -h /mnt/cold02               # quick look at HDisk
  ```

- Delete or move anything you don’t need (see the sketch right after this list for hunting down the biggest offenders):
  - Old outputs, temp files, logs, duplicates.
  - Move large, rarely-used files to `/mnt/cold02` or your own personal storage disks:

    ```bash
    rsync -aP /path/to/my_results /path/to/destination
    ```

    (⚠️ remember to always have an external back-up of your data!)
  - Compress bulky folders:

    ```bash
    tar -czf my_results.tar.gz my_results/
    ```
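Not sure what is actually eating your space? A minimal sketch for locating big and stale files; `/mnt/smart/$USER` is only a guess at your scratch path, so adjust it to wherever your data lives:

```bash
# Largest subdirectories of your scratch area, biggest first
du -h --max-depth=2 /mnt/smart/$USER 2>/dev/null | sort -rh | head -20

# Files untouched for 90+ days; lfs find is much cheaper than plain find on Lustre
lfs find /mnt/smart/$USER -type f -mtime +90 -print
```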
What is being done ⌛
From the e-mail everyone on the `smart-users` list received yesterday, roughly translated to English by me:
> Hello,
>
> As shown on the Migration tab of the storage dashboard (https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vSrkhbptpqiLsZXOrGP6vqhSDKvuwYT77YIu01q-aCAJhpXTyatiTT2AiIsfcYjH_LZg3t4Hd1spyUS/pubhtml?pli=1#), we are currently making a full backup of the /users directory, which was outside the intended scratch area.
>
> Transfer speed is quite slow because COLD02 is not yet connected to the high-speed private network (we will address this soon). For now, we must copy the data over the public network, with the usual bandwidth limits. The average rate is approximately 2.5 hours per terabyte, so we estimate the backup will be completed before Friday, July 17th.
>
> If everything looks good, on Monday, July 21st, we will delete /mnt/smart/users, freeing roughly 100 TB on Lustre. That space will then be redistributed to groups according to their quotas.
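A back-of-the-envelope check on that timeline (my own arithmetic, assuming the `/users` tree is roughly the same ~100 TB that will be freed):

```bash
# 2.5 h per TB over the public network, ~100 TB to copy:
echo "$(( 100 * 25 / 10 )) hours ≈ $(( 100 * 25 / 10 / 24 )) days"   # prints: 250 hours ≈ 10 days
```

So the copy alone needs about ten days of uninterrupted transfer, consistent with the July 17th estimate.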
---
Thanks for the info, Fran. How reliable is the connection to …
---
Please check whether you can remove things there. This is serious!