Hi everyone. Idea
FAQ

What is linuxhw/Smart?
This is a project to estimate reliability of desktop-class HDD/SSD drives by the analysis of SMART data collected by Linux users at https://linux-hardware.org/. The primary aim of the project is to find drives with the longest power-on hours (POH) and minimal number of errors, i.e. maximal MTBF (mean time between failures).

What is orbit-db?
Peer-to-Peer Databases for the Decentralized Web

Why orbit-db and linuxhw/Smart?

What is solid-project?

Question/feedback
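To make the MTBF notion from the linuxhw/Smart FAQ entry concrete, here is a minimal sketch using the textbook definition (total operating hours divided by number of failures). It is not the project's actual computation; the record layout, the `mtbf_by_model` helper, and the sample values are all hypothetical.

```python
from collections import defaultdict


def mtbf_by_model(records):
    """Estimate MTBF per drive model.

    records: iterable of (model, power_on_hours, failures) tuples.
    This layout is an assumption for illustration only.
    """
    hours = defaultdict(float)
    failures = defaultdict(int)
    for model, poh, n_fail in records:
        hours[model] += poh
        failures[model] += n_fail

    result = {}
    for model in hours:
        # With zero observed failures, MTBF cannot be bounded from
        # this data alone, so report None instead of dividing by zero.
        if failures[model]:
            result[model] = hours[model] / failures[model]
        else:
            result[model] = None
    return result


# Hypothetical sample, not real measurements.
sample = [
    ("model-a", 10000, 1),
    ("model-a", 30000, 1),
    ("model-b", 5000, 0),
]
print(mtbf_by_model(sample))
```

Note that a model with few samples and no failures says little about reliability, which is why the sketch returns `None` rather than an infinite MTBF.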
Replies: 1 comment 2 replies
It is completely unsuitable. We cannot trust crowds for this data. Disk vendors will just inject misinformation, so the data would become worse than just bullshit. If someone did research based on this data and published it, people could get a false impression of drive reliability. When one says "Vendor N's drives are shit, don't buy them", someone else can reply "but according to the study ... that vendor is the best" and give the link, while the abstract won't say "the data was collected from an untrusted source and was likely manipulated by vendors".

So, no, crowdsourcing this is an extremely bad idea. Collecting this data in a centralized way, by parties interested in improved disk reliability, has been done for a long time. If you need data for research, just use the Backblaze datasets.

What is needed is parsing the console outputs of … Also needed is improving my …

You may also find https://github.com/KOLANICH-ML/HDDModelDecoder.py and https://github.com/KOLANICH-ML/backblaze_analytics useful.
codehangen, I have done some research on the topic and tried to train XGBoost models (survival regression; I also tried plugging Weibull in there). The result was that only the vendor had an effect on mortality. If `>` means "expected lifetime is longer", then Hitachi > HGST > Toshiba > WD > Seagate, and everything else was insignificant. Even the number of platters had no effect. Even the model year had no effect (the studied models were limited to those present in the Backblaze dataset). I guess a better model can be created, but it would require a lot of data cleansing (datag should be finished first), in order to train an embedding and try to use it as features instead of …
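For readers unfamiliar with survival analysis: the key difficulty is that most drives are still alive when observation ends (right-censoring), so lifetimes cannot simply be averaged. Below is a minimal sketch of the Kaplan-Meier estimator, the simplest tool that handles censoring. It is not the XGBoost survival regression described above, and the `kaplan_meier` helper and sample data are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct failure time.

    durations: time each drive was under observation.
    observed:  True if the drive failed at that time,
               False if it was still alive (right-censored).
    """
    events = sorted(zip(durations, observed))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        deaths = 0
        removed = 0
        # Group all drives that leave observation at the same time t.
        while i < len(events) and events[i][0] == t:
            deaths += int(events[i][1])
            removed += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk drives surviving t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Failed and censored drives both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical sample: the drive observed for 2 units is censored.
print(kaplan_meier([1, 2, 3, 4], [True, False, True, True]))
```

Running this per vendor and comparing the resulting curves gives a rough, model-free view of the kind of ordering discussed above; a regression model is still needed to separate the vendor's effect from other features.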