"all solution for linux" #4

ghost · 2022-12-14T20:35:29Z

Hi everyone.

Idea

I'm thinking of using orbit-db as the database for linuxhw/Smart.
I would like every Linux distro to be able to share information about compatible hardware. The approach I describe in this topic could help solve the problem of finding hardware that is compatible with Linux.

FAQ

What is linuxhw/Smart?

This is a project to estimate reliability of desktop-class HDD/SSD drives by the analysis of SMART data collected by Linux users at https://linux-hardware.org/. The primary aim of the project is to find drives with longest power-on hours (POH) and minimal number of errors, i.e. maximal MTBF (mean time between failures).
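MTBF is commonly estimated as total operating time divided by the number of observed failures. The sketch below shows that naive calculation from per-drive power-on hours. It is an illustration under that textbook definition, not how linuxhw/Smart actually computes its rankings (the project description above does not say), and `mtbf_hours` is a hypothetical helper name.

```python
def mtbf_hours(power_on_hours: list[float], failures: int) -> float:
    """Naive MTBF estimate: total fleet operating hours / observed failures.

    Hypothetical helper for illustration; real reliability studies also have
    to deal with censoring (drives that have not failed *yet*).
    """
    if failures <= 0:
        # With zero failures the ratio is undefined (only a lower bound exists).
        raise ValueError("MTBF is undefined without at least one observed failure")
    return sum(power_on_hours) / failures


# Three drives with 10k, 20k and 30k power-on hours and 2 failures in total.
print(mtbf_hours([10_000, 20_000, 30_000], failures=2))  # 30000.0
```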

What is orbit-db?

Peer-to-Peer Databases for the Decentralized Web

Why orbit-db and linuxhw/Smart?

You could distribute hardware information to different peers in a decentralized and distributed network. It would not need a vendor or central server. This makes it easier to access information and to share data privately, anonymously, or publicly, without intermediaries.
"AFAIK orbit-db is not that much permanent (aka would lose all the data if there is no peer remaining)" - this could be a problem for a bug tracker. In my view, linuxhw/Smart is a kind of bug tracker in which hardware information is distributed and decentralized, so orbit-db's lack of persistence could be an obstacle from the start. However, there are tools like git-bug that could be used as an alternative or complement to orbit-db.
Another interesting option is to use solid-project or IPFS. Imagine that each hardware manufacturer maintains information about the devices it has created; users could then decide whether the information about their own devices is public or private.
"This is a project to estimate the reliability of desktop class HDD/SSD drives by analyzing SMART data collected by Linux users" - linuxhw/Smart could benefit from git-bug, IPFS, orbit-db, or solid-project.

What is solid-project?

Solid is a specification that lets people store their data securely in decentralized data stores called Pods. Pods are like secure personal web servers for data. When data is stored in someone's Pod, they control which people and applications can access it.
Solid-project could be an alternative to orbit-db and git-bug, or it could be a complement.

Question/feedback

What do you all think of this idea?
Would it be useful to have a decentralized, distributed database for sharing hardware information on Linux?
What do you all think of using protocols like solid-project to share hardware information on Linux?
What do you all think of using protocols like IPFS to share hardware information on Linux?
Have you ever considered git-bug as a way to share hardware information on Linux?

Answered by KOLANICH

KOLANICH · 2022-12-14T21:47:40Z

codehangen, I have done some research on the topic and tried to train XGBoost models (survival regression; I also tried plugging a Weibull distribution in there). The result was that only the vendor had an effect on mortality. If > means "expected lifetime is longer", then Hitachi > HGST > Toshiba > WD > Seagate, and everything else was insignificant. Even the number of platters had no effect. Even the model year had no effect (among the studied models, which were limited to the ones present in the Backblaze dataset). I guess a better model can be created, but it would require a lot of data cleansing (datag should be finished first) in order to train an embedding and use it as features instead of the drives' direct features.
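The "expected lifetime is longer" ordering can be illustrated with a Weibull model: once a shape k and scale λ are fitted for a group of drives, the survival function is S(t) = exp(-(t/λ)^k) and the expected lifetime is λ·Γ(1 + 1/k). The minimal sketch below only shows that last step (turning fitted parameters into a ranking). It does not reproduce the XGBoost survival models mentioned above, and the group names and parameter values are hypothetical placeholders, not results from the Backblaze data.

```python
import math


def weibull_survival(t: float, shape: float, scale: float) -> float:
    """Probability that a drive is still alive after t hours: exp(-(t/scale)**shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_expected_lifetime(shape: float, scale: float) -> float:
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def rank_by_expected_lifetime(params: dict[str, tuple[float, float]]) -> list[str]:
    """Return group names ordered so that earlier groups have a longer expected lifetime."""
    return sorted(params, key=lambda name: weibull_expected_lifetime(*params[name]), reverse=True)


# Hypothetical (shape, scale-in-hours) pairs, purely for illustration.
fitted = {"group_a": (1.0, 40_000.0), "group_b": (1.5, 35_000.0)}
print(rank_by_expected_lifetime(fitted))  # ['group_a', 'group_b']
```

With shape 1 the Weibull reduces to an exponential distribution, so the expected lifetime equals the scale; larger shapes model wear-out (a failure rate that grows with age).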

"You could distribute hardware information to different peers in a decentralized and distributed network. It would not need a vendor or central server."

It is completely unsuitable. We cannot trust crowds with this data. Disk vendors will just inject misinformation, so the data would become worse than just bullshit: if someone did research based on this data and published it, people could get a false impression about the reliability of drives. When one person says "Vendor N's drives are shit, don't buy them", someone else can reply "but according to the study ... that vendor is the best" and give the link, yet the abstract won't say "the data was collected from an untrusted source and was likely manipulated by vendors".

So, no, crowdsourcing this is an extremely bad idea. And parties interested in improving disk reliability have been collecting this data in a centralized way for a long time. If you need data for research, just use Backblaze's.

What is needed is parsing the smartctl console outputs present in this repo into machine-readable feature vectors that I can put into my machine learning pipeline.
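As a minimal sketch of that parsing step, the code below reads the ATA attribute table printed by `smartctl -A` (columns ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and turns it into a fixed-order feature vector. It assumes that standard table layout; NVMe reports and other output variants are not handled. The helper names and the sample rows are hypothetical and illustrative, not taken from the repo.

```python
import re

# One row of the `smartctl -A` ATA attribute table, e.g.
#   9 Power_On_Hours  0x0032  085  085  000  Old_age  Always  -  13546
ATTR_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<name>\S+)\s+0x[0-9a-fA-F]+"
    r"\s+(?P<value>\d+)\s+(?P<worst>\d+)\s+(?P<thresh>\S+)"  # THRESH may be '---'
    r"\s+\S+\s+\S+\s+\S+"  # TYPE, UPDATED, WHEN_FAILED
    r"\s+(?P<raw>.+?)\s*$"
)


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Map SMART attribute ID -> leading integer of its RAW_VALUE column."""
    attrs: dict[int, int] = {}
    for line in text.splitlines():
        m = ATTR_ROW.match(line)
        if not m:
            continue  # header lines, device info, blank lines, etc.
        # Raw values such as "35 (Min/Max 20/45)" keep only the leading number.
        raw = re.match(r"\d+", m.group("raw"))
        if raw:
            attrs[int(m.group("id"))] = int(raw.group())
    return attrs


def to_feature_vector(attrs: dict[int, int], ids: list[int], missing: int = -1) -> list[int]:
    """Fixed-order feature vector; attributes a drive does not report get `missing`."""
    return [attrs.get(i, missing) for i in ids]


sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       13546
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Min/Max 20/45)
"""
print(to_feature_vector(parse_smart_attributes(sample), [5, 9, 187, 194]))  # [0, 13546, -1, 35]
```

Using a sentinel for missing attributes keeps every vector the same length, which most ML pipelines expect; newer smartctl releases can also emit JSON (`-j`), which avoids text parsing for freshly collected reports.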

What is also needed is improving my datag library so that it can build fully automatic dataset-cleansing pipelines.
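To illustrate what a "cleansing pipeline" means here, the sketch below composes small cleansing steps (drop rows missing a field, normalize model strings, deduplicate by serial number) and runs them in order. This is not datag's API, which is not described in this thread; every function and field name is hypothetical, chosen only to show the composable-steps idea.

```python
from collections.abc import Callable

Record = dict[str, str]
Step = Callable[[list[Record]], list[Record]]


def drop_missing(field: str) -> Step:
    """Hypothetical step: drop records whose `field` is absent or empty."""
    def step(rows: list[Record]) -> list[Record]:
        return [r for r in rows if r.get(field) not in (None, "")]
    return step


def normalize_model(rows: list[Record]) -> list[Record]:
    """Hypothetical step: collapse whitespace and upper-case the model string."""
    return [{**r, "model": " ".join(r["model"].split()).upper()} for r in rows]


def dedupe_by(field: str) -> Step:
    """Hypothetical step: keep only the first record for each value of `field`."""
    def step(rows: list[Record]) -> list[Record]:
        seen: set[str] = set()
        out: list[Record] = []
        for r in rows:
            if r[field] not in seen:
                seen.add(r[field])
                out.append(r)
        return out
    return step


def run_pipeline(rows: list[Record], steps: list[Step]) -> list[Record]:
    """Apply each cleansing step in order."""
    for step in steps:
        rows = step(rows)
    return rows


rows = [
    {"serial": "A1", "model": " wdc  wd10efrx "},
    {"serial": "A1", "model": "WDC WD10EFRX"},
    {"serial": "B2", "model": ""},
]
print(run_pipeline(rows, [drop_missing("model"), normalize_model, dedupe_by("serial")]))
# [{'serial': 'A1', 'model': 'WDC WD10EFRX'}]
```

Order matters: normalizing before deduplication lets differently formatted copies of the same report collapse together.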

Also you may find https://github.com/KOLANICH-ML/HDDModelDecoder.py and https://github.com/KOLANICH-ML/backblaze_analytics useful.

2 replies

ghost Dec 14, 2022

Hi KOLANICH, thank you for the feedback.

ghost Feb 19, 2023

Hi KOLANICH, great feedback!
