Commit 6d206b1

idosch authored and davem330 committed
mlxsw: core_thermal: Fix fan speed in maximum cooling state
The cooling levels array is supposed to prevent the system fans from
being configured below a 20% duty cycle as otherwise some of them get
stuck at 0 RPM.

Due to an off-by-one error, the last element in the array was not
initialized, causing it to be set to zero, which in turn led to fans
being configured with a 0% duty cycle in maximum cooling state.

Since commit 332fdf9 ("mlxsw: thermal: Fix out-of-bounds memory
accesses") the contents of the array are static. Therefore, instead of
fixing the initialization of the array, simply remove it and adjust
thermal_cooling_device_ops::set_cur_state() so that the configured duty
cycle is never set below 20%.

Before:

 # cat /sys/class/thermal/thermal_zone0/cdev0/type
 mlxsw_fan
 # echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state
 # cat /sys/class/hwmon/hwmon0/name
 mlxsw
 # cat /sys/class/hwmon/hwmon0/pwm1
 0

After:

 # cat /sys/class/thermal/thermal_zone0/cdev0/type
 mlxsw_fan
 # echo 10 > /sys/class/thermal/thermal_zone0/cdev0/cur_state
 # cat /sys/class/hwmon/hwmon0/name
 mlxsw
 # cat /sys/class/hwmon/hwmon0/pwm1
 255

This bug was uncovered when the thermal subsystem repeatedly tried to
configure the cooling devices to their maximum state due to another
issue [1]. This resulted in the fans being stuck at 0 RPM, which
eventually led to the system undergoing thermal shutdown.

[1] https://lore.kernel.org/netdev/ZA3CFNhU4AbtsP4G@shredder/

Fixes: a421ce0 ("mlxsw: core: Extend cooling device with cooling levels")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Vadim Pasternak <vadimp@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
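
The off-by-one described in the commit message can be reproduced in user space. The following is a minimal sketch, not driver code: the names `MAX_STATE`, `MIN_STATE`, `init_levels_buggy()` and `init_levels_inclusive()` are hypothetical, the values 10 and 2 are assumptions chosen to match the "echo 10" and "20%" figures in the message, and `memset()` stands in for an assumed zeroed allocation. The inclusive variant shows the alternative fix that this commit chose not to take.

```c
#include <stdint.h>
#include <string.h>

/* Assumed values: 10 cooling states, minimum state 2 (a 20% duty cycle). */
#define MAX_STATE 10
#define MIN_STATE 2

/* Fill the table the way the removed loop did: i stops at MAX_STATE - 1,
 * so levels[MAX_STATE] keeps its initial value of zero.
 */
static void init_levels_buggy(uint8_t levels[MAX_STATE + 1])
{
	int i;

	memset(levels, 0, MAX_STATE + 1);	/* assumed zeroed allocation */
	for (i = 0; i < MAX_STATE; i++)
		levels[i] = i > MIN_STATE ? i : MIN_STATE;
}

/* Same loop with an inclusive bound, so the last entry is written too. */
static void init_levels_inclusive(uint8_t levels[MAX_STATE + 1])
{
	int i;

	memset(levels, 0, MAX_STATE + 1);
	for (i = 0; i <= MAX_STATE; i++)
		levels[i] = i > MIN_STATE ? i : MIN_STATE;
}
```

With the buggy initializer, looking up the maximum state returns 0, which is the 0% duty cycle seen in the "Before" output.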
1 parent 04361b8 commit 6d206b1

1 file changed, +1 -6 lines changed


drivers/net/ethernet/mellanox/mlxsw/core_thermal.c

Lines changed: 1 addition & 6 deletions
@@ -105,7 +105,6 @@ struct mlxsw_thermal {
 	struct thermal_zone_device *tzdev;
 	int polling_delay;
 	struct thermal_cooling_device *cdevs[MLXSW_MFCR_PWMS_MAX];
-	u8 cooling_levels[MLXSW_THERMAL_MAX_STATE + 1];
 	struct thermal_trip trips[MLXSW_THERMAL_NUM_TRIPS];
 	struct mlxsw_cooling_states cooling_states[MLXSW_THERMAL_NUM_TRIPS];
 	struct mlxsw_thermal_area line_cards[];
@@ -468,7 +467,7 @@ static int mlxsw_thermal_set_cur_state(struct thermal_cooling_device *cdev,
 		return idx;
 
 	/* Normalize the state to the valid speed range. */
-	state = thermal->cooling_levels[state];
+	state = max_t(unsigned long, MLXSW_THERMAL_MIN_STATE, state);
 	mlxsw_reg_mfsc_pack(mfsc_pl, idx, mlxsw_state_to_duty(state));
 	err = mlxsw_reg_write(thermal->core, MLXSW_REG(mfsc), mfsc_pl);
 	if (err) {
@@ -859,10 +858,6 @@ int mlxsw_thermal_init(struct mlxsw_core *core,
 		}
 	}
 
-	/* Initialize cooling levels per PWM state. */
-	for (i = 0; i < MLXSW_THERMAL_MAX_STATE; i++)
-		thermal->cooling_levels[i] = max(MLXSW_THERMAL_MIN_STATE, i);
-
 	thermal->polling_delay = bus_info->low_frequency ?
 				 MLXSW_THERMAL_SLOW_POLL_INT :
 				 MLXSW_THERMAL_POLL_INT;
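
To illustrate the effect of replacing the table lookup with a clamp, here is a small user-space sketch. It is not the driver's code: `clamp_state()` and `state_to_duty()` are hypothetical stand-ins for the `max_t()` call and for `mlxsw_state_to_duty()`, and the constants (10 states, minimum state 2, maximum duty 255) are assumptions chosen to match the 20% floor and the `pwm1` value of 255 quoted in the commit message. The rounding rule in `state_to_duty()` is likewise assumed.

```c
/* Assumed values, chosen to match the figures in the commit message. */
#define MAX_STATE 10
#define MIN_STATE 2
#define MAX_DUTY 255

/* Clamp a requested cooling state so it never drops below MIN_STATE. */
static unsigned long clamp_state(unsigned long state)
{
	return state < MIN_STATE ? MIN_STATE : state;
}

/* Hypothetical stand-in for mlxsw_state_to_duty(): scale the state to
 * the 0..MAX_DUTY range, rounding to the nearest integer.
 */
static unsigned int state_to_duty(unsigned long state)
{
	return (unsigned int)((state * MAX_DUTY + MAX_STATE / 2) / MAX_STATE);
}
```

Under these assumptions, `state_to_duty(clamp_state(10))` evaluates to 255 and `state_to_duty(clamp_state(0))` evaluates to 51, which is 20% of 255. Because the clamp is computed on each call, there is no table entry that can be left uninitialized.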
