CPU Temps not showing for AMD Ryzen Threadripper 3970X. #1484

Closed
PetroBHayes opened this issue May 12, 2021 · 24 comments

Comments

@PetroBHayes

Operating system: CentOS Linux 7.9.2009
Webmin version: 1.974
Authentic theme version: 19.75 
Kernel and CPU: Linux 3.10.0-1160.25.1.el7.x86_64 on x86_64
Processor information: AMD Ryzen Threadripper 3970X 32-Core Processor, 64 cores
# dmidecode 3.2
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
        Manufacturer: Micro-Star International Co., Ltd.
        Product Name: Creator TRX40 (MS-7C59)
        Version: 1.0
# sensors
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +53.4°C  (high = +70.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A

max1617-i2c-13-4d
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +39.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-14-4d
Adapter: SMBus PIIX4 adapter port 2 at 0b00
temp1:        +39.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-15-4d
Adapter: SMBus PIIX4 adapter port 3 at 0b00
temp1:        +39.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-16-4d
Adapter: SMBus PIIX4 adapter port 4 at 0b00
temp1:        +39.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)
@jcameron
Collaborator

@iliarostovtsev can you take a look at this? It looks like you most recently updated the code to detect CPU temps on Ryzen.

@iliajie
Collaborator

iliajie commented May 16, 2021

Fixed - 7fe97a7

@iliajie iliajie closed this as completed May 16, 2021
@iliajie
Collaborator

iliajie commented May 16, 2021

> temp1: +53.4°C temp1: +39.0°C ..

It looks like you're having encoding issues in your shell.

@PetroBHayes
Author

Sorry about the munged characters. Looks like that happens if you use PuTTY through RDP. I've attached a corrected version just in case. Thanks for looking into it.
sens.txt

@iliajie
Collaborator

iliajie commented May 17, 2021

> Sorry about the munged characters. Looks like that happens if you use PuTTY through RDP. I've attached a corrected version just in case. Thanks for looking into it.

The patch above will work. Thanks.

@PetroBHayes
Author

PetroBHayes commented Jun 19, 2021

Using the latest version of Webmin didn't fix this issue:

Webmin version: 1.979
Authentic theme version: 19.81
Kernel and CPU: Linux 3.10.0-1160.31.1.el7.x86_64 on x86_64
Processor information: AMD Ryzen Threadripper 3970X 32-Core Processor, 64 cores

Here's the output of "sensors >sens.bin" from my system, saved and transferred as binary. Note: the GitHub web UI won't let me upload a *.bin file, so I've renamed it to sens_bin.txt. In a hex view of the file, the degree symbol is encoded as 2 bytes, 0xC2 and 0xB0. I tried manually grepping the sensors output using the new pattern from /proc/linux-lib.pl, and it seems to work.

sensors | grep -P "temp(\d+):\s+([\+\-][0-9\.]+).*?°[Cc]\s+.*?[=+].*?\)"
temp1:        +53.0°C  (high = +70.0°C)
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)

So I'm not sure what's wrong.
Thanks for any help.
sens_bin.txt
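
The two bytes mentioned above, 0xC2 0xB0, are the UTF-8 encoding of the degree sign (U+00B0). The short Python sketch below (not Webmin's Perl code) shows that encoding, and how the same bytes read as Latin-1 turn into `Â°`. That Latin-1 misreading is a common cause of munged output, but whether it is what happened in the PuTTY/RDP session is an assumption. The sample line is copied from the output above; the variable names are illustrative.

```python
# Sample line copied from the sensors output in this thread.
line = "temp1:        +53.0°C  (high = +70.0°C)"

# The degree sign U+00B0 takes two bytes in UTF-8: 0xC2 0xB0.
degree_bytes = "°".encode("utf-8")
print(degree_bytes.hex())  # c2b0

# Decoding UTF-8 bytes as Latin-1 turns "°" into "Â°"
# (assumed, not confirmed, to be the kind of damage seen via PuTTY).
munged = line.encode("utf-8").decode("latin-1")
print(munged)

# Reversing the wrong decoding recovers the original text.
repaired = munged.encode("latin-1").decode("utf-8")
print(repaired == line)  # True
```

A regex that expects a literal `°` only matches when the pattern and the input use the same encoding, which is why checking the raw bytes is a useful first step.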

@iliajie
Collaborator

iliajie commented Jun 20, 2021

What is the full output of the sensors command?

@PetroBHayes
Author

The attached sens_bin.txt file in the message above is the full output. I'll quote a copy here also and hope characters don't get corrupted.

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +67.5°C  (high = +70.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A  

max1617-i2c-13-4d
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-14-4d
Adapter: SMBus PIIX4 adapter port 2 at 0b00
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-15-4d
Adapter: SMBus PIIX4 adapter port 3 at 0b00
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-16-4d
Adapter: SMBus PIIX4 adapter port 4 at 0b00
temp1:        +38.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

Thanks!

@iliajie
Collaborator

iliajie commented Jun 22, 2021

Thanks for reporting. Give this patch a try, please -

939c8eb

@PetroBHayes
Author

No, it still doesn't work. Looking at that section of code a bit more, I don't think it's the degree character.

                        # New line - new device (disallow, if no either fan or
                        # voltage data)
                        $aa = 0 if (/^\s*$/);

                        # Device has either fan or voltage data (sign of CPU)
                        $aa = 1 if (/fan[\d+]:\s+[0-9]+\s+RPM/i ||
                                    /in[\d+]:\s+[\+\-0-9\.]+\s+V/i);

The Threadripper sensors output doesn't include any fan or voltage data, so the "if ($aa && " check in the temp parsing fails before it ever gets to parse the degree symbol.

I just ran sensors-detect again and scanned every device I could. It still shows exactly the same output as the attached file, with just one CPU temp. I know there are a bunch of fans inside that system too, so it's a little sad that I can't get info for those either.
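
To illustrate the diagnosis above, here is a rough Python port of the quoted flag logic. It is a sketch, not Webmin's actual code: the function name `gated_temps`, the simplified temperature pattern, and the `hypothetical-chip` sample are assumptions. It shows that a chip block with no `fanN:` or `inN:` line never sets the flag, so its `temp1` line is skipped.

```python
import re

# Patterns ported from the quoted Perl fragment. Note that [\d+] is a
# character class (one digit or a literal "+"), kept here as quoted.
FAN_RE = re.compile(r"fan[\d+]:\s+[0-9]+\s+RPM", re.I)
VOLT_RE = re.compile(r"in[\d+]:\s+[\+\-0-9\.]+\s+V", re.I)
# Simplified temperature pattern (an assumption, not the upstream regex).
TEMP_RE = re.compile(r"temp(\d+):\s+([+-][0-9.]+)")

def gated_temps(output):
    """Return (core, temp) pairs, honouring the fan/voltage flag."""
    found = []
    flag = False
    for line in output.splitlines():
        if re.match(r"^\s*$", line):
            flag = False          # blank line: a new device starts
        if FAN_RE.search(line) or VOLT_RE.search(line):
            flag = True           # fan or voltage data seen for this device
        m = TEMP_RE.search(line)
        if flag and m:
            found.append((int(m.group(1)) - 1, m.group(2)))
    return found

# Block copied from the thread: no fan or voltage lines.
threadripper = """k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +67.5°C  (high = +70.0°C)
"""

# Hypothetical block with a fan line before the temperature.
with_fan = """hypothetical-chip
fan1:        1200 RPM
temp1:        +67.5°C  (high = +70.0°C)
"""

print(gated_temps(threadripper))  # []
print(gated_temps(with_fan))      # [(0, '+67.5')]
```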

@iliajie
Collaborator

iliajie commented Jun 22, 2021

.. so, this part:

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +67.5°C  (high = +70.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A  

is completely unrelated to CPU?

@iliajie
Collaborator

iliajie commented Jun 22, 2021

Does your sensors command provide CPU temps at all?

@PetroBHayes
Author

I believe k10temp-pci-00c3 is the CPU temp sensor. The motherboard has other temp, fan and voltage sensors that would normally be displayed, but the Super I/O chip on my Threadripper motherboard is apparently not currently supported. Here's the output from sensors-detect that I believe shows the kernel driver I'm missing:

Trying family `VIA/Winbond/Nuvoton/Fintek'...               Yes
Found unknown chip with ID 0xd451

I believe the nct6775 kernel driver may have some support for my system, so I'm currently trying to figure out how to get it to load.

@PetroBHayes
Author

Well, I was able to get my fan speeds and voltages to show up in the sensors output, but it still won't show a temperature in Webmin. :(

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +51.6°C  (high = +70.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A

max1617-i2c-13-4d
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +37.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-14-4d
Adapter: SMBus PIIX4 adapter port 2 at 0b00
temp1:        +37.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-15-4d
Adapter: SMBus PIIX4 adapter port 3 at 0b00
temp1:        +37.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

max1617-i2c-16-4d
Adapter: SMBus PIIX4 adapter port 4 at 0b00
temp1:        +37.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)

nct6797-isa-0a20
Adapter: ISA adapter
in0:                    +1.25 V  (min =  +0.00 V, max =  +1.74 V)
in1:                    +0.98 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                    +3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                    +3.34 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                    +1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                    +0.15 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                    +0.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                    +3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                    +3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                    +1.84 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                   +0.68 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                   +0.67 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                   +0.99 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                   +0.68 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                   +1.54 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                  2253 RPM  (min =    0 RPM)
fan2:                  1091 RPM  (min =    0 RPM)
fan3:                     0 RPM  (min =    0 RPM)
fan4:                     0 RPM  (min =    0 RPM)
fan5:                     0 RPM  (min =    0 RPM)
fan6:                     0 RPM  (min =    0 RPM)
fan7:                     0 RPM  (min =    0 RPM)
SYSTIN:                 +40.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = CPU diode
CPUTIN:                 +35.0°C  (high = +112.0°C, hyst = +85.0°C)  sensor = thermistor
AUXTIN0:                +41.5°C  (high = +112.0°C, hyst = +85.0°C)  sensor = thermistor
AUXTIN1:                +44.0°C    sensor = thermistor
AUXTIN2:                +44.0°C    sensor = thermistor
AUXTIN3:                 -3.0°C    sensor = thermistor
SMBUSMASTER 0:          +51.5°C
PCH_CHIP_CPU_MAX_TEMP:   +0.0°C
PCH_CHIP_TEMP:           +0.0°C
PCH_CPU_TEMP:            +0.0°C
intrusion0:            ALARM
intrusion1:            ALARM
beep_enable:           disabled

@iliajie
Collaborator

iliajie commented Jun 23, 2021

Which line(s) exactly in the output above represent the CPU temperature?

@PetroBHayes
Author

The k10temp-pci-00c3:temp1 temperature is the main temp reported by the CPU. It doesn't require installing extra drivers. However, it apparently has reliability issues: https://wiki.archlinux.org/title/lm_sensors#Troubleshooting

The nct6797-isa-0a20 sensors only showed up on my system after I enabled https://elrepo.org/ and installed the nct6775 driver from that repo. These sensors are provided by the motherboard, and I'm not exactly sure how they relate to the actual CPU temperature. From my own evaluation, the reading that most closely matches k10temp is the SMBUSMASTER 0 temperature. The SYSTIN temp says it's a CPU diode and seems reasonable as a CPU temp, but it is much lower than the k10temp reading. The CPUTIN temp seems very low and almost never changes, so I'm not sure what that's measuring.

I wish I had a better answer for you as to which temp to read. I think the k10temp sensor is the best one, since it doesn't require extra drivers and would give the broadest support.

@PetroBHayes
Author

By modifying the code to ignore the fan and voltage requirement, I was able to get the temperature to show up in Webmin.

                        $aa = 1 if (/k10temp-pci-00c3$/);
                        # Device has either fan or voltage data (sign of CPU)
                        #$aa = 1 if (/fan[\d+]:\s+[0-9]+\s+RPM/i ||
                        #           /in[\d+]:\s+[\+\-0-9\.]+\s+V/i);

                        # Get odd output like in #1253
                        if ($aa && (
                                /temp(\d+):\s+([\+\-][0-9\.]+)\s+.*?[=+].*?\)/ ||
                                /temp(\d+):\s+([\+\-][0-9\.]+).*?[Cc]\s+.*?[=+].*?\)/
                        )) {
                                # Adjust to start from `0` as all other outputs
                                push(@rvx, { 'core' => (int($1) - 1),
                                             'temp' => $2 });
                                }
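
The workaround above hard-codes a single chip name. A slightly more general variant, sketched here in Python rather than Webmin's Perl, groups the `sensors` output into per-chip blocks and reads `tempN` lines only from chips whose name starts with `k10temp`. The function name `k10temp_readings` and the prefix match are assumptions for illustration, and this is not claimed to be the logic of any upstream patch.

```python
import re

TEMP_RE = re.compile(r"^temp(\d+):\s+([+-][0-9.]+)")

def k10temp_readings(output, prefix="k10temp"):
    """Collect (core, temp) pairs from chip blocks named like `prefix`."""
    readings = []
    chip = None
    for line in output.splitlines():
        if not line.strip():
            chip = None               # blank line ends the current chip block
            continue
        if chip is None:
            chip = line.strip()       # first line of a block is the chip name
            continue
        m = TEMP_RE.match(line)
        if m and chip.startswith(prefix):
            # Zero-based core index, as in the quoted Perl code.
            readings.append((int(m.group(1)) - 1, float(m.group(2))))
    return readings

# Excerpt of the sensors output posted in this thread.
sample = """k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +51.6°C  (high = +70.0°C)

iwlwifi-virtual-0
Adapter: Virtual device
temp1:            N/A

max1617-i2c-13-4d
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +37.0°C  (low  =  +0.0°C, high = +85.0°C)
temp2:          FAULT  (low  =  +0.0°C, high = +85.0°C)
"""

print(k10temp_readings(sample))  # [(0, 51.6)]
```

Matching on the chip prefix instead of the full `k10temp-pci-00c3` name avoids tying the check to one PCI address, at the cost of assuming every `k10temp` block is a CPU sensor.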

@iliajie
Collaborator

iliajie commented Jun 23, 2021

> The CPUTIN temp seems very low and almost never changes, so I'm not sure what that's measuring.

This is the temperature read from the CPU sensor on the motherboard. We need CoreTemp data. This sensors command output is just odd in my opinion.

Check this patch - 76591fa - replace the whole file to avoid typos.

@PetroBHayes
Author

Yup, that works. It says Core2 for some reason, but I don't care about that.
(screenshot attached: Capture)

Thanks for all the help!

@iliajie
Collaborator

iliajie commented Jun 23, 2021

Check this patch out - d10e122

@PetroBHayes
Author

Ok, says Core 1 now.

Thanks!

@iliajie
Collaborator

iliajie commented Jun 23, 2021

Cool server!
(specs)

@chris001

Because temperature sensor names and numbers are not standardized, many temperature-display apps let the user pick which sensor to show, in case the app's logic guesses incorrectly. One example on Windows is SpeedFan.
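
A user-selectable sensor could look roughly like the following Python sketch. It lists every temperature reading under a `chip:label` key and returns the one named by a setting. The names `list_temps` and `pick_temp` and the key format are hypothetical; this is not an existing Webmin option.

```python
import re

# Matches "label:   +12.3°C" lines; labels may contain spaces
# (for example "SMBUSMASTER 0").
READING_RE = re.compile(r"^([^:]+):\s+([+-][0-9.]+)\s*°C")

def list_temps(output):
    """Map 'chip:label' to a temperature in °C for every readable sensor."""
    temps = {}
    chip = None
    for line in output.splitlines():
        if not line.strip():
            chip = None               # blank line ends the current chip block
            continue
        if chip is None:
            chip = line.strip()       # first line of a block is the chip name
            continue
        m = READING_RE.match(line)
        if m:
            temps[f"{chip}:{m.group(1)}"] = float(m.group(2))
    return temps

def pick_temp(output, wanted):
    """Return the reading chosen by the user, or None if it is absent."""
    return list_temps(output).get(wanted)

# Excerpt of the sensors output posted in this thread.
sample = """k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +51.6°C  (high = +70.0°C)

nct6797-isa-0a20
Adapter: ISA adapter
fan1:                  2253 RPM  (min =    0 RPM)
SYSTIN:                 +40.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = CPU diode
SMBUSMASTER 0:          +51.5°C
"""

print(sorted(list_temps(sample)))
print(pick_temp(sample, "k10temp-pci-00c3:temp1"))  # 51.6
```

Listing the available keys first lets a settings page offer them as choices, so the user never has to type a chip name by hand.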

@iliajie
Collaborator

iliajie commented Jun 24, 2021

I was thinking of adding it, but it would require an extra configuration option, and we already have so many.
