Can Dolphin be built to run on less-capable CPUs? #1317
-
I don't know, and can't easily verify, but I'd have thought it would be possible. There is assembler code in the VM that may use some newer instructions in some primitives, but not in the compiler, so it sounds like the compiler is still being built with the later code generation setting. Have a look at the faulting instruction to see what and where it is. That may give a clue as to what needs to be changed. How did you change the "Enable Enhanced Instruction Set" setting? Did you change the value of

If you are using an older machine, it presumably has an old version of Windows on it, in which case you are more likely to have success using Dolphin 7 (release/7.2 branch). In Dolphin 8 I've removed a lot of backwards compatibility code for versions of Windows prior to 10, and it also uses some more recent APIs that are not in older Windows versions.
-
Thanks, deleting the entire EnableEnhancedInstructionSet line in dolphin.props worked, and gave me a working system! I had previously just tried changing the Solution compiler option directly, which likely did not affect other dependencies. This is a Windows 10 system, so no worries there. I built it on my big Windows 11 system and then transferred the 10 changed files to the small laptop. Thanks! Dolphin is awesome!!
-
I'm going to change dolphin.props to remove the EnableEnhancedInstructionSet property in the main Dolphin 8 build as you have done. The very minor perf benefit in a few FP operations hardly outweighs being able to run the binaries on a wide range of machines, especially as it turns out this also blocks running emulated on an Apple M series. The consequence will be that Dolphin 8 will be built using the VS default for this, which is SSE2 (as explicitly specified in the Dolphin 7 build). The Intel spec sheet for the Celeron N3450 implies that it does not support any enhanced instructions, but Wikipedia contradicts this, stating that from much earlier Celeron generations (Willamette, circa 2002) all Celerons support SSE2. Therefore I don't think you'll encounter further invalid instructions (although if you did, they would most likely be in floating point primitives). If you do, then you can always downgrade further.
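As a rough way to confirm which instruction set a given binary was actually compiled for, the compiler's predefined macros can be inspected. This is only a sketch: `compiledInstructionSet` is a hypothetical helper name, not something in the Dolphin sources, and it reports only what the compiler was allowed to emit for C/C++ code, not what hand-written assembler uses. MSVC defines `__AVX__` and `__AVX2__` under the corresponding `/arch` options and `_M_IX86_FP` for 32-bit x86 builds.

```cpp
#include <string>

// Hypothetical helper: reports the minimum instruction set the compiler
// was permitted to target for this translation unit, based on predefined
// macros. It says nothing about hand-written assembler in the same binary.
std::string compiledInstructionSet() {
#if defined(__AVX512F__)
    return "AVX-512";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(_M_X64) || defined(__x86_64__) || defined(__SSE2__) || \
      (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    // 64-bit x86 always implies SSE2; on 32-bit MSVC, _M_IX86_FP == 2
    // means the build targets SSE2 or later.
    return "SSE2";
#elif defined(_M_IX86_FP) && _M_IX86_FP == 1
    return "SSE";
#else
    // 32-bit x86 without SSE (/arch:IA32), or a non-x86 target.
    return "baseline (no x86 SIMD extension assumed)";
#endif
}
```

Printing the result from a small test program built with the same property sheet would show whether a setting change has really reached that project. If it still reports "AVX" after editing one project's options, the setting is probably being inherited from somewhere else.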
-
My experience building SIMH code for distribution on Windows is that the default Microsoft CL compiler options result in an executable that will run on any CPU from the last ten+ years or so, and, in the case of SIMH, which is a command line program with no GUI dependencies, on any Windows version from XP through 11.

So the Windows philosophy is, "If you're compiling a program then you're probably going to want to run it on any machine in the world," whereas (unfortunately) the Linux philosophy is, "If you're compiling a program then you want to optimize it to only run on THIS computer and THIS version of the C library, and if you want to send it to someone else then you'll only ever send them source code and will expect them to compile it for their system themselves." This makes it virtually impossible to build binary distributions of stuff (without one separate build for each C library version, etc.). Ironic that the company that invented "DLL Hell" eventually figured out how to solve the problem (lowest common denominator compiler output, stable APIs, and stuff like .Net Assembly versioning) but the Linux world is still living in that hell :)

I will say that I have found that doing a Profile Guided Optimization pass on Windows CL programs is extremely valuable, and I usually get at least 30% overall performance improvement in SIMH. This is generally way more than you could get from even the latest CPU-specific optimizations. It seems to work especially well with simulators like SIMH and VM implementations, which have very tight inner loops where even a few cycles of difference can have a significant impact. Combined with whole program optimization and link time code generation, it makes me really like the Microsoft C compiler toolchain.
-
Yes, I have tried PGO, and it does work well, although the benefit wasn't as great as it might be in Dolphin because of the core of assembler code. That code was hand optimised with VTune to minimise cycles for the reason you mention - we wrote a paper about it. However, that was for a really old generation of chips, and they behave very differently now, plus today's C compiler is vastly better than it was in the late 90s. The improvement in the compiler (and some careful observation of Agner Fog's recommendations) allowed the vast majority of the assembler primitives to be rewritten in C++ without slowing things down much. Those do benefit from PGO, although again some of the potential gain isn't realised because the code already has some care taken over condition ordering, inlining, etc.

It was a while ago now that I experimented with PGO, but I think the reason I put it aside was that I wanted to set it up to generate PGO data in the CI build from the test run (not very targeted, but a useful start), and then rebuild binaries with PGO. Build work is not high on my list of interesting things to do, so I let myself get distracted by something else.
-
I have a small Windows laptop that I want to play with Dolphin on, but it only has a Celeron N3450 CPU, and the 8.2.1 installer gives me a version that fails to start, with an Illegal Instruction trap before anything is displayed. I changed the C++ code generation option in the VS solution project settings for DolphinVM from AVX to plain x86, and that gets me a VM that allows Dolphin to start up, so I can browse code, etc., but when I try to run Smalltalk code I again get an Illegal Instruction trap, this time in DolphinCR8.dll. Is there a way to build it to run on simpler CPUs, or is there hard-coded AVX etc. in there somewhere for performance?
Thanks!
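For anyone hitting the same trap, one way to check whether the CPU itself lacks AVX is a small runtime probe like the one below. It is a minimal sketch, not part of Dolphin: the `CpuFeatures` struct and `detectCpuFeatures` function are hypothetical names. It uses the MSVC `__cpuid`/`__cpuidex`/`_xgetbv` intrinsics on Windows and `__builtin_cpu_supports` on GCC/Clang, and reports everything as unsupported on non-x86 targets.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>     // __cpuid, __cpuidex
#include <immintrin.h>  // _xgetbv
#endif

// Hypothetical result type for this sketch.
struct CpuFeatures {
    bool sse2;
    bool avx;
    bool avx2;
};

CpuFeatures detectCpuFeatures() {
    CpuFeatures f{false, false, false};
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
    __cpuid(regs, 0);
    const int maxLeaf = regs[0];
    if (maxLeaf >= 1) {
        __cpuid(regs, 1);
        f.sse2 = (regs[3] & (1 << 26)) != 0;          // EDX bit 26
        const bool osxsave = (regs[2] & (1 << 27)) != 0;
        const bool avxBit = (regs[2] & (1 << 28)) != 0;
        // AVX is usable only if the OS saves XMM and YMM state (XCR0 bits 1 and 2).
        const bool osAvx = osxsave && ((_xgetbv(0) & 0x6) == 0x6);
        f.avx = avxBit && osAvx;
    }
    if (maxLeaf >= 7) {
        __cpuidex(regs, 7, 0);
        f.avx2 = f.avx && (regs[1] & (1 << 5)) != 0;  // EBX bit 5
    }
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    f.sse2 = __builtin_cpu_supports("sse2") != 0;
    f.avx = __builtin_cpu_supports("avx") != 0;
    f.avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    return f;
}
```

If `avx` comes back false on the target machine, any binary built with an AVX code generation setting can be expected to fault on the first AVX instruction it executes. The probe itself must of course be built without that setting.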