# i686 emulator in kernel

## wilsonsamm

Why doesn't this exist yet? I would find this so useful for several reasons:

1: Software built for later processors may run on earlier hardware. For example, I could get a (arch=486) gentoo stage3 running on a 386 computer, because the kernel could trap the illegal instructions and emulate only those. I have had some problems with this before now.

2: Software built for the i686 will also run on other architectures: I'd like to see some people moving away from the legacy x86 architecture. But this can only be done if they have viable ways of migrating. Wine and DOSBox require the x86 ISA of course, but that could be emulated on a reasonably powerful PowerPC or S/390. I'm not talking about running Wine in QEMU or such, with another Linux on QEMU and then Wine on top of that -- I mean that the x86 application will run on the non-x86 host computer by means of ISA interpretation.

In my mind I am hatching the idea of supporting various (hardware or emulated) CPUs and DSPs for general purpose computing through a userland interface. I don't know if this would be better off in the kernel, though.

----------

## NeddySeagoon

wilsonsamm,

Emulation in this way is horribly slow. As a test case, build your kernel with Floating Point Emulation and try some floating point things both with and without the emulator. Expect time differences of greater than a factor of 100.

There are many emulators for other CPUs/DSPs and sound chips that run on both 32 and 64 bit x86 CPUs. Look at MAME (Multiple Arcade Machine Emulator) for some open source examples. MAME runs the real code from these arcade machines.

I suppose there are no emulators like you suggest around because it's easier and more productive to fix the source code so that it builds as native code for the target you have in mind. This does not imply building on the target hardware; cross compilers are wonderful things.

----------

## eccerr0r

The reason why the instruction trapping works on a 386 is that i686 code is still mostly the same as 386 code.  The usual add, subtract, branch, etc. instructions still constitute over 90% of the code and can run at full speed.  Note that instructions such as cmpxchg, xadd (fetch-and-add), SSE and MMX could either be impossible to handle this way or even fall under the second category:

However, if you move it to another architecture (which can include sse, mmx instructions depending on extent) - the opcode sequence can be 0% the same, and will trap on *every* instruction.  Not only that, there are some bit encodings that will match a native encoding, and the processor will execute them as usual even though they are garbage in the native context.   Provided that it *does* trap on every nonnative instruction, each trap is an interrupt.  An interrupt means a context switch, which can take hundreds to thousands of cycles to process.  That 1 cycle instruction on x86 has just bloated to hundreds of cycles, multiplied across 100% of the code, which ends up fairly unusable.
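To put rough numbers on that argument, here is a minimal cost model. It is only a sketch: the 500-cycle trap cost and the 1% trap rate are assumptions picked for illustration (the post only says "hundreds to thousands of cycles" and "over 90%" native), and `average_cycles_per_instruction` is a made-up helper name.

```python
def average_cycles_per_instruction(trap_fraction, trap_cycles, native_cycles=1.0):
    """Average cost per guest instruction under trap-and-emulate.

    trap_fraction: share of executed instructions that trap (0.0 to 1.0)
    trap_cycles:   assumed cost of one trap plus its software emulation
    native_cycles: assumed cost of an instruction the CPU runs directly
    """
    if not 0.0 <= trap_fraction <= 1.0:
        raise ValueError("trap_fraction must be between 0 and 1")
    return (1.0 - trap_fraction) * native_cycles + trap_fraction * trap_cycles


# Assumed figures, for illustration only: one trap costs 500 cycles.
mostly_native = average_cycles_per_instruction(0.01, 500)  # i686 code on a 386
all_trapped = average_cycles_per_instruction(1.0, 500)     # x86 code on a foreign CPU

print(f"1% trapped:   {mostly_native:.2f} cycles per instruction")
print(f"100% trapped: {all_trapped:.2f} cycles per instruction")
```

Under these assumptions, trapping 1% of instructions already makes the average instruction several times more expensive, and trapping every instruction makes it hundreds of times more expensive.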

With something like MAME or DOSBox (or QEMU / Bochs / VMware /...), it will not take a trap on every instruction, and at most there are tens of host instructions per emulated instruction.  It's not nearly as efficient as when the processor can execute some of the code as-is (though some of these "emulators" can pass some code straight to the main cpu and receive almost 0 penalty.)
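For contrast with trapping, a pure software interpreter fetches, decodes and executes each guest instruction in an ordinary loop, so no hardware trap is involved. The sketch below uses a toy register machine invented for this example; it is not real x86 decoding and not how any of the emulators named above are actually written.

```python
MASK32 = 0xFFFFFFFF


def run(program, registers=None):
    """Interpret a toy register machine, one instruction per loop iteration."""
    regs = dict(registers or {})
    pc = 0      # index of the next instruction to execute
    steps = 0   # number of guest instructions executed
    while pc < len(program):
        op, *args = program[pc]
        steps += 1
        if op == "mov":      # mov reg, immediate
            regs[args[0]] = args[1] & MASK32
        elif op == "add":    # add dst, src
            regs[args[0]] = (regs[args[0]] + regs[args[1]]) & MASK32
        elif op == "dec":    # dec reg
            regs[args[0]] = (regs[args[0]] - 1) & MASK32
        elif op == "jnz":    # jnz reg, target: jump if reg is non-zero
            if regs[args[0]] != 0:
                pc = args[1]
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return regs, steps


# Add ebx to eax three times, using ecx as the loop counter.
program = [
    ("mov", "eax", 0),
    ("mov", "ebx", 5),
    ("mov", "ecx", 3),
    ("add", "eax", "ebx"),  # index 3: start of the loop body
    ("dec", "ecx"),
    ("jnz", "ecx", 3),
]
regs, steps = run(program)
print(regs, steps)
```

Each guest instruction costs a handful of host operations for the dispatch plus the operation itself, which is the "tens of instructions per instruction" regime rather than the hundreds of cycles a trap costs.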

(I would like to note that the CMPXCHG and XADD (fetch-and-add) semaphore instructions break the latest glibc on a 386, and nobody has backported an emulation for these instructions to the 386, so it will no longer run...  Too many instructions to emulate...?  Or nobody cares about 386s anymore...)
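For reference, the data movement a software emulation of compare-and-exchange and fetch-and-add (CMPXCHG and XADD in x86 mnemonics) would have to reproduce is small. This is a sketch under stated assumptions: memory and registers are plain dictionaries, only the zero flag is modelled, and atomicity and the `lock` prefix are ignored. The function names are made up for this example.

```python
MASK32 = 0xFFFFFFFF


def emulate_cmpxchg(memory, addr, regs, src_reg):
    """CMPXCHG [addr], src_reg: compare EAX with memory, store src if equal."""
    current = memory[addr]
    if current == regs["eax"]:
        memory[addr] = regs[src_reg]  # match: the new value is written
        regs["zf"] = 1
    else:
        regs["eax"] = current         # mismatch: EAX receives the current value
        regs["zf"] = 0


def emulate_xadd(memory, addr, regs, src_reg):
    """XADD [addr], src_reg: memory gets the sum, src_reg gets the old value."""
    old = memory[addr]
    memory[addr] = (old + regs[src_reg]) & MASK32
    regs[src_reg] = old


memory = {0x1000: 7}
regs = {"eax": 7, "ebx": 42, "zf": 0}

emulate_cmpxchg(memory, 0x1000, regs, "ebx")  # EAX matches memory: store succeeds
first = (memory[0x1000], regs["zf"])

emulate_cmpxchg(memory, 0x1000, regs, "ebx")  # EAX is stale now: store fails
second = (regs["eax"], regs["zf"])

regs["ebx"] = 1
emulate_xadd(memory, 0x1000, regs, "ebx")     # fetch-and-add of 1
third = (memory[0x1000], regs["ebx"])
```

The semantics are easy; the hard part on real hardware is doing this inside a trap handler while keeping the operation indivisible, which this sketch does not attempt.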

----------

## poly_poly-man

there's em86 for alpha.

----------

## eccerr0r

 *poly_poly-man wrote:*   

> there's em86 for alpha.

 

I think the funnier ones include the older Itaniums (the original and Itanium 2) and the Itanium 9000 series and newer, for x86 code.  There are hardware hooks on both: the older ones have microcode to emulate full ia32, while the newer ones have hooks but require a software emulator to run ia32 code (still somewhat slow, but nowhere near as slow as a traditional emulator)...

I heard the Itaniums also have PA-RISC emulation hooks but I don't know how that works.

----------

## wilsonsamm

 *NeddySeagoon wrote:*   

> wilsonsamm,
> 
> Emulation in this way is horribly slow. 

 

You mean like those terrible applications (*cough* Limewire *cough*) which run on the Java Runtime Environment?  :Wink: 

But I think it would still be possible. Or am I right in thinking that many alternative architectures, such as Alpha, M68k, SPARC, z9 etc., are now so old and slow in comparison to modern (x86) CPUs that emulating, for example, an i386 on them for such applications would be unbearably slow?

----------

## wilsonsamm

 *eccerr0r wrote:*   

> However, if you move it to another architecture (which can include sse, mmx instructions depending on extent) - the opcode sequence can be 0% the same, and will trap on *every* instruction.  Not only that, there are some bit encodings that will match native encoding, and the processor will execute it as usual though it's garbage in the native context. 

 

Surely, the instructions which were added on the 486, pentium, and later with mmx, sse2 etc will not be recognised by a vanilla 386 as a valid instruction? In that case, these would be the only ones trapped and emulated. Slow and clunky, yes, but it would work, I feel. If we could let this particular kind of interrupt not be a context switch, but instead just pass the relevant data to the emulator, then it might be better still.

Iff the x86 code wants running on a VAX or Apple Macintosh, then that's when the entire i386 needs emulating. It's a horrible architecture, and does this affect how quick the emulation is?* I am imagining Windows applications running in Linux on the Mac with complete integration on the GUI level  :Smile: 

* My emulation writing experience extends only as far as a PDP-8 emulator in M68k assembly. It was incomplete, but it was faster than I thought it would be...

----------

## NeddySeagoon

wilsonsamm,

A 386 does not have a floating point co-processor. It was an add-on extra for both the 386 and the 486SX, so you would need the coprocessor or kernel floating point emulation. Most i686 software takes floating point for granted.

Many, but not all, of the single instruction, multiple data (SIMD) extensions, such as MMX, use the floating point registers.

Later CPUs provide separate registers.

If you really want to run i686 code on an i386 with no i387, start by modifying the kernel's FPU emulator.

----------

## eccerr0r

 *wilsonsamm wrote:*   

> Surely, the instructions which were added on the 486, pentium, and later with mmx, sse2 etc will not be recognised by a vanilla 386 as a valid instruction? In that case, these would be the only ones trapped and emulated. Slow and clunky, yes, but it would work, I feel. If we could let this particular kind of interrupt not be a context switch, but instead just pass the relevant data to the emulator, then it might be better still.
> 
> 

 

Most compilers will emit mostly 386-compatible code when targeting i686.   By virtue of that, the number of emulated instructions is low for integer code.  I suppose it doesn't matter for float: the context swap/trap overhead is not a big deal compared to the number of instructions it takes to emulate a 387, SSE, or MMX.  In any case, given how instruction emulation works on a 386, you have to go through a context swap trap -- there's no way around it.  It detects invalid instructions as ... invalid instructions just like any other x86 machine does, and then, after looking at the instruction in software, decodes it.  In order to prevent such an interrupt you basically have to emulate the whole instruction set, which also forces emulation of stuff that could be run directly.
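The trap-then-decode flow described above can be mimicked in a few lines. This is a toy model, not kernel code: `cpu_execute` stands in for the hardware, the `IllegalInstruction` exception stands in for the invalid-opcode trap, and BSWAP (an instruction added on the 486) is used as the example of something a 386 would not recognise. All names are made up for the sketch.

```python
MASK32 = 0xFFFFFFFF


class IllegalInstruction(Exception):
    """Stands in for the invalid-opcode trap raised by the CPU."""


def cpu_execute(regs, instr):
    """The 'hardware': runs 386-era opcodes directly, traps on anything else."""
    op, *args = instr
    if op == "mov":      # mov reg, immediate
        regs[args[0]] = args[1] & MASK32
    elif op == "add":    # add dst, src
        regs[args[0]] = (regs[args[0]] + regs[args[1]]) & MASK32
    else:
        raise IllegalInstruction(instr)


def emulate_bswap(regs, reg):
    """Software version of BSWAP: reverse the byte order of a 32-bit register."""
    regs[reg] = int.from_bytes(regs[reg].to_bytes(4, "little"), "big")


TRAP_HANDLERS = {"bswap": emulate_bswap}


def run_with_traps(program):
    """Run natively where possible; emulate only the instructions that trap."""
    regs = {}
    traps = 0
    for instr in program:
        try:
            cpu_execute(regs, instr)
        except IllegalInstruction:
            traps += 1                 # every trap pays the full trap overhead
            op, *args = instr
            handler = TRAP_HANDLERS.get(op)
            if handler is None:
                raise                  # truly invalid: a real kernel would deliver SIGILL
            handler(regs, *args)
    return regs, traps


program = [
    ("mov", "eax", 0x12345678),
    ("mov", "ebx", 1),
    ("add", "ebx", "ebx"),
    ("bswap", "eax"),
]
regs, traps = run_with_traps(program)
print(hex(regs["eax"]), regs["ebx"], traps)
```

Only the one unrecognised instruction goes through the slow path here, which is why the scheme is tolerable when nearly all of the code is already native and hopeless when none of it is.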

Something similar to what you're asking for is on newer core2 chips - VT.  VT will let newer core2 run privileged instructions (which would trap similar to undefined instructions) without going through a whole trap sequence, allowing virtualizing processors with much less overhead.  This was more of an afterthought than something unique when someone noticed that people want to run complete virtual machines.

But once again, this works because most of the instructions are the same, and can run natively to begin with.

----------

## wilsonsamm

So do you all think it's not worth it, even if it would allow Wine and others to run on non-x86 machines?

----------

## NeddySeagoon

wilsonsamm,

What you are suggesting would allow modern binaries to run on old hardware at a reduced speed.

For the 386 specifically, you would need a 387 plus the extra emulation. If you don't have a 387 today, you are not going to be able to buy one.  Well, you can but they are space grade parts. You could buy a top of the range PC for less.

New glibc won't run on a 386/387, so one of the key system libraries would need to use the emulation.

The same is true of the 486SX. It has no FPU, so you need its companion 487SX ... which in reality is a 486DX with an extra pin.

People throw out P2 and better systems (i686), so you can often get them for free. It's just not worth the effort of implementing the emulation to run programs that will mostly be too slow to use on such hardware, even without any emulation, when better hardware is often freely available.

----------

## poly_poly-man

 *NeddySeagoon wrote:*   

> wilsonsamm,
> 
> What you are suggesting would allow modern binaries to run on old hardware at a reduced speed.
> 
> For the 386 specifically, you would need a 387 plus the extra emulation. If you don't have a 387 today, you are not going to be able to buy one.  Well, you can but they are space grade parts. You could buy a top of the range PC for less.
> ...

 If you need help with anything related to pre-pentium 2 systems, I am the one to talk to..

I don't know if this idea is feasible at all, or if it would be useful for very much... but check out my website, and you can discuss on my mailing list or whatnot.

Some of us treasure our 386's...

----------

## eccerr0r

The problem with emulation of new instructions, as rough relative speeds (1.00 is fastest):

For an i386 instruction set:

1GHz P3: 1.00 (by definition)

25MHz 386: around 0.015

25MHz 68030: around 0.0003 (full emulation required)

For an i686 instruction set:

1GHz P3: 1.00 (by definition)

25MHz 386: around 0.013

25MHz 68030: around 0.0003 (since it's full emulation, no additional penalty)

For an SSE instruction set or FPU code:

1GHz P3: 1.00 (by definition)

25MHz 386: around 0.002

25MHz 68030: around 0.00005 (assuming also no FPU)

Something that takes one second on the P3 will take minutes, or even hours, on the slower machines... Is it worth it to emulate?
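As a quick check of what those ratios mean in wall-clock time, the sketch below divides a one-second P3 job by each relative speed. The figures are the estimates quoted above; the helper names are made up for the example.

```python
# Relative speeds quoted above (1 GHz P3 = 1.00 by definition).
RELATIVE_SPEED = {
    "i386 code": {"25MHz 386": 0.015, "25MHz 68030": 0.0003},
    "i686 code": {"25MHz 386": 0.013, "25MHz 68030": 0.0003},
    "SSE/FPU code": {"25MHz 386": 0.002, "25MHz 68030": 0.00005},
}


def runtime_seconds(p3_seconds, relative_speed):
    """Wall-clock time on the slower machine for a job timed on the P3."""
    return p3_seconds / relative_speed


def format_duration(seconds):
    """Render a number of seconds as a rough human-readable duration."""
    if seconds < 120:
        return f"{seconds:.0f} s"
    if seconds < 7200:
        return f"{seconds / 60:.0f} min"
    return f"{seconds / 3600:.1f} h"


for workload, machines in RELATIVE_SPEED.items():
    for machine, speed in machines.items():
        duration = format_duration(runtime_seconds(1.0, speed))
        print(f"{workload:13} {machine:12} {duration}")
```

With these estimates, the one-second job takes about a minute as i386 code on the 386, close to an hour under full emulation on the 68030, and several hours for the SSE/FPU case on the 68030.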

Also what you mean by "horrible architecture" is really "extensive instruction set" -- which means it'll take many more instructions on the host architecture to figure out what each instruction is, and to actually do them.  For microcoded computers, it's even uglier (RISC is easiest to emulate most often)...

 *wilsonsamm wrote:*   

> In my mind I am hatching the idea of supporting various (hardware or emulated) CPUs and DSPs for general purpose computing through a userland interface. I don't know if this would be better off in the kernel, though.

 

A fully cross platform language is available: That's what interpreted languages are for.  But for binary code, it's virtually impossible (do also remember that IO needs to be emulated, not only CPU instructions).

----------

