# qemu-user vs. qemu-kvm? & qemu-user speed tips?

## NightMonkey

Howdy. I'm using qemu-user to emulate ARM, specifically for SheevaPlugs and Guruplugs, and building a binpkgs repo + kernel for them. But I see that there is also a "qemu-kvm" package. I'm using a Pentium 4 as my "build host" for creating binpkgs for ARM. I'm looking to speed up the build process, where the bottleneck *seems* to be disk I/O under qemu-arm. So, my questions are:

* Given my setup (P4), is qemu-kvm package a better choice than qemu-user, or does that make no difference?

* Would enabling KVM in the kernel make a speed difference?

* Does the "stacksize" (-s) setting affect disk-io speed, or overall qemu-arm speed (I have 3.5GB RAM)?

* I'm running this under 32-bit kernel/user-space. Would 64-bit substantively help (or hurt?) qemu-user performance?

If there are any other speed tips out there for qemu-arm on x86, I'm all ears.  :Smile:  Thanks in advance!

```
dell-gx620 ~ # cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 1
cpu MHz         : 2793.011
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pebs bts pni dtes64 monitor ds_cpl cid cx16 xtpr
bogomips        : 5586.02
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 4
model name      : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping        : 1
cpu MHz         : 2793.011
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe lm constant_tsc pebs bts pni dtes64 monitor ds_cpl cid cx16 xtpr
bogomips        : 5585.75
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:
```

----------

## jormartr

 *NightMonkey wrote:*   

> 
> 
> * Given my setup (P4), is qemu-kvm package a better choice than qemu-user, or does that make no difference?
> 
> 

 

qemu-kvm is the most recent version of qemu; you should stick with it.

 *NightMonkey wrote:*   

> 
> 
> * Would enabling KVM in the kernel make a speed difference?
> 
> 

 

You cannot benefit from KVM here, because ARM instructions cannot be executed directly on an x86 CPU; they must be translated.

 *NightMonkey wrote:*   

> 
> 
> * Does the "stacksize" (-s) setting affect disk-io speed, or overall qemu-arm speed (I have 3.5GB RAM)?
> 
> 

 

When launching qemu, set the disk cache mode to writeback (e.g. `-drive file=blabla,cache=writeback,...`).
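A minimal sketch of what that looks like for a full-system invocation (the machine type, memory size, image name, and kernel path below are illustrative placeholders, not taken from the thread):

```shell
# Hypothetical qemu-system-arm launch with writeback disk caching.
# versatilepb, 256M, zImage, and disk.img are assumptions for illustration.
qemu-system-arm \
    -M versatilepb \
    -m 256 \
    -kernel zImage \
    -drive file=disk.img,cache=writeback,if=scsi \
    -append "root=/dev/sda"
```

With `cache=writeback`, the host page cache absorbs guest writes, which usually improves throughput at the cost of data safety if the host crashes.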

 *NightMonkey wrote:*   

> 
> 
> * I'm running this under 32-bit kernel/user-space. Would 64-bit substantively help (or hurt?) qemu-user performance?
> 
> 

 

I don't know the answer here, but I would guess it won't make a difference.

EDIT::

If you are compiling directly on that image, using tmpfs may help. If not for the entire filesystem, then at least put /tmp and /var/tmp on a qcow/raw file under /dev/shm. I guess that could also be done directly inside the virtual environment, with a bind mount or a symbolic link from those directories to /dev/shm/tmp. :\
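One way to sketch the tmpfs idea for a chroot-based setup (the chroot path and sizes below are hypothetical, not from the post):

```shell
# Put the scratch directories on tmpfs so compile I/O never hits disk.
# /mnt/arm-chroot and the sizes are placeholder assumptions.
mount -t tmpfs -o size=2G  tmpfs /mnt/arm-chroot/var/tmp
mount -t tmpfs -o size=512M tmpfs /mnt/arm-chroot/tmp

# Alternatively, bind-mount a directory already backed by /dev/shm:
mkdir -p /dev/shm/tmp
mount --bind /dev/shm/tmp /mnt/arm-chroot/tmp
```

Both variants require root and are lost on reboot, so they suit throwaway build scratch space rather than anything persistent.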

----------

## Hu

Generally speaking, reducing the number of transitions between guest and host will improve guest performance.  Storing the guest filesystem in a host tmpfs may help some, but you will probably get a much bigger gain by giving the guest enough memory that you can have the guest mount its own tmpfs for scratch work.

----------

## NightMonkey

Thank you!  :Smile: 

----------

## NightMonkey

 *Hu wrote:*   

> Generally speaking, reducing the number of transitions between guest and host will improve guest performance.  Storing the guest filesystem in a host tmpfs may help some, but you will probably get a much bigger gain by giving the guest enough memory that you can have the guest mount its own tmpfs for scratch work.

 

Hrm. I'm using qemu-user with a chroot, not an entire virtual system. From what you're saying, though, I gather that I may have picked possibly the slowest method available for compiling ARM packages.  :Smile:

----------

## Hu

 *NightMonkey wrote:*   

> Hrm. I'm using qemu-user with a chroot, not an entire virtual system. So, I get from what you're saying, however, that I've picked possibly the slowest method for compiling ARM packages available. 

 Perhaps.  Although my statement may also apply to the scenario you are following, I did not consider it in my comments.  I only looked at the distinction between different ways of running a complete virtual system.  Since you are using qemu-user to dynamically translate ARM code into x86 code for execution on your x86 host kernel, you do not experience the guest->host transitions that can be so costly.  Instead, you experience the delay of compiling ARM code into x86 code.  If I had to guess, I would say that you will get better performance doing this than using qemu to emulate an entire ARM board, with an ARM kernel and ARM user code running on the virtual board.
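For reference, the qemu-user chroot approach under discussion is typically wired up something like this (the chroot path is a placeholder; the magic/mask bytes are the standard ARM ELF header pattern shipped with qemu's binfmt examples):

```shell
# Register a binfmt_misc handler so the kernel hands ARM ELF binaries
# to qemu-arm transparently. Requires binfmt_misc mounted and root.
echo -e ':arm:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x28\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/bin/qemu-arm:' \
    > /proc/sys/fs/binfmt_misc/register

# Copy a statically linked qemu-arm into the ARM chroot so the handler
# can still find it after chrooting; /mnt/arm-chroot is hypothetical.
cp /usr/bin/qemu-arm /mnt/arm-chroot/usr/bin/
chroot /mnt/arm-chroot /bin/bash
```

Every ARM binary run inside the chroot is then translated instruction-by-instruction, which is where the overhead Hu describes comes from.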

----------

