# How can I enhance a program's performance from the first run?

## VanFanel

Hello there!

I've noticed that some multimedia programs stall or pause the first time they are executed. If I let them run for a while, past certain points (which vary from program to program), those pauses go away.

If I run them a second time, there are no pauses at all and performance is just great!

Perfect examples of this are DosBox and Snes9x. SDLMAME, Kega Fusion and Mednafen are also emulators, but they DON'T suffer from these strange symptoms. 

NONE of the programs affected by those first-run pauses are CPU-intensive: they all use 25% of my CPU at most, and 3-4MB of RAM in the worst case. NO CPU usage peaks are related to these pauses!

I've read about lazy-loading and the LDFLAGS that disable it when compiling a program from its sources, but even if I use the "-z now" LDFLAGS, there's no difference at all.

Optimizing those programs for executable size with the "-Os" flag seems to help somewhat (I suppose more of the executable code gets loaded into memory from the beginning that way), but some pauses can still be noticed when they run for the first time. The same goes if I copy the ELF files into a ramdisk and run them from there: it helps a bit, but there are still some pauses the first time I run the programs.

I tried to prelink those executables, too, but there's no difference at all.

I've also tried different cpu and I/O schedulers, but there's no difference at all for this problem.

So, how can I REALLY force a program's executable code to be in memory from the first time it's run?

WHY exactly do programs perform better after running for a while?

Thanks :)

----------

## NeddySeagoon

VanFanel,

You can't force code to load, because Linux mmaps the pages: they are not loaded into RAM until they are needed, and they stay there until the RAM is needed for something else.

In theory, this mmapping means you can execute code that won't fit into RAM all at once, because an mmapped page that's not needed can be flushed from RAM, then reloaded when it's needed again. I say in theory, as this would be very slow.

Note: this is a form of swapping, but it does not use your swap file. The swap file is only used for dynamically allocated memory, as that memory has no backing location on disk.

Programs perform better after a time as more of their code pages are in RAM. The kernel does not have to fetch them, so there is no disk access penalty.

The kernel does 'read ahead' to minimise the waiting for code/data from the disk.

The pauses you see are due to disk accesses taking too long. If your hard drive(s) have names like /dev/hda...  what does hdparm /dev/hd... say for them?

Also, in your kernel, what are your preemption settings?

```
$ grep PREE /usr/src/linux/.config
CONFIG_TREE_PREEMPT_RCU=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
```

`CONFIG_PREEMPT=y` improves kernel responsiveness to user interactions.

----------

## VanFanel

Hello, NeddySeagoon! :)

Relevant sections:

```
CONFIG_TREE_RCU=y
# CONFIG_TREE_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
```

```
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
# CONFIG_IOMMU_HELPER is not set
# CONFIG_IOMMU_API is not set
CONFIG_NR_CPUS=8
# CONFIG_SCHED_SMT is not set
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
```

```
CONFIG_SCHED_BFS=y
# CONFIG_SCHED_CFS is not set
CONFIG_SCHED_BFS_RR=3
# CONFIG_SCHED_BFS_AUTOISO is not set
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
```

As for my hard disk, it's listed as /dev/sda. My machine is a Mac Mini (2009), so I believe my disks are SATA and hence interfaced as SCSI units (not sure of that). I believe hdparm is for old disks; correct me if I'm wrong.

regards

----------

## NeddySeagoon

VanFanel,

You are right about drives under the SCSI layer. The classic hdparm tuning settings (DMA, I/O mode and so on) do not apply to them, not even to PATA drives driven through libata.

Everything else looks OK.

----------

## VanFanel

According to the output of

```
dmesg | grep DMA
```

my hard disk is configured at UDMA/100, so I don't think the kernel's disk settings are the problem.

Could it be the filesystem I'm using? I'm on ext3, as it's the only filesystem I've used so far for my system partition.

I believe it shouldn't be the problem here.

----------

## Link31

 *NeddySeagoon wrote:*   

> You can't force code to load, because Linux mmaps the pages: they are not loaded into RAM until they are needed, and they stay there until the RAM is needed for something else.

 

Well, yes you can (but I'm sure you knew that). It's called prefetching:

```
cat /usr/bin/executable_file >/dev/null
```

This forces the kernel to read all pages and it should keep them in cache, if there is enough free memory to do so. You may also need to prefetch all dynamically linked libraries to prevent startup delays.
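A minimal sketch of that idea as reusable shell functions. The names `prefetch_file` and `prefetch_command` are made up for illustration, and nothing here guarantees that the pages stay cached: the kernel remains free to evict them under memory pressure.

```shell
# prefetch_file (hypothetical helper): read a file once so the kernel
# pulls its contents into the page cache. Returns non-zero if the
# argument is not a readable regular file.
prefetch_file() {
    [ -f "$1" ] && [ -r "$1" ] || return 1
    cat -- "$1" > /dev/null
}

# prefetch_command (hypothetical helper): look a command up in PATH
# and prefetch the file it resolves to. Builtins and shell functions
# have no file behind them, so they make prefetch_file fail.
prefetch_command() {
    cmd_path=$(command -v "$1") || return 1
    prefetch_file "$cmd_path"
}
```

Something like `prefetch_command dosbox && dosbox` would then warm the executable right before launching it, assuming `dosbox` is in your PATH.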

----------

## NeddySeagoon

Link31,

Yes ... the problem, as you state, is

 *Link31 wrote:*   

> ... if there is enough free memory to do so

which is not something you can control.

----------

## VanFanel

And... is there some way to know which libraries an executable is going to use?

Some kind of utility to show the libraries it's linked against?

That way, I could try prefetching every library used by an executable prior to execution...

----------

## Sadako

 *VanFanel wrote:*   

> And... is there some way to know which libraries an executable is going to use?
> 
> Some kind of utility to show the libraries it's linked against?
> 
> That way, I could try prefetching every library used by an executable prior to execution...

 Run ldd (or the nicer lddtree.sh) on the binary in question.
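A rough sketch of how the ldd output could be turned into a prefetch step. The function names `list_libs` and `prefetch_with_libs` are hypothetical, the parsing assumes the usual `name => /path (address)` output format, and library paths containing spaces are not handled. Keep in mind that ldd should only be run on binaries you trust.

```shell
# list_libs (hypothetical helper): print the resolved path of every
# shared object that ldd reports for the given binary.
list_libs() {
    # Typical ldd lines:
    #   libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)
    #   /lib64/ld-linux-x86-64.so.2 (0x...)
    # Keep the path after "=>" and any bare absolute path.
    ldd "$1" 2>/dev/null | awk '
        $2 == "=>" && $3 ~ /^\// { print $3; next }
        $1 ~ /^\//               { print $1 }'
}

# prefetch_with_libs (hypothetical helper): read the binary and each
# of its libraries once so they end up in the page cache.
prefetch_with_libs() {
    cat "$1" > /dev/null || return 1
    list_libs "$1" | while IFS= read -r lib; do
        cat "$lib" > /dev/null
    done
}
```

This only covers libraries known at link time. Anything the program opens later with dlopen(), such as plugins, will not show up in the ldd output.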

----------

## Link31

 *NeddySeagoon wrote:*   

> yes ... the problem as you state is
>
>  *Link31 wrote:*   
>
> > ... if there is enough free memory to do so
>
> which is not something you can control

 

There is always some way to control memory management. In this case you would need to call mlockall(2) from inside the process after it has been prefetched. But this requires root privileges or the CAP_IPC_LOCK capability. It's widely used in real-time programs, though.
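As a side note, an unprivileged process can usually lock a limited amount of memory, bounded by its RLIMIT_MEMLOCK resource limit, so it may be worth checking that limit first. A small sketch, where `memlock_limit` is just an illustrative name:

```shell
# memlock_limit (hypothetical helper): print the soft limit on locked
# memory for the current shell. The shell reports it in units of
# 1024 bytes, or as the word "unlimited".
memlock_limit() {
    ulimit -S -l
}

memlock_limit
```

If a program does call mlockall(2), the `VmLck` line in `/proc/<pid>/status` shows how much memory it currently has locked.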

----------

## VanFanel

Well, I made a script that runs lddtree on a given executable and then prefetches every library it uses.

No great difference in most programs, but prefetching the data files of MS-DOS games before running them in DOSBox sure gave a good boost on the initial run (not very useful, but hey! :D)
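For the data-file case, something along these lines should do. It is only a sketch: `prefetch_dir` is a made-up name, and it simply reads every regular file under a directory once.

```shell
# prefetch_dir (hypothetical helper): read every regular file below a
# directory so its contents land in the page cache before the
# program that uses them is started.
prefetch_dir() {
    [ -d "$1" ] || return 1
    # "-exec cat {} +" batches many files into few cat invocations.
    find "$1" -type f -exec cat {} + > /dev/null
}
```

For example, `prefetch_dir ~/dosgames/somegame` before launching the game, where the path is whatever directory holds its data. This only makes sense when the data set is comfortably smaller than the free RAM.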

Would it be possible to use CAP_IPC_LOCK on a binary without modifying the sources? Privileges are no problem here: it's my home system I'm experimenting with.

And in case it's not possible, where should it be done? How can I prefetch the libraries from inside the sources, if that's needed? Would a simple system() call with

```
cat /usr/lib/<library name> > /dev/null
```

do?

----------

## Link31

There is an mlock wrapper here: http://thread.gmane.org/gmane.linux.kernel/764816. Too bad it hasn't been merged yet.

I also found an mlock-as-nonroot patch in the LKML archives: http://lkml.org/lkml/2004/7/29/52

You may want to try merging these patches in your own kernel if you really need this feature.

----------

