# HOWTO: Central Gentoo Mirror for your Internal Network

## Grimthorn

HOWTO: Central Gentoo Mirror for Internal Network (supports stages 1-3)

Synopsis

The default behavior of Gentoo's Portage system is very powerful for single installs but quickly becomes redundant when more than two machines are involved. The problem becomes especially acute when you're dealing with an install base of dozens or more. Using emerge over the Internet for every machine consumes unnecessary bandwidth, overloads Gentoo's mirrors and demands unwelcome Internet access for each of those machines.

A more efficient internal infrastructure would be centered on a single point of access to the Gentoo mirrors. This Portage gateway server would be responsible for retrieving updates to the Portage Tree and maintaining a central repository of Gentoo packages (distfiles). In smaller networks all internal machines would draw from the Portage gateway. In larger networks access could be cascaded to secondary and tertiary servers to distribute load or handle complex network structures. 

While some additional admin work is required, such as maintaining the subset of Gentoo packages for downstream clients, there are benefits beyond the obvious bandwidth savings and relief for the Gentoo mirrors.

The Gentoo gateway admin can control which Gentoo packages (distfiles) are available inside the network. This ensures that beta packages do not creep into production machines. Alternatively, two secondary servers could stem from the gateway: one masked and used by production machines, the other unmasked and used by the development and test machines.

Getting a complete backup of the Gentoo packages installed throughout the network is as simple as copying the ../distfiles/ directory on the Portage gateway server. This beats running to each machine trying to capture every package that has been downloaded.
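For example, a weekly cron entry on the gateway could snapshot the whole repository (the backup destination below is hypothetical; adjust to taste):

```
# /etc/crontab on the Portage gateway -- weekly snapshot of all distfiles
0 3 * * 0  root  tar czf /backup/distfiles-snapshot.tar.gz -C /usr/portage distfiles
```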

Only the Portage gateway server needs access to the Internet. This has many implications beyond the scope of this HowTo but the security benefits are obvious. The Portage gateway server can of course itself be behind a firewall.

Before We Begin

Your Portage gateway server is not meant to be a complete mirror of the 4500+ Gentoo packages available. While over time you will accumulate a comprehensive repository of source packages and their dependencies, there is no point in putting load on the mirrors for source you never intend to compile. Someone, somewhere has to pay for the hosting of our Gentoo community. Let's not abuse this free service by hoarding data that will likely become obsolete before anyone uses it.

This applies to syncing your Portage tree as well. You might have noticed Gentoo's HowTo on setting up an rsync mirror. We will be using the same software for our gateway, but you must ignore Gentoo's recommendation to sync every 30 minutes; that policy is meant for public servers ONLY. There is no harm in automating the sync process for your internal machines and setting the frequency to whatever you like. However, leave the Portage gateway's emerge sync as a manual process and update only as needed. You should only require a sync to get new software, correct a bug or patch a security hole.
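For instance, each internal client could automate its sync with a cron entry like this (the nightly schedule is just an illustration, and the entry assumes the client setup described later in this HowTo):

```
# /etc/crontab on an internal client -- nightly sync against the gateway
30 2 * * *  root  emerge sync
```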

Scenarios

Setting up a Portage gateway and its supporting infrastructure is not very difficult. In fact, there are several distinct approaches, and the techniques from one setup can be mixed with another. Several possibilities are detailed below, but only number three is expanded because it accomplishes everything outlined in the synopsis.

-- Setup #1: Proxy Cache --

If your internal Gentoo machines (Gentoo clients) are behind a proxy firewall, they can take advantage of the caching feature built into most proxy servers. The proxy firewall reduces bandwidth usage by caching the results of recent HTTP or FTP requests, including Gentoo packages (distfiles). A recent emerge would therefore have cached any required Gentoo packages on the proxy server, and doing another emerge within a reasonable amount of time would download the cached copy of the package rather than going to a Gentoo mirror. The problem with most proxy caches is that there is an expiry time on cached content: if you don't emerge soon enough, Portage has to get yet another duplicate copy from the Gentoo mirrors. If you have access to your proxy server and can set the expiry times for cached content, this is a quick way to set up a pseudo Portage gateway. This setup is best for small networks.
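If your proxy happens to be Squid, a refresh_pattern along these lines would keep source tarballs cached far longer than the default (the numbers are illustrative: minimum age 20160 minutes, maximum 43200 minutes, i.e. about two weeks to 30 days):

```
# squid.conf -- hold fetched source tarballs in the cache for up to 30 days
refresh_pattern -i \.tar\.(gz|bz2)$  20160  90%  43200
```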

-- Setup #2: NFS or Samba network shares --

Using a network file share for the Portage tree and distfiles is a great solution for small workgroups and college dorms. All it requires is editing the make.conf file and directing each machine to use the share for its Portage tree and Gentoo packages. This offers good flexibility because everyone can update the Portage tree and/or add to the package repository as needed. However, cascading to multiple machines is difficult using shares. Originally I had stated in error that the Gentoo bootable CD-ROM did not support NFS and Samba shares. Revised: [contributed by GTVincent] When started from the x86 1.4_RC4 CD-ROM, it is possible to start nfsmount and mount /mnt/gentoo/usr/portage/distfiles from another computer after untarring a stage file and creating the /usr/portage/distfiles directory, but before chrooting to /mnt/gentoo.
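As a sketch, a client using such a share might carry entries like these (the server name portage-server and the mount options are purely illustrative):

```
# /etc/fstab on the client -- mount the shared tree
portage-server:/usr/portage  /usr/portage  nfs  rw,soft  0 0

# /etc/make.conf -- point Portage at the mounted share
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
```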

-- Setup #3: Rsync for both the Portage tree and Gentoo packages (distfiles) --

This setup aims to accomplish everything laid out in the discussion above. As you will see it provides flexibility and control in larger environments. 

Portage uses two methods to keep the Portage tree updated and to retrieve current Gentoo packages (distfiles): rsync is used for the tree and wget is used for the packages. I'm not aware of the motivations behind these protocol choices, but they work very well.

We will continue to use rsync for updating the Portage tree and will set your client machines to draw from your designated Portage gateway. This will be accomplished using Gentoo's rsync daemon on the gateway. We will drop wget for Gentoo package retrieval (distfiles) and instead instruct the client's emerge to use rsync with the Portage gateway. The bonus is that rsync is already in place, it's fast and it's configurable.

Procedures

OK, lets get into the nuts and bolts of the setup procedure. Im assuming that you have two or more Gentoo machines and that one of them is built and connected to the Internet (this will be the Portage gateway). Remember that we want to accomplish two things: 1) have your clients (internal machines) update their Portage tree from your Portage gateway, 2) have all your clients (internal machines) download the necessary Gentoo packages (distfiles) from the Portage gateway when they do an emerge.

1.0 Portage gateway setup

To serve the Portage tree and distfiles you need to be running the Gentoo rsync daemon. Gentoo conveniently provides this software in a Portage package (naturally). Emerge the following package and wait for it to compile.

Code listing 1.1

```
#emerge app-admin/gentoo-rsync-mirror
```

Now lets configure the rsync daemon (its not running yet). The rsyncd.conf file should have been created on compile but if it wasnt create one yourself.

Code listing 1.2

```
#nano /etc/rsync/rsyncd.conf
```

Regardless of whether the file already existed, you will want it to look like this:

File Listing 1.1

```
#uid = nobody
#gid = nobody
use chroot = no
max connections = 10
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300
#hosts allow = <your list>

[gentoo-portage]
#For replicating the Portage tree to internal clients
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

[gentoo-packages]
#For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror

#[gentoo-x86-portage]
#This entry is for backward compatibility and is generally no longer required.
#path = /usr/portage
#comment = Old Gentoo Linux Portage tree
```

Were only going to discuss the important parts but there are many more configuration options. Check the man pages for a great discussion of the rsync daemon (#man rsyncd.conf) or general rsync uses (#man rsync).

The first line of interest is motd file = /etc/rsync/rsyncd.motd. It points to the message of the day that will be displayed every time rsync delivers files. The rsyncd.motd file is just text, so put anything you want in it (server name, IP, admin contact, etc.). As always, you can just edit it with nano.

I put the #hosts allow = <your list> line in to illustrate the various security settings you can tweak. This option allows you to specify a range of addresses that are allowed to rsync with this machine; if a requesting machine isn't in this range, the request is denied. Check the man pages (#man rsyncd.conf) for more discussion of rsync security options.
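For example, restricting access to a single internal subnet might look like this (the address range is purely illustrative; substitute your own):

```
hosts allow = 192.168.1.0/255.255.255.0
```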

If a default rsyncd.conf file was created when you emerged, then you would have noticed two blocks of options at the bottom of the file. These are rsync modules. They specify which directories to share and where they are located on the local machine. The sample file above has commented out one default module and added one new module.

The [gentoo-portage] module is responsible for sharing the Portage tree. It is important that the path properly references the location of the local Portage tree. Just as important is the exclude property: if the distfiles directory is not excluded, then every time an internal machine syncs with the gateway the Gentoo packages will go along for the ride. This is not desirable in most circumstances.

The [gentoo-packages] module is responsible for sharing the Gentoo packages (distfiles). This module is not specified in the default rsyncd.conf file, so you will have to create it. It is important that the path properly references the location of the local Gentoo packages (distfiles) and not the Portage tree.

The [gentoo-x86-portage] module is there for backward compatibility. Your need for this will depend on how current your install base is; for this setup I've left it out.

Now that the rsync daemon is configured, we can set it to start when the machine boots. You may want to adjust the runlevel to suit your needs.

Code listing 1.3:

```
#rc-update add rsyncd default
```

Finally lets get the rsync daemon actually running.

Code listing 1.4:

```
#/etc/init.d/rsyncd start
```

IMPORTANT NOTE: Gentoo has changed the way the rsync daemon is started. You must edit the init script for rsync to work with Gentoo packages. For now this is detailed in a post below, but I will update this HowTo ASAP.

Your Portage gateway is ready to go!

2.0 Internal Gentoo Machine Setup (client setup)

Note: Im assuming that you can see your Portage gateway (you should be able to ping it). Ideally you should have DNS setup properly in your /etc/resolv.conf file and a DNS server on your network. The Gentoo install guide details this.

Ok, so youve just booted your client machine from your Gentoo cd-rom, created your partitions, extracted stage 1, chrooted, etc, etc, and you need to emerge sync for the first time. If you built your Portage gateway from Stage 1 than all (or most) of the Gentoo packages you need should be on that machine. Lets go get them.

All you have to do is add two lines to your /etc/make.conf file, but before we do that we must prepare a couple of variables. Find and uncomment the following lines in your /etc/make.conf file:

File listing 2.1

```
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
```

These variables pass important information to the settings listed below. Make sure their values are correct for your system: each path must point to the corresponding location of your Portage tree and distfiles directories. Unless you've changed the default behavior of your Gentoo install, the given values are valid.

Ok, now lets tell your machine where to get the Portage tree. You can put the following line anywhere in the /etc/make.conf file but grouping it with the other rsync options is ideal.

File listing 2.2

```
SYNC="rsync://<your Portage gateway's IP or DNS here>/gentoo-portage"
```

-- The SYNC variable overrides the default location where Portage looks for the Portage tree.

-- rsync:// instructs your machine to use the rsync protocol. 

-- <your gateway address> If you have DNS working, then put in the name of your server; otherwise use its IP. Do NOT include the angle brackets (<>).

-- /gentoo-portage Recall that this is the name of the module you specified in the rsyncd.conf file on your Portage gateway. The module contains a path that points to the gateway's local Portage tree.

Exiting the file and doing an emerge sync right now would result in a successfully updated Portage tree, but let's finish the rest of the configuration.

Now we will tell Portage how to download files for an emerge process. It is VERY important that you get the syntax right here. You can put the following line anywhere in the /etc/make.conf file, but grouping it with the other fetch commands is ideal. Note: regardless of how your browser wraps it, it should all be on one line.

File listing 2.3

```
FETCHCOMMAND="rsync rsync://<your Portage gateway's IP or DNS>/gentoo-packages/\${FILE} ${DISTDIR}"
```

Missing even one character in the line above will result in a failed emerge process, so let's review:

-- The FETCHCOMMAND feature of Portage allows you to specify a wide variety of methods to retrieve Gentoo packages from your Portage gateway. Kudos to the Gentoo folks; the flexibility is great!

-- rsync is telling Portage to use the rsync program to get the file. Options such as -v (verbose) are optional, as are many other settings you could apply. Check the man pages (#man rsync) for more choices.

-- rsync:// This tells rsync to reach across the network using the rsync protocol.

-- <your gateway address> If you have DNS working, then put in the name of your server; otherwise use its IP. Do NOT include the angle brackets (<>).

-- /gentoo-packages Recall that this is the name of the module you specified in the rsyncd.conf file on your Portage gateway. The module contains a path that points to the gateway's local Gentoo package directory.

-- /\${FILE} This variable contains the file name emerge is trying to obtain. Note the forward slash and backslash combination; this is important, because the backslash keeps ${FILE} from being expanded when make.conf is read, so emerge can substitute the file name later.

-- ${DISTDIR} This variable tells rsync where to put the files on the local (client) machine. There should be a space between it and the \${FILE} variable.

-- Note the quotes around everything after the equals sign.
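To see why the backslash matters, here is a tiny shell demonstration (illustrative only, not part of the setup):

```shell
#!/bin/sh
DISTDIR=/usr/portage/distfiles
# Simulate make.conf parsing: ${DISTDIR} expands now, \${FILE} stays literal
FETCHCOMMAND="rsync rsync://gateway/gentoo-packages/\${FILE} ${DISTDIR}"
echo "$FETCHCOMMAND"
# -> rsync rsync://gateway/gentoo-packages/${FILE} /usr/portage/distfiles

# Later, emerge substitutes the file name before running the command:
FILE=zip-2.3.tar.gz
eval echo "$FETCHCOMMAND"
# -> rsync rsync://gateway/gentoo-packages/zip-2.3.tar.gz /usr/portage/distfiles
```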

Save your file and exit.

At this point emerging any packages on your client machine will retrieve them from your Portage gateway. 
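Putting it all together, the client's /etc/make.conf additions look something like this (the gateway address 192.168.1.10 is purely illustrative; substitute your own IP or DNS name):

```
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
SYNC="rsync://192.168.1.10/gentoo-portage"
FETCHCOMMAND="rsync rsync://192.168.1.10/gentoo-packages/\${FILE} ${DISTDIR}"
```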

Final Thoughts

What if my package is not on the Portage gateway? If one of your client machines requests a package that is not available on the Portage gateway, the emerge operation will obviously fail. No problem: make a note of which package you want and log on to the Portage gateway. Perform emerge -f <needed package name>; this will retrieve the package onto the gateway without compiling it. Now perform the emerge on the client machine again and all is good.

It may be helpful to have all of the USE flags enabled on the Portage gateway machine to ensure every dependency package is retrieved (can someone verify this?). Emerge ufed; it's a very handy tool for editing your USE flags. Tip: [contributed by Me] Putting "cvs" in your server make.conf FEATURES should enable all USE flags, even when new ones get created.

On some occasions you may find that some files required by a client's emerge do not download when you perform the same emerge on the Portage gateway. I don't understand yet why this happens (maybe someone could enlighten me). Regardless, all you have to do is manually grab that file from the Internet using wget on the Portage gateway, or download it elsewhere and copy it into the Portage gateway's distfiles directory. The file name should be listed in the failed emerge's output.
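Fetching a missing file by hand on the gateway looks something like this (substitute a real mirror host and the file name from the failed emerge's output):

```
# wget -P /usr/portage/distfiles http://<a Gentoo mirror>/distfiles/<missing file>
```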

Of course over time you will accumulate a selection of packages that is comprehensive and customized to your needs. If you are supporting fifty client machines you only have to download a needed package once onto the gateway and all of those clients can emerge it without going to the Internet.

If you have cascaded your Portage gateway to multiple servers, you have very good redundancy. If your Portage gateway dies, just upgrade one of the secondary servers to gateway status. Keep in mind, though, that an infrastructure built around segregating packages would not be suitable for this.

If the network share method is working well in your environment then just add the Gentoo rsync daemon to support stage 1 installs. This would give you flexibility and complete support of stages 1-3.

Thats it for this HowTo. I hope it relieves a few headaches and eases some bandwidth woes. A big thanks to the Gentoo people and the forum community!

-- Thank you to GTVincent and Me for their corrections and contributions.

Last edited by Grimthorn on Thu Feb 19, 2004 3:47 pm; edited 4 times in total

----------

## jimlynch11

I'm nominating this for the best 'first post' ever. Nice work, man, and welcome to the forums.

I'll probably be using this in a couple of weeks when I finally convince my parents to get rid of 98 on their laptop, so thank you.

----------

## GTVincent

 *Grimthorn wrote:*   

> 
> 
> -- Setup #2: NFS or Samba network shares --
> 
> [..]
> ...

 

While the entire HowTo is really very comprehensive, this point is not true. When started from the x86 1.4_RC4 CDRom, it is possible to start nfsmount and mount /mnt/gentoo/usr/portage/distfiles from another  computer after untar-ing a stage-file and creating the /usr/portage/distfiles directory, but before chroot-ing to /mnt/gentoo.

Edit: removed the huge letters for your reading pleasure... Glad to be of assistance   :Wink:

Last edited by GTVincent on Tue Jun 10, 2003 12:47 am; edited 1 time in total

----------

## Me

Putting "cvs" in your server make.conf "features" should enable all useflags, even when new ones get created.

----------

## Grimthorn

jimlynch11: Thank you! I hope it helps!

GTVincent: Very much appreciated I did not know this. I have corrected the HowTo. 

Me: Great tip! I just applied it to our gateway. I've added it to the HowTo.

Thanks folks!

Take care,

Grim

----------

## Koon

Nice work!

I want to discuss some of the drawbacks of the different solutions (correct me if I'm wrong).

Setup 0: complete rsync mirror

* 90% of the things you download are useless because you won't ever use them.

* Load on the Gentoo servers

Setup 1: proxy cache

* works only for packages, not for the tree (or is there a way to proxy/cache rsync?)

* problems with expiry times of the packages

Setup 2: network share of the /usr/portage tree

* it was not meant for this, so there is a potential simultaneous-access conflict (two workstations doing emerge sync or downloading the same package at the same time)

* vulnerable setup (centralized)

Setup 3: partial rsync gateway (your setup)

* maintenance problem: you can't easily feed the tree with new packages from a workstation; that is, there is no way for a workstation to contribute to the tree. For new packages AND for every update of every package you have to do emerge -f on the gateway.

* emerge -p shows available packages which are in fact not available on the gateway. The emerge will fail later when the package is fetched...

* you have to download every package that could be needed for specific USE flags (using something like the magic 'cvs' feature). That means that I, as a Gnome user, have to get and update KDE packages to support the kde USE flag even though I won't ever need them.

If there are easy ways of avoiding the drawbacks in your solution, please let me know. I am still looking for the perfect solution for a local (enterprise) Portage tree. Once I get your opinion on this, I may post another thread to discuss the specifications for a perfect enterprise-oriented Portage gateway.

Question on the implementation chosen: what are the advantages/drawbacks of using rsync instead of wget for the packages? I think the problems you sometimes get in fetching the packages might come from rsync compatibility problems. Couldn't you easily set up an HTTP server to serve the packages using wget? Or am I missing a point?

Thank you for your patience !

-K

----------

## hackertype

I just want to point out that there is trouble right here in River City.

I followed the instructions to set up the mirror on my local network. I can emerge the entire Portage tree just fine, but when it comes to downloading tarballs from the distfiles folder via rsync, the client hangs and then times out.

A possible workaround (thanks spyderous) for this is to symlink your distfiles folder to htdocs. Then run some webserver (I'm emerging Apache right now) to allow your clients to wget the tarballs.

On the client machines, change GENTOO_MIRRORS to point to your local half-rsync-half-webserver Gentoo mirror.
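That workaround might look something like this (the htdocs path and gateway name are illustrative; adjust them to your webserver's document root):

```
# On the gateway: expose distfiles through the webserver's document root
ln -s /usr/portage/distfiles /home/httpd/htdocs/distfiles

# On each client, in /etc/make.conf: point at the local mirror
# GENTOO_MIRRORS="http://<your gateway>"
```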

----------

## Grimthorn

Koon,

Why rsync? Mostly for simplicity's sake. It allows for the construction of a Gentoo mirror with the least amount of knowledge, build time and configuration. However, as you point out, this comes at the expense of automation.

Just to clarify: one doesn't have to use the "cvs" feature. It's only recommended (maybe not even needed... someone confirm?) so that the gateway provides the highest level of availability. If no one on the internal network uses KDE, then modify the gateway's USE flags to reflect that. Unfortunately this increases administration yet again.

If you want a higher level of automation, we'll have to throw out rsync and go back to httpd. You could direct your clients to a PHP script that automatically retrieves the package from the web if it's not available locally. I think there have been posts on this... I'll track down the thread, post it here, and add it to the HowTo as time permits.

hackertype,

Hmm, something's amiss. Which package is causing the problem? Could you post the error? We're currently using this method and have only come across a few problems, all of which were correctable. However, we're mostly servers, so perhaps you've uncovered something new. Post some more info and I'll try to figure it out.

Grim

----------

## Grimthorn

Koon,

The link I was referring to is  here.

This would be nice to implement for some of our users. As we continue to roll out Gentoo I will probably put something like this together. I'll post the complete solution here as an add-on to the HowTo, but if you beat me to it, that would be great!  :Wink:

Take care,

Grim

----------

## hackertype

 *Grimthorn wrote:*   

> Post some more info and I'll try to figure it out.
> 
> Grim

 

Okay then.  My rsyncd.conf file:

```

#uid = nobody
#gid = nobody
use chroot = no
max connections = 20
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

[gentoo-x86-portage]
#this entry is for compatibility
path = /opt/gentoo-rsync/portage
comment = Gentoo Linux Portage tree

[gentoo-portage]
#For replicating the Portage tree to internal clients
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

[gentoo-packages]
#For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror

```

Rsyncing from gentoo-packages simply won't work, while rsyncing from gentoo-portage works fine.

 *Quote:*   

> 
> 
> [root@www smwiki]# rsync -v rsync://10.0.0.30/gentoo-packages/zip23.tar.gz .
> 
> This is rsync[number].[country].gentoo.org.
> ...

 

This happens with any package I choose. I can rsync ebuilds out of the Portage tree, yet I can't rsync tarballs out of distfiles.

The rsync server is version 2.5.6 and it is running on an iMac.

----------

## heijs

Same error here on an AMD Athlon with the same configuration...

I really don't understand the error and I followed the guide perfectly!

----------

## zen_guerrilla

 *Koon wrote:*   

> I am still looking for the perfect solution for a local (enterprise) portage tree. When I will get your opinion on this, I may post another thread to discuss the specifications for a perfect enterprise-oriented portage gateway.

 

Just some tips for implementing gentoo at lans from my experience...

Let's say that you have a local LAN of ~20 boxes and you want to install Gentoo with the same setup on all of them.

1. Share /usr/portage via NFS from one box; use autofs to mount that share when needed from the other boxes.

2. Have "emerge sync" invoked from crontab on the 'server' box once a day, and that way have all boxes synchronized at once.

3. Use packages (emerge -b/-k; see man make.conf and man emerge) to reduce compile times. If your boxes have different archs, consider compiling with -march=i686. If you want to keep packages for different archs, set PKGDIR accordingly, e.g. /usr/portage/packages-athlon or packages-pentium3.

4. Use distcc to reduce compile times even more.
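A minimal autofs sketch for step 1 (the server name gentoo-server, map file paths and mount point are all illustrative; clients would then point PORTDIR at the automounted directory):

```
# /etc/autofs/auto.master -- automount map under /mnt/auto
/mnt/auto  /etc/autofs/auto.portage

# /etc/autofs/auto.portage -- mount the shared tree on demand
portage  -rw,soft  gentoo-server:/usr/portage
```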

----------

## Thorbjorn

 *zen_guerrilla wrote:*   

> 1. Share /usr/portage via nfs from 1 box, use autofs to mount that share when needed from the other boxes. 
> 
> 

 

NFS, assuming you can hit NFS from client to server... rsync over ssh for a more secure and distributed local system.

On another note: I was actually doing this very thing, but I am working on a transparent proxy/cache for the package tarballs, which will solve the issue of USE flags and having to manually emerge -f on the server. Under this scheme your clients will use http://yourserver/gentoo-cache/ for the FETCHCOMMAND. This will in turn proxy the connection to whatever server holds the package and then cache the downloaded tarballs.

Under this scheme, if you have a lot of machines doing a global update, you should hit 90-100% cache accuracy (depending on the package and USE flags) and obtain perhaps a better reduction of bandwidth on the external link, because you're only fetching what is needed, not everything you _might_ need (under FEATURES=cvs). Anyhow, I'm doing the proxy cache setup right now, and I will post here with my configs and all that.

Great HowTo, by the way!

----------

## Thorbjorn

Here is my mod_proxy with caching support.

This is the first time I've set up mod_proxy in Apache, but it is working fine for me. I may need to tweak the cache settings a bit, and I would expect you all will need to as well. One more thing I want to do is get the FTP proxy working. My initial attempts at getting that going were not successful: I get a segfault on child processes when trying to proxy FTP. I am not sure whether this is due to the way mod_proxy was compiled, and there's not a lot of info on Google about it. Anyone know what's going on there?

My approach was to set up an internal caching proxy on a virtual host so as not to interrupt anything else running on this box. I assume you know how to uncomment the LoadModule directives in your Apache configs (mod_proxy is a standard module) and that you are using Apache 1.3.x (for 2.0 your config would be a bit different). I also assume you know how to set up a virtual host, but you can slap this in a Location directive if you want to.

Here is my Apache virtual host:

```

<VirtualHost yourwhatever>
    # We don't want a full open proxy, just ProxyPass; this is a security measure:
    ProxyRequests Off

    # Where the cache files live (Apache needs to be able to write here)
    CacheRoot "/somedir/cache/httpd"
    CacheSize 102400
    CacheGcInterval 8
    CacheMaxExpire 168
    CacheLastModifiedFactor 0.1
    CacheDefaultExpire 72

    # Set up the site we want to proxy
    ProxyPass /gentoo/ http://csociety-ftp.ecn.purdue.edu/pub/gentoo/

    # Set up some restrictions on who can connect
    <Directory proxy:*>
        Order deny,allow
        Deny from all
        Allow from <your domain>, <yourip>
    </Directory>
</VirtualHost>

```

And voilà, any request to yourvirt/gentoo/ is transparently proxied, and all downloads from said URL are cached on your local server...

Each of the cache directives controls how the server handles caching. Setting CacheRoot enables caching on the server; this directory must be writable by the user running the server (usually "nobody"). CacheSize sets the desired space usage in kilobytes. You will probably want to set this higher than the default of 5, based on your available disk space, to allow the greatest number of documents to be stored locally, thus allowing local cache access by the clients. Garbage collection, which enforces the cache size, is set in hours by CacheGcInterval; if unspecified, the cache will grow until disk space runs out. CacheMaxExpire specifies the maximum number of hours for which cached documents will be retained without checking the origin server. If the origin server for a document did not send an expiry date, then CacheLastModifiedFactor will be used to estimate one by multiplying the factor by the time since the document was last modified. If the protocol used to retrieve a document does not support expiry times (FTP, for example), the CacheDefaultExpire directive specifies the number of hours until it expires.

now just edit your make.conf to point to yourserver like so:

```
GENTOO_MIRRORS="http://myproxy/gentoo"
```

And all is good. Sit back and enjoy not sucking up all your bandwidth building a new gcc on emerge -u world on the 12 cluster nodes in your garage... er... or whatever you may have  :Wink:

Edit: added the info about the cache vars.

----------

## zen_guerrilla

 *Thorbjorn wrote:*   

>  *zen_guerrilla wrote:*   1. Share /usr/portage via nfs from 1 box, use autofs to mount that share when needed from the other boxes. 
> 
>  
> 
> Which will then solve the issue of use flags and having to manually emerge -f  on the server.

 

Actually you don't need to invoke 'emerge -f' or set flags on the server; you just emerge something on, e.g., box1, which downloads the sources into the /usr/portage of the 'server', and thus when box{2,3,4,5...} needs the same sources it uses them the same way. Fetching sources and emerge sync'ing happen only once (on the 'server') for the whole LAN. Plus, if you need packages, just create them on one box and then 'emerge -k' on the others. And you don't need a dedicated box for this solution; the 'server' can also work as a workstation at the same time.

The proxy/rsync approach is quite sophisticated and could do the job on a large (enterprise) LAN, but for 10-20 boxes I think my way is easier to set up and admin, and scales better.

----------

## Koon

OK, after a bit of research, I think there are only 3 good solutions to the problem, the best one depending on what exactly you need.

1/ NFS Share the tree and distfiles

(+) no need to emerge sync on the workstations

(-) risk (tiny) of concurrent access while a distfile is being downloaded, causing problems

This solution is best on small networks and when people auto-administrate their machines.

Question: is NFS the best way to share?

2/ Rsync the tree, cache the distfiles

(The solution described here, with or without the proxy/cache setup)

This solution is best on medium networks and when administration is done only by a small team of admins. 

Question: IMHO proxy/cache is better than doing emerge -f... Any pros for the emerge -f approach?

3/ PortageSQL (see breakmygentoo)

(+) Accounting of what's installed, where...

(-) not available yet !

This solution is best on large networks, since accounting becomes quickly necessary.

I think I will finally go for solution 1 since we only have half a dozen boxen and 3 of them are directly administered by their primary users (they use portage directly).

Maybe we should describe all solutions in the HOWTO so that it becomes a definitive guide describing all the options you have for using portage in a LAN, with their relative pros and cons.

-K

----------

## GurliGebis

Well, my installation works like this:

I have the server serving /usr/portage/distfiles over nfs, so the clients mount it at /usr/portage/distfiles. The clients fetch the distfiles from the web servers as normal, but since they all have /usr/portage/distfiles mounted from the server, each file only needs to be downloaded once.

The server also runs the rsync daemon, so the clients can rsync against it, and thereby save bandwidth.

The server rsyncs once a day.
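The client side of a setup like this is just two lines; a sketch (the host name "server" and the rsync module name are assumptions, the module name following the HowTo's convention):

```
# client /etc/fstab: share only distfiles, keep the tree local
server:/usr/portage/distfiles  /usr/portage/distfiles  nfs  rw,soft,intr  0 0

# client /etc/make.conf: sync the tree against the LAN rsync daemon
SYNC="rsync://server/gentoo-portage"
```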

----------

## narksunamun

Hi,

Here is my solution for mirroring the Portage tree in a centralized network of Gentoo Linux computers. I have a server and several clients. On the server, I maintain two Portage trees for all the computers. The first tree is located in /usr/portage_client. An rsync daemon runs on the server and shares /usr/portage_client. Each client has only the server as its Gentoo Portage tree mirror in /etc/make.conf. The second tree is located in /usr/portage on the server. This directory is the one used when I want to get the latest Gentoo Portage tree from an official Gentoo mirror via an emerge rsync.

When I want to update the portage tree in /usr/portage_client with the most recent portage tree in /usr/portage, I do it in three steps :

1. I stop the rsyncd daemon which runs on the server

2. I copy the content of /usr/portage into /usr/portage_client

3. I start the rsyncd daemon

Each client runs an emerge rsync each day at 2 a.m.

The server has the most recent portage tree.

That's all !!!
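The three steps above can be sketched as a small cron-able script (a sketch only; the directory names and the init-script path are taken from the posts in this thread):

```shell
#!/bin/sh
# update_client_tree: refresh the client-facing tree while the rsync
# daemon is down, so no client ever syncs against a half-copied tree.
# Arguments default to the paths named in the post.
update_client_tree() {
    src="${1:-/usr/portage}"            # freshly synced tree
    dst="${2:-/usr/portage_client}"     # tree served to clients
    rc="${3:-/etc/init.d/rsyncd}"       # rsync daemon init script
    "$rc" stop || return 1              # 1. stop the rsync daemon
    cp -a "$src/." "$dst/"              # 2. copy the new tree across
    "$rc" start                         # 3. start the daemon again
}
```

Running it from cron shortly before the clients' 2 a.m. emerge rsync would keep the window with the daemon down well away from client syncs.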

----------

## Grimthorn

hackertype, heijs

I've finally been able to recreate your error messages. Here's the scoop:

Shortly after I posted the HowTo, Gentoo updated the init.d script that handles the rsync daemon. If you had been diligently doing emerge sync and emerge -u system, then you would have picked up this new script. Essentially they've added a parameter that tells rsync to compress files before transferring them. Unfortunately, when rsync tries to compress an already compressed file (such as Gentoo packages) it craps out.

To fix the problem:

Edit your rsyncd init.d script as follows:

```
nano -w /etc/init.d/rsyncd
```

Find this line:

```
RSYNC_OPTS="--safe-links --compress --bwlimit=700 --timeout=1800"
```

Make it look like this:

```
RSYNC_OPTS="--safe-links --timeout=1800"
```

Explanation:

--compress: Obviously tells rsync to compress the file before sending it. This is what's causing the problem.

--bwlimit=700: This throttles the bandwidth rsync uses. This shouldn't matter on an internal network, so I've removed it. If you do have bandwidth problems on your internal network, then leave this parameter as-is.

That's it. Let me know if it works.

Take care,

Grim

----------

## Grimthorn

As Koon suggests it would be great to add all of these configs to the HowTo so that it will become the definitive guide to creating a Gentoo gateway. Time constraints would keep me from implementing and testing every config so any help would be greatly appreciated! To start I've asked a few questions below.

Thorbjorn: Nice setup and great idea. I would like to try this. I think it would work well in our environment.

Koon: Thanks for all your "tire kicking". Your observations have helped focus our attention on the real issues.

zen_guerrilla: Thanks for the tips! Good idea about maintaining different packages for different arch's.  I had not thought of this and it could be useful in our environment.

narksunamun: I'm curious about your setup. Do you maintain two portage trees to avoid a collision between the gateway's update of the portage tree and the client's requests for the portage tree?

GurliGebis: How many clients do you support? Have you had any problems with updating the packages while a client is requesting them?

----------

## Thorbjorn

 *Koon wrote:*   

> 
> 
> Question : IMHO proxy/cache is better than doing emerge -f... Any pros to the emerge -f ?
> 
> 

 

Because on all my clients I just emerge away and everything is cached transparently (not the builds, but the distfiles). I run many different machines (workstations, servers) and multiple architectures: sun/ppc/i386. Caching of builds makes little sense since just about no two machines on my network build the same thing the same way. The proxy/rsync's purpose is strictly to limit my bandwidth usage. Case in point: this morning I had 10 machines update gzip. Only the proxy grabbed that file externally; the rest grabbed it from the cache. I didn't have to change anything in how I do my emerges from my workstations, i.e. it's transparent. Whereas emerge -f has 2 weaknesses: #1 you gotta manually do this on the "server"; #2 you have to have cvs in your FEATURES so you can fetch all the possible dependent files before you emerge on your "clients". The transparent proxy/cache just works, no need to muck with it.
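In practice, the only client-side change a caching proxy like this needs is pointing the fetch tools at it; a sketch (the host name and port are assumptions, and whether you set these in make.conf or the shell environment is up to you):

```
# on each client, e.g. in /etc/make.conf:
http_proxy="http://proxy.lan:8080"
ftp_proxy="http://proxy.lan:8080"
```

wget, Portage's default fetcher, honours these proxy variables, so every distfile request flows through the cache without touching FETCHCOMMAND.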

phew long rant.

EDIT: OMG I thought you were arguing that proxy/cache was NOT better than emerge -f .. lol, rack this rant up to knee-jerk  :Razz: 

----------

## Thorbjorn

 *GurliGebis wrote:*   

> Well, my installation works like this:
> 
> I have the server serving /usr/portage/distfiles over nfs, so the clients mount it in their /usr/portage/distfiles . The clients fetches the distfiles from the webservers like normally, but since they all have /usr/portage/distfiles mounted from the server, the file only needs to be downloaded once.
> 
> The server also runs the rsync daemon, so the clients can rsync against it, and thereby save bandwidth.
> ...

 

This is a good way to go, and I have a server at home that does this, but I don't have the ability to have a single NFS mount across all the different internal nets. So I use the proxy/cache to achieve basically the same thing. (Also, there are always the NFS and security issues that come along with running portmap and nfs, if you're concerned about that.)

----------

## hackertype

Grimthorn

Yes that was the problem.  The fix worked for me.  Thanks.

BTW emerge -f is fine with me.  I would rather rsync for both distfiles and ebuild sources.

----------

## Koon

 *Grimthorn wrote:*   

> As Koon suggests it would be great to add all of these configs to the HowTo so that it will become the definitive guide to creating a Gentoo gateway. Time constraints would keep me from implementing and testing every config so any help would be greatly appreciated!

 

We should first try to determine the list of setups we recommend. For example, is there any point in talking about "emerge -f" updates if everyone agrees the proxy/cache is better for distfiles...

For my own setup I am changing my mind every two days: now I would rather go the Thorbjorn/Narksunamun way, which works even when the gateway is down, than the true NFS sharing way...

 *Grimthorn wrote:*   

> narksunamun : I'm curious about your setup. Do you maintain two portage trees to avoid a collision between the gateway's update of the portage tree and the client's requests for the portage tree?

 

Maybe it's a way to control which trees are made available to the final workstations ? I was considering something like this (a way to certify a validated tree and then publish it to the other workstations...)

 *Thorbjorn wrote:*   

> OMG i thought you were arguing that proxy/cache was NOT better thena emerge -f .. lol rack this rant up to knee-jerk

 

hehehe... I was on your side, in fact. Waiting for Grimthorn to defend the emerge -f option   :Smile: 

-K

----------

## cwng

Hello, this is a very good guide. I used it and it works ... but I decided against using rsync to fetch distfiles. I preferred an http method in GENTOO_MIRRORS so that if a source tarball is not on the gateway, emerge will fail over and use an alternative mirror.

To that effect, I emerged 'mini_httpd' (apache is overkill, unless you already have apache set up).
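Concretely, that just means listing the gateway first in the clients' make.conf; a sketch (the gateway host name is an assumption, and mini_httpd would serve the distfiles directory as its document root):

```
# client /etc/make.conf
GENTOO_MIRRORS="http://gateway http://distfiles.gentoo.org"
```

emerge tries the LAN gateway first and quietly falls back to the public mirror for anything the gateway doesn't have.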

----------

## GurliGebis

Currently I only have 4 machines behind the server/router.

There are no problems with accessing the files, since I let one machine download the files and then install on the other machines after it is done.

I think (99.99% sure) having several machines writing to the same file would corrupt it.

----------

## GurliGebis

And about the rsync problem: I have the server doing it at midnight, and the clients doing it every 4 hours (not hitting midnight)  :Smile: 

----------

## hackertype

When I emerge on a client machine it would be nice if the client would rsync the binary from my rsync mirror. Most of my machines are the same arch, and the binaries are compatible.

Will changing the PKGDIR to point to the rsync mirror work?  If not I suppose I could rsync the packages dir on a regular basis. 
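Something like this in each client's /etc/make.conf is what I have in mind (the path is hypothetical; PKGDIR has to be a locally visible directory, so an NFS mount of the build host's packages dir, or that regular rsync, would be needed):

```
# client /etc/make.conf
PKGDIR=/mnt/server/packages    # build host's packages dir, mounted locally
# then install prebuilt binaries with:  emerge -k <package>
```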

Any ideas?

----------

## swtaylor

Within my network, I've got the portage rsync mirror (a script also pulls down the latest gnome ebuilds from breakmygentoo.net and applies them to my tree, because I like to live on the edge  :Wink: )

Since the sources that my machines download tend to have a lot in common, I fired up a vsftpd instance on my rsync server to allow anonymous read-only access to the files within my /usr/portage/distfiles, and then I add it by hand as the first server in the GENTOO_MIRRORS list of all the other machines on my network. This way, any package this server has already acquired is immediately available at full LAN speed to the rest of my network. Packages not on there just fail over to the typical search of mirrors on the net.

The typical day-to-day emerge -U on that system keeps the distfiles fairly current, and i'll execute an emerge --fetchonly on it to pre-cache builds which i don't necessarily want installed on it, but want available for my other machines.

----------

## Macros73

The only problem with using rsync, unless I'm grossly misunderstanding something, is that emerged distfiles will still take up space on the local client.  

On our small network, I have one server configured as an rsync master for 'emerge sync', and as an NFS master for sharing the /usr/portage/distfiles directory. All clients have rw access to distfiles, which I admit is an insecure nightmare waiting to happen. All clients also have access to the Internet for wget.

This has a couple beautiful effects, however:

1.  The client systems don't store packages locally.  When they emerge a package, it uncompresses from the NFS share.

2.  If the requested package doesn't exist on the NFS share, it is automatically downloaded and added to the NFS share by the client.

The Ideal Solution would be a Portage Proxy that would work something like this:

1.  Client requests package X from server.

2.  Server downloads package X automatically if it is not already available.

3.  Client installs the package, then deletes work and package files from the local system afterwards.

4.  (Optional) Client notifies server that the package was installed, for accounting purposes.

----------

## TheQuickBrownFox

I'm converting some labs to gentoo, but I'm still trying to figure out how to automagically deal with config files.   :Rolling Eyes: 

The setup:

One rsync server for the whole campus, also mirroring all the distfiles. (rsync + ftp)

Two NFS/NIS servers. One for undergrads, one for postgrads, each serving different users in different labs.

Different labs have different hardware, but all the hardware in a lab is identical.

So what I'm doing is this:

1. On each server, create a directory, say /home/gentoo (I used /home/gentoo, because /home is where I have space) to have a chrooted lab "image". 

2. Then install the whole lab image (except maybe the kernel) with the USE flags specific to this lab configuration in this directory, using "emerge -b" so that binary packages are built and installed. 

3. Export /home/gentoo/usr/portage (ro). Run "emerge sync && emerge -b world && etc-update" in this chrooted environment when necessary to update the clients.

4. On the client machines, just mount server:/home/gentoo/usr/portage on /usr/portage. Then all the client needs to do, is "emerge -k world" to sync.
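The client side of step 4 is then just a read-only mount plus a binary merge; a sketch (the server name is a placeholder):

```
# client /etc/fstab
server:/home/gentoo/usr/portage  /usr/portage  nfs  ro,soft,intr  0 0
```

after which `emerge -k world` on the client picks up the binary packages built in the chroot (they live under the exported tree, so a ro mount suffices).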

Keeping a whole chrooted distro around may sound like overkill, but when the time comes that you want to quickly install a client machine with a known good configuration, you will be thankful. Using a previous client is not feasible, because you can't guarantee that it wasn't messed up (or rooted) in some obscure way by someone.

There is one more problem: How do I deal with config files after upgrading packages? Would it be feasible to have another set of ebuilds, like (system, world, local) so that I could "emerge -u local" after "emerge -u world" to fix the configs?

----------

## thinair

This tip seems to be very good. I haven't tried it yet, but I will do so next week.

I would like to know if you would allow me to translate it into French as a tutorial (howto) for

http://doc.gentoofr.org, and I will probably use some useful parts.

Naturally I would note that you are the author of this post (if you want), but to do that I need you to PM me your name (or just your nick) and an e-mail address for contact purposes (in case it's necessary).

thanx,

thinair

----------

## Koon

So many different setups... all very good for their purposes. It might be difficult to promote some reference setups since there are many many options.

Thinair : before translating, you might prefer to wait for a more comprehensive version of the HOWTO (one which would include description of other setups, like the NFS tree sharing or the proxy/cache of the distfiles...).

We're still in the process of collecting information on people's setups; then we'll have to choose what to promote in the HOWTO (that will be the difficult part), then share the work of writing parts of the HOWTO, everything being merged by Grimthorn, since he is the first poster...

So : if you use a different setup to achieve the same purpose, please post your architecture here !

-K

----------

## thinair

I am not currently using centralized rsync... but I am planning to use it for my small network.

If I find another solution, or a more complete way to answer this problem, I will post it here too.

++

thinair

----------

## Yak

My solution for packages was to set up proftpd on my 'portage gateway' (it's behind a firewall), put the distfiles directory under the ftp directory, and allow anonymous ftp access from the local network. Not too elegant, but it was quick to set up and still works like a charm. This way, all there is to setting up a client is adding your portage gateway first in GENTOO_MIRRORS as ftp://yourgateway

I guess this has the advantage of being quick and easy to type into clients' /etc/make.conf, and it fetches files automatically if they are not on your server. The only problem then is that the new package ends up on the client and not the server.. oh well, there's always nfs and cp -u to get them back to the server   :Laughing: 

(I can laugh b/c I only mess with 2 or 3, not 20+, Gentoos.) But these kinds of tips saved me DAYS of downloading at 24Kbps    :Twisted Evil:   Good to see it all written up so nicely in one place.

----------

## savage

Update!  See the end of page 3 for a better solution (one that works with the latest Gentoo!)

Great work Grimthorn!

Here is my modification that works well (automatically downloads files as they are requested if they are not already on my gateway server).

Uses apache with php4 and a simple C program.

Here is the php4 script: (just stuck in /home/httpd/htdocs/getFile.php on the server)

--start getFile.php--

```
<?
$packageSrc = urldecode($_GET["packageSrc"]);
$packageName = urldecode($_GET["packageName"]);
if($packageName != "")
{
  @$fileHandle = fopen("/usr/portage/distfiles/" . $packageName, "r");
  if(!$fileHandle)
  {
    exec("/usr/local/sbin/getPackageFromMirror $packageSrc $packageName");
    @$fileHandle = fopen("/usr/portage/distfiles/" . $packageName, "r");
    if(!$fileHandle)
    {
      print "Unable to get File from remote Server\n";
    }
    else
    {
      fpassthru($fileHandle);
    }
  }
  else
  {
    fpassthru($fileHandle);
  }
}
?>
```

--end getFile.php--

Here is the C program. Compile and install it with:

```
gcc -s -o getPackageFromMirror getPackageFromMirror.c
cp getPackageFromMirror /usr/local/sbin/getPackageFromMirror
chown root.root /usr/local/sbin/getPackageFromMirror
chmod 4755 /usr/local/sbin/getPackageFromMirror
```

Yes, I know the suid bit has security implications, but if you are savvy enough to see that, you are probably savvy enough to see it is only a problem if you have other users on your gateway box. If that is your setup, I suggest sudo or something like that. Basically, the binary needs to be owned by whoever can write files to the /usr/portage/distfiles directory on your server, and needs to be suid that user.

--start getPackageFromMirror.c

```
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <strings.h>
#include <errno.h>

int main(int argc, char* argv[])
{
  char wgetCommand[1024];
  char fileTarget[1024];

  if (argc != 3)
  {
    printf("Usage: %s <packageToRetrieve> <packageName>\n", argv[0]);
    return(-1);
  }
  bzero(wgetCommand, 1024);
  bzero(fileTarget, 1024);
  /* explicit "%s" so a URL containing '%' can't act as a format string */
  snprintf(wgetCommand, 1024, "%s", argv[1]);
  snprintf(fileTarget, 1024, "/usr/portage/distfiles/%s", argv[2]);
  printf("/usr/bin/wget -q -N -O \"%s\" \"%s\"\n", fileTarget, wgetCommand);
  /* execl's argument list must be NULL-terminated */
  execl("/usr/bin/wget", "wget", "-q", "-N", "-O", fileTarget, wgetCommand, (char*)NULL);
  printf("%s\n", strerror(errno));  /* only reached if execl failed */
  return(1);
}
```

--end getPackageFromMirror.c

///end server setup.

This setup requires few modifications to make.conf (on the clients). Go ahead and set up (or leave) your GENTOO_MIRRORS setting however you like - it tells the server where to try to get the file from.

The only line you need to modify is FETCHCOMMAND. Observe the one I use:

```

FETCHCOMMAND='/usr/bin/wget -t 5  http://[proxy]/getFile.php?packageSrc=\${URI}\&packageName=\${FILE}  -O \${DISTDIR}/\${FILE}'

```

With this setup, it is possible to use the downloads on your proxy box directly (no need for a separate cache); also, any files requested by any client that are not already on the proxy are downloaded from the appropriate source.

Hope someone out there finds this cool and useful.  :Smile: 

Last edited by savage on Wed Feb 04, 2004 3:27 am; edited 3 times in total

----------

## neptune

Hello, nice trick, but I'll try something simpler.

One box shares /usr/portage and the others mount it (e.g. at /mnt/portage). On the others, modify /etc/make.conf to have PORTDIR=/mnt/portage and DISTDIR=${PORTDIR}/distfiles

The box that shares does an rsync every day; the others don't need to. If one box downloads a source, the others don't need to download it again.
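Spelled out, the client-side settings would look like this (note that the distfiles location variable Portage actually reads is DISTDIR):

```
# client /etc/make.conf
PORTDIR=/mnt/portage
DISTDIR=${PORTDIR}/distfiles
```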

what do you think of it?

----------

## paulwoodwarduk

I had to adjust the custom fetchcommand slightly - it'll catch out other "cut 'n paste" monkeys too!

FETCHCOMMAND=rsync rsync://<your Portage gateways IP or DNS>/gentoo-packages/\${FILE} {DISTDIR}

should be:

FETCHCOMMAND=rsync rsync://<your Portage gateways IP or DNS>/gentoo-packages/\${FILE} ${DISTDIR}

Had me going for 15 mins!  Is it possible for the original "HOWTO" to be edited, especially with the mod required to /etc/init.d/rsyncd to remove compression, etc.?

----------

## icarus1983

 *Thorbjorn wrote:*   

> here is my mod_proxy with caching support. 
> 
> This is the first time I've set up mod_proxy in apache, but it is working fine for me. I need to tweak the cache settings a bit maybe, but I would expect you all would need to as well. One more thing I want to do is get the ftp proxy working. My initial attempts at getting that going were not successful. I get a segfault on child processes when trying to proxy ftp. I am not sure if this is due to the way mod_proxy was compiled or not, and there's not a lot of info on google about it. Anyone know what's going on there? 
> 
> My approach was to set up an internal caching proxy on a virtual host so as not to interrupt anything else running on this box. I assume you know how to uncomment the LoadModule directives in your apache configs (mod_proxy is a std module) and I assume you are using apache 1.3.x (for 2.0 your config would be a bit different). I also assume you know how to set up a virtual host, but you can slap this in a Location directive if you want to.
> ...

 

Yes, that's the best solution so far... I thought... Until I tried it.

After putting it all together, I tested it by emerging a plugin for gkrellm, gkrellmoon.

It worked, and I browsed my cache server to look for the file; when I found it, it looked like this:

```
 60K -rw-------    1 apache   apache        54K Jul 14 23:32 /home/cache/httpd/2/r/X/utNJZyzeNb@jiS7HhJw
```

the file on the client looks like this:

```
 60K -rw-r--r--    1 root     root          54K Mar 24 17:11 gkrellmoon-0.6.tar.gz
```

Maybe it's me, but if I'm going to be caching portage I'd like real filenames! I guess it's not possible to, e.g., back up this kind of cache? Is there an option in the apache config to make it cache under real names?

----------

## r_yuan

 *cwng wrote:*   

> [...] I prefered a http method in GENTOO_MIRRORS so that if a source tar is not in the gateway, emerge will failover and use an alternative mirror.
> 
> To that effect, I emerged 'mini_httpd' (apache is an overkill, unless you actually already have apache set up).

 

I have put together a python-based web server. The scripting is fairly simple. I piggybacked on the portage module so that it follows the settings in /etc/make.conf.

```
#!/usr/bin/env python2.2
import SimpleHTTPServer
import portage
import os.path

BaseClass = SimpleHTTPServer.SimpleHTTPRequestHandler

class MyRequestHandler(BaseClass):
    def send_head(self):
        if self.path.startswith('/gentoo') and (len(self.path) > len('/gentoo/')):
            myfile = os.path.basename(self.path)
            try:
                # file already cached: send headers, then hand back the file
                fileHandle = open(portage.settings["DISTDIR"] + "/" + myfile, 'rb')
                self.send_response(200)
                self.send_header("Content-type", "application/octet-stream")
                self.end_headers()
                return fileHandle
            except IOError:
                # beware of threaded server -- unverified
                #   parallel fetch'es _may_ cause db corruption
                if portage.fetch([myfile], listonly=0, fetchonly=1):
                    self.send_response(200)
                    self.send_header("Content-type", "application/octet-stream")
                    self.end_headers()
                    return open(portage.settings["DISTDIR"] + "/" + myfile, 'rb')
                else:
                    return BaseClass.send_head(self)
        else:
            return BaseClass.send_head(self)

def test():
    SimpleHTTPServer.test(MyRequestHandler)

test()

## Scripting the Web with Python
# http://www.w3j.com/6/s3.vanrossum.html
# http://python.org/doc/current/lib/module-SimpleHTTPServer.html
# /usr/lib/python2.2/site-packages/portage.py
# /usr/lib/portage/bin/emerge
# /usr/portage/w3j2.py                # _this_ file [ server]
# /etc/make.conf                      # GENTOO_MIRRORS="http://myproxy:8000/gentoo" [ client ]
# http://forums.gentoo.org/viewtopic.php?t=59134
```

On the server, copy the script into /usr/portage/w3j2.py. It should be owned by root and executable. Start the server with 'w3j2.py 80' (for port 80; otherwise it defaults to port 8000). On the clients, insert 'GENTOO_MIRRORS="http://myproxy/gentoo"' in /etc/make.conf.

Caveats:

1. This is a quick solution (for those who don't want to install Apache). The performance of this server isn't great.

2. I have not verified whether or not parallel fetches will cause database corruption -- please report back to me. I'm not even sure whether the server is threaded in any way.

3. Fetches are not streamed, i.e. the file download must complete before it is forwarded to the client. I hope transparent proxies handle this better.

----------

## 0problem.dk

 *Thorbjorn wrote:*   

> ...
> 
> Here is my apache virtual host: 
> 
> ```
> ...

 

As you said, it is a bit different on Apache 2.

The above will give you the files you request, but it will not do the caching, which was the primary purpose of the setup.

However, you only need one more line to make it work as intended:

```
CacheEnable disk /gentoo/
```
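For context, that line sits alongside the rest of the cache setup; a hedged Apache 2.0 fragment (the module file paths and the cache root are assumptions; mod_disk_cache must be available in your build):

```
# httpd.conf (Apache 2.0.x)
LoadModule cache_module      modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

CacheRoot   /var/cache/apache2/gentoo
CacheEnable disk /gentoo/
```

The CacheRoot directory must exist and be writable by the Apache user.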

----------

## Lightspeed

I wanted to setup something similar on my network, but the only machine I have which is permanently running is a Windows Server 2003 box (although it does at least have cygwin installed). So I came up with a solution that works for a Windows machine:

Considering the uncertain current state of the gentoo-cygwin project, I decided to manually schedule the server to download the latest portage tree snapshot from my nearest gentoo mirror every day and unpack it. The portage tree is then shared as read only and mounted by my gentoo machines over samba to /usr/portage. Before the server attempts to update the portage tree it will first make a copy of it, then unshare the main tree directory and share the copy instead. The updated tree is placed in the main directory and the share is moved back to that folder, then the copy is deleted.

For the distfiles, I have a central distfile folder which is served up using Apache (I had originally tried using IIS 6 simply because that is what the rest of my intranet site uses, but for some reason that I still cannot understand all .tar.bz2 files were giving me 404 errors, when others like .tar.gz were downloading fine). The gentoo machines point to this webserver as their primary mirror in /etc/make.conf. The server also has a separate rw shared directory for each gentoo machine, which the gentoo boxes mount using samba into /usr/portage/distfiles. So the distfiles that a gentoo machine downloads, whether from the LAN server or the internet, are still kept on the server, and they don't interfere with other machines emerging at the same time. Every day the server will then check to see if the gentoo machines' personal distfile directories contain anything that the main central distfile folder doesn't have, and if so it copies them across.

You may think that this method of having a separate distfile share on the server for each gentoo box is ridiculously wasteful in space, but I am running the SIS (Single Instance Storage) groveler service that is installed with Windows' RIS (Remote Installation Services) software which will search the distfile directories for identical files, and replace them all with NTFS junction points linking to a single copy of the duplicated file.

----------

## Helena

Just a simple, perhaps basic question. What rsync command is the best?

Let me explain. I've adopted the rsync solution for several reasons. So I have one Gentoo box acting as a "central Gentoo mirror", as Grimthorn calls it (BTW thx for the guide, that's just what I needed). It has IP address 10.0.0.200 (I'm still exploring LAN name resolution, so I must use IP addresses for the moment).

I have set up a separate partition for holding all the Gentoo distribution files, although that is not required; I just find it convenient from an administrator's viewpoint. Since all I want is to distribute updates to local clients (installation is done from LiveCDs) I only mirror "distfiles" and "snapshots" from the source mirror, as suggested by the official guide http://www.gentoo.org/doc/en/source_mirrors.xml.

I also chose to set this mirror up as both an rsync and FTP server (not HTTP). For the rsync part I follow Grimthorn's guidelines; for FTP I've set up vsftpd and xinetd. So I've now edited all clients' /etc/make.conf to contain the following extra lines:

```
SYNC="rsync://10.0.0.200/gentoo-portage"

GENTOO_MIRRORS="ftp://10.0.0.200/gentoo"

```

For the server itself, I use a slightly different form, as a prelude to the name resolution which I still want to implement on my LAN later (I've already read some threads about this, but any help would still be welcome...)

```
SYNC="rsync://localhost/gentoo-portage"

GENTOO_MIRRORS="ftp://localhost/gentoo"

```

After some experimentation I now use the following rsync commands, where /mnt/rsync/gentoorsyncmirror/ points to my local mirror:

```
time rsync --update --verbose --recursive --stats --progress www.ibiblio.org::gentoo/distfiles /mnt/rsync/gentoorsyncmirror/gentoo

```

for the packages, following the official guide,

```
time rsync --update --verbose --recursive --stats --progress ftp.snt.utwente.nl::gentoo-portage /mnt/rsync/gentoorsyncmirror/gentoo-portage

```

for the Portage tree (since ibiblio.org does not mirror the Portage tree I chose the closest mirror), and

```
time rsync --update --verbose --recursive --stats --progress www.ibiblio.org::gentoo/snapshots /mnt/rsync/gentoorsyncmirror/gentoo

```

for the Portage tree snapshots, although I'm not sure why I would need them. Any suggestions for improvement?

Sorry about using the long form of command switches but since it's automated I prefer that.

----------

## brendaniabbatis

This HowTo has some great ideas.  My plan has an interesting twist I thought I would share:

I intend to use my laptop as a Gentoo gateway on my home network, to take advantage of the high bandwidth at work to bring updates home. I have no Internet access at home, which is an easy security solution.

The gateway is the only system connected to the Internet, and only at certain times.  It will update the portage tree by rsync each time it is connected, but a maximum of once per day.

Clients access the portage tree and distfiles by read-only nfs mount.  Distfile fetches are directed to the proxy, but always fail.  The gateway then logs the request and retrieves the needed files the next time it connects to the Internet, and the clients can try again later.
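The retrieval step could be sketched like this (entirely hypothetical: the queue file path, and the idea that the proxy logs one failed distfile URL per line, are assumptions about a setup that isn't built yet):

```shell
# fetch_queued: download every URL queued in a log of failed fetches,
# then clear the queue. Run this once the laptop is back on the Internet.
fetch_queued() {
    queue="${1:-/var/log/distfile-queue}"      # one URL per line
    dest="${2:-/usr/portage/distfiles}"
    fetch="${FETCH:-wget -q -P}"               # FETCH override eases testing
    [ -s "$queue" ] || return 0                # nothing queued
    while read -r url; do
        $fetch "$dest" "$url"                  # fetch into the distfiles dir
    done < "$queue"
    : > "$queue"                               # empty the queue afterwards
}
```

The clients' retry "later" then succeeds as soon as the laptop is home again with a refilled distfiles directory.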

The portage tree is only available when the laptop is home, but I am the only one who maintains the systems so it does not matter. Alternatively, the tree and distfiles could be copied to clients using rsync; or didn't I see somewhere once an automatically synchronizing filesystem?

The drawback to this is that updates are not immediately available, but so far my experience with Gentoo is that compiling two or three packages a day is as much as I want to do, leaving the rest for tomorrow.

If I am not mistaken, distfiles are unpacked in a directory in /usr/portage, so that has to be set to something else in make.conf (?).  Seems to me that is true anytime nfs is used for the portage tree.

I had a thought as I read this.  Perhaps once a consolidated HowTo of all the good ideas here is done, an ebuild could also be written for each of the various solutions.  Maybe that's a real n00b idea.

----------

## bendy

Hi,

I've been trying to set up a "half and half" solution i.e. I set up an rsync mirror on my lan for the portage tree only, and then share /usr/portage via nfs from the same machine.

If I mount /usr/portage on the client, then run emerge sync, I get the following:

```

root@laptop root # emerge sync

>>> starting rsync with rsync://newserver/gentoo-portage...

This is rsync[number].[country].gentoo.org.

receiving file list ...

70284 files to consider

io timeout after 180 seconds - exiting

rsync error: timeout in data send/receive (code 30) at io.c(103)

rsync: connection unexpectedly closed (1655051 bytes read so far)

rsync error: error in rsync protocol data stream (code 12) at io.c(165)
```

However, if I run emerge sync with /usr/portage unmounted then the sync works but I get all the usual directories written locally to the (unmounted) /usr/portage.

Is this normal behaviour?

bendy.

----------

## flybynite

Setting up your own rsync works the gentoo way for emerge sync.  But what about emerge -u world???

Well, I've been looking for a way to speed up my system.  I have a fast lan but still had to waste bandwidth downloading files multiple times.  All the options listed in this thread seem to have some serious drawbacks.  Since I couldn't find what I wanted, I tried to make a better option.

Check out tsp-cache  

https://forums.gentoo.org/viewtopic.php?t=110973

tsp-cache is a cache designed for gentoo that works with no changes to client systems beyond adding a mirror to make.conf.

Advantages:

1. Bandwidth savings. The file is only downloaded once no matter how many clients request the file.

2. No Portage problems. tsp-cache is transparent to clients. tsp-cache is just listed as one of the many download mirrors in make.conf.  Failure of the tsp-cache host machine will not cripple the clients. They will automatically use the next available mirror like normal.

3. Streaming to clients. tsp-cache will download the file from a mirror while simultaneously streaming the file to multiple clients. Therefore, client emerges will not fail even when the files are not yet in the cache.  I can even emerge -u world on multiple machines and all file updates only get downloaded once - I don't even have to wait for the first machine to finish downloading the updates before starting the update on my other machines!!!

4. Reports. tsp-cache can create html reports of cache efficiency and bandwidth.

I've got a full HOWTO in the thread https://forums.gentoo.org/viewtopic.php?t=110973
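For reference, the client side of this is just the mirror list in make.conf — a sketch, where the cache host name "cachebox" is hypothetical and the stock mirrors stay in the list as fallback:

```shell
# /etc/make.conf on each client (host "cachebox" is illustrative).
# The local cache is tried first; if it is down, Portage automatically
# falls back to the next mirror in the list.
GENTOO_MIRRORS="http://cachebox/ http://distfiles.gentoo.org/ http://www.ibiblio.org/pub/Linux/distributions/gentoo"
```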

----------

## tiny

Hi!

I'm a gentoo noob so plz go easy on me.  :Very Happy: 

Otherwise I have 6-7 years of experience with Linux.

I recently installed Gentoo at my home and I liked it immediately. Just the thing I was searching for in Linux.

OK, to cut things short: I'd like to install Gentoo at my work too. I'm in control of 3 machines, all running Linux. One of them could be my Gentoo gateway, since bandwidth over the day is very poor and downloading would require a lot of time.

But this machine isn't running the Gentoo distribution and it needs to stay this way (for now). The only thing I'm looking for is fast network access to ebuild packages. Portage can be on the machine I'm installing Gentoo to.

Question is: how can I keep packages on my server machine (gateway) up to date?

Let's say I put them in apache directories.  What is the simplest method of keeping those packages up to date on a regular basis?

I was thinking this way... a simple bash script with wget would do it. I decide which packages I want, I DL them and later on I keep them updated. Only how would I keep them updated so I don't burn a lot of bandwidth?  Is there a possibility to use rsync from my Linux distribution? Or something like that.

Storage space isn't a problem. 

Oh BTW, how much space would it take if I DL'd all the packages? I'm not gonna, I'm just wondering.

regards,

Tiny

----------

## Helena

 *tiny wrote:*   

> Hi!
> 
> I'm a gentoo noob so plz go easy on me. 
> 
> 

 

Let's face it, we are all n00b's...

 *tiny wrote:*   

> I'm in control of 3 machines all running Linux. One of them could be
> 
> my Gentoo gateway, since bandwidth over the day is very poor and 
> 
> downloading would require a lot of time. 
> ...

 

OK, what you want are the files from the "distfiles" directory. Fair enough, since they can be quite large, up to several hundred MB. BTW, the total size is 21 GB right now!

 *tiny wrote:*   

> Question is how can I keep packages on my server machine (gateway)
> 
> up to date?
> 
> Lets say I put them in apache directories.  What is the simplest method of 
> ...

 

I started trying to do this from Windows 2000 but soon gave up on the idea. "rsync" would seem to be the most reliable method, and I don't see why it wouldn't work from other Linux distros, but I don't know for sure. The problem, however, is limiting the size of the distfiles directory.

The nicest solution seems to be the proxy method as described elsewhere in this thread, but I have no experience; I just mirror the whole lot.

----------

## flybynite

 *tiny wrote:*   

> 
> 
> But this machine isn't running Gentoo distribution and it needs to stay this 
> 
> way (for now). The only thing Im looking for is fast network access to 
> ...

 

This sounds like tsp-cache will also work for you.  It only requires apache on the cache machine; it doesn't have to run gentoo.  The other gentoo machines will automatically keep the cache up to date.  tsp-cache will ensure you only download packages once.  Once in the cache, packages will be delivered at lan speeds.  Check out

https://forums.gentoo.org/viewtopic.php?t=110973

----------

## roofy

DAMN! you beat me to it.

i had the same exact system set up at work, where we have a cluster of 3 portage servers that compile -b packages and feed them to 300 clients. It automatically syncs, and then scripts control the automatic emerge world/system. i even have notify by email working.....i was going to write a doc on the whole thing...kudos  :Smile: 

----------

## rickj

The original post is a great method, and works well for me. Saves time and net bandwidth.

Just a minor buglet:

 *Quote:*   

> FETCHCOMMAND=rsync rsync://<your Portage gateways IP or DNS>/gentoo-packages/\${FILE} {DISTDIR}

 

seems to be missing a $, mine works as:

```
FETCHCOMMAND=rsync rsync://<your Portage gateways IP or DNS>/gentoo-packages/\${FILE} ${DISTDIR}
```
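For anyone wiring this up from scratch, the gateway side of that command is an rsyncd module along these lines — a sketch only: the module name must match the one used in FETCHCOMMAND, and the path assumes the default distfiles location:

```shell
# /etc/rsyncd.conf on the Portage gateway -- illustrative sketch.
# The [gentoo-packages] module name is what the clients' FETCHCOMMAND
# references in the rsync:// URL.
[gentoo-packages]
    path = /usr/portage/distfiles
    comment = Central Gentoo distfiles repository
    read only = yes
```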

Thanks for a truly useful HOWTO.

----------

## megalomax

Hi there!

I really LOVE your setup. But all the other suggestions are really confusing me like hell    :Shocked: 

I'm still trying to figure out what to do if a package is not on the server and thus needs to be downloaded by the server and not by the client...

Is there a way to do it with the original method..? Do I need some bash IF...THEN routines (I only know very basic bash stuff, sorry)...

Or maybe to send the package back to the server when the client downloaded it (by scp or something)

Or would I have to use a different approach???

thanks for your input...

man, I love these forums 

 :Wink: 

----------

## megalomax

Hi again ...

I don't wanna spam here, but I just saw the nice work of savage...

I followed his instructions, but I'm not sure what I get on my machine.

I did some testing, and this is what happened...

machineServer: no distfile of a certain package (<fam> in my case) present

machineClient: old version of <fam> is present, needs updating...

1) client: emerge -U fam

2) client tries to get the package from the server... not present...

3) client downloads the package and installs it

4) package still not present on the server...

Is there something wrong with my make.conf? 

Should I see an error when the php solution fails for some reason? 

I just recently upgraded to apache2, but I don't know if this has something to do with all this...

Is the location of the apache htdocs dir different in this case?

 :Rolling Eyes: 

cheers

----------

## Ateo

This works excellently. I've just cut my stage 1 install times on workstations by 1/3. Thanks for the howto!!!

Gentoo Rocks!

----------

## savage

megalomax - just saw your message;

  am looking into what has to be done in a current gentoo setup - will update original post and let you know when it works.

----------

## savage

ok folks,

  holler if I am missing something or making no sense.  I seem to do that sometimes  :Smile: 

The way I posted earlier (now that I am back in Gentoo) doesn't seem to work.  Here is what is working for me now.  Please give me feedback if you want / need changes.

```
<?php
// put in /var/www/localhost/htdocs/getFile.php on server
$packageSrc = trim($_GET["packageSrc"]);
$packageName = strrchr($packageSrc, "/");
$packageName = trim($packageName, "/");
if ($packageName != "")
{
  @$fileHandle = fopen("/usr/portage/distfiles/" . $packageName, "r");
  if (!$fileHandle)
  {
    // not cached yet: have the setuid helper fetch it, then try again
    exec(escapeshellcmd("/usr/local/sbin/getPackageFromMirror $packageSrc"));
    @$fileHandle = fopen("/usr/portage/distfiles/" . $packageName, "r");
    if (!$fileHandle)
    {
      print "Unable to get File from remote Server\n";
      exit;
    }
    else
    {
      fpassthru($fileHandle);
      exit;
    }
  }
  else
  {
    fpassthru($fileHandle);
    exit;
  }
}
?>
```

and the C program:

```
/*
put in /usr/local/src
cd to /usr/local/src
gcc -s -o getPackageFromMirror getPackageFromMirror.c
cp -v getPackageFromMirror /usr/local/sbin/
chown -v root.root /usr/local/sbin/getPackageFromMirror
chmod -v 4775 /usr/local/sbin/getPackageFromMirror
*/
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>

int main(int argc, char* argv[])
{
  char wgetCommand[1024];
  char fileTarget[1024];
  char* tok;

  if ((argc < 2) || (argc > 3))
  {
    printf("Usage: %s <packageToRetrieve> [packageName]\n", argv[0]);
    return(-1);
  }
  memset(wgetCommand, 0, sizeof(wgetCommand));
  memset(fileTarget, 0, sizeof(fileTarget));
  /* copy via "%s" so a '%' in the URL is not treated as a format spec */
  snprintf(wgetCommand, sizeof(wgetCommand), "%s", argv[1]);
  if (argc == 3)
  {
    snprintf(fileTarget, sizeof(fileTarget), "/usr/portage/distfiles/%s", argv[2]);
  }
  else
  {
    tok = strrchr(wgetCommand, '/');
    if (tok != NULL)
    {
      snprintf(fileTarget, sizeof(fileTarget), "/usr/portage/distfiles/%s", tok + 1);
    }
  }
  execl("/usr/bin/wget", "wget", "-q", "-N", "-O", fileTarget, wgetCommand, NULL);
  /* only reached if execl fails */
  printf("%s\n", strerror(errno));
  return(-1);
}
```

put this on your "proxy box"

and update your "FETCHCOMMAND" in "/etc/make.conf" to be:

```
FETCHCOMMAND="/usr/bin/wget -t 5 -O \${DISTDIR}/\${FILE} http://[proxyBoxHere]/getFile.php?packageSrc=\${URI}"
```

Let me know.

Edit: added exit;s after termination - thanks to Aneurysm9

*Last edited by savage on Sun Feb 08, 2004 10:31 pm; edited 2 times in total*

----------

## Aneurysm9

I'm not sure if it's related to my PHP setup or to your script, but it was adding a newline to the end of the files it was sending me, resulting in failed MD5 checks.  I eventually figured out that adding "exit;" at the end of the main if block fixed the problem.

----------

## not_registered

I don't know what I'm talking about, but can't you use SQUID to do this somehow?

----------

## savage

Yes!

  You can use squid to do this, but all of the files are stored in a non-human-readable format in caching directories (names like ac3x5rvfdaiwldk instead of reiserfsprogs-xxxx, etc.).  Also, when you are burning hard disk space to store both that and your distfiles on a computer, you are shelling out 2x as much hard disk space as if you do it the way above.

  savage

----------

## linkfromthepast

Here's a little perl script I wrote to take care of the package download and serving.  We use this on our internal network of approx 100 nodes.  So far everything seems to work correctly, although I'm sure there is a lot of room for improvement.  As far as security is concerned, there is none.  I'm sure a lot can be built in, but we filter by address w/ ipfilter so I didn't feel it was necessary.  For some this may be complete garbage, others may use it, but if you find anything in it useful then I think it was worth posting.

File: dist.pl

```
#!/usr/bin/perl
########################################################
#       GENTOO LOCAL MIRROR
# This script provides local mirror functionality for gentoo.
# Simply set the mirror on the client machines to the webserver
# that is serving this script.
########################################################
#       OUTLINE
# 1.) Get request for a file; if the file is not present, a 404 error takes place and through .htaccess this script is called
# 2.) Then the script downloads the file from one of the mirrors listed below, saves it in a web accessible directory, and sends
#       a Location tag to the wget that emerge is using, redirecting it to the file
# 3.) If the file DOES exist, the script simply redirects the browser
########################################################
#       INSTALL
# 1.) Put the .htaccess file in the directory which you want your Gentoo mirror in make.conf to point to
# 2.) Edit .htaccess to point to this script
# 3.) Make sure in apache.conf there is an entry allowing the script to run in that directory
#       EXAMPLE: ScriptAlias /dist /var/www/localhost/htdocs/
########################################################
use CGI;

# used to redirect client to new location of file
$address = "http:///your.address/dir";
# location of wget used to get files
$wgetlocation = "/usr/bin/wget";
# switches passed to wget
$wgetswitches = "-nc -c -t 5 --passive-ftp ";
# directory to put the new dist files in
$wgetputdir = "/var/www/localhost/htdocs/distfiles";
$distdir = "distfiles/";
# mirrors to use to get the gentoo files from
@mirrors = ("ftp://ibiblio.org/pub/Linux/distributions/gentoo/","ftp://mirror.iawnet.sandia.gov/pub/gentoo/","ftp://gentoo.ccccom.com","http://128.213.5.34/gentoo/");

# this is the ENV var which holds the address that was attempting to be accessed
$url = $ENV{"REQUEST_URI"};
# split the input
@parts = split(/\//, $url);
# count the parts
$count = $#parts;
# get the last part, which is the filename
$filename = $parts[$count];
# do we need distdir?
# $url = $mirrors[0].$distdir.$url;
if (!(-e $wgetputdir."/".$filename))
{
  # create the url for the file to get with wget
  $url = $mirrors[0].$url;
  $command = $wgetlocation." ".$wgetswitches." ".$url." -P ".$wgetputdir;
  # run the command
  open(FILE, "$command |");
  $output = <FILE>;
  close(FILE);
}
# create a new CGI object for redirect
$query = new CGI;
# redirect (explicit "/" so $address does not need a trailing slash)
print $query->redirect($address . "/" . $filename);
```

File: .htaccess

```
ErrorDocument 404 /dist/dist.pl
```

----------

## La`res

linkfromthepast - Could you be more detailed in the install instructions?  They seem a little vague to me.  Mind you, I'm relatively new to apache.

----------

## linkfromthepast

Which part are you having problems with?

1.) Perl script goes in a directory on your server which you have set to execute scripts in the apache.conf.  

2.) Change the variables in the Perl script based on your setup.  

3.) Put the .htaccess in the folder where all your dist files will live.  

So when a client requests a file in that directory, and the file does not exist, the .htaccess is used, which calls the script to download the file, then redirects the client to the file to download.  Sorry for the run on  :Smile: 

Also, one thing I've noticed is that if you don't have much bandwidth, either Perl or Apache stalls the wget process.  I'm leaning towards Apache because it actually owns the wget process, but I'm not quite sure.  So as long as you can download your dist file in under ~30secs you'll be ok.  Larger dist files like those for KDE should probably be done manually until the problem is fixed.

If that is still too broad an explanation, please post some specific questions.  Good luck  :Smile: 

----------

## modnemo

I can fetch files no problem using rsync...

```
rsync rsync://192.168.0.1/gentoo-packages/zip23.tar.gz
```

But when I emerge anything I get this error...

```

>>> emerge (1 of 20) net-fs/samba-2.2.8a to /

!!! File system problem. (Bad Symlink?)
!!! Fetching may fail: [Errno 2] No such file or directory: ''
```

any ideas?

----------

## modnemo

OK....so nevermind my previous post...really dumb error.

Make sure (absolutely sure) that if you are typing in the variables in your make.conf file you use ${FILE} and not $(FILE), because the latter doesn't work.

Let me reiterate: if you are having weird errors while doing an emerge, but emerge sync works just fine, make sure you use { and not ( with the shell variables.

Thanks for the HOWTO, it was awesome... I was looking for a solution to emerge packages on my Fujitsu Stylistic 1200 tablet without having to connect it to the internet (wireless support sucks for ADM8211-based cards).  I looked into NFS but it's a lot of setup and required kernel options which I didn't install.  I was able to do the portage/rsync server in about 10 steps (and an hour of frustration trying to figure out what I typed wrong   :Very Happy:  ).
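The difference modnemo hit can be seen directly in a shell: braces expand a variable, while parentheses are command substitution, i.e. the shell tries to run a *command* named FILE:

```shell
#!/bin/sh
# ${FILE} expands the variable; $(FILE) runs a command called FILE,
# which almost certainly does not exist, so it yields nothing (and an
# error on stderr without the redirect).
FILE=zip23.tar.gz
echo "${FILE}"               # -> zip23.tar.gz
echo "$(FILE 2>/dev/null)"   # -> empty line
```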

----------

## linkfromthepast

modnemo : Which method/setup did you use?

----------

## Merlin-TC

First of all thanks a lot for that guide.

I am using a local rsync server now to sync the tree and use an nfs share for the distfiles.

What I'd like to know is if there is an option in the rsync server that lets me cache the portage tree somehow.

The machine it's on is a k6-3 450 with 256MB ram, but the hard disk is kinda slow, so the bottleneck is the hard drive.

Is there any way to cache at least some parts of the portage tree into ram, or to preread it?
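Not as an rsyncd option that I know of, but two low-tech sketches (paths and sizes are illustrative): read the tree once so the kernel page cache holds it, or serve a tmpfs copy:

```shell
#!/bin/sh
# Sketch 1: warm the kernel page cache by reading the whole tree once,
# so rsyncd's many small reads hit RAM instead of the slow disk.
# Best-effort only -- with 256 MB the kernel may evict parts again.
find /usr/portage -path /usr/portage/distfiles -prune -o \
     -type f -exec cat {} + > /dev/null

# Sketch 2: keep a tmpfs copy of the tree and point rsyncd at it,
# refreshing it from disk after each sync (mount point illustrative):
# mount -t tmpfs -o size=300m tmpfs /mnt/portage-ram
# rsync -a --exclude=distfiles /usr/portage/ /mnt/portage-ram/
```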

----------

## razamatan

what if i want to synchronize /usr/local/portage (a portage overlay)?  i tend to roll my own ebuilds, so i'd like to sync this directory (and *just* this directory) between two machines.

----------

## freshy98

 *linkfromthepast wrote:*   

> Which part are you having problems with?
> 
> 1.) Perl script goes in a directory on your server which you have set to execute scripts in the apache.conf.  
> 
> 2.) Change the variables in the Perl script based on your setup.  
> ...

 

Could you please explain where to put the files? What is /dist? Does it need to be named that way, or is it an example?

In the Perl script you talk about

```
#used to redirect client to new location of file
$address="http:///your.address/dir";
```

while a little bit further you talk about

```
#directory to put the new dist files in
$wgetputdir = "/var/www/localhost/htdocs/distfiles";
```

Isn't the /dir from the address line the /distfiles from the wgetputdir line?

It is very confusing.

Please try to explain a bit more thoroughly and use examples from your own system(s).

Thnx

----------

## arkane

 *razamatan wrote:*   

> what if i want to synchronize /usr/local/portage (a portage overlay)?  i tend to roll my own ebuilds, so i'd like to sync this directory (and *just* this directory) between two machines.

 

rsync --rsh="ssh -C" -plarvz username@servermachine:/usr/local/portage /usr/local/portage

That'd do it... set up the SSH keys between the machines if you want to automate it.  When you say *just* this directory, do you mean not the subdirectories of it?  Because IMHO that'd be pointless....

----------

## razamatan

 *arkane wrote:*   

>  *razamatan wrote:*   what if i want to synchronize /usr/local/portage (a portage overlay)?  i tend to roll my own ebuilds, so i'd like to sync this directory (and *just* this directory) between two machines. 
> 
> rsync --rsh="ssh -C" -plarvz username@servermachine:/usr/local/portage /usr/local/portage
> 
> That'd do it... set-up the SSH keys between the machines if you want to automate it.  When you say *just* this directory, you mean not the subdirectories of it?  Because IMHO that'd be pointless....

 

yes, recursive, cus it'd be pointless otherwise...  :Wink: 

i tried this method, but it doesn't work..

```
rsync --rsh="ssh -C" -uavz username@servermachine:/usr/local/portage/ /usr/local/portage
```

it complains about permissions (writing locally), but i have write perms via group membership..

----------

## linkfromthepast

You are correct, $address is the actual web address of the directory.  $wgetputdir is the absolute filesystem path for the directory represented by $address.  For example, /var/www/localhost/htdocs/distfiles would be the absolute path.  But Apache is configured for /var/www/localhost/htdocs to be the root.  So http://your.address/dir would be /var/www/localhost/htdocs/dir.  You can change the name to whatever you like.

The overall purpose of this is so that when a client requests a file in http://your.address/dir and the file does not exist, the Perl script downloads the file and tells the client to try again now that the file has been downloaded.  This is what the .htaccess file is for.

----------

## linkfromthepast

Also keep in mind that all web addresses are relative to the root of the web server.

```
#used to redirect client to new location of file
$address="http:///your.address/dir";
#location of wget used to get files
$wgetlocation = "/usr/bin/wget";
#switches passed to wget
$wgetswitches = "-nc -c -t 5 --passive-ftp ";
#directory to put the new dist files in
$wgetputdir = "/var/www/localhost/htdocs/distfiles";
$distdir = "distfiles/";
#mirrors to use to get the gentoo files from
@mirrors = ("ftp://ibiblio.org/pub/Linux/distributions/gentoo/","ftp://mirror.iawnet.sandia.gov/pub/gentoo/","ftp://gentoo.ccccom.com","http://128.213.5.34/gentoo/");
```

$address, $distdir are relative paths

$wgetlocation, $wgetputdir are absolute paths

You might notice that $distdir isn't needed, so you don't need to configure it.  I'm not sure why I left it in the script.  

$address should be configured with the web address the client will use for trying to download the file.  You can test this w/ a web browser.

$wgetputdir is the location wget puts the files it downloads.  So if a client requests http://127.0.0.1/gentoo-files/x.y.z.tar.gz then wget will download the file into the $wgetputdir directory.

Example:

$address = http://127.0.0.1/distfiles

$wgetputdir = /var/www/localhost/htdocs/distfiles

One last note: you need to change the @mirrors servers to the servers that are fastest for you.  These servers may not be the quickest; they're just default servers I plucked from /etc/make.conf.

----------

## freshy98

ok, let me see if I get this right.

instead of /usr/portage/distfiles I now need a /var/www/localhost/htdocs/distfiles which holds the .htaccess.

I think I will make a /var/www/localhost/htdocs/distfiles a symlink to /usr/portage/distfiles.

----------

## freshy98

linkfromthepast, it does not seem to work for me. I edited the perl script to my liking, but when I do an emerge -f package for example, it just freezes.

I have this on my portage gateway (192.168.1.20):

```
GENTOO_MIRROS="http://192.168.1.20/distfiles/"
SYNC="rsync://192.168.1.20/gentoo-portage"
PORTDIR=/usr/portage
DISTDIR=/usr/portage/distfiles
PKGDIR=/usr/portage/packages
```

plus I have a symlink that tells /var/www/localhost/htdocs/distfiles is actually /usr/portage/distfiles. Here I also have the .htaccess file.

The perl script is in /dist/dist.pl.

```
$address="http:///192.168.1.20/distfiles";
$wgetputdir = "/var/www/localhost/htdocs/distfiles";
@mirrors = ("ftp.easynet.nl/mirror/gentoo/","ftp.snt.utwente.nl/pub/os/linux/gentoo/","etc,etc");
```

On the client machine I have this in /etc/make.conf:

```
GENTOO_MIRRORS="http://192.168.1.20:8080/distfiles/"
SYNC="rsync://192.168.1.20/gentoo-portage"
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
```

/usr/portage/distfiles is shared via nfs on the portage gateway and mounted on the client in /usr/portage/distfiles.

Can you help me out please?

----------

## linkfromthepast

Have you tried going to the link with a regular web browser to see what the response is (http://192.168.1.20/distfiles)?  Is wget running on the gateway?  Are you sure apache has write permission to the /usr/portage/distfiles directory?  

BTW, I'm not sure if you noticed but $address="http:///192.168.1.20/distfiles";  shoud only have 2 //, its my mistake.  And GENTOO_MIRROS="http://192.168.1.20/distfiles/" , is missing a R.  :Smile: 

Let me know if/when you've tried all that and the responses.

----------

## freshy98

I could not get an index of the symlink, but adding

```
<Directory /var/www/localhost/htdocs/distfiles>
    Options Indexes FollowSymLinks MultiViews
    AllowOverride All
    <IfModule mod_access.c>
      Order allow,deny
      Allow from all
    </IfModule>
</Directory>
```

helped me reach it via lynx from a client machine.

I changed the /// to //. I did notice it but thought it was ok. I also corrected my own mistake  :Smile: 

Gonna try to see if it works now.

----------

## freshy98

It doesn't work yet.

I even added /dist to the commonapache2.conf file using

```
<Directory /dist>
    Options -Indexes FollowSymLinks MultiViews
    AllowOverride All
    <IfModule mod_access.c>
      Order allow,deny
      Allow from all
    </IfModule>
</Directory>
```

so that apache can reach it.

Or should I only set rights on the dir and file?

I looked at the chown and chmod manpages but I can't make out how to use them.

To me it seems the .htaccess file can be reached, but the perl script can't be executed because of file rights.

----------

## linkfromthepast

Be sure you have added a Script Alias in the apache config.

ScriptAlias /dist /var/www/localhost/htdocs/

This is from my config.  I have mapped /var/www/localhost/htdocs/ to http://host/dist and my dist.pl is in /var/www/localhost/htdocs/.

It can get a bit confusing with all the path manipulation and settings.

1.) Make sure your script is set up properly (run it from the command line to verify it works)

2.) Make sure your directory permissions are correct

3.) Add the ScriptAlias to apache's commonapache2.conf

4.)  Move .htaccess and dist.pl to desired directories

5.) Test entire setup with web browser

6.) Change make.conf on client machines

To ensure the script is set up correctly, definitely try it on the command line first before trying to use it through a browser.

Perhaps I should write a setup guide  :Smile: 

Although if you were to think of what else the script could be used for there is:

1.) File download/serving

2.) Dynamic file creation/compilation

3.) Another one but I can't think of it right now  :Smile: 

All of this through a web browser and by only changing the wget command.

----------

## freshy98

I have the ScriptAlias  :Smile: 

All the paths and stuff are indeed confusing, but I will try again.

I now can access my distfiles symlink in ../htdocs, so that problem is solved. Just dunno if I can write there, though.

This "Order allow,deny" is it to be read as read,write allow, deny? If so, then I know how to make it writable  :Smile: 

If I get the mapping part right (third sentence in your last reply) then I should do

```
ScriptAlias /usr/portage/distfiles /var/www/localhost/htdocs/
```

in order to make a sort of symlink from /usr/portage/distfiles in ../htdocs?

I now have a ln -sf type symlink in ../htdocs.

I think I am now starting to understand the /dist part a bit better.

And a guide would indeed be great! including all permissions that need to be set.

----------

## linkfromthepast

As I have my distfiles inside the webserver's path, I never had to change permissions or symlinks, because you don't need to store the files elsewhere.  If you would like a machine to be a server and client, then set the make.conf gentoo server to http://localhost/distfiles and you don't have to do any symlinks at all.

I'll see about writing a guide for setup, but like I said as far as permissions are concerned I didn't have to change any besides making the script +x.

----------

## freshy98

I understand about the symlink now. I just made a real symlink using ln -sf. gonna remove that one and use the apache way of making a symlink.

was gonna do that tonight but things came up. gonna do that tomorrow.

----------

## freshy98

I am really getting pissed off by now.

Tried about everything and I still cannot access http://ip-address/distfiles/, even with an index.html in it. Placing an index.html there gives me a 500 error; without it, just a 403.

I do not get it! Setting it up the same way as the apache manual does not work either. /var/www/localhost/htdocs/manual is a symlink to /usr/share/doc/apache-2.0.49/manual.

It has root on its dirs and files, the same as /usr/portage/distfiles/index.html, but still I cannot access it.

getting really sick of it!

[edit]removing the script alias allowed me to open the index.html properly. I noticed that manual has no script alias too  :Wink: . after removing the index.html I could no longer access /distfiles 'cause of a 403, which is normal I guess since there is no index.html anymore[/edit]

[edit2]I can now also download a file after I entered the filename behind /distfiles/. so if everything is ok, I should be able to write there too[/edit2]

----------

## linkfromthepast

Any progress?

----------

## freshy98

Not really. Clients still lock up (the console does) after I do an emerge -f, for example.

I tried lots of things, but no success... dunno why.

Kinda out of ideas right now.

----------

## linkfromthepast

Have you tried downloading the file with a web-browser?

----------

## freshy98

yeah. that was no problem.

----------

## linkfromthepast

So when you try to download a file with a web browser, the mirror downloads the file and redirects the web browser so you can download it?  It might be a problem with your version of wget; try updating.

----------

## freshy98

I haven't tried that one yet. I only tried to download a file that was there already. gonna try it tonight.

----------

## Moriah

All this caching, mirroring, and proxying is great, but it overlooks one critical problem: local configuration control.  

Recently, in April 2004, the portage tree on the public mirrors suffered a bad case of bit-rot.  It became impossible to do emerge operations, either sync or packages, using the public mirrors.  If these nice automatically updating proxies/mirrors/caches blindly went out and grabbed the latest versions of anything referenced by the client machines that they serve, then the tree sickness would be propagated, and nothing would build on the local lan that the local sysadm is personally responsible for.  This is unacceptable!

I would like to see a way to manually invoke an emerge sync on a portage tree that was in an archive, and be able to easily revert back to the pre-emerge-sync if need be.  Ditto for emerging packages, etc.

With LVM comes the ability to make a snapshot of a volume.  Could this be combined with these other methods, such as nfs, to protect the original tree from disasters?

Another idea also comes to mind.  I have been using rsync since November 2003 to perform automatic nightly backups of all my machines over the network to a backup server with a big RAID storage system.  The idea comes from the O'Reilly book "Linux Server Hacks" p 78.  It uses rsync to do a nightly incremental backup to a special directory where backups are mirrored from the original client machine.  Once the rsync operation has brought the mirror up to date, the entire mirror tree is snapshotted with cp -al $mirror $temp.  This makes a hard-linked copy of the mirror tree.  When this copy is finished, it is moved into position by a mv $temp $visible so that it becomes visible on a directory tree that is NFS exported read-only to the clients.  Each client can then see backwards in time for a fair number of days to view the state of its entire filesystem as it existed on that day at the time of the rsync.  The permissions of the rsynced and hard-link-copied files are preserved, so a user cannot see anything he would not have been able to see anyway.  

Although I am not yet using LVM and snapshots to freeze the client's filesystem prior to the rsync operation, I plan to implement that just as soon as I get all the local boxes running gentoo with LVM.  It would be nice if the 2.6.5 kernel with LVM-2 supported the snapshot operation, but alas, last I heard, that was not yet working, so I have standardized on the 2.4.25 kernel and LVM-1 instead.  I am being forced to use the 2.6.5 kernel for the IPSEC tunnelling server, so that I can support NAT traversal, and likewise for a client to test it with.  Those 2 machines will just have to suffer without a snapshot to freeze them during rsync backups.  The rsync occurs in the middle of the night, but that does not always mean the machines will be idle then.

It seems to me that a similar strategy could be used to manually emerge sync against a mirror directory, then do the hard-link copy snapshot thing to make the copy visible to the rest of the clients.  This would have the advantage that the new view of the portage and distribution trees would not be released to the clients on the lan until after it had passed muster by whatever quality control and configuration management procedures your organization requires.

The problem of automatically fetching a bad copy of something and breaking the portage tree for everybody is now solved by not releasing the updated version until after it has passed local testing.  The disadvantage of not being able to automatically fetch a missing file becomes an advantage, since you are trying to manage and control what is available to the clients, so you know that they all conform to local policy requirements.  Remember that the previous, known-good even if slightly out-of-date version of the tree is still available to all your lan clients, so you are not crippling them during the updating and testing process.

The desire is to make an NFS-retrievable version of everything that is free from race conditions, and at the same time to exercise some control over the mirroring process, so that you can go back to an earlier version of the mirrored trees if you have to.  Of course, you could also serve it via http or ftp or whatever you like as well, or instead.  You could even rsync a copy of it to a local lan client's own disk, and then go out to a public mirror to fetch something for testing prior to including it in the next configuration-controlled mirror of everything.

Remember: the hard-link copy operation makes a snapshot using a file-sharing approach.  You do not have more than a single copy of any given file, provided that the file did not actually change from one rsync of the public mirrors to the next.

Has anybody tried anything like this?

If so, what techniques did you use to make sure that all needed packages were indeed captured in the mirror before the snapshot hard-link copy operation was performed?

Also, since I have not yet set up any kind of local mirror and have no idea of how much filesystem space I should allow for it, how much disk space is prudent for a mirror, and how much is the "churn"?  What percentage of the tree changes from day to day, or week to week?  I need to know this also, as I plan to keep historical archives of the portage and distribution trees just like I am now doing for my lan client machine backups.

----------

## viperlin

Moriah: will you be publishing that post in book form?

----------

## Moriah

No, but if I ever implement it and get it working, I might publish it as an emergable package.   :Smile: 

----------

## Satori80

 *GurliGebis wrote:*   

> Well, my installation works like this:
> 
> I have the server serving /usr/portage/distfiles over nfs, so the clients mount it in their /usr/portage/distfiles . The clients fetches the distfiles from the webservers like normally, but since they all have /usr/portage/distfiles mounted from the server, the file only needs to be downloaded once.

 

I do this  as well, and I rather enjoy this setup as the distfile host has way more disk space than it needs -- whereas my other machines always seem to have too little.

 *Quote:*   

> The server also runs the rsync daemon, so the clients can rsync against it, and thereby save bandwidth.
> 
> The server rsyncs once a day.

 

I want to do this, but I'm not sure what's to prevent rsync from mirroring the entire distfiles tree (I'm assuming yours doesn't).

Basically, I think you are running the setup I'd like to have, but I don't understand how to set it up so it only grabs the distfiles I need as I emerge them.

Another point I wonder about is if I were to set it up per this howto's instructions (on top of the local portage tree in /usr/portage, rather than the default /opt/gentoo-rsync/portage) how does the hosting machine update its portage cache after an emerge sync?

I'd like to better understand these issues before I go and try it out. I've put enough time into moving these systems from other distros to gentoo, and I'd hate to hose one of them now.  :Wink: 

----------

## GurliGebis

Here is my /etc/rsync/rsyncd.conf :

```
#uid = nobody
#gid = nobody
use chroot = no
max connections = 10
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = yes
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

#[gentoo-x86-portage]
#this entry is for compatibility
#path = /opt/gentoo-rsync/portage
#comment = Gentoo Linux Portage tree

[gentoo-portage]
#modern versions of portage use this entry
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles
```

As you can see, distfiles is excluded  :Smile: 

All you have to do is emerge gentoo-rsync-mirror, edit your /etc/rsync/rsyncd.conf and /etc/rsync/rsyncd.motd.

Then add rsyncd to your default runlevel, and start it.

happy emerge sync'ing  :Smile: 

EDIT: After doing this, set SYNC in make.conf on the clients to: rsync://server_ip/gentoo-portage
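
Client-side, GurliGebis's setup then boils down to two small pieces of configuration (the server address below is a placeholder):

```
# /etc/make.conf on each client: sync against the LAN mirror
SYNC="rsync://192.168.0.1/gentoo-portage"

# /etc/fstab on each client: mount the server's distfiles over NFS
192.168.0.1:/usr/portage/distfiles  /usr/portage/distfiles  nfs  rw,soft  0 0
```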

----------

## flybynite

I must say this is a great thread and is what got me started thinking along these lines, but...

I've created a complete system that includes both a local rsync server and a distfile proxy cache that is really simple and secure.  I've had nothing but good reports about the speed and ease of setup.

I have complete ebuilds for both setups so the install is almost effortless.

The local rsync server:

https://forums.gentoo.org/viewtopic.php?t=180336

The distfile cache for gentoo:

https://forums.gentoo.org/viewtopic.php?t=173226

I've tried all the rest, and these two packages are by far the best system for users with a LAN!!!!!  I even have full confidence these will work fine for large setups such as a university!!!

They are designed with speed and security in mind - Try them!!!

----------

## Parasietje

I used the following solution:

For the portage tree, I have an rsyncd running on my server. I emerge sync every 2 days using a cronjob on the server. All clients pull their portage tree from the server's rsyncd.

For the distfiles, I use an NFS share. A problem can only occur when two clients attempt to download the same file at the same time, and that chance is small indeed.

However, to avoid this, you could add a special rule to your caching proxy to treat requests for Gentoo packages differently. It could cache them all in a different directory and not delete them. Run a cleanup script every now and then on this directory (e.g. when portage-2.0.47.tar.bz2 and portage-2.0.48.tar.bz2 both exist, delete the old one).
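
That cleanup rule could be sketched roughly like this (the function name is made up, and it keeps the newest file per package by modification time rather than by parsing versions — treat it as a sketch, not production code):

```
#!/bin/sh
# Hypothetical distfiles cleanup: within a directory, keep only the most
# recently modified tarball per (crudely guessed) package name.
prune_distfiles() (
    cd "$1" || exit 1
    # derive package keys by stripping the "-<version>..." suffix
    for base in $(ls | sed 's/-[0-9].*//' | sort -u); do
        # list newest first by mtime, then delete everything but the first
        ls -t "${base}"-[0-9]* 2>/dev/null | tail -n +2 | xargs -r rm -f
    done
)
```

A real script should compare version strings properly (2.0.10 sorts after 2.0.9 numerically but not by mtime if it was mirrored earlier), but mtime is usually good enough for a periodic cache sweep.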

This is, IMHO, the best option: a server rsyncd plus a caching proxy for the distfiles.

----------

## jkroon

Right, I've seen a lot of solutions now, none of them security aware.

My scenario:  Varsity setup where I pay through the nose for every single byte downloaded (for those familiar with the South African rand, I'm paying R2/MB - approx $0.33 US).

The problem:  All solutions presented so far allow arbitrary users to download *any* file via http.  As such I have two basic requirements:

1)  Files need to be restricted to actual portage distfiles.

2)  Users need to be authenticated

For number 2 the apache solution can be useful.  However, I would like any user to be able to download already-downloaded files, and only the administrators to be able to download anything new.

For this reason I have concocted torpage (http://www.kroon.co.za/torpage.php).  While it lacks streaming while downloading, it does provide me with all my other requirements.  I use it in conjunction with vsftpd at work (serving over 400 workstations, probably closer to 500).

Current features are basically as described above:

1)  Restricts downloadable files to those referenced in the portage tree.

2)  Optionally require user authentication.

3)  Validate and check file integrity before returning success to client.

4)  Can be easily modified to support SSL using ucspi-ssl instead of ucspi-tcp

Features I would like to add:

1)  progress information from the underlying wget process.

2)  acl to certain sections (ie, to disallow for downloading games ...)

3)  restrictions based on the upstream server (this is low priority as it'll most likely require help from portage which is not there yet).

4)  A need to pass USE flags along with request - should not be too difficult (the only package I've seen yet for which this is an issue is xorg-x11).

5)  Restrictions based on the size of missing distfiles (again the damn bandwidth issue).

At the university I'm running this on a server which has access to the outside, it runs beautifully.  It serves to downstream clients using ftp.  It also provides rsync to clients as described in the first post of this thread.

At home I have a single machine that exports /usr/portage *and* /usr/local/portage ro via nfs, then makes use of torpage to fetch files.  Unfortunately portage-2.0.51 has broken this by first checking whether /usr/portage/distfiles is ro before attempting to download ... (any fixes?).

And to TheQuickBrownFox - ah, labs.  Yes, the configs are a *huge* problem.  I've had some people develop a distributed-portage-like application for me.  They have completed it now and it functions on top of portage (or something to that effect - I'm not clued in on all the details yet).  Hopefully it will solve the labs problems once and for all.

----------

## bschnzl

Hi Folks...

   First I would like to pay tribute to Grimthorn.  That is why I am posting this here, after 4 plus months of inactivity.  SALUTE!

   Introducing R-rud - a proof-of-concept reactive, unattended package retriever for a central Gentoo update server.

   It accepts a filename as an argument.  It will then parse this file for rsync attempts, and return the package name or an error.  Then it will look for the completion of the rsync transaction.  If it finds it, it will tell you.  It will also tell you if it doesn't.  Hopefully this will lead to reactive unattended retrieval of missing files on the central Gentoo server.  The downloads are not yet implemented.  I request feedback on the success of identifying which requests require downloading.

    It will build a list of ebuild files from the digest lists in each file under /usr/portage.   It will put this in /usr/portage/pkglist.  It will do this only if a regular text file is not at  /usr/portage/pkglist.  This file is currently 2792 k on my system with an md5sum of 42fe85e0427518a176dfcc91f69b47e9.  I believe it should be the same on every gentoo system, but I don't know, so I don't check.  It will change tomorrow, but change is a general feature of the portage system.   Assuring the pkglist file shouldn't be hard.

    I am posting this for testing.  I also hope it answers jkroon's post.  Here's how:

   First, programs are not security aware, they are well written.  As my code is proof of concept it is not generally well written.  I am no PERL guru.  Feedback is appreciated.

   To restrict files to portage, run rsync as nobody:nobody, and only give nobody access to the portage files.  You might also do something like this (assume your network uses a 10.0.4.0/23 address space):

```
 iptables -A INPUT -s 10.0.4.0/23 -p tcp --dport 873 -m state --state NEW -j ACCEPT 
```

This also assumes the policy on your INPUT chain is DROP, and you handle related and established state elsewhere.

   IMHO Authenticating users in this case is overkill.  These files are available on the open internet, via http.  If they want them, they can use  their browser.   This is certainly up for discussion.

   By using only rsync for gentoo updates, you are avoiding public protocols like http.  You shouldn't have to be a webmaster to run a gentoo update server.  Rsync is simple.  Simple protocols are easier to track, and thus, secure.   While I am in this vein, using NFS for this is just plain wrong.   The protocol is unwieldy, it begs for central user management, and the write issues are unmanageable.  Web-caching techniques require a) an additional server of some sort, and b) http.

    jkroon's feature requests would not be difficult to elegantly implement.   

1) This is meant to be a background process to emerge(1).  emerge already has generally apparent progress indications.

2) Games and the like can be protected by filesystem ACLs.

3) Given that portage mirrors sync every thirty minutes, upstream host restrictions should not be needed.  Regardless, this will reduce the monitoring to one host.  Are you watching your logs?

4) USE flags are managed on the local host.  If you want to distribute them, make an ebuild with only make.conf, and post it on the appropriate server instance.  This is outside the scope of the update server.  It is a general admin task.  Again, easier said than done, but the path is there.

5) I believe rsync can handle file size restrictions.  If it can't, it shouldn't be too hard to add it to this.

   But alas, pontificating over proof of concept code is spending before the goose lays the golden egg.  Hopefully I will have time to see this to completion.

   Good then...  let's move on.  Here's da code.  It is meant to run as root.

```
#!/usr/bin/perl
#
#  This was written by Bill Scherr IV
#  It is released under the GNU Public Licence
#  as found at http://www.gnu.org/licenses/gpl.txt
#  With all its warnings and benefits...
#
#  PERL code to return package name from
#  log entry in rsync logs.
#
#  1) Takes an argument of a rsync log file
#  2) obtains the filename from that line
#  3) guesses and returns the package name or an error
#  4) Determines if file exists
#  5) Obtains missing files
#
#
#  This should be cron'd to restart after every sync.
#  It should also tail the log file, but this is for
#  proof of concept.

# require File::Temp;
use IO::File;
use File::Temp (tmpnam);

#  Accept a filename from the command line (1)
$target = shift;
$pkgfile = metafind();

#  Initialize a counter (although I don't think I'll need it)
$counter = 0;
print "looking thru $target\n";
print "referring to $pkgfile\n\n";

#  Make sure we are only looking through text
if ( -T $target ) {

        #  Initialize Log Line and Success Checker
        my $inline = "";
        $success = "";

        #  Open our checked file!
        $LOGLINE = IO::File->new("< $target") || die "Unable to open $target: $!\n";
        while (defined($inline = $LOGLINE->getline())) {

                #  Cut the stuff we don't need to look at
                next if $inline !~ m/rsyncd/;

                #  A sign of a successful update; check further
                if ( $found ne "" ) {
                        $getit = success($inline,$success,$found) if ( $inline =~ m/distfiles / );
                        dostuff($found,$pkgpath) if ( $getit ne "gotit" );

                #  The Buck Stops Here!
                        $success = "";
                        $found = "";
                        $pkgpath = "";
                        $getit = "";
                }

                # Grab package for success() checker
                $success = $inline;

                # print "$success $inline\n" if $inline =~ m/distfiles/;
                #  More cutting
                next if $inline !~ m/distfiles\//;

                #  Obtain a filename...
                chomp($inline);

                # remove the initial cruft...
                $inline =~ s/\S+\s+\d+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+distfiles\///;
                # remove the ending junk...
                $inline =~ s/from\s+\S+\s+\S+\d+\)//;

                #  Initialize Result holder
                $pkgpath = "";

                # Obtain and output package name (2)
                $PKGNAMES = IO::File->new("< $pkgfile") || die "Could not open package list for reading: $!\n";
                while (defined($herestuff = $PKGNAMES->getline())) {

                        if ( $herestuff =~ m/^\/usr/) {
                                $pkgpath = $herestuff;
                                next;
                        } elsif ( $herestuff =~ m/$inline/ ) {
                                $pkgpath =~ s/\/usr\/portage\///;
                                $pkgpath =~ s/\/files\/digest\S+//;
                                $herestuff =~ s/\S+\s+\S+\s//;
                                $herestuff =~ s/\s+\d+//;
                                $found = $herestuff;
                                print "-------\nfound $found \nfrom $pkgpath \n";
                                last;
                        } else {
                                # clean up for the next run!
                                $herestuff = "";
                                next;
                        }
                }
                if ( $found eq "" ) {
                        print "-------\nDid not find $inline\n";
                }
                # clean up for the next run!
                $inline = "";
        }
}

sub metafind
{
        # On my system, this generated a 3773K file...
        #
        # This will build a list of all the ebuild files
        # on the Gentoo System...
        #
        print "Building package list!\n\n";
        $test2 = "/usr/portage/pkglist";

        unless ( -T $test2 ) {
                $templist =  tmpnam();

                unlink $test2;
                my $CMD = "find /usr/portage/ -name digest\\* -exec ls {} >> $templist \\; -exec cat {} >> $templist \\;";
                system($CMD);
                open(OUTLIST, "> $test2") || die "Could not open package list for writing: $!\n";
                open(NEWLIST, "< $templist") || die "Could not open new package list for reading: $!\n";
                        while (<NEWLIST>) {
                                print OUTLIST $_;
                        }
                close(NEWLIST);
                close(OUTLIST);
                unlink $templist;
        }
        return $test2;
}

sub success
{
        # Determine the success of an rsync operation from a log entry
        my ($logdret,$logdreq,$pkgreq) = @_;
        my $BINGO = "";

        # get daemon[pid] from log entry
        $logdret =~ s/\S+\s+\d+\s+\S+\s+\S+\s//;
        $logdret =~ s/\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+//;
        $logdreq =~ s/\S+\s+\d+\s+\S+\s+\S+\s//;
        $logdreq =~ s/\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+//;

        #  print "$logdret\n$logdreq\n\n";
        if ( $logdret ne $logdreq ) {
                $BINGO = "donthaveit";
        } else {

                $logdret =~ s/\[\d+\]\://;
                # print "$logdret\n";
                if ( $logdret ne "rsyncd" ) {
                        $BINGO = "gotit";
                } else {
                        $BINGO = "donthaveit";
                }

        }
        print "You have transferred the $pkgreq package successfully\n\n" if ( $BINGO eq "gotit" );

        #       print "$BINGO\n";
        return $BINGO;
        #
}

sub dostuff
{
        # Do stuff

        my ($pkgname,$category) = @_;
        # $pkgpath =~ s/\S+\///;
        # my $emchek = "emerge -p $pkgpath";
        # my $emfind .= system($emchek);
        # $emfind =~ s/\S+\s//;
        print "We should download $category-$pkgname now:\n";
}
```

And here is a shell script that removes the package list, sync's, and rebuilds the package list.

```
#!/bin/sh
#
#  Shell script to do stuff associated with emerge on a central portage server
#  Bill Scherr IV
#
#  released under the GNU Public Licence
#  as found at http://www.gnu.org/licenses/gpl.txt
#  With all its warnings and benefits...
#
#  Let's go...
PATH=/usr/bin
EMERGE_BIN=`which emerge`
PKG_LIST="/usr/portage/pkglist"
if [ ! -x $EMERGE_BIN ]
then
        echo "can't find emerge, stopping"
        exit 1
fi
if [ -e $PKG_LIST ]
then
        rm -f $PKG_LIST
fi
$EMERGE_BIN sync
/usr/bin/find /usr/portage/ -name digest\* -exec ls {} >> $PKG_LIST \; -exec cat {} >> $PKG_LIST \;
exit 0
```

Obviously, this still requires some work.  I believe the concept is ready for testing.  

Enjoy!

----------

## jkroon

 *bschnzl wrote:*   

> First, programs are not security aware, they are well written.  As my code is proof of concept it is not generally well written.  I am no PERL guru.  Feedback is appreciated.

 

Aye, this is true, to a certain extent.  Any security-aware application should be well written, but not every well-written program is security-aware.

 *Quote:*   

> To restrict files to portage, run rsync as nobody:nobody, and only give nobody access to the portage files.  You might also do something like this (assume your network uses a 10.0.4.0/23 address space):
> 
> ```
>  iptables -A INPUT -s 10.0.4.0/23 -p tcp --dport 873 -m state --state NEW -j ACCEPT 
> ```
> ...

 

Ok, this doesn't help where you don't have a clue who has which IP address and you're sitting on a hostile network where restricting to specific subnets doesn't help (btw, I don't feel like selectively opening portage for 50 or so IPs over a class B network  :Smile: ).

 *Quote:*   

> IMHO Authenticating users in this case is overkill.  These files are available on the open internet, via http.  If they want them, they can use  their browser.   This is certainly up for discussion.

 

I truly wish that was the case.  Unfortunately we pay for bandwidth - a lot - and we cannot afford to allow anyone to download on our account.  By authenticating we can keep track of who downloads what, who initiates which downloads, and in general do some basic accounting.  Also, at the university where I've deployed this, a user cannot simply point his browser at the file and get it, since all students are restricted to 100MB/year, or paying R2.00 per additional meg.  That is a lot of money considering UK users pay about 30 pounds (approx R330) per month for unlimited ADSL access (iirc), compared to the roughly R1000 (about 91 pounds) we pay per month for an ADSL line capped at 3GB.

 *Quote:*   

> By using only rsync for gentoo updates, you are avoiding public protocols like http.  You shouldn't have to be a webmaster to run a gentoo update server.  Rsync is simple.  Simple protocols are easier to track, and thus, secure.   While I am in this vein, using NFS for this is just plain wrong.   The protocol is unwieldy, it begs for central user management, and the write issues are unmanageable.

 

Ah, but not all distfiles are available via rsync.  And in some cases they are only available on an international link from SA - which means that at home I'll get next-to-zero download rates (we have a 3GB per month cap on our ADSL - which only goes up to 512kbps anyway - after which all traffic, especially international, gets shaped like you wouldn't believe).  There are however huge local ftp mirrors which we can use and which still provide us with reasonable download rates (depending on the mirror, anything from 3 or 4 KB/s right up to about 30KB/s).  These mirrors don't provide rsync however - or at least, I haven't managed to locate one recently that does.

About the NFS, it's simply used to negate the need to emerge sync every machine separately, and it's exported read-only.  What is the biggest network you have worked on?  I'm not going to sync 400 machines separately.  No way.

 *Quote:*   

> 1)This is meant to be a background process to emerge(1).  emerge already has generally apparent progress indications.

 

As is torpage - it hooks into the FETCH and RESUME commands.  The problem however is streaming that progress indicator back.  I have an idea to do this but just need to make sure that it'll always "do the right thing (tm)" without killing off the network.

 *Quote:*   

> 2)Games and the like can be protected by filesystem acls.

 

How?  I want to protect the download, and since the download on the server will always happen as the same user - which requires write access to /usr/portage/distfiles - filesystem ACLs cannot do what I want, or at least not in any way that I know of.  Not unless you go and predictively create immutable empty files with all the filenames of all the games in /usr/portage/distfiles, and that imho is a very ugly hack.

 *Quote:*   

> 3)Given that portage mirrors sync every thirty minutes, upstream host restrictions should not be needed.  Regardless, this will reduce the monitoring to one host.  Are you watching your logs?

 

I'm not in charge of our local rsync mirror - which, btw, only syncs once a day since it isn't an official rsync mirror.  And actually, restricting based on the upstream ftp mirror does make sense when you're in an environment where bandwidth (esp international) is at a premium.  Can't remember exactly what I wanted to do with this, but it can be useful.

 *Quote:*   

> 4)USE flags are managed on the local host.  If you want to distribute them, make an ebuild with only make.conf, and post it on the appropriate server instance.  This is outside the scope of the update server.  It is a general admin task.  Again, easier said than done, but the path is there.

 

Not the way torpage is implemented, where I tell the server that I'm looking for a specific package - it then initiates an emerge -f --nodeps =package-category/package-name-version, waits for it to terminate with success/failure and then reports back to the client.  The new version, I suspect, will actually initiate the wget itself, merely using emerge -pf --nodeps to get a list of alternate mirrors (optional).  It will use /usr/portage/mirrors to get a base list of mirrors.  Additionally, each torpage request will only fetch one file.  I want a central torpage/distfiles server that can download for any client system - irrespective of the client's USE flags.  That was the original problem and why I wanted to send the USE flags from the client to the server - so that the server could calculate which distfiles to download.  I suspect you understood that I wanted to force a set of USE flags down on the clients; nope, I wanted to make the server flexible enough to handle all possible clients.

 *Quote:*   

> 5)I believe rsync can handle file size restrictions.  If it can't, it shouldn't be too hard to add it to this.

 

Again the new version should be able to do this as I'm now looking at the digest files and only fetching one file at a time.

Jaco

----------

## bschnzl

Hi all...

   My goal in posting R-rud was to get some help in testing, and maybe direction.  The concept is to limit communications of the internal clients to the rsync server, as Grimthorn suggested.  The client downloads are controlled by normal emerge configuration settings.  The server would use emerge -f.  Any protocol could be used to pull the files to the local mirror.  Hopefully, this will be a drop-in for an already running, automatically syncing server, as specified in Grimthorn's HOWTO.  The learning curve would be easy.

   The file specified on the command line should be the rsync log.  That file should match the rsyncd configuration.  This location is configurable, or not, and affected by syslog.conf.  For now, it is specified at run time.  Non-distfiles requests are filtered.  As the distfiles are available from a specified rsync branch, as recommended in Grimthorn's HOWTO, the errors are limited to old packages.  Those pointers should get updated by a later sync.

  R-rud and the package list file could also be useful in equery, or emerge for identifying which file belongs to which package.  I have looked through the reference list file, and noted that some files appear more than once.  When they do, however, they usually have the same root package name.  The code grabs this common package name, and sends it to the downloader.  As you can see, the downloader, dostuff(), has some commented code that invokes emerge, with the local system parameters in place.  Right now this code merely reports, it does not download.  Other than generating a file list, no system changes will be made by running this code.  If a similar file already exists, someone please clue me in.

   Before I get in any more trouble, some other assumptions should be enumerated.  First, judging from the mirror hosts I connect to, a single rsync server should be able to handle tens of thousands of internal updating clients.  rsync is very light, and appropriate for files that are distributed over the open internet.  This server would not participate in any central user management.  It would probably live on a semi-trusted screened subnet, as it connects to the internet automatically.  A normal user would log in and become root to deal with maintenance issues.  This would serve to reduce the target profile of this box.  Keep it simple where you can!

   Second, bandwidth between the updating clients and this server would be provided by the owning organization (i.e. an ethernet connection).  I thought that was understood in deploying a local mirror.  rsync logging contains size and ip address info, if accounting is an issue.  Besides, rsync has some of the lowest overhead seen on any service.

   Third, syncs are cron'd.  I am using fcron, which uses root's crontab as the system crontab.  Regular users are not given root!  RPC is vulnerable regardless of how the filesystems are exported.  It should not extend to systems that will connect to the internet without direct user initiation (yes, this includes SMB and NCP too).  Configurations and network equipment can protect RPC from IP ranges, but what of boxes that connect using those RPC services?  The real issue in dumping RPC is keeping all those distfiles on each local box.  Are they really needed there?  Do we fix RPC, or buy bigger drives?  Which has a better chance of success, right now?

   My understanding is that portage package sets are easily expanded.  Thus, the possibility exists for insertion of individual files on the updating clients from the portage server.  This would provide functionality on the order of ZenWorks or SMS.    I have not tested this, but I don't see it being feasible on web-cache solutions.

   Rsync was not my first choice.  Of the choices offered by the community, rsync is the simplest.  If someone really wants to do bad stuff, he a) probably won't be doing it from his own machine, nullifying the money barrier, and b) can find the stuff we are serving here on the network elsewhere.  Besides, it's not like nobody (the user that rsyncd should run as) is a member of the portage group.  Our systems are complex enough.  Ya gotta keep it simple where you can.

   Of course, currently, all R-rud does is issue a report on a logfile.  I was hoping folks would run it and let me know if it missed any missed files, downloaded any files that were already there, or choked on problems.  All I can do at this point is add you to the comments / README file!

Thanks for your help...

----------

## yetano

 *jkroon wrote:*   

> Current features are basically as described above:
> 
> 1)  Restricts downloadable files to those referenced in the portage tree.
> 
> ...
> ...

 

What about RSYNC_EXCLUDEFROM (see make.conf(5))? This way the section isn't in the local portage tree at all, thus no file referenced there can be downloaded.
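
Concretely, that might look like this (the variable is as documented in the make.conf(5) of that era; the excluded category and file path are just examples):

```
# /etc/make.conf
RSYNC_EXCLUDEFROM="/etc/portage/rsync_excludes"

# /etc/portage/rsync_excludes -- rsync exclude patterns, one per line
games-*/
```

Anything excluded this way never lands in the local tree, so clients have nothing to reference.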

----------

## Russel-Athletic

I have one question.

I want the NFS-share solution (only 2 clients) but I don't know which directories I have to share.  I want to do an emerge sync only on the server machine and run emerge -u world manually.

Ok, there is /usr/portage and /usr/portage/distfiles, but is there anything else?  Something in /var?

----------

## jkroon

You see.  That is why we have forums.  I haven't read that particular man page in a long time.  That'll solve part of the problem.  Satisfactorily (spelling ...) for now.  And probably for a long time to come.

Next question: we maintain some of our own ebuilds. Some of these are queued for inclusion or we're pushing to get them in, but some will never get included. We have a separate rsync section for this called gentoo-local-portage to which we can manually rsync. Since we are running the syncs out of cron'ed scripts I suppose I can just add an additional rsync command to those scripts, but it would be nice if portage had some way of doing this for me, ie, if I could ask it to sync /usr/local/portage against rsync://some.server/gentoo-local-portage whenever emerge sync gets called  :Smile: .  Any ideas?  And this time I promise that I have completely read and understood man 5 make.conf and have not seen such an option.

----------

## yetano

 *Russel-Athletic wrote:*   

> I have one question.
> 
> I want the solution with nfs share (only 2 clients) but i don't know which direcotries i have to share. i want to do a emerge sync only on the server maschine and emerge -u world manually.
> 
> Ok there is /usr/portage and /usr/portage/distfiles but something else? Something in /var ?

 

AFAIK sharing /usr/portage (including all subdirectories) should be fine. With the default setup portage uses /var only for hostspecific (e.g. /var/lib/portage/world) and temporary files (e.g. /var/tmp/portage/*).
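For example, a read-only export could look something like this (the IPs stand in for your two clients):

```
# /etc/exports on the server
/usr/portage 192.168.1.10(ro) 192.168.1.11(ro)

# /etc/fstab on each client
server:/usr/portage /usr/portage nfs ro 0 0
```

Note that with a read-only mount the clients can't fetch into distfiles themselves, which is fine if only the server does the downloading.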

@jkroon

My first guess was moving /usr/local/portage to a subdirectory of /usr/portage and then add this to PORTDIR_OVERLAY, but after a small test I had to find out that PORTDIR_OVERLAY isn't working the way make.conf(5) implies.

 *make.conf(5) wrote:*   

> Defines the directories in which user made ebuilds may be stored and not overwritten when 'emerge --sync' is run. [...]

 

```
# /etc/make.conf
PORTDIR="/var/portage/sys"
PORTDIR_OVERLAY="/var/portage/sys/dummy"
```

During 'emerge sync' /var/portage/sys/dummy is deleted with all files in it and at the end emerge complains about PORTDIR_OVERLAY not being a valid path (anymore). Well, at least none of the files has been overwritten  :Laughing: .

Nevertheless, /usr/local/portage could still be moved to a subdirectory of /usr/portage, but should be added to the file pointed to by RSYNC_EXCLUDEFROM (actually the name of the subdirectory, not the full path and don't use '/local' as this is excluded in any case). This way 'emerge sync' on the clients will include your own ebuilds, but won't touch them on the server. No need for a 'gentoo-local-portage', though the solution you already mentioned might be the better one.

Basically, the problem is that emerge can sync with one server/module only, unless SYNC (and PORTDIR) is changed every time.
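A workaround (untested here) is to override those variables in the environment for a one-off run:

```
# sync the main tree as usual
emerge sync
# then sync the overlay against the other module (names from the posts above)
SYNC="rsync://some.server/gentoo-local-portage" PORTDIR="/usr/local/portage" emerge sync
```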

----------

## jkroon

I think I'll just go with the additional rsync command in our nightly scripts - it is imho the cleaner solution.  It also makes it easier to distinguish which are custom ebuilds and which are portage ebuilds.  It doesn't solve the problem for the desktops, but there are only a small number of those, so it isn't that big a problem - we can probably just write a small script that does both emerge sync and rsync -rav --delete rsync://out.rsync.server/gentoo-local-portage /usr/local/portage in one go.
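A minimal version of such a wrapper might be (server name as in the earlier post):

```
#!/bin/sh
# sync the main portage tree, then pull our local overlay
emerge sync && \
    rsync -rav --delete rsync://out.rsync.server/gentoo-local-portage/ /usr/local/portage/
```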

----------

## jkroon

Ok, torpage 0.2.0 is available for testing.  It still lacks ACLs, but who cares?  For most of us this is simply a caching solution, and EXCLUDEFROM can be used to exclude games-* (which is what I wanted the ACLs for).

The new version instructs the server to fetch a specific file for a specific package and version.  The server then manually parses the appropriate digest files to check whether these files are actually in those packages, does some nifty parsing and sourcing of files to determine the upstream mirrors that portage would have fetched from, and then manually fetches and does the checksumming.  I do this by hand since portage cannot fetch a particular file for me at this point in time - nor is the emerge command available on non-gentoo servers (and yes, I do need torpage to run on non-gentoo systems on the server side).

Additionally, torpage doesn't take over your fetching entirely any more; it now specifies a separate torpage:// protocol, so you can have multiple upstream torpage servers (ok, that is just leetness factor and doesn't imho have much real use) along with other standard ftp:// and http:// servers, and torpage won't interfere with these.  This allows for fallback should torpage fail for some reason or another.  I've even managed to configure it to use different upstream torpage servers depending on the network I'm connected to, or to silently fail if I'm not connected to one of these networks.

Anyway, please take a look and let me know of any shortcomings (both in my code and in documentation).

URI:  http://www.kroon.co.za/torpage.php

Thanks for any and all feedback.

----------

## mndar

Hi,

  Is there any way I could get back my original emerge cache / portage tree after doing emerge --sync?

  Sorry if this question is too silly to be replied to by all the GURUs out there. I recently installed Gentoo from the universal CD (install-x86-universal-2005.0.iso) and then installed KDE, GNOME, etc. from the package CD (packages-x86-2005.0.iso) using emerge --usepkg <package-name>. This installed KDE 3.3.2. 

  Well, then I did emerge --sync. Now I can't install any kde-3.3.2 package, since all the entries in the emerge cache/portage tree point to kde-3.4, and thus I can't use the package CD. I want to be able to install packages from the package CD, which has kde-3.3.2. So, is there a way by which I could get back my original emerge cache or portage tree? I was thinking emerge --regen. Would that do it?

Also, I am not very sure what the difference (if any) is between the emerge cache and the portage tree.

----------

## jkroon

You presumably got your original portage tree from a tarball on the CD?  I suspect the following sequence of commands will solve your problem in that case:

```
# mount /mnt/cdrom
# mv /usr/portage/distfiles /tmp/distfiles
# tar xjvpf /mnt/cdrom/path/to/that/tarball/which/I/can/never/remember # check the install manual for the command
# rm -rf /usr/portage/distfiles # just in case the snapshot included one
# mv /tmp/distfiles /usr/portage/distfiles
```

Something along those lines might work - and if not, emerge --sync will get you back where you are now  :Sad: .

----------

## mndar

Thanks jkroon, it worked!!

The following is the sequence of commands I had to issue:

```
mv /usr/portage/distfiles /tmp/distfiles
mount -o loop /downloads/gentoo/install-x86-universal-2005.0.iso /mnt/iso
tar -xvjf /mnt/iso/snapshots/portage-20050303.tar.bz2 -C /usr
mv /tmp/distfiles /usr/portage/distfiles
```

----------

## yaneurabeya

If you have several machines with different configurations, and several flagship machines that do the compiling, etc., then you might want to consider something like this:

```
        Server
       /      \
      /        \
PC(0) [....] PC(n)
```

Where PC(0) through PC(n) have NFS access and are trusted for authentication. Given that each and every machine can mount and access the distfiles NFS share, you can set up a simple cron job where the trusted machines 'sync' with the server by running cp on /usr/portage/distfiles/*. Then you can run rsync to your heart's content. Why not just use NFS everywhere? Because NFS requires an additional service for all the non-flagship machines, and rsync doesn't require any additional kernel modules.
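As a sketch, the cron'd copy on a trusted client might look like this (the server name and mount point are made up):

```
#!/bin/sh
# e.g. /etc/cron.daily/distfiles-sync on a trusted client:
# mount the server's portage share, copy any new distfiles, unmount
mount server:/usr/portage /mnt/portage || exit 1
cp -u /mnt/portage/distfiles/* /usr/portage/distfiles/
umount /mnt/portage
```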

----------

## jasperbg

 *Grimthorn wrote:*   

> Portage uses two methods to keep an updated Portage tree and retrieve current Gentoo packages (distfiles). Rsync (rsync) is used for the tree and wget is used for the packages. Im not aware of the motivations behind these protocol choices but they work very well.

 

Great article, but wget is not a protocol.

Last edited by jasperbg on Tue Jun 07, 2005 6:00 am; edited 1 time in total

----------

## jkroon

You really don't want to be doing any kind of NFS unless your _entire_ network can be trusted.  It is simply too insecure.  Take for example a university (Yes, I'm a sysad at a university) where I'll need to export /usr/portage rw to the entire campus (well, a large part thereof anyway) to make your trick work.  Not something I'm willing to do since this means some cracker can go and put a new baselayout along with a trojanned archive on the server and everybody would install it.

Nope, not a good idea imho, plus there are locking issues even if you do trust the entire network.

----------

## yaneurabeya

Yes, and that's why something should be done for authentication dealing with trusted keys and so forth like kerberos auth.

----------

## jkroon

 *yaneurabeya wrote:*   

> Yes, and that's why something should be done for authentication dealing with trusted keys and so forth like kerberos auth.

 

Ok, we're going off topic, but OpenAFS seems to be the only viable alternative at this point in time, and for heaven's sake, don't use the version in portage.  Anyhow, kerberos itself has some severe flaws as well with regard to the way it works, making replay attacks possible.  But it's much better than NFS.  NFSv4 (kerberised NFS) might also be an alternative if it gets stable at some point.

Anyway, with regards to a central repository, a read-only NFS export of /usr/portage (and $DISTDIR if different from the default /usr/portage/distfiles), then mounting that and using something like torpage, is about the best solution I've managed to come up with.  I'm using it both at home and at work and it works like a charm.  Well, at work I'm only using it as an on-demand mirror (ie, download from outside once, download to servers/workstations from there).  At home I've got a bit of a mixture atm (need to get everything back to ro NFS).

----------

## matt.matolcsi

First off, thanks to everyone for the well-written, thoughtful, and useful comments. 

Here at work we've got something of a similar situation: five recycled Ultra 5's do most of the dirty server work, with one of them being the designated binary package build machine and portage sync'er. Because they are all very similar hardware (only processor speed and RAM differ), they all use the same make.conf and general filesystem layout. Actually, I've got a neat rsync setup that lets the other machines pull this and some other files automatically from the server. 

At first, I set up the machines to use binary package downloading with emerge -g, but this had some drawbacks: 

1. It's slow: every time I wanted to emerge a package, some 'metadata pickle' had to be downloaded and parsed, and this took ages. 

2. It required the setup of an http server on the package host, which was one more thing to worry about that I didn't want. 

I also had that host serve as the rsync mirror, which had another few disadvantages: 

1. It came off of an actual hard drive, which meant that it was probably 5-10x slower than syncing to an official mirror (which act like they're on RAM drives.) 

2. It caused a ton of grinding each time the tree was sync'd! This worries me because I want these servers to go the next 5 years with minimal adjustments on the administrator's part (I'll be leaving soon), and all that grinding * 4 for each machine that had to sync just gave me a bad feeling. 

Solution? Share /usr/portage read-only over NFS to the local network. The other machines automount /usr/portage, and I can use emerge --buildpkg (actually, I have it set in FEATURES in /etc/make.conf to build binary packages) to build packages on the main host, and then install them on the others using emerge --usepkg. 
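For anyone wanting to replicate this, the relevant pieces look roughly like this (the PKGDIR shown is just the portage default):

```
# /etc/make.conf on the build host
FEATURES="buildpkg"              # every successful emerge also saves a binary package
PKGDIR="/usr/portage/packages"   # where the .tbz2 packages end up

# then on the other machines (with /usr/portage automounted):
#   emerge --usepkg <package>      # use the binary if present, else build from source
#   emerge --usepkgonly <package>  # fail rather than fall back to source
```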

Actually installing packages has worked great, but I'm worried because emerge went through a huge round of 'quarterly updates' when I first automounted /usr/portage, and sometimes it acts like it wants to write into those directories:

```
Performing Global Updates: /usr/portage/profiles/updates/4Q-2004
(Could take a couple of minutes if you have a lot of binary packages.)
  .='update pass'  *='binary update'  @='/var/db move'
  s='/var/db SLOT move' S='binary SLOT move' p='update /etc/portage/package.*'
......................................!!! Cannot update readonly binary: sys-devel/gcc-3.3.5-r1
!!! Cannot update readonly binary: sys-devel/gcc-3.3.5.20050130-r1
.....!!! Cannot update readonly binary: sys-devel/libtool-1.5.2-r5
..........................................................................................................................................................................................................................
```

Does anyone know if indeed anything *needs* write access within /usr/portage? It seems like nothing should, since all the ebuilds get synced, but it still worries me that emerge complains. 

Thanks, 

Matt

----------

## yaneurabeya

Uhm, you need +w for distfiles and certain sections of portage (individual branches, ie patches) I think.

----------

## matt.matolcsi

distfiles I'm not worried about, since only one machine will sync with the outside world and download/build packages; but what process or program needs to be able to write to specific ebuilds?

----------

## jkroon

I've submitted a patch that removes the need for +w to ${DISTDIR}, but then you need some other method to fetch files (like torpage - http://www.kroon.co.za/torpage.php works nicely).  I like the --usepkg and --buildpkg idea; I guess one could even use the same idea that torpage uses for fetching distfiles to build packages on demand while the client waits.

----------

## bigdave1

I am having a problem setting this up as described in Setup #3. I have set up rsync exactly as described and setup my client accordingly. However, whenever I do an emerge --sync, I get the following error:

```
linuxclient etc # emerge --sync
>>> starting rsync with rsync://192.168.1.4/gentoo-portage...
>>> checking server timestamp ...
@ERROR: Unknown module 'gentoo-portage'
rsync: connection unexpectedly closed (52 bytes read so far)
rsync error: error in rsync protocol data stream (code 12) at io.c(189)
>>> retry ...
```

My rsyncd.conf file looks like this:

```
# Copyright 1999-2004 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
# $Header: /var/cvsroot/gentoo-x86/app-admin/gentoo-rsync-mirror/files/rsyncd.c$

#uid = nobody
#gid = nobody
use chroot = no
max connections = 10
pid file = /var/run/rsyncd.pid
motd file = /etc/rsync/rsyncd.motd
transfer logging = no
log format = %t %a %m %f %b
syslog facility = local3
timeout = 300

[gentoo-x86-portage]
# this entry is for compatibility
path = /opt/gentoo-rsync/portage
comment = Gentoo Linux Portage tree

[gentoo-portage]
# modern versions of portage use this entry
path = /usr/portage
comment = Gentoo Linux Portage tree mirror
exclude = distfiles

[gentoo-packages]
# For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror
```

Can somebody please tell me what I'm doing wrong or what needs to be adjusted?

Thanks!

----------

## jkroon

From the client machine, what does the command:

```
$ rsync rsync://192.168.1.4
```

output?

----------

## bigdave1

Well, I figured out my problem: I was making changes to the /etc/rsync/rsyncd.conf file instead of the /etc/rsyncd.conf file. However, I now have a new problem. What I would like to happen is that when I emerge something (ie. emerge gdesklets-core), my portage server downloads the distfiles for that package and all its dependencies if they're not there, caches them in its portage directory, and then serves them to the client that requested them. However, that's not what is happening. When I do 'emerge gdesklets-core' here's what I get:

```
linuxclient etc # emerge gdesklets-core
Calculating dependencies ...done!
>>> emerge (1 of 5) dev-python/pyorbit-2.0.1 to /
>>> Downloading http://distfiles.gentoo.org/distfiles/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading http://distro.ibiblio.org/pub/Linux/distributions/gentoo/distfiles/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading ftp://ftp.sunet.se/pub/X11/GNOME/sources/pyorbit/2.0/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading ftp://archive.progeny.com/GNOME/sources/pyorbit/2.0/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading ftp://ftp.gnome.org/pub/gnome/sources/pyorbit/2.0/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading http://ftp.gnome.org/pub/gnome/sources/pyorbit/2.0/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading ftp://ftp.no.gnome.org/pub/GNOME/sources/pyorbit/2.0/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
>>> Downloading ftp://ftp.gnome.org/pub/gnome/2.0.0/sources/pyorbit/2.0/pyorbit-2.0.1.tar.bz2
link_stat "pyorbit-2.0.1.tar.bz2" (in gentoo-packages) failed: No such file or directory
client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
rsync error: some files could not be transferred (code 23) at main.c(653)
!!! Couldn't download pyorbit-2.0.1.tar.bz2. Aborting.
```

As you can see, its not even attempting to connect to my portage server. Here is what my /etc/rsyncd.conf file looks like:

```
# /etc/rsyncd.conf
# $Header: /var/cvsroot/gentoo-x86/net-misc/rsync/files/rsyncd.conf,v 1.6 2005/$
# Minimal configuration file for rsync daemon
# See rsync(1) and rsyncd.conf(5) man pages for help

# This line is required by the /etc/init.d/rsyncd script
pid file = /var/run/rsyncd.pid
use chroot = yes
read only = yes

# Simple example for enabling your own local rsync server
[gentoo-portage]
        path = /usr/portage
        comment = Gentoo Linux Portage tree
        exclude = /distfiles /packages

[gentoo-packages]
# For distributing Portage packages (distfiles) to internal clients
path = /usr/portage/distfiles
comment = Gentoo Linux Packages mirror
```

And here is what my make.conf file looks like on my client:

```
# These settings were set by the catalyst build script that automatically built$
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-O2 -pipe -march=athlon-xp"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j2"
USE="~x86 gtk qt gnome gdm alsa cdr nptl cups foomaticdb ppds dga xvid cdparano$
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
SYNC="rsync://copenhagen/gentoo-portage"
FETCHCOMMAND="rsync rsync://copenhagen/gentoo-packages/\${FILE} ${DISTDIR}"
```

I have the hostname of my portage server set to copenhagen which I have the IP defined in my /etc/hosts to point to 192.168.1.4. 

And here is what my make.conf file looks like on my portage server:

```
# These settings were set by the catalyst build script that automatically built$
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-Os -march=pentium3 -pipe"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j3"
USE="-X mmx sse cvs"
```

Like I mentioned before, what I would like to have happen is this: if I try to emerge something that is not cached on the portage server, the portage server downloads the distfiles for that package and all its dependencies, caches them locally, and then serves them to the client that requested the emerge. Any ideas what I need to change to make that happen with my setup? Right now, when I do an 'emerge sync' on my client, that works fine. Just not emerge <package>...

Thanks!

----------

## jkroon

```
FETCHCOMMAND="rsync rsync://copenhagen/gentoo-packages/\${FILE} ${DISTDIR}" 
```

That is your problem right there.  You need some mechanism for the client to notify the server what to fetch.  Some solutions make use of squid, but squid usually doesn't cache overly large files (yes, it actually has an option to not cache files bigger than x bytes - argument: they would expel too many other pages from the cache).  Enter torpage.  To the best of my knowledge the configuration for torpage is well documented, and it can be obtained from http://www.kroon.co.za/torpage.php - it was developed to do exactly what you are now attempting.

----------

## bigdave1

Cool.. Now I have an even better question. Now that I've downloaded the ebuild for torpage, where do I save it and how do I get it to emerge?  :Smile: 

Thanks!

----------

## jkroon

```
# mkdir -p /usr/local/portage/sys-apps/torpage
# echo 'PORTDIR_OVERLAY="/usr/local/portage"' >> /etc/make.conf
# echo 'sys-apps/torpage ~x86' >> /etc/portage/package.keywords
# mv /path/to/torpage-0.2.3.ebuild /usr/local/portage/sys-apps/torpage/
# emerge -f torpage
# ebuild /usr/local/portage/sys-apps/torpage/torpage-0.2.3.ebuild digest
# emerge -av torpage
```

If you want me to break that down just shout.

----------

## bigdave1

Ok, this may sound like a stupid question, but do I need to install this on the server or the client or both? According to the readme on the link you provided, it sounds like both. I just want to make sure so I get this right.

Thanks!

----------

## bigdave1

Ok, well I installed torpage onto both the server and the client anyway, so disregard my previous post. However, when I try to emerge something, it doesn't pull it from my portage server, and my portage server doesn't go out and retrieve it. Here's what I get when I try to emerge gdesklets-core:

```
linuxclient portage # emerge gdesklets-core
Calculating dependencies ...done!
>>> emerge (1 of 5) dev-python/pyorbit-2.0.1 to /
>>> Downloading torpage://copenhagen/distfiles/pyorbit-2.0.1.tar.bz2
USAGE:  /usr/sbin/torpage_fetch URI DISTDIR
    URI - a valid uri, probably prefixed with torpage://
        DISTDIR - the destination directory, must exist.
 is not a directory!
>>> Downloading http://distfiles.gentoo.org/distfiles/pyorbit-2.0.1.tar.bz2
--21:55:05--  http://distfiles.gentoo.org/distfiles/pyorbit-2.0.1.tar.bz2
           => `/usr/portage/distfiles/pyorbit-2.0.1.tar.bz2'
Resolving distfiles.gentoo.org... 64.50.238.52, 216.165.129.135, 156.56.247.195
Connecting to distfiles.gentoo.org[64.50.238.52]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 242,959 [application/x-tar]
100%[====================================>] 242,959      126.07K/s             
```

As you can see it gives an error when trying to grab it from copenhagen, my portage server. Here's what the /etc/make.conf file looks like on my client:

```
# These settings were set by the catalyst build script that automatically built this stage
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-O2 -pipe -march=athlon-xp"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j2"
USE="~x86 gtk qt gnome gdm alsa cdr nptl cups foomaticdb ppds dga xvid cdparanoia esd 3dnow 3dnowext acpi avi doc gif jack javascript jpeg mp3 mpeg opengl perl php png quicktime samba spell spl tcpd usb wmf wxwindows xmms win32codecs"
PORTDIR=/usr/portage
DISTDIR=${PORTDIR}/distfiles
SYNC="rsync://copenhagen/gentoo-portage"
FETCHCOMMAND_TORPAGE="/usr/sbin/torpage_fetch \#{URI} \${DISTDIR}"
RESUMECOMMAND_TORPAGE="${FETCHCOMMAND_TORPAGE}"
PORTDIR_OVERLAY="/usr/portage"
```

And here's what /etc/make.conf file looks like now on my server:

```
# These settings were set by the catalyst build script that automatically built$
# Please consult /etc/make.conf.example for a more detailed example
CFLAGS="-Os -march=pentium3 -pipe"
CHOST="i686-pc-linux-gnu"
CXXFLAGS="${CFLAGS}"
MAKEOPTS="-j3"
USE="-X mmx sse cvs"
PORTDIR_OVERLAY="/usr/portage"
```

I have started the torpage service using "/etc/init.d/torpage start" and added it to the default runlevel. I'm not sure where to go from this point. Any ideas what I need to do?

Thanks!

----------

## jkroon

Nasty one.  You made a small typo:

```
FETCHCOMMAND_TORPAGE="/usr/sbin/torpage_fetch \#{URI} \${DISTDIR}"
```

should be

```
FETCHCOMMAND_TORPAGE="/usr/sbin/torpage_fetch \${URI} \${DISTDIR}"
```

And since it took me about a minute to spot it: there is a # before {URI} instead of a $.

Also, on the client if you want to force it to use torpage you can set the FETCHCOMMAND and RESUMECOMMAND to equal /bin/false:

```
FETCHCOMMAND="/bin/false"
RESUMECOMMAND="/bin/false"
```

Note that once torpage has downloaded a file it still needs to be transferred from the server to the client - I do this using vsftpd on the server and wget on the client, just like portage would normally do.

----------

## bigdave1

Ok, I'm confused as to what I need to do... I didn't realize that I had to set up an FTP server. Are there instructions on how to set up vsftpd for this purpose, or can you explain that to me? As of right now, here's what I'm getting when I try to emerge something:

```
linuxclient distfiles # emerge lftp
Calculating dependencies ...done!
>>> emerge (1 of 1) net-ftp/lftp-3.0.13 to /
>>> Downloading torpage://192.168.1.4/distfiles/lftp-3.0.13.tar.bz2
Torpage is initiating fetch for lftp-3.0.13.tar.bz2 (net-ftp/lftp-3.0.13)
This is torpage on copenhagen (protocol level 0.2)
Locking for lftp-3.0.13.tar.bz2.
Obtained lock, proceeding to download
Successfully fetched files for net-ftp/lftp-3.0.13
--14:43:52--  ftp://192.168.1.4/gentoo/distfiles/lftp-3.0.13.tar.bz2
           => `/usr/portage/distfiles/lftp-3.0.13.tar.bz2'
Connecting to 192.168.1.4:21... failed: Connection refused.
>>> Downloading http://distfiles.gentoo.org/distfiles/lftp-3.0.13.tar.bz2
--14:43:53--  http://distfiles.gentoo.org/distfiles/lftp-3.0.13.tar.bz2
           => `/usr/portage/distfiles/lftp-3.0.13.tar.bz2'
Resolving distfiles.gentoo.org... 64.50.238.52, 216.165.129.135, 156.56.247.195
Connecting to distfiles.gentoo.org[64.50.238.52]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,223,604 [application/x-tar]
```

I'm guessing that I'm getting this problem due to the fact that I have no FTP server set up. If you could help me set up vsftpd for this purpose, I'd greatly appreciate it.

Thanks!

----------

## bigdave1

Ok, I installed vsftpd, and I set the home directory of the user ftp to /usr/portage/distfiles (just took a guess; I found out that must be incorrect). Now, when I do an emerge, here is what I get:

```
linuxclient distfiles # emerge lftp
Calculating dependencies ...done!
>>> emerge (1 of 1) net-ftp/lftp-3.0.13 to /
>>> Resuming download...
>>> Downloading torpage://192.168.1.4/distfiles/lftp-3.0.13.tar.bz2
Torpage is initiating fetch for lftp-3.0.13.tar.bz2 (net-ftp/lftp-3.0.13)
This is torpage on copenhagen (protocol level 0.2)
Locking for lftp-3.0.13.tar.bz2.
Obtained lock, proceeding to download
Successfully fetched files for net-ftp/lftp-3.0.13
--15:05:05--  ftp://192.168.1.4/gentoo/distfiles/lftp-3.0.13.tar.bz2
           => `/usr/portage/distfiles/lftp-3.0.13.tar.bz2'
Connecting to 192.168.1.4:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD /gentoo/distfiles ... 
No such directory `gentoo/distfiles'.
>>> Resuming download...
>>> Downloading http://distfiles.gentoo.org/distfiles/lftp-3.0.13.tar.bz2
--15:05:05--  http://distfiles.gentoo.org/distfiles/lftp-3.0.13.tar.bz2
           => `/usr/portage/distfiles/lftp-3.0.13.tar.bz2'
```

As you can see, it's trying to change directories. I have not made changes to any other configuration files that I've posted before. Can you tell me what I need to change in order to make this work? I'm so close I can feel it!!!

Thanks!

----------

## jkroon

Local network right?  You can also use NFS.

I've changed my DISTDIR on my "server" to /home/ftp/gentoo/distfiles, and then set up vsftpd to use /home/ftp as the ftp root for anonymous connections. My vsftpd.conf file looks as follows:

```
anon_mkdir_write_enable=NO
anon_other_write_enable=NO
anon_upload_enable=NO
anonymous_enable=YES
check_shell=YES
chmod_enable=NO
chown_uploads=NO
chroot_local_user=YES
dirlist_enable=YES
download_enable=YES
force_dot_files=NO
guest_enable=YES
guest_username=nobody
hide_ids=YES
listen=YES
local_enable=NO
passwd_chroot_enable=YES
session_support=NO
syslog_enable=YES
write_enable=NO
max_per_ip=5
anon_root=/home/ftp
ftpd_banner="This is my banner, live with it."
background=YES
```

On the client, my /etc/portage/mirrors file simply contains:

```
local torpage://pug.lan/gentoo
```

----------

## bigdave1

Yes, this is a local network. However, I do not wish to use NFS. One day I might want to make this server public. 

Can I not leave the DISTDIR on the server as /usr/portage/distfiles? Right now, I'm just using the example vsftpd.conf file that I copied to vsftpd.conf. And here is what my /etc/portage/mirrors file contains:

```
local torpage://192.168.1.4
```

So, with all this in mind, can you tell me what I need to change in order to get this setup to work?

Thanks!

----------

## jkroon

Yes you could; if you set your ftp root directory to /usr/portage it should work.  Alternatively, you can use "mount --bind /usr/portage/distfiles /home/ftp/gentoo/distfiles" or something similar to make the distfiles available inside your ftp root.  No, symlinks do not work.

----------

## bigdave1

When I set the ftp home directory back to /home/ftp and I create /home/ftp/gentoo/distfiles and mount it as you suggested, it works! Now, the only question I have is: if I need to restart the server for any reason, how do I get "mount --bind /usr/portage/distfiles /home/ftp/gentoo/distfiles" to run on startup? 

Thanks!

----------

## jkroon

In /etc/fstab:

```
/usr/portage/distfiles /home/ftp/gentoo/distfiles none bind 0 0
```

----------

## greboide

I would just like to thank linkfromthepast, because his script works like a charm here with no hassles. Also thanks to the guy who initiated this stuff, because it really is needed. And sorry for resurrecting threads, but this thing worked so well here that I had to leave a report. Btw, I only use 2 PCs.

----------

