⚠	NOTICE! This is a static HTML version of a legacy ImageJ Trac ticket. The ImageJ project now uses GitHub Issues for issue tracking. Please file all new issues there.

Ticket #1845 (closed defect: fixed)

Opened 2013-04-27T13:34:10-05:00

Last modified 2013-04-28T18:13:57-05:00

Fix ImageJ 1.x mirror

Reported by:	dscho	Owned by:	dscho
Priority:	major	Milestone:	~~imagej2-b7-ndim-data~~
Component:	Server Admin	Version:
Severity:	serious	Keywords:
Cc:	curtis, justin.senseney@…	Blocked By:
Blocking:	#1705

Description (last modified by dscho)

Since we cannot use rsync, we set up a mirror script. To be nice, we tried to use HEAD requests whenever possible (but quite a few directories do not have index.html files, making HEAD requests impossible). The Jenkins job ran twice a day:

http://jenkins.imagej.net/job/ImageJ-1.x-website-mirror/
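The mirror script itself is not shown in this ticket. As a rough sketch of the HEAD-based check described above, assuming the server sends a Last-Modified header, something like the following could decide whether a file needs to be fetched again. The function names are hypothetical and are not taken from the actual script.

```python
import urllib.request
from email.utils import parsedate_to_datetime


def remote_last_modified(url, timeout=30):
    """Ask the server for Last-Modified via a HEAD request (no body is sent back)."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        header = response.headers.get("Last-Modified")
    # Return None when the server does not report a modification time.
    return parsedate_to_datetime(header) if header else None


def needs_download(remote_modified, local_mtime):
    """Return True if the local copy is missing, undated, or older than the remote one.

    remote_modified: timezone-aware datetime or None
    local_mtime: modification time in seconds since the epoch, or None
    """
    if remote_modified is None or local_mtime is None:
        return True  # no basis for comparison: fetch to be safe
    return remote_modified.timestamp() > local_mtime
```

Only needs_download() carries the decision logic; remote_last_modified() is the single place that touches the network, so the rest can be exercised offline.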

Unfortunately, this was still too much load, and we were asked to download a single large .tgz file containing the complete set of files instead, once every night.

So change the mirror yet again (the fifth iteration now).

Change History

comment:1 Changed 2013-04-27T13:52:00-05:00 by dscho

Status changed from new to closed
Resolution set to fixed

The advantage now, of course, is that we get all the files that are there, not just the ones we can reach directly or indirectly via http://imagej.nih.gov/ij/index.html.

To make things a bit nicer for ourselves, let's put things into a Git repository.

So this is what I have done so far:

Since I trust things on the other side to run as smoothly as experience has taught us, I explicitly test whether the file is older than 26 hours and fail if it is.
I then import the .tgz file into a Git directory for easier handling.
Then I check out the files. This touches only those files that really changed (and removes those that have been deleted), which helps the subsequent steps because the mtimes of unchanged files are maintained.
I had to adjust the MirrorWebsite class quite a bit to accommodate the situation that we are now rewriting links from a mirror of the website.
Then I update the Git repository for the complete update site, adding a merge between the previous state and the imported .tgz file.
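The Jenkins job's actual code is not reproduced in this ticket. As a minimal sketch of two of the steps above, the following shows a 26-hour freshness check and a plain-Python stand-in for the effect of the checkout: only files whose content differs are rewritten, so unchanged files keep their mtimes. It does not use Git, and all names are hypothetical.

```python
import filecmp
import os
import shutil
import time

MAX_AGE_SECONDS = 26 * 60 * 60  # the 26-hour limit mentioned above


def is_stale(modified_epoch, now_epoch=None, max_age=MAX_AGE_SECONDS):
    """Return True if a file last modified at modified_epoch is older than max_age."""
    if now_epoch is None:
        now_epoch = time.time()
    return now_epoch - modified_epoch > max_age


def check_fresh(modified_epoch, now_epoch=None):
    """Fail loudly instead of mirroring an outdated archive."""
    if is_stale(modified_epoch, now_epoch):
        raise RuntimeError("archive is older than 26 hours; refusing to mirror")


def sync_tree(source, target):
    """Make target mirror source, rewriting only files whose content differs.

    Stand-in for the checkout step (not Git). Returns the sorted relative
    paths that were written. Empty directories left in target are not removed.
    """
    changed = []
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.join(target, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target, rel, name)
            # Compare contents, not timestamps; identical files are left alone.
            if not (os.path.isfile(dst) and filecmp.cmp(src, dst, shallow=False)):
                shutil.copyfile(src, dst)
                changed.append(os.path.normpath(os.path.join(rel, name)))
    # Remove files that no longer exist upstream.
    for root, _dirs, files in os.walk(target):
        rel = os.path.relpath(root, target)
        for name in files:
            if not os.path.exists(os.path.join(source, rel, name)):
                os.remove(os.path.join(root, name))
    return sorted(changed)
```

Because untouched files keep their old mtimes, any later step that decides what to process by modification time only sees the files that actually changed.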

To determine the best time for this to run (I was told to use "the off hours", with a hint that I should heed both US and EU), I checked the timestamp of the ij.tgz file. From an awfully small n, I deduce that the job is started by cron at half past midnight and that it runs for a little less than four minutes. My best bet was to keep the time at which the mirror used to run: five past one in the morning (local time, which is still one and a half hours after the cronjob starts). I also removed the noon mirroring, which now means that whenever there are changes on the website, the mirror is out of date for most of the day.
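The "one and a half hours" figure only adds up if the mirror host's clock is one hour behind the upstream server's. That offset is inferred from the numbers above and is not stated in the ticket. Under that assumption, the arithmetic can be checked with a short sketch; the names and the fixed UTC offsets are placeholders.

```python
from datetime import datetime, timedelta, timezone

# Assumption (not stated in the ticket): the mirror host is one hour behind
# the upstream host. The absolute offsets are placeholders; only the
# one-hour difference matters.
UPSTREAM_TZ = timezone(timedelta(hours=0))
MIRROR_TZ = timezone(timedelta(hours=-1))


def schedule_margins(day=datetime(2013, 4, 28)):
    """Return (gap after the upstream cron start, gap after the archive is done)."""
    cron_start = day.replace(hour=0, minute=30, tzinfo=UPSTREAM_TZ)
    # "a little less than four minutes": use four minutes as an upper bound.
    archive_done = cron_start + timedelta(minutes=4)
    mirror_start = day.replace(hour=1, minute=5, tzinfo=MIRROR_TZ)
    return mirror_start - cron_start, mirror_start - archive_done


after_start, after_done = schedule_margins()
print(after_start)  # 1:35:00
print(after_done)   # 1:31:00
```

With these numbers the mirror job starts roughly an hour and a half after the archive has been written, which leaves a comfortable margin if the upstream job occasionally runs late.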

All of this can be found here:

http://jenkins.imagej.net/job/ImageJ-1.x-website-mirror/

Last edited 2013-04-28T18:12:51-05:00 by dscho

comment:2 Changed 2013-04-28T18:13:57-05:00 by dscho

Description modified