Case Study: Anzacs_Collection_Jazzcat

05Sep2010
‹posted by Elwix›

Greetings.

A SceneBase user [1] has made an inquiry:

Hi guys, my anzacs collection is 35.8 MB Anzacs_Collection_Jazzcat.rar Yours is: smaller then 23 MB if i'm right. Any reasons ?

Addressing this gives a good example of one of the reasons we think an effort to process/repackage in a standard way [2] is worthwhile...

The reader is correct: the original archive we have for 'Anzacs_Collection_Jazzcat.rar' is 37,555,766 bytes. The SceneBase version is 24,085,514 bytes. Why the big difference?
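The reader's figures and the byte counts above line up once units are taken into account. A minimal sketch, assuming the sizes the reader quoted are binary megabytes (1 MB = 1,048,576 bytes); the helper name bytes_to_mib is ours, for illustration only:

```python
# Byte counts taken from the paragraph above.
ORIGINAL_BYTES = 37_555_766   # Anzacs_Collection_Jazzcat.rar
SCENEBASE_BYTES = 24_085_514  # SceneBase version


def bytes_to_mib(n: int) -> float:
    """Convert a byte count to binary megabytes (assumption: 1 MB = 2**20 bytes)."""
    return n / 2**20


print(f"original:  {bytes_to_mib(ORIGINAL_BYTES):.1f} MB")   # 35.8 MB
print(f"SceneBase: {bytes_to_mib(SCENEBASE_BYTES):.1f} MB")  # 23.0 MB
```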

First, the original rar archive itself contains multiple directories (and subdirectories) with .zip, .gz, zip4 sets, and even other .rar archives inside. This is an extreme case; released c64 collections are usually simpler, with one or several flat directories containing just d64 images. It is not unique, though: we encounter other sets containing zipped d64s or images spread across multiple nested directories.

When we process a collection we unpack everything as far as possible, then pull out the actual media images, in this case d64 and t64. We ended up with 1,027 media images, and those you will find in "Anzacs_Collection_Jazzcat.SB.7z", which is the SceneBase version of this collection set. Looking back at the original, we see roughly 1,000 relevant files, so we presume a few of the contained archives had two or more media images inside. The file count is close enough to give us confidence that we secured all of the actual media images, so we don't think the size difference is a result of missing media images.
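The "unpack everything, then pull out the media images" step can be sketched roughly as follows. This is not our actual tooling, just an illustration using the Python standard library; the function name collect_images is hypothetical, and .rar and zip4 sets are left out because they need external tools.

```python
import gzip
import io
import zipfile
from pathlib import PurePosixPath

MEDIA_EXTS = {".d64", ".t64"}  # media image types kept for this collection


def collect_images(name: str, data: bytes) -> dict[str, bytes]:
    """Recursively unpack nested .zip/.gz containers and return media images.

    Illustrative only: results are keyed by base name, so a real process
    would also have to deal with two images sharing the same file name.
    """
    path = PurePosixPath(name)
    suffix = path.suffix.lower()
    if suffix in MEDIA_EXTS:
        return {path.name: data}
    found: dict[str, bytes] = {}
    if suffix == ".zip":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    found.update(collect_images(info.filename, zf.read(info)))
    elif suffix == ".gz":
        # "foo.d64.gz" unpacks to "foo.d64"
        found.update(collect_images(path.stem, gzip.decompress(data)))
    return found  # anything else (jpg, pdf, ...) is ignored in this sketch
```

Called on the bytes of an outer archive, the function returns only the d64/t64 files, however deeply they were nested inside zip or gz containers.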

The size differential can be attributed to:

  1. lzma compression beats rar, sometimes by a wide margin!
  2. we compressed pure d64/t64. As noted, the original archive has many files already compressed with zip, gz, or rar, and recompressing those into the final rar would have hurt the end result: it's counterproductive to compress already compressed data.
  3. we take only media images. Any remaining files (jpg, pdf, .prg, etc.) are left out of the main SB set. We produce a "SB-nidx" archive (which means "no index") so that those files are retained for history and completeness but kept out of the "clean" media image archive. "Anzacs_Collection_Jazzcat.SB-nidx.7z" is available directly from the FTP site and is about 1 MB in size.
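The second point, that compressing already compressed data gains next to nothing, is easy to try out. A small sketch using Python's zlib and lzma modules as stand-ins for the zip/gz and 7z steps; the input is synthetic word-like data, not real d64 content, so the exact numbers will differ from any actual collection:

```python
import lzma
import random
import zlib

# Synthetic, compressible stand-in for disk image data (assumption: not real d64 content).
rng = random.Random(0)
words = [bytes(rng.choices(range(97, 123), k=rng.randint(3, 9))) for _ in range(200)]
raw = b" ".join(rng.choice(words) for _ in range(40_000))

once = zlib.compress(raw, 9)   # first pass, like a .zip/.gz stored inside the archive
twice = lzma.compress(once)    # second pass over the already compressed stream
direct = lzma.compress(raw)    # single LZMA pass over the raw data, for comparison

print("raw:          ", len(raw))
print("zlib:         ", len(once))
print("zlib + lzma:  ", len(twice))
print("lzma on raw:  ", len(direct))
```

The second pass has very little left to work with, which is why we unpack down to the raw images before building the final archive.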

For another perspective, the total size of processed and compressed SB files to date is around 4.98 GB; the original archives they are derived from total about 5.86 GB, a reduction of about 15%. So do not be alarmed if the SceneBase collection sets are smaller than their original counterparts. That said, if you feel that we've made a mistake, missed disk images, and so on, please let us know...
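For anyone who wants to check the percentages, here is the arithmetic as a short sketch; the function name reduction_percent is ours, and the inputs are the figures quoted in this post:

```python
def reduction_percent(original: float, processed: float) -> float:
    """Size reduction of `processed` relative to `original`, in percent."""
    return (original - processed) / original * 100


# Totals quoted above, in GB.
print(f"all sets: {reduction_percent(5.86, 4.98):.1f}%")              # 15.0%
# Byte counts for Anzacs_Collection_Jazzcat quoted earlier in this post.
print(f"this set: {reduction_percent(37_555_766, 24_085_514):.1f}%")  # 35.9%
```

So this particular collection shrinks far more than the average, which fits with it being an extreme case of nested, already compressed files.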

Thank you for the inquiry and keep the scene spirit alive!

 

[1] We couldn't respond to you directly because the email address you provided, info@web.de, is a generic responder.

[2] As we've stated before, we have no problem with people releasing their collections in whatever way they want, and we do not criticize based on format.