About the C64 Collections Project

SceneBase's C64 Collections is a central directory of collections containing archived disk and tape images. These images are most often created by reading the floppy and cassette media with original hardware and transferring the data to modern PCs via specially made cables.

Let us state at the outset that we expect this project to have limited appeal: it is most valuable to hardcore scene collectors who are researching scene history or scavenging specific kinds of releases and files for focused collections of their own. Still, we'll try to answer the most obvious questions: why we're undertaking this, what our approach is, and why we think it is a valuable step in the quest for C64/scene history preservation.

Goals:

  • preservation
    • gather, track and index media image collections
  • availability
    • maintain an up-to-date directory of download links with stable mirrors
  • organization
    • devise a simple and concise way of referring to individual media images across all collections
    • produce metadata for each collection that may improve browsing, searching, and sorting
    • identify duplicate media images (future)
    • identify empty media images / corrupt transfers (future)
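
The two goals marked "future" could be approached along these lines. This is only a sketch: comparing images by content hash and the "all bytes identical" emptiness check are our assumptions for illustration, and the function names are hypothetical, not part of any existing SceneBase tool.

```python
import hashlib
from collections import defaultdict


def find_duplicates(images):
    """Group image names by the SHA-256 of their contents.

    `images` maps a name to raw bytes; only groups of 2+ names are returned.
    """
    by_digest = defaultdict(list)
    for name, data in images.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


def looks_empty(data):
    """Crude heuristic (an assumption): no bytes, or every byte is the same."""
    return len(set(data)) <= 1
```

Hashing contents rather than comparing names matters here because default names like DISK0001.D64 recur across unrelated collections. The emptiness check would only catch blank transfers; a formatted but unused disk still carries directory data and would need a format-aware test.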

Problems:

Collections are usually posted by the transferring party (aka the 'disk ripper') as a compressed archive file (.zip, .rar, .7z, .tar, and .gz are the most common). An archive may contain one or more sub-directories, and may hold .d64, .t64, .d81, or other media image formats. The individual media images may themselves have been compressed with zip or gzip, so several rounds of decompression can be needed before the actual media image is accessible. Media image files are sometimes given default names by the transfer tool (a scheme like DISK0001.D64 is fairly common), but they may also be named by hand. Some collections, but not all, include additional metadata such as a text directory listing, either one file covering all images or one text file per image. In summary, the only thing collections packaged by different rippers have in common is that they generally have nothing in common: release format, image naming scheme, compression tools, directory organization, and accompanying metadata all vary substantially. Across collections, there is mass duplication of individual media image names due to repeated use of default naming schemes.
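
To give a feel for the multi-step decompression problem, here is a minimal sketch that walks a .zip collection and unwraps nested .zip and .gz layers until it reaches the media images. It uses only the Python standard library, so .rar and .7z are not handled; the function name and the extension list are assumptions for illustration, not our actual tooling.

```python
import gzip
import io
import zipfile

IMAGE_EXTS = (".d64", ".t64", ".d81")  # formats named above; others exist


def iter_images(zip_bytes, prefix=""):
    """Yield (path, data) for each media image inside a .zip archive,
    unwrapping gzip-compressed images and nested .zip files on the way."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = prefix + info.filename
            data = zf.read(info)
            if name.lower().endswith(".gz"):
                # Strip the .gz layer and keep the inner file name.
                name, data = name[:-3], gzip.decompress(data)
            if name.lower().endswith(".zip"):
                # Recurse, using the inner archive's path as a prefix.
                yield from iter_images(data, prefix=name + "/")
            elif name.lower().endswith(IMAGE_EXTS):
                yield name, data
```

Everything that is not a recognised image (text listings, graphics) is skipped by this sketch; a real pass would need to keep it.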

Aside from packaging, the method of release is also problematic. Collections are frequently shared via short-term free file hosting services. A collection may remain available at its original point of release for as little as a month, and usually for no more than a few months. An individual who wants to research collections is faced with the task of hunting down links that may be largely defunct, leeching torrents at 4kb/s, or begging for reposts.

Finally, there is no concise and simple 'code' for referring to collections or to media images within collections. Citing a file found on the disk image named "0150a.d64" is meaningless without also giving the collection's file name. Where collections have been released with a directory hierarchy, referring to a specific media image may require citing something like "u/Unreal/CHS_A.D64", and once again that is meaningless without the collection's file name.

We could bitch about these issues and try to get rippers to conform to some ad hoc standard, but screw that. Better to let rippers transfer media however they want and in whatever way they work fastest. Most people aren't worried by these problems as long as they get to throw another 1000 d64s on their hard drive.

Approach:

We're actively gathering collections that are posted online in whatever form. We're processing them with semi-automated scripts and tools to transform each collection into a "SceneBase" version with consistent content, naming schemes, and metadata. Each media image is assigned a unique id, and its file name corresponds to that id. Metadata includes text directory listings, CSV listings suitable for use in a spreadsheet, and PNG screenshots of the (PETSCII) directory as it might appear on genuine hardware. Collections are repackaged in a single flat directory containing all media images. Packages are compressed with PeaZip/LZMA at maximum compression into 7zip archives, resulting in files that are anywhere from slightly to significantly smaller than the original collection. Very large collections that may have been released as one large file (e.g. "Jazzcat_Disk_Collection.7z", weighing in at 621 MB) are split into parts not exceeding 200 MB.
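
The renaming and splitting steps can be sketched roughly as follows. Note the assumptions: the "sb0000042.d64" id pattern is invented for this example and is not the real SceneBase scheme, the function names are hypothetical, and the splitter works on raw image sizes as a conservative stand-in for compressed archive size.

```python
import posixpath


def assign_ids(original_paths, first_id):
    """Map each original path to a flat, id-based file name.

    The zero-padded "sb" prefix pattern is purely illustrative.
    """
    mapping = {}
    for offset, path in enumerate(sorted(original_paths)):
        ext = posixpath.splitext(path)[1].lower()
        mapping[path] = f"sb{first_id + offset:07d}{ext}"
    return mapping


def split_into_parts(sizes, limit):
    """Greedily group (name, size) pairs, in order, into parts whose
    total size stays within `limit`. A single file larger than the
    limit ends up alone in its own part."""
    parts, current, total = [], [], 0
    for name, size in sizes:
        if current and total + size > limit:
            parts.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        parts.append(current)
    return parts
```

For example, assign_ids(["u/Unreal/CHS_A.D64", "0150a.d64"], first_id=42) gives "0150a.d64" the name "sb0000042.d64" and "u/Unreal/CHS_A.D64" the name "sb0000043.d64", which flattens the directory hierarchy while keeping every name unique. Keeping images in id order when splitting means each part covers one contiguous id range.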

A directory listing of all collections is being maintained, with a count of images, the id numbers used in each collection, and source information. The directory supports multiple download locations (mirrors) for each file. We hope that, with the help of others providing mirrors for some or all of the files, the collections will be available to researchers indefinitely.

To ensure that we don't lose any valuable information from the original collection release, SceneBase versions include all content present in the original collection files - images are renamed but accompanying graphic files, text files, and so forth are retained. We even provide a file mapping the new media image file names to their original form, including directory structure.
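
As an illustration of how such a mapping file might work, the sketch below writes and reads a two-column CSV. The CSV layout, the column names, and the function names are assumptions made for this example; the actual rename files may use a different format.

```python
import csv
import io


def write_rename_map(mapping):
    """Serialise {new_name: original_path} as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["new_name", "original_path"])
    for new_name in sorted(mapping):
        writer.writerow([new_name, mapping[new_name]])
    return out.getvalue()


def read_rename_map(text):
    """Parse the CSV text back into {new_name: original_path}."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["new_name"]: row["original_path"] for row in reader}
```

Because the original path is stored whole, including any directories, the same file serves both to look up where an image came from and to recreate the original layout.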

To summarize...

Benefits:

  • SB collections contain all content from the original packages
  • SB collections provide a smaller "info" package with metadata about the collection's content, which can be used to quickly assess whether the larger file is of further interest
  • SB collections contain "rename files" that can be used to look up or recreate the original image names, including directory structure
  • SB collections are compressed into .7z archives and are anywhere from slightly to significantly smaller than the original package. Large collections are divided into sets for easier download
  • One place to find all collections of media images provided by hard working rippers
  • Mirror system to ensure high availability of all collections

Maintainers:

SceneBase C64 Collections are currently maintained by Elwix/Style and Demonger.