
Once the Convex C240 had arrived, along with the I/O cabinets and spare parts, it was time to continue the work on the Convex C220 that started with the Initial Convex C220 Checkout.

Finding the System Disk

Since there is no complete set of installation tapes for ConvexOS, the first priority was to locate a disk with the OS installed that can be used as a system disk.

So, let’s tally the disks we have:

  • 13 x 8” 3GB Seagate Sabre VII IPI disks, unlabeled, not installed
  • 1 x 5.25” 1200MB Seagate Elite IPI disk in an 8” CDC enclosure, unlabeled, not installed
  • 20 x 8” 3GB Seagate Sabre VII IPI disks, labeled, installed in two Convex C3800-style I/O cabinets
  • 10 x 5.25” 3GB Seagate Elite III IPI disks in Convex enclosures, labeled, and 4 x 8” 1GB Seagate Sabre V IPI disks, unlabeled, installed in a Convex C1/C2-style I/O cabinet

Now, according to the /ioconfig file on the SPU hard disk in my C220, when it was last powered down in 1998, it had 24 3GB disks attached: 14 5.25” Elite III disks (du0..du13) and 10 8” Sabre VII disks (du14..du21, du44, du45), with one of the Elite III disks as the system disk (du0).

Fortunately, the 10 Elite III disks in the C1/C2-style cabinet match the IDs of the Elite III disks used with my C220 (except that du2, du6, du7, and du13 are missing), and the IPI device number settings match as well. However, the previous owner was told that this rack of disks belonged to the C240, not the C220. So, time to try to boot the C220 off the disk labeled “du0”. I have no way to read IPI disks other than through the C220, and the C220 can only read from the IPI disks once it’s booted, so I had no way to make a backup copy of the disk before attempting this. Fortunately, the C220 booted straight away!

Making a backup of the system disk

So, time to locate a second Elite III disk that I could use to make a 1:1 copy of the system disk. Looking at /etc/fstab, I can see that du0 is the only disk that has anything of interest on it. The C220 apparently had Convex UniTree (later HP UniTree) installed, an early mass-storage management package. The idea was that you could have a large amount of data (terabytes) stored on tapes, handled by tape robots, with the disks on a “fileserver” serving as cache. The C220 was the fileserver. All disks except du0 had a single partition that was mounted on /cache/fsn. One disk was mounted as /unitree; it held the database that determined what was stored where on tape and on the cache disks. Unfortunately, that disk was du2, one of the missing Elite III disks. Without it, there’s no making sense of the data on the cache disks, as each disk contains one 3GB file, simply called “file”, that is managed by UniTree.

Unfortunately, only two of the other Elite III disks even power on. These disks seem not to be too reliable after 20 years. Since there is no data on any of these disks that is of any use to me, I decided to make two backup copies of the system disk, so I now have three copies in total. After configuring the C220 to connect to my network, I also copied the four filesystems on that system disk (root, /usr, /mnt, and /tmp) to my NFS server so I can explore what’s on there with the C220 powered down.

And an unexpected power problem

In the middle of that copy, all of a sudden the system went down. Worse, it would not come up again. The 2-digit hex display on the front of the machine indicated an undervoltage on the -4.5V power supply for the ECL logic. As soon as this happens, the SCM (System Control Module) shuts down the power supplies, which makes troubleshooting difficult. So, I took all the boards out of the machine, and disconnected the SCM. A quick look with the multimeter showed that the -4.5V supplies were not delivering anything at all. With all the fans disconnected, I could hear a faint blip every two seconds or so. With the oscilloscope, I could see a short 0.3 V spike every two seconds.

With the power removed, I measured the resistance between the -4.5V and ground bars, and between the -2V and ground bars for comparison. Of course, you’re not measuring a resistor, but rather the combination of the power supplies and the capacitors in them. On the -2V power supplies, the reading on the multimeter went up as the capacitors charged, so that was definitely a large resistance. On the -4.5V power supplies, I measured a resistance of less than 0.1 ohms. With the boards installed, that would not be surprising (1600 amperes at 4.5 volts works out to an effective load of about 0.003 ohms), but with the bus bars effectively open, that’s not good.

So, it looked like there was a short in the output of one of the 4 paralleled power supplies. I started removing them one by one, until the resistance went up. That happened at the second power supply I removed. I put the first supply back in, and things still looked good. I powered it on, and yes, the -4.5V was back. I plugged the boards back in, reconnected the SCM, and the machine was running again.

[Photos: ipi_cray, ipi_disks, ipi_enclosure]

I was wondering if the problems with the Elite III disks were just related to their power supplies, so I took all the disks out of their Convex enclosures, and re-purposed a couple of disk drawers from a Cray Superserver CS6400 (I have the disks, but not the server) to hold the Elite III IPI disks. With a little creativity in cabling, I put all 10 Elite III disks – with the exception of one of my backup disks – into the two drawers, and connected them to the C220. 4 of the disks still fail.

Surprising Discoveries

Exploring the data on the system disk, I made a surprising discovery. The previous owner was told that the C220 came from the Nuclear Research Center at Rossendorf, Germany. In doing some research, I learned that that center did indeed have a C220, installed in 1991. In 1993, they upgraded to a faster C3820, and the C220 took on the role of fileserver. So far, the story checks out. However, when I looked at the /etc/hosts file, there was no trace of Rossendorf to be found; instead, it told me that the system was called “chamin.sican.de”. SICAN was a chip manufacturer in Hannover, Germany. Further proof is a sticker on the side of the CPU cabinet, listing a contact person. The name on that sticker matches the name of the head of IT operations at SICAN in the 1990s. I have tried to contact this person, as well as the current incarnation of SICAN (it merged a few times), but have not received a response so far.

Imaging the other disks

So, now that we have a basic system up and running, it’s time to start looking at the pile of disks. One thing in particular that I’m hoping to find is a copy of the FORTRAN compiler. Leaving the unlabeled ones for last, I first looked at the 20 disks in the C3800-style cabinets. Examining the root partition on du0, I learned that these disks came from a C3800 installed at the University of Leipzig. I also learned that most of the disks were part of a stripeset (RAID 0), but that there were two more disks that I do not have.

What I did not like was the transfer speed when using tar to copy the files to my NFS server: I was only getting about 100 kilobytes per second. So, instead, I decided to use dd to make a full copy of each entire disk to a file on the NFS server. That gave me about 750 kilobytes per second, reasonably close to the maximum you can get with a 10Mbit network card. In this manner, I made copies of all the disks in the C3800-style cabinets, as well as those in the pile of unlabeled disks.

I found that the power supplies on 8 of the 8” disks were not working. I put those disks aside, and after I was done imaging the other disks, I transplanted 8 good power supplies from disks I had already imaged to the 8 drives with bad power supplies. After that, I was able to image all 8 of those, too. On 4 of the drives, I had a lot of I/O errors, and the FAULT LED lighting up. On those disks, I replaced the logic board with one from a known-good drive. This worked in two cases, so in the end, there are only two 8” disks I could not make an image of. Making the images took about an hour per disk, so with the experiments, failures and retries, this probably took about 60 hours of (mostly unattended) uptime on the C220; a pretty good workout.
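As a rough sanity check on those rates, the sketch below works out how long one full disk takes. The disk size of 3 * 10**9 bytes is an assumption (the nominal “3GB”; the exact formatted capacity is not given here), and the function name is made up for illustration.

```python
def transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a steady rate."""
    return size_bytes / rate_bytes_per_s / 3600

DISK_BYTES = 3 * 10**9          # assumed: nominal 3GB disk
LINK_BYTES_PER_S = 10**7 / 8    # 10Mbit/s raw link rate, ignoring protocol overhead

print(f"dd  at 750 kB/s: {transfer_hours(DISK_BYTES, 750_000):.1f} h")
print(f"tar at 100 kB/s: {transfer_hours(DISK_BYTES, 100_000):.1f} h")
print(f"dd uses {750_000 / LINK_BYTES_PER_S:.0%} of the raw link rate")
```

Under that assumption, dd needs a little over an hour per disk, which fits the figure above. The tar number is only an upper bound, since tar copies files rather than the whole disk.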

Retrieving data from the disk images

So now I had these raw disk images sitting on my FreeBSD NFS server. There is no volume label on these disks; instead, the OS has built-in knowledge of the different types of disks, and knows where each partition begins and ends. So, I wrote a little utility (“splitdisk”) that looks at the images, and tries to determine which partitions have a UFS1 superblock at the right offset. If I feed this utility a “du0.dd” image, it might spit out “du0a.dd”, “du0d.dd”, “du0e.dd” and “du0h.dd” files for a standard system disk.
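The check at the heart of such a tool can be sketched in a few lines of Python. This is not the actual splitdisk source. It assumes the conventional FFS layout as found in FreeBSD's UFS1 (superblock 8192 bytes into the partition, magic number 0x011954 at byte 1372 of the superblock) and big-endian byte order; whether ConvexOS matches these exactly is an assumption. The partition table in the demo is invented.

```python
import struct

SBLOCK_OFFSET = 8192    # assumed: superblock starts 8 KB into the partition
MAGIC_OFFSET = 1372     # assumed: position of fs_magic inside the superblock
FS_MAGIC = 0x00011954   # UFS1 / FFS magic number

def has_ufs1_superblock(image, part_start, byteorder=">"):
    """True if a UFS1 magic sits where a partition at part_start would keep it."""
    pos = part_start + SBLOCK_OFFSET + MAGIC_OFFSET
    raw = image[pos:pos + 4]
    if len(raw) < 4:
        return False            # candidate runs past the end of the image
    (magic,) = struct.unpack(byteorder + "I", raw)
    return magic == FS_MAGIC

def find_partitions(image, candidates):
    """candidates maps a partition letter to (start, length) in bytes."""
    return {name: span for name, span in candidates.items()
            if has_ufs1_superblock(image, span[0])}

# Demo on a synthetic image with filesystems in partitions "a" and "d" only.
demo = bytearray(3 * 65536)
for start in (0, 131072):
    struct.pack_into(">I", demo, start + SBLOCK_OFFSET + MAGIC_OFFSET, FS_MAGIC)
table = {"a": (0, 65536), "b": (65536, 65536), "d": (131072, 65536)}  # hypothetical
print(sorted(find_partitions(bytes(demo), table)))   # ['a', 'd']
```

For a real 3GB image, the bytes object would be replaced by an mmap of the file, and each matching (start, length) span would be written out as its own file.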

I then wrote a second utility – thank goodness UFS1 is such a simple filesystem – to extract the contents of the disk from one of these images (“ufsextract”). Running that utility on, say, du0a.dd, gives me a directory du0a that contains all the files, directories, and special files in the root filesystem.

On the root filesystem, on systems that use stripesets, there is a /etc/stripecap file that describes how the stripesets are laid out across the disks. So, I wrote a third utility (“createstripes”) that would read the stripecap file, and – if all the dunx.dd files are present – create the stn.dd files containing the stripesets, which I could then feed to ufsextract.
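Reassembling a RAID 0 set comes down to interleaving fixed-size chunks from the member disks in round-robin order. The sketch below shows only that step and is not the actual createstripes code. The stripe unit and member order would have to come from /etc/stripecap, whose format is not described here, so both are plain parameters; equal-sized members are also an assumption.

```python
def destripe(members, stripe_unit):
    """Interleave equal-sized member images round-robin into one volume.

    members     -- list of bytes objects, in stripeset order
    stripe_unit -- bytes taken from each member per turn (assumed known)
    """
    if not members:
        raise ValueError("need at least one member")
    if len({len(m) for m in members}) != 1:
        raise ValueError("members must all be the same size")
    if len(members[0]) % stripe_unit:
        raise ValueError("member size must be a multiple of the stripe unit")
    out = bytearray()
    for off in range(0, len(members[0]), stripe_unit):
        for m in members:                 # one chunk from each disk in turn
            out += m[off:off + stripe_unit]
    return bytes(out)

# Two tiny "disks" with a 2-byte stripe unit.
print(destripe([b"AAbb", b"CCdd"], 2))    # b'AACCbbdd'
```

With multi-gigabyte images the same loop would read from and write to open files chunk by chunk instead of building the result in memory.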

This worked well for stripesets where I knew which disk was which (the disks from the C3800-style cabinets) but not for unlabeled disks. Fortunately, it is not difficult to find partitions that are the first partition in a stripeset. These partitions do have a valid superblock in the right location, but the root directory inode (inode 2) contains bogus data. So, finally, I wrote a fourth utility (“findsuper”) that I would feed a partition containing the superblock for a stripeset, and that would search the pile of images of unlabeled disks for copies of that superblock. That would provide me with enough information to puzzle these stripesets together again.
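The search itself can be sketched as a byte-pattern scan. FFS keeps backup copies of the superblock in its cylinder groups, and in a striped filesystem some of those copies presumably end up on the other member disks. The code below is an illustration, not the actual findsuper utility: which slice of the superblock stays identical across copies is left open, so the fingerprint is simply passed in, and the 512-byte alignment is an assumption.

```python
def find_copies(haystack, needle, align=512):
    """Return the aligned offsets at which needle occurs in haystack."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        if pos % align == 0:              # ignore chance matches off a sector boundary
            hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits

def locate_members(fingerprint, images, align=512):
    """images maps an image name to its bytes; return {name: [offsets]} for hits."""
    result = {}
    for name, data in images.items():
        hits = find_copies(data, fingerprint, align)
        if hits:
            result[name] = hits
    return result

# Demo with made-up image names and a made-up 16-byte fingerprint.
fingerprint = b"SB-DEMO-" + bytes(range(8))
images = {
    "disk_a.dd": bytes(1024) + fingerprint + bytes(496),
    "disk_b.dd": bytes(2048),
    "disk_c.dd": bytes(100) + fingerprint + bytes(412),   # unaligned, so ignored
}
print(locate_members(fingerprint, images))   # {'disk_a.dd': [1024]}
```

The offsets of the hits on each image, compared with where the filesystem expects its backup superblocks, are the kind of clue that could then be used to work out each disk's position in the stripeset.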

So, what did I find?

I still need to go through a lot of the data, but the most interesting thing is that, in all, I found system disks for four different systems:

  • SICAN’s (= now my) C220
  • University of Leipzig (= now my) C240
  • University of Leipzig’s C3800
  • A C3800 that also belonged to SICAN 

I could match most disks to one of these systems, but two disks look like they belonged to the Dutch office of Convex itself. Of course, most of what is on all of these disks is just data, and more precisely data that does not belong to me, so that data will be destroyed. I’m trying to obtain permission from the former owners, though, to at least go through the data to see if there’s anything useful, like application software, in there. One of the founders of Convex has graciously bestowed his blessing on doing that with the Convex data.

Getting FORTRAN going

What I did do was search for the FORTRAN compiler, “fc”. I found three copies: on the C240’s /usr disk, and on both C3800 /usr disks. However, when I copy the compiler to the C220 and try to compile a simple FORTRAN program, I get an error message:

c220# /usr/bin/fc hello.f
fc:
>>>>> C O M P I L E R E R R O R <<<<<
>>>>> See your system manager for help <<<<<
cannot compile; machine serial number mismatch 

Apparently, the FORTRAN compilers are tied to the serial number of the machine they ran on. Fortunately, I know what the serial numbers of those three machines were, and I know how to change the serial number of the C220 (some wires on the backplane). So, I changed my C220’s serial number to match one of the FORTRAN compilers, and…

cop:   fatal error
   machine class 0xa unknown - execution terminating
Using scnlink to initialize scan structures ...
 
scnlink:   FATAL ERROR:
   machine class 10 unknown
scnlink:   revision 5.2 (Fri Jun 18 17:49:00 1993)
-: /mnt/bin/.diaginit failed, cannot continue.

Yikes! The top four bits of the serial number double as the machine’s hardware class. So, a C220’s serial number needs to be in the 8192-12287 range (0x2000-0x2FFF), or it won't initialize properly!
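The relationship between serial number and class can be sketched as a simple bit shift. This assumes a 16-bit serial number, which is inferred from the 8192-12287 range above rather than taken from any documentation; the function names are illustrative.

```python
SERIAL_BITS = 16   # assumed width, inferred from the 8192-12287 range
CLASS_BITS = 4     # the four bits that encode the hardware class

def machine_class(serial):
    """Return the hardware class encoded in the top four bits."""
    if not 0 <= serial < (1 << SERIAL_BITS):
        raise ValueError("serial number out of range")
    return serial >> (SERIAL_BITS - CLASS_BITS)

def serial_range(cls):
    """Return the (lowest, highest) serial number for a hardware class."""
    shift = SERIAL_BITS - CLASS_BITS
    low = cls << shift
    return low, low + (1 << shift) - 1

print(serial_range(2))       # (8192, 12287)
print(serial_range(0xA))     # (40960, 45055)
print(machine_class(8192))   # 2
```

Under that assumption, the “machine class 0xa” in the error output would correspond to a serial number somewhere between 40960 and 45055.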

So, time to look at the fskel executable (the core of the compiler). The version of FORTRAN on the two C3800s is the same (that on the C240 is different), so I can do a direct comparison between the two executables. It appears that the activation code (a string like “12345-67890-1234”) is embedded in the executable at install time. Bummer. I did play around with the debugger a bit, and even replaced the code in the executable that executes the “getsysinfo” system call (which retrieves the serial number, among other things) with a bit of assembler code that returns fake information matching the serial number of the machine the compiler was stamped for, but that didn’t work either. As several million instructions get executed before the error message, the solution to this riddle was proving elusive.

Then I went through the Convex tapes that I had made images of earlier with a fine-tooth comb, and found that there was one tape in my collection that I had not made an image of. It’s a tape labeled “AOI-0558”, and it was generated by the Convex TAC for my C220 (its serial number is on the label). I made an image of it, and found that it contained a couple of OS patches, followed by… version 7.0.1 of the FORTRAN compiler. There was a matching envelope attached to the reel that would originally have held the piece of paper with a listing of the tape contents and the activation codes for my machine. But the envelope is empty. However, someone did write an activation code for a different machine on that envelope. Could it be…?

[Photos: fortran_label, fortran_tape, fortran_tape2]

Now, the magtape drive on the C220 is not working; it has some mechanical problems that I may not be able to fix. There was another tape drive with the system though, a Metrum RSP-2150. That’s a drive that stores 19GB of data on an S-VHS tape. I opened it up, and it looks remarkably clean inside. So, I connected it to my PC, and wrote a copy of the FORTRAN tape to an S-VHS tape. I read it back, and that worked fine too. I then hooked the Metrum up to the C220, configured it, and ran the installation program:

c220# installsw -i -d /dev/rrsp0n
Tape device is /dev/rrsp0n
** Installsw Header File From Tape **
Copyright 1992 CONVEX Computer Corp.
All rights are reserved.
CREATED ON Sat Sep 26 13:44:09 1992
710-000215-005 ConvexOS Utilities Patch, release 10.1.2 3
720-000515-226 CONVEX FORTRAN, release 7.0.1 4
Choose the type of installation you want to perform:
LOCAL --> install on this machine
REMOTE --> install on a remote machine
ABORT --> abort installation
Enter your selection now --> local
Setting up installation environment. Please wait ...
tar: blocksize = 65536 blocking = 12
Idx   Part Number     Description               Release  Files  Offset
1     710-000215-005  ConvexOS Utilities Patch  10.1.2   3      3
2  +  720-000515-226  CONVEX FORTRAN            7.0.1    4      6
   ^  Items marked with a + will be installed.
      Items marked with a - will be de-installed.
selection? install
[Installing CONVEX FORTRAN v7.0.1]
--- Extracting files from tape...
--- Files extracted.
ALTERNATE DIRECTORY SPECIFICATION
Do you want to specify an alternate directory for this product (y or n)? n
Installing into /usr/convex & /usr/lib (/usr/convex/newfc for field tests).
The CONVEX FORTRAN compiler checks the serial number of the system
it is run on to determine if that system is authorized to run
CONVEX FORTRAN.
Usually, this software is shipped with a separate piece of paper
containing the necessary activation key[s] for your site.
If the activation key is not known, the TAC (Convex's Technical
Assistance Center) can provide the activation key for CONVEX FORTRAN
when provided with a system's serial number, and the part number for this
software (see the installation procedures document for the part #).
It is time to enter the activation key[s].
If the activation key is not available, you may bypass this step.
Note that the installation test will fail unless the compiler is
already stamped.
Do you want to enter an activation key now? y
What is the activation key?
24870-61479-5163
Using activation key[s] 24870-61479-5163
--- Begin installation on c220
--- running installation tests
--- fc installation tests all pass
The CXpa versions of the Fortran runtime libraries are provided for those
sites which use or expect to use the CXpa profiler. Substantial disk space
may be saved by not installing these libraries.
Do you wish to install this optional software? [yn] y
Installing optional software
--- Installation completed successfully.
Updating /etc/motd
Updating Version Database
Processing of installation media complete.
Trace file may be found in c220:/tmp/install375script.
c220# fc hello.f
c220# ./a.out
Hello World!
c220# 

Changing disks and making things neat

So, there are a few things I disliked about the Elite III disks: they seem to be less reliable than the 8” IPI disks, the power supplies in the enclosures for them are crap, and the Cray disk drawers are too long for the Convex cabinets; they stick out at either the back or the front. So, I ended up transferring the system disk filesystems to one of the Sabre VII disks. I used dump to copy the filesystems to a Metrum tape, then used restore to restore them onto the Sabre VII (I could not use dd, as the Sabre is slightly smaller than the Elite III). I then made a second copy onto a second Sabre VII, removed the Elite III disks, and started putting together everything I wanted in a single I/O rack. I bolted an empty I/O rack to the side of the C220, then filled it with the VMEbus chassis, the Metrum tape drive, and 8 3GB Sabre VII disks. I put in all the cabling, and added the necessary trim panels to make it look neat.

Upgrading the memory

Using some extra memory boards I had in my spare parts pile, I upgraded the memory on this C220 from 128MB to 512MB, which is the maximum it will take with the revision of the CPX board I have installed. A later version of the CPX supports up to 2 GB of memory, but I don't have that version.

For now, I'm about done with this system hardware-wise. I'd like to set up an FDDI network between the C220 and a system that also has a 100Mbit or 1Gbit Ethernet NIC, because the 10Mbit network connection is a bit of a bottleneck. If I can get the magtape drive in shape, I’ll add that cabinet to the system as well, but for now, this is it.

 
