After we received an appropriate Intel SYP301 front-end (Getting an SRM for the Intel iPSC/860), we could start thinking about getting our Intel iPSC/860 running again. The SYP-301 came with documentation and software for iPSC/2 and iPSC/860 systems, as well as an iPSC/2 cardcage with 8 node boards from an iPSC/2, a node board from an iPSC/860, an I/O board with SCSI controller, and a USM board.

First, we had to get the software up and running on the SRM. We wrote about getting the SRM to boot System V UNIX earlier (SRM for the iPSC/860 boots!), but we hadn’t installed the iPSC-specific software yet, and there were some issues with the recovered System V installation. Some files on the disk were corrupted, and as a result some UNIX commands were misbehaving. For example, any attempt to use the vi editor resulted in a core dump, leaving us with just ed (which is fine, but vi is much more convenient).

With a little TLC, using new tension belts and Tyvek over the tape bollards, we managed to make images of almost all of the installation tapes. We used the opportunity to write an article about our process for QIC Tape Data Recovery, so you can find some of those details there. One tape was giving us difficulty though, and that was the Intel System V R3.2 V2.1 installation tape, which we needed for the all-important first step of installing a clean copy of the OS. That tape had a bad spot in the middle, which threw read errors on every pass across it.

To recover that installation tape, we read all the bits that we could read, leaving gaps of missing data. Analysis showed that the installation tape consisted of a number of cpio file archives. These archives contained the files to install, as well as installation scripts. As part of the installation scripts, a check was done to verify that all files had been extracted from the archive. This gave us a list of which files were supposed to be present in each of the archives. Analyzing each of the archives that had some data blocks missing, we could determine which files were fully there, which files were partially there, and which were completely missing. We then wrote a small utility to extract all the files from the disk image we made of the existing installation (fortunately, System V filesystems are dead easy to figure out), and using a hex editor, we put the missing files into the archives, creating cpio headers for each of them. We took it as confirmation that this was done correctly when the total length of each repaired archive came out as a multiple of the 512-byte tape block size.
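The two checks described above (every expected file is present, and the repaired archive ends on a 512-byte boundary) can be sketched in a few lines of Python. This is only an illustration, not the utility we used: it assumes the archives use the portable ASCII cpio header format (magic `070707`), which may not match what is on the tape, and the file names and helper names (`make_header`, `build_archive`, `list_members`, `missing_files`) are hypothetical.

```python
HEADER_LEN = 76          # fixed size of a portable ASCII cpio header
MAGIC = b"070707"
TRAILER = "TRAILER!!!"   # name of the member that ends an archive
BLOCK = 512              # tape block size mentioned in the text


def make_header(name, size, mode=0o100644, mtime=0):
    """Build an ASCII cpio header followed by the NUL-terminated name."""
    raw_name = name.encode("ascii") + b"\0"
    return (
        MAGIC
        + b"%06o" % 0              # dev
        + b"%06o" % 0              # ino
        + b"%06o" % mode
        + b"%06o" % 0              # uid
        + b"%06o" % 0              # gid
        + b"%06o" % 1              # nlink
        + b"%06o" % 0              # rdev
        + b"%011o" % mtime
        + b"%06o" % len(raw_name)  # namesize includes the NUL
        + b"%011o" % size
        + raw_name
    )


def build_archive(files):
    """Concatenate members, add the trailer, pad with NULs to a full block."""
    out = bytearray()
    for name, data in files:
        out += make_header(name, len(data)) + data
    out += make_header(TRAILER, 0, mode=0)
    out += b"\0" * (-len(out) % BLOCK)
    return bytes(out)


def list_members(archive):
    """Yield (name, start, end) for every member before the trailer."""
    pos = 0
    while True:
        hdr = archive[pos:pos + HEADER_LEN]
        if hdr[:6] != MAGIC:
            raise ValueError("bad magic at offset %d" % pos)
        namesize = int(hdr[59:65], 8)
        filesize = int(hdr[65:76], 8)
        name_start = pos + HEADER_LEN
        name = archive[name_start:name_start + namesize - 1].decode("ascii")
        end = name_start + namesize + filesize
        if name == TRAILER:
            return
        yield name, pos, end
        pos = end


def missing_files(archive, expected):
    """Return the expected names that are not present in the archive."""
    present = {name for name, _, _ in list_members(archive)}
    return set(expected) - present


# Hypothetical example data, not taken from the real tape.
archive = build_archive([("etc/passwd", b"root\n"), ("bin/vi", b"\x7fELF")])
```

Walking the headers this way only works on an archive without gaps, so it is useful as a final check on a repaired archive rather than as a tool for locating the damage.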

The other issue was that the installation diskette threw errors about being unable to write temporary files when trying to start an installation. We found that this could be fixed by running fsck over the diskette image.

So, with fixed installation diskette and tape images (both available in the downloads section), we managed to do a fresh installation of Intel System V UNIX on the pcem emulator.

After installing the base Operating System, we followed all the steps in the Intel iPSC/2 and iPSC/860 Release 3.3.1 Software Product Release Notes to install all the R3.3.1 software and patches, still on pcem. After writing the resulting disk image to an IDE disk again (since replaced with an industrial-grade CF card), we put the disk into the SYP-301 and booted it. The iPSC interface card driver detected the DCM card, and the network driver recognized the Intel pc586 NIC. After configuring TCP/IP, we could also access the SRM over the network.

The next step was to connect the SRM to the Hypercube. The cable that came with our iPSC/860 was badly damaged. One connector had partially broken off, and the cable was nearly cut in half near the middle. After cutting out the damaged bit, we figured out the wiring, then put a new connector on the half of the cable that still had a good connector. There are 7 shielded twisted pairs in the cable: 4 pairs for the high-speed cube interconnect, and 3 pairs for a slower synchronous serial connection for diagnostics.

After connecting the cube to the SRM with the fixed cable, we started following the diagnostic procedures outlined for verifying a system. First, it turned out that one of the 16 i860 node boards wasn’t booting at all. I swapped it with the i860 node board I received with the SRM, but that one timed out during initialization. Swapping memory and DCM modules between these two boards did not result in a working node, so we decided to continue running diagnostics with 15 nodes. It then turned out that 2 more i860 nodes were unreliable and didn’t pass diagnostics.

The way the iPSC works is that you allocate a hypercube for your calculations. Hypercube sizes are powers of two, so if you have between 8 and 15 nodes, you can run your calculations on a maximum of 8 nodes. The architecture does allow mixed-architecture hypercubes, though, so what we ended up doing was to put 8 working i860 nodes into the chassis as nodes 0-7, and replace nodes 8-15 with the iPSC/2 SX nodes (Intel 386 + Weitek numeric coprocessor) that we received with the SRM. Fortunately, all of these node boards are functioning, so with that combination of boards, we could pass initial diagnostics with the full 16 nodes. These initial diagnostics (“Node Confidence Tests”) start from the assumption that nothing works, and test all low-level functionality one bit at a time. After passing the NCT, you can run the SAT (System Acceptance Test); this is a higher-level diagnostic that mimics application behavior to stress-test the system as a whole. Since the SAT tests are single-architecture only, we ran them twice, once for an 8-node i860 hypercube, and once for an 8-node SX hypercube.
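The 8-versus-15-node limitation follows from hypercube geometry: a d-dimensional hypercube has exactly 2^d nodes. A minimal Python sketch of that arithmetic, using the textbook binary addressing scheme in which neighbouring nodes differ in one address bit (the function names are ours, and this is not taken from the iPSC software itself):

```python
def largest_cube(working_nodes):
    """Return (dimension, node_count) of the largest hypercube that fits."""
    if working_nodes < 1:
        raise ValueError("need at least one working node")
    dim = working_nodes.bit_length() - 1
    return dim, 1 << dim


def neighbours(node, dim):
    """Nodes directly linked to `node`: flip each of the `dim` address bits."""
    return [node ^ (1 << d) for d in range(dim)]


print(largest_cube(15))   # (3, 8): 15 working nodes still only give an 8-node cube
print(largest_cube(16))   # (4, 16)
print(neighbours(5, 4))   # [4, 7, 1, 13]
```

With 13 good i860 boards out of 16, the largest single-architecture cube is 8 nodes, which is why filling the other 8 slots with SX boards gets us back to a full 16-node cube.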

System V.3.2 UNIX (ipsc860)



login: camiel
Password:
UNIX System V/386 Release 3.2
ipsc860
Copyright (C) 1984, 1986, 1987, 1988 AT&T
Copyright (C) 1987, 1988 Microsoft Corp.
All Rights Reserved
Login last used: Fri Jan 15 23:31:01 1988

/         :     Disk space:  97.23 MB of 110.74 MB available (87.80%).
/usr      :     Disk space: 221.52 MB of 332.22 MB available (66.68%).

Total Disk Space: 318.75 MB of 442.96 MB available (71.96%).

TERM=vt100
$
$ su
Password:
#
# bootcube
 1.  Reset driver
 2.  Scan cardcages
 3.  Reset nodes
 4.  Scan nodes
 5.  Download 386 boot loader: /usr/ipsc/lib/bootld
 6.  Download 860 boot loader: /usr/i860/ipsc/lib/bootld
 7.  Start boot loader
 8.  Load SRM Direct Connect Module
Load SRM DCM with /usr/ipsc/lib/hbits.g
 9.  Query nodes
10.  Load 386 node Direct Connect Modules
Load 1 node DCM's with /usr/ipsc/lib/386nbits.d
Load 4 node DCM's with /usr/ipsc/lib/386nbits.e
Load 7 node DCM's with /usr/ipsc/lib/386nbits.f
11.  Load 860 node Direct Connect Modules
Load 8 node DCM's with /usr/ipsc/lib/860nbits.f
12.  Initialize node Direct Connect Modules
13.  Test nodes
14.  Reset Direct Connect Modules
15.  Check configuration file
16.  Execute startup run file: /usr/ipsc/lib/rc1  -Q /usr/ipsc/conf/cubeconf
17.  Send load command to nodes
18.  Get boot partition
19.  Download /usr/ipsc/lib/nx.b
20.  Download /usr/i860/ipsc/lib/nx.b
21.  Release boot partition
22.  Execute startup run file: /usr/ipsc/lib/rc2   -Q /usr/ipsc/conf/cubeconf
No drives available
Non-CFS initialization complete
#
# getcube -t rx
getcube successful: cube type 8m16rxn0 allocated
#
# sat

iSAT - iPSC System Acceptance Test, v1.0
Probing hardware configuration ...
        Detected RX nodes with: 
    no extras

Reading test configuration file /usr/ipsc/diag/satbin/satconf.rx ...
        5 tests listed in configuration file.

Main Menu
   0. Return to UNIX
   1. Show Help
   2. Manage Test Configuration
   3. Manage Log File
   4. Enter Shell
   5. Run Tests

Enter Selection [5] -> 5

System Acceptance Tests
   0. Return to Main Menu
   1. Help
   2. Msgsize Test
   3. CFT-ncft Test (disabled)
   4. Async Test
   5. Rand-ih Test
   6. 3d-fftrx Test
   7. Run All Tests (approx. 110 minutes)

Enter Selection [7] -> 7

Running System Acceptance Tests

How many cycles? (0 for continuous; q for menu) [1] -> 
Starting logging for this run...Done

 -@- 01/16/88 03:03:08  Starting SAT cycle 1 of 1
 -@- 01/16/88 03:03:08  Executing Msgsize
1/16/88 03:03:09 NODE: 8 PID: 64 : Start MSGSIZE.H test.
NODE: 8 PID: 64 : Message size = 33000, cycles = 100
1/16/88 03:03:13 NODE: 8 PID: 64 : On cycle 1, msg size 0
 -@- msgsize: PASSED -@- 
 -@- 01/16/88 03:33:11  Test done, 0 errors.
 -@- 01/16/88 03:33:12  Executing Async
ASYNC: Asynchronous Message Passing test
ASYNC: using async.rx with 100000 byte messages.

 -@- ASYNC async.rx: PASSED -@- 
 -@- 01/16/88 04:13:15  Test done, 0 errors.
 -@- 01/16/88 04:13:16  Executing Rand-ih
Message 0, seed 00000001, Sat Jan 16 04:13:18 1988
RAND: Random Message Exchange test, using ihrxnode.rx
 -@- RAND ihrxnode.rx: PASSED -@- 
 -@- 01/16/88 04:43:19  Test done, 0 errors.
 -@- 01/16/88 04:43:20  Executing 3d-fftrx
 Elements =      2097152 MFLOPS =    7.685658     Total time =   28.64900    
 Elements =       262144 MFLOPS =    6.376055     Total time =   3.700000    
 Elements =      1048576 MFLOPS =    7.224106     Total time =   14.51400    
 Elements =       524288 MFLOPS =    6.776987     Total time =   7.349000    
 Elements =       524288 MFLOPS =    6.797335     Total time =   7.327000    
 Elements =      2097152 MFLOPS =    7.612848     Total time =   28.92300    
 Elements =      2097152 MFLOPS =    7.465212     Total time =   29.49500    
 Elements =      2097152 MFLOPS =    7.571488     Total time =   29.08100    
 Elements =      2097152 MFLOPS =    7.567844     Total time =   29.09500    
 Elements =      1048576 MFLOPS =    6.703150     Total time =   15.64200    
 Elements =      1048576 MFLOPS =    6.865550     Total time =   15.27200    
 Elements =      1048576 MFLOPS =    6.920837     Total time =   15.15000    
 Elements =       262144 MFLOPS =    5.570579     Total time =   4.235000    
 Elements =       262144 MFLOPS =    5.745592     Total time =   4.106000    
 Elements =       262144 MFLOPS =    5.770891     Total time =   4.088000    
          Name        Hardware  MaxTime RunTime Errors Comment
   1. Msgsize             HOST     30      30      0   host-node comm
   2. CFT-ncft             CFS disabled     0      0   node-ionode general
   3. Async               NODE     40      40      0   node-to-node comm
   4. Rand-ih             NODE     30      30      0   random isends, hrecvs
   5. 3d-fftrx         860NODE     10      10      0   numerics and comm
        Totals:                   110     110      0

 -@-    SAT run finished, 0 errors seen in this run.
#

We also compiled some of the code examples and ran them on the system.

One thing that isn’t working yet is the Concurrent File System (CFS). The hard disks in the iPSC/860 haven’t survived, and CFS doesn’t like the SCSI2SD cards that I tried, even after making some changes to the SCSI2SD card firmware to mimic the correct disk geometry. Work on getting CFS working remains to be done.

In the current setup, there are a total of 21 processors: 8 i860 and 8 i386 (+i387 + Weitek) processors on the node boards, 1 i386 (+i387) processor in the SRM, 3 i386 (+i387) processors on the SCSI I/O nodes, and 1 i386 (+i387) processor on a “service” I/O node. The I/O nodes aren’t part of the hypercube, but each of them is linked to one of the hypercube nodes, using the same interconnect that is used between hypercube nodes. The SCSI I/O nodes each have a SCSI bus for the CFS (Concurrent File System). The “service” I/O node can be used to run UNIX commands (e.g. to back up the CFS), so you don’t have to take compute resources away from the hypercube to do so.
