


Accelerate Your Mac!  - the source for performance news and reviews

Mac, PC, & UNIX workstation comparisons
By Brent Robinson 26-Jun-98

(1) Introduction

Four types of benchmarks were performed:

• Tests of raw number-crunching power: execution times for FIR filters of various lengths at two precisions (doubles and longs). In addition, a tight loop exercising math library routines ("MathLib") was executed.

• Tests of typical engineering applications. Specifically, an application (App #1) to manipulate large data files and two applications (App #2 & App #3) that simulate signal processing hardware.

• Disk throughputs.

• Compilation times for App #3 using full compiler optimisation.

(2) Platforms Studied

The same benchmarks were run on seven different platforms:

Power Mac 9500/150

Processor : 150 MHz PowerPC 604

Operating System : MacOS 8.0, L2 Cache = 512 k, Disk Cache = 7680k

Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.

Power Mac 9600/350

Processor : 350 MHz PowerPC 604e

Operating System : MacOS 8.1, L2 Cache = 512 k, Disk Cache = 7680k

Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.

Power Mac 9600/G3

Processor : 300 MHz PowerPC 750 ("G3") on a PowerLogix upgrade card in the 9600.

Operating System : MacOS 8.1, L2 Cache = 1024 k (150 MHz), Disk Cache = 7680k

Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.

SGI ONYX

Processor : 195 MHz MIPS R10000

Operating System : IRIX Release 6.2

Compiler : SGI, Optimisation Level = -O3 (-O2 for App #3).

SGI O2

Processor : 180 MHz MIPS R5000

Operating System : IRIX Release 6.3

Compiler : SGI, Optimisation Level = -O3 (-O2 for App #3).

Sun Ultra 60

Processor : 300 MHz Sun UltraSparc

Operating System : SunOS 5.6

Compiler : GNU, Optimisation Level = -O3.

Compaq DeskPro 6000

Processor : 266 MHz Intel Pentium II

Operating System : Windows 95

Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.

Generally, the maximum level of compiler optimisation was used for each platform. For App #3 on the SGI machines, the optimisation level had to be reduced to -O2 because -O3 produced code that resulted in errors. The same compiler (ie: CodeWarrior PR2) was used to produce the code for the Mac and PC platforms.

Execution times are "wall clock" times as reported by each computer (typically to the nearest 1/60 sec) rather than CPU times. No major tasks other than the benchmarks were running on any of the systems. However, all systems were connected to the network with the usual set of background processes running.
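Because times were read from a clock with 1/60 sec resolution, very short runs carry a measurable quantisation error. The sketch below is a minimal illustration, assuming a 60 Hz tick counter; the names ticks_to_seconds and tick_relative_error are hypothetical and are not taken from the benchmark code.

```c
#include <assert.h>

/* Assumed tick rate: the 1/60 sec resolution quoted above. */
#define TICKS_PER_SEC 60.0

/* Hypothetical helper: convert a 60 Hz tick count to seconds. */
double ticks_to_seconds(long ticks)
{
    assert(ticks >= 0);
    return (double)ticks / TICKS_PER_SEC;
}

/* Upper bound on the relative timing error due to clock resolution alone.
   The start and end readings are each quantised to whole ticks, so the
   measured interval is off by less than one tick (1/60 sec). */
double tick_relative_error(double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return (1.0 / TICKS_PER_SEC) / elapsed_seconds;
}
```

For the shortest execution time in Table 1 (1.8 secs, MathLib on the 604e/350), tick_relative_error(1.8) is about 0.009, so clock resolution alone contributes less than 1 % uncertainty.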

(3) Results

[Fig. 1: Graph of Results]

Fig. 1 Performance (measured in multiply-accumulates per uSec) for FIR filters using 8-byte floating point (upper plot) and 4-byte integer arithmetic (lower plot) for various numbers of taps. The number of taps (ie: 2 to 256) is plotted on a log scale. The ordering of the legends reflects the ranking of performance for the 256-tap filter case.

 

Benchmark          Pwr Mac   Pwr Mac   Pwr Mac   SGI       SGI       Sun       Pentium
                   604/150   604e/350  G3/300    ONYX      O2        Ultra 60  266
MathLib            4.3       1.8*      2.5       4.1       5.7†      5.1       4.3
App #1             15.4†     9.0       5.8       3.4       6.4       2.9*      -
App #2             5.1       2.0*      2.4       2.7       5.1       2.8       5.2†
App #3 (inc disk)  7.4†      4.0       3.9       3.2*      5.9       3.3       -
App #3 (no disk)   5.2†      2.6*      3.1       2.6*      4.5       2.8       -
Disk write         1.6       3.7       5.9       64.0*     22.9      53.3      1.0†
Disk read          1.3†      2.7       5.9       51.6*     25.0      38.1      2.5

Table 1 The first five benchmarks are execution times in seconds (lower is better); the last two are disk transfer rates in MBytes/sec (higher is better). The best result on each benchmark is marked * and the worst †. Benchmarks #1 and #3 could not be run on the Compaq DeskPro 6000 (Pentium II) due to big-endian/little-endian issues; these entries are marked -.

(3.1) Raw Computational Speed

Results for raw computational speeds of FIR filters are plotted in Fig. 1. Notice that throughput improves with increasing filter length, presumably due to reductions in loop overhead: the fixed per-output cost is spread over more multiply-accumulates. In addition, double precision and integer arithmetic produce different relative rankings.

It is possible for floating point to be faster than integer arithmetic on "super-scalar" architectures: the integer unit handles loop control while the floating point unit simultaneously performs the multiply/accumulates.
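A minimal sketch of the kind of kernel the FIR benchmark times, assuming a direct-form filter; the benchmark source is not given, so the function names and buffer layout here are hypothetical. Each output sample costs one multiply-accumulate (MAC) per tap, which is how a MACs-per-uSec figure like those in Fig. 1 follows from a run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Direct-form FIR filter in double precision (8-byte floats).
   x holds (ntaps - 1) history samples followed by n new samples, and
   y[i] = sum over k of h[k] * x[i + ntaps - 1 - k].
   Each output costs ntaps multiply-accumulates (MACs). */
void fir_double(const double *x, const double *h, double *y,
                size_t n, size_t ntaps)
{
    assert(ntaps > 0);
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        /* Loop/index arithmetic runs on the integer unit while the MACs
           run on the floating point unit, so a superscalar CPU can
           overlap the two. */
        for (size_t k = 0; k < ntaps; k++)
            acc += h[k] * x[i + ntaps - 1 - k];
        y[i] = acc;
    }
}

/* Same filter in 4-byte integer arithmetic. The article's "longs" were
   4 bytes; int32_t keeps that width on modern 64-bit compilers. Inputs
   must be scaled so the accumulator cannot overflow. */
void fir_int32(const int32_t *x, const int32_t *h, int32_t *y,
               size_t n, size_t ntaps)
{
    assert(ntaps > 0);
    for (size_t i = 0; i < n; i++) {
        int32_t acc = 0;
        /* Here loop control and the MACs both use integer resources. */
        for (size_t k = 0; k < ntaps; k++)
            acc += h[k] * x[i + ntaps - 1 - k];
        y[i] = acc;
    }
}

/* Throughput in MACs per microsecond, the unit of Fig. 1:
   n outputs times ntaps MACs each, divided by elapsed microseconds. */
double macs_per_usec(size_t n, size_t ntaps, double seconds)
{
    assert(seconds > 0.0);
    return ((double)n * (double)ntaps) / (seconds * 1e6);
}
```

For example, one million outputs of a 256-tap filter in 2.0 secs gives macs_per_usec(1000000, 256, 2.0) = 128. The per-output setup (zeroing acc, storing y[i]) is the fixed overhead that longer filters amortise.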

Further raw computational speed results are presented in Table 1. The MathLib benchmark makes repeated calls to transcendental functions in the math libraries, such as sin(), atan(), sqrt() and log().

The Power Mac platforms used the Motorola LibMotoSh math libraries. These were found to be significantly faster (eg: 17 % faster on the MathLib benchmark and 16 % faster on App #2) than the Apple math libraries in Mac OS 8.1, yet gave identical results (to 16 significant figures). The speed advantage of LibMotoSh is even greater against earlier versions of the Mac OS: for example, it was 91 % faster on the MathLib benchmark and 31 % faster on App #2 than the math libraries in Mac OS 7.5.5.

The 350 MHz PowerPC 604e is clearly the speed champion for these types of benchmarks, being fastest in every category. The 300 MHz PowerPC 750 (G3) also showed consistently high performance. Results for the other processors were mixed. For example, the Pentium II was slowest for double precision arithmetic but third fastest for integer.

 

(3.2) Typical engineering applications

Applications #1, #2 and #3 are compute-intensive data manipulation and signal processing tasks. Results are listed in Table 1. These results are even more mixed than the raw computational results. The UNIX platforms tended to have better relative performance when disk access was involved (ie: the App #1 and App #3 (inc disk) benchmarks), reflecting the importance of disk I/O subsystems to overall performance. However, when disk overhead was eliminated (ie: the App #2 and App #3 (no disk) results), the high-end Mac platforms were competitive with the high-end UNIX platforms.
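One way to quantify the disk overhead is to subtract the App #3 (no disk) time from the App #3 (inc disk) time in Table 1. This sketch assumes the two variants differ only in whether they perform disk I/O, which the text implies but does not state; the array and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* App #3 run times in seconds, copied from Table 1 (the Pentium II is
   omitted because App #3 could not be run on it). */
static const char *const platform[] = {
    "Pwr Mac 604/150", "Pwr Mac 604e/350", "Pwr Mac G3/300",
    "SGI ONYX", "SGI O2", "Sun Ultra 60"
};
static const double app3_inc_disk[] = {7.4, 4.0, 3.9, 3.2, 5.9, 3.3};
static const double app3_no_disk[]  = {5.2, 2.6, 3.1, 2.6, 4.5, 2.8};

#define N_PLATFORMS (sizeof app3_inc_disk / sizeof app3_inc_disk[0])

const char *platform_name(size_t p)
{
    assert(p < N_PLATFORMS);
    return platform[p];
}

/* Seconds attributable to disk I/O, assuming the two App #3 variants
   differ only in whether they touch the disk. */
double disk_seconds(size_t p)
{
    assert(p < N_PLATFORMS);
    return app3_inc_disk[p] - app3_no_disk[p];
}

/* Share (0..1) of the "inc disk" run spent on disk I/O. */
double disk_fraction(size_t p)
{
    return disk_seconds(p) / app3_inc_disk[p];
}
```

On these figures the disk portion is 0.8 to 2.2 secs on the Power Macs against 0.5 secs on the Sun and 0.6 secs on the ONYX. Because the 604e/350 computes quickly, disk I/O makes up about 35 % of its App #3 run, versus about 15 % on the Sun.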

The App #1 and App #3 benchmarks could not be run on the Pentium II due to big-endian/little-endian incompatibilities. However, App #2, which is similar to App #3 (no disk), was run on all platforms. The Pentium II proved slowest in this benchmark.
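To illustrate the endianness problem: PowerPC, MIPS (as used by SGI) and SPARC systems store multi-byte integers most significant byte first, while the Pentium II stores them least significant byte first, so a binary data file read straight into integers with fread() yields byte-swapped values on the other architecture. The file formats used by Apps #1 and #3 are not described; this sketch simply assumes 32-bit big-endian words and shows the usual portable fix of assembling values byte by byte. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit word stored big-endian (most significant byte first).
   Building the value from individual bytes gives the same result on
   big- and little-endian hosts. */
uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Encode v as 4 big-endian bytes; the inverse of read_be32(). */
void write_be32(unsigned char b[4], uint32_t v)
{
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
}
```

Reading the bytes {0x12, 0x34, 0x56, 0x78} gives 0x12345678 on any host, whereas copying them directly into a uint32_t on a Pentium II gives 0x78563412.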

(3.3) Disk Performance

The disk performance benchmarks time fwrite() and fread() calls on 16 MByte files. They were performed in order to better understand the differences in the simulation benchmarks described above. While the files should be large enough not to be cached (on the Macs they exceed the 7680k disk cache), it is unclear how meaningful the numbers are and how much the operating system affects them. However, it seems abundantly clear that the disk I/O subsystems on the UNIX platforms far outperform those of the Mac and Wintel platforms.
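A minimal sketch of an fwrite()/fread() throughput test in the spirit of the one described, assuming wall-clock timing via C11 timespec_get() and 1 MByte = 2^20 bytes. The chunk size, file name and function names are hypothetical choices, not the benchmark's actual parameters.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define CHUNK_BYTES (64u * 1024u)      /* hypothetical bytes per call */
#define MBYTE       (1024.0 * 1024.0)  /* assumed: 1 MByte = 2^20 bytes */

/* Wall-clock seconds via C11 timespec_get(). */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Write total_bytes to path with fwrite(); return MBytes/sec or -1.0 on
   error. fclose() is inside the timed region so stdio buffers are
   flushed, but the OS may still hold the data in its own cache. */
double write_rate(const char *path, size_t total_bytes)
{
    static unsigned char buf[CHUNK_BYTES];
    FILE *f = fopen(path, "wb");
    if (f == NULL)
        return -1.0;
    double t0 = wall_seconds();
    for (size_t done = 0; done < total_bytes; ) {
        size_t n = total_bytes - done;
        if (n > CHUNK_BYTES)
            n = CHUNK_BYTES;
        if (fwrite(buf, 1, n, f) != n) {
            fclose(f);
            return -1.0;
        }
        done += n;
    }
    if (fclose(f) != 0)
        return -1.0;
    return (double)total_bytes / MBYTE / (wall_seconds() - t0);
}

/* Read path to end of file with fread(); return MBytes/sec or -1.0. */
double read_rate(const char *path)
{
    static unsigned char buf[CHUNK_BYTES];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1.0;
    double t0 = wall_seconds();
    size_t total = 0, n;
    while ((n = fread(buf, 1, CHUNK_BYTES, f)) > 0)
        total += n;
    int failed = ferror(f);
    fclose(f);
    if (failed)
        return -1.0;
    return (double)total / MBYTE / (wall_seconds() - t0);
}
```

Calling write_rate("bench.dat", 16u * 1024u * 1024u) and then read_rate("bench.dat") covers the 16 MByte case. On a machine whose OS cache is larger than the file, the read will likely be served from memory, which is exactly the caveat raised above.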

(3.4) Compilation Times

Of course, comparing compilation times between Mac and UNIX platforms is of dubious value due to the vastly different compiler technologies. However, it is clear from the limited number of compilation times reported in Table 2 that the Power Mac platforms are also competitive with the UNIX workstations for development work.

The G3 card in the 9600 was a PowerLogix "PowerForce" 250/250/1Meg. The L2 cache size and clock rates on this card can be easily varied via software. Examples of the effects of the L2 cache on the 9600/G3 compilation times are also listed in Table 2.

Platform        CPU     L2 Cache   L2 Cache   Compile + Link
                (MHz)   (MHz)      (kBytes)   (Secs)
Mac 9600/604e   350     100        512        46.2
Mac 9600/G3     275     NA         0          56.3
Mac 9600/G3     275     138        512        41.5
Mac 9600/G3     275     138        1024       36.9
Mac 9600/G3     275     275        512        39.1
Mac 9600/G3     275     275        1024       34.4
Sun Ultra 60    300     ?          ?          47.1

Table 2 Compile + link times for App #3 at maximum compiler optimisation. Unlike in the preceding sections, the Power Mac results were obtained with the CodeWarrior 8 compiler.

These results suggest that the effects of increasing the L2 cache size and/or clock rate are moderate (at least for the compilation benchmark). Doubling the cache size from 512k to 1024k increases compilation speed by about 13 %, whereas doubling the cache clock from 138 MHz to 275 MHz increases it by only about 6.7 % (each figure averaged over the two settings of the other parameter). Disabling the card's L2 cache altogether (in which case the 512k of cache on the 9600 motherboard presumably takes over as the L2 cache) increases the compilation time by about 64 % relative to the fastest configuration (56.3 secs versus 34.4 secs).
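The percentages above can be reproduced from Table 2. This sketch assumes that "increase in speed" means the ratio of the two times minus one (speed taken as 1/time), and that each quoted figure averages over the two settings of the other cache parameter; both are inferences from the numbers rather than statements in the text. The variable and function names are hypothetical.

```c
#include <assert.h>

/* Compile + link times (secs) for the 9600/G3 at 275 MHz, from Table 2.
   Names encode L2 clock (MHz) and L2 size (kBytes). */
static const double t_no_l2    = 56.3;
static const double t_138_512  = 41.5;
static const double t_138_1024 = 36.9;
static const double t_275_512  = 39.1;
static const double t_275_1024 = 34.4;

/* Percentage speed increase when run time falls from t_old to t_new,
   taking speed as 1/time: (t_old / t_new - 1) * 100. */
double speedup_pct(double t_old, double t_new)
{
    assert(t_new > 0.0);
    return (t_old / t_new - 1.0) * 100.0;
}

/* Doubling L2 size (512k -> 1024k), averaged over both L2 clocks. */
double size_doubling_gain(void)
{
    return (speedup_pct(t_138_512, t_138_1024) +
            speedup_pct(t_275_512, t_275_1024)) / 2.0;
}

/* Doubling L2 clock (138 -> 275 MHz), averaged over both L2 sizes. */
double clock_doubling_gain(void)
{
    return (speedup_pct(t_138_512, t_275_512) +
            speedup_pct(t_138_1024, t_275_1024)) / 2.0;
}

/* Extra compile time with the card's L2 disabled, relative to the
   fastest configuration (275 MHz, 1024k). */
double no_l2_penalty(void)
{
    return speedup_pct(t_no_l2, t_275_1024);
}
```

These functions return roughly 13.1, 6.7 and 63.7 respectively.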

(4) Discussion

Experience shows that no single benchmark is particularly useful for characterising relative compute performance. Widely different results can be obtained depending on the type of benchmark that is run.

This is why results for a variety of benchmarks are reported here. While even these do not eliminate variations due to type of compiler used, amount of RAM and cache available, disk fragmentation, version of the operating system, etc, some general trends emerge from these results:

• Raw compute power does not always predict performance on more general purpose tasks (such as the engineering applications and the compilation benchmarks). For example, the Power Mac 604e/350 did not perform particularly well on the App #1 benchmark, probably due to the heavy amount of bit twiddling and disk I/O involved in this benchmark. Conversely, the Sun Ultra 60, which performed rather poorly in all the raw compute benchmarks, did much better on the engineering applications and compilation benchmarks.

• The PowerPC 604e can hold its own against the G3 (at least when both chips are operating at the maximum clock rates currently available for the respective chips). The 604e has the edge for numerically intensive tasks, whereas the G3 appears to have the edge in more general purpose tasks.

• The high-end Power Macs closely approach workstation-level performance, but at considerably lower cost. This suggests a cost-effective upgrade path for older Power Mac 9500 & 9600 platforms. 300 MHz PowerPC upgrade cards are currently available for under $1100, with prices falling rapidly. Alternatively, new Apple 300 MHz Power Mac systems (which have the same 1024k of 150 MHz L2 cache used in these tests) would be expected to provide similar performance. In fact, the Apple systems would be expected to be somewhat faster due to their higher system bus speed (66 MHz compared with the 40 MHz used with the 9600/G3 upgrade card combination).

• However, the Power Macs are sadly lacking in disk performance. The 9600/G3 used a 7200 RPM Atlas II (ultra-wide) connected to an Adaptec 2940UW (ultra-wide) PCI controller. Therefore the hardware should not have presented much of a bottleneck. Rather the Mac OS appears to be the culprit (or at least the path between the C library calls and the OS). It is hoped that this situation will be remedied in OS X.

This is not intended to be a definitive study. Rather, the intention is to provide a few data points for information purposes. Your interpretation of the significance of these results may differ from mine.



Guest Reviews are always welcome - contact news@xlr8yourmac.com.



The views/opinions expressed on this page are the author's alone
& do not necessarily represent those of the site's publishers.

All brand or product names mentioned here are properties of their respective companies.
