Mac, PC, & UNIX workstation comparisons
By Brent Robinson 26-Jun-98
(1) Introduction
Four types of benchmarks were performed :
• Tests of raw number-crunching power. Specifically, execution times for FIR filters of various lengths and for different precisions (doubles and longs). In addition, a tight loop exercising math library routines, "MathLib" was executed.
• Tests of typical engineering applications. Specifically, an application (App #1) to manipulate large data files and two applications (App #2 & App #3) that simulate signal processing hardware.
• Disk throughputs.
• Compilation times for App #3 using full compiler optimisation.
(2) Platforms Studied
The same benchmarks were run on 7 different platforms. These were :
• Power Mac 9500/150
Processor : 150 MHz PowerPC 604
Operating System : MacOS 8.0, L2 Cache = 512 k, Disk Cache = 7680k
Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.
• Power Mac 9600/350
Processor : 350 MHz PowerPC 604e
Operating System : MacOS 8.1, L2 Cache = 512 k, Disk Cache = 7680k
Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.
• Power Mac 9600/G3
Processor : 300 MHz PowerPC 750 ("G3"). PowerLogix upgrade card in 9600.
Operating System : MacOS 8.1, L2 Cache = 1024 k (150 MHz), Disk Cache = 7680k
Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.
• SGI ONYX
Processor : 195 MHz MIPS R10000
Operating System : IRIX Release 6.2
Compiler : SGI, Optimisation Level = -O3 (-O2 for App #3).
• SGI O2
Processor : 180 MHz MIPS R5000
Operating System : IRIX Release 6.3
Compiler : SGI, Optimisation Level = -O3 (-O2 for App #3).
• Sun Ultra 60
Processor : 300 MHz Sun UltraSparc
Operating System : SunOS 5.6
Compiler : gnu, Optimisation Level = -O3.
• Compaq DeskPro 6000
Processor : 266 MHz Intel Pentium II
Operating System : Windows 95
Compiler : CodeWarrior Professional Release 2, Optimisation Level = 3.
Generally, the maximum level of compiler optimisation was used for each platform. The one exception was App #3 on the SGI platforms, where the optimisation level had to be reduced to -O2 because -O3 produced code that resulted in errors. The same compiler (ie: CodeWarrior PR2) was used to produce the code for the Mac and PC platforms.
Execution times are in terms of "wall clock" times as reported by each computer (typically to the nearest 1/60 Sec) rather than CPU times. No other (major) tasks, ie: aside from the benchmarks, were running on any of the systems. However all systems were connected to the network with the usual set of background processes running.
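The difference between wall clock and CPU time can be sketched in C as follows. This is only an illustration, not the timing code used for these benchmarks (which is not reproduced here). The names `run_times`, `time_run`, `wall_now` and `spin` are hypothetical, and the wall clock is read with C11 `timespec_get()`, which is much finer than the 1/60 Sec ticks mentioned above.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical record of one timed run (names are illustrative only). */
typedef struct {
    double wall_secs;  /* elapsed real time, includes time spent waiting  */
    double cpu_secs;   /* processor time charged to this process only     */
} run_times;

/* Current wall-clock time in seconds, via C11 timespec_get(). */
double wall_now(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e9;
}

/* Run a benchmark function once and report both kinds of time. */
run_times time_run(void (*bench)(void))
{
    run_times r;
    double  w0, w1;
    clock_t c0, c1;

    assert(bench != NULL);
    w0 = wall_now();
    c0 = clock();
    bench();
    c1 = clock();
    w1 = wall_now();

    r.wall_secs = w1 - w0;
    r.cpu_secs  = (double)(c1 - c0) / CLOCKS_PER_SEC;
    return r;
}

/* Stand-in workload for the sketch: a short busy loop. */
void spin(void)
{
    volatile unsigned long i;
    for (i = 0; i < 1000000UL; i++) {
        /* nothing: volatile keeps the loop from being optimised away */
    }
}
```

On a lightly loaded machine the two figures should be close for a compute-bound task. They diverge when the task waits on disk or competes with other processes, which is why background load matters for wall clock results.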
(3) Results
Fig. 1 Performance (measured in Multiply-Accumulates per uSec) for FIR filters using 8-byte floating point (upper plot) and 4-byte integer arithmetic (lower plot) for various numbers of taps. The numbers of taps (ie: 2 to 256) are plotted on a log scale. The ordering of the legends reflects the ranking of performance for the 256 tap filter case.
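| Benchmark | Pwr Mac 604/150 | Pwr Mac 604e/350 | Pwr Mac G3/300 | SGI ONYX | SGI O2 | Sun Ultra 60 | Pentium 266 |
|---|---|---|---|---|---|---|---|
| MathLib (Secs) | 4.3 | 1.8 | 2.5 | 4.1 | 5.7 | 5.1 | 4.3 |
| #1 (Secs) | 15.4 | 9.0 | 5.8 | 3.4 | 6.4 | 2.9 | - |
| #2 (Secs) | 5.1 | 2.0 | 2.4 | 2.7 | 5.1 | 2.8 | 5.2 |
| #3 (inc disk) (Secs) | 7.4 | 4.0 | 3.9 | 3.2 | 5.9 | 3.3 | - |
| #3 (no disk) (Secs) | 5.2 | 2.6 | 3.1 | 2.6 | 4.5 | 2.8 | - |
| disk write (MBytes/Sec) | 1.6 | 3.7 | 5.9 | 64.0 | 22.9 | 53.3 | 1.0 |
| disk read (MBytes/Sec) | 1.3 | 2.7 | 5.9 | 51.6 | 25.0 | 38.1 | 2.5 |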
Table 1 The first 5 benchmarks are execution times in seconds, the last 2 benchmarks are disk transfer rates in MBytes/Sec. Best performance on each benchmark is highlighted in magenta, worst in blue. The #1 and #3 benchmarks could not be run on the Compaq 6000 (Pentium processor) due to big-endian/little-endian issues.
(3.1) Raw Computational Speed
Results for raw computational speeds of FIR filters are plotted in Fig. 1. Notice that throughput improves with increasing filter length, presumably due to reductions in loop overhead. In addition, different relative rankings are obtained for double precision as opposed to integer arithmetic.
It is possible for floating point to be faster than integer arithmetic due to "super-scalar" architectures - the integer unit is used for loop control while the floating point unit simultaneously performs the multiply/accumulates.
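For readers who want to try something similar, the kind of kernel being timed can be sketched as below. This is a minimal direct-form FIR filter written for illustration, not the benchmark source. The function names and the `macs_per_usec` helper are assumptions. Note also that `long` was 4 bytes on the platforms tested, whereas it is 8 bytes on many current 64-bit systems.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter, double precision.
   x must hold nout + ntaps - 1 samples; y receives nout outputs. */
void fir_double(const double *h, size_t ntaps,
                const double *x, double *y, size_t nout)
{
    size_t n, k;
    assert(ntaps > 0);
    for (n = 0; n < nout; n++) {
        double acc = 0.0;
        for (k = 0; k < ntaps; k++)
            acc += h[k] * x[n + ntaps - 1 - k];  /* one multiply-accumulate */
        y[n] = acc;
    }
}

/* Same filter using integer ("long") arithmetic. */
void fir_long(const long *h, size_t ntaps,
              const long *x, long *y, size_t nout)
{
    size_t n, k;
    assert(ntaps > 0);
    for (n = 0; n < nout; n++) {
        long acc = 0;
        for (k = 0; k < ntaps; k++)
            acc += h[k] * x[n + ntaps - 1 - k];  /* one multiply-accumulate */
        y[n] = acc;
    }
}

/* Convert a measured run time into Multiply-Accumulates per uSec.
   Each output sample costs ntaps multiply-accumulates. */
double macs_per_usec(size_t ntaps, size_t nout, double secs)
{
    assert(secs > 0.0);
    return ((double)ntaps * (double)nout) / (secs * 1.0e6);
}
```

The inner loop body is a single multiply-accumulate, so the loop counter and branch are a larger share of the work for a 2 tap filter than for a 256 tap filter. That is consistent with the loop overhead explanation offered above.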
Further raw computational speed results are presented in Table 1. The MathLib benchmark makes repeated calls to math library functions such as sin(), atan(), sqrt(), & log().
The Power Mac platforms used the Motorola LibMotoSh math libraries. These were found to be significantly faster (eg: 17 % for the MathLib benchmark and 16 % for App #2) than the Apple math libraries contained in Mac OS 8.1, yet gave identical results (to 16 significant figures). The speed advantage of LibMotoSh is even greater when compared against earlier versions of the Mac OS. For example, 91 % faster for MathLib benchmark and 31 % for App #2 when compared with the math libraries contained in Mac OS 7.5.5.
The 350 MHz PowerPC 604e is clearly the speed champion for these types of benchmarks, being fastest in every category. The 300 MHz PowerPC 750 (G3) also showed consistently high performance. Results for the other processors were mixed. For example, the Pentium II was slowest for double precision arithmetic but third fastest for integer.
(3.2) Typical engineering applications
Applications #1, #2, & #3 are compute intensive data manipulation and signal processing tasks. Results are listed in Table 1. Relative to the raw computational results, the results for these kinds of benchmarks are even more mixed. The UNIX platforms tended to have better relative performance when disk access was involved (ie: the App #1 & App #3 (inc disk) benchmarks), reflecting the importance of disk I/O sub-systems to overall performance. However, when disk overhead was eliminated (ie: the App #2 & App #3 (no disk) results) the high end Mac platforms were competitive with the high end UNIX platforms.
The App #1 & App #3 benchmarks could not be run on the Pentium II due to big-endian/little-endian incompatibilities. However, another benchmark, App #2, which is similar to App #3 (no disk), was run on all platforms. The Pentium II proved slowest in this benchmark.
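The kind of problem involved can be illustrated with a short C sketch. How Apps #1 & #3 actually read their data is not described here, so this is only an assumed example of the general issue: a 4-byte value written by a big-endian machine (PowerPC, MIPS, SPARC) is mis-read if its bytes are copied directly into an integer on a little-endian machine (Pentium II). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v >> 8) & 0x0000FF00u)  |
            (v >> 24);
}

/* Decode 4 bytes stored in big-endian order.
   Works byte by byte, so the result is the same on any host. */
uint32_t read_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] <<  8) |
            (uint32_t)p[3];
}
```

Code that reads whole structures straight from a binary file needs a fix of this sort at every multi-byte field before it gives correct results on the other byte order.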
(3.3) Disk Performance
The disk performance benchmarks are the result of fwrite() and fread() calls on 16 MByte files. They were performed in order to better understand differences in the simulation benchmarks described above. While the files should be large enough to not be cached, it is unclear how meaningful the numbers are and how the operating system affects these results. However, it seems abundantly clear that the performance of the disk I/O sub-systems on UNIX platforms far exceeds that of the Mac or Wintel platforms.
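A test of this general shape can be sketched in C as shown below. It is not the code used for Table 1. The 64 kByte transfer size, the use of C11 `timespec_get()` for wall clock time, the definition of a MByte as 1024 x 1024 bytes and all function names are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CHUNK_BYTES (64u * 1024u)  /* assumed transfer size per call */

/* Wall-clock time in seconds, via C11 timespec_get(). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1.0e9;
}

/* Transfer rate in MBytes/Sec; 0.0 if the run was too short to measure. */
double mbytes_per_sec(size_t bytes, double secs)
{
    if (secs <= 0.0)
        return 0.0;
    return ((double)bytes / (1024.0 * 1024.0)) / secs;
}

/* Write total_bytes to path with fwrite(). Returns MBytes/Sec, -1.0 on error. */
double write_rate(const char *path, size_t total_bytes)
{
    static unsigned char buf[CHUNK_BYTES];
    size_t done = 0;
    double t0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1.0;
    memset(buf, 0xA5, sizeof buf);
    t0 = wall_seconds();
    while (done < total_bytes) {
        size_t n = total_bytes - done;
        if (n > sizeof buf)
            n = sizeof buf;
        if (fwrite(buf, 1, n, fp) != n) {
            fclose(fp);
            return -1.0;
        }
        done += n;
    }
    if (fclose(fp) != 0)           /* closing flushes the C library buffer */
        return -1.0;
    return mbytes_per_sec(total_bytes, wall_seconds() - t0);
}

/* Read total_bytes from path with fread(). Returns MBytes/Sec, -1.0 on error. */
double read_rate(const char *path, size_t total_bytes)
{
    static unsigned char buf[CHUNK_BYTES];
    size_t done = 0;
    double t0;
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1.0;
    t0 = wall_seconds();
    while (done < total_bytes) {
        size_t n = total_bytes - done;
        if (n > sizeof buf)
            n = sizeof buf;
        if (fread(buf, 1, n, fp) != n) {
            fclose(fp);
            return -1.0;
        }
        done += n;
    }
    fclose(fp);
    return mbytes_per_sec(total_bytes, wall_seconds() - t0);
}
```

A call such as `write_rate("test.bin", 16u * 1024u * 1024u)` followed by `read_rate()` on the same file mirrors the 16 MByte case. Reading back a file that was just written is especially likely to be served from the operating system's cache, which is one reason to treat numbers from a test like this with caution.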
(3.4) Compilation Times
Of course, comparing compilation times between Mac and UNIX platforms is of dubious value due to the vastly different compiler technologies. However, it is clear from the limited number of compilation times reported in Table 2 that the Power Mac platforms are also competitive with the UNIX workstations for development work.
The G3 card in the 9600 was a PowerLogix "PowerForce" 250/250/1Meg. The L2 cache size and clock rates on this card can be easily varied via software. Examples of the effects of the L2 cache on the 9600/G3 compilation times are also listed in Table 2.
| Platform | CPU (MHz) | L2 Cache (MHz) | L2 Cache (kBytes) | Compile + Link (Secs) |
|---|---|---|---|---|
| Mac 9600/604e | 350 | 100 | 512 | 46.2 |
| Mac 9600/G3 | 275 | NA | 0 | 56.3 |
| Mac 9600/G3 | 275 | 138 | 512 | 41.5 |
| Mac 9600/G3 | 275 | 138 | 1024 | 36.9 |
| Mac 9600/G3 | 275 | 275 | 512 | 39.1 |
| Mac 9600/G3 | 275 | 275 | 1024 | 34.4 |
| Sun Ultra 60 | 300 | ? | ? | 47.1 |
Table 2 Compile + link times for maximum compiler optimisation. Unlike preceding sections, the Power Mac results were obtained for the CodeWarrior 8 compiler.
These results suggest that the effects of increasing the L2 cache size and/or clock rates are moderate (at least for the compilation benchmark). Doubling the cache size results in about a 13 % increase in speed, whereas doubling the cache speed results in only about a 6.7 % increase in compilation speed (each figure being the average over the two settings of the other parameter). Disabling the cache altogether (in which case the 512k of cache on the 9600 motherboard presumably takes over as the L2 cache) increases the compilation time by 63 % relative to the fastest configuration (1024k at 275 MHz).
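These percentages can be checked against the 9600/G3 rows of Table 2 with a few lines of C. The helper names are hypothetical, and a speed increase is taken here to mean the ratio of the slower time to the faster time, minus one.

```c
#include <assert.h>

/* Percentage speed increase when a task drops from t_slow to t_fast seconds. */
double speedup_percent(double t_slow, double t_fast)
{
    assert(t_fast > 0.0);
    return (t_slow / t_fast - 1.0) * 100.0;
}

/* Mean of two values. */
double mean2(double a, double b)
{
    return (a + b) / 2.0;
}

/* Effect of doubling the L2 cache size (512k -> 1024k),
   averaged over the 138 MHz and 275 MHz cache clocks. */
double cache_size_effect(void)
{
    return mean2(speedup_percent(41.5, 36.9),   /* at 138 MHz */
                 speedup_percent(39.1, 34.4));  /* at 275 MHz */
}

/* Effect of doubling the L2 cache clock (138 -> 275 MHz),
   averaged over the 512k and 1024k cache sizes. */
double cache_speed_effect(void)
{
    return mean2(speedup_percent(41.5, 39.1),   /* at 512k  */
                 speedup_percent(36.9, 34.4));  /* at 1024k */
}

/* Extra compile time with the card's L2 cache disabled,
   relative to the fastest configuration (1024k at 275 MHz). */
double cache_off_penalty(void)
{
    return speedup_percent(56.3, 34.4);
}
```

Evaluated this way, the cache size effect comes out a little over 13 %, the cache speed effect at about 6.7 %, and the penalty for disabling the cache between 63 % and 64 %.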
(4) Discussion
Experience shows that any single benchmark is not particularly useful in characterising relative compute performance. Widely different results can be obtained depending on the type of benchmark that is run.
This is why results for a variety of benchmarks are reported here. While even these do not eliminate variations due to type of compiler used, amount of RAM and cache available, disk fragmentation, version of the operating system, etc, some general trends emerge from these results:
• Raw compute power does not always predict performance on more general purpose tasks (such as the engineering applications and the compilation benchmarks). For example, the Power Mac 604e/350 did not perform particularly well on the App #1 benchmark, probably due to the heavy amount of bit twiddling and disk I/O involved with this benchmark. Conversely the Sun Ultra 60, which performed rather poorly in all the raw compute benchmarks, did much better on the engineering applications and compilation benchmarks.
• The PowerPC 604e can hold its own against the G3 (at least when both chips are operating at the maximum clock rates currently available for the respective chips). The 604e has the edge for numeric intensive tasks whereas the G3 would appear to have the edge in more general purpose tasks.
• The high end Power Macs closely approach workstation level performance, but for considerably less cost. This suggests a cost effective upgrade path for older Power Mac 9500 & 9600 platforms. 300 MHz PowerPC upgrade cards are currently available for under $1100 with prices falling rapidly. Alternatively, new Apple 300 MHz Power Mac systems (which also have the 1024k of 150 MHz L2 cache used in these tests) would be expected to provide similar performance. In fact the Apple systems would be expected to be somewhat faster due to higher system bus speeds (66 MHz compared with the 40 MHz used with the 9600/G3 upgrade card combination).
• However, the Power Macs are sadly lacking in disk performance. The 9600/G3 used a 7200 RPM Atlas II (ultra-wide) connected to an Adaptec 2940UW (ultra-wide) PCI controller. Therefore the hardware should not have presented much of a bottleneck. Rather the Mac OS appears to be the culprit (or at least the path between the C library calls and the OS). It is hoped that this situation will be remedied in OS X.
This is not intended to be a definitive study. Rather the intention is to provide a few data points for information purposes. Your interpretation of the significance of these results may differ from mine.