Number Six is my newest AI server, built about a month ago. I have done a lot of testing and tweaking in that time, but I finally got around to doing a more thorough power-efficiency test.
The server mainly revolves around two Nvidia RTX 6000 Pro 600W cards. My initial testing showed only around a 4% loss in performance after applying a power limit of 300W per card. That effectively cut the cards' power ceiling by 50% and yielded around 43% power savings, a trade I was more than happy to make.
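For reference, a per-card limit like this is normally set with `nvidia-smi`. A minimal sketch, assuming the two cards sit at device indices 0 and 1 (the indices and the use of a startup script are assumptions, not from my exact setup notes):

```shell
# Enable persistence mode so settings survive between CUDA contexts
sudo nvidia-smi -pm 1

# Cap each card at 300 W (device indices 0 and 1 assumed)
sudo nvidia-smi -i 0 -pl 300
sudo nvidia-smi -i 1 -pl 300

# Verify the active limit and the live draw
nvidia-smi --query-gpu=index,power.limit,power.draw --format=csv
```

The limit does not persist across reboots, so it typically belongs in a systemd unit or startup script.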
After some discussion on Twitter, I decided to take some time and do more thorough testing, now that I have had a chance to tweak performance and get my desired model running well.
My daily driver is GLM Air 4.5 FP8, at least until they get around to releasing the 4.6 they promised months ago. I typically see around 95 tokens/sec when asking a simple question, and as much as 195 tokens/sec on more complex, agentic tasks.
My testing covers per-card limits of 250W, 300W, 360W, and 600W (stock).
| Per-card limit | Input tok/s | Output tok/s | Total tok/s |
|---|---|---|---|
| 250 W | 1071.01 | 525.69 | 1596.71 |
| 300 W | 1216.33 | 597.02 | 1813.35 |
| 360 W | 1263.23 | 620.04 | 1883.27 |
| 600 W (stock) | 1274.46 | 625.55 | 1900.02 |

(The 360 W run also reported a request throughput of 2.46 req/s.)
These tokens/sec numbers seem high, but the benchmark simulates a multi-user workload, which achieves considerably higher aggregate throughput than a single user making one request at a time.
Peak performance is of course at 600W, with 625.55 output tokens/second, and the lowest is at 250W, with 525.69 tokens/second. But weighing everything together, 300W is the clear winner at 597.02 tokens/second.
Things get really interesting when you look at the actual power draw, though.

The 250W configuration actually uses more energy overall: the tests take longer, and it even has higher peak spikes than 300W. Looking closely at the graph, the 250W test hits as high as 862W, whereas the 300W test peaked at 821W. Average wattage is fairly similar between these two tests.
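This is the crux of it: a lower cap stretches the runtime, so total energy spent can end up worse. A quick sketch using the measured system wattages and total throughputs, with a hypothetical fixed workload of one million tokens (the workload size is an assumption for illustration):

```python
# Energy to serve a fixed token budget: a lower power cap means a
# longer runtime, which can cost more total energy despite the cap.
TOKENS = 1_000_000  # hypothetical fixed workload

for name, tok_s, watts in [("250W", 1596.71, 814), ("300W", 1813.35, 816)]:
    seconds = TOKENS / tok_s          # how long the workload takes
    wh = watts * seconds / 3600       # watt-hours consumed
    print(f"{name}: {seconds:.0f} s, {wh:.1f} Wh")
```

At near-identical average wattage, the 300W run simply finishes sooner, so it spends fewer watt-hours on the same work.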
Performance & Efficiency Comparison
| Per-card limit | System power (measured) | Total tok/s | % of max throughput | Output tok/s | Median TTFT | Median ITL | Tokens per Watt | Efficiency vs 600W |
|---|---|---|---|---|---|---|---|---|
| 250 W | 814 W | 1 597 | 84.0 % | 526 | 229.8 s | 20.68 ms | 1.963 | +27 % |
| 300 W | 816 W | 1 813 | 95.4 % | 597 | 201.9 s | 17.79 ms | 2.223 | +44 % |
| 360 W | 990 W | 1 883 | 99.1 % | 620 | 195.5 s | 17.33 ms | 1.902 | +23 % |
| 600 W (max) | 1 229 W | 1 900 | 100 % | 626 | 196.6 s | 17.27 ms | 1.546 | baseline |
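The tokens-per-watt and efficiency columns follow directly from the measured numbers. A short sketch reproducing them from the table above (small last-digit differences versus the table are just rounding):

```python
# Reproduce the efficiency columns from the measured data above.
runs = {
    "250W": (1596.71, 814),   # (total tok/s, measured system watts)
    "300W": (1813.35, 816),
    "360W": (1883.27, 990),
    "600W": (1900.02, 1229),  # stock limit, baseline
}

baseline_tpw = runs["600W"][0] / runs["600W"][1]  # tokens per watt at stock

for name, (tok_s, watts) in runs.items():
    tpw = tok_s / watts                       # tokens per watt for this run
    gain = (tpw / baseline_tpw - 1) * 100     # efficiency vs 600 W, in %
    print(f"{name}: {tpw:.3f} tok/W, {gain:+.0f}% vs 600 W")
```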
Summary – vs full 600 W mode
| Per-card limit | System power | Power saved vs max | Throughput loss vs max | Efficiency vs max |
|---|---|---|---|---|
| 300 W per card | 816 W | –34 % | –4.6 % | +44 % |
| 360 W per card | 990 W | –19 % | –0.9 % | +23 % |
| 250 W per card | 814 W | –34 % | –16 % | +27 % |
| 600 W per card (max) | 1 229 W | 0 % | 0 % | baseline |
In reality, though, the numbers favor 300W even more, since I was cherry-picking the peak wattage specifically. It is interesting that 360W is where you see almost no loss in performance, at 99.1% of max throughput, but the power savings there are minimal.