Hello there~ This is Minglu, a first-year master's student who still cannot forget how excited we were when we heard our professor was planning to buy an A100 GPU!
Finally, welcome, A100! It was shining when we took it out of the box.
We temporarily installed the A100 in a server we had been using for an RTX 2080 Ti.
Okay, I bet you are interested in its performance too! Let's find out by training a CNN on CIFAR-100.
The left figure shows the training time for 20 and 100 epochs. Obviously, the GPUs are much faster than the CPU.
As I expected, the A100 was faster than the 2080 Ti for 20 epochs. For 100 epochs, however, the A100 was actually slower!
To find out why, we measured the execution time of each step; the results are shown in the right figure. For the first few tens of epochs the A100 is fast, but it slows down later on.
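If you want to check this kind of slowdown on your own machine, here is a minimal sketch of per-epoch timing. It is not the exact script we used: `time_epochs`, `slowdown_ratio` and the `train_one_epoch` callback are hypothetical names, and the code assumes your training loop can be wrapped as a function that runs one epoch. Because CUDA kernels run asynchronously, the optional `sync` callback (for example `torch.cuda.synchronize` in PyTorch) makes sure the GPU work has really finished before the clock is read.

```python
import time
from statistics import mean
from typing import Callable, List, Optional


def time_epochs(
    train_one_epoch: Callable[[int], None],
    num_epochs: int,
    sync: Optional[Callable[[], None]] = None,
) -> List[float]:
    """Run train_one_epoch(epoch) num_epochs times; return wall-clock seconds per epoch."""
    durations = []
    for epoch in range(num_epochs):
        if sync is not None:
            sync()  # flush GPU work queued before this epoch so it is not counted here
        start = time.perf_counter()
        train_one_epoch(epoch)
        if sync is not None:
            sync()  # wait for this epoch's GPU kernels to finish before stopping the clock
        durations.append(time.perf_counter() - start)
    return durations


def slowdown_ratio(durations: List[float], window: int = 10) -> float:
    """Mean time of the last `window` epochs divided by the mean time of the first `window`."""
    if len(durations) < 2 * window:
        raise ValueError("need at least 2 * window epochs")
    return mean(durations[-window:]) / mean(durations[:window])
```

A ratio clearly above 1.0 means later epochs are slower than early ones, which is the pattern we saw with the A100. With PyTorch you could call `time_epochs(my_epoch_fn, 100, sync=torch.cuda.synchronize)`, where `my_epoch_fn` is your own (hypothetical) one-epoch training function.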
To our surprise, the cause was temperature 🙁 When a GPU gets too hot, it lowers its clock speeds to protect itself (thermal throttling), so it runs slower.
The RTX 2080 Ti stayed at around 26 to 40 °C, while the A100 jumped from 34 °C to over 80 °C almost immediately. (AMAZING!! 😨)
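To watch the temperature yourself during training, one option is to poll `nvidia-smi`. The sketch below assumes `nvidia-smi` is installed and on your `PATH`; the helper names are hypothetical and this is not necessarily how we measured it. It uses the `--query-gpu=temperature.gpu` field with `--format=csv,noheader,nounits`, which prints one temperature (°C) per line, one line per GPU.

```python
import subprocess
import time
from typing import List, Tuple


def parse_temperatures(csv_text: str) -> List[int]:
    """Parse nvidia-smi temperature output: one integer (deg C) per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def read_gpu_temperatures() -> List[int]:
    """Return the current core temperature of every visible GPU by calling nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails (e.g. no driver or no GPU)
    ).stdout
    return parse_temperatures(out)


def poll_temperatures(num_samples: int, interval_s: float = 5.0) -> List[Tuple[float, List[int]]]:
    """Sample all GPU temperatures num_samples times, interval_s seconds apart."""
    samples = []
    for _ in range(num_samples):
        samples.append((time.time(), read_gpu_temperatures()))
        time.sleep(interval_s)
    return samples
```

Running `poll_temperatures` in a separate terminal while training lets you line up temperature spikes with the slow epochs. If you only need a quick look, `nvidia-smi` can also repeat the query on its own with its `-l` (loop) option.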
In conclusion:
Although the A100's performance is excellent, we could not make full use of it without a good cooling environment. 🙁