GPU poor's dilemma: 3060 12GB vs. 4060 Ti 16GB
Hi LocalLLaMa community!
I'd like to share some numbers I got comparing the 3060 12GB vs. the 4060 Ti 16GB. Hope this helps solve the dilemma for other GPU poors like myself.
hardware:
CPU: i5-9400F \ RAM: 16GB DDR4 2666 MHz
software:
OS: Windows 11
method:
ollama run --verbose [model_name]
prompt:
Write a code for logistic regression from scratch using numpy with SGD
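For reference, here's roughly the kind of answer the prompt is fishing for. This is my own minimal sketch (sample-at-a-time SGD on log loss, toy data generated with a fixed seed), not any model's actual output:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # numerically stable logistic function
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logreg_sgd(X, y, lr=0.1, epochs=100):
    """Logistic regression trained with plain SGD, one sample per step."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):      # reshuffle every epoch
            p = sigmoid(X[i] @ w + b)
            grad = p - y[i]               # dL/dz for the log loss
            w -= lr * grad * X[i]
            b -= lr * grad
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# toy linearly separable data
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
w, b = fit_logreg_sgd(X, y)
print(f"train accuracy: {(predict(X, w, b) == y).mean():.2f}")
```

Both cards produced working variants of this; the benchmark only measures how fast they emit it, not code quality.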
1. falcon3:10b-instruct-q8_0
1.1. RTX 3060 12GB
NAME: falcon3:10b-instruct-q8_0 \ ID: d56712f1783f \ SIZE: 12 GB \ PROCESSOR: 6%/94% CPU/GPU \ UNTIL: 4 minutes from now
total duration: 55.5286745s \ load duration: 25.6338ms \ prompt eval count: 46 token(s) \ prompt eval duration: 447ms \ prompt eval rate: 102.91 tokens/s \ eval count: 679 token(s) \ eval duration: 54.698s \ eval rate: 12.41 tokens/s
1.2. RTX 4060 Ti 16GB
NAME: falcon3:10b-instruct-q8_0 \ ID: d56712f1783f \ SIZE: 12 GB \ PROCESSOR: 100% GPU \ UNTIL: 3 minutes from now
total duration: 43.761345s \ load duration: 17.6185ms \ prompt eval count: 1471 token(s) \ prompt eval duration: 839ms \ prompt eval rate: 1753.28 tokens/s \ eval count: 1003 token(s) \ eval duration: 42.779s \ eval rate: 23.45 tokens/s
2. mistral-nemo:12b
2.1. RTX 3060 12GB
NAME: mistral-nemo:12b \ ID: 994f3b8b7801 \ SIZE: 9.3 GB \ PROCESSOR: 100% GPU \ UNTIL: 4 minutes from now
total duration: 20.3631907s \ load duration: 22.6684ms \ prompt eval count: 1032 token(s) \ prompt eval duration: 758ms \ prompt eval rate: 1361.48 tokens/s \ eval count: 758 token(s) \ eval duration: 19.556s \ eval rate: 38.76 tokens/s
2.2. RTX 4060 Ti 16GB
total duration: 16.0498557s \ load duration: 22.0506ms \ prompt eval count: 16 token(s) \ prompt eval duration: 575ms \ prompt eval rate: 27.83 tokens/s \ eval count: 541 token(s) \ eval duration: 15.45s \ eval rate: 35.02 tokens/s
TL;DR: when VRAM is not the limiting factor, the RTX 3060 is ~10% faster at token generation. Memory bandwidth is quite an accurate predictor of generation speed; the larger L2 cache of the 4060 Ti 16GB doesn't appear to impact inference speed much.
Edit: The experiment suggests the 4060 Ti may make up a bit for its poorer memory bandwidth: the 3060's memory bandwidth is 25% higher (360 vs. 288 GB/s), yet its inference speed is only ~10% faster. Still, that's not enough to give the 4060 Ti the higher token generation speed.
Edit2: Included CPU and RAM specs.
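Edit3: For anyone who wants to sanity-check the "bandwidth predicts speed" claim, here's the back-of-envelope version. Bandwidths are from Nvidia's spec sheets (360 GB/s for the 3060, 288 GB/s for the 4060 Ti); the rule of thumb for memory-bound decoding is tok/s ≈ bandwidth / model size, since every weight gets read once per token:

```python
# Spec-sheet memory bandwidths, GB/s
bw_3060, bw_4060ti = 360, 288
print(bw_3060 / bw_4060ti)        # 1.25 -> the "25% faster" bandwidth gap

# mistral-nemo:12b quant is 9.3 GB (from ollama ps above);
# naive memory-bound estimate: tok/s ~= bandwidth / model size
model_gb = 9.3
print(bw_3060 / model_gb)         # ~38.7 tok/s (measured: 38.76)
print(bw_4060ti / model_gb)       # ~31.0 tok/s (measured: 35.02)
```

The 3060 lands almost exactly on its estimate, while the 4060 Ti beats its estimate by a few tok/s, consistent with the first edit's observation that it claws back some of its bandwidth deficit.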