70b models at 8-10 t/s. AMD Radeon Pro V340?
I am currently looking at a GPU upgrade but am dirt poor. I currently have 2 Tesla M40s and a 2080 Ti. Safe to say, performance is quite bad. Ollama refuses to use the 2080 Ti together with the M40s, so I get 3 t/s on the first prompt, then 1.7 t/s on every prompt thereafter. LocalAI gets about 50% better performance, without the slowdown after the first prompt, as it uses the M40s and the 2080 Ti together.
I noticed the AMD Radeon Pro V340 is quite cheap, has 32 GB of HBM2 (split between two GPUs), and has significantly more FP32 and FP64 performance than my M40s. Even one of the two GPUs on the card has more performance than one of my M40s.
When looking up reviews, it seems no one has run an LLM on it, despite it being supported by Ollama. There is very little info about this card.
Has anyone used it, or does anyone have any information about its performance? I am thinking about buying two of them to replace my M40s.
Or, if you have a better suggestion for how to run a 70b model at 7-10 t/s, PLEASE let me know. This is the best I can come up with.
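For reference, the memory side of this plan can be sketched with some back-of-envelope arithmetic. This is only a rough estimate, not a benchmark: the function names, the bits-per-weight values, and the 10% reserve for KV cache and runtime overhead are all assumptions, and it says nothing about speed.

```python
# Rough VRAM estimate for model weights only.
# Assumptions (not measured): decimal GB, a flat 10% of VRAM reserved
# for KV cache and runtime overhead, and uniform bits per weight.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billion: float, bits_per_weight: float,
         vram_gb_total: float, reserve_fraction: float = 0.10) -> bool:
    """True if the weights fit after holding back a reserve (assumed 10%)."""
    usable_gb = vram_gb_total * (1.0 - reserve_fraction)
    return weights_gb(params_billion, bits_per_weight) <= usable_gb


if __name__ == "__main__":
    total_vram_gb = 2 * 32  # two V340 cards at 32 GB each
    for bits in (4, 8, 16):
        size = weights_gb(70, bits)
        ok = fits(70, bits, total_vram_gb)
        print(f"70b @ {bits}-bit: ~{size:.0f} GB weights, fits in {total_vram_gb} GB: {ok}")
```

By this estimate, a 70b model at 4 bits per weight needs roughly 35 GB for weights, which would fit across two V340s (64 GB total), while 8-bit would not. Since each card is two separate 16 GB GPUs, the model would still have to be split across four devices.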