Answer generation is very fast, usually within about 2 seconds. The answers are short and concise, and each one cites the document it was drawn from, which I appreciated. However, the app tends to fixate on the best answer it can find within a single source while ignoring another source that actually contains a better, correct answer.
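That single-source fixation is consistent with how plain top-k chunk retrieval behaves in a retrieval-augmented setup (which is how Chat With RTX answers questions over local files): if one document yields several high-similarity chunks, it can crowd every other document out of the context window. Here is a minimal sketch of that failure mode and one common mitigation (a per-document cap); the file names and scores are entirely hypothetical, and this is not Nvidia's actual pipeline.

```python
from collections import defaultdict

# (document, chunk_id, similarity_to_query) -- toy, hypothetical scores
scored_chunks = [
    ("manual_a.pdf", 0, 0.91),
    ("manual_a.pdf", 1, 0.89),
    ("manual_a.pdf", 2, 0.88),
    ("manual_b.pdf", 0, 0.87),  # the "better" source barely misses the cut
    ("manual_a.pdf", 3, 0.86),
]

def top_k(chunks, k=3):
    """Plain top-k: highest-scoring chunks win, regardless of source."""
    return sorted(chunks, key=lambda c: c[2], reverse=True)[:k]

def top_k_diverse(chunks, k=3, per_doc_cap=2):
    """Top-k with a per-document cap, so no single source dominates."""
    picked, counts = [], defaultdict(int)
    for doc, cid, score in sorted(chunks, key=lambda c: c[2], reverse=True):
        if counts[doc] < per_doc_cap:
            picked.append((doc, cid, score))
            counts[doc] += 1
        if len(picked) == k:
            break
    return picked

print(top_k(scored_chunks))          # all three chunks come from manual_a.pdf
print(top_k_diverse(scored_chunks))  # manual_b.pdf now makes it into context
```

Techniques like the per-document cap above, or maximal marginal relevance, are standard ways to diversify retrieved context; whether Chat With RTX uses anything like them is unknown.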

Source: Nvidia’s “Chat With RTX” is a ChatGPT-style app that runs on your own GPU

It seems likely that running an LLM on local hardware will become common. I’m curious to see where this goes.

No wonder Nvidia’s valuation is through the roof right now.