If a problem lies outside a large language model's training distribution and requires more context than the model's maximum window, that problem is fundamentally unsolvable by LLMs. What follows is a formal deductive argument to that effect. I hope its validity is clear, though I assert neither its soundness nor its practical utility.
The argument runs like this:
- Every (implementable) LLM has a finite context window of length $N$.[1]
- For any LLM to solve a problem, the minimal representation of that problem[2] must fit within the context window.
- For every minimal representation of length $N$, there is another problem whose minimal representation is longer than $N$ (see the sketch after this list).
- Hence, for every implementable LLM there is at least one problem whose minimal representation does not fit in its context window.
- So, for any implementable LLM there exist problems it cannot solve.
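To make the third premise concrete, here is a minimal Python sketch. It is my own illustration, not part of the argument: given any assumed window length $N$, it builds a parity task over $N + 1$ independent random bits. Because any single bit can flip the answer, no faithful representation shorter than $N + 1$ bits exists, so the task's minimal representation always exceeds the window. The function name and the chosen window size are illustrative only.

```python
import random


def build_oversized_problem(context_window: int) -> tuple[list[int], int]:
    """Return (bits, answer), where answer is the parity of the bits.

    Each bit is independent and flips the answer when changed, so every
    bit is necessary: the problem's minimal representation spans
    context_window + 1 bits and therefore cannot fit in the window.
    """
    bits = [random.randint(0, 1) for _ in range(context_window + 1)]
    answer = sum(bits) % 2  # parity over all context_window + 1 bits
    return bits, answer


if __name__ == "__main__":
    # Hypothetical window size; any finite N admits such a construction.
    N = 8
    bits, answer = build_oversized_problem(N)
    print(f"{len(bits)} bits (> window of {N}): parity = {answer}")
```

Parity is chosen only because it makes the "every bit matters" property obvious; any task whose answer depends irreducibly on more than $N$ symbols would serve the same role.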
1. I do not attempt to prove here that every implementable LLM has a finite context window, but this is a plausible assumption given computational and physical constraints.
2. By "solving" I mean producing the desired output given some minimal representation or set of representations. For "problem" we might substitute "minimal problem representation": the minimal representation of one of the problem's steps, which must fit within the LLM's context window for the LLM to infer the solution.