That fundamentally misunderstands the problem in multiple ways:
* this is still during lexing, not yet to parsing
* there are multiple valid token sequences that vary only with a single character at the start of the file. This is very common with Python multi-line strings in particular, since they are widely used as docstrings.
An error-cost-minimizing dynamic programming parser could do this.