Thanks for the feedback. You raise great points and this was the reason why we wrote this post, so that we can hear from people where the actual problem lies.
On a related note, this sort of explains why our model is struggling to fit on 500 hours of our current dataset (even on the training set). Even so, the current state of automatic translation for Indian Sign Language is that, in-the-wild, even individual words cannot be detected very well. We hope that what we are building might at least improve the state-of-the-art there.
> It's more of a bad and broken transliteration that if you struggle to think about you can parse out and understand.
Can you elaborate a bit more on this. Do you think if we make a system for bad/broken transliteration and funnel it through ChatGPT, it might give meaningful results? That is ChatGPT might be able to correct for errors as it is a strong language model.
> Do you think if we make a system for bad/broken transliteration and funnel it through ChatGPT, it might give meaningful results?
No, because ChatGPT's training data has practically no way of knowing what a real sign language looks like, since there's no real written form of any sign language and ChatGPT learned its languages from writing.
Sincerely: I think it's awesome that you're taking something like this on, and even better that you're open to learning about it and correcting flawed assumptions. Others have already noted some holes in your understanding of sign, so I'll also just note that I think a solid brush up on the fundamentals of what language models are and aren't is called for—they're not linguistic fairy dust you can sprinkle on a language problem to make it fly. They're statistical machines that can predict likely results based on their training corpus, which corpus is more or less all the text on the internet.
I'm afraid I'm not in a good position to recommend beginner resources (I learned this stuff in university back before it really took off), but I've heard good things about Andrej Karpathy's YouTube channel.
I think you think it's a magic box. There's not actually such thing as a "strong language model", not in the way you're using the concept.
> We hope that what we are building might at least improve the state-of-the-art there.
Do you have any theoretical arguments for how and why it would improve it? If not, my concern is that you're just sucking the air out of the room. (Research into "throw a large language model at the problem" doesn't tend to produce any insight that could be used by other approaches, and doesn't tend to work, but it does funnel a lot of grant funding into cloud providers' pockets.)
You're mixing up cause and effect. The transformer architecture was invented for machine translation – and it's pretty good at it! (Very far from human-level, but still mostly comprehensible, and a significant improvement over the state-of-the-art at time of first publication.) But we shouldn't treat this as anything more than "special-purpose ML architecture achieves decent results".
The GPT architecture, using transformers to do iterated predictive text, is a modern version of the Markov bot. It's truly awful at translation, when "prompted" to do so. (Perhaps surprisingly so, until you step back, look at the training data, and look at the information flow: the conditional probability of the next token isn't mostly coming from the source text.)
I haven't read that paper yet, but it looks interesting. From the abstract, it looks like one of those perfectly-valid papers that laypeople think is making a stronger claim than it is. This paragraph supports that:
> Note that these models are not intended to accurately capture natural language. Rather, they illustrate how our theory can be used to study the effect of language similarity and complexity on data requirements for UMT.
It’s true that the Transformer architecture was developed for seq2seq MT, but you can get similar performance with Mamba or RWKV or other new non-Transformer architectures. It seems that what is important is having a strong general sequence-learning architecture plus tons of data.
> The GPT architecture, using transformers to do iterated predictive text, is a modern version of the Markov bot.
The Markov nature only matters if the text falls outside the context window.
> Perhaps surprisingly so, until you step back, look at the training data, and look at the information flow: the conditional probability of the next token isn't mostly coming from the source text.
I’m not sure what you’re getting at here. If it’s that you can predict the next token in many cases without looking at the source language, then that’s also true for traditional encoder-decoder architectures, so it’s not a problem unique to prompting. Or are you getting at problems arising from teacher-forcing?
Basically the question was how an LM could possibly help translation, and the answer is that it gives you a strong prior for the decoder. That’s also the basic idea in the theoretical UMT paper: you are trying to find a function from source to target language that produces a sensible distribution as defined by an LM.
On a related note, this sort of explains why our model is struggling to fit on 500 hours of our current dataset (even on the training set). Even so, the current state of automatic translation for Indian Sign Language is that, in-the-wild, even individual words cannot be detected very well. We hope that what we are building might at least improve the state-of-the-art there.
> It's more of a bad and broken transliteration that if you struggle to think about you can parse out and understand.
Can you elaborate a bit more on this. Do you think if we make a system for bad/broken transliteration and funnel it through ChatGPT, it might give meaningful results? That is ChatGPT might be able to correct for errors as it is a strong language model.