
> the token with id 24912 for gpt-4o to be precise

How do you find this out?



GPT-4o uses BPE (byte pair encoding). OpenAI released `tiktoken`, a Python library that lets you encode strings into token ids (and so count tokens).

    $ pip install tiktoken
    >>> import tiktoken
    >>> encoding = tiktoken.encoding_for_model("gpt-4o")
    >>> print(encoding.encode("hello marcellus"))
    [24912, 2674, 10936, 385]
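
To see which piece of text each id stands for, you can decode the ids one at a time. A rough sketch, not anything official: `token_table` and `demo` are names I made up, and the only tiktoken calls it relies on are `encode` and `decode_single_token_bytes`.

```python
def token_table(encoding, text):
    """Return (token_id, token_bytes) pairs for `text`.

    `encoding` is assumed to be a tiktoken Encoding, or any object
    that exposes encode() and decode_single_token_bytes().
    """
    return [
        (token_id, encoding.decode_single_token_bytes(token_id))
        for token_id in encoding.encode(text)
    ]


def demo():
    # Imported here so token_table itself does not require tiktoken.
    import tiktoken

    encoding = tiktoken.encoding_for_model("gpt-4o")
    for token_id, token_bytes in token_table(encoding, "hello marcellus"):
        print(token_id, token_bytes)
```

Calling `demo()` prints one line per token, so you can read off which id belongs to which chunk of the string. Tokens are bytes rather than str because a single token is not always valid UTF-8 on its own.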


OpenAI provides a tokenizer tool: https://platform.openai.com/tokenizer



