Let’s revise Natural Language Processing — 1
--
(1) What is a token? What is tokenisation? What is n-gram tokenisation?
In natural language processing (NLP), a token is a unit of text treated as a single meaningful entity: a word, a sentence, or a smaller component such as a character or subword. Tokenisation is the process of splitting text into these individual tokens. An n-gram is a contiguous sequence of n tokens (a bigram for n = 2, a trigram for n = 3), so n-gram tokenisation turns a text into overlapping groups of n consecutive tokens, which preserves some local word order that isolated tokens lose.
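To make this concrete, here is a minimal sketch in plain Python. The regex-based word splitter is an assumption for illustration only; real pipelines use tokenisers from libraries such as NLTK, spaCy, or Hugging Face.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a simple, assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """Return all contiguous n-grams over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

text = "Tokenisation splits text into tokens"
tokens = tokenize(text)
print(tokens)            # ['tokenisation', 'splits', 'text', 'into', 'tokens']
print(ngrams(tokens, 2)) # [('tokenisation', 'splits'), ('splits', 'text'), ...]
```

Note how the bigrams overlap: each token (except the first and last) appears in two consecutive pairs, which is what lets n-gram models capture local context.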