Tokenizing Words and Sentences with NLTK

Posted by WRW on February 8, 2019

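# Both tokenizers rely on NLTK's Punkt models. Download them once with
# nltk.download('punkt') (recent NLTK releases use 'punkt_tab' instead).
from nltk.tokenize import sent_tokenize, word_tokenize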

EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."

print(sent_tokenize(EXAMPLE_TEXT))

Output:

['Hello Mr. Smith, how are you doing today?', 'The weather is great, and Python is awesome.', 'The sky is pinkish-blue.', "You shouldn't eat cardboard."]
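Note that sent_tokenize keeps "Mr. Smith" together in the first sentence. For comparison, here is a rough sketch of what a naive approach might do, using only the standard library. The helper name naive_sent_split and its regular expression are my own illustration, not part of NLTK and not how sent_tokenize works internally.

```python
import re

EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."


def naive_sent_split(text):
    """Hypothetical helper: split wherever '.', '?' or '!' is followed by whitespace."""
    return re.split(r"(?<=[.?!])\s+", text)


naive_sentences = naive_sent_split(EXAMPLE_TEXT)

# Print one piece per line so the wrong split after "Mr." is easy to spot.
for piece in naive_sentences:
    print(piece)
```

The naive version yields five pieces instead of four, because it treats the period in "Mr." as the end of a sentence.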

print(word_tokenize(EXAMPLE_TEXT))

Output:

['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'The', 'weather', 'is', 'great', ',', 'and', 'Python', 'is', 'awesome', '.', 'The', 'sky', 'is', 'pinkish-blue', '.', 'You', 'should', "n't", 'eat', 'cardboard', '.']
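In the output above, punctuation becomes its own token, "Mr." and "pinkish-blue" stay whole, and "shouldn't" is split into "should" and "n't". To see why that matters, here is a small standard-library sketch of two simpler alternatives. The variable names and the regular expression are assumptions made for illustration only and do not come from NLTK.

```python
import re

EXAMPLE_TEXT = "Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard."

# Alternative 1: split on whitespace only. Punctuation stays glued to words.
whitespace_tokens = EXAMPLE_TEXT.split()

# Alternative 2: runs of word characters, or any single punctuation character.
# This breaks up "Mr.", "pinkish-blue" and "shouldn't" into several pieces.
regex_tokens = re.findall(r"\w+|[^\w\s]", EXAMPLE_TEXT)

print(whitespace_tokens)
print(regex_tokens)
```

Whitespace splitting leaves tokens such as "Smith," and "today?", while the simple regular expression cuts "pinkish-blue" into three tokens and "shouldn't" into "shouldn", "'" and "t". word_tokenize sits between the two.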