python - What should be the outcome of stemming a word with an apostrophe?
I'm using nltk.stem.porter.PorterStemmer in Python to get the stems of words.
When I get the stems of "women" and "women's", I get different results, respectively: "women" and "women'". For my purposes, I need both words to have the same stem.
In my line of thought, both words refer to the same idea/concept and are pretty much the same word suffering a transformation, so they should have the same stem.
Why am I getting two different results? Is this correct?
It's necessary to tokenize the text before lemmatizing.
Without tokenization:
>>> from nltk import word_tokenize
>>> from nltk.stem import WordNetLemmatizer
>>> wnl = WordNetLemmatizer()
>>> [wnl.lemmatize(i) for i in "the woman's going home".split()]
['the', "woman's", 'going', 'home']
>>> [wnl.lemmatize(i) for i in "the women's home is in london".split()]
['the', "women's", 'home', 'is', 'in', 'london']
With tokenization:
>>> [wnl.lemmatize(i) for i in word_tokenize("the woman's going home")]
['the', 'woman', "'s", 'going', 'home']
>>> [wnl.lemmatize(i) for i in word_tokenize("the women's home is in london")]
['the', u'woman', "'s", 'home', 'is', 'in', 'london']
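If a full tokenizer isn't available, a rough alternative is to strip the possessive ending yourself before stemming. The sketch below is only an illustration and uses just the standard library: strip_possessive and normalize are hypothetical helper names, not part of NLTK, and the regular expression assumes that a trailing 's or a bare trailing apostrophe always marks a possessive.

```python
import re

# Trailing possessive marker: "'s" or a bare "'" (straight or curly apostrophe).
# Assumption: any such ending is a possessive, which is not always true.
_POSSESSIVE = re.compile(r"['\u2019]s?$", re.IGNORECASE)


def strip_possessive(token):
    """Return the token without a trailing 's or ' (hypothetical helper)."""
    return _POSSESSIVE.sub("", token)


def normalize(tokens, stem=lambda word: word):
    """Strip possessive endings, then apply the given stemming callable."""
    return [stem(strip_possessive(token)) for token in tokens]


print(normalize("the women's home is in london".split()))
# ['the', 'women', 'home', 'is', 'in', 'london']
```

You could pass PorterStemmer().stem as the stem argument so that "women" and "women's" go through the stemmer in the same form. Note that this also trims contractions such as "it's" to "it", so tokenizing properly is still the safer route.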