python - What should be the outcome of stemming a word with an apostrophe?


I'm using nltk.stem.porter.PorterStemmer in Python to get the stems of words.

When I stem "women" and "women's" I get different results, respectively "women" and "women'". For my purposes I need both words to have the same stem.

In my line of thought, both words refer to the same idea/concept: they are pretty much the same word undergoing a transformation, so they should have the same stem.

Why am I getting two different results? Is this correct?
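One way to see why the two forms diverge: step 1a of the original Porter (1980) algorithm only removes plural endings, and it treats the apostrophe as an ordinary character, so "women's" is simply a word ending in "s". The sketch below re-implements just that one step in plain Python; it illustrates the published rule and is not NLTK's code (the name porter_step1a is hypothetical). Its output is consistent with the "women'" result reported above.

```python
def porter_step1a(word: str) -> str:
    """Apply only step 1a of the original Porter (1980) algorithm.

    Rules, tried in order (longest suffix first):
        SSES -> SS, IES -> I, SS -> SS, S -> ''
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


for w in ["women", "women's", "woman's"]:
    # The apostrophe gets no special treatment: only the final "s" is removed.
    print(w, "->", porter_step1a(w))
```

Because "women" has no plural "s" to strip and "women's" does, the two end up as "women" and "women'".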

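It's necessary to tokenize the text before lemmatizing, so that the possessive "'s" becomes a separate token instead of staying attached to the word.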
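To illustrate what tokenization contributes, here is a deliberately simplified regex tokenizer (hypothetical, standard library only) that, like NLTK's Treebank-style word_tokenize, splits clitics such as "'s" into their own tokens. It is an assumption-laden stand-in and does not reproduce word_tokenize's full rule set.

```python
import re
from typing import List

# Alternatives, tried left to right:
#   1. a clitic: apostrophe + s/re/ve/ll/d/m, ending at a word boundary
#   2. a run of word characters
#   3. any single non-space, non-word character (punctuation, stray apostrophes)
_TOKEN_RE = re.compile(r"'(?:s|re|ve|ll|d|m)\b|\w+|[^\w\s]", re.IGNORECASE)


def tokenize(text: str) -> List[str]:
    """Split text into words, clitics and punctuation (simplified sketch)."""
    return _TOKEN_RE.findall(text)


print(tokenize("the women's home is in london"))
# ['the', 'women', "'s", 'home', 'is', 'in', 'london']
```

Once "'s" is its own token, the lemmatizer (or a stemmer) only ever sees the bare word "women".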

Without tokenization:

    >>> from nltk import word_tokenize
    >>> from nltk.stem import WordNetLemmatizer
    >>> wnl = WordNetLemmatizer()
    >>> [wnl.lemmatize(i) for i in "the woman's going home".split()]
    ['the', "woman's", 'going', 'home']
    >>> [wnl.lemmatize(i) for i in "the women's home is in london".split()]
    ['the', "women's", 'home', 'is', 'in', 'london']

With tokenization:

    >>> [wnl.lemmatize(i) for i in word_tokenize("the woman's going home")]
    ['the', 'woman', "'s", 'going', 'home']
    >>> [wnl.lemmatize(i) for i in word_tokenize("the women's home is in london")]
    ['the', u'woman', "'s", 'home', 'is', 'in', 'london']
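The answer demonstrates the idea with the WordNet lemmatizer, while the question uses the Porter stemmer on isolated words. For single words rather than running text, a comparable effect can be sketched by stripping the possessive clitic before stemming. The helpers strip_possessive and normalized_stem below are hypothetical, and str.lower stands in for a real stemmer so the sketch runs without NLTK.

```python
from typing import Callable

# Straight and typographic (U+2019) apostrophes.
_APOSTROPHES = ("'", "\u2019")


def strip_possessive(word: str) -> str:
    """Drop a trailing possessive: "women's" -> "women", "girls'" -> "girls".

    Caveat: this also turns contractions such as "it's" into "it".
    """
    for apos in _APOSTROPHES:
        if word.lower().endswith(apos + "s"):
            return word[:-2]
        if word.endswith(apos):
            return word[:-1]
    return word


def normalized_stem(word: str, stem: Callable[[str], str]) -> str:
    """Strip the possessive first, then hand the bare word to any stemmer."""
    return stem(strip_possessive(word))


# str.lower is a trivial stand-in stemmer so this runs without NLTK.
for w in ["women", "women's", "Women's", "girls'"]:
    print(w, "->", normalized_stem(w, str.lower))
```

With NLTK installed, passing PorterStemmer().stem (from nltk.stem.porter) as the stem argument should give "women" and "women's" the same stem, since the stemmer then only sees "women".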
