python - How to find words that are not the same between two text files
I have two text documents that contain mostly the same words, with a few exceptions. How can I find the words in document2 that are not anywhere in document1 and print them out? For example:
document1: "hello there how you"
document2: "hi how today john"
desired output: "hi today john"
Edit: I want to print the words that are present in document2 but not found anywhere in document1. I don't want to print the words that are the same between them.
I created this code, which I think finds the matches between the two text files, which is not what I want to do:
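    # Raw strings keep the backslashes in the Windows paths literal
    doc1 = open(r"k:\system files\desktop\document1.txt", "r+")
    doc2 = open(r"k:\system files\desktop\document2.txt", "r+")
    list1 = []
    list2 = []
    for i in doc1:
        # removes new line after each word
        i = i[:-1]
        list1.append(i)
    for i in doc2:
        i = i[:-1]
        list2.append(i)
    # prints every word that appears in both lists
    for i in list1:
        for j in list2:
            if i == j:
                print(i)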
If you are not worried about the order of the words, you can use sets to accomplish this as follows:
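    import re

    def get_words(filename):
        # Read the whole file and return its lowercased words as a set
        with open(filename, 'r') as f_input:
            return set(w.lower() for w in re.findall(r'(\w+)', f_input.read()))

    words1 = get_words('document1.txt')
    words2 = get_words('document2.txt')

    # Words in document2 that never appear in document1
    # (Python 3 prints the result as {...} rather than set([...]))
    print(words2 - words1)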
This will display:
set(['john', 'hi', 'today'])
Using the - operator on two sets has the effect of giving the difference between the two sets: the elements of the first set that are not in the second.
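If the order of the words does matter, one possible variant keeps a set of the words from document1 for fast lookups and walks through document2 in order. This is only a sketch: the helper name `words_only_in_second` is made up for illustration, and it takes the text as strings instead of reading files, using the example sentences from the question.

```python
import re

def words_only_in_second(text1, text2):
    """Return the words of text2 that never occur in text1, in first-seen order."""
    # Hypothetical helper, not part of the original answer.
    seen = set(w.lower() for w in re.findall(r'\w+', text1))
    result = []
    for w in re.findall(r'\w+', text2):
        w = w.lower()
        if w not in seen:
            result.append(w)
            seen.add(w)  # so a repeated word is reported only once
    return result

print(words_only_in_second("hello there how you", "hi how today john"))
# ['hi', 'today', 'john']
```

A list is used for the result because, unlike a set, it keeps the words in the order they appear in document2.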