python 2.7 - Removing Duplicate Tag Content Using BeautifulSoup
I made a script that gets every h1 tag from 76 pages of a website. In the process, the program also copies one specific line, "Current Affairs January 2015", because that line is present on every page. Can I edit the code so that it prints this line only once?
Here's the code:
    from bs4 import BeautifulSoup as bs
    import urllib

    for i in range(2, 77):
        url1 = "http://currentaffairs.gktoday.in/month/current-affairs-january-2015/" + "page/" + str(i)
        soup = bs(urllib.urlopen(url1))
        for link in soup.findAll('h1'):
            print link.string
Thanks in advance.
    from bs4 import BeautifulSoup as bs
    import urllib

    for i in range(2, 77):
        url1 = "http://currentaffairs.gktoday.in/month/current-affairs-january-2015/" + "page/" + str(i)
        soup = bs(urllib.urlopen(url1))
        ulinks = soup.findAll('h1')
        for index, item in enumerate(ulinks):
            if i == 2:
                # First page fetched: print every h1, including the repeated heading.
                print(item.string)
            elif index != 0:
                # Later pages: skip the first h1, which is the repeated heading.
                print(item.string)
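The code above drops the repeated heading by position, so it relies on that heading always being the first h1 on a page. If that is not guaranteed, one alternative is to remember which heading texts have already been printed. The following is only a minimal sketch of that idea: `unique_headings` and `sample_pages` are hypothetical names, and the sample strings stand in for the scraped h1 texts rather than real site content.

```python
def unique_headings(pages):
    """Yield each heading text once, in first-seen order.

    `pages` is an iterable of lists of heading strings, one list per page.
    """
    seen = set()
    for headings in pages:
        for text in headings:
            if text is None:
                # BeautifulSoup's .string is None when a tag has several children.
                continue
            text = text.strip()
            if text and text not in seen:
                seen.add(text)
                yield text


# Hypothetical sample data standing in for the h1 texts of two pages.
sample_pages = [
    ["Current Affairs January 2015", "Headline A", "Headline B"],
    ["Current Affairs January 2015", "Headline C"],
]
for heading in unique_headings(sample_pages):
    print(heading)
```

In the scraping loop, each page's list could be built as `[h.string for h in soup.findAll('h1')]`. Note that this approach also collapses any genuine headline that happens to appear on more than one page, which may or may not be what you want.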