Screen scraping - Logging into a website with multiple pages using Python (urllib2 and cookielib)
I am writing a script to retrieve transaction information from my bank's home banking website for use in a personal mobile application.
The website is laid out like so:
https://homebanking.purduefed.com/onlinebanking/login.aspx
-> enter username -> submit form ->
https://homebanking.purduefed.com/onlinebanking/aop/password.aspx
-> enter password -> submit form ->
https://homebanking.purduefed.com/onlinebanking/accountsummary.aspx
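For reference, the same three-page flow can be sketched with Python 3's urllib.request and http.cookiejar (the successors of urllib2 and cookielib). This is a minimal sketch, not a known-working client for this site: it assumes each form page should first be loaded with a GET and then POSTed back, and that each page only needs the one field it asks for (the question's code posts both fields to both pages). The helper names `build_session_opener` and `run_login_flow` are hypothetical.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode

# Page URLs from the question. The field names come from the question's own
# code and are assumptions about what each form actually expects.
LOGIN_URL = "https://homebanking.purduefed.com/onlinebanking/login.aspx"
PASSWORD_URL = "https://homebanking.purduefed.com/onlinebanking/aop/password.aspx"
ACCOUNT_URL = "https://homebanking.purduefed.com/onlinebanking/accountsummary.aspx"
STEPS = [
    (LOGIN_URL, {"loginname": "sample_username"}),
    (PASSWORD_URL, {"password": "sample_password"}),
]


def build_session_opener():
    """Create one opener backed by one CookieJar, to be reused for every page."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def run_login_flow(opener, steps, final_url):
    """For each (form_url, fields) step, GET the form page as a browser would,
    then POST the fields back to the same URL. Finally GET final_url and
    return (URL actually reached after redirects, response body)."""
    for form_url, fields in steps:
        opener.open(form_url).read()
        opener.open(form_url, urlencode(fields).encode("ascii")).read()
    response = opener.open(final_url)
    return response.geturl(), response.read()
```

In use this would be `opener, jar = build_session_opener()` followed by `run_login_flow(opener, STEPS, ACCOUNT_URL)`; if the returned URL is the login page rather than the account summary, the login did not stick.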
The problem I'm having: since there are two separate pages to make POSTs to, I first thought the problem was session information being lost. However, I use urllib2's HTTPCookieProcessor to store cookies and to make GET and POST requests to the website, and have found that this isn't the issue.
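One way to confirm that cookies really do survive from one request to the next is to inspect the CookieJar between requests. The sketch below does that against a throwaway local server (the `/login` and `/check` paths and the `SessionId` cookie are made up for the demo), using Python 3's urllib.request and http.cookiejar rather than urllib2 and cookielib.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class SessionHandler(http.server.BaseHTTPRequestHandler):
    """Toy server: /login sets a session cookie, any other path reports
    whether that cookie came back on the request."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "SessionId=abc123; Path=/")
            body = b"logged in"
        elif "SessionId=abc123" in self.headers.get("Cookie", ""):
            body = b"session ok"
        else:
            body = b"no session"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def check_cookie_round_trip():
    """Return the cookie names stored in the jar after /login, plus the
    body of a follow-up request made with the same opener."""
    server = http.server.HTTPServer(("127.0.0.1", 0), SessionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    try:
        jar = http.cookiejar.CookieJar()
        opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
        opener.open(base + "/login").read()
        names = [cookie.name for cookie in jar]
        return names, opener.open(base + "/check").read()
    finally:
        server.shutdown()
        server.server_close()
```

Printing `[c.name for c in jar]` after each step of the real script gives the same kind of evidence about which cookies the bank's pages actually set.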
My current code is:

    import urllib
    import urllib2
    import cookielib

    loginurl = 'https://homebanking.purduefed.com/onlinebanking/login.aspx'
    passwordurl = 'https://homebanking.purduefed.com/onlinebanking/aop/password.aspx'
    accturl = 'https://homebanking.purduefed.com/onlinebanking/accountsummary.aspx'

    loginname = 'sample_username'
    password = 'sample_password'

    values = {'loginname': loginname,
              'password': password}

    class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
        # Hook called on every redirect; a place to inspect or adjust cookies
        def http_error_302(self, req, fp, code, msg, headers):
            print "cookie manipulation right here"
            return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

        http_error_301 = http_error_303 = http_error_307 = http_error_302

    login_cred = urllib.urlencode(values)

    # One CookieJar shared by every request so the session cookie persists
    jar = cookielib.CookieJar()
    cookieprocessor = urllib2.HTTPCookieProcessor(jar)

    opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
    urllib2.install_opener(opener)

    user_agent = ('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5')

    # The attribute is "addheaders" (plural); assigning to "addheader" has no
    # effect. Reassigning the list replaces it, so keep the User-Agent too.
    opener.addheaders = [user_agent, ('Referer', loginurl)]
    response = opener.open(loginurl, login_cred)

    reqpage = opener.open(passwordurl)

    opener.addheaders = [user_agent, ('Referer', passwordurl)]
    response2 = opener.open(passwordurl, login_cred)

    reqpage2 = opener.open(accturl)
    content = reqpage2.read()
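The .aspx URLs suggest the site is built with ASP.NET Web Forms, whose pages normally carry hidden inputs such as `__VIEWSTATE` (and often `__EVENTVALIDATION`) that a browser posts back along with the visible fields; the code above sends only `loginname` and `password`. Whether that is what causes the redirect here is an assumption, not something the question confirms. A minimal Python 3 sketch of collecting every hidden input from a form page and merging it into the POST body (`HiddenInputParser` and `build_post_data` are hypothetical helper names):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # An input without a value attribute is posted as an empty string.
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_post_data(form_html, user_fields):
    """Merge the page's hidden fields with the user-supplied fields
    (user fields win on a name clash) and URL-encode the result as bytes."""
    parser = HiddenInputParser()
    parser.feed(form_html)
    data = dict(parser.fields)
    data.update(user_fields)
    return urlencode(data).encode("ascii")
```

In use, you would GET the form page with the same cookie-aware opener, pass its HTML to `build_post_data` together with the page's real input names (read from the page source, since `loginname` may not be the actual field name), and POST the result back to the same URL.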
Currently, the script makes it to the passwordurl page, so the username is posted correctly. However, when the POST is made to the passwordurl page, instead of going to accturl I am redirected to the login page (the same place accturl redirects to when it is opened with missing or invalid credentials).
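To see where a request actually ends up, the final URL after redirects is available from `response.geturl()`, and a redirect handler subclass can record each hop by overriding `redirect_request`. The Python 3 sketch below uses a toy local server; its `/summary` and `/login` paths and `Auth` cookie are hypothetical stand-ins for the bank's pages. It shows a request without credentials being bounced to the login page and detected as such.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects as usual but remembers each (status, target URL)."""

    def __init__(self):
        super().__init__()
        self.chain = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.chain.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def landed_on(response, path):
    """True if the final URL, after all redirects, has the given path."""
    return urllib.parse.urlsplit(response.geturl()).path == path


class ToySite(http.server.BaseHTTPRequestHandler):
    """Stand-in site: /summary bounces to /login unless an Auth cookie is sent."""

    def do_GET(self):
        if self.path == "/summary" and "Auth=1" not in self.headers.get("Cookie", ""):
            self.send_response(302)
            self.send_header("Location", "/login")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"login page" if self.path == "/login" else b"account summary"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def demo():
    """Request /summary with no credentials and report where it ended up."""
    server = http.server.HTTPServer(("127.0.0.1", 0), ToySite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        redirects = RecordingRedirectHandler()
        opener = urllib.request.build_opener(
            redirects, urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))
        response = opener.open("http://127.0.0.1:%d/summary" % server.server_port)
        return landed_on(response, "/login"), redirects.chain, response.read()
    finally:
        server.shutdown()
        server.server_close()
```

With a Python 3 port of the script, a check such as `landed_on(reqpage2, '/onlinebanking/login.aspx')` would flag the bounce, assuming the login page keeps that path; the recorded chain also shows which request triggered the redirect.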
Any thoughts or comments on how to move forward would be appreciated at this point!