screen scraping - Logging into website with multiple pages using Python (urllib2 and cookielib)


I am writing a script to retrieve transaction information from my bank's home banking website for use in a personal mobile application.

The website is laid out like so:

https://homebanking.purduefed.com/onlinebanking/login.aspx

-> enter username -> submit form ->

https://homebanking.purduefed.com/onlinebanking/aop/password.aspx

-> enter password -> submit form ->

https://homebanking.purduefed.com/onlinebanking/accountsummary.aspx

The problem I am having is this: since there are 2 separate pages to make POSTs to, I first thought the problem was session information being lost. However, I use urllib2's HTTPCookieProcessor to store cookies and make GET and POST requests to the website, and I have found that this isn't the issue.
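To double-check that the cookies really are being kept between requests, the jar can be inspected after each call. This is only a minimal sketch, assuming the same kind of opener setup as in my script; the helper names build_session_opener and cookie_names are purely illustrative, and the import fallback is there so it also runs on Python 3:

```python
try:                                   # Python 2, as used in this script
    import urllib2
    import cookielib
except ImportError:                    # Python 3 equivalents of the same modules
    import urllib.request as urllib2
    import http.cookiejar as cookielib


def build_session_opener():
    """Return (opener, jar); every request made through opener shares jar."""
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    return opener, jar


def cookie_names(jar):
    """List the names of the cookies currently stored in the jar."""
    return sorted(cookie.name for cookie in jar)


opener, jar = build_session_opener()
# After a call such as opener.open(loginurl), printing cookie_names(jar)
# shows whether a session cookie was set and is still present.
print(cookie_names(jar))
```

Printing the names after the username POST and again after the password POST should show whether anything is dropped along the way.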

My current code is:

import urllib
import urllib2
import cookielib

loginurl = 'https://homebanking.purduefed.com/onlinebanking/login.aspx'
passwordurl = 'https://homebanking.purduefed.com/onlinebanking/aop/password.aspx'
accturl = 'https://homebanking.purduefed.com/onlinebanking/accountsummary.aspx'

loginname = 'sample_username'
password = 'sample_password'

values = {'loginname': loginname,
          'password': password}


class MyHTTPRedirectHandler(urllib2.HTTPRedirectHandler):
    # Hook into redirects so cookies can be inspected or adjusted there.
    def http_error_302(self, req, fp, code, msg, headers):
        print "cookie manipulation right here"
        return urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, headers)

    http_error_301 = http_error_303 = http_error_307 = http_error_302


login_cred = urllib.urlencode(values)

# One cookie jar shared by every request made through the opener.
jar = cookielib.CookieJar()
cookieprocessor = urllib2.HTTPCookieProcessor(jar)

opener = urllib2.build_opener(MyHTTPRedirectHandler, cookieprocessor)
urllib2.install_opener(opener)
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5')]

# Step 1: POST the credentials to the login page.
opener.addheader = [('Referer', loginurl)]
response = opener.open(loginurl, login_cred)

# Step 2: GET the password page, then POST the credentials to it.
reqpage = opener.open(passwordurl)

opener.addheader = [('Referer', passwordurl)]
response2 = opener.open(passwordurl, login_cred)

# Step 3: GET the account summary page.
reqpage2 = opener.open(accturl)

content = reqpage2.read()

Currently, the script makes it to the passwordurl page, so the username is posted correctly. But when the POST is made to the passwordurl page, instead of going on to accturl, it is redirected to the login page (the redirect location if accturl is opened with improper credentials or none at all).
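One thing I have not ruled out: since these are .aspx pages, each form may carry hidden inputs (ASP.NET WebForms pages typically include fields such as __VIEWSTATE) that have to be posted back along with the visible ones, and my script posts the same login_cred to both pages without them. Below is a rough sketch of collecting the hidden fields from a fetched page. It assumes they are plain input tags with type="hidden"; HiddenFieldParser and hidden_fields are hypothetical helper names, and the sample HTML is made up rather than taken from the bank's site:

```python
try:                                   # Python 2
    from HTMLParser import HTMLParser
except ImportError:                    # Python 3
    from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get('type') or '').lower() == 'hidden'
        if is_hidden and attrs.get('name'):
            self.fields[attrs['name']] = attrs.get('value') or ''


def hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up page content, standing in for the result of opener.open(...).read()
sample = ('<form><input type="hidden" name="__VIEWSTATE" value="abc123" />'
          '<input type="text" name="loginname" /></form>')

form = hidden_fields(sample)            # hidden fields from the page
form['loginname'] = 'sample_username'   # plus the visible field to submit
print(sorted(form.items()))
```

The merged dict would then be passed through urllib.urlencode before each POST, with the password page parsed separately from the login page since its hidden values would presumably differ.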

Any thoughts or comments on how to move forward would be appreciated at this point!

