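[Python] An Attempt at Simulated Login with a Crawler

0x00 Preface

Recently I wanted to learn some more advanced crawler techniques, such as simulated login. The test sites used in the tutorials I found online have since added anti-crawler measures, and I am still too much of a beginner to get past them. Then I remembered that my university still runs the lousy Qiangzhi academic affairs system, which is riddled with holes and probably has no anti-crawler protection, so I decided to practice on it.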

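0x01 First look with the browser

Information gathering

Chrome's developer tools (F12) are enough here; there is no need for a packet-capture tool such as Burp Suite.

Open the site (Qiangzhi, Shandong University of Science and Technology), open the developer tools, go to Network, tick Preserve log, then log in with your own account.

Quite a lot of information shows up.

The key pieces are: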

Request URL: http://jwgl.sdust.edu.cn/Logon.do?method=logon
Request Method: POST
Origin: http://jwgl.sdust.edu.cn
Referer: http://jwgl.sdust.edu.cn/
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.69 Safari/537.36

encoded: *******************************************************
RANDOMCODE: cn3v

Together these give us the real request URL, the request method, the Origin, Referer and User-Agent headers, and the parameters sent in the POST body.
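The captured request can be written down as plain data before any crawling is done. The sketch below (Python 3, standard library only) shows what the form body would look like on the wire, assuming the form is sent as ordinary URL-encoded data, which the capture does not state explicitly. The `build_form_body` helper and the `PLACEHOLDER` value are made up for illustration, and the User-Agent is shortened.

```python
from urllib.parse import urlencode

# Values copied from the captured request above.
LOGIN_URL = "http://jwgl.sdust.edu.cn/Logon.do?method=logon"
HEADERS = {
    "Origin": "http://jwgl.sdust.edu.cn",
    "Referer": "http://jwgl.sdust.edu.cn/",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",  # shortened
}


def build_form_body(encoded, captcha):
    """Return the URL-encoded body carrying the two POST parameters."""
    return urlencode({"encoded": encoded, "RANDOMCODE": captcha})


# "PLACEHOLDER" stands in for the real encoded string.
body = build_form_body("PLACEHOLDER", "cn3v")
print(body)  # encoded=PLACEHOLDER&RANDOMCODE=cn3v
```

Note that only two fields are sent; there is no separate username or password field.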

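Getting stuck

The POST parameters were not what I expected. I assumed the form would send a username, a password and a captcha, but it actually sends only encoded and the captcha.

My guess was that some encoding scheme combines the username and password into encoded, because I could see parts of my password in it, with the password characters cutting into the string like people jumping a queue.

Logging in several more times showed that encoded changes every time, so the algorithm must depend on some dynamic reference value, or perhaps a random seed (although, from what I know about the web, a random seed seems unlikely).

So how do we get from the username and password to encoded?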

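A way out

After being stuck for a while, I looked at the page source and found a piece of code that was a pleasant surprise.

It contains the algorithm that generates encoded. As guessed, encoded is computed from the username, the password and a value called dataStr.

dataStr is generated by /Logon.do?method=logon&flag=sess

dataStr generation URL (http://jwgl.sdust.edu.cn/Logon.do?method=logon&flag=sess)

Refreshing it a few times shows that the result is different every time.

Translating the JS code into Python is then enough to compute encoded.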
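As a standalone illustration, the encoding can be written as a pure function. This is a minimal Python 3 sketch of the same logic used in the demos further down: dataStr is assumed to have the form `scode#sxh`, where each digit of `sxh` says how many characters of `scode` to insert after the corresponding character of `account%%%password`. The function name `make_encoded` and the sample dataStr are made up; real values come from the server.

```python
def make_encoded(account, password, data_str):
    """Interleave the credentials with pieces of scode, as the page's JS does."""
    scode, sxh = data_str.split("#")[:2]
    code = account + "%%%" + password
    parts = []
    for i, ch in enumerate(code):
        if i < 20:
            # Take the next n characters of scode, n being the i-th digit of sxh.
            n = int(sxh[i])
            parts.append(ch + scode[:n])
            scode = scode[n:]
        else:
            # From the 21st character on, the rest is appended unchanged.
            parts.append(code[i:])
            break
    return "".join(parts)


# Made-up dataStr for illustration only.
print(make_encoded("ab", "cd", "XYZWVUTSRQ#1201021"))  # aXbYZ%%W%cVUdT
```

The credential characters stay in order inside the output, which matches the earlier observation that parts of the password are visible in encoded.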

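0x02 First look with a crawler

Getting the captcha

The page source shows that the captcha image is fetched from this URL:

Captcha URL (http://jwgl.sdust.edu.cn/verifycode.servlet)

We can fetch the captcha from the same URL. Since I cannot yet recognize captchas automatically with good accuracy, the first version simply shows the captcha image and asks for it to be typed in by hand.

Session

Both the captcha URL and the dataStr URL return different content on every refresh.

So how do we make sure that the parameters the crawler posts match the dataStr and captcha that were generated for us?

The answer is to create a session object and send every request through it.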
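One way to see why a shared session matters is a toy server on localhost that hands out a cookie and later checks for it. The sketch below is only a stand-in under stated assumptions: it uses the Python 3 standard library (an opener backed by a `CookieJar` plays the role of `requests.session()`), and the `/token` and `/check` paths and the `sid` cookie are invented for the demo. Whether the real site ties dataStr and the captcha to a cookie in exactly this way is not confirmed here.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: /token issues a cookie, any other path reports if it came back."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/token":
            body = b"issued"
            self.send_header("Set-Cookie", "sid=demo123; Path=/")
        else:
            cookie = self.headers.get("Cookie", "")
            body = b"same session" if "sid=demo123" in cookie else b"new session"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(opener, url):
    with opener.open(url, timeout=5) as resp:
        return resp.read().decode()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_port
    no_proxy = urllib.request.ProxyHandler({})  # talk to localhost directly
    try:
        # An opener with a cookie jar remembers the cookie between requests.
        jar = http.cookiejar.CookieJar()
        with_jar = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar))
        fetch(with_jar, base + "/token")
        kept = fetch(with_jar, base + "/check")

        # Without a jar, the second request arrives as a stranger.
        plain = urllib.request.build_opener(no_proxy)
        fetch(plain, base + "/token")
        lost = fetch(plain, base + "/check")
    finally:
        server.shutdown()
        server.server_close()
    return kept, lost


if __name__ == "__main__":
    print(run_demo())
```

In the demos, `requests.session()` does the same bookkeeping automatically for the dataStr, captcha and login requests.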

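How to tell whether the login succeeded

Print the body returned by the POST and check whether it matches what the browser shows after logging in. The status_code of the POST response cannot be used to tell.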
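Checking the body can be reduced to looking for marker strings. The sketch below (Python 3) uses the two markers that the second demo in this post searches for; the function name `classify_login_response` and its return labels are made up for illustration, and the markers may change if the site's pages change.

```python
# Marker strings taken from the pages the site returns (see the second demo).
SUCCESS_MARKER = "<title>学生个人中心</title>"  # title of the student home page
BAD_CAPTCHA_MARKER = "验证码无效"  # start of the "captcha invalid" message


def classify_login_response(html):
    """Return 'ok', 'bad_captcha' or 'failed' based on the response body."""
    if SUCCESS_MARKER in html:
        return "ok"
    if BAD_CAPTCHA_MARKER in html:
        return "bad_captcha"
    return "failed"


print(classify_login_response("<html><title>学生个人中心</title></html>"))  # ok
```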

Code (demo)

# -*- coding: UTF-8 -*-
# Python 2 demo (uses raw_input): log in with a manually typed captcha.
import requests
import os
from PIL import Image


class Spider():
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.69 Safari/537.36',
            'Referer': 'http://jwgl.sdust.edu.cn/'
        }
        # One session for every request, so dataStr, captcha and login match.
        self.session = requests.session()
        print('[+] Session initialized!')
        self.userAccount = raw_input('Student ID: ')
        self.userPassword = raw_input('Password: ')
        self.dataStr = ''
        self.safeCode = ''
        self.encoded = ''

    def get_dataStr(self):
        dataStrUrl = 'http://jwgl.sdust.edu.cn/Logon.do?method=logon&flag=sess'
        self.dataStr = self.session.get(url=dataStrUrl, headers=self.headers).content
        print('[+] dataStr fetched!')

    def get_safeCode(self):
        safeCodeUrl = 'http://jwgl.sdust.edu.cn/verifycode.servlet'
        safeCodeCont = self.session.get(url=safeCodeUrl, headers=self.headers).content
        with open('safeCode.jpg', 'wb') as f:
            f.write(safeCodeCont)
        # Show the captcha image and let the user type it in.
        img = Image.open('safeCode.jpg')
        img.show()
        safeCode = raw_input('Captcha: ')
        os.remove('safeCode.jpg')
        self.safeCode = safeCode
        print('[+] Captcha fetched!')

    def get_encoded(self):
        # dataStr has the form "scode#sxh".
        scode = self.dataStr.split('#')[0]
        sxh = self.dataStr.split('#')[1]
        code = self.userAccount + '%%%' + self.userPassword
        encode = ''
        i = 0
        while i < len(code):
            if i < 20:
                # Insert the next sxh[i] characters of scode after code[i].
                encode += code[i:i + 1] + scode[0:int(sxh[i:i + 1])]
                scode = scode[int(sxh[i:i + 1]):len(scode)]
            else:
                # Past 20 characters the rest is appended unchanged.
                encode += code[i:len(code)]
                i = len(code)
            i += 1
        self.encoded = encode
        print('[+] encoded computed!')

    def login(self):
        loginUrl = 'http://jwgl.sdust.edu.cn/Logon.do?method=logon'
        self.get_dataStr()
        self.get_safeCode()
        self.get_encoded()
        login_data = {
            'encoded': self.encoded,
            'RANDOMCODE': self.safeCode
        }
        # Print the returned page to check whether the login worked.
        print(self.session.post(url=loginUrl, data=login_data, headers=self.headers).content)


if __name__ == '__main__':
    test = Spider()
    test.login()

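Here is a screenshot showing a successful login.

Code 2 (demo)

Updated on April 3 with a few small features and some optimizations, but the program is still only a prototype.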

# -*- coding: UTF-8 -*-
import requests
import webbrowser
import re
import os
import time
from PIL import Image

__Author__ = 'LiuLian'
# Tested on Python 2.7.16
# Create for fun!


class Spider():
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.69 Safari/537.36',
            'Referer': 'http://jwgl.sdust.edu.cn/'
        }
        # One session for every request, so dataStr, captcha and login match.
        self.session = requests.session()
        print('[+] Session initialized!')
        self.userAccount = raw_input('Student ID: ')
        self.userPassword = raw_input('Password: ')
        self.dataStr = ''
        self.safeCode = ''
        self.encoded = ''

    def get_dataStr(self):
        dataStrUrl = 'http://jwgl.sdust.edu.cn/Logon.do?method=logon&flag=sess'
        self.dataStr = self.session.get(url=dataStrUrl, headers=self.headers).content
        print('[+] dataStr fetched!')

    def get_safeCode(self):
        safeCodeUrl = 'http://jwgl.sdust.edu.cn/verifycode.servlet'
        safeCodeCont = self.session.get(url=safeCodeUrl, headers=self.headers).content
        with open('safeCode.jpg', 'wb') as f:
            f.write(safeCodeCont)
        # Show the captcha image and let the user type it in.
        img = Image.open('safeCode.jpg')
        img.show()
        safeCode = raw_input('Captcha: ')
        os.remove('safeCode.jpg')
        self.safeCode = safeCode
        print('[+] Captcha fetched!')

    def get_encoded(self):
        # dataStr has the form "scode#sxh".
        scode = self.dataStr.split('#')[0]
        sxh = self.dataStr.split('#')[1]
        code = self.userAccount + '%%%' + self.userPassword
        encode = ''
        i = 0
        while i < len(code):
            if i < 20:
                # Insert the next sxh[i] characters of scode after code[i].
                encode += code[i:i + 1] + scode[0:int(sxh[i:i + 1])]
                scode = scode[int(sxh[i:i + 1]):len(scode)]
            else:
                # Past 20 characters the rest is appended unchanged.
                encode += code[i:len(code)]
                i = len(code)
            i += 1
        self.encoded = encode
        print('[+] encoded computed!')

    def login(self):
        loginUrl = 'http://jwgl.sdust.edu.cn/Logon.do?method=logon'
        self.get_dataStr()
        self.get_safeCode()
        self.get_encoded()
        login_data = {
            'encoded': self.encoded,
            'RANDOMCODE': self.safeCode
        }
        while True:
            html = self.session.post(url=loginUrl, data=login_data, headers=self.headers).content
            # The marker strings below must stay in Chinese: they match the site's HTML.
            if '<title>学生个人中心</title>' in html:
                print('[+] Login succeeded! :)\nName and student ID:')
                pattern = r'<div id="Top1_divLoginName" class="Nsb_top_menu_nc" style="color: #000000;">(.+?)</div>'
                name_num = re.findall(pattern=pattern, string=html)[0].decode('utf-8')
                print(name_num)
                break
            elif '<font color="red">验证码无效,请重新登录!</font>' in html:
                print('[-] Wrong captcha :(, please run the program again...')
                time.sleep(2)
                exit(0)
            else:
                # Retries with the same form data after a short pause.
                print('[-] Login failed :(, retrying...')
                time.sleep(1)
                continue

    def get_class_schedule(self):
        "Fetch the semester's theory-course timetable and open it in the browser"
        class_schedule_url = 'http://jwgl.sdust.edu.cn/jsxsd/xskb/xskb_list.do?Ves632DSdyV=NEW_XSD_PYGL'
        html = self.session.get(url=class_schedule_url, headers=self.headers).content
        with open('schedule.html', 'wb') as f:
            f.write(html)
        webbrowser.open('schedule.html')
        # os.remove('schedule.html')

    def Teacher_evaluation(self):
        "Teacher evaluation (not implemented yet)"
        evaluation_url = 'http://jwgl.sdust.edu.cn/jsxsd/xspj/xspj_find.do?Ves632DSdyV=NEW_XSD_JXPJ'

    def Robbing_class(self):
        "Course grabbing (not implemented yet)"
        robbing_url = 'http://jwgl.sdust.edu.cn/jsxsd/xsxkRedis/xklc_list?Ves632DSdyV=NEW_XSD_PYGL'


if __name__ == '__main__':
    test = Spider()
    test.login()
    test.get_class_schedule()

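Code 3 (demo)

By chance I found some code online for recognizing the Qiangzhi captcha.

Conveniently, the Qiangzhi captcha only ever uses the characters 123zxcvbnm, which makes it much easier to get past.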

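# ocr.py
from PIL import Image
from char_lists import chars


def identify(img):
    """Recognize the 4 captcha characters by comparing against the templates."""
    identification_code_temp = []
    identification_code = [''] * 4
    diff_min = [144] * 4
    # Cut out four 13x12 glyphs, one every 10 pixels.
    for i in range(4):
        identification_code_temp.append(img.crop((i*10, 0, i*10+13, 12)).getdata())
    for char in chars:
        # Count the pixels that differ between each glyph and this template.
        diff = [0] * 4
        for i in range(4):
            for j in range(156):
                if identification_code_temp[i][j] ^ chars[char][j]:
                    diff[i] += 1
        # Keep the template with the fewest differing pixels for each position.
        for i in range(4):
            if diff[i] < diff_min[i]:
                diff_min[i] = diff[i]
                identification_code[i] = char
    return ''.join(identification_code)


def identificationCodeHandle(img):
    """Crop the captcha to the text area and convert it to 1-bit black and white."""
    rect_box = (3, 4, 46, 16)
    img = img.crop(rect_box)
    img = img.convert('1')
    return img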
#char_lists.py
chars = {
'1':[255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255],
'2':[255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255],
'3':[255, 255, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255],
'z':[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255],
'x':[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255],
'c':[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255],
'v':[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255],
'b':[255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 0, 255, 255, 255, 255, 255, 255],
'n':[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255, 0, 0, 255, 255, 255, 255],
'm':[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 255, 0, 0, 0, 255, 255, 0, 0, 0, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 0, 0, 0, 255, 255, 0, 0, 0, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0, 255, 0, 0, 255, 255, 255, 0, 0, 255, 255, 255, 0, 0],
}
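The recognition in ocr.py is nearest-template matching: each cropped glyph is compared pixel by pixel with every template, and the template with the fewest differing pixels wins. The following Python 3 sketch shows the idea on invented 3x3 templates. `TOY_TEMPLATES`, `hamming` and `best_match` are made up for illustration and are not the real 13x12 data above.

```python
def hamming(a, b):
    """Number of positions where two equally long pixel lists differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def best_match(glyph, templates, max_diff):
    """Return the template character closest to glyph, or '' if none beats max_diff."""
    best_char, best_diff = "", max_diff
    for char, template in templates.items():
        d = hamming(glyph, template)
        if d < best_diff:
            best_char, best_diff = char, d
    return best_char


# Invented 3x3 bitmaps (0 = black, 255 = white), flattened row by row.
TOY_TEMPLATES = {
    "1": [255, 0, 255,
          255, 0, 255,
          255, 0, 255],
    "n": [255, 255, 255,
          0, 0, 0,
          0, 255, 0],
    "m": [255, 255, 255,
          0, 0, 0,
          0, 0, 0],
}

noisy_n = [0, 255, 255, 0, 0, 0, 0, 255, 0]      # top-left pixel flipped
print(best_match(noisy_n, TOY_TEMPLATES, 9))      # n
smudged_n = [255, 255, 255, 0, 0, 0, 0, 0, 0]     # gap between the legs filled
print(best_match(smudged_n, TOY_TEMPLATES, 9))    # m
```

In this toy set, "n" and "m" differ by a single pixel, so one unlucky pixel is enough to flip the result. Whether the same effect explains misreads on the real captcha is only a guess.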
# -*- coding: UTF-8 -*-
import requests
import webbrowser
import re
import os
import time
from ocr import identify, identificationCodeHandle
from PIL import Image

__Author__ = 'LiuLian'
# Tested on Python 2.7.16
# Create for fun!


class Spider():
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.69 Safari/537.36',
            'Referer': 'http://jwgl.sdust.edu.cn/'
        }
        # One session for every request, so dataStr, captcha and login match.
        self.session = requests.session()
        print('[+] Session initialized!')
        self.userAccount = raw_input('Student ID: ')
        self.userPassword = raw_input('Password: ')
        self.dataStr = ''
        self.safeCode = ''
        self.encoded = ''

    def get_dataStr(self):
        dataStrUrl = 'http://jwgl.sdust.edu.cn/Logon.do?method=logon&flag=sess'
        self.dataStr = self.session.get(url=dataStrUrl, headers=self.headers).content
        print('[+] dataStr fetched!')

    def get_safeCode(self):
        safeCodeUrl = 'http://jwgl.sdust.edu.cn/verifycode.servlet'
        safeCodeCont = self.session.get(url=safeCodeUrl, headers=self.headers).content
        with open('safeCode.jpg', 'wb') as f:
            f.write(safeCodeCont)
        img = Image.open('safeCode.jpg')
        # img.show()
        # Recognize the captcha automatically instead of asking the user.
        img = identificationCodeHandle(img)
        safeCode = identify(img)
        print('Recognized captcha: ' + safeCode)
        # safeCode = raw_input('Captcha: ')
        os.remove('safeCode.jpg')
        self.safeCode = safeCode
        print('[+] Captcha fetched!')

    def get_encoded(self):
        # dataStr has the form "scode#sxh".
        scode = self.dataStr.split('#')[0]
        sxh = self.dataStr.split('#')[1]
        code = self.userAccount + '%%%' + self.userPassword
        encode = ''
        i = 0
        while i < len(code):
            if i < 20:
                # Insert the next sxh[i] characters of scode after code[i].
                encode += code[i:i + 1] + scode[0:int(sxh[i:i + 1])]
                scode = scode[int(sxh[i:i + 1]):len(scode)]
            else:
                # Past 20 characters the rest is appended unchanged.
                encode += code[i:len(code)]
                i = len(code)
            i += 1
        self.encoded = encode
        print('[+] encoded computed!')

    def login(self):
        loginUrl = 'http://jwgl.sdust.edu.cn/Logon.do?method=logon'
        self.get_dataStr()
        self.get_safeCode()
        self.get_encoded()
        login_data = {
            'encoded': self.encoded,
            'RANDOMCODE': self.safeCode
        }
        while True:
            html = self.session.post(url=loginUrl, data=login_data, headers=self.headers).content
            # The marker strings below must stay in Chinese: they match the site's HTML.
            if '<title>学生个人中心</title>' in html:
                print('[+] Login succeeded! :)\nName and student ID:')
                pattern = r'<div id="Top1_divLoginName" class="Nsb_top_menu_nc" style="color: #000000;">(.+?)</div>'
                name_num = re.findall(pattern=pattern, string=html)[0].decode('utf-8')
                print(name_num)
                break
            elif '<font color="red">验证码无效,请重新登录!</font>' in html:
                print('[-] Wrong captcha :(, please run the program again...')
                time.sleep(2)
                exit(0)
            else:
                # Retries with the same form data after a short pause.
                print('[-] Login failed :(, retrying...')
                time.sleep(1)
                continue

    def get_class_schedule(self):
        "Fetch the semester's theory-course timetable and open it in the browser"
        class_schedule_url = 'http://jwgl.sdust.edu.cn/jsxsd/xskb/xskb_list.do?Ves632DSdyV=NEW_XSD_PYGL'
        html = self.session.get(url=class_schedule_url, headers=self.headers).content
        with open('schedule.html', 'wb') as f:
            f.write(html)
        webbrowser.open('schedule.html')
        # os.remove('schedule.html')

    def Teacher_evaluation(self):
        "Teacher evaluation (not implemented yet)"
        evaluation_url = 'http://jwgl.sdust.edu.cn/jsxsd/xspj/xspj_find.do?Ves632DSdyV=NEW_XSD_JXPJ'

    def Robbing_class(self):
        "Course grabbing (not implemented yet)"
        robbing_url = 'http://jwgl.sdust.edu.cn/jsxsd/xsxkRedis/xklc_list?Ves632DSdyV=NEW_XSD_PYGL'


if __name__ == '__main__':
    test = Spider()
    test.login()
    # test.get_class_schedule()

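Testing shows that this code recognizes m and n relatively poorly and everything else well; the overall recognition success rate is still around 80%.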

0x03 Maybe have more

Once logged in, the fun can begin, hehe.

One last gripe: lousy Qiangzhi system, crawl, crawl, crawl.