The goal is to aggregate request logs that come in two different formats and rank the URLs by request count.
1. Log samples
Format 1: log line without user info
[2023-05-24 13:45:03.611] [Param] [请求方法:POST] [请求地址:/wfcm-api/ad/getImageOrVideo] [请求参数:sign=c881ac77e9a0abb03d5adf41c79145de&timestamp=1684907104&type=1] [请求ip:125.37.29.10][请求链路标识:RID_ANDROID_1661246914756759553]
Format 2: log line with user info
[2023-05-24 13:45:03.636] [Param] [请求方法:POST] [请求地址:/wfcm-api/popupWindow/get] [请求参数:token=ANDROID_18330_1683981259653_29a1c01a-6330-4f4d-901f-f369e2bddf52&sign=c881ac77e9a0abb03d5adf41c79145de&terminal=2&position=1] [用户信息:userId=18330&phone=166****631] [请求ip:125.37.29.10][请求链路标识:RID_ANDROID_1661246914853228545]
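The two formats differ only in the optional user-info field, so one possible way to parse both is a single pattern with an optional group. The following is a minimal sketch, not the script used in this post: `LINE_RE` and `parse_line` are hypothetical names, and the pattern assumes that no field contains a `]` character.

```python
import re

# Hypothetical combined pattern. The user-info field sits in an optional
# non-capturing group, so lines with and without it both match.
# Assumption: no field contains ']'.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]*)\] \[(?P<type>Param)\] \[(?P<method>[^\]]*)\] '
    r'\[(?P<url>[^\]]*)\] \[(?P<param>[^\]]*)\] '
    r'(?:\[(?P<user>[^\]]*)\] )?'
    r'\[(?P<ip>[^\]]*)\]\[(?P<requestId>[^\]]*)\]'
)


def parse_line(line):
    """Return a dict of the parsed fields, or None if the line does not match."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None
```

For a format 1 line the `user` entry of the returned dict is `None`; for a format 2 line it holds the whole user-info field.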
2. Statistics script
url_sort.py
# coding=utf-8
import re
import sys

import pandas as pd

# Format 2: log line that contains a user-info field.
paramRexHaveUser = re.compile(
    r'\[(?P<time>.*?)\] \[(?P<type>Param)\] \[(?P<method>.*?)\] \[(?P<url>.*?)\] \[(?P<param>.*?)\] \[(?P<user>.*?)\] \[(?P<ip>.*?)\]\[(?P<requestId>.*?)\]')
# Format 1: log line without the user-info field.
paramRexNotUser = re.compile(
    r'\[(?P<time>.*?)\] \[(?P<type>Param)\] \[(?P<method>.*?)\] \[(?P<url>.*?)\] \[(?P<param>.*?)\] \[(?P<ip>.*?)\]\[(?P<requestId>.*?)\]')
keys = ['time', 'type', 'method', 'url', 'param', 'user', 'ip', 'requestId']


def matchLine(line):
    # re.match returns None on failure instead of raising, so the fallback
    # to the second pattern has to be explicit (try/except never reaches it).
    return paramRexHaveUser.match(line) or paramRexNotUser.match(line)


def tryPrint(name, line):
    result = matchLine(line)
    # Format 1 has no 'user' group, so check that the group exists first.
    if result and name in result.groupdict():
        print(result.group(name))


def ceshi(line):
    # "ceshi" means "test": print every parsed field of one log line.
    for key in keys:
        tryPrint(key, line)


def printLog(filePath):
    with open(filePath, 'r') as log:
        for logLine in log:
            ceshi(logLine)


def rankByUrlCount():
    if len(sys.argv) < 2:
        print('Knowledge Classroom: enter 1 for iOS, 2 for Android, 3 for H5')
        print('Radio: enter 4')
        return
    param = sys.argv[1]
    if param == '4':
        path = '/opt/tomcat/tomcat-talkshow/logs/renren/info.log'
    else:
        path = '/opt/tomcat/tomcat-' + param + '/logs/renren/info.log'
    counts = {}
    with open(path, 'r') as log:
        for logLine in log:
            result = matchLine(logLine)
            if result:
                # Lines that match neither format are skipped.
                val = result.group('url')
                counts[val] = counts.get(val, 0) + 1
    # Column names mean "request URL" and "request count".
    df = pd.DataFrame(list(counts.items()), columns=['请求地址', '请求次数'])
    df = df.sort_values(by='请求次数', ascending=False)
    # Rebuild the index so the ranking starts at 0
    df = df.reset_index(drop=True)
    # Left-align the column headers
    # pd.set_option('colheader_justify', 'left')
    # Print all rows
    # pd.set_option('display.max_rows', None)
    # pd.set_option('display.max_rows', 20)
    print(df.head(20))


def main():
    try:
        rankByUrlCount()
    except Exception as e:
        print('Error:', e)


if __name__ == '__main__':
    main()
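If pandas is not installed on the log server, the same ranking can probably be done with `collections.Counter` from the standard library. The following is a rough sketch under stated assumptions: `URL_RE`, `rank_urls` and the `top` parameter are hypothetical names, and the simplified pattern only looks at the first four bracketed fields and assumes none of them contains `]`.

```python
import re
from collections import Counter

# Hypothetical, simplified pattern: time, the literal [Param] marker,
# the method field, then the URL field. Assumption: no field contains ']'.
URL_RE = re.compile(r'\[[^\]]*\] \[Param\] \[[^\]]*\] \[(?P<url>[^\]]*)\]')


def rank_urls(lines, top=20):
    """Count the URL field of every matching line and return the top entries."""
    counts = Counter()
    for line in lines:
        m = URL_RE.match(line)
        if m:
            counts[m.group('url')] += 1
    # most_common returns (url, count) pairs sorted by count, descending
    return counts.most_common(top)
```

Because an open file object yields lines, it can be passed directly, for example `rank_urls(open(path))`, and the result is a list of `(url, count)` tuples instead of a DataFrame.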
3. Running the script
Rank the requests in the iOS log:
python url_sort.py 1
Output:
请求地址 请求次数
0 请求地址:/wfcm-api/course/saveStudyTime20 212074
1 请求地址:/wfcm-api/course/savePlayLog 9287
2 请求地址:/wfcm-api/popupWindow/get 6960
3 请求地址:/wfcm-api/member/myCoin 6408
4 请求地址:/wfcm-api/order/courseChapterByPage 4571
5 请求地址:/wfcm-api/vip/vipModule 2149
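Each URL in the output still carries the `请求地址:` label because the `url` group captures the whole bracketed field. If only the path is wanted, the label could be stripped before counting. This is a small sketch; `strip_label` is a hypothetical helper that is not part of the script above, and it assumes the label and value are separated by an ASCII colon as in the sample logs.

```python
def strip_label(field):
    """Drop everything up to and including the first ':' of a 'label:value' field."""
    label, sep, value = field.partition(':')
    # If the field has no ':' at all, return it unchanged.
    return value if sep else field
```

It is only meant for labeled fields such as the URL; applying it to the timestamp field would cut the time at its first colon.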
This article was written by GY and is licensed under the Creative Commons Attribution 4.0 International License.
Last edited: 2023/05/24 05:49