This article walks through a Python script that automates Google Dork searches, covering proxy pool acquisition and filtering, user-agent rotation, SSL bypass, and multithreaded concurrency to improve search throughput while evading detection.
Automated Dorking with Google Dork SSL Bypass
Code Overview
The Python script below automates Google Dork searches. It works around SSL restrictions by routing traffic through a rotating pool of HTTP proxies with randomized user agents, and processes multiple dorks concurrently.
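The snippets that follow rely on a few third-party packages (`requests`, `beautifulsoup4`, `termcolor`); assuming a standard Python 3 environment, they can be installed with pip:

```shell
pip install requests beautifulsoup4 termcolor
```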
Proxy Acquisition and Filtering
```python
import os
import requests

def get_proxies():
    # Download a proxy list from ProxyScrape and cache it locally,
    # or reuse the cached copy if one already exists.
    if not os.path.exists("proxies.txt"):
        url = "https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=10000&country=all&ssl=all&anonymity=all&limit=5000"
        proxies = requests.get(url).text.split("\n")
        with open("proxies.txt", "w") as f:
            f.write("\n".join(proxies))
    else:
        with open("proxies.txt", "r") as f:
            proxies = f.read().split("\n")
    return proxies
```
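One caveat: splitting the response on `"\n"` can leave empty strings in the proxy list. A small cleanup helper (the name `clean_proxy_list` is my own, not from the original script) avoids handing blank entries to the proxy tester:

```python
def clean_proxy_list(proxies):
    # Drop blank entries and stray whitespace left over from split("\n")
    return [p.strip() for p in proxies if p.strip()]

clean_proxy_list(["1.2.3.4:80", "", "  ", "5.6.7.8:8080"])
# → ['1.2.3.4:80', '5.6.7.8:8080']
```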
Proxy Testing
```python
import requests
from termcolor import colored

def test_proxy(proxy, user_agent, verbose):
    # Route both HTTP and HTTPS traffic through the candidate proxy
    # and check that a test request succeeds within the timeout.
    test_url = "https://bing.com"
    headers = {"User-Agent": user_agent}
    try:
        proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
        response = requests.get(test_url, headers=headers, proxies=proxies, timeout=3)
        print(colored("Scraping good proxies...", "blue"))
        if response.status_code == 200:
            print(colored(f"Good proxy found: {proxy}", "green"))
            return True
    except requests.exceptions.ConnectTimeout:
        if verbose:
            print(colored(f"Connection timeout for proxy: {proxy}", "red"))
    except requests.exceptions.RequestException:
        pass  # other exception handling...
    return False
```
Filtering Working Proxies
```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def filter_working_proxies(proxies, user_agents, verbose):
    # Test proxies concurrently and keep only the ones that respond.
    working_proxies = []
    user_agent = random.choice(user_agents)
    with ThreadPoolExecutor(max_workers=50) as executor:
        futures_to_proxies = {executor.submit(test_proxy, proxy, user_agent, verbose): proxy
                              for proxy in proxies}
        for future in as_completed(futures_to_proxies):
            if future.result():
                working_proxies.append(futures_to_proxies[future])
    return working_proxies
```
Google Search Function
```python
import requests
from bs4 import BeautifulSoup

def google_search(query, user_agent, proxy):
    # Fetch the results page through the proxy and parse out the result links.
    url = "https://www.google.com/search"
    headers = {"User-Agent": user_agent}
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    # Passing the dork via params lets requests URL-encode it properly.
    response = requests.get(url, params={"q": query}, headers=headers, proxies=proxies, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    # .yuRUbf is the wrapper around each organic result's title link
    return [result["href"] for result in soup.select(".yuRUbf a")]
```
Main Dork Search Logic
```python
def search_dork(dork, proxies, user_agents, verbose, max_retries=3, backoff_factor=1.0):
    # Retry with proxy rotation and exponential backoff
    retries = 0
    while retries <= max_retries:
        proxy = random.choice(proxies)
        user_agent = random.choice(user_agents)
        # Search logic...
```
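The elided body can be sketched end to end. This is a minimal interpretation based on the retry, backoff, and result-truncation behavior the script advertises, not the author's exact code; the `search` parameter stands in for the `google_search` function above so the sketch stays self-contained and testable:

```python
import random
import time

def search_dork(dork, proxies, user_agents, verbose,
                max_retries=3, backoff_factor=1.0, search=None):
    # `search` stands in for google_search; on each attempt a fresh
    # proxy/user-agent pair is drawn, and failures back off exponentially.
    retries = 0
    while retries <= max_retries:
        proxy = random.choice(proxies)
        user_agent = random.choice(user_agents)
        try:
            results = search(dork, user_agent, proxy)
            return results[:20]  # keep only the top 20 results
        except Exception:
            delay = backoff_factor * (2 ** retries)  # 1s, 2s, 4s, ... for factor 1.0
            if verbose:
                print(f"retrying {dork!r} in {delay:.1f}s")
            time.sleep(delay)
            retries += 1
    return []  # all retries exhausted
```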
Main Entry Point
```python
import argparse
import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", help="Display errors with proxies.", action="store_true")
    args = parser.parse_args()
    # Read the dork file
    with open("dorks.txt", "r") as f:
        dorks = f.read().split("\n")
    # Gather user agents and working proxies
    user_agents = get_user_agents()
    proxies = filter_working_proxies(get_proxies(), user_agents, args.verbose)
    # Create the results directory
    if not os.path.exists("results"):
        os.makedirs("results")
    # Run the dork searches across a thread pool
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = {executor.submit(search_dork, dork, proxies, user_agents, args.verbose): dork
                   for dork in dorks}
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main()
```
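One helper that `main()` calls, `get_user_agents()`, is never shown in the article. A plausible minimal version reads user-agent strings from a local file and falls back to a small built-in list; the file name `user_agents.txt` and the fallback strings here are assumptions, not part of the original script:

```python
import os

def get_user_agents():
    # Load user-agent strings from a local file if present;
    # otherwise fall back to a small built-in list.
    fallback = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    ]
    if os.path.exists("user_agents.txt"):
        with open("user_agents.txt", "r") as f:
            agents = [line.strip() for line in f if line.strip()]
        if agents:
            return agents
    return fallback
```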
Technical Highlights
- Proxy pool management: proxies are fetched from the ProxyScrape API and cached locally
- Proxy validation: availability is verified across multiple threads, with timeout and error handling
- User-agent rotation: a random user agent is chosen per request to avoid detection
- Retry mechanism: failed searches are retried with exponential backoff
- Concurrency: searches run in parallel via ThreadPoolExecutor
- Result persistence: the top 20 results per dork are saved to a file automatically
Usage

```shell
python3 script.py [-v|--verbose]
```
The -v flag prints detailed proxy error information, which helps with debugging and monitoring proxy health.