Google Dork SSL Bypass for Automated Dorking: A Technical Walkthrough

This article walks through a Python script that automates Google Dork searches, covering proxy-pool acquisition and filtering, user-agent rotation, SSL bypass, and multithreaded concurrency, to improve search throughput while evading detection.

Code Overview

The following Python script automates Google Dork searches. It uses a proxy pool and user-agent rotation to bypass SSL restrictions, and runs searches concurrently across multiple threads.
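The snippets below are excerpts from a single script and share one set of imports. The original article does not show them; this list is reconstructed from the calls the snippets make (requests, beautifulsoup4, and termcolor are third-party packages installable via pip):

```python
import argparse
import os
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import quote_plus

import requests
from bs4 import BeautifulSoup
from termcolor import colored
```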

Proxy Acquisition and Caching

```python
def get_proxies():
    # Fetch a proxy list from the ProxyScrape API and cache it locally,
    # so repeated runs do not re-download the list.
    if not os.path.exists("proxies.txt"):
        url = "https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=10000&country=all&ssl=all&anonymity=all&limit=5000"
        proxies = requests.get(url, timeout=10).text.split("\n")
        with open("proxies.txt", "w") as f:
            f.write("\n".join(proxies))
    else:
        with open("proxies.txt", "r") as f:
            proxies = f.read().split("\n")
    # Drop empty lines left over from splitting on "\n".
    return [p.strip() for p in proxies if p.strip()]
```
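The script also calls a `get_user_agents()` helper whose definition is not shown in the article. A minimal sketch, assuming (hypothetically) that user agents are cached in a local `user_agents.txt` file, one per line, with a small built-in fallback list:

```python
import os

def get_user_agents(path="user_agents.txt"):
    # Fallback pool used when no cache file exists.
    defaults = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ]
    if not os.path.exists(path):
        return defaults
    with open(path, "r") as f:
        # One user agent per line; skip blanks.
        agents = [line.strip() for line in f if line.strip()]
    return agents or defaults
```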

Proxy Testing

```python
def test_proxy(proxy, user_agent, verbose):
    # Probe a lightweight HTTPS endpoint through the proxy.
    test_url = "https://bing.com"
    headers = {"User-Agent": user_agent}
    try:
        proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
        response = requests.get(test_url, headers=headers, proxies=proxies, timeout=3)
        if response.status_code == 200:
            print(colored(f"Good proxy found: {proxy}", "green"))
            return True
    except requests.exceptions.ConnectTimeout:
        if verbose:
            print(colored(f"Connection timeout for proxy: {proxy}", "red"))
    except requests.exceptions.RequestException:
        # Any other request failure (connection reset, SSL error, ...)
        # means the proxy is unusable.
        if verbose:
            print(colored(f"Request failed for proxy: {proxy}", "red"))
    return False
```

Note that the original snippet printed a "Scraping good proxies..." status line inside every request; that message belongs in the caller, not in the per-proxy test, so it is omitted here.

Filtering Working Proxies

```python
def filter_working_proxies(proxies, user_agents, verbose):
    working_proxies = []
    user_agent = random.choice(user_agents)
    # Test up to 50 proxies concurrently and keep the ones that respond.
    with ThreadPoolExecutor(max_workers=50) as executor:
        futures_to_proxies = {executor.submit(test_proxy, proxy, user_agent, verbose): proxy for proxy in proxies}
        for future in as_completed(futures_to_proxies):
            if future.result():
                working_proxies.append(futures_to_proxies[future])
    return working_proxies
```

Google Search

```python
def google_search(query, user_agent, proxy):
    # Dorks contain spaces and ":" operators, so URL-encode the query.
    url = f"https://www.google.com/search?q={quote_plus(query)}"
    headers = {"User-Agent": user_agent}
    proxies = {"http": f"http://{proxy}", "https": f"http://{proxy}"}
    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    # Google wraps organic result links in .yuRUbf containers.
    return [result["href"] for result in soup.select(".yuRUbf a")]
```
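Interpolating a raw dork into the search URL would break on spaces and operator characters, which is why the query goes through `urllib.parse.quote_plus` first. A quick illustration:

```python
from urllib.parse import quote_plus

# A typical dork with an operator and a space.
dork = "inurl:admin filetype:php"
encoded = quote_plus(dork)
print(encoded)  # inurl%3Aadmin+filetype%3Aphp
```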

Dork Search Main Logic

```python
def search_dork(dork, proxies, user_agents, verbose, max_retries=3, backoff_factor=1.0):
    # Retry with a fresh proxy and user agent on each attempt,
    # backing off exponentially between failures.
    retries = 0
    while retries <= max_retries:
        proxy = random.choice(proxies)
        user_agent = random.choice(user_agents)
        # ... search logic (call google_search, save results, retry on failure) ...
```
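The article elides the body of the retry loop. A minimal, generic sketch of the exponential-backoff pattern it describes (the `fetch` callable and the sleep schedule are assumptions, not the original code):

```python
import time

def retry_with_backoff(fetch, max_retries=3, backoff_factor=1.0):
    # fetch is a zero-argument callable that raises on failure.
    retries = 0
    while retries <= max_retries:
        try:
            return fetch()
        except Exception:
            # Wait backoff_factor * 2**retries seconds: 1s, 2s, 4s, ...
            time.sleep(backoff_factor * (2 ** retries))
            retries += 1
    # Give up after exhausting the retry budget.
    return None
```

In `search_dork`, `fetch` would wrap a `google_search` call with a freshly chosen proxy and user agent on each attempt.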

Main Entry Point

```python
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", help="Display errors with proxies.", action="store_true")
    args = parser.parse_args()

    # Read the dork list, skipping empty lines.
    with open("dorks.txt", "r") as f:
        dorks = [line.strip() for line in f if line.strip()]

    # Gather user agents and filter the proxy pool down to working proxies.
    user_agents = get_user_agents()
    proxies = filter_working_proxies(get_proxies(), user_agents, args.verbose)

    # Create the results directory.
    if not os.path.exists("results"):
        os.makedirs("results")

    # Run the dork searches concurrently.
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = {executor.submit(search_dork, dork, proxies, user_agents, args.verbose): dork for dork in dorks}
        for future in as_completed(futures):
            future.result()
```

Technical Highlights

  1. Proxy pool management: proxies are fetched automatically from the ProxyScrape API and cached locally.
  2. Proxy validation: proxy availability is verified across multiple threads, with timeout and error handling.
  3. User-agent rotation: a user agent is picked at random per request to avoid detection.
  4. Retry mechanism: failed searches are retried with exponential backoff.
  5. Concurrency: searches run in parallel via ThreadPoolExecutor.
  6. Result persistence: the first 20 search results for each dork are saved to a file automatically.
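The result-saving step (item 6) is not shown in the snippets above. A sketch of what it could look like, assuming (hypothetically) that each dork's links are written to `results/<sanitized dork>.txt`; the file-naming scheme and `save_results` helper are illustrative, not the original code:

```python
import os
import re

def save_results(dork, links, out_dir="results", limit=20):
    os.makedirs(out_dir, exist_ok=True)
    # Replace characters that are unsafe in file names with underscores.
    name = re.sub(r"[^\w.-]+", "_", dork).strip("_") or "dork"
    path = os.path.join(out_dir, f"{name}.txt")
    # Keep only the first `limit` links, one per line.
    with open(path, "w") as f:
        f.write("\n".join(links[:limit]))
    return path
```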

Usage

```shell
python3 script.py [-v|--verbose]
```

The -v flag prints detailed proxy error information, which is useful for debugging and for monitoring proxy health.
