Identifying User Agents for Websites with Python
The User-Agent is a request header that a client (a browser or a program) sends to the server, and websites use it to learn about the client. It describes the type of device, the operating system, and the browser. Servers can read the User-Agent to determine which device is in use and adapt their content accordingly. The value is reported by the client itself, so a program can set it to any string.
What the User-Agent Is Used For
Content adaptation: design and content suited to the browser and device.
Statistical analysis: usage statistics for browsers, operating systems, and devices.
Bot detection: identifying bots that scrape data.
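As a rough illustration of the bot-detection use case, the sketch below flags a User-Agent when it contains a marker often seen in automated clients. The marker list and the function name looks_like_bot are assumptions chosen for illustration, not a complete list. Because the header is self-reported, a check like this only catches clients that identify themselves honestly.

```python
# Hypothetical marker list: substrings that automated clients often include.
BOT_MARKERS = ("bot", "crawler", "spider", "python-requests", "curl")


def looks_like_bot(user_agent):
    """Return True if the User-Agent is empty or contains a known bot marker."""
    if not user_agent:
        return True  # Assumption: a missing User-Agent is treated as suspicious
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


browser_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
)
print(looks_like_bot("python-requests/2.31.0"))
print(looks_like_bot(browser_ua))
```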
Structure of a User-Agent String
Consider a typical User-Agent string:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36
This string describes the client's browser type, operating system, and other components. Its structure:
Browser (or program) type: Mozilla/5.0 (a legacy compatibility token that most modern browsers send)
Operating system: (Windows NT 10.0; Win64; x64)
Rendering engine: AppleWebKit/537.36 (KHTML, like Gecko)
Browser name and version: Chrome/92.0.4515.107
Additional information: Safari/537.36
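The breakdown above can be sketched in code by splitting the string into name/version tokens and parenthesized comments. This is a minimal sketch using only the standard re module; the helper name split_user_agent and the two regular expressions are illustrative assumptions and do not cover every User-Agent format.

```python
import re

# Matches name/version tokens such as "Chrome/92.0.4515.107"
PRODUCT_RE = re.compile(r"([A-Za-z][\w.-]*)/([\w.]+)")
# Matches the text inside each pair of parentheses
COMMENT_RE = re.compile(r"\(([^)]*)\)")


def split_user_agent(user_agent):
    """Return (products, comments) extracted from a User-Agent string."""
    products = PRODUCT_RE.findall(user_agent)  # list of (name, version) tuples
    comments = COMMENT_RE.findall(user_agent)  # list of comment strings
    return products, comments


ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
)
products, comments = split_user_agent(ua)
for name, version in products:
    print(name, version)
print(comments)
```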
1 Setting the User-Agent in Python
With Python's requests library, you can set the User-Agent manually when sending requests to websites.
Installation
Install the requests library with the following command:
pip install requests
Setting a Simple User-Agent
The following function shows how to set the User-Agent when sending a request to a website:
import requests


def get_page_with_user_agent(url, user_agent):
    """
    Send a request to the given URL with a custom User-Agent.
    """
    headers = {
        "User-Agent": user_agent  # Set the User-Agent header
    }
    response = requests.get(url, headers=headers)  # Send the GET request
    return response


# User-Agent and URL for testing
url = "https://httpbin.org/headers"
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
response = get_page_with_user_agent(url, user_agent)

# Print the response
print(response.json())
Analysis:
headers = {"User-Agent": user_agent}: creates the headers dictionary that carries the User-Agent value.
requests.get(url, headers=headers): sends a request to the given URL with the headers parameter, so the request includes the chosen User-Agent.
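To check which header a request will carry without touching the network, one option is to build the request object first and inspect it. The sketch below uses the standard library's urllib.request rather than requests, purely so that it runs without extra packages. The helper name build_request and the value "MyClient/1.0" are hypothetical.

```python
import urllib.request


def build_request(url, user_agent):
    """Build (but do not send) a GET request carrying the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_request("https://httpbin.org/headers", "MyClient/1.0")

# urllib stores header names via str.capitalize(), hence the key "User-agent"
print(req.get_header("User-agent"))
print(req.get_method(), req.full_url)
```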
2 Choosing a Random User-Agent from a List
Sometimes it is useful to work with several User-Agents (for example, when crawling or analyzing a site). The functions below pick a User-Agent at random from a list and send the request with it.
import random

import requests

# List of User-Agents
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1"
]


def get_random_user_agent():
    """
    Return a random User-Agent from the list.
    """
    return random.choice(user_agents)


def get_page_random_user_agent(url):
    """
    Send a request to the URL with a random User-Agent.
    """
    user_agent = get_random_user_agent()
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    return response


# Test with a URL and a random User-Agent
url = "https://httpbin.org/headers"
response = get_page_random_user_agent(url)

# Print the response
print(response.json())
Analysis:
headers = {"User-Agent": user_agent}: the User-Agent header is set to the randomly selected value and then used in the request.
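If some User-Agents should appear more often than others, random.choices accepts weights. The sketch below shows one way to do that. The weights and the helper name pick_weighted_user_agent are illustrative assumptions, and a seeded random.Random instance is used only to make the demonstration repeatable.

```python
import random

# Hypothetical weights: a higher number means the agent is picked more often.
WEIGHTED_USER_AGENTS = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36": 6,
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15": 3,
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1": 1,
}


def pick_weighted_user_agent(rng=random):
    """Pick one User-Agent, honoring the weights above."""
    agents = list(WEIGHTED_USER_AGENTS)
    weights = list(WEIGHTED_USER_AGENTS.values())
    return rng.choices(agents, weights=weights, k=1)[0]


# Draw 1000 samples with a seeded generator and count how often each appears
rng = random.Random(42)
sample = [pick_weighted_user_agent(rng) for _ in range(1000)]
for agent, weight in WEIGHTED_USER_AGENTS.items():
    print(weight, sample.count(agent))
```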
3 Analyzing the Browser via the User-Agent
By parsing the User-Agent, you can extract information about the browser and device. The following example shows a function that extracts the browser and operating system from a User-Agent.
def analyze_user_agent(user_agent):
    """
    Extract the browser and operating system from a User-Agent string.
    """
    browser = "Unknown"
    os_name = "Unknown"

    # Identify the browser (Chrome first: its User-Agent also contains "Safari")
    if "Chrome" in user_agent:
        browser = "Chrome"
    elif "Safari" in user_agent:
        browser = "Safari"
    elif "Firefox" in user_agent:
        browser = "Firefox"

    # Identify the operating system
    # (iPhone first: its User-Agent also contains "like Mac OS X")
    if "Windows" in user_agent:
        os_name = "Windows"
    elif "iPhone OS" in user_agent:
        os_name = "iOS"
    elif "Mac OS X" in user_agent:
        os_name = "Mac OS X"
    elif "Android" in user_agent:
        os_name = "Android"

    return browser, os_name


# Analyze a User-Agent
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"
browser, os_name = analyze_user_agent(user_agent)
print("Browser:", browser)
print("Operating system:", os_name)
Analysis:
if "Chrome" in user_agent: checks whether the User-Agent contains the text "Chrome" and identifies the browser. Chrome is tested before Safari because Chrome's User-Agent also contains "Safari".
if "Windows" in user_agent: checks whether the User-Agent contains "Windows" and identifies the operating system.
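Since substring checks depend on the order in which they run, one way to make that order explicit is a rule table scanned from most specific to least specific. The sketch below is an illustrative variant, not a complete parser: the rule lists, the names first_match and analyze_user_agent_ordered, and the sample Edge string are assumptions.

```python
# Rules are checked in order; more specific markers come first.
BROWSER_RULES = [
    ("Edg/", "Edge"),        # Edge User-Agents also contain "Chrome" and "Safari"
    ("OPR/", "Opera"),       # Opera User-Agents also contain "Chrome" and "Safari"
    ("Firefox/", "Firefox"),
    ("Chrome/", "Chrome"),   # Chrome User-Agents also contain "Safari"
    ("Safari/", "Safari"),
]
OS_RULES = [
    ("iPhone", "iOS"),       # iPhone User-Agents contain "like Mac OS X"
    ("iPad", "iOS"),
    ("Android", "Android"),  # Android User-Agents also contain "Linux"
    ("Windows", "Windows"),
    ("Mac OS X", "macOS"),
    ("Linux", "Linux"),
]


def first_match(user_agent, rules, default="Unknown"):
    """Return the label of the first rule whose marker occurs in the string."""
    for marker, label in rules:
        if marker in user_agent:
            return label
    return default


def analyze_user_agent_ordered(user_agent):
    """Return (browser, operating system) using the ordered rule tables."""
    return first_match(user_agent, BROWSER_RULES), first_match(user_agent, OS_RULES)


iphone_ua = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1"
)
print(analyze_user_agent_ordered(iphone_ua))
```

Keeping the rules in data rather than in an if/elif chain makes it easier to add a marker without disturbing the precedence of the others.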
4 Checking the User-Agent in the Website's Response
Websites process the User-Agent when they receive a request. The https://httpbin.org/headers endpoint returns the request headers it received, so by inspecting its response we can see the User-Agent we actually sent.
import requests


def check_user_agent_on_site(url, user_agent):
    """
    Check the User-Agent in the request sent to the URL.
    """
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    response_json = response.json()

    # Print the User-Agent found in the JSON body
    print("User-Agent sent:", user_agent)
    print("User-Agent returned by the server:", response_json["headers"]["User-Agent"])


# URL and User-Agent for testing
url = "https://httpbin.org/headers"
user_agent = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"
check_user_agent_on_site(url, user_agent)
Analysis:
response_json["headers"]["User-Agent"]: reads the User-Agent value from the JSON response so that it can be printed next to the one that was sent and the two can be compared.
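The comparison itself can also be done in code rather than by eye. The sketch below assumes a JSON body with a top-level "headers" object, as read by the function above. The helper name user_agent_was_echoed and the sample payload are hypothetical, and no network request is made.

```python
def user_agent_was_echoed(response_json, sent_user_agent):
    """Return True if the echoed headers contain exactly the User-Agent we sent."""
    headers = response_json.get("headers", {})
    # Header names are case-insensitive, so compare them in lowercase
    echoed = {name.lower(): value for name, value in headers.items()}
    return echoed.get("user-agent") == sent_user_agent


# Sample payload shaped like the response read above (assumed shape)
sample_json = {"headers": {"Host": "httpbin.org", "User-Agent": "MyClient/1.0"}}
print(user_agent_was_echoed(sample_json, "MyClient/1.0"))
print(user_agent_was_echoed(sample_json, "OtherClient/2.0"))
```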
5 Rotating Through User-Agents in Round-Robin Order
The following function picks one User-Agent per request in round-robin (not random) order and sends the request to the site.
import itertools

import requests

# Wrap the user_agents list (defined in section 2) in itertools.cycle for round-robin use
user_agents_cycle = itertools.cycle(user_agents)


def get_page_cyclic_user_agent(url):
    """
    Send a request with the next User-Agent in the cycle.
    """
    user_agent = next(user_agents_cycle)
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    return response


# Send requests to the URL with rotating User-Agents
url = "https://httpbin.org/headers"
for _ in range(5):  # Send 5 requests
    response = get_page_cyclic_user_agent(url)
    print(response.json())
Analysis:
user_agents_cycle = itertools.cycle(user_agents): itertools.cycle creates an iterator that loops over the user_agents list endlessly.
user_agent = next(user_agents_cycle): returns the next User-Agent each time a new request is sent, starting over from the first entry after the last one.
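The wrap-around behavior is easy to confirm without sending any requests. In this small sketch the strings "UA-1", "UA-2", and "UA-3" are placeholders standing in for real User-Agent values.

```python
import itertools

# Placeholder values standing in for real User-Agent strings
agents = ["UA-1", "UA-2", "UA-3"]
agent_cycle = itertools.cycle(agents)

# Take five values: the cycle restarts from the first entry after the third
first_five = [next(agent_cycle) for _ in range(5)]
print(first_five)  # ['UA-1', 'UA-2', 'UA-3', 'UA-1', 'UA-2']
```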
6 Complete Program
The following program combines all the functions above: it sends requests to a website with different User-Agents, identifies the browser and operating system, and rotates through User-Agents in round-robin order.
import itertools
import random

import requests

user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.3 Safari/605.1.15",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1"
]
user_agents_cycle = itertools.cycle(user_agents)


def get_random_user_agent():
    return random.choice(user_agents)


def analyze_user_agent(user_agent):
    browser = "Unknown"
    os_name = "Unknown"
    # Chrome is checked first: its User-Agent also contains "Safari"
    if "Chrome" in user_agent:
        browser = "Chrome"
    elif "Safari" in user_agent:
        browser = "Safari"
    elif "Firefox" in user_agent:
        browser = "Firefox"
    # iPhone is checked before Mac: its User-Agent also contains "like Mac OS X"
    if "Windows" in user_agent:
        os_name = "Windows"
    elif "iPhone OS" in user_agent:
        os_name = "iOS"
    elif "Mac OS X" in user_agent:
        os_name = "Mac OS X"
    elif "Android" in user_agent:
        os_name = "Android"
    return browser, os_name


def get_page_random_user_agent(url):
    user_agent = get_random_user_agent()
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    return response


def get_page_cyclic_user_agent(url):
    user_agent = next(user_agents_cycle)
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    return response


def check_user_agent_on_site(url, user_agent):
    headers = {"User-Agent": user_agent}
    response = requests.get(url, headers=headers)
    response_json = response.json()
    print("User-Agent sent:", user_agent)
    print("User-Agent returned by the server:", response_json["headers"]["User-Agent"])


# URL and function calls for testing
url = "https://httpbin.org/headers"
random_response = get_page_random_user_agent(url)
print("Request with a random User-Agent:", random_response.json())

for _ in range(5):
    cyclic_response = get_page_cyclic_user_agent(url)
    print("Request with a cycled User-Agent:", cyclic_response.json())

user_agent = user_agents[0]
browser, os_name = analyze_user_agent(user_agent)
print("Browser:", browser)
print("Operating system:", os_name)
check_user_agent_on_site(url, user_agent)
With this program you can practice setting and changing User-Agents, analyzing them, and using them in round-robin rotation.