We are launching Crawlee for Python, an open-source library for web scraping and browser automation. It lets you quickly scrape data, store it, and avoid getting blocked, thanks to auto-generated human-like fingerprints, headless browsers, and smart proxy rotation.
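Crawlee rotates proxies for you, so you do not need to write this yourself. To illustrate the basic idea, here is a minimal, standard-library-only sketch of round-robin rotation, where each request takes the next proxy from a fixed pool. The `RoundRobinProxies` class and the `example.com` proxy URLs are hypothetical. This is not Crawlee's API or its rotation logic, only an illustration of the concept.

```python
from itertools import cycle


class RoundRobinProxies:
    """Hypothetical helper for illustration; not part of Crawlee."""

    def __init__(self, proxy_urls: list[str]) -> None:
        if not proxy_urls:
            raise ValueError('at least one proxy URL is required')
        self._pool = cycle(proxy_urls)

    def next_proxy(self) -> str:
        # Each call returns the next proxy and wraps around at the end of the list.
        return next(self._pool)


proxies = RoundRobinProxies([
    'http://proxy-1.example.com:8000',
    'http://proxy-2.example.com:8000',
])
for url in ['https://crawlee.dev', 'https://crawlee.dev/python', 'https://crawlee.dev/blog']:
    print(url, 'via', proxies.next_proxy())
```

A plain cycle spreads requests evenly but does not react to proxies that fail or get blocked, which is why production crawlers usually need something more adaptive.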
Crawlee helps developers build reliable crawlers quickly and efficiently. It is developed by experienced web scraping professionals who use it daily for large-scale data extraction projects.
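To bootstrap a new project from a template, run:

```shell
pipx run crawlee create my-crawler
```

Alternatively, install Crawlee with Playwright support and download the browsers Playwright needs:

```shell
pip install 'crawlee[playwright]'
playwright install
```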
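To confirm that the installation worked, you can query the installed package versions from Python using only the standard library. The `installed_version` helper below is hypothetical (not part of Crawlee); it reads package metadata through `importlib.metadata` instead of importing the packages.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Hypothetical helper: return the installed version, or None if the package is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Report whether the distributions installed above are present in this environment.
for dist in ('crawlee', 'playwright'):
    found = installed_version(dist)
    print(f'{dist}: {found or "not installed"}')
```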
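Here is a minimal example that uses `PlaywrightCrawler` to open crawlee.dev in a visible Firefox window, follow the links it finds, save each page's URL, title, and first 100 characters of HTML, and export the results to `results.json`:

```python
import asyncio

from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext


async def main() -> None:
    crawler = PlaywrightCrawler(
        # Stop after 5 requests so the demo finishes quickly.
        max_requests_per_crawl=5,
        # Show the browser window instead of running headless.
        headless=False,
        # Use Firefox instead of the default Chromium.
        browser_type='firefox',
    )

    # The default handler is called for every page the crawler visits.
    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        # Add the links found on the current page to the request queue.
        await context.enqueue_links()

        # Extract the URL, the page title, and the first 100 characters of the HTML.
        data = {
            'url': context.request.url,
            'title': await context.page.title(),
            'content': (await context.page.content())[:100],
        }

        # Store the extracted record in the default dataset.
        await context.push_data(data)

    # Start crawling from the Crawlee homepage.
    await crawler.run(['https://crawlee.dev'])

    # Export the whole dataset to a JSON file.
    await crawler.export_data('results.json')


if __name__ == '__main__':
    asyncio.run(main())
```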
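To make the crawl loop easier to reason about, here is a simplified, offline model of what `max_requests_per_crawl`, `enqueue_links()`, `push_data()` and `export_data()` do together: a queue that skips URLs it has already seen, stops after a fixed number of requests, collects one record per page, and writes the records to JSON. The `FAKE_SITE` dictionary and the `crawl` function are hypothetical stand-ins for real pages and for Crawlee's request queue and dataset. This is not Crawlee's implementation, which processes requests concurrently.

```python
import json
from collections import deque
from pathlib import Path

# Hypothetical in-memory "website" used instead of real HTTP requests, so the
# sketch runs offline. Each URL maps to (page title, links found on the page).
FAKE_SITE = {
    'https://example.com/': ('Home', ['https://example.com/a', 'https://example.com/b']),
    'https://example.com/a': ('Page A', ['https://example.com/', 'https://example.com/c']),
    'https://example.com/b': ('Page B', ['https://example.com/c', 'https://example.com/d']),
    'https://example.com/c': ('Page C', ['https://example.com/e']),
    'https://example.com/d': ('Page D', []),
    'https://example.com/e': ('Page E', []),
}


def crawl(start_urls: list[str], max_requests: int) -> list[dict]:
    """Hypothetical breadth-first crawl loop; not Crawlee's implementation."""
    queue = deque(dict.fromkeys(start_urls))  # pending requests, oldest first
    seen = set(start_urls)                    # every URL ever enqueued, for deduplication
    dataset = []                              # one record per processed page, like push_data()
    # Stop when the queue is empty or the request limit is reached,
    # like max_requests_per_crawl.
    while queue and len(dataset) < max_requests:
        url = queue.popleft()
        title, links = FAKE_SITE.get(url, ('Not found', []))
        dataset.append({'url': url, 'title': title})
        # Enqueue only links that have not been seen yet, like enqueue_links().
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return dataset


results = crawl(['https://example.com/'], max_requests=5)
# Write the collected records to a JSON file, like export_data('results.json').
Path('results.json').write_text(json.dumps(results, indent=2), encoding='utf-8')
print([record['title'] for record in results])
# prints ['Home', 'Page A', 'Page B', 'Page C', 'Page D']
```

With `max_requests=5` the loop never reaches `Page E`, even though it is linked. With a higher limit it visits all six pages exactly once, because the `seen` set stops `Page A` from re-enqueueing `Home`.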