Crawlee for Python logo

Crawlee for PythonBuild reliable scrapers in Python

We are launching Crawlee for Python, an open-source library for web scraping and browser automation. Quickly scrape data, store it, and avoid getting blocked with auto-generated human-like fingerprints, headless browsers, and smart proxy rotation.

Crawlee for Python screenshot
Crawlee for Python Hakkında Daha Fazla

Crawlee: Powerful Web Scraping and Browser Automation Library

Introduction

Crawlee is a robust web scraping and browser automation library for Python. It enables developers to build reliable crawlers quickly and efficiently.

Key Features

  • Python implementation with type hints
  • Seamless switching between HTTP and headless browser crawling
  • Built on Playwright for browser automation
  • Automatic scaling and proxy management
  • Support for Chrome, Firefox, and other browsers

Use Cases

  • Web scraping at scale
  • Browser automation tasks
  • Data extraction from JavaScript-rendered websites
  • Maintaining large-scale crawling projects

Teams

Crawlee is developed by experienced web scraping professionals who use it daily for large-scale data extraction projects.

Getting Started

pipx run crawlee create my-crawler
pip install 'crawlee[playwright]'
playwright install

Example Usage

import asyncio
from crawlee.playwright_crawler import PlaywrightCrawler, PlaywrightCrawlingContext

async def main():
    crawler = PlaywrightCrawler(
        max_requests_per_crawl=5,
        headless=False,
        browser_type='firefox',
    )

    @crawler.router.default_handler
    async def request_handler(context: PlaywrightCrawlingContext) -> None:
        await context.enqueue_links()
        data = {
            'url': context.request.url,
            'title': await context.page.title(),
            'content': (await context.page.content())[:100],
        }
        await context.push_data(data)

    await crawler.run(['https://crawlee.dev'])
    await crawler.export_data('results.json')

if __name__ == '__main__':
    asyncio.run(main())