Overview
scrapelib is a library for making requests to less-than-reliable websites.
Source: https://github.com/jamesturk/scrapelib
Documentation: https://jamesturk.github.io/scrapelib/
Issues: https://github.com/jamesturk/scrapelib/issues
Features
scrapelib originated as part of the Open States project, which scrapes the websites of all 50 state legislatures, and was therefore designed with features desirable when dealing with sites that have intermittent errors or require rate-limiting.
Advantages of using scrapelib over using requests as-is:
- HTTP(S) and FTP requests via an identical API
- support for simple caching with pluggable cache backends
- highly-configurable request throttling
- configurable retries for non-permanent site failures
- all of the power of the superb requests library
 
Installation
scrapelib is on PyPI, and can be installed via any standard package management tool:
poetry add scrapelib
or:
pip install scrapelib
Example Usage
  import scrapelib
  s = scrapelib.Scraper(requests_per_minute=10)
  # Grab Google front page
  s.get('http://google.com')
  # Will be throttled to 10 HTTP requests per minute
  while True:
      s.get('http://example.com')