Changelog
Note
spatula 1.0 should be ready in a few months, providing a more stable interface to build upon. Until then, interfaces may change between releases.
0.9.0 - 2022-02-10
- add Page.accept_response method that can be overridden to trigger custom retry logic
- add preliminary spatula.config for setting/overriding global defaults (this feature is not yet considered stable and will likely be modified before 1.0)
0.8.10 - 2022-01-31
- update click dependency
0.8.9 - 2021-12-14
- fix for --rmdir not recreating directory
0.8.8 - 2021-12-09
- add --rmdir flag to spatula scrape
0.8.7 - 2021-11-09
- add support for raising SkipItem from a detail page to resume processing without yielding data from the page
0.8.6 - 2021-10-13
- add timeout argument to URL source
- add --subpages argument to spatula test which runs similarly to spatula scrape but writes output to the terminal
0.8.5 - 2021-08-09
- add verify argument to URL source
- improve messaging when using spatula test
- add --dump flag to spatula scrape to control output format
0.8.4 - 2021-07-15
- self.skip is deprecated in favor of raising SkipItem
- add experimental support for module arguments to scrape command
0.8.3 - 2021-06-23
- fix bug where default headers were cleared by default
- update to scrapelib 2.0.6 which contains a bugfix for a redirect follow bug
0.8.2 - 2021-06-22
- fix spatula --version to report correct version
- allow --data command line flags to override example_input values
- add caching of dependencies
- fix pagination on non-list pages
- add advanced documentation & anatomy of a scrape
0.8.1 - 2021-06-17
- remove undocumented page_to_items function
- added Page.do_scrape to programmatically get all items from a scrape
- added --source parameter to scout & scrape commands
0.8.0 - 2021-06-15
- remove undocumented Workflow
- allow using Page instances (as opposed to just the type) for scout & scrape
- add check for get_filename on output classes to override default filename
- improved automatic pydantic support
- add --timeout, --no-verify, --retries, --retry-wait options
- add --fastmode option to use local cache
- fix all CLI commands to obey various scraper options
0.7.1 - 2021-06-14
- remove undocumented default behavior for get_source_from_input
- major documentation overhaul
- fixes for scout scrape when working with raw data returns
0.7.0 - 2021-06-04
- add spatula scout command
- make error messages a bit more clear
- improvements to documentation
- added more CLI options to control verbosity, user agent, etc.
- if module cannot be found, search current directory
0.6.0 - 2021-04-12
- add full typing to library
- small bugfixes
0.5.0 - 2021-02-04
- add ExcelListPage
- improve Page.logger and CLI output
- move to simpler Workflow class
- spatula scrape can now take the name of a page, will use default Workflow
- bugfix: inconsistent name for process_error_response
0.4.1 - 2021-02-01
- bugfix: dependencies are instantiated from parent page input
0.4.0 - 2021-02-01
- restore Python 3.7 compatibility
- add behavior to handle returning additional Page subclasses to continue scraping
- add default behavior when Page.input has a url attribute
- add PdfPage
- add page_to_items helper
- add Page.example_input and Page.example_source for test command
- add Page.logger for logging
- allow use of dataclasses in addition to attrs as input objects
- improve output of HTML elements
- bugfix: not specifying a page processor on workflow is no longer an error
0.3.0 - 2021-01-18
- first documented major release