Changelog¶
Note
spatula 1.0 should be ready in a few months, providing a more stable interface to build upon, until then interfaces may change between releases.
0.9.0 - 2022-02-10¶
- add
Page.accept_response
method that can be overriden to trigger custom retry logic - add preliminary spatula.config for setting/overriding global defaults (this feature is not yet considered stable, it likely will be modified before 1.0)
0.8.10 - 2022-01-31¶
- update click dependency
0.8.9 - 2021-12-14¶
- fix for
--rmdir
not recreating directory
0.8.8 - 2021-12-09¶
- add
--rmdir
flag tospatula scrape
0.8.7 - 2021-11-09¶
- add support for raising
SkipItem
from a detail page to resume processing without yielding data from the page
0.8.6 - 2021-10-13¶
- add
timeout
argument to URL source - add
--subpages
argument tospatula test
which runs similarly tospatula scrape
but writes output to the terminal
0.8.5 - 2021-08-09¶
- add
verify
argument to URL source - improve messaging when using
spatula test
- add
--dump
flag tospatula scrape
to control output format
0.8.4 - 2021-07-15¶
self.skip
is deprecated in favor of raisingSkipItem
- add experimental support for module arguments to
scrape
command
0.8.3 - 2021-06-23¶
- fix bug where default headers were cleared by default
- update to scrapelib 2.0.6 which contains a bugfix for a redirect follow bug
0.8.2 - 2021-06-22¶
- fix
spatula --version
to report correct version - allow
--data
command line flags to overrideexample_input
values - add caching of
dependencies
- fix pagination on non-list pages
- add advanced documentation & anatomy of a scrape
0.8.1 - 2021-06-17¶
- remove undocumented
page_to_items
function - added
Page.do_scrape
to programmatically get all items from a scrape - added
--source
parameter to scout & scrape commands
0.8.0 - 2021-06-15¶
- remove undocumented
Workflow
- allow using
Page
instances (as opposed to just the type) for scout & scrape - add check for
get_filename
on output classes to override default filename - improved automatic
pydantic
support - add --timeout, --no-verify, --retries, --retry-wait options
- add --fastmode option to use local cache
- fix all CLI commands to obey various scraper options
0.7.1 - 2021-06-14¶
- remove undocumented default behavior for
get_source_from_input
- major documentation overhaul
- fixes for scout scrape when working with raw data returns
0.7.0 - 2021-06-04¶
- add
spatula scout
command - make error messages a bit more clear
- improvements to documentation
- added more CLI options to control verbosity, user agent, etc.
- if module cannot be found, search current directory
0.6.0 - 2021-04-12¶
- add full typing to library
- small bugfixes
0.5.0 - 2021-02-04¶
- add
ExcelListPage
- improve
Page.logger
and CLI output - move to simpler
Workflow
class spatula scrape
can now take the name of a page, will use default Workflow- bugfix: inconsistent name for
process_error_response
0.4.1 - 2021-02-01¶
- bugfix: dependencies are instantiated from parent page input
0.4.0 - 2021-02-01¶
- restore Python 3.7 compatibility
- add behavior to handle returning additional
Page
subclasses to continue scraping - add default behavior when
Page.input
has aurl
attribute. - add
PdfPage
- add
page_to_items
helper - add
Page.example_input
andPage.example_source
for test command - add
Page.logger
for logging - allow use of
dataclasses
in addition toattrs
as input objects - improve output of HTML elements
- bugfix: not specifying a page processor on workflow is no longer an error
0.3.0 - 2021-01-18¶
- first documented major release