Changelog
0.6.0
Move to supporting Python 3.11 and 3.12.
Move to openai 1.0.
Move to pydantic 2.0.
Add support for November 2023 model upgrades.
0.5.1 - 2023-06-13
Improve type annotations and remove some ignored errors.
Support for new OpenAI models announced June 13th 2023.
Improved support for model fallbacks. Now if a request has 6k tokens and the model list looks like ['gpt-3.5-turbo', 'gpt-3.5-turbo-16k'], the 16k model will be used automatically since the default 4k model will not be able to handle the request.
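The fallback behavior described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the `pick_model` helper and the `MODEL_TOKEN_LIMITS` table are hypothetical names, and the limits are assumed context-window sizes for the two models mentioned.

```python
# Assumed context-window sizes in tokens (illustrative, not read from the library).
MODEL_TOKEN_LIMITS = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
}


def pick_model(models: list[str], token_count: int) -> str:
    """Return the first model in `models` whose limit can hold the request.

    Hypothetical helper: models are tried in the order given, so the
    cheaper default comes first and larger models act as fallbacks.
    """
    for model in models:
        if token_count <= MODEL_TOKEN_LIMITS[model]:
            return model
    raise ValueError(f"no model in {models} can handle {token_count} tokens")


models = ["gpt-3.5-turbo", "gpt-3.5-turbo-16k"]
small_request_model = pick_model(models, 1_000)
large_request_model = pick_model(models, 6_000)
```

Under these assumptions, a 6k-token request skips the 4k model and lands on the 16k model, while a small request still uses the default.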
0.5.0 - 2023-06-06
Restore PaginatedSchemaScraper and add documentation for pagination.
Documentation improvements.
Small quality-of-life improvements such as better pydantic schema support and more useful error messages.
0.4.4 - 2023-03-31
Deactivate HallucinationCheck by default; it is overly aggressive and needs more work to be useful without raising false positives.
Bugfix for postprocessors parameter behavior not overriding defaults.
0.4.2 - 2023-03-26
Fix type bug with JSON nudging.
Improve HallucinationCheck to handle more cases.
More tests!
0.4.1 - 2023-03-24
Fix bug with HallucinationCheck.
0.4.0 - 2023-03-24
New configurable pre- and post-processing pipelines for customizing behavior.
Addition of ScrapeResult object to hold results of scraping along with metadata.
Support for pydantic models as schemas and for validation.
"Hallucination" check to ensure that the data in the response truly exists on the page.
Use post-processing pipeline to "nudge" JSON errors to a better result.
Now fully type-annotated.
Another big refactor, separation of API calls and scraping logic.
Finally, a ghost logo reminiscent of the library's namesake.
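As a rough illustration of the "hallucination" check idea in this release, the sketch below tests whether every string value in a scraped result occurs verbatim in the page text. It is an assumption about how such a check could work, not the library's HallucinationCheck; `find_missing_values` is a hypothetical helper.

```python
def find_missing_values(data, page_text: str) -> list[str]:
    """Collect string values in `data` that do not occur in `page_text`.

    Hypothetical helper: walks nested dicts and lists, and checks each
    string leaf with a plain substring test. Non-string leaves are ignored.
    """
    missing = []
    if isinstance(data, dict):
        for value in data.values():
            missing.extend(find_missing_values(value, page_text))
    elif isinstance(data, list):
        for item in data:
            missing.extend(find_missing_values(item, page_text))
    elif isinstance(data, str) and data not in page_text:
        missing.append(data)
    return missing


html = "<h1>Jane Doe</h1><p>Springfield</p>"
clean = find_missing_values({"name": "Jane Doe", "city": "Springfield"}, html)
suspect = find_missing_values({"name": "Jane Doe", "tags": ["Shelbyville"]}, html)
```

A literal substring test like this would also flag values the model merely reformatted (a normalized date or collapsed whitespace, for example), so a practical check has to trade strictness against false positives.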
0.3.0 - 2023-03-20
Add tests, docs, and complete examples!
Add preprocessors to SchemaScraper to allow a uniform interface for cleaning & selecting HTML.
Use tiktoken for accurate token counts.
New cost_estimate utility function.
Cost is now tracked on a per-scraper basis (see the total_cost attribute on SchemaScraper objects).
SchemaScraper now takes a max_cost parameter to limit the total cost of a scraper.
Prompt improvements, list mode simplification.
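The per-scraper cost tracking and max_cost limit listed above might look something like this in outline. This is a sketch under stated assumptions: `CostTracker`, `CostLimitExceeded`, and the per-token price are hypothetical and do not reflect the library's actual classes or pricing; only the names total_cost and max_cost come from the entries above.

```python
class CostLimitExceeded(Exception):
    """Raised when a request would push spending past the budget (hypothetical)."""


class CostTracker:
    """Hypothetical stand-in for the cost bookkeeping a scraper could do."""

    def __init__(self, max_cost: float, price_per_1k_tokens: float):
        self.max_cost = max_cost
        self.price_per_1k_tokens = price_per_1k_tokens  # assumed flat price
        self.total_cost = 0.0

    def record(self, tokens: int) -> float:
        """Add the cost of `tokens` to the running total, enforcing max_cost."""
        cost = tokens / 1000 * self.price_per_1k_tokens
        if self.total_cost + cost > self.max_cost:
            raise CostLimitExceeded(
                f"total would be {self.total_cost + cost:.4f}, "
                f"limit is {self.max_cost:.4f}"
            )
        self.total_cost += cost
        return cost


tracker = CostTracker(max_cost=0.01, price_per_1k_tokens=0.002)
first_cost = tracker.record(2_000)  # within budget, added to total_cost
```

In this sketch the limit is checked before the cost is added, so a request that would exceed the budget is rejected and total_cost stays unchanged; whether the library checks before or after a request is not stated in the changelog.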
0.2.0 - 2023-03-18
Add list mode, auto-splitting, and pagination support.
Improve xpath and css handling.
Improve prompt for GPT 3.5.
Make it possible to alter parameters when calling scrape.
Logging & error handling.
Command line interface.
See blog post for details: https://jamesturk.net/posts/scraping-with-gpt-part-2/
0.1.0 - 2023-03-17