Skip to content

Command Line Interface

scrapeghost offers a command line interface which is particularly useful for experimentation.

It is also possible to use as a step in a data pipeline.

Configuration

In order to use the CLI, the OPENAI_API_KEY environment variable must be set.

Usage

scrapeghost --help
 Usage: scrapeghost [OPTIONS] URL                                                                                                                                            
╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────╮
│ *    url      TEXT  [default: None] [required]                                                      │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────╮
│ --xpath                         TEXT     XPath selector to narrow the scrape [default: None]        │
│ --css                           TEXT     CSS selector to narrow the scrape [default: None]          │
│ --schema                        TEXT     Schema to use for scraping [default: None]                 │
│ --schema-file                   PATH     Path to schema.json file [default: None]                   │
│ --gpt4             --no-gpt4             Use GPT-4 instead of GPT-3.5-turbo [default: no-gpt4]      │
│ --verbose      -v               INTEGER  Verbosity level 0-2 [default: 0]                           │
│ --help                                   Show this message and exit.                                │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯