OutbackCDX does not get parameters of POST request #585

Closed
@kaij

Description

Describe the bug

When using OutbackCDX as an index server, the __wb_post_data parameter is not sent with the URL to the OutbackCDX server. On web pages that issue multiple XHR POSTs to the same URL, the lookup then returns the wrong data. Using a local CDXJ file index works as expected.
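
To illustrate why this matters, here is a toy sketch (not pywb or OutbackCDX code) of an index keyed by URL only versus one keyed by URL plus POST data. The POST bodies, the responses, and the helper names `build_index` and `lookup` are made up for illustration; only the endpoint and the `__wb_post_data` parameter name come from this report.

```python
# Two captured POSTs to the same URL with different bodies (illustrative data).
URL = "http://www.corona-data.ch/_dash-update-components"
captures = [
    (URL, "output=graph-a", "data for graph a"),
    (URL, "output=graph-b", "data for graph b"),
]


def make_key(url, body, include_post_data):
    """Build the lookup key, optionally folding the POST body into it."""
    if include_post_data:
        return f"{url}?__wb_post_data={body}"
    return url


def build_index(captures, include_post_data):
    """Hypothetical index: maps a key to the first response stored under it."""
    index = {}
    for url, body, response in captures:
        # setdefault keeps the first capture when two captures share a key
        index.setdefault(make_key(url, body, include_post_data), response)
    return index


def lookup(index, url, body, include_post_data):
    return index[make_key(url, body, include_post_data)]


url_only = build_index(captures, include_post_data=False)
with_post = build_index(captures, include_post_data=True)

# Keyed by URL only, the second request gets the first request's response.
print(lookup(url_only, URL, "output=graph-b", False))
# Keyed by URL plus POST data, each request gets its own response.
print(lookup(with_post, URL, "output=graph-b", True))
```

With the URL-only key, both requests resolve to the same record, which matches the symptom of diagrams receiving the wrong data.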

Steps to reproduce the bug

  1. Create a WARC of the public page at http://www.corona-data.ch/.
  2. Index it with OutbackCDX using a command similar to: cdx-indexer -p -s corona-data.warc.gz | curl -X POST --data-binary @- http://127.0.0.1:8078/collection
  3. Open the page in pywb replay. Most of the diagrams will stay white.

Expected behavior

The replayed POST requests should return the correct responses, so that the diagrams can be drawn.

Screenshots

Replayed page with invalid (white) diagrams. The reason is that the CDX information for the POST requests to _dash-update-components is not passed with the query.

Environment

  • pywb 2.4.2

Additional context

I tracked this down to the _get_api_url function in warcserver/indexsource.py: the URL it uses does not contain __wb_post_data. FileIndexSource, by contrast, uses the key parameter. I see the following options:

  • Passing the key using the urlkey parameter of outbackcdx (and updating documentation)
  • Adding __wb_post_data to the url parameter
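
As a rough sketch of the second option, the query URL could be built with the POST data folded into the url parameter. This is not the actual pywb code: `append_post_data` and `outbackcdx_query_url` are hypothetical helpers, the POST bodies are made up, and the exact encoding pywb applies to the POST data is not reproduced here.

```python
from typing import Optional
from urllib.parse import urlencode, urlsplit

POST_DATA_PARAM = "__wb_post_data"  # parameter name as used in this report


def append_post_data(url: str, post_body: str) -> str:
    """Hypothetical helper: append the POST body to the URL as a query parameter."""
    separator = "&" if urlsplit(url).query else "?"
    return url + separator + urlencode({POST_DATA_PARAM: post_body})


def outbackcdx_query_url(api_base: str, url: str,
                         post_body: Optional[str] = None) -> str:
    """Hypothetical helper: build the index query, including POST data if present."""
    lookup_url = append_post_data(url, post_body) if post_body else url
    # The lookup URL is passed percent-encoded in the 'url' query parameter.
    return api_base + "?" + urlencode({"url": lookup_url})


api = "http://127.0.0.1:8078/collection"
target = "http://www.corona-data.ch/_dash-update-components"

# Two POSTs to the same endpoint now produce two distinct index queries.
query_a = outbackcdx_query_url(api, target, '{"output": "graph-a"}')
query_b = outbackcdx_query_url(api, target, '{"output": "graph-b"}')
print(query_a)
print(query_b)
```

This only covers the query side. It helps only if the index entries were stored under keys that also retain the POST data.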

There might also be other options to consider. Also, __wb_post_data changed to __warc_post_data with cdxj-indexer, so maybe there is more development going on. I'd be interested in contributing a fix, but need some guidance as to the best way.

Update: A quote from the OutbackCDX page: "The canonicalized URL (first field) is ignored, OutbackCDX performs its own canonicalization." Indexing in OutbackCDX therefore seems to ignore the __wb_post_data parameter, so this might need further evaluation and coordination.
