How to extract items of a list into a []struct? #15

Open
TLINDEN opened this issue Mar 11, 2025 · 2 comments
Comments

TLINDEN commented Mar 11, 2025

Howdy,

I need your help again. I'm trying to scrape an Amazon wishlist. Each list contains several items, which I'm trying to put into a slice of Wishitem structs:

type Wishlist struct {
	Name  string     `goquery:"#profile-list-name,text"`
	Items []Wishitem `goquery:"#content-right,[html]"`
}

type Wishitem struct {
	Product string `goquery:"h2.a-size-base"`
	Price   string `goquery:".a-price .a-offscreen"`
	Link    string `goquery:"h2.a-size-base a,[href]"`
}

While this does extract something, it just creates 1 Wishitem containing ALL matches of the selectors, e.g.:

        items:
            - product: "Cello C1624F 16\" Full HD LED TV Integrierter DVD-Player Triple Tuner DVB-T/T2-C-S/S2 HDMI USB 230V „Pitch Perfect Sound“ für EIN einzigartiges Klangerlebnis\n                        \n                \n            \n        \n        \n    \n\n    \n        \n            \n            \n            \n            \n                \n                \n                    \n                            Cello C1620FS 16\" (41 cm Diagonale) Full HD LED TV mit eingebautem DVD Player DVBT2 S2 Triple Tuner, Schwarz\n"
              price: 199,99 €169,99
              link: ***

I'd expect it to create one Wishitem for every entry on the web page.

Do you have any idea what might be wrong here?

Thanks in advance,
Tom

TLINDEN commented Mar 11, 2025

PS: if I change the item struct like this:

type Wishitem struct {
	Product []string `goquery:"h2.a-size-base"`
	Price   []string `goquery:".a-price .a-offscreen"`
	Link    []string `goquery:"h2.a-size-base a,[href]"`
}

Then I get one slice per field, each holding the values of all items:

- product:
  - Cello C1624F 16" Full HD LED TV Integrierter DVD-Player Triple Tuner DVB-T/T2-C-S/S2 HDMI USB 230V „Pitch Perfect Sound“ für EIN einzigartiges Klangerlebnis
  - Cello C1620FS 16" (41 cm Diagonale) Full HD LED TV mit eingebautem DVD Player DVBT2 S2 Triple Tuner, Schwarz
  - Reflexion_TV LDDW19iSB+ DVD-PlayerSmart-TV 19 Zoll für Wohnmobile und Wohnwagen 12V KFZ-Adapter mit Soundbar HD Auflösung HDMI, WLAN, Bluetooth erschütterungsfest, schwarz, LDDW19i+
  price:
    - 199,99 €
    - 169,99 €
    - 319,95 €

While I could iterate over all of these slices and combine them into a slice of structs manually, I'd prefer to get a slice of structs directly.
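
For reference, the manual combination mentioned above could look roughly like the sketch below. It is only an illustration: `zipItems` is a hypothetical helper, not part of any library, and the sample values in `main` are made up. It pairs the slices by index and stops at the shortest one.

```go
package main

import "fmt"

// Wishitem mirrors the single-value struct from the first comment
// (struct tags omitted, since no scraping happens here).
type Wishitem struct {
	Product string
	Price   string
	Link    string
}

// zipItems is a hypothetical helper that combines three parallel slices
// into one slice of structs, pairing the values by index. It stops at the
// shortest slice so it never reads out of range.
func zipItems(products, prices, links []string) []Wishitem {
	n := len(products)
	if len(prices) < n {
		n = len(prices)
	}
	if len(links) < n {
		n = len(links)
	}

	items := make([]Wishitem, 0, n)
	for i := 0; i < n; i++ {
		items = append(items, Wishitem{
			Product: products[i],
			Price:   prices[i],
			Link:    links[i],
		})
	}
	return items
}

func main() {
	// Made-up sample data standing in for the scraped slices.
	items := zipItems(
		[]string{"TV A", "TV B"},
		[]string{"199,99 €", "169,99 €"},
		[]string{"/a", "/b"},
	)
	for _, it := range items {
		fmt.Printf("%+v\n", it)
	}
}
```

One caveat with this approach: if a single entry on the page has no price, the price slice is one element shorter and every later price is paired with the wrong product. That is another reason to prefer one struct per entry.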

TLINDEN commented Mar 11, 2025

Ok, I almost nailed it:

type Wishlist struct {
	Name  string     `goquery:"#profile-list-name,text"`
	Items []Wishitem `goquery:"ul#g-items li .a-list-item ,[html]"`
}

type Wishitem struct {
	Product string `goquery:"h2.a-size-base"`
	Price   string `goquery:".a-price .a-offscreen"`
	Link    string `goquery:"h2.a-size-base a,[href]"`
}

But the slice still contains some empty items, which I simply filter out for now.
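
The cleanup step could be sketched as follows. This assumes "empty" means that all three fields are blank after trimming whitespace; `isEmpty` and `dropEmpty` are hypothetical helper names, and the sample data is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// Wishitem mirrors the struct above (struct tags omitted for brevity).
type Wishitem struct {
	Product string
	Price   string
	Link    string
}

// isEmpty reports whether every field is blank once surrounding
// whitespace is removed. This definition of "empty" is an assumption.
func isEmpty(it Wishitem) bool {
	return strings.TrimSpace(it.Product) == "" &&
		strings.TrimSpace(it.Price) == "" &&
		strings.TrimSpace(it.Link) == ""
}

// dropEmpty is a hypothetical helper that returns a new slice without
// the empty items, keeping the original order.
func dropEmpty(items []Wishitem) []Wishitem {
	kept := make([]Wishitem, 0, len(items))
	for _, it := range items {
		if !isEmpty(it) {
			kept = append(kept, it)
		}
	}
	return kept
}

func main() {
	// Made-up sample data: two real items and two empty ones.
	items := []Wishitem{
		{Product: "TV A", Price: "199,99 €", Link: "/a"},
		{},
		{Product: "  \n "},
		{Product: "TV B", Price: "169,99 €", Link: "/b"},
	}
	fmt.Println(len(dropEmpty(items)))
}
```

An item with a product but no price is kept here on purpose, since a wishlist entry may legitimately have no price.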
