Help with diagnosing 403 forbidden error from wget command

Theo Godfrey · 2 Sep 2021 at 14:38

Hi there,

When I try the following code, I get a 403 forbidden error, and I can't work out why.

wget --random-wait --wait 1 --no-directories --user-agent="Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36" --no-parent --span-hosts --accept jpeg,jpg,bmp,gif,png --secure-protocol=auto --referer=https://pixabay.com/images/search/ --recursive --level=2 -e robots=off --load-cookies cookies.txt --input-file=pixabay_background_urls.txt

It returns:

--2021-09-01 18:12:06-- https://pixabay.com/photos/search/wallpaper/?cat=backgrounds&pagi=2
Connecting to pixabay.com (pixabay.com)|104.18.20.183|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-09-01 18:12:06 ERROR 403: Forbidden.

Notes:

-The input file contains the URL 'https://pixabay.com/photos/search/wallpaper/?cat=backgrounds&pagi=2', followed by the URLs for page 3, page 4, etc., one per line.

-I used the long form for the flags just so I could remember what they were.

-I used a cookie file called 'cookies.txt', generated from the website, and made sure it was up to date.

-I used the referer 'https://pixabay.com/images/search/', which I found by looking at the request headers in Chrome DevTools.

-I'm able to visit these URLs normally without any visible captcha requirements

-I noticed that one of the cookies, __cf_bm, has Secure = TRUE, so it needs to be sent over HTTPS. I'm not sure whether I'm doing that or not.
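
One way to check which cookies in the file carry the Secure flag is to load it with Python's standard http.cookiejar module. This is only a rough sketch: it assumes 'cookies.txt' is in the Netscape format (the format wget's --load-cookies reads) and starts with the usual "# Netscape HTTP Cookie File" header line, which MozillaCookieJar requires. The sample cookie lines below are made up for illustration and are not real values from the site.

```python
import http.cookiejar
import os
import tempfile


def secure_cookies(path):
    """Return (name, domain) for every cookie in a Netscape-format file marked Secure."""
    jar = http.cookiejar.MozillaCookieJar()
    # Keep session and expired cookies too, so nothing in the file is silently skipped.
    jar.load(path, ignore_discard=True, ignore_expires=True)
    return [(c.name, c.domain) for c in jar if c.secure]


# Hypothetical sample file. Fields are tab-separated:
# domain, include-subdomains flag, path, secure flag, expiry, name, value
SAMPLE = (
    "# Netscape HTTP Cookie File\n"
    ".pixabay.com\tTRUE\t/\tTRUE\t2000000000\t__cf_bm\tabc\n"
    "pixabay.com\tFALSE\t/\tFALSE\t2000000000\tlang\ten\n"
)


def demo():
    """Write the sample to a temporary file and list its Secure cookies."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write(SAMPLE)
    try:
        return secure_cookies(fh.name)
    finally:
        os.unlink(fh.name)


print(demo())
```

To run it against the real file, call secure_cookies("cookies.txt"). Cookies marked Secure are only sent when the request URL uses https://.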

It might not actually be possible; perhaps Cloudflare is the deciding factor. But I'd like to know whether this is something that could be circumvented, and whether it's feasible to download a large number of files from this website.

Any solutions, insights or other ways of downloading large numbers of image files would be much appreciated. I know Pixabay has an API, which I might use as a last resort, but I think it's very rate limited.
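
For reference, the API route might look roughly like the sketch below. It only builds the request URL and does not contact the site. The endpoint and the parameter names (key, q, image_type, category, page, per_page) are from my reading of the Pixabay API documentation and should be checked against the current docs; "YOUR_API_KEY" is a placeholder, and build_search_url is just a helper name chosen for this example.

```python
from urllib.parse import urlencode

# Assumed endpoint, per the Pixabay API docs; verify before relying on it.
API_ENDPOINT = "https://pixabay.com/api/"


def build_search_url(api_key, query, category="backgrounds", page=1, per_page=200):
    """Build a search URL for one page of photo results."""
    params = {
        "key": api_key,        # personal API key (placeholder in the call below)
        "q": query,            # search term
        "image_type": "photo",
        "category": category,
        "page": page,          # results page to request
        "per_page": per_page,  # results per page
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(build_search_url("YOUR_API_KEY", "wallpaper", page=2))
```

Each response should be JSON with a list of hits containing image URLs, which could then be written to a file and passed to wget with --input-file, without --recursive.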

Theo Godfrey · 3 Sep 2021 at 14:10

Thanks for the input - I'll try cURL and, if that doesn't work, the cloudflare-scrape module.
