Can anyone help me scrape a website please?

Sorry if it's the wrong place mods.

I'm trying to get a full list of exhibitors and their headline title (what they do) from this event. There are 2708 of them, which is too many to do manually! :D


(it's free to register, you just need to enter your email and they send you a code).



The red arrows are priority, blue are a nice to have.

Problem is I have the list of exhibitors as a PDF, but it doesn't tell me the headline figures of what they do. There are too many to go through manually, and I just want to get a quick and dirty csv file or something I can plonk into excel.

Having that would let me do some analysis of who's there and who I need to speak to when I fly out there, but short of going through the clunky and slow website there's just no easy way of getting the data I need. I've asked them to send me an Excel file, but they don't have one, which is frustrating.

Can anyone help? I'll gladly donate something for your time - but I'm not a coder, nor do I really have the patience to try and learn how to use a webscraper - unless there's an easy one that exists out there that I can use....
 
Fortunately they're not using any fancy security, so it was quite easy to just monitor how the website loads its data. In a nutshell: open the dev tools in your browser, go to the network tab, watch what requests the page makes, then use a tool like cURL to issue the same request and dump the response to disk.
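If you'd rather not use cURL, the same "replay the request and save it" idea can be sketched in a few lines of Python. This is only a rough sketch: the URL and the `Accept` header in the comments are placeholders I made up, not the site's real endpoint, so you'd swap in whatever the network tab actually shows. `fetch_json` and `dump_to_disk` are just names picked for this example.

```python
import json
import urllib.request


def fetch_json(url, headers=None, opener=urllib.request.urlopen):
    """Issue the same GET request the browser made and return the parsed JSON."""
    request = urllib.request.Request(url, headers=headers or {})
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def dump_to_disk(data, path):
    """Save the parsed data as pretty-printed JSON so it can be reopened later."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, ensure_ascii=False)


# Example usage (hypothetical URL, copy the real one from the network tab):
#   data = fetch_json("https://example.com/api/exhibitors",
#                     headers={"Accept": "application/json"})
#   dump_to_disk(data, "exhibitors.json")
```

If the site pages its results, you'd probably need to repeat the request once per page and join the results together, but that depends on how the real endpoint behaves.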

Here's a JSON dump of all the data: https://www.dropbox.com/s/wfbu5y638ure7rl/exhibitors.zip?dl=1 (I'll probably delete this, so save it and keep it yourself)

To open it in Excel:
  1. Open Excel, create a blank workbook
  2. Go to the data tab
  3. Click get data > from file > from json:
    9cXMhzi.png
  4. Select the json file from above
  5. Once it loads, click to table (top left):
    HowiZkw.png
  6. Click OK on the window that pops up
  7. You should now get this columns thing showing up:
    DwZBbSt.png
  8. Click it to select what columns you want to keep
  9. Click Close & Load at the top left:
    oYDyx9F.png
  10. Success!
    mlwfGuc.png

This will get you most of the way there, with the red arrows at least.

For the blue arrows, it looks like the data is within the stands and categories columns, but they're stored as lists of multiple values, so they don't automatically expand. Just after step 8, scroll across to those columns, then click the button at the top right of the column header (the stand column, say):

tS8Z8XX.png

Playing with these options e.g. expand to new rows might let you dump it out. Never really used this feature in Excel.
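To show roughly what "expand to new rows" means, here's a small Python sketch of the same idea: each value in a list column gets its own row, with the other fields copied across. The sample records and the `name` / `stands` field names are made up for illustration and may not match the real JSON.

```python
def expand_to_rows(records, column):
    """Return one row per value in record[column], copying the other fields."""
    rows = []
    for record in records:
        # Treat a missing or empty list as a single empty value so the
        # exhibitor still appears once in the output.
        values = record.get(column) or [None]
        if not isinstance(values, list):
            values = [values]
        for value in values:
            row = dict(record)
            row[column] = value
            rows.append(row)
    return rows


# Made-up sample data, not taken from the real dump.
sample = [
    {"name": "Acme", "stands": ["A1", "B2"]},
    {"name": "Bolt", "stands": []},
]
expanded = expand_to_rows(sample, "stands")
# Acme now appears twice (once per stand); Bolt appears once with no stand.
```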

If you really can't get it working properly in Excel I could write a tiny script to just dump it out if you ask nicely :p
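For reference, a script like that could look something like the sketch below. It assumes the JSON file is one big list of objects (one per exhibitor), which I haven't confirmed against the real file, and it squashes any list values into a single cell separated by "; " so nothing needs expanding in Excel. `flatten_value` and `json_to_csv` are names chosen for this example.

```python
import csv
import json


def flatten_value(value, separator="; "):
    """Turn lists and nested objects into plain text that fits in one cell."""
    if isinstance(value, list):
        return separator.join(
            item if isinstance(item, str) else json.dumps(item, ensure_ascii=False)
            for item in value
        )
    if isinstance(value, dict):
        return json.dumps(value, ensure_ascii=False)
    return value  # None is written by the csv module as an empty cell


def json_to_csv(json_path, csv_path):
    """Read a JSON list of objects and write it out as a CSV file."""
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)  # assumption: top level is a list of objects

    # Collect every column name, in the order first seen.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    # utf-8-sig adds a byte-order mark so Excel detects the encoding.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for record in records:
            writer.writerow({key: flatten_value(record.get(key)) for key in columns})
    return len(records)


# Example usage: json_to_csv("exhibitors.json", "exhibitors.csv")
```

The resulting CSV should open directly in Excel by double-clicking it.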


I'd recommend you use something like Notepad++ (free - beats the Windows one hands down) to open the raw JSON file and browse it.
 