
Associate
Joined
16 Aug 2010
Posts
1,365
Location
UK
I don't know about premade tools, but if I were doing it myself, I'd use Python, with Selenium to scrape and the pyexcel package to generate the Excel sheet (or insert into an existing one).
 
Soldato
Joined
6 Mar 2008
Posts
10,078
Location
Stoke area
Yep, Python + Selenium or BeautifulSoup to scrape. Just save the results to a plain text file as CSV, then open it in Excel and save it as .xlsx. No need to worry about programming the Excel side of it then.

A quick google shows this: http://webscraper.io/
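The CSV step could look something like the sketch below. It only uses Python's built-in csv module; the field names and the sample record are made up for illustration, since the thread doesn't say which columns are wanted.

```python
import csv

# Hypothetical column names; adjust to whatever the scrape actually returns.
FIELDS = ["first_name", "last_name", "uid"]


def write_rows(path, rows):
    """Write a list of dicts to a CSV file that Excel can open directly."""
    # utf-8-sig adds a BOM so Excel detects UTF-8 (umlauts, accents) correctly;
    # newline="" stops the csv module doubling line endings on Windows.
    with open(path, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_rows("lawyers.csv", records)` with a list of dicts gives a file that can be opened in Excel and saved as .xlsx from there.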
 
Man of Honour
OP
Joined
13 Jul 2004
Posts
44,080
Location
/* */
It looks like they have protection built into the website to stop scraping. When I test the webscraper.io tool I get an "I am not a robot" form to fill out.

Edit: All the lawyer vCards are in the format "https://www.sav-fsa.ch/documents/anwaltssuche_vcard<firstname>_<lastname>_SAV<UID>.vcf"

I wonder if there is a way to access the "anwaltssuche_vcard" directory and pull all the .vcf files from there.
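If the directory itself can't be listed, another option would be to build each URL from the name and UID. A rough sketch, assuming the pattern is exactly as quoted above (including the lack of a separator after "anwaltssuche_vcard", which I haven't confirmed) and that non-ASCII names need percent-encoding. The `vcard_url` helper is just a name made up for this example.

```python
from urllib.parse import quote

# Prefix copied as quoted in the post; whether a separator belongs after
# "anwaltssuche_vcard" is not confirmed, so it is left exactly as written.
VCARD_PREFIX = "https://www.sav-fsa.ch/documents/anwaltssuche_vcard"


def vcard_url(first_name, last_name, uid):
    """Build a vCard URL from the three pieces in the quoted pattern."""
    # Percent-encoding is an assumption about how the site expects names
    # such as "Müller" to appear in the path.
    name_part = f"{quote(first_name)}_{quote(last_name)}"
    return f"{VCARD_PREFIX}{name_part}_SAV{uid}.vcf"
```

Each URL can then be fetched one at a time, provided the names and UIDs are already known from the search results.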
 
Soldato
Joined
18 Oct 2002
Posts
4,152
Location
West Lancashire
As the others have said, there are a few ways to do this. Personally I'd go with Python + BeautifulSoup. Spend 5 minutes looking at the source and you could easily walk through each canton, sending POST requests to the search page to get the results. Each result in the table has the info you need for the vCard download: the row ID is the UID, and the first and last name are there as text.

If you want all the extra info you'll need to use Selenium to push the right buttons to trigger the JS.
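Pulling the UID and names out of the result rows might look roughly like this. To keep it dependency-free the sketch uses the standard library's html.parser rather than BeautifulSoup, and the markup it expects (a `<tr id="...">` per result, with first and last name in the first two `<td>` cells) is an assumption to check against the real page source. The POST step is left out because the form field names aren't given here.

```python
from html.parser import HTMLParser


class ResultRowParser(HTMLParser):
    """Collect (uid, first_name, last_name) from <tr id=...> rows."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished (uid, first, last) tuples
        self._uid = None       # id attribute of the row being read
        self._cells = []       # text of the cells seen so far in this row
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Rows without an id (e.g. the header row) leave _uid as None.
            self._uid = dict(attrs).get("id")
            self._cells = []
        elif tag == "td" and self._uid is not None:
            self._in_cell = True
            self._cells.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._uid is not None:
            if len(self._cells) >= 2:
                first, last = (c.strip() for c in self._cells[:2])
                self.rows.append((self._uid, first, last))
            self._uid = None


def parse_results(html):
    """Return a list of (uid, first_name, last_name) tuples from result HTML."""
    parser = ResultRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Feeding it the HTML returned for each canton gives the tuples needed to work out the vCard downloads.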

Make sure you send a properly formatted User-Agent header, so the request looks like it comes from a normal browser, to avoid the robot check.
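As a minimal sketch with the standard library's urllib: the User-Agent string below is only an example value and the `build_request` helper is a made-up name. There's no guarantee this alone gets past the check, but it avoids announcing the default Python client.

```python
import urllib.request

# Example browser-style string; any current, realistic value should do.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url):
    """Return a Request carrying browser-like headers instead of urllib's default."""
    headers = {
        "User-Agent": USER_AGENT,
        "Accept-Language": "de-CH,de;q=0.9,en;q=0.8",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen(...)` sends the request with those headers. The same dict can be handed to requests or Selenium if you're using those instead.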
 