
I don't know about premade tools, but if I were doing it myself, I'd use Python: Selenium to scrape, and the pyexcel package to generate the Excel sheet (or insert into an existing one).
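A minimal sketch of that approach, assuming a results table on the page (the URL, CSS selectors, and column layout below are placeholders, not the real site's markup):

```python
# Sketch: scrape a results table with Selenium, write it out with pyexcel.
# The URL and selectors are placeholders -- adjust to the actual page.
from selenium import webdriver
from selenium.webdriver.common.by import By
import pyexcel

driver = webdriver.Chrome()
driver.get("https://example.com/results")  # placeholder URL

rows = [["First name", "Last name", "Email"]]
for tr in driver.find_elements(By.CSS_SELECTOR, "table.results tr"):
    cells = [td.text for td in tr.find_elements(By.TAG_NAME, "td")]
    if cells:
        rows.append(cells)
driver.quit()

# Writing .xlsx requires the pyexcel-xlsx plugin to be installed.
pyexcel.save_as(array=rows, dest_file_name="lawyers.xlsx")
```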
 
Yep, Python + Selenium or BeautifulSoup to scrape; just save the output to a plain text file as CSV, then open it with Excel and save as .xlsx. No need to worry about programming the Excel side of it then.
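For the CSV route, something like this would do (again, the URL and table markup are assumptions, not the actual page):

```python
# Sketch of the CSV route: fetch a page, parse the table with BeautifulSoup,
# and write rows with the csv module. Excel opens the CSV directly, so
# nothing Excel-specific needs to be programmed.
# The URL and selectors are placeholders.
import csv
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/results").text  # placeholder URL
soup = BeautifulSoup(html, "html.parser")

with open("lawyers.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["First name", "Last name", "Email"])
    for tr in soup.select("table.results tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            writer.writerow(cells)
```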

A quick google shows this: http://webscraper.io/
 

It looks like they have protection built into the website to stop scraping. When I test this tool, I get an "I am not a robot" form to fill out.

Edit: All the lawyer vCards are in the format "https://www.sav-fsa.ch/documents/anwaltssuche_vcard<firstname>_<lastname>_SAV<UID>.vcf"

I wonder if there is a way to access the "anwaltssuche_vcard" directory and pull all the .vcf files from there.
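Directory listing is almost certainly disabled on a site like this, so you'd still need the names and UIDs from the search results first. Given those, downloading the .vcf files is straightforward; a sketch using the URL pattern above (the triples below are made-up examples):

```python
# Sketch: download vCards via the URL pattern from the post. The
# (firstname, lastname, uid) triples must come from the search results;
# the one below is made up.
import requests

lawyers = [("Max", "Muster", "12345")]  # placeholder data

for first, last, uid in lawyers:
    url = f"https://www.sav-fsa.ch/documents/anwaltssuche_vcard{first}_{last}_SAV{uid}.vcf"
    resp = requests.get(url)
    if resp.ok:
        with open(f"{first}_{last}.vcf", "wb") as f:
            f.write(resp.content)
```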
 
As the others have said, there are a few ways to do this; personally I'd go with Python + BeautifulSoup. Spend five minutes looking at the source and you could easily walk through each canton, sending POST requests to the search page to get the results. Each result in the table has the info you need for the vCard download: the row ID is the UID, and the first and last names are there as text.
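Roughly like this, though the endpoint, form field names, and markup here are guesses and would need checking against the actual page source:

```python
# Sketch of the per-canton search: POST the form, parse result rows.
# The endpoint, form field name, and selectors are assumptions -- verify
# them against the real page before relying on any of this.
import requests
from bs4 import BeautifulSoup

cantons = ["ZH", "BE", "GE"]  # etc.

for canton in cantons:
    resp = requests.post(
        "https://www.sav-fsa.ch/de/anwaltssuche.html",  # assumed endpoint
        data={"canton": canton},                        # assumed field name
    )
    soup = BeautifulSoup(resp.text, "html.parser")
    for row in soup.select("table tr[id]"):
        cells = row.find_all("td")
        if len(cells) >= 2:
            uid = row["id"]  # row ID is the UID
            first = cells[0].get_text(strip=True)
            last = cells[1].get_text(strip=True)
            print(uid, first, last)
```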

If you want all the extra info, you'll need Selenium to push the right buttons and trigger the JS.
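A sketch of that, with made-up selectors for the button and the panel it loads:

```python
# Sketch: click each result's details button so the JS loads the extra
# info, then wait for the panel to appear. All selectors are made up.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.sav-fsa.ch/de/anwaltssuche.html")  # assumed URL

for button in driver.find_elements(By.CSS_SELECTOR, "button.details"):  # assumed selector
    button.click()
    # wait for the JS-loaded panel before reading it
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "div.details-panel"))
    )
driver.quit()
```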

As for the "I am not a robot" check: make sure you're sending nicely formatted user-agent data, since the default headers from scraping libraries are often what trips robot detection.
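For example, with requests you'd set a browser-like header on the session (the UA string below is just an example):

```python
# Sketch: send a browser-like User-Agent instead of the library default,
# which some bot checks key on. The UA string is just an example.
import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36"
})
resp = session.get("https://www.sav-fsa.ch/")
```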
 