There's a directory website with publicly accessible information that we just need in an Excel sheet at our end.
An external site lists the entries under consecutive IDs, each of which links to the entry's individual page, and the IDs of those individual pages follow no obvious order. Each individual page doesn't hold much information, and each piece of it is labelled via the id of a div or li. For example, the address is listed in li elements whose ids end in "Town", "Country", "Postcode" etc.
I'm only really familiar with PHP but the script would effectively need to:
- Start at www.directory.com/0001 and work through to www.directory.com/3000
- For each, find the link that starts with "www.directory2.com" (only one link per page) and go there
- Dump the contents of id="Town" etc into a database
Is this possible? I know PHP may not be clean or neat, but for a one-off thing I'm not fussed about that. Hell, it doesn't even need to dump it to a database; it could just echo the values separated by commas so the output can be saved as a CSV.
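For what it's worth, the three steps above can be sketched roughly like this. It's in Python rather than PHP purely to show the logic using only the standard library; PHP's DOMDocument and DOMXPath could do the same job. The `http://` scheme, the zero-padded URL pattern, the UTF-8 decoding, the one-second pause and every function name here are assumptions for illustration, not details of the real sites.

```python
import csv
import io
import time
import urllib.request
from html.parser import HTMLParser

FIELDS = ["Town", "Country", "Postcode"]  # id endings named in the post
LINK_PREFIX = "www.directory2.com"        # link prefix named in the post


class EntryParser(HTMLParser):
    """Records the first directory2 link and the text of elements whose id ends in a field name."""

    def __init__(self):
        super().__init__()
        self.link = None
        self.values = {}
        self._current = None  # field currently being collected
        self._tag = None      # tag name that opened the current field
        self._depth = 0       # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        # Ignore an optional scheme so "http://www.directory2.com/..." also matches.
        if tag == "a" and self.link is None and href.split("://", 1)[-1].startswith(LINK_PREFIX):
            self.link = href
        if self._current is not None:
            if tag == self._tag:
                self._depth += 1
            return
        element_id = attrs.get("id") or ""
        for field in FIELDS:
            if element_id.endswith(field):
                self._current, self._tag, self._depth = field, tag, 1
                self.values[field] = ""
                break

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] += data


def parse(html):
    parser = EntryParser()
    parser.feed(html)
    parser.close()
    return parser


def find_link(html):
    """Return the first link starting with LINK_PREFIX, or None."""
    return parse(html).link


def extract_fields(html):
    """Return {field: text} for every name in FIELDS (empty string if missing)."""
    values = parse(html).values
    return {field: values.get(field, "").strip() for field in FIELDS}


def to_csv_row(values):
    """Format one entry as a CSV line, quoting values that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([values[field] for field in FIELDS])
    return buffer.getvalue().rstrip("\r\n")


def fetch(url):
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(first, last, out):
    """Walk the consecutive listing IDs and write one CSV line per entry to `out`."""
    out.write(",".join(FIELDS) + "\n")
    for number in range(first, last + 1):
        link = find_link(fetch(f"http://www.directory.com/{number:04d}"))
        if link is None:
            continue  # no matching link on this listing page
        if "://" not in link:
            link = "http://" + link
        out.write(to_csv_row(extract_fields(fetch(link))) + "\n")
        time.sleep(1)  # pause between entries to go easy on the server
```

Nothing runs on import. Calling `scrape(1, 3000, sys.stdout)` (after `import sys`) and redirecting the output to a file would give something Excel can open directly. Going through the `csv` module rather than joining with commas by hand means an address part that itself contains a comma gets quoted instead of breaking the columns.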