Is there any common structure to the data on the pages you want to scrape, and where do you intend to deposit the data once you have it (and in what format)?
E.g:
- "I want to scrape all headings", or "I want to scrape the paragraph below a particular heading". These are easy if the pages are consistent. If it's more like "I want to run some logic to work out which part of the page to scrape, then format it based on some rules", then it's more difficult.
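To give a feel for the "easy" case: here's a minimal sketch (in Python, using only the standard library's html.parser) of grabbing the paragraph below a particular heading. The sample HTML and heading text are made up for illustration.

```python
from html.parser import HTMLParser

class HeadingScraper(HTMLParser):
    """Collects the first <p> that follows a heading with the given text."""
    def __init__(self, heading_text):
        super().__init__()
        self.heading_text = heading_text
        self.in_heading = False      # currently inside an <h1>-<h3> tag
        self.after_heading = False   # the target heading has been seen
        self.in_paragraph = False    # currently inside the paragraph we want
        self.paragraph = ""

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.in_heading = True
        elif tag == "p" and self.after_heading and not self.paragraph:
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self.in_heading = False
        elif tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_heading and data.strip() == self.heading_text:
            self.after_heading = True
        elif self.in_paragraph:
            self.paragraph += data

# Made-up page fragment for illustration:
html = "<h2>Specs</h2><p>16GB RAM</p><h2>Price</h2><p>£499</p>"
scraper = HeadingScraper("Price")
scraper.feed(html)
print(scraper.paragraph)  # → £499
```

If the markup changes from page to page, this kind of fixed rule breaks down, which is where the "run some logic first" case comes in.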
You want to run it once per day - do you have your own server, or were you planning to leave a cron job (or similar) running locally?
For the amount it would cost (peanuts), I'd recommend running an AWS Lambda function daily. It can be written in Node.js, Java, C# or Python. If it were me, as I work mainly with Java, I'd do it in Java... but all of those languages can easily request web pages and parse them. For Java, I'd recommend JSoup; I don't know the equivalents for the others. From there it's easy to store the results in S3, or in a database.
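As a rough sketch of what that Lambda could look like (in Python rather than Java, since it's runnable without extra dependencies) - the URL and the price regex are placeholders you'd replace for the real site:

```python
import re
import urllib.request

def extract_prices(html):
    """Pull anything that looks like a GBP price out of the page.
    A real scraper would use a proper HTML parser instead of a regex."""
    return re.findall(r"£\d+(?:\.\d{2})?", html)

def lambda_handler(event, context):
    # An EventBridge cron rule can invoke this handler once per day.
    # "https://example.com/products" is a placeholder URL.
    with urllib.request.urlopen("https://example.com/products") as resp:
        html = resp.read().decode("utf-8", errors="replace")
    prices = extract_prices(html)
    # From here you'd store the result, e.g. in S3 via boto3 (bundled in
    # the Lambda runtime): boto3.client("s3").put_object(...)
    return {"prices": prices}
```

The daily trigger itself is just an EventBridge (CloudWatch Events) schedule rule pointed at the function - no server to keep running.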
Regarding the legality, I'd be very careful. I suspect it may not be legal (I'm no legal expert), and some quick googling turns up things like:
"Most courts are in agreement that web scraping is unlawful, but that does not mean that all web scrapers are identified and punished."
"In 2001, the legality question was brought up again when a travel agency sued another company for scraping information. The rival company made use of the pricing information taken, without consent and undercut the competition – and resulted in less customers and income for that agency. This brought to attention the importance of authorized and unauthorized access of information on websites and how to ensure that no unauthorized users could scrape information."
https://www.scrapesentry.com/scraping-wiki/web-scraping-legal-or-illegal/
So yes, I would imagine it is illegal to scrape OcUK's prices and then use them to undercut them - but I'm not an expert in the matter. I just know it seems to be a contested subject.