web scraping - general advice, tips etc..

Dj_Jestar · 9 Apr 2017 at 14:23

Making use of someone else's content without their permission is a breach of copyright, yes. Fact or not. (And a seriously pathetic point that was, too).

You've done your usual goldfish tactic of forgetting/ignoring what has already been said regarding search engines. Hey ho, anyone surprised? Anyone? No?

Now for some easily found facts:

https://www.out-law.com/en/articles...and-conditions-says-eu-court-in-ryanair-case/

Website operators can prohibit 'screen scraping' of unprotected data via terms and conditions, says EU court in Ryanair case

http://www.e-comlaw.com/e-commerce-...ate.asp?ID=1805&Search=Yes&txtsearch=scraping

UK: Screen scraping and web harvesting: the legal issues

An Irish court ruled last year that extracting data from a website can infringe the website owner's rights. The decision is a clear indication that the issue of web scraping is being taken seriously. Steven James, Associate at Latham & Watkins, discusses the lawfulness of web scraping and the legal issues surrounding it.

dowie · 9 Apr 2017 at 15:04

oh great another nonsense reply

you're of course missing the important part:

To gain access to the flight information, PR Aviation had to agree to Ryanair's terms and conditions which prohibited the use of an automated system or software to extract data from the website for commercial purposes, unless Ryanair consented to the activity.

The CJEU ruled that the flight data on Ryanair's website did not qualify for either database rights or copyright protection, upholding previous findings of a Dutch court.
[...]
Copyright law alone cannot offer protection to database creators where the database contains facts, as only the expression of facts and not the facts themselves can be copyrighted.

the issue here is you needed to agree to the terms and conditions on the Ryanair site before using it. This doesn't apply in my case. As is clear from your own link, the data itself isn't copyright protected, it is simply factual information, but don't let actual details get in the way of you cherry picking a case that doesn't really apply here. You've actually managed to completely misunderstand (or perhaps didn't bother reading) your own link, as it doesn't support your assertion re: copyrighted material at all - the case is related to a breach of contract.

so again, as I've already stated previously - I'm not doing anything illegal and this doesn't concern copyright protected content, just data/facts. If you've got nothing useful to add re: the actual thread topic/query other than ill-informed legal opinion then please don't bother replying

peterwalkley · 9 Apr 2017 at 15:58

The copyright issue would be a lot easier to argue if you were to say what sites you are going to scrape, what data you want and what you are going to use it for.

What you have said so far is too vague to make a fair and impartial judgement.

Dj_Jestar · 9 Apr 2017 at 16:27

Given you are seeking means to avoid being "detected" it's pretty damn clear they don't want you scraping. Thus, you don't have permission, thus you don't have any legal grounds. The only person talking nonsense is yourself, @dowie.

dowie · 9 Apr 2017 at 16:32

Dj_Jestar said:
Given you are seeking means to avoid being "detected" it's pretty damn clear they don't want you scraping. Thus, you don't have permission, thus you don't have any legal grounds. The only person talking nonsense is yourself, @dowie.

You seem to have a chip on your shoulder regarding this. Yes, what you've posted previously is nonsense, as already explained - your cited case concerned breach of contract, not copyright issues. Now that has been shot down you've come up with a handwaving argument: because I don't want to be detected, it is illegal. You've got nothing helpful to add here. Your previous argument was flawed, your current argument doesn't even rely on facts, and you seem to be pursuing it now out of frustration.

dowie · 9 Apr 2017 at 16:35

peterwalkley said:
The copyright issue would be a lot easier to argue if you were to say what sites you are going to scrape, what data you want and what you are going to use it for.

What you have said so far is too vague to make a fair and impartial judgement.

Well I'm not too interested in debating the copyright issue much other than pointing out that it is factual data and really not likely to be an issue. For some reason another poster has taken it upon himself to offer nothing useful but to try to present some flawed arguments.

I'll give an unrelated example - suppose I was an electronics retailer competing with OCUK and I wanted to grab the prices of their latest graphics cards to keep an eye on the competition - am I breaching copyright if, instead of browsing through the OCUK website and writing down the prices manually, I scrape the pages and extract the prices automatically?
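
For reference, the "scrape the pages and extract the prices automatically" step might look roughly like the sketch below. It uses only the Python standard library and assumes, purely for illustration, that each price sits in an element with a `price` class (e.g. `<span class="price">£349.99</span>`). The class name, the `PriceParser` name and the sample markup are all made up and are not taken from any real site.

```python
import re
from decimal import Decimal
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect numeric prices from elements carrying a given CSS class.

    Hypothetical helper: assumes prices look like "£1,099.00" and that the
    price element contains no unclosed void tags such as <br>.
    """

    def __init__(self, price_class="price"):
        super().__init__()
        self.price_class = price_class  # assumed class name, adjust per site
        self.prices = []
        self._depth = 0   # > 0 while we are inside a matching element
        self._buf = []    # text collected inside the current price element

    def handle_starttag(self, tag, attrs):
        if self._depth:
            self._depth += 1  # nested tag inside a price element
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.price_class in classes:
            self._depth = 1
            self._buf = []

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Pull the first number out of the collected text, dropping
            # the currency symbol and thousands separators.
            match = re.search(r"\d[\d,]*(?:\.\d+)?", "".join(self._buf))
            if match:
                self.prices.append(Decimal(match.group().replace(",", "")))


sample = (
    '<ul><li><span class="name">Card A</span> '
    '<span class="price">£349.99</span></li>'
    '<li><span class="price sale">£1,099.00</span></li></ul>'
)
parser = PriceParser()
parser.feed(sample)
print(parser.prices)
```

Real pages are rarely this tidy, so in practice the class name and the number pattern would need adjusting per site.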

Dj_Jestar · 9 Apr 2017 at 16:49

Flawed? Ha, k. How about your lack of argument at all?

And yes, you would be in breach of copyright if you scraped OcUK's prices without their permission.

dowie · 9 Apr 2017 at 16:56

Dj_Jestar said:
Flawed? Ha, k. How about your lack of argument at all?

And yes, you would be in breach of copyright if you scraped OcUK's prices without their permission.

How? Did you even read your own link - see my previous post, in particular the bold part. You're replying with nonsense as a result of your own stupidity.

Dj_Jestar · 9 Apr 2017 at 17:14

Have you had permission from OcUK to scrape their site? No? Breach of copyright. Iirc full copyright ("all rights reserved") is declared by OcUK (and is assumed anywhere it isn't expressly stated otherwise, as per earlier, which your sieve of a brain has "forgotten" already).

For somebody to browse a site is reasonable use. Automated scraping is not.

AHarvey · 9 Apr 2017 at 17:28

Can you have copyright over a price?

dowie · 9 Apr 2017 at 17:29

Dj_Jestar said:
Have you had permission from OcUK to scrape their site? No? Breach of copyright. Iirc full copyright ("all rights reserved") is declared by OcUK (and is assumed anywhere it isn't expressly stated otherwise, as per earlier, which your sieve of a brain has "forgotten" already).

For somebody to browse a site is reasonable use. Automated scraping is not.

Jeez, you're going round in circles. Read the previous post, as explained before. Factual information like that isn't protected by copyright; it is there in your own linked court case, I even highlighted it in bold for you. You've made several nonsense posts in my thread now, offering nothing helpful other than your own flawed opinions on what is or isn't legal.

dowie · 9 Apr 2017 at 17:30

AHarvey said:
Can you have copyright over a price?

Nope of course you can't, the other poster is clueless.

Dj_Jestar · 9 Apr 2017 at 18:05

You do have copyright over the content presenting the price

abuse of a service and/or content is the breach.

dowie · 9 Apr 2017 at 18:10

Dj_Jestar said:
You do have copyright over the content presenting the price

abuse of a service and/or content is the breach.

More hand waving... you're even wilfully ignoring the case you cited - just scroll up, it is highlighted in bold in my other post... Either present something factual or stop disrupting my thread with your nonsense.

Dj_Jestar · 9 Apr 2017 at 19:18

Factual.. like both links pointing out scraping is illegal without permission. Kk.

andshrew · 9 Apr 2017 at 19:22

Your question is like asking how long is a piece of string.

How much traffic does the site process, does the owner have any reason to want to prevent people viewing thousands of pages per day? If the answer is millions and no, then as long as you're not flooding them with requests you're more than likely to go unnoticed. If they do care, or you suddenly account for 80% of their traffic, they're more than likely going to do something to prevent your access. You've already answered how you may get around this.

Have you approached the company to ask if they can provide the data you want on a regular basis, rather than just resorting to brute forcing it off their web site?
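
On the "not flooding them with requests" point, a minimal sketch of a throttled fetch loop is below. It only assumes the Python standard library. The `Throttle` and `fetch_all` names, the user-agent string and the 5 second gap are made up for illustration; a sensible interval depends entirely on the site.

```python
import time
import urllib.request


class Throttle:
    """Enforce a minimum gap between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable so it can be tested
        self._sleep = sleep
        self._last = None                 # time of the previous request, if any

    def wait(self):
        """Sleep until min_interval has passed since the last call.

        Returns the number of seconds slept (0.0 if no pause was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


def fetch_all(urls, throttle, user_agent="example-scraper/0.1"):
    """Fetch each URL in turn, pausing between requests.

    The user agent value is a placeholder, not a recommendation.
    """
    pages = {}
    for url in urls:
        throttle.wait()
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request, timeout=30) as response:
            pages[url] = response.read()
    return pages


# Usage (not run here, example.com stands in for a real site):
# pages = fetch_all(["https://example.com/page1"], Throttle(5.0))
```

One request every few seconds is nowhere near 80% of a busy site's traffic, which is the scenario above where an owner is likely to step in.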

dowie · 9 Apr 2017 at 19:30

Dj_Jestar said:
Factual.. like both links pointing out scraping is illegal without permission. Kk.

see the previous post and try applying some basic reading comprehension. You've chucked in a few rude comments yourself about goldfish brain, brain like a sieve etc., yet ironically you've already had it pointed out to you where that Ryanair comparison falls down - that case was unrelated to copyright, as has already been pointed out to you

if you don't have any factual, sensible comments to add then please don't carry on posting in this thread as you're just creating unnecessary noise

dowie · 9 Apr 2017 at 19:34

andshrew said:
Your question is like asking how long is a piece of string.

How much traffic does the site process, does the owner have any reason to want to prevent people viewing thousands of pages per day? If the answer is millions and no, then as long as you're not flooding them with requests you're more than likely to go unnoticed. If they do care, or you suddenly account for 80% of their traffic, they're more than likely going to do something to prevent your access. You've already answered how you may get around this.

thanks for the reply. It's definitely a high-traffic website, and I don't think scraping this data from it will involve flooding them with requests or harming their service in any way

andshrew said:
Have you approached the company to ask if they can provide the data you want on a regular basis, rather than just resorting to brute forcing it off their web site?

Absolutely not. What I'd use it for has no impact on their business, but it has value if used in a certain way, so it isn't something I'd want known, which is why I'm not really able to give full details on here

Dj_Jestar · 9 Apr 2017 at 20:32

dowie said:
see the previous post and try applying some basic reading comprehension. You've chucked in a few rude comments yourself about goldfish brain, brain like a sieve etc., yet ironically you've already had it pointed out to you where that Ryanair comparison falls down - that case was unrelated to copyright, as has already been pointed out to you

if you don't have any factual, sensible comments to add then please don't carry on posting in this thread as you're just creating unnecessary noise

Reading comprehension is something you lack. Not I. So why do you want to mask your scraping, if you don't think you have any reason to hide it? Do carry on pretending you think there is nothing wrong, by all means, but this proves you know you have no permission to do it.

The funny thing about scraping that you are woefully trying to dodge (though round of applause for not tripping yourself up): you aren't scraping just the "data/facts" (still a pathetically weak point). By fact of scraping a site you are scraping the entirety of the site's content. Markup etc. that are all proprietary. Copyright infractions ahoy.

All that is still adjacent to the point that you need permission to use a site like this anyway, if you want to be free of copyright infringement.

Of course you could just ask them, maybe even broker a deal for the data so you don't need to scrape at all. But hey, why do that when you can do it illegally for free?

dowie · 9 Apr 2017 at 20:47

Dj_Jestar said:
Reading comprehension is something you lack. Not I. So why do you want to mask your scraping, if you don't think you have any reason to hide it? Do carry on pretending you think there is nothing wrong, by all means, but this proves you know you have no permission to do it.

The funny thing about scraping that you are woefully trying to dodge (though round of applause for not tripping yourself up): you aren't scraping just the "data/facts" (still a pathetically weak point). By fact of scraping a site you are scraping the entirety of the site's content. Markup etc. that are all proprietary. Copyright infractions ahoy.

All that is still adjacent to the point that you need permission to use a site like this anyway, if you want to be free of copyright infringement.

Of course you could just ask them, maybe even broker a deal for the data so you don't need to scrape at all. But hey, why do that when you can do it illegally for free?

from your own link:

The CJEU ruled that the flight data on Ryanair's website did not qualify for either database rights or copyright protection, upholding previous findings of a Dutch court.

Is that so hard to understand?

Now can you please stop posting off topic drivel in my thread - I'm asking about web scraping, I'm not asking about doing anything illegal nor breaking any copyright laws. If you've got something constructive to add re: web scraping then please do contribute - if you're going to carry on trying to flog a dead horse re: copyright infringement then please go and start your own thread - I'd also suggest that if you do so then pay a bit more attention to the link that you yourself posted as it doesn't back your position.