Spyke

They're already ignoring robots.txt, so I'm not sure why anyone would think they won't just ignore this too. All they have to do is get a new IP and change their useragent.

50
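
To illustrate the point above about robots.txt being easy to ignore: the file is purely advisory, and compliance is a check the crawler chooses to run against whatever user agent string it decides to send. Here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` name, and the URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return what robots.txt says for this self-reported user agent."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    # The bot is disallowed only while it announces itself honestly.
    print(is_allowed("ExampleAIBot", url))
    # The same client with a different user agent string passes the check.
    print(is_allowed("Mozilla/5.0", url))
```

Nothing on the server enforces either answer. A crawler that never calls a check like this, or that lies about its user agent, is unaffected by the file.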

I have an idea. Why don’t I put a bunch of my website stuff in one place, say a PDF, and you screw heads just buy that? We’ll call it a “book”.

25
lemmings.world

Put a page on your website saying that scraping your website costs [insert amount] and block the bots otherwise.

22
melroy
kbin.melroy.org

Also you don't want to block legit search engines that are not scraping your data for AI.

5
sh.itjust.works

Again: it's hard to differentiate all those different bots, because you have to trust that they are what they say they are, and they often are not.

7
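
On the problem above of trusting that a bot is what it says it is: for the big search engines, one documented approach is forward-confirmed reverse DNS. You resolve the client IP to a hostname, check that the hostname sits under a domain the operator publishes, then resolve the hostname back and confirm it returns the same IP. Below is a rough sketch, not a complete solution. The function names are hypothetical, the suffix list is only an example based on what Google documents for Googlebot, and the lookup used here is IPv4 only.

```python
import socket
from typing import Callable, List, Sequence

# Example suffixes (Google documents these for Googlebot). Other operators
# publish their own, so treat this tuple as an assumption to adjust.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address to hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """A lookup: hostname to its IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_crawler(
    ip: str,
    suffixes: Sequence[str] = TRUSTED_SUFFIXES,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a client IP."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(tuple(suffixes)):
            return False
        # The hostname must resolve back to the IP we started from.
        return ip in forward(host)
    except OSError:
        # Covers socket.herror and socket.gaierror: no usable DNS answer.
        return False
```

This only helps with crawlers whose operators publish verifiable hostnames. A scraper that poses as an ordinary browser from a residential IP never claims to be a search engine, so there is nothing to verify, which is the commenter's point.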

It certainly can be a cat and mouse game, but scraping at scale tends to be ahead of the curve of the security teams. Some examples:

https://brightdata.com/

https://oxylabs.io/

Requiring an account with strict access rules can curb the vast majority of scraping; then your only bad actors are the rich venture capitalists.

4

As someone who uses Invidious daily, I've always been of the belief that if you don't want something scraped, then maybe don't upload it to a public web page/server.

19

There are probably not many people here who understand the connection between Invidious and scraping.

5
sh.itjust.works

Imagine a company that sells a lot of products online. Now imagine a scraping bot coming at peak sales hours and looking at each product list and page separately for said service. Now realise that some genuine users will have a worse buying experience because of that.

0

Yeah, there are way easier ways to combat that without trying to prevent scraping.

Maybe don't ship 20 units to the same address.

1


Cloudflare plans marketplace to sell permission to scrape websites | Spyke