AineLasagna@lemmy.blahaj.zone · 1 year ago

/u/spez finds out

sealneaward@lemmy.ml · 1 year ago

Creating a web scraper and actually maintaining one that stays effective and keeps working are two different things. It’s very easy to fight web scraping if you know what you are doing.

argv_minus_one@beehaw.org · 1 year ago

Right, but these are big companies with lots of talented programmers on hand. If anyone can overcome such an obstacle, it’s them.

Also, Google and Microsoft already have a search index full of Reddit content to scrape.

sealneaward@lemmy.ml · 1 year ago

You are right. You would need a team of skilled scraper developers and network engineers, though, who know how to get around rate limiters, e.g. with some kind of external load balancer or something along those lines.

MrPoopyButthole@lemmy.world · 1 year ago

Rate limiters work on the source IP. This is easily bypassed with a rotating proxy, and there are even SaaS products that offer this. The trick is to not use large subnets that can be easily blocked. You have to use a lot of random /32 IPs to be effective.
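To illustrate the subnet point from the rate limiter's side, here is a minimal Python sketch. It assumes a simple fixed-window counter keyed on the source IPv4 address truncated to a prefix length; real rate limiters differ, and the class name, limits and addresses are hypothetical. Keyed per /32, every address gets its own budget. Keyed per /24, all addresses from the same subnet share one budget, which is why clustered addresses are easy to block.

```python
import ipaddress
import time


class FixedWindowLimiter:
    """Hypothetical limiter: at most `limit` requests per `window` seconds per key.

    The key is the source address masked to `prefix_len` bits (IPv4 assumed):
    32 means one bucket per address, 24 means one bucket per /24 subnet.
    """

    def __init__(self, limit, window, prefix_len=32):
        self.limit = limit
        self.window = window
        self.prefix_len = prefix_len
        self.counters = {}  # network -> (window_start, count)

    def key_for(self, ip):
        # strict=False masks off host bits, e.g. 203.0.113.7/24 -> 203.0.113.0/24
        return ipaddress.ip_network(f"{ip}/{self.prefix_len}", strict=False)

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        key = self.key_for(ip)
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            self.counters[key] = (start, count)
            return False
        self.counters[key] = (start, count + 1)
        return True


def simulate(prefix_len):
    limiter = FixedWindowLimiter(limit=5, window=60.0, prefix_len=prefix_len)
    # 20 requests at the same instant, each from a different address
    # in 203.0.113.0/24 (a documentation-only range)
    addrs = [f"203.0.113.{i}" for i in range(1, 21)]
    return sum(limiter.allow(a, now=0.0) for a in addrs)


print(simulate(32))  # 20: each /32 has its own budget
print(simulate(24))  # 5: the whole /24 shares one budget
```

The same idea extends to wider prefixes or to ASN-level keys, at the cost of penalising unrelated users who share that range.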