Distil Networks

Scraping Just Got a Lot More Dangerous


News organizations have been fighting to stay alive for years, and now they have a paved road to profitability. A federal court just severely restricted fair use by upholding the NY Times and The AP’s copyright claim against Meltwater, a web scraping service that finds mentions of its clients in the news. This ruling means that news organizations, and any other content producer, can charge third parties that have until now been scraping their content for free. This isn’t just about syndication of your content anymore; there are a thousand other ways that companies profit from the content a publisher creates, and publishers now have the right to share in those profits.

Why is this important? For years everyone on the internet has been under the assumption that when something is posted online, it’s free and fair to use. That meant that despite all the hard work and effort that went into writing an online article, nobody respected the value of that article, until now. The court ruled that a web scraper that profits from someone else’s content is not entitled to fair use and is in essence “stealing”.

Wait. Isn’t Google a web scraper? Well, yes. But the difference is that for a search of “The New York Times”, 56% of people who see an excerpt on Google click through to the original article, as opposed to 0.08% for Meltwater. I attribute that gap to Google’s established reputation for correctly giving credit to articles and web content, versus a lesser-known site. That click-through distinction is what separates search engines from theft. It is a slightly blurry line, but I believe it will become clearer as more organizations start enforcing their rights.

So moving forward, any online publisher can and should:

  1. Monitor their site for content scrapers, either by examining their log files manually or by using Distil Networks in monitor-only mode.
  2. Go after any infringing scrapers to protect their copyright.
  3. Set up a monetization policy and perhaps build an API to sell access to their content to scrapers that need to have continued access to this data.
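
For the manual log review in step 1, a rough sketch of what that could look like is below. It assumes the server writes access logs in the common Apache/Nginx "combined" format, and the function names and the threshold of 100 requests are illustrative choices, not anything prescribed by the ruling or by a particular product. It simply counts requests per IP address and user agent so that unusually busy clients stand out for a closer look.

```python
import re
from collections import Counter

# Assumption: logs use the Apache/Nginx "combined" format.
# Adjust the pattern if your server logs in a different layout.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_requests(lines):
    """Count requests per (IP, user agent) pair, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


def heavy_clients(counts, threshold=100):
    """Return (client, request_count) pairs at or above the threshold, busiest first."""
    return [(client, n) for client, n in counts.most_common() if n >= threshold]
```

To try it, pass an open log file to count_requests (for example, a file named access.log, which is a hypothetical path) and print the result of heavy_clients. A high request count alone does not prove scraping, since legitimate search engine crawlers will also show up here, so treat the output as a list of candidates to investigate rather than a verdict.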

Ruling
http://www.scribd.com/doc/131847330/Meltwater-AP-Ruling

Reference Article
http://venturebeat.com/2013/03/24/why-scraping-online-news-stories-could-land-you-in-hot-water/
