Behavioral Analytics
Once you join the Distil network, our platform starts analyzing your website’s traffic patterns, learning your site’s unique user behavior heuristics so it can detect deviations from the norm. We know which pages are most frequented, which geographic regions your traffic comes from, and what time of day your website is busiest. By aggregating that information, we can statistically identify bots and stop them in their tracks.
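As a rough illustration of what identifying behavior that deviates from a site's norm can look like, here is a minimal sketch that uses a simple z-score test. It is not Distil's actual model, which is not described here; the function name `is_anomalous`, the threshold of 3 standard deviations, and the sample figures are all assumptions made for the example.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observed, threshold=3.0):
    """Return True if `observed` lies more than `threshold` standard
    deviations from the mean of `baseline` (a simple z-score test)."""
    if len(baseline) < 2:
        # Not enough history to estimate a spread.
        return False
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Every baseline value is identical; any difference stands out.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical pages-per-visit figures learned from typical visitors.
baseline_pages = [4, 6, 5, 7, 5, 6, 4, 5]

typical_visit = is_anomalous(baseline_pages, 6)
scraper_visit = is_anomalous(baseline_pages, 250)
```

A real system would presumably combine many signals (pages visited, region, time of day) rather than a single number, but the idea of comparing each visitor against a learned baseline is the same.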
HTTP Stream Manipulation
The Distil edge nodes inject subtle traps and other elements into every page served. While these elements are completely transparent to human visitors and search engines, bots blindly interact with them, which allows for easy detection.
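One common form of such a trap is a hidden "honeypot" link that a human never sees or clicks but that a naive crawler follows. The sketch below illustrates that general idea only; it is not a description of what Distil's edge nodes actually inject. The trap URL, the markup, and the function names are hypothetical.

```python
import re

# Hypothetical trap URL; a real deployment would likely rotate it and
# exclude it from legitimate crawlers (for example via robots.txt).
TRAP_PATH = "/__trap__/a1b2c3"

# Invisible to human visitors: hidden from layout, keyboard focus,
# and assistive technology.
TRAP_HTML = (
    f'<a href="{TRAP_PATH}" style="display:none" rel="nofollow" '
    'aria-hidden="true" tabindex="-1"></a>'
)

_BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)

# Clients that have requested the trap URL at least once.
flagged_clients = set()


def inject_trap(html):
    """Insert the hidden trap link just before the last closing body tag."""
    matches = list(_BODY_CLOSE.finditer(html))
    if not matches:
        # No closing body tag found; append the trap at the end.
        return html + TRAP_HTML
    idx = matches[-1].start()
    return html[:idx] + TRAP_HTML + html[idx:]


def record_request(client_id, path):
    """Record a request and return True if this client has hit the trap."""
    if path == TRAP_PATH:
        flagged_clients.add(client_id)
    return client_id in flagged_clients
```

Once a client has requested the trap URL, every later request from that client can be treated as automated.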
JavaScript-Based Testing
Our system executes complex JavaScript-based tests to verify that the HTTP client connecting to your site is running an actual web browser. These tests are designed to load after all other JavaScript elements and won’t interfere with the loading of your native JavaScript code. All tests run asynchronously and transparently so that page performance is not affected.
Rate Limiting
Rate limiting is one of the best ways to prevent scrapers from stealing your content. Your users and clients access your site at page browsing rates that are unique to your site. Botters have no way of knowing what that statistical breakdown is, and can’t hope to mimic it. As a result, we’ve observed that almost all botters become very apparent when their request rates are compared with the browsing rates of a website’s average users.
Distil actively monitors all connections and sessions to your site and can instantly act when abusers exceed thresholds you’ve set. Our rate limiting is broken down into the following three metrics:
Pages Per Minute – How many pages per minute can a normal user traverse through your site? Almost all bots will blow through nearly any PPM setting in a matter of seconds or minutes.
Pages Per Session – If a script is slowly crawling your data, trying to spot it as abnormal behavior by looking at access logs is almost entirely futile. We keep track of how many pages per session users tend to traverse so you can make intelligent decisions on what constitutes excessive use.
Maximum Session Length – Track the average session length maintained by a majority of your users and then act on that data by defining the acceptable maximum session length before anti-scraping action is triggered.
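To make the three metrics concrete, the sketch below checks a single session against all of them on each page view. It is an illustrative sketch, not Distil's implementation. The class names, field names, and threshold values are assumptions, and a 60-second sliding window is just one possible way to count pages per minute.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Limits:
    pages_per_minute: int
    pages_per_session: int
    max_session_seconds: float


@dataclass
class Session:
    started_at: float
    page_count: int = 0
    # Timestamps of page views within the last 60 seconds.
    recent: deque = field(default_factory=deque)


def check_request(session, limits, now):
    """Record a page view at time `now` (in seconds) and return the
    names of any limits this session now exceeds."""
    session.page_count += 1
    session.recent.append(now)
    # Drop page views that fall outside the 60-second sliding window.
    while session.recent and now - session.recent[0] >= 60:
        session.recent.popleft()

    violations = []
    if len(session.recent) > limits.pages_per_minute:
        violations.append("pages_per_minute")
    if session.page_count > limits.pages_per_session:
        violations.append("pages_per_session")
    if now - session.started_at > limits.max_session_seconds:
        violations.append("max_session_seconds")
    return violations


# Hypothetical thresholds chosen only for this example.
limits = Limits(pages_per_minute=3, pages_per_session=5, max_session_seconds=600)

# A fast scraper: four pages within four seconds.
fast = Session(started_at=0.0)
fast_results = [check_request(fast, limits, t) for t in (0, 1, 2, 3)]

# A slow crawler: one page every 100 seconds.
slow = Session(started_at=0.0)
slow_results = [check_request(slow, limits, t) for t in (0, 100, 200, 300, 400, 500, 700)]
```

The fast scraper trips the pages-per-minute limit on its fourth request. The slow crawler never does, but it is eventually caught by the pages-per-session and session-length limits, which is why the three metrics work together.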