Crawling and scraping web data: what is prohibited “commercial use”?

For commercial entities engaged in web crawling to harvest data, there is a wide range of legal issues to consider when assessing the risk associated with such activities. In practice, in-house lawyers often seek the support of outside counsel on these issues. The risk profile may vary somewhat from jurisdiction to jurisdiction. In the UK alone, in-house lawyers contemplating these risks may find themselves faced with (at least) potential infringement of copyright and database rights, data privacy issues, possible offences under the Computer Misuse Act 1990 and compliance with website terms of use. It is the last of these (i.e. website terms of use) where a particularly vexing issue has gained traction recently, particularly – but not exclusively – among commercial entities that seek to harvest training data for artificial intelligence systems.

Website operators frequently include provisions in their terms of use limiting the uses that may be made of their website content. Sometimes these restrictions are specific, in that they refer to a particular act that is prohibited: for example, a prohibition on copying content to populate a database. Very often, however, they are formulated in general terms and anchored on the use that may or may not be made of the content. A very common example is a prohibition on using the content of a website for commercial purposes (i.e. a general ban on commercial use). For commercial entities that deploy web crawlers and scrapers to collect data, these terms can be problematic, even assuming that their data collection methods otherwise comply with all technical protocols and guidelines present on the web server regarding what can or cannot be crawled or scraped.
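The "technical protocols" point can be illustrated with a minimal sketch in Python, assuming that the protocol in question is a robots.txt file (the article does not name one). The function name `may_fetch`, the crawler name `example-bot` and the sample rules are hypothetical and used for illustration only. It also shows the limit of such checks: a path can be permitted by robots.txt and still be subject to a commercial-use prohibition in the terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url.

    This checks the technical crawling rules only. It says nothing about
    whether the site's terms of use permit the intended use of the content.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt file as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/articles/1",
        "https://example.com/private/data.html",
    ):
        print(url, may_fetch(EXAMPLE_ROBOTS_TXT, "example-bot", url))
```

In a real crawler the rules would be retrieved from the target site rather than hard-coded, and a positive result would be only one input into the wider risk assessment discussed here.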

An important but sometimes overlooked question is whether the terms of use of the website, including a prohibition on commercial use, are likely to be contractual. If the site asks the user to positively confirm acceptance of the terms of use, for example by ticking a box marked “I accept”, and provided that the other elements of contract formation (consideration, intention to create legal relations, etc.) are established, then the terms should be contractual. If there is no such requirement, it is likely that the operator of the website will seek to rely on a “browsewrap” agreement (where the terms of use generally state that use of the website constitutes acceptance of those terms). However, unless the user receives actual or implied notice of the browsewrap terms of use, it is debatable whether a binding contract exists with the user. The website owner would need to provide compelling evidence to demonstrate that this notice requirement has been met.

Assuming that the terms of use are contractual, the question arises: what is “commercial use”? In the UK at least, there is no accepted definition of what constitutes “commercial use”, either in legislation or in case law. It is of course possible that the website operator includes in its terms of use a tailor-made definition of “commercial use”. It may also be possible to infer that certain uses are not intended to be covered by a statement generally prohibiting commercial use. For example, the Creative Commons “Attribution-NonCommercial 4.0 International” licence terms define “NonCommercial” as “not primarily intended for or directed towards commercial advantage or monetary compensation” (see https://creativecommons.org/licenses/by-nc/4.0/legalcode). The converse inference is that “commercial use” in this context would be anything primarily intended for or directed towards commercial advantage or monetary compensation.

If there is no definition of commercial use, which is often the case, things get complicated. There are more questions than answers, which essentially means that there is greater legal uncertainty and the likely risk is much more difficult to profile. Use by an individual in a domestic setting, or use by a public body for non-profit purposes, might be less likely to be considered “commercial use” by a website operator. However, should activities that merely relate to a commercial entity but have no tangible financial benefit (i.e. revenue and profit) be considered “commercial use”? What if these activities produce an indirect financial gain or a commercial advantage that cannot be reduced to a monetary value? How broadly should “commercial use” be interpreted? In the majority of cases, these questions would only be definitively answered by a judge, once the case has been argued and the risk has already materialized.

This often leaves those crawling and collecting data taking a cautious approach: mitigating risk as far as possible, for example by avoiding scraping websites that require positive acceptance of “click-wrap” terms, and hoping that any residual legal risk will not lead to enforcement action by the website operator. In a world where data is increasingly recognized and realized as a valuable asset, enforcement action remains a significant possibility.

Rosemary S. Bishop