The Deep Web, the Dark Web, and Memex

February 12, 2015

The Deep Web and Dark Web garnered global attention again this week when DARPA announced Memex. “Memex seeks to develop the next generation of search technologies and revolutionize the discovery, organization and presentation of search results.” In sum, Memex is a search technology that searches the Deep Web and Dark Web and provides a variety of analyses among and between search results.

As discussed in the first part of this blog post, the Deep Web is “[t]he portion of the Web that is not theoretically indexable through the use of ‘spidering’ technology, because other Web pages do not link to it.” The Deep Web includes academic library databases, proprietary databases, results of database queries, and databases whose access is controlled by forms and/or passwords. Thomson Reuters’s Westlaw is an example of data that is available on the Deep Web: it is reachable via a common web browser, but most of its core content additionally requires a password and can be retrieved only through search queries.
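To make the “spidering” point concrete, here is a minimal sketch in Python of how a link-following crawler discovers pages. The toy link graph and page names are hypothetical, invented purely for illustration, and real search engine crawlers are far more elaborate. The sketch only shows that a page nobody links to is never reached, however public it may be.

```python
from collections import deque

def crawl(link_graph, seeds):
    """Return every page reachable by following links from the seed pages."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical toy web: each key is a page, each value lists the pages it links to.
toy_web = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["home", "archive"],
    "archive": [],
    "query-result": ["home"],  # produced by a search form; no page links to it
    "subscriber-db": [],       # behind a password; no page links to it
}

indexed = crawl(toy_web, ["home"])   # what a spider starting at "home" can find
deep = set(toy_web) - indexed        # pages that exist but are never discovered

print(sorted(indexed))
print(sorted(deep))
```

In this toy example the form-generated result page and the password-protected database are never discovered, which is the sense in which such content belongs to the Deep Web.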

The Dark Web is generally defined as a small subset of the Deep Web. Like the rest of the Deep Web, it can be accessed only through specialized tools or interfaces. Unlike the rest of the Deep Web, however, the Dark Web is not accessible via a common Surface Web browser such as Chrome or Internet Explorer; it must be accessed using a specialized anonymizing browser such as Tor. Using Tor (the only Dark Web browser used by the author in preparation of this post), a user can reach the “hidden services” at assigned .onion domains within the Dark Web without disclosing an IP address, thus concealing the user’s network identity and location.

It is well publicized that criminal elements use Tor. Less well known is that a “branch of the U.S. Navy uses Tor for open source intelligence gathering” and other uses, and “law enforcement uses Tor for visiting or surveilling web sites without leaving government IP addresses in their web logs, and for security during sting operations.” In addition to the criminal and military/law enforcement users, the Tor community includes business executives, journalists and their audiences, activists and whistleblowers, IT professionals, bloggers, and “normal people”. All of these groups undoubtedly do more while in Tor than work, or buy and sell illegal goods and services. Many “normal” Internet users are simply seeking additional anonymity to escape privacy-invasive commercial browsers and search engines. Accordingly, some Tor users are Surface Web users who may be drawn to conducting an increasing share of their Surface Web activities within the Tor environment.

There is a massive amount of data in all forms (e.g., text, videos, music, and photographs) on the Dark Web that, due to anonymity demands, is not generally available on the “Surface Web” or even on the upper level of the Deep Web. For example, the Dark Web includes the following:

bank account details, passwords, so-called “personally identifiable information” (PII) such as social security numbers, customer credit card details, patents, blueprints and other trade secrets, are all for sale. The Dark Web is used extensively by hackers, who break into company networks and search for this kind of potentially valuable information . . . extract the data, break it up into smaller bundles, and sell it to black market [Deep Web] vendors.

In addition, the Dark Web holds a significant amount of information that is relevant to business intelligence (including trade secrets), trademarks, copyrights, and patents. This content is discussed in more detail in the 2015 edition of my treatise, Intellectual Property Due Diligence in Corporate Transactions.

My next blog post will outline some of the more pertinent trademark issues arising out of trademark use in the Deep Web and Dark Web, particularly in light of Memex.