Googlearchy and the Dawn of Web Time

Summarized here are the main organizations and ideas that have shaped the WWW, from my perspective as a four-decade Internet user. Included is a simple, effective technique for better web searching, dubbed the “Controversy Explorer”.

Defining Moments: Googlearchy and the Dawn of Web Time

“Googlearchy” is a coined word expressing the powers of a search service to dominate public activities such as political thinking. The term generally invokes questioning about how people use, and are used by, search services under the business model called “Surveillance capitalism”.

The “Dawn of Web Time” is that point, circa 1993, when the general public became aware of, and drawn to, computer-based networks. These networks evolved from governmental responses produced soon after the age of Sputnik. Then came a tipping point where networked life changed business, politics, education, entertainment, and personal lives. It’s ironic that the modern, massive index of web pages complicates understanding of people, events, and concepts in the ancient history of the 1950s into the 1990s.

Before the Dawn of Web Time

Arpanet to NSF Net

First, let’s back up to understand how the Internet fell into our hands. By that year, roughly 1993, the Arpanet had morphed through the Internet into NSF Net and had become open to commercial traffic.

It began with quiet development at a few universities and military contractors, in the early days when blatant commercial use was prohibited. Research by Supercomputer Centers was an established goal, based around development of national computing capabilities and large-scale experiments and funded by the National Science Foundation and other government science agencies. Human-computer collaboration slowly evolved as a research field and as motivation for government-funded networks. The 1991 “High Performance Computing” bill promoted by Senator Al Gore opened the Internet to commercial domain holders under the auspices of the Network Division of the National Science Foundation, which had been supervising the bundled Arpanet and NSF Net, separated from the military MILNet. One more acronym in the mix was NREN, for National Research and Education Network, an umbrella for advancing networks in schools and libraries along the “Information Superhighway”.

Remembering the ISP industry

Those not sufficiently affiliated with a university or research center in the decade before the Dawn had other choices for email and file transfer through the ISP (Internet Service Provider) industry. In those days people shelled out on the order of $20 a month to send their email through the network of networks.

Another group of services sprouted up, such as CompuServe, Prodigy, and America Online, offering dial-up service with email and forums. Who of us with a PC in that period can forget the regular arrival of shiny CDs with their America Online software and signup?

Usenet was the early Social Media

Yet another network was Usenet, popular throughout the 1980s for its massive hierarchical organization of newsgroups. Since group postings were transferred in bulk through a distributed partnership of UNIX servers, access was provided by employers or individual ISP accounts.

Usenet user interfaces often resembled email managers, where subscribed groups and threaded postings were read online after download from a server. Usenet was funded by neither the US government nor industry; it was organized and managed by individuals and grew out of a North Carolina graduate student project.

So-called newsgroups covered an enormous range of topics under top levels such as “soc.*”, “comp.*”, and “alt.*”, the latter including binaries and possibly unsavory topics.

Lessons from pre-Dawn Social Media

Usenet and the AOL type services were the social media of the day. Groups announced products and conferences, discussed issues and people, and consumed hours of professional and personal time. Group rituals and communication habits formed in the spirit of personal accountability and democratic management. Profits were minimal compared to today.

“Privacy”, what is that? Usenet newsgroups were mostly open, sometimes moderated. Individuals managed release of their own information by password and personal reputation. Advertising was minimal to nonexistent although pressure to sign up for services was steady in the frequent arrival of those shiny CD bundles.

Usenet newsgroups were at one point archived by Deja News, then gobbled by Google, and are now hard to find. A great trove of computing history is buried in the rubble of those forums. People talked without friending, or profiling, or even posting pictures.

Flame wars had a locale in which to gather force and annoy readers. The famous “Godwin’s Law” observed that any discussion would eventually garner heated opinions and end with someone being called a Nazi or compared to Hitler, at which point the name-caller was held to have lost the argument. Observations of issue-based discussions showed that threads often wandered through opinions, then stopped when someone posted a factual response that took the fun out of the debate. Gender differences became apparent when a woman’s opinion would be countered as simply not how it was, expressed at a 15 to 1 male:female rate, until she was drowned out.

Thank You, Sir Berners-Lee

To summarize, the networked world left its grant-funded, high-minded academic roots and entered a brief period of out-of-pocket paying customer bases. On a parallel path, the invention of the WWW lingered until 1993, when a Supercomputer Center project designed the Mosaic browser, which led through Netscape to the first major Internet company product. That interface popularized a European physicist’s dream of inter-connected files around the globe.

Before the WWW was real hyper “text”

While the WWW implemented a version of hypertext, the concept itself had been around long enough to gain both academic and industrial innovators. An early, 1945, suggestion of personal information management and associated professions appeared in Vannevar Bush’s “As We May Think” article.

Notably, structured documents required models such as “mapping hypertext”: types of links, trails, maps, and icons. Architectural principles provided the bases for organizing websites as they came online. Links held semantic significance, which made “click here” annotation abhorrent.

The Web Is a Big Place. We need Search

Pioneering Search Engines

The WWW, then riding across the Internet, became the natural place for companies gaining domains to create websites. Soon a few million locations were vying for public attention. Some sites had actual informative content to read beyond company brochures. Thus arose search engines, notably Lycos, and directories such as Yahoo and Excite.

The first well-functioning search engine was AltaVista from the decaying Digital Equipment Corporation (R.I.P.), soon to be taken over by Compaq. Intermediate search services such as Gopher and WAIS grappled with organizing files with metadata to make them amenable to browsing and search. Search engines like Lycos and AltaVista took the opposite approach of sending “spiders” to crawl web links and retrieve pages for indexed search on their own servers.
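For readers who have never seen the inside of a “spider”, here is a minimal sketch of the crawl-then-index idea in Python. Everything in it is an assumption made for illustration: the four-page `TOY_WEB`, its `.example` addresses, and the function names are hypothetical, and nothing touches the real network. It is not how Lycos or AltaVista were actually built.

```python
from collections import deque

# Hypothetical toy web: address -> (page text, outgoing links). No network access.
TOY_WEB = {
    "a.example": ("battery power for laptops", ["b.example", "c.example"]),
    "b.example": ("battery park in new york", ["a.example"]),
    "c.example": ("laptops and power supplies", []),
    "d.example": ("an unlinked page nobody finds", []),
}

def crawl(seed, web):
    """Breadth-first 'spider': follow links outward from the seed, collecting page text."""
    seen, queue, pages = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages

def build_index(pages):
    """Inverted index kept on the engine's own server: word -> set of addresses."""
    index = {}
    for url, text in pages.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Return the addresses that contain every word of the query, in sorted order."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

pages = crawl("a.example", TOY_WEB)
index = build_index(pages)
print(search(index, "battery"))
print(search(index, "nobody"))
```

Note that the page at `d.example` never appears in any result: no crawled page links to it, so the spider never finds it. That small detail foreshadows the link-driven visibility problems discussed later.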

The mid-1990s saw a period of innovation in search. Other engines were AllTheWeb (Fast Search from Norway), Teoma, Infoseek, Northern Light, and Overture. Meta searchers, such as Dogpile, gained traction by combining search results from multiple engines. A Search Engine Shootout measured the quality and quantity of results among competing engines. Competition steadily improved search competence. Targeted advertising was nonexistent.

Professional librarians held positions of respect in judging the quality of the web as a research tool. Small company products experimented with managing search results as objects, e.g. merging searches and downloading pages automatically. Clustering algorithms came into use as search results required disambiguation (is ‘battery’ a place or a thing?).
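As a rough illustration of how clustering can disambiguate results, the sketch below groups result snippets by shared vocabulary using Jaccard similarity. The snippets, the stopword list, the threshold, and the greedy single-pass method are all assumptions chosen for brevity; the clustering engines of the period used their own, more elaborate algorithms.

```python
STOPWORDS = frozenset({"the", "a", "in", "of", "for", "and", "at", "your"})

def tokens(text):
    """Lowercased words of a snippet, minus a few stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}

def jaccard(a, b):
    """Overlap of two word sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(snippets, query, threshold=0.15):
    """Greedy single pass: join the first cluster whose pooled vocabulary is similar enough."""
    query_words = tokens(query)
    clusters = []  # list of (pooled vocabulary, member snippets)
    for snippet in snippets:
        words = tokens(snippet) - query_words  # the query word itself cannot disambiguate
        for vocab, members in clusters:
            if jaccard(words, vocab) >= threshold:
                vocab |= words          # in-place update of the cluster's vocabulary
                members.append(snippet)
                break
        else:
            clusters.append((set(words), [snippet]))
    return [members for _, members in clusters]

snippets = [
    "Battery Park waterfront in Manhattan",
    "lithium battery charger for laptops",
    "ferry tours leave Battery Park Manhattan",
    "replace your laptops battery charger",
]
for group in cluster(snippets, "battery"):
    print(group)
```

With these four made-up snippets, the two about the park land in one group and the two about chargers in the other. Real result lists are messier, which is why such categories were sometimes wrong.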

Enter the Great God Google

Whether from exhaustion of search engine choices or simple brute force, Google, with its PageRank and streamlined, portal-free interface, raked away the competitors in the late 1990s. And so went much of the competition, innovation, and research qualification activity. We have a winner, the Great God Google! All hail…

Raking in Markets

But what was the business model? Advertisements were naturally displayed near products. Clicks became the currencies of search. Now the story gets interesting as the money flowed. Could the click economy and the power of links change people and their searches? Would all those ‘href=“http-to-someplace”’ HTML elements structure the search economy?

Ruling the Web

One question was whether the search strategy of ranking by link popularity offered fair and stable results. No, answered “Googlearchy”, because highly ranked pages gathered more visitors, then more links, then more popularity. This was not unhealthy as long as less popular sites could still gain attention by appearing within a page or two of the top. And queries on esoteric topics might surface the few sites that mattered, albeit with few links.
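The rich-get-richer loop can be made concrete with a small simulation. This is a sketch under stated assumptions, not a model of any real search engine: 100 hypothetical pages each start with one inbound link, and every new link picks its target with probability proportional to the links that page already holds. A second run, where new links land at random, is included for comparison.

```python
import random

def simulate(pages=100, new_links=10_000, preferential=True, seed=1):
    """Add links one at a time and return the final inbound-link count per page."""
    rng = random.Random(seed)
    links = [1] * pages  # every page starts with one inbound link
    for _ in range(new_links):
        if preferential:
            # Chance of gaining a link is proportional to links already held.
            target = rng.choices(range(pages), weights=links)[0]
        else:
            # Baseline: every page is equally likely to gain the link.
            target = rng.randrange(pages)
        links[target] += 1
    return links

def top_share(links, fraction=0.1):
    """Share of all links held by the most-linked `fraction` of pages."""
    top = sorted(links, reverse=True)[: max(1, int(len(links) * fraction))]
    return sum(top) / sum(links)

popular = simulate(preferential=True)
uniform = simulate(preferential=False)
print(f"top 10% share, popularity-driven linking: {top_share(popular):.0%}")
print(f"top 10% share, random linking:            {top_share(uniform):.0%}")
```

In the popularity-driven run, the top tenth of pages ends up holding a much larger share of links than in the random run, even though all pages began as equals.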

When the topics were divisive, however, a whole point of view might be lost in a deluge of links that favored a popular position. Fast forward to the 2016 election and the proliferation of “fake news”. Targeted advertising itself fueled an industry dedicated to degrading search results.

Organizational and Analytical Web Layers

Another perspective was that the web was evolving into an Organizational layer overlaying an Analytic substrate. Businesses and associations ranked high because they linked to each other, with lots and lots of links. Papers or confrontational websites gained few links and slid down the pages of search results. This was like entering a library where the front rooms held tightly laced corporate brochures, then passing into news morgues, and eventually reaching randomly scattered opinion pieces, with a side room of paywalled technical papers.

Controversies were hard to find unless a searcher knew they existed, so the web became superficial as a research tool. Real search skills were required, such as using logic and specialized vocabulary in queries.

One approach was to simply ask for pages that used words like dispute or controversy or argument or evidence, since Organizational Web pages rarely held discussions or informational content. That is, searches needed to slice through the Organizational Web into the Analytic Web to access research results or deep skepticism.

Swallowed by the Filter Bubble

Another condition appeared, called the “filter bubble”. As a person’s searches could be tracked, then combined with other personal data, it became possible to consider that searches were not objective. One person’s search could be so tailored that they might gain a totally different world view from another person searching on the same term. And searches might be tailored for advertising potential, leading to differential pricing. And a person actually seeking to practice critical thinking might find it increasingly difficult to even find pages with views contrary to their expressed personal preferences. Facebook’s misnamed “news feed” blended personal tidbits with mainstream media that attracted pure rumor mongering beyond the commercial motives of its founders.

Wait a Minute! What is the Product? Oh, it is Me!

And now the business model is clear: “you are not paying for the product so you are the product”. Advertisers pay the billions a few cents at a time, each time a person is enticed to click. And personalization and limited perspectives deliver people to advertisers. And there is no easy way for a person to pay for solid media and privacy even if they want to.

Monopolies must maintain themselves. Search companies need to grow to deliver more value to their owners. So there must be more ways, like free email and picture space, to get additional personal information. The “attention merchants” must gobble up an hour a day from their client/user/slaves to hold on.

With more people coming online who never knew the world before the Dawn, easily used entry software gained a foothold. First, Geocities for web pages, MySpace for personal networking, WordPress for blogs, then Facebook’s ‘social graph’, and picture sharing, and later Twitter micro-blogging. And it’s ALL FREE! Why would anyone pay for using the Internet? The Web? Google? Facebook?

Assessing the Current Web

Has this business model contributed to degraded or untrustworthy access to web information? You judge!

Research experiments can reveal certain trends. Professional intuition can suggest training and skills for quality searching. Curators can carefully select sites and pages. But this is a difficult area for research with billions of pages requiring powerful learning algorithms.

Even great research may have difficulty influencing search services bound by shareholders. Currently, the question of web search quality is secondary, if not meaningless. A better engine entering the market faces competition with Yahoo, Microsoft, and Google’s billions of $$$ gained from advertising revenues and increasingly diversified investment. Small products that present different web organizational models can exist only as personal experiments. The competitive analyzers and librarian-style information questioners have closed up shop. As one example of a great loss, consider clustering that uses links and vocabulary to divide lists of results into cognitively rational, if occasionally wrong, categories. Surely a Google search on a person’s name, given other collected search information and tracked data, should be able to distinguish among a computer scientist, a rectal surgeon, a marine biologist, an urban planner, a tech writer, a software engineer, and deceased bearers of that same name. Not in my case! What a waste!

Ease of Use

Believe it or not, not everybody reading this page can see it in the traditional sense of eyesight. Others see wrongly or with difficulty and process information laboriously. Some cannot manipulate the devices to scroll the page content or even move focus to content. Just about everybody curses the clutter of ‘like buttons’ and navigational links outside the use case. Yet Web practices called ‘accessibility’ make pages interact effectively with ‘assistive technology’ that offers text to speech for page elements and content, among other capabilities.

WCAG standards from the W3C embody a ‘science of accessibility’ and codes of practice. If pages are designed according to the standards, the 20% of the population who experience the above difficulties are fully enabled. And since the other 80% are TABs, for Temporarily Able Bodied, every web user is affected.

Might there be a flaw in the Web architecture that values visual design over content presentation? One chapter of “A Chip On Her Shoulder” suggests that a combination of ageism, over-design, poor process, and excess dependence could leave us vulnerable if too many bad things happen at once. Could the Web collapse? Could even the Internet degrade? What would we do then?

Globalism means different rules, and rulers

Is the web now Balkanized? How often do we Americans read pages from other English speaking countries? How do we cross language boundaries? Yes, there is translation but are we all using the same underlying search bases? How has this biased our world views?

When did privacy concerns arise? Fairly early on, it turns out, in a prescient 1970 Rand Corporation advisory group report that laid out principles of data ownership and sharing. How quaint!!

Finally, there are emerging differences in laws and social norms that illustrate the biases of our business models, e.g. the EU ‘right to be forgotten’ movement.

A Mini-manifesto

Thinking in terms of Googlearchy and the Web Time suggests viewpoints independent from the search and social media oligarchies.

  1. Anybody who ignores events before 1993 is missing a big chunk of useful information. Ignorance of that break point, the Dawn of Web Time, signals literacy problems. Unfortunately, reference materials from before the Dawn are less available than, and outranked by, pages from after the Dawn.
  2. The web is a big place that requires serious research to understand its characteristics. We should be continuously questioning the information, of course, but also the services that provide that information.
  3. Those characteristics play major roles in how web users conduct their lives. We are more than mental cyborgs using the Web as assistive technology. When our social and emotional lives are attached to services, we become vulnerable to exploitation with high penalties for our mistakes.
  4. The business model of social media companies has tipped the scales of personal data ownership with consequences we are just beginning to understand. Our novel “A Chip On Her Shoulder” describes ‘packets of data formerly known as people’. The economic model is the exchange of our packets to be monetized, allowing advertisers to find us and pinpoint our preferences.
  5. Technology is ultra powerful and that power is compounding according to Moore’s Law. Forces in our society enlist us to avoid wasting that increasingly cheap computing power, which leads us into situations unprecedented in our thinking. Consider the saying that “Good judgement comes from experience, which itself comes from bad judgement.” We learn from our mistakes, but when technology accelerates we cannot learn fast enough to overcome their consequences.
  6. Privacy is one of those least understood properties of modern life. It’s really complicated. Privacy is an orphaned academic subject, a bit philosophical, a bane to business, a sideline for engineering ethics, a weapon of war. Only that one moment of embarrassment, or intrusion, or exploitation teaches us practical privacy. The “You are the product” business model is a tipping point for humanity. We were seduced by “free” and lost our ability to transact for services, as in the Usenet days.
  7. ‘Computational thinking’ is an under-valued tool for managing human interactions with technology. Our packets of data are subject to duplication, replication, and algorithmic processing. Computer scientists learn these techniques. Software engineers acquire the skills for performing the functions efficiently. Security engineers defend our packets according to privacy policies. Privacy engineering is a nascent field both socially and technically. The general public should be educated to understand algorithms and data as real forces in their lives.
  8. Just as we stepped into the Faustian bargain of “free” content and services, we are ignoring the next wave of not necessarily needed web devices, the Internet of Things. Attaching a device, such as a camera, to the Internet without a UL (Underwriters Laboratories) level of security assurance is just stupid. But that’s “stupid” at a society level, which requires regulations and respect for our Internet commons.

As you read this history, ask when it became evident that privacy was a significant factor in the progress of the Arpanet, Internet, WWW, and social media. And when did you first grapple with managing your own packets of data beyond employing the ubiquitous password mechanism?

Our novel, “A Chip On Her Shoulder”, takes diverse characters through reasoning about privacy, flavored by the above history.

Rethinking Search: the Experimental “Controversy Explorer”

Attached is a page you can download to your own computer to shake up your thinking about web searches. The idea is simple: add a vocabulary of terms to your normal search queries. Of course, you can type in these more complex queries, but the process is easily automated with a search primer. The magical effect is to drive searches through the glossy inter-linked Organizational Web into the distributed longer form Analytic Web.

This scheme can be applied to many topics. Our Controversy Explorer addresses issues raised in our 2004 experimental report asking “Do Search Engines Suppress Controversy?”. Adding synonyms for “controversy” and seeking “forms of support” generates a different space of searches. Try it by loading the HTML text. Modify as you wish. Search better, deeper, and darker. Maybe this is a partial answer to fake news, information overload, attention wasting, advertising gluttony, and stale searching.
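For those who prefer a script to a downloaded page, the same idea can be sketched in a few lines of Python. This is not the Controversy Explorer page itself, only an illustration of the technique. The vocabulary list is an assumption (the first four words appear earlier in this article, the rest are illustrative), the `OR` syntax is assumed to be supported by your search engine, and the `base` address is a placeholder rather than a real service.

```python
from urllib.parse import urlencode

# Assumed vocabulary: extend or replace to suit your topic.
CONTROVERSY_TERMS = ["controversy", "dispute", "argument", "evidence", "debate", "criticism"]

def controversy_queries(topic, terms=CONTROVERSY_TERMS):
    """Return one expanded query per vocabulary term."""
    return [f'"{topic}" {term}' for term in terms]

def combined_query(topic, terms=CONTROVERSY_TERMS):
    """Return a single query that ORs the whole vocabulary together."""
    alternatives = " OR ".join(terms)
    return f'"{topic}" ({alternatives})'

def search_url(query, base="https://search.example/search"):
    """Build a search link; `base` is a placeholder to replace with your engine's address."""
    params = urlencode({"q": query})
    return f"{base}?{params}"

for query in controversy_queries("water fluoridation"):
    print(search_url(query))
print(combined_query("water fluoridation"))
```

Running the separate queries one at a time tends to be more revealing than the combined form, since each term can pull up a different corner of the Analytic Web.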

…More to come on fake news, cyber-democracy, …


Getting the file:

Copy the following page from Dropbox onto your own computer, then change the extension from TXT to HTML:

The Controversy Explorer

