Google’s insanely great data kingdom

Photo of part of the physical infrastructure for Google's data system.
Photo: Google/Connie Zhou

Steven Levy wrote the book on Google (In The Plex: How Google Thinks, Works, and Shapes Our Lives); now Google’s let him into its formerly top secret data center in Lenoir, North Carolina. The massive data infrastructure is a wonder to behold. [Link]

This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds. This multibillion-dollar infrastructure allows the company to index 20 billion web pages a day. To handle more than 3 billion daily search queries. To conduct millions of ad auctions in real time. To offer free email storage to 425 million Gmail users. To zip millions of YouTube videos to users every day. To deliver search results before the user has finished typing the query. In the near future, when Google releases the wearable computing platform called Glass, this infrastructure will power its visual search results.

Trust, reputation, collaborative consumption and service networking

This TED talk by Rachel Botsman describes the online evolution of trust and reputation that’s feeding into new ways of doing business, “collaborative consumption” (AirBNB) and “service networking” (TaskRabbit). More people doing business directly with other people via virtual mediation. Via this trend, people are learning to be more trusting, and with more trust there’s more of this kind of biz.

David Weinberger: Too Big to Know

I’m leading a discussion on the WELL with David Weinberger, inspired by his latest book, Too Big to Know: Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room. David’s been writing about the transformation of knowledge in the Internet era, in the book and sometimes on his blog.

Link to the discussion.

One of the central hypotheses of the book is that knowledge is moving (= has moved) from living in skulls, books, and libraries to living on networks and the Net.

So, if you want to know about some topic beyond the occasional fact, you’re likely to spend time on some network on the Net. It might be a mailing list, or a Google hangout, or Reddit, or a set of web sites… In fact, The Well provides a convenient example, and also lets me do some basic pandering. (Love ya, The Well!) A network of people connected in discussion and argument know more than the sum of what the individual people know. In that sense, knowledge lives in the network.

For me, the most interesting aspect of this is another of the book’s hypotheses: Knowledge is taking on the properties of its new medium, just as it had taken on properties of the old. Among those properties: networked knowledge is unsettled, and includes differences and disagreements that traditional knowledge insisted on removing (or at least marginalizing).

How cool to have a gigabit Internet? Not so much.

Chattanooga, Tennessee Gigabit Internet Banner

Is it really cool to have gigabit network connectivity? Or is it more like it would’ve been to be one of the first thirty offices with a fax machine?

Google plans to launch its fiber to the home network in Kansas City on Thursday with the goal of seeing what people there can do with a gigabit connection. But as one city that already has a gigabit network can tell you, the answer so far may be, “Not much.”

For the last two years, Chattanooga, Tenn.’s public utility (EPB) has offered customers a gigabit fiber-to-the-home connection costing roughly $300 a month, so I touched base with a group of investors and entrepreneurs who have built a program to try to see what people can do with that fast a connection. So far, the limits of equipment, the lack of other gigabit networks (much of the Internet is reciprocal so it’s no fun if you have the speeds to send a holographic image of yourself but no one on the other end can receive it) and the small number of experiments on the network have left the founders of the Lamp Post Group underwhelmed.

More from Stacey Higginbotham at GigaOM.

Infinite spectrum vs scarcity hype

David Isenberg explains that spectrum for various forms of wireless transmission and communication is treated as scarce, similar to real estate, because a scarcity model works for “cellcos” (cellular communication companies, former telcos). In fact, spectrum is infinite. [Link]

The core of the story is whether or not spectrum is a rival good. A rival good is something that when it’s used by one party can’t be used by another. The cellcos say it is. Current FCC regulation does too. But David Reed has repeatedly pointed out that physics — our understanding of physical reality — says otherwise. The article paraphrases him: electromagnetic spectrum is not finite. Not finite. In other words, infinite.

Technology Use Among Youth (Pew Internet)

The Pew Internet and American Life project’s Amanda Lenhart presented at the Annenberg Public Policy Center on Internet use among the young. No huge surprises here – access is often mobile, and texting is a big part of the online experience for young people.

“In her talk, Amanda focused on bringing together data that highlights the demographic differences among groups of youth in their adoption, use and experiences with technology and social media. While such data may have illustrated what was called a ‘digital divide’ in the past, it now highlights a variety of digital differences among groups of youth.”

Bruce Sterling talk at ATX Hackerspace

I shot this video of Bruce at an EFF-Austin-sponsored event February 25 at ATX Hackerspace. We were rallying the troops. “You will not have the Internet that you had 20 years ago, that’s not possible. But you don’t have to roll over at the sight of bluster from the Internet’s increasingly desperate enemies…”

Howard Rheingold: Net Smart

In 2009, Howard Rheingold created an excellent mini-course in network literacy, a substantial resource for those who want to learn more about the Internet. Here’s the introductory video:

Howard’s written a book on network and digital literacy called Net Smart: How to Thrive Online.

Internet Code Ring! (Interview with Phil Zimmermann, circa 1993)

Discovered that this interview is no longer findable online, so I’m republishing it here. A version of this was published in bOING bOING (the ‘zine) in 1993 or 1994.

We were sitting in a circle on the floor at the Computers, Freedom,
and Privacy conference, March ’93 in San Francisco, St. Jude and I
with Tom Jennings, Fen La Balme, et al., discussing encryption and
other neophiliac rants when a dapper fellow wandered by with a
beard on his face and a tie hanging from his neck. He picked up
Jude’s copy of bOING-bOING number 10 and glanced through it,
clearly interested. I later learned that this was Phil Zimmermann,
creator of PGP (“Pretty Good Privacy”), so I tracked him down and
we talked for the record.

Jon: I’m fairly nontechnical, and I’m also new to encryption. I spent
some time recently on the cypherpunks’ list, and I have a pretty
good sense of what’s going on, but maybe you can tell me in your
own words how you came to write PGP, and what your philosophy
is, especially with distribution.

Phil: Well, okay. PGP, which means “Pretty Good Privacy,” is a
public key encryption program, it uses a public key encryption
algorithm, which means that you can encrypt messages and you can
send them to people that you’ve never met, that you’ve never had a
chance to exchange keys with over a secure channel. With regular
encryption, the kind that everybody has heard about, you encrypt a
message, it scrambles it up, renders it unintelligible, and then you
send it to someone else, and they can descramble it, decrypting it.
They have to use the same key to decrypt it as you used to encrypt
it. Well, this is a problem, this is inconvenient, because how are you
going to tell them what that key is, what’re you going to do, tell
them over the telephone? If someone can intercept the message, they
can intercept the key. So this has been the central problem in
cryptography for the past couple of millennia. There have been lots of
different ways of encrypting information, but they all have this
problem.

If you had a secure channel for exchanging keys, why do you
need any cryptography at all? So, in the late 1970s, somebody came
up with an idea for encrypting information with two keys. The two
keys are mathematically related. You use one of the keys to encrypt
the message, and use the other key to decrypt the message. As a
matter of fact, the keys have a kind of yin-yang relationship, so that
either one of them can decrypt what the other one can encrypt. So
everybody randomly generates a pair of these keys, the keys are
mathematically related, and they can be split apart like cracking a
coin in half, and the jagged edges stick together just right. They can
publish one of the keys, and keep the other one secret. Now, unlike
cracking the coin in half, you can’t look at the jagged edge, and
figure out what the other jagged edge is going to look like. In fact,
you can’t look at the published key and figure out what the secret
key is without spending centuries of supercomputer time to do it.
This means that any time anybody wants to send you a message,
they can encrypt that message with your public key, and then you
can decrypt the message with your secret key. If you want to send
them a message, then you can encrypt the message with their public
key, and then they can decrypt it with their secret key. Everybody
who wants to participate in this system can generate a pair of these
keys, publish one of them, and keep the other one secret.
Everybody’s published key can end up in a big public key directory,
like a phone book, or an electronic bulletin board, or something like
that. You can look up somebody’s public key, encrypt a message to
them, and send it to them. They’re the only ones that can read it,
because they’re the only ones that have the corresponding secret
key.
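
The two-key scheme Phil describes can be sketched in a few lines of Python. This is only a toy, assuming textbook RSA with tiny primes and one number per character. The helper names (make_keypair, apply_key) are made up for illustration, and none of this is PGP's actual code.

```python
# Toy illustration only: textbook RSA with tiny primes. Not secure, not PGP.

def make_keypair(p, q, e):
    """Build a (public, secret) key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # modular inverse of e; needs Python 3.8+
    return (e, n), (d, n)

def apply_key(key, numbers):
    """Transform each number with a key. One key undoes what the other did."""
    exponent, modulus = key
    return [pow(x, exponent, modulus) for x in numbers]

# The recipient generates the pair, publishes one half, keeps the other secret.
public_key, secret_key = make_keypair(61, 53, 17)

message = "meet me at CFP"
plaintext = list(message.encode("ascii"))

ciphertext = apply_key(public_key, plaintext)  # anyone with the public key can do this
recovered = apply_key(secret_key, ciphertext)  # only the secret key reverses it

decoded = bytes(recovered).decode("ascii")
print(decoded)
```

With primes this small the secret key can be worked out from the public one almost instantly. Real systems rely on far larger primes, and on padding that this sketch leaves out.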

J: Are there any such directories now?

P: Well, actually, there are starting to be directories like that. For
PGP, there are some public key directories on Internet. You can just
send an electronic inquiry saying “Give me the key for
[somebody],” and it’ll send you their key back, their public key.

J: The convention I’ve seen has been the inclusion of the public key
in an email message posted to a mailing list.

P: You can do that, you can include your own public key when you
send a message to someone, so that when they send you a reply,
they’ll know what public key to use to send the reply. But the
problem…there is an Achilles heel with public key cryptography, and
I’ll get to that in a minute. But first, let me explain authentication. If
I want to send you a message, and prove that it came from me, I can
do that by encrypting it with my own secret key, and then I can
send you the message, and you can decrypt it with my public key.
Remember I said that the keys are in this yin-yang relationship, so
that either one can decrypt what the other one encrypts. If I don’t
care about secrecy, if I only cared about authentication, if I only
wanted to prove to you that the message came from me, I could
encrypt the message with my own secret key and send it to you, and
you could decrypt it with my public key. Well, anyone else could
decrypt it too, because everyone has my public key. If I want to
combine the features of secrecy and authentication, I can do both
steps: I can encrypt the message first with my own secret key,
thereby creating a signature, and then encrypt it again with your
public key. I then send you the message. You reverse those steps:
first you decrypt it with your own secret key, and then you decrypt
that with my public key. That’s a message that only you can read
and only I could have sent. We have secrecy and authentication. So
you get authentication by using your own secret key to encrypt a
message, thereby signing the message. You can also convince third
parties like a judge that the message came from me. That means that
I could send you a financial instrument, a legal contract or some
kind of binding agreement. The judge will believe that the message
did come from me, because I am the only person with the secret key,
that could have created that message.
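
The sign-then-encrypt sequence Phil walks through can be sketched the same way. Again this is a toy under stated assumptions: textbook RSA, tiny primes, a message that is just a small number, and hypothetical helper names. The only constraint worth noting is that the signer's modulus is smaller than the recipient's, so the signature fits inside the second encryption.

```python
# Toy illustration only: textbook RSA with tiny primes. Not secure, not PGP.

def make_keypair(p, q, e):
    """Build a (public, secret) key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)  # pow(e, -1, phi) needs Python 3.8+

def apply_key(key, number):
    """Apply either half of a key pair to a number."""
    exponent, modulus = key
    return pow(number, exponent, modulus)

phil_public, phil_secret = make_keypair(61, 53, 17)  # modulus 3233
jon_public, jon_secret = make_keypair(89, 97, 5)     # modulus 8633

message = 1993  # stand-in for a message; must be below Phil's modulus

# Sender: sign with own secret key, then encrypt with the recipient's public key.
signature = apply_key(phil_secret, message)
sealed = apply_key(jon_public, signature)

# Recipient: reverse the steps. Own secret key first, then the sender's public key.
unsealed = apply_key(jon_secret, sealed)
verified = apply_key(phil_public, unsealed)

print(verified == message)
```

Only Jon's secret key opens the outer layer, and only Phil's secret key could have produced the inner one, which is the "only you can read and only I could have sent" property.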

Now, public key cryptography has an Achilles heel, and that
Achilles heel is that, suppose you want to send a message to someone,
and you look up their public key, on a bulletin board, for example.
You take their public key and you encrypt the message and then
send it to them, and presumably only they can read it. Well, what if
Ollie North broke into that BBS system? And he substituted his own
public key for the public key of your friend. And left your friend’s
name on it, so that it would look like it belonged to your friend. But
it really wasn’t your friend’s public key, it was Ollie’s public key that
he had created just for this purpose. You send a message, you get the
bulletin board to tell you your friend’s public key, but it isn’t your
friend’s public key, it’s Ollie’s public key. You encrypt a message
with that. You send it, possibly through the same bulletin board, to
your friend. Ollie intercepts it, and he can read it because he knows
the secret key that goes with it. If he were particularly clever,
which Ollie North isn’t because we all know that he forgot to get
those White House backup tapes deleted…but suppose he were
clever, he would then re-encrypt the decrypted message, using the
stolen key of your friend, and send it to your friend so that he
wouldn’t suspect that anything was amiss. This is the Achilles’ heel of
public key cryptography, and all public key encryption packages
that are worth anything invest a tremendous amount of effort in
solving this one problem. Probably half the lines of code in the
program are dedicated to solving this one problem. PGP solves this
problem by allowing third parties, mutually trusted friends, to sign
keys. That proves that they came from who they said they came
from. Suppose you wanted to send me a message, and you didn’t
know my public key, but you know George’s public key over here,
because George gave you his public key on a floppy disk. I publish
my public key on a bulletin board, but before I do, I have George
sign it, just like he signs any other message. I have him sign my
public key, and I put that on a bulletin board. If you download my
key, and it has George’s signature on it, that constitutes a promise
by George that that key really belongs to me. He says that my name
and my key got together. He signs the whole shootin’ match. If you
get that, you can check his signature, because you have his public
key to check. If you trust him not to lie, you can believe that really is
my public key, and if Ollie North breaks into the bulletin board, he
can’t make it look like his key is my key, because he doesn’t know
how to forge a signature from George. This is how public key
encryption solves the problem, and in particular, PGP solves it by
allowing you to designate anyone as a trusted introducer. In this
case, this third party is a trusted introducer, you trust him to
introduce my key to you.
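
The trusted-introducer idea can also be sketched in Python. This is a minimal illustration, assuming textbook RSA over small Mersenne primes and a SHA-256 hash of the name-and-key pair. The function names (certify, check, digest) and the way the name and key are joined are invented for this example and say nothing about PGP's real key format.

```python
# Toy illustration only: textbook RSA signatures over a hash. Not secure, not PGP.
import hashlib

def make_keypair(p, q, e=65537):
    """Build a (public, secret) key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    return (e, n), (pow(e, -1, phi), n)  # pow(e, -1, phi) needs Python 3.8+

def digest(name, key, modulus):
    """Hash a name-and-key binding down to a number below the signer's modulus."""
    data = f"{name}|{key[0]}|{key[1]}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def certify(signer_secret, name, key):
    """The introducer signs the statement 'this key belongs to this name'."""
    d, n = signer_secret
    return pow(digest(name, key, n), d, n)

def check(signer_public, name, key, signature):
    """Anyone holding the introducer's public key can check the signature."""
    e, n = signer_public
    return pow(signature, e, n) == digest(name, key, n)

george_public, george_secret = make_keypair(2**61 - 1, 2**89 - 1)
phil_public, _ = make_keypair(2**31 - 1, 2**61 - 1)
ollie_public, ollie_secret = make_keypair(2**17 - 1, 2**19 - 1)

# George vouches for Phil's key, and the signed key goes on the bulletin board.
certificate = certify(george_secret, "Phil", phil_public)
genuine_ok = check(george_public, "Phil", phil_public, certificate)

# Ollie swaps in his own key but cannot produce George's signature for it.
swapped_ok = check(george_public, "Phil", ollie_public, certificate)
forged_ok = check(george_public, "Phil", ollie_public,
                  certify(ollie_secret, "Phil", ollie_public))

print(genuine_ok, swapped_ok, forged_ok)
```

The check only helps if you already hold George's real public key and trust him not to lie, which is the floppy-disk step in Phil's example.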

There are public key encryption packages currently being
promoted by the U.S. Government based on a standard called
Privacy Enhanced Mail, or PEM. PEM’s architecture has a central
certification authority that signs everybody’s public key. If everyone
trusts the central authority to sign everyone’s key, and not to lie,
then everyone can trust that the key they have is a good key. The
key actually belongs to the name that’s attached to it. But a lot of
people, especially people who are libertarian-minded, would not feel
comfortable with an approach that requires them to trust a central
authority. PGP allows grassroots distributed trust, where you get to
choose who you trust. It more closely follows the social structures
that people are used to. You tend to believe your friends.

J: Did you make a conscious decision up front, before you started
programming PGP, that you were going to create something that
would be distributed in this grassroots way, free through the
Internet?

P: Well, there were some software parts of PGP that I developed
some years ago, as far back as 1986, that I developed with the
intention of developing commercial products with it someday. Over
the years that followed, I developed a few more pieces that I hoped
someday to turn into a commercial product. But, when it finally
came down to it, I realized that it would be more politically effective
to distribute PGP this way. Besides that, there is a patent on the
RSA public key encryption algorithm that PGP is based on. I wrote
all of the software from scratch. I didn’t steal any software from the
RSA patent holders. But patent law is different from copyright law.
While I didn’t steal any software from them, I did use the algorithm,
the mathematical formulas that were published in academic journals,
describing how to do public key cryptography. I turned those
mathematical formulas into lines of computer code, and developed it
independently.

J: Did you originally intend to license that?

P: When I first wrote the parts of it back in 1986, I did. But I began
in earnest on PGP in December of 1990. At that time, I had decided
that I was going to go ahead and publish it for free. I thought that it
was politically a useful thing to do, considering the war on drugs
and the government’s attitude toward privacy. Shortly after I started
on the development, I learned of Senate Bill 266, which was the
Omnibus Anticrime Bill. It had a provision tucked away in it, a sense
of Congress provision, that would, if it had become real hard law,
have required manufacturers of secure communications gear, and
presumably cryptographic software, to put back doors in their
products to allow the government to obtain the plain text contents
of the traffic. I felt that it would be a good idea to try to get PGP out
before this became law. As it turned out, it never did pass. It was
defeated after a lot of protest from civil liberties groups and industry
groups.

J: But if they could get away with passing it, they would still take the
initiative and try.

P: Well, yeah, actually…it started out as a sense of Congress bill,
which means that it wasn’t binding law. But those things are usually
set to deploy the political groundwork to make it possible later to
make it into hard law. Within a week or so after publishing PGP,
Senate Bill 266 went down in defeat, at least that provision was
taken out, and that was entirely due to the efforts of others, I had
nothing to do with that. PGP didn’t have any impact, it turned out,
at all. So that’s why I published PGP.

J: Several of my friends are involved in cypherpunks, and I’ve been
on their mailing list…are you affiliated in any way with
cypherpunks? Are you getting their mailing list?

P: I was on their mailing list for a couple of days, but I found that
the density of traffic was high enough that I couldn’t get any work
done, so I had them take me off the list.

J: The reason I bring cypherpunks up is that they seem to have
almost a religious fervor about encryption. I was
wondering if you share that.

P: I don’t think of my own interest in cryptography as a religious
fervor. I did miss some mortgage payments while I was working on
PGP. In fact, I missed five mortgage payments during the
development of PGP, so I came pretty close to losing my house. So I
must have enough fervor to stay with the project long enough to
miss five mortgage payments. But I don’t think it’s a
religious fervor.

J: I’m impressed with the way encryption in general and PGP in
particular have caught on with the press, how it’s become within the
last year.

P: Well, PGP 1.0 was released in June of ’91. It only ran on MS
DOS, and it didn’t have a lot of the features necessary to do really
good key certification, which is that Achilles’ heel that I told you
about. Theoretically, you could use it in a manual mode to do that,
but it wasn’t automatic like it is in PGP 2.0 and above. The current
release of PGP is 2.2. It’s a lot smoother and more polished than 2.0
was. 2.0 was tremendously different than 1.0, and the reason the
popularity has taken off so much since September, when it was
released, is because it ran on a lot of UNIX platforms, beginning
with 2.0. Since the main vehicle for Internet nodes is UNIX
platforms, that made it more popular in the UNIX/Internet world.
Since Internet seems to be the fertile soil of discourse on
cryptography, the fact that PGP 2.0 began running on UNIX
platforms has a lot to do with its popularity since that version was
released…That was in September of ’92.

J: The easiest way to get PGP is through FTP from various sites?

P: Yeah. Most of them European sites. PGP 2.0 and above was
released in Europe. The people that were working on it were out of
reach of U.S. patent law…and not only are they out of reach of patent
law, but it also defuses the export control issues, because we’re
importing it into the U.S., instead of exporting it. Also PGP 1.0 was
exported, presumably by somebody, any one of thousands of people
could have done it…but it was published in the public domain. It’s
hard to see how something like that could be published, and
thousands of people could have it, and it could not leak overseas. It’s
like saying that the New York Times shouldn’t be exported, how can
you prevent that when a million people have a copy? It’s blowing in
the wind, you can’t embargo the wind.

J: And by beginning in Europe, you sort of fanned the flame that
much better.

P: Yeah.

J: It seems to have spread globally, and I’m sure that you’re hearing a
lot about it, getting a lot of response.

P: Particularly at this conference (CFP93), yes.

J: Do you plan to do more development of PGP, or are you satisfied
with where it is….

P: PGP will be developed further. My personal involvement is more
in providing design direction and making sure that the architecture
stays sound. The actual coding is taking place overseas, or at least
most of it is. We do get patches sent in by people in the U.S. who
find bugs, and who say, “I found this bug, here’s a patch to fix it.”
But the bulk of the work is taking place outside the U.S. borders.

J: Is there a Mac version as well as a DOS version now?

P: Yeah, there is a Mac version…there was a Mac version released
shortly after PGP 2.0 came out. Somebody did that independently,
and I only found out about it after it was released. People have
written me about it, and it did seem to have some problems. The
same guy who did that version is doing a much improved version,
Mac PGP version 2.2, which I believe should be out in a few
days…that was the last I heard before I came to the conference. The
second Mac development group, that’s working on a very “Mac”-ish
GUI, is being managed by a guy named Blair Weiss. That takes
longer, it’s difficult to write a good Mac application, so it’s probably
going to be a couple of months before that hits the streets.

J: Were you involved in the UNIX version, too?

P: I did the first MS-DOS version entirely by myself, but it’s not
that big a distance between MS-DOS and UNIX, so most of it was
the same. The UNIX port took place soon after PGP 1.0 was
released. After that, many other enhancements were added, and
major architectural changes took place to the code, and that’s what
finally made its way out as version 2.0.

J: You’re doing consulting now?

P: That’s how I make my living, by consulting. I don’t make
anything from PGP.

J: Do you think you’ll just let PGP take a life of its own, let other
people work on it from here out?

P: Other people are contributing their code, and other people are
adding enhancements, with my design direction. Perhaps someday
I’ll find a way to make money from PGP, but if I do, it will be done
in such a way that there will always be a free version of PGP
available.

J: I was thinking of the UNIX thing, where everybody’s modified
their versions of the UNIX Operating System so that some
[customized versions] weren’t even interoperable. I was wondering
if there was a chance that PGP would mutate, whether you’re going
to keep some sort of control over it, or whether people will start
doing their own versions of it….

P: Well, I don’t know, that could happen. There are so many people
interested in the product now, it’s hard to keep track of everybody’s
changes. When they send in suggested changes, we have to look at it
carefully to see that the changes are good changes.

J: But you don’t have some sort of structure in place where you do
some kind of approval if somebody wants to make some kind of
mutant version of PGP….

P: There is a kind of de facto influence that I have over the product,
because it’s still my product, in a kind of psychological sense. In the
user population, they associate my name with the product in such a
way that, if I say that this product is good, that I have looked at this
and that I believe the changes made since the last version are good
changes, that people will believe that. So I can determine the
direction, not by some iron law, not by having people work for me
that I can hire and fire, but more by my opinion guiding the product.
It would not be easy for a person to make a different version of PGP
that went in a different direction than how I wanted it to go, because
everybody still uses the version that I approved, so to be
compatible…this has a kind of inertia to it, a de facto standard. PGP
currently, I believe, is the world’s most popular public key
encryption program, so that has potential to become a de facto
standard. I don’t know what that means in comparison to the PEM
standard. PEM is for a different environment than PGP, perhaps,
although the PGP method of certifying keys can be collapsed into a
special case that mimics in many respects the PEM model for
certifying keys.

How should the Internet be governed?

This piece hints at the politicization of the Internet and the complexity of its future. The Internet Corporation for Assigned Names and Numbers (ICANN) is the closest thing we have to “Internet governance.” It’s the organization that coordinates the standards and processes associated with Internet addresses – the assigned names and numbers referenced in the organization’s name. In “ICANN’s ‘Unelected’ Crisis” Michael Roberts writes about the controversy over ICANN’s unelected leadership and multistakeholder model. “If ICANN is to maintain its quasi-independence, a hard boiled, Kissinger-like brand of pragmatic statesmanship will be necessary.” [Link]

More on bandwidth: light and darkness

My friend Robert Steele emailed me in response to my last post, saying there’s more to consider, and I agree. He mentions Open Spectrum.

I’m feeling cynical. Here’s how I responded:

I’m aware of open spectrum… I’m in other conversations with various wonks & engineers who’re discussing bandwidth, spectrum, etc. Of course we could have a much different scene if we weren’t constrained by markets and politics. People who can see one sense of the obvious often miss another, which is that the world we’re in is not an ideal world, and the ideals we can conceive are not necessarily easy or even possible to implement. I pay less attention to the “next net” list we’re both on because so much of it is fantasy and masturbation.

I own a nice home in rural Texas but I can’t live there because I can’t even get 500kbps. I thought it was amusing that Vint is arguing for gigabit bandwidth when most of the U.S. is dark and there’s too little monetary incentive to bring light to the darkness. Of course I think we need a public initiative to make it happen, but in this era “public” is a dirty word. I halfway expect to see all roads become toll roads; a world where only the elite can travel, and only the elite will have broadband access. Though aging, I’m struggling to remain part of the elite… *8^)

Increase bandwidth exponentially

Internet prime mover Vint Cerf echoes what I’ve been hearing from other architects of the TCP/IP network: we should focus on building much fatter pipes, and get away from the enforced/legacy scarcity and build gigabit broadband networks. Nothing here about the cost of providing gigabit access, nothing here about the fact that much of the (rural) U.S. has no access to broadband at any speed. What policies do we need to have pervasive gigabit broadband, urban and rural, in the U.S.? Who will pay for the buildout? [Link]

Emerging thoughts

I’ve been in conversation with a diverse group of people who are interested in creating a next version of the Internet that’s more peer to peer, more open source/open architecture, less vulnerable to government or corporate restriction. Some aspects of the various threads of conversation are idealistic – not wholly unrealistic, but so far a bit fuzzy and not fully baked. However there’s substantive, useful, and promising discussion in the air, and I’m hopeful that something viable and helpful will emerge.

Coincidentally, the concept of emergence came up, via this article by Margaret Wheatley, who calls emergence “the fundamental scientific explanation for how local changes can materialize as global systems of influence” as networks evolve into communities of practice, and then systems of influence begin to emerge. This she calls the life cycle of emergence.

This resonates with the Emergent Democracy discussion and paper that Joi Ito, Ross Mayfield, and I (along with several others) worked on in the early 2000s. But what’s missing in this talk about emergence and changing the world is the role of intention. Who sets the goals for changing the world? Who catalyzes networks and drives them in a particular direction? No person or group decides to make something emerge or to make specific changes – emergence is about force and evolution, not human intention. And when you talk about changing the world, by whom and for whom, and with what force, become relevant questions.

The Tea Party and the Koch Brothers want to change the world, too. Is their vision less valid than mine or yours?

But there are forces that transcend Internet theorists and instigators, Tea Parties, partisan movements, idealistic next-net theorizers, rebels in the street, corporations, governments, etc. – forces that emerge out of control; evolution that occurs, not created or driven by some interest group, but driven by complex physical, psychic, and social factors that have unpredictable effects.

We’re just another set of smart people who think we know how the world should work, and we probably need more humility. How can we be effective in a context where there are forces that are truly beyond our control? What intentions should we support and honor?

Connectivism

Have you ever thought about how completely irrelevant structured learning is? Indeed. “The illiterate of the 21st century will not be those who cannot read or write, but those who cannot unlearn and relearn.” – Alvin Toffler. The video below advocates a change in how we learn – network-centric, personal, based on your context, not based on some institution’s agenda. (Thanks to Judi Clark for sending me the link to this video.)