Sunday, July 31, 2011

Analysis of the Google Books settlement and its judicial rejection


The Google Books project and the suit

In 2004, Google undertook a gargantuan project to digitize a massive catalogue of books.  With the cooperation of four university libraries and one public library, Google began scanning and cataloguing millions of books and volumes.  In the court’s written rejection of the Settlement agreement, the judge puts the number of books scanned at 12 million (though a recent estimate puts the number closer to 15 million).

As part of the deal, Google would give the libraries a digital copy of their own catalogues and would create a master catalogue which would constitute the Corpus of the Google Books collection.  In typical Google fashion, this Corpus would be made searchable by the public.  For works in copyright, people would only be able to see snippets of the book containing the text of their search.  For works known to be in the public domain, the whole work would be viewable by the public.

To the objective observer, this seems like a win for all parties involved.  Google grows its value while accomplishing a morally commendable enterprise; the public gains access to a wealth of culture and information otherwise lost or inaccessible; people with visual handicaps suddenly gain access to millions of works never before available to them; authors of in-print books benefit from the muscle of Google’s impressive search algorithms to funnel otherwise untapped readers into purchasing the full work (once they’ve whetted their appetites with the free snippet); and authors of out-of-print books suddenly begin to see revenue again from material that was, until now, commercially unavailable.  Indeed, it seemed as though the Google Books project would bring the world closer to Shangri-La and “open knowledge”.  Certain people, however, saw things a little differently.

In September 2005, two lawsuits were brought against Google: one by the Authors Guild (Authors Guild v. Google) and the other by five major publishers (McGraw Hill v. Google).  The Authors Guild’s suit was a class action, ultimately joined by the publishers.

The rights holders claimed “massive copyright infringement” on the part of Google.  Google contended that its acts were protected by the fair use doctrine.[1] Though public domain works were made available in their entirety, only snippets of protected works could be viewed for free. Google also specifically withheld the launch of ads on Google Books so that it could not be accused of engaging in commercial activity, somewhat of a faux pas for a fair use defence.[2]

To this day, both parties stick by their original positions on the infringement/fair use issue even though such matters are no longer relevant (at least to this case). On October 28th, 2009, a settlement agreement between the plaintiff class and Google was reached.

The Settlement Agreement

The agreement takes the form of a massive 166-page document (plus appendices) outlining the rights and obligations incumbent on Google and the plaintiffs.[3]  The agreement was originally supposed to cover all authors (including their heirs, successors and assignees) and publishers with an American copyright interest as of January 5th, 2009 (this scope was later changed; see “International Law” below).

The settlement also requires Google to establish a “Book Rights Registry” where rights holders may register to receive royalty payments for their works that are included in the Corpus.  Google is to pay $34.5 million to establish and fund the operation of this registry.

Google must also pay an additional $45 million into a settlement fund to compensate rights holders whose works were already digitized without permission (as of May 5th, 2009).  In fact, the settlement mandates that each author who makes a claim be paid a set amount per work, making the $45 million a baseline rather than a cap: claims above that amount would still be paid by Google as they arise, and should total claims come in lower, the remainder is divided among rights holders rather than returned to Google as surplus.
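
As a minimal sketch of that floor-not-cap logic (the function and all dollar figures below are illustrative only; the settlement’s actual per-work amounts are not reproduced here):

```python
def distribute_settlement_fund(fund, num_claims, per_claim):
    """Sketch of the payout logic described above: the fund is a floor, not a cap."""
    owed = num_claims * per_claim
    if owed >= fund:
        # Claims exceed the fund: Google tops it up so every claim is still paid in full.
        return per_claim, owed - fund          # (payment per claim, Google's extra outlay)
    # Claims fall short: the surplus is split among claimants, not returned to Google.
    bonus = (fund - owed) / num_claims
    return per_claim + bonus, 0.0

# Illustrative figures only.
print(distribute_settlement_fund(45_000_000, 800_000, 60))   # claims exceed the fund
print(distribute_settlement_fund(45_000_000, 500_000, 60))   # surplus shared among claimants
```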

In addition to these “sanctions”, the settlement gives Google extensive rights. The document lays out five:

  • Google may continue growing the Corpus by digitizing books and inserts.
  • Google may sell subscription-based access to the Corpus (such as institutional subscriptions for universities).
  • Google may sell online access to individual books to users in an online store.
  • Google may sell advertising on pages of books.
  • Other prescribed uses.
The settlement makes clear that those rights are non-exclusive and can therefore be licensed by rights holders to anyone, including direct competitors of Google. 

Google must share the revenues from these uses, giving up 63% to the rights holders for works published before January 5th, 2009.  For new works going forward, Google must hand over 70% of revenues from all sales (both subscription and per-use based) and advertising derived from the Corpus, less a 10% deduction to cover Google’s operating costs.
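
To make the arithmetic concrete, here is a minimal sketch of how those two formulations line up, assuming (as the description above suggests) that the 10% operating-cost deduction is taken off the top before the 70% split; the function names and the $100 sample figure are mine, not the settlement’s:

```python
def rights_holder_share_pre_2009(revenue):
    # Works published before January 5, 2009: a flat 63% goes to rights holders.
    return revenue * 0.63

def rights_holder_share_new_works(revenue, operating_deduction=0.10, split=0.70):
    # New works: a 10% operating-cost deduction comes off the top,
    # then rights holders receive 70% of what remains.
    return revenue * (1 - operating_deduction) * split

revenue = 100.0  # illustrative figure
print(rights_holder_share_pre_2009(revenue))    # 63.0
print(rights_holder_share_new_works(revenue))   # 63.0 -- the same effective 63% share
```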

The settlement draws an important distinction between two classes of books included in the Corpus: in-print books and out-of-print books.  In-print books are those that are still commercially available; out-of-print books are those that are no longer being produced and are not commercially available.  An interesting question arises when a publisher offers online versions of books that it no longer prints for retail.  Is that book still in print?  It is commercially available, but only in a digital format.

It’s also worth mentioning that the settlement agreement stipulates that the lawyers representing the class are to be paid $45.5 million.  Though this consideration is immaterial to the meat of the issue, it’s still a tad disconcerting that the lawyers are making more than the amount being paid to establish a registry for the authors of what some say is up to 15 million books and counting.

Rejection of the Settlement by the Court of the Southern District of New York

On March 22nd, 2011, Judge Chin of the Court of the Southern District of New York rejected the settlement agreement that had earlier received preliminary approval.[4]  Judge Chin disagreed with Judge John E. Sprizzo’s assessment of the fairness of the settlement agreement.  In truth, Judge Chin had the benefit of reading the hundreds of objections that were filed with the court between the preliminary ruling and his own.

Before stating his reasons for rejecting the settlement, Judge Chin acknowledged two factors that weighed in favour of approval.  First, he commended the “arm’s length negotiation between experienced, capable counsel, with assistance from the DOJ” (Department of Justice).  Second, he rightly noted that proceeding with a full-blown trial (and the inevitable appeals process) would be very lengthy and wildly expensive.

That being said, Judge Chin had a number of valid reasons for rejecting the proposed settlement:

•  Inadequate representation of the class members: The class of plaintiffs in the case is very large.  Google sent 1.26 million notices in 36 languages to copyright holders and potential class members as well as publishers and authors’ rights collectives.  The interests within such a sprawling class diverge: scholars and academics, for example, usually publish with motivations very different from those of a commercial writer, yet both are included in the settlement.  It is even arguable that publishers have very different motivations from authors and authors’ rights collectives.

•  Encroachment by the court on Congress’ power: Under the settlement agreement, Google would have the rights to all orphan works[5] without any prior consent (a rights holder may step forward after the fact and make a claim).  Judge Chin acknowledged Sony Corp. of America v. Universal City Studios Inc.,[6] where the Supreme Court held that it is Congress’ duty to keep up with technological advances and their effects on copyright law.  He further cited Eldred v. Ashcroft,[7] where the Supreme Court said “It is generally for Congress, not the courts, to decide how best to pursue the Copyright Clause’s objectives.”

•  Google should not be allowed to take a “shortcut”: Instead of undergoing the long and prohibitively costly process of licensing and tracking down rights holders, Google decided to take a “sue me” shortcut.  By violating copyright on an epic scale, it hoped to circumvent both the time and cost associated with that process.  Ironically, seven years and millions of dollars in billable hours later, we still have neither a verdict nor a settlement.  As Berkeley Law Professor Pamela Samuelson eloquently put it: “We’re giving Google a blank check to essentially engage in activity that would be considered clearly infringing activity but for the settlement.”

•  Anti-trust concerns: The settlement agreement would have the effect of giving Google a de facto monopoly over orphan works.  This means that any institution or competitor who wishes to make use of or offer any of the millions of orphan works in the Corpus cannot do so without paying Google a royalty, and Google has the right to refuse a license to anyone it sees fit.  This is a prime example of what anti-trust law is supposed to prevent: anti-competitive and monopolistic behaviour.  The fact that this arises from the settlement of a lawsuit adds insult to injury, in that it would create an injustice potentially greater than the conduct for which Google was sued in the first place.  There is no fair use defence in anti-trust law for Google to hide behind either.

•  Privacy concerns: No one is better at collecting data than Google.  Would this extend to information about the books we’re reading?  Not only will Google know what you’re reading and when, it will know for how long, how many pages and which pages you read.  This is clearly a breach of our reasonable expectation of privacy.  What we read can often be personal; it isn’t the sort of data everyone is comfortable having collected.  There is also the fear that such information could be forcibly turned over to government entities by way of the aging Electronic Communications Privacy Act (ECPA).  If Google Books ever does get started, don’t be surprised to hear about the F.B.I. or the Department of Homeland Security compelling Google to hand over reader data of certain subscribers.[8]  Never before has law enforcement been so well equipped with readily available and invasive access into our lives (e-mails, online profiles and even cell phone location data).  Google Books subscriber data would simply be one more tool available to them.

•  International law: At first, the settlement was worded to include any book that had a U.S. copyright interest.  Because the U.S. is a signatory to the Berne Convention, it must give works originating in other member countries the same recognition and protection it gives domestic works, so this wording would have swept in almost every book in the world.  Objections to this language resulted in the definition being narrowed to exclude non-American works that were not affirmatively registered with the U.S. Copyright Office.  That being said, Canadian, British and Australian works were still included in the settlement regardless of U.S. copyright registration.

Uncertainty ahead

With this rejection on the books, it’s back to the drawing board for Google and the plaintiffs.  Though impressive opposition has been mounted, it is unclear whether the settlement will be abandoned entirely or merely reworked.

The judge alluded to the fact that a matter like this is one for Congress, not the court, to decide.  This raises the question: can Congress really do any better?  There are mixed opinions as to how that question should be answered.  However, one might posit that regardless of one’s faith in Congress to come to a better result, the Constitution mandates it.

Article 1, Section 8, Clause 8 of the U.S. Constitution gives Congress the power “To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”[9]

The Settlement has far-reaching implications for an entire category of works contemplated in the Copyright Act: orphan works.  This fact alone should militate in favour of a Congressional rather than judicial response.

On its face, this settlement appears to be a misuse of the class action mechanism.  In general, a class action suit is taken to compensate victims for damages caused by the defendant(s).  Though the settlement covers compensatory considerations, it also provides for a forward-looking licensing and royalty scheme.  This scheme also implicates millions of people who some contend are not being adequately represented by the class representatives.

In a debate on the Settlement at the University of Richmond Law School, Professor James Grimmelmann of New York Law School delivered a brilliant analogy to illustrate this point:

Say that five years ago there was a minor oil spill (nothing on the magnitude of the one that recently occurred) in the Gulf of Mexico caused by BP.  A class forms that comes to represent all residents and businesses of Gulf Coast states.  In a settlement, to facilitate the pay-out of any future damages caused by an oil spill in the area, BP sets up a compensation fund and a streamlined process by which people can easily make claims.  Imagine that BP agrees to put $500 million in that fund to assure that no damage that could possibly arise would remain uncompensated.  Fast forward to the events of last year.  That settlement, already having been signed and approved, would exonerate BP from liability for the astronomical damages it has caused to date (far exceeding the once seemingly adequate sum of $500 million).[10]

This scenario helps highlight exactly what may be wrong with a significantly forward looking settlement like this one.

Professor Samuelson brings up an interesting point on the future implications of the settlement for the Corpus itself.  What if the settlement goes through and 10, 15 or 20 years down the line Google shuts down, goes bankrupt or otherwise ceases to operate?  We will all have become so dependent on Google Books that its failure would be an unacceptable reality.  A service like this truly is a public good and, as such, should be protected.  This presents a “too big to fail” scenario in which the government may eventually have to step in out of necessity.

Google also has the right to sell the Corpus to anyone.  Some people fear that Google has carte blanche to engage in “price-gouging”, considering it is the only source for a large part of the Corpus.  Even if we presume Google will not abuse that right, who’s to say the next owner won’t?  As Samuelson suggests, those who aren’t afraid of price-gouging on the part of Google may reassess that fear should the database be sold in the future.

This entire saga started out with Google trying to do what Google does best: cataloguing and indexing data in order to make it searchable.  The matter has evolved into what could be one of the most important and far-reaching copyright cases in history.  While experts continue to speculate on possible and desirable outcomes, people concerned with copyright around the world are watching intently as the situation slowly unfolds.



[1] Fair use is a part of the American Copyright Act that presents a non-exhaustive list of uses of a work that would otherwise infringe copyright but do not because they are considered fair uses of the original work.  Canadian copyright law has an analogous (but not identical) regime referred to as fair dealing.  These regimes are not to be viewed as a simple defence, but as an integral part of the copyright act and therefore a clear delimitation on the exclusive rights afforded a copyright holder.  See McLachlin C.J. in the Supreme Court of Canada’s decision in CCH Canadian Ltd. v. Law Society of Upper Canada: “...the fair dealing exception is perhaps more properly understood as an integral part of the Copyright Act than simply a defence.”
[2] One of the factors considered in the Fair Use analysis is the purpose and character of the use.  It is harder to mount a fair use defence if the use in question is highly profitable to the alleged infringer.  That being said, the commercial nature of a use does not automatically preclude it from being fair.  The U.S. Supreme Court has even said that this factor is not the most important to be considered.
[3] Google makes the updated settlement agreement as well as an FAQ on the settlement available at the following link: <http://books.google.com/googlebooks/agreement/press.html>.
[5] An orphan work is one that is still under copyright protection but whose owner is unknown or cannot be located.  One problem relating to orphan works is that if they are used without clearing the rights, a task that cannot be completed, the user is exposed to the risk of a rights holder “popping up” out of nowhere with an infringement suit.
[6] 464 U.S. 417 (1984). For the full text of the decision see: <http://supreme.justia.com/us/464/417/case.html>
[8] For more information on the ECPA and its implications for online privacy, please see: <http://jamesplotkin.blogspot.com/2011/07/digital-due-process-bid-to-modify-ecpa.html>

Sunday, July 24, 2011

Copyright in Databases?


Can one have copyright in a database? For those familiar with the law, it is clear that raw data itself is not protected by copyright. If it were, the phonebook (or whoever else was “first to fix” my information) would have a copyright on my name, address and telephone number.  

Databases are considered “compilations” under copyright law.  Both the Canadian and American Copyright Acts grant copyright protection to the compiler. Section 2 of the Canadian act defines a compilation as: “a work resulting from the selection or arrangement of literary, dramatic, musical or artistic works or of parts thereof, or a work resulting from the selection or arrangement of data” (Emphasis added).  This means that the data collected in the database need not be original or creative.  A list of names and numbers may be eligible for protection as a compilation.  The copyrightable element of the work is the “selection and arrangement” on the part of the compiler. 

For example, last year Professor Michael Geist put together a collection of scholarly, peer-reviewed articles on Canadian copyright and the (then) new Bill C-32 (an electronic version is available free online).  Each chapter comprises a paper written by a different academic.  Some of the articles contain a very short foreword by Geist.  In this case, Professor Geist has copyright in the compilation.  Though he didn’t write the majority of the collection, he was the organizer (or, to use the terms of the law, the “selector and arranger”) of the content.

How much creativity or effort a person must put into this selection process is not readily discernible.  In the 1991 U.S. Supreme Court decision commonly known as Feist, the principles of which have been adopted in Canada, the court solidified the applicable test in an effort to demystify this standard.

The court’s decision in Feist comes down to a two-part test:
  • Is the material (selection and arrangement) copyrightable in favour of the compiler?
  • Has the defendant reproduced an infringing amount of the copyrightable material?

The answer to the first question varies from one jurisdiction to another.  There are essentially two competing doctrines for determining the copyrightability of a compilation: the “creativity” approach and the “sweat of the brow” approach.

Under the “sweat of the brow” approach, followed in jurisdictions like Australia and the UK, a compiler will be granted copyright in a database if they dedicated a substantial amount of time, money and/or resources to its completion.  This has largely been the Commonwealth tradition, of which Canada is historically a part.

The creativity approach demands that the person claiming copyright in a database demonstrate that they exercised a minimal amount of creative energy in selecting and arranging the data.  This is the approach upheld in Feist, which concerned a telephone directory.

In that case, the appellant, Feist, was a company whose business was compiling large telephone directories from smaller local directories.  The plaintiff, Rural, sued Feist for copyright infringement and lost because the court found no originality whatsoever in Rural’s selection and arrangement of its data.  The court tells us that originality (by way of creativity) has always been the cornerstone of copyright in the US.  That being said, the threshold for creativity is very low; one must simply demonstrate a spark of originality.  For example, alphabetically listing the names and phone numbers of everyone living in a geographical area is totally unoriginal, but a register of restaurants in a designated geographical area listed by style or genre might well be eligible.  To pass the creativity test, it is unnecessary for the compiler to show true novelty or non-obviousness as in patent law.  They must simply show that “spark” the court speaks of so romantically.

As mentioned above, Canada has historically followed its Commonwealth roots and the “sweat of the brow” approach.  However, the precedent-setting Canadian case in this area shows great consistency with the American approach.

In Tele-Direct v. ABI, the Federal Court of Appeal upheld the trial court’s decision that the elements of Tele-Direct’s Yellow Pages directory were not in themselves sufficiently original.  While ABI admitted from the outset that Tele-Direct had copyright in the overall compilation, the court found that insufficient originality was exercised in the selection and arrangement of the “sub-categories” (the information in each individual listing).  ABI didn’t copy the overall form of Tele-Direct’s Yellow Pages; it simply copied the data in the individual listings, which were not themselves organized or arranged in a sufficiently original manner to warrant copyright protection.

This break with the Commonwealth trend is probably due in part to Canada’s adherence to NAFTA (the North American Free Trade Agreement) and TRIPs (the Agreement on Trade-Related Aspects of Intellectual Property Rights).  In his 1998 submission to Industry Canada and Canadian Heritage, Professor Robert Howell highlights Article 1705(1)(b) of NAFTA, which reads as follows:

“1. Each Party shall protect the works covered by Article 2 of the Berne
Convention, including any other works that embody original expression within the
meaning of that Convention. In particular:
. . .
“(b) compilation of data or other material, whether in machine readable or other
form, which by reason of the selection or arrangement of their contents constitute
intellectual creations, shall be protected as such” (emphasis added).

He further suggests that the intention in defining compilation this way was likely to coincide with the definition given in the 1976 US Copyright Act: “A ‘compilation’ is a work formed by the collection and assembly of pre-existing materials or of data that are selected, coordinated, or arranged in such a way that the resulting work as a whole constitutes an original work of authorship. The term ‘compilation’ includes collective works.” (emphasis added).

In the EU, databases benefit from the protection of two legislative regimes.  As in North America, database creators in Europe may avail themselves of regular copyright protection.  They may also make use of the European Database Directive (Directive 96/9/EC), a sui generis regime that doesn’t require the database creator to have invested any level of creativity or originality in the work.

Article 7 section 1 of the directive reads as follows:

“Member States shall provide for a right for the maker of a database which shows that there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents to prevent extraction and/or re-utilization of the whole or of a substantial part, evaluated qualitatively and/or quantitatively, of the contents of that database.” (emphasis added)

This regime clearly endorses the “sweat of the brow” approach.  Though some may disagree with this standard, the Directive contains another provision that is far more controversial and altogether unsettling.

Though Article 10 of the directive sets the term of protection for a database at 15 years, section 3 of the same article reads as follows:

“Any substantial change, evaluated qualitatively or quantitatively, to the contents of a database, including any substantial change resulting from the accumulation of successive additions, deletions or alterations, which would result in the database being considered to be a substantial new investment, evaluated qualitatively or quantitatively, shall qualify the database resulting from that investment for its own term of protection.”

This essentially means that successive alterations, additions or deletions that amount to a substantial new investment reset the term for another 15 years.  In practice, this amounts to “evergreen” protection for databases in the EU.

While this directive has been going strong in Europe for the past 15 years, neither the US nor Canada has enacted similar legislation, most likely because they aren’t convinced it’s necessary to do so.

As the EU continues to distinguish itself in the area of database protection, the West seems to be coming together.  Though originality in copyrightable works is constitutionally mandated in the US (Article 1, Section 8, Clause 8), no such constitutional obligation exists in Canada.  That being said, both international agreements and the interest of uniformity have seen Canada and the US align closely on the judicial treatment of copyright in databases.  It is my belief that, with the monumental growth of cloud computing and online data storage, congruity in this area may help attract American companies that want certainty that they can obtain Canadian copyright in their databases.

Monday, July 11, 2011

Cyber-insurance: Limiting liability in the cloud


Cyber-security has become a stock-value-influencing issue.  With data breaches being reported on a seemingly regular basis, companies that have an online presence or make use of cloud computing services are reassessing the manner in which they protect their data.

What happens, however, when the security fails?  The answer: cyber-insurance.  This relatively new form of insurance (on offer for approximately five years now) provides coverage for liability resulting from any number of possible scenarios, including misappropriation of data, hacker or virus infiltration, data corruption and so on.

Considering that the damages resulting from a data breach can run into the millions (or potentially billions), cyber-insurance presents itself as a sober contingency in the event that an entity’s security, or the security of its third-party provider, fails.

Businesses that make use of third-party cloud vendors for the storage and/or processing of their data should look into acquiring cyber-insurance.  It is also common practice in the industry for cloud vendors themselves to carry insurance policies in the event that they are found liable for damages.

Take the case of a law firm using a third-party cloud provider to store valuable information such as case strategy, financial information and so forth.  The standard rules of professional liability apply: the data owner (in our scenario, the law firm) is ultimately responsible for the safety and integrity of the data entrusted to it.  This means that if the client is going to sue someone, it will most definitely be the law firm, not the cloud service provider.

What if it turns out that, in storing the data, the third-party cloud service provider was grossly negligent, reckless or careless with the firm’s data?  In such a case, the cloud service provider is exposed to liability and would be able to draw on insurance coverage that extends to this sort of situation.

A 2009 estimate placed the cyber-insurance business at $450 million.  Some research suggests, however, that though the cyber-insurance industry is growing in size, its long-term sustainability is in question.  The logic behind this claim has to do with the trend in online tech of everyone gravitating towards the big fish.  This makes it considerably easier for hackers to affect a large number of people with a single attack; an effective attack on Facebook, for example, would have a profound effect due to the sheer amount of personal data in play.  Those who make use of smaller competitors’ software will ultimately be safer, even though these smaller firms cannot afford to spend the same amount on security as larger firms.

Insurance companies may look at the size of the “big fish” and decide that the risk of taking them on as a client is simply too high, and that the potential for growth is insufficient to justify taking on such a high risk.

At Gartner’s 2010 Catalyst Conference, Bob Parisi said that the common practice in the cyber-insurance industry is for businesses to purchase a main policy and then layer it with additional coverage from other providers to form a single mega-policy offering an acceptable level of coverage.  This seems to be the current practical response to the theoretical quagmire mentioned above.
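
As a rough sketch of how such a layered “tower” of coverage works (all insurer names and dollar figures below are invented for illustration, not taken from the conference):

```python
# Hypothetical coverage tower: a primary policy plus excess layers from other insurers.
layers = [
    ("Primary insurer",  10_000_000),   # covers the first $10M of any loss
    ("Excess insurer A", 15_000_000),   # covers the next $15M
    ("Excess insurer B", 25_000_000),   # covers the next $25M
]

def allocate_loss(loss, layers):
    """Walk up the tower, charging each layer in turn until the loss is exhausted."""
    payouts, remaining = [], loss
    for insurer, limit in layers:
        paid = min(remaining, limit)
        payouts.append((insurer, paid))
        remaining -= paid
        if remaining == 0:
            break
    return payouts, remaining  # remaining > 0 means the loss exceeds the whole tower

print(allocate_loss(30_000_000, layers))
# ([('Primary insurer', 10000000), ('Excess insurer A', 15000000), ('Excess insurer B', 5000000)], 0)
```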

At the same conference, Drew Barkowitz pointed out that at some point any business will see diminishing returns on security expenditures.  He explains that if spending 12% of your capital on security gets you to a certain level, spending 36% will not necessarily make you three times more secure.  From this perspective, cyber-insurance seems like an almost obvious contingency measure, and one that makes the most financial sense.

The fact of the matter is that we haven’t reached the point yet where insurance providers are categorically shying away from offering large firms cyber-insurance coverage.  Cyber-insurance is still a viable way for businesses to mitigate risks associated with storage and processing of data online. 

The industry is still growing and learning about itself and will likely become a lot bigger before it gets any smaller.  

Considering the increase in both intensity and frequency of data breaches, cyber-insurance may be the only true financial back-up plan available to a business in the event of a massive security failure.  It will be interesting to see how this up-and-coming domain of insurance flourishes in the world of high-volume cloud computing, and how it will be influenced by that world.

Friday, July 8, 2011

Why is my information vulnerable in the public cloud?


Free, public cloud-based services have been around for a while.  Microsoft’s Hotmail, Google’s Gmail and Docs, News Corp’s Myspace and so on have all been offering free, user-friendly cloud-based software that almost anyone can use.  Along with any cloud service come the issues of security and privacy.  How do these organizations treat your private information, and for what purposes do they use it?  How secure are these free services that have so many members storing all this information?  What is the common practice when law enforcement or some other government entity asks them for access to your data?

In his essay “Caught in the Cloud”, scholar and activist Chris Soghoian rightly points out that these businesses aren’t charities.  News Corp and Microsoft aren’t in the habit of spending large amounts of money on servers and resources so that millions of users may enjoy free service.  The primary manner in which these companies make money is through the organization and monetization of their users’ private data.

For example, when we write e-mails in Gmail, all the text we type is fed through one of Google’s algorithms.  That algorithm spits out data and tells Google how to intelligently advertise to us based on the contents of our conversation.  So if I’m e-mailing a friend for class notes from the intellectual property law class I didn’t show up to, I may see ads for higher legal education, online copyright protection services or even patent drafters.  Google’s algorithm determines what ads will hit home with me by analyzing all of my communications.  The same process applies to Google Docs.
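
Purely as a toy illustration of the idea (this is not Google’s actual system; the ad inventory, keywords and scoring below are invented for the example), contextual targeting of this sort can be thought of as scoring candidate ads against the words in a message:

```python
import re
from collections import Counter

# Hypothetical ad inventory: each ad lists keywords it should match on.
ADS = {
    "LL.M. programs in IP law": {"intellectual", "property", "law", "class", "notes"},
    "Online copyright registration": {"copyright", "protection", "register"},
    "Patent drafting services": {"patent", "drafting", "invention"},
}

def pick_ads(email_text, top_n=2):
    """Score each ad by how many of its keywords appear in the message."""
    words = set(re.findall(r"[a-z]+", email_text.lower()))
    scores = Counter({ad: len(keywords & words) for ad, keywords in ADS.items()})
    return [ad for ad, score in scores.most_common(top_n) if score > 0]

email = "Can you send me your notes from the intellectual property law class I missed?"
print(pick_ads(email))  # ['LL.M. programs in IP law']
```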

Naturally, this type of targeted, consumer-specific advertising (known as behavioral targeting) is worth a premium compared with randomly sending ads to Gmail subscribers and hoping they hit the right audience.  This is one of the primary ways in which Google not only recoups the costs of running its free cloud-based offerings but turns a profit.

Microsoft uses similar behavioral targeting techniques with its Hotmail service, analyzing your search data.  In 2006, Chris Dobson, Microsoft’s global head of advertising sales, told Seeking Alpha that Microsoft had increased its click-through rate by 76% since implementing behavioral targeting in its ad services.

When you ask the average person how important it is to them that their data be secure online (specifically their private and intimate data, like the content of their e-mails), they’ll generally reply that it is very important.  However, one look at these public cloud services and one quickly realizes that people like to talk.

Possibly the best and most widespread method of protecting data is encryption.  When we log into our online banking sites, our sessions are encrypted for obvious reasons.  I doubt many people would use an online banking service with security practices like those of Gmail or MySpace.

Chris Soghoian explains that most cloud service providers have “network encryption”, which essentially protects your data in transit as you log into and use the service.  They do not, however, have what is called “data encryption”, which protects your information once it is already stored in the cloud.  Though Soghoian tells us that one of the main reasons for this is the average consumer’s total lack of awareness and the resulting lack of demand for encryption, I am of the belief that this is first and foremost a cost issue.

Large public cloud providers do not implement data encryption because it is more resource-intensive and would have the effect of slowing down the service (making things more costly for them).  Also, as noted, these services make their money monetizing data for advertising, and that data becomes far less valuable, if not worthless, if no one can read it (because it’s encrypted).

This is a stark contrast with private cloud providers, whose success very much hinges on the security and integrity of their networks.  Medical service providers, large businesses, law firms and the like do not store their information in the public cloud, for reasons of liability.  What insurance company will cover a law firm that suffered a data breach after storing valuable client information on Google Docs?  Such a move would be monumentally foolish for any entity needing to store private, sensitive or valuable information.

Finally, what happens when law enforcement tries to compel one of these service providers to hand over your private information?  In many cases a warrant isn’t even required.  For example, the Patriot Act allows law enforcement to obtain a court order and search your private data without ever informing you.  What’s more, through the use of what is called a “National Security Letter”, the Patriot Act allows the F.B.I. and other law enforcement agencies to access your data without any form of judicial hearing or oversight.  That means the F.B.I. can look at your documents without establishing probable cause that your data is or may be useful to a criminal investigation.  These broad and sweeping powers have been the subject of much debate, and the constitutionality of these measures (on 4th Amendment grounds) has been seriously called into question.  Perhaps this is the reality of online life in a post-9/11 world.

There are, however, solutions to this problem.  As mentioned before, data encryption ensures that data, while stored in the cloud, remains unintelligible; it is only once it is decrypted with the encryption key that it can be read again.  Some services do offer data encryption to their clients.  However, if the service provider is in possession of the encryption key, law enforcement can compel them to hand it over along with the data itself.  That isn’t the case if the user is the only person who possesses the encryption key.  In that event, a service provider can comply with the demand without actually exposing your information.
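
A minimal sketch of that user-held-key idea, using the third-party Python cryptography library’s Fernet recipe (the library choice, the sample document and the hypothetical upload step are mine; no particular cloud provider is implied): the data is encrypted before it ever leaves the user’s machine, so the provider only ever stores ciphertext and has no key to hand over.

```python
from cryptography.fernet import Fernet

# The key is generated and kept only on the user's machine.
key = Fernet.generate_key()
fernet = Fernet(key)

document = b"Client strategy memo - privileged and confidential"

# Encrypt locally, then send only the ciphertext to the cloud provider.
ciphertext = fernet.encrypt(document)
# upload_to_cloud(ciphertext)  # hypothetical upload step

# If the provider is compelled to hand over what it stores, it can only
# produce ciphertext; without the user's key it is unintelligible.
# The user later downloads the ciphertext and decrypts it with the key they kept.
assert fernet.decrypt(ciphertext) == document
```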

People have been and will continue to use free cloud-based services for e-mail and the like.  I’m not suggesting we all stop doing so.  That being said, I do believe that knowing how your data is (and can be) treated is important and should be in the back of everyone’s mind when clicking “I agree”.  You may decide that certain things are best left out of the cloud.