This new Google Research Blog post contains several charts that are worth a look.
The web is vast and infinite. Its pages link together in a complex network, containing remarkable structures and patterns. Some of the clearest patterns relate to language.
Most web pages link to other pages on the same web site, and the few off-site links they have are almost always to other pages in the same language. It’s as if each language has its own web which is loosely linked to the webs of other languages. However, there is a small but significant number of off-site links between languages. These give tantalizing hints of the world beyond the virtual.
To see the connections between languages, start by taking the several billion most important pages on the web in 2008, including all pages in smaller languages, and look at the off-site links between these pages. The particular choice of pages in our corpus here reflects decisions about what is ‘important’. For example, in a language with few pages every page is considered important, while for languages with more pages some selection method is required, based on PageRank for example.
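The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the method used in the original post: the function name `cross_language_links`, the tuple layout, and the sample URLs are all hypothetical, and "same site" is assumed here to mean "same hostname".

```python
from collections import Counter
from urllib.parse import urlsplit


def cross_language_links(links):
    """Count off-site links by (source language, target language) pair.

    `links` is an iterable of (src_url, src_lang, dst_url, dst_lang) tuples.
    Assumption: two URLs are on the same site when their hostnames match.
    """
    counts = Counter()
    for src_url, src_lang, dst_url, dst_lang in links:
        # Skip links that stay on the same site.
        if urlsplit(src_url).hostname == urlsplit(dst_url).hostname:
            continue
        # Skip off-site links that stay within one language.
        if src_lang == dst_lang:
            continue
        counts[(src_lang, dst_lang)] += 1
    return counts


# Made-up sample data, for illustration only.
sample_links = [
    ("http://a.example.fr/1", "fr", "http://a.example.fr/2", "fr"),   # same site
    ("http://a.example.fr/1", "fr", "http://b.example.com/x", "en"),  # fr -> en
    ("http://c.example.de/1", "de", "http://b.example.com/y", "en"),  # de -> en
    ("http://c.example.de/1", "de", "http://d.example.de/z", "de"),   # same language
]
result = cross_language_links(sample_links)
```

With the sample data, only the two links that cross both a site boundary and a language boundary are counted; a real corpus would also need a language detector and a more careful definition of "site".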
You might wonder whether off-site links landing on English pages can be explained simply by the number of English pages available to be linked to. The webs of other languages in our corpus typically have sixty to eighty percent of their out-language links to English pages. However, only 38 percent of the pages and 42 percent of sites in our set are English, yet English attracts 79 percent of all out-language links from other languages.
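The comparison in this paragraph comes down to simple arithmetic, shown here as a rough sketch. The three percentages are taken from the text; the variable names and the idea of expressing the gap as a ratio are illustrative assumptions, and the shares are measured on different bases (pages and sites in the whole set versus out-language links from other languages), so the ratios are only indicative.

```python
# Figures quoted in the text.
english_page_share = 0.38    # share of pages in the set that are English
english_site_share = 0.42    # share of sites in the set that are English
english_inlink_share = 0.79  # share of out-language links that point to English

# How much larger is English's share of incoming cross-language links
# than its share of pages or sites? (Illustrative measure, not from the post.)
ratio_vs_pages = english_inlink_share / english_page_share
ratio_vs_sites = english_inlink_share / english_site_share

print(f"vs. page share: {ratio_vs_pages:.2f}x")
print(f"vs. site share: {ratio_vs_sites:.2f}x")
```

Both ratios come out at roughly two, which is the sense in which the number of English pages alone does not account for the links English receives.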
Here’s an example of a chart: Language Graph of the Web (2008)
Direct to Full Text Blog Post: “Languages of the World (Wide Web)”