11 Temmuz 2012 Çarşamba
Bringing relevant news to you, regardless of language -- translated news
Breaking down language barriers with translated English-language results
10 Temmuz 2012 Salı
9 Temmuz 2012 Pazartesi
AJAX Custom Search Gadget on Blogger
One cool feature of the gadget is the 'Linked From Here' feature that searches the pages you've linked to from your blog posts. As you create new posts, we automatically update your search engine to include all the linked pages, as well as all the pages linked from your link lists and blog lists. Check out the gadget — the search results match the look and feel of your blog and show up inline, as shown in the screenshot below. You can click a button to dismiss the results when you are done, and go back to reading the current post.
If you are not using Blogger, you can still create something similar for your website using the Custom Search element (read more about this new element at the Custom Search blog).
Arabic Transliteration added to the AJAX Language API
We're happy to announce that we've now added Arabic to the list of supported languages. Now, you can allow your users to easily input Arabic-language text into any text field or text area on your web page without switching to a non-Latin alphabet keyboard - just like on the Labs page. For example, if the user types 'mar7aban bekom', the API will transliterate each word, with the result 'مرحبا بكم' (Arabic for welcome). The API will even automatically adjust the direction of the text area to support this right-to-left language.
Take a look at the documentation and then head over to the Code Playground to give it a try for yourself. If you have any questions, stop by the Google AJAX API developer forum or IRC channel.
Ext-core ready to go
// directly access itYou can also use it from the loader:
http://ajax.googleapis.com/ajax/libs/ext-core/3.0.0/ext-core.js
// alias the newest 3.0.x version
http://ajax.googleapis.com/ajax/libs/ext-core/3.0/ext-core.js
// alias the newest 3.x.x version
http://ajax.googleapis.com/ajax/libs/ext-core/3/ext-core.js
// directly access the uncompressed code
http://ajax.googleapis.com/ajax/libs/ext-core/3.0.0/ext-core-debug.js
google.load('ext-core', '3.0');Thanks to all for the requests to add ext-core to our Libraries API, and big thanks to Ext JS for providing their library so openly! For more information, head over to their blog post.
google.load('ext-core', '3', {uncompressed : true});
Introducing the Virtual Keyboard API
It is often difficult for Internet users to input text in many non-Latin script-based languages for a variety of reasons. The correct keyboard layout may not be installed on the computer they're using - sometimes such a layout may not be well developed or widely available. This poses a challenging problem for web developers because there is no way they can ensure that their users have access to this very basic input technology. Our Transliteration API can help, but requires that the user know multiple languages.
Right on the heels of introducing support for translating Persian (Farsi), we've added a new Virtual Keyboard API into the Google AJAX Language API to further assist with text input. With this, developers can help their users input text without relying on the right software being installed on the computer they happen to be using.
It couldn't be easier to get this on your page. Simply load the right package:
google.load("elements", "1", {
packages: "keyboard"
});
Then create a keyboard, specifying the keyboard layout and text field to bind to:
var kbd = new google.elements.keyboard.Keyboard(
[google.elements.keyboard.LayoutCode.Arabic],
['myTextArea']);
And here's what it looks like:
This gives you the control to provide a better user experience, even for multilingual websites. By creating multiple keyboards with different layouts, each text field can be bound to the appropriate keyboard - and the user will see only the keyboard attached to whichever text field has the focus.
But don't take my word for it - check out a sample for yourself. Notice that in addition to allowing users to click on the virtual keyboard, it also temporarily transforms the key assignments on their physical keyboard, allowing rapid typing for those users accustomed to a given layout.
With this initial release, we are launching 5 language layouts. They are:
- Arabic (العربية)
- Hindi (हिन्दी)
- Polish (Polski)
- Russian (Русский)
- Thai (ไทย)
We plan to roll out support for more keyboard layouts in the future. But in the meantime, read through the class reference and see the rest of the Code Playground samples.
After you've had time to experiment, let us know what you think and which other layouts you'd like to see. Feedback is always welcome in our support forum and IRC channel.
Behind the scenes with two AJAX API Developers
AroundMe
AroundMe is an iPhone application where the AJAX APIs are central to the user experience. Specifically, the app utilizes the Local Search API to enable users to find information about their surroundings. In the videos below, Marco Pifferi (the developer behind AroundMe) gives a demo of his app, his thoughts on using the AJAX APIs, and tips for integrating them in mobile apps.
Mibbit
Mibbit is a web-based chat application that uses the AJAX APIs to enhance the chat experience. The language APIs help users to translate their messages into a number of different languages. Mibbit also uses the Maps API and YouTube API to display embedded maps or YouTube videos if a user includes a Maps or YouTube URL in their chat. Jimmy Moore, creator of Mibbit, walks through Mibbit in his video below.
Do you have a great app that uses the AJAX APIs? Submit a video about it and we may feature it on this blog. Questions? Stop by our support forum or IRC channel.
8 Temmuz 2012 Pazar
Custom Search with Custom Style: Peanut Butter and Jelly
Creating a custom look and feel for your website can have significant benefits in everything from improving usability to setting a professional or playful tone for your website. In many cases, letting users search the content of your site and related sites gets them the information they need faster. After all, a speedy user experience is a happy user experience. Here are some examples of how Custom Search and custom styles are as easy (and delicious) as peanut butter and jelly.
We start with a Custom Search Element, which uses the CustomSearchControl to add a Custom Search Engine to my web page. If you've never used a Custom Search engine before, I think you'll find a lot to love: it uses Google's search technology to include a specific group of websites for indexing, and you can share in ad revenue.
One of the many benefits of using the AJAX Search APIs to dynamically add search capabilities to your web pages is that you can also control the look and feel of the search input and results by using open web standards like cascading style sheets (CSS). When you combine this styling with Custom Search, you can create a seamless search experience for your users.
You can begin by changing the search input box (dynamically added to your page by default) to use an input box that you've placed on the page wherever you like.
// Set drawing options to use my text box
// as input instead of having the library create one.
var drawOptions = new google.search.DrawOptions();
drawOptions.setInput(
document.getElementById('query_input'));
// Draw the control in content div
customSearchControl.draw(
'results', drawOptions);
With the above changes we get a page that looks like this:
Now that we're able to use a Custom Search box (look ma, no button push required!) we can add CSS rules to change fonts, colors, and more in the search results.
For example, see: http://ajax-apis.appspot.com/cse-style which has CSS rules that effect the styling of the search results and compare it to our first step which uses the default styles.
Take a look at the CSS rules to get an idea for how this works, and how you can do custom styling to fit your own website.
We can change the font and add a border around each search result:
#results .gsc-results {
/* Sets font for titles, snippets, and URLs. */
font-family: arial, helvetica, sans-serif;
}
#results .gsc-result {
position: relative;
border: 1px solid #eee;
border-left: 10px solid #eee;
padding: 8px 8px 8px 20px;
border-radius: 8px;
-webkit-border-radius: 8px;
-moz-border-radius: 8px;
}
We can also change the style of a single result when the user moves the mouse cursor over it:
#results .gsc-result:hover {
border: 1px solid #888;
border-left: 10px solid #888;
}
These are just a couple of examples. Since the CSS styling is handled by the browser, you can use any supported CSS rules to select portions of the search results. For more information on the result HTML structure and the CSS classes you may want to select in your own customizations, see the documentation on styling AJAX Search results.
More Languages, More Keyboards
The language APIs keep right on trucking, released recently are a handful of new translation languages, pairs, and keyboard layouts.
We've added the ability to use machine translation to or from the following languages:
- Afrikaans
- Belarusian
- Icelandic
- Irish
- Macedonian
- Malay
- Persian
- Swahili
- Welsh
- Yiddish
With the addition of the above the total count for language pair combination comes to a mind boggling 2550 pairs. In addition, we find the above additions exciting because, for the first time, African languages are available through the API and we now support all 23 Official European Union languages.
A few months ago we announce our virtual keyboard API and this month we've added nine new keyboard layouts:
- Bulgarian
- Czech
- Greek
- Hebrew
- Hungarian - 101 layout
- Slovak
- Slovenian
- Turkish - Q layout
- Ukrainian - 101 layout
Here's a simple example of using the Slovak keyboard layout.
Google Chrome Frame Ajax Detection
In partnership with the Google Chrome Frame team, we are making available a library to allow your web application to detect the presence of Google Chrome Frame. We on the Ajax team are excited about the possibilities of this add-on improving JavaScript performance and enabling some of the new features available in HTML5. If you have a web application which makes use of these new features, you can use this library to prompt the user to install Google Chrome Frame, or recognize when a user has just installed it. The library provides granular controls so that you can create the user experience which best suits your site.
As a short example, I've created the following simple demo which just detects whether Google Chrome Frame is installed or not with an alternate message if you are in a browser for which this plugin is not available.
Do you have Google Chrome Frame installed?We're checking on that now.
Ben Lisbakken has also added detection for Google Chrome Frame to the Ajax Playground. If you view source on the page you can see another example of a customized CFInstall.check implementation which is designed to fit the page.
For more details on the Google Chrome Frame Ajax API, see the documentation and for questions, please visit the discussion group.
Web Search in Your Country
I am happy to announce the addition of the ability to scope your searches to a specific country in the AJAX Web Search API. Now, if you have a lot of visitors in Madagascar, you can make sure that the search results displayed on your site are tailored to them. All it takes is a small change to your code.
There are three possible ways to implement, depending on how you're using the API:
- If you use the loader, you can simply load jsapi on the domain you're interested in (example), such as:
<script src="http://www.google.es/jsapi"></script>
- Alternately, you can set this with the web search object's .setRestriction method (example):
var ws = new google.search.WebSearch();
ws.setRestriction(google.search.Search.RESTRICT_EXTENDED_ARGS,
{'gl' : 'es'}); - Finally, if you're using the RESTful interface, all you have to do is append a "gl" URL parameter to your request:
http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=flowers&gl=fr
Most valid country codes will work, as long as Google has a home page on that country's top level domain (e.g. google.es). If you use an invalid or unsupported country code, you'll get an error message letting you know.
We're excited to bring you this addition to the API, and look forward to seeing the innovative ways in which you use this new feature to improve your users' experience. Please drop us a line with your thoughts (or questions) on our discussion group.
New Parameter for Server Side API Calls
To help us discourage this behavior without affecting legitimate developers, we're adding a new parameter to the RESTful interface, userip. With this parameter, developers have the option of supplying the IP address of the end-user on whose behalf they are making the API request. Doing so will help us distinguish this legitimate server-side traffic from the more abusive scraping in which there are no end-users.
Use of this new parameter is *not* required. However, if it is not included with server-side requests, those requests are more likely to be interpreted and automatically blocked as abuse, especially in situations where a server is sending a high volume of traffic to the API. Additional safeguards you can take include setting setting a valid HTTP referer (as required by our Terms of Use) and using an API key. These additional measures will help us contact you in case there are problems with your website or application (sometimes a programming error results in massive traffic, forcing us to block your access if we are unable to contact you). In choosing to utilize this parameter, please be sure that you're in compliance with any local laws, including any laws relating to disclosure of personal information being sent.
Check the documentation for usage of the new parameter. We'd love to hear any comments, questions or problems you're having in the support forum.
7 Temmuz 2012 Cumartesi
Celebrate freedom. Support a free and open Internet.
In the summer of 1776, 13 disenfranchised colonies spoke. It took days for their declaration to be printed and distributed throughout the colonies, and it took weeks for it to be seen across the Atlantic.
Today, such a document could be published and shared with the world in seconds. More than any time in history, more people in more places have the ability to have their voices heard.
Powering these voices are billions of Internet connections around the world—people on their mobile phones, tablets, laptops and desktops. The Internet is a powerful platform that makes it easier for people to speak, to assemble, and to be heard. This is true no matter where freedom is taking root.
Yet we’ve only just begun to see what a free and open Internet can do for people and for the freedom we cherish.
Today we’re sharing a video we made to celebrate our freedom and the tools that support it. Please take a moment to watch it, share it with your friends, and add your voice.
Join us in supporting a free and open Internet.
Posted by Susan Molinari, Vice President of Public Policy and Government Affairs, Americas
Excellent Papers for 2011
UPDATE: Added Theo Vassilakis as an author for "Dremel: Interactive Analysis of Web-Scale Datasets"
Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google.
In an effort to highlight some of our work, we periodically select a number of publications to be featured on this blog. We first posted a set of papers on this blog in mid-2010 and subsequently discussed them in more detail in the following blog postings. In a second round, we highlighted new noteworthy papers from the later half of 2010. This time we honor the influential papers authored or co-authored by Googlers covering all of 2011 -- covering roughly 10% of our total publications. It’s tough choosing, so we may have left out some important papers. So, do see the publications list to review the complete group.
In the coming weeks we will be offering a more in-depth look at these publications, but here are some summaries:
Audio processing
“Cascades of two-pole–two-zero asymmetric resonators are good models of peripheral auditory function”, Richard F. Lyon, Journal of the Acoustical Society of America, vol. 130 (2011), pp. 3893-3904.
Lyon's long title summarizes a result that he has been working toward over many years of modeling sound processing in the inner ear. This nonlinear cochlear model is shown to be "good" with respect to psychophysical data on masking, physiological data on mechanical and neural response, and computational efficiency. These properties derive from the close connection between wave propagation and filter cascades. This filter-cascade model of the ear is used as an efficient sound processor for several machine hearing projects at Google.
Electronic Commerce and Algorithms
“Online Vertex-Weighted Bipartite Matching and Single-bid Budgeted Allocations”, Gagan Aggarwal, Gagan Goel, Chinmay Karande, Aranyak Mehta, SODA 2011.
The authors introduce an elegant and powerful algorithmic technique to the area of online ad allocation and matching: a hybrid of random perturbations and greedy choice to make decisions on the fly. Their technique sheds new light on classic matching algorithms, and can be used, for example, to pick one among a set of relevant ads, without knowing in advance the demand for ad slots on future web page views.
“Milgram-routing in social networks”, Silvio Lattanzi, Alessandro Panconesi, D. Sivakumar, Proceedings of the 20th International Conference on World Wide Web, WWW 2011, pp. 725-734.
Milgram’s "six-degrees-of-separation experiment" and the fascinating small world hypothesis that follows from it, have generated a lot of interesting research in recent years. In this landmark experiment, Milgram showed that people unknown to each other are often connected by surprisingly short chains of acquaintances. In the paper we prove theoretically and experimentally how a recent model of social networks, "Affiliation Networks", offers an explanation to this phenomena and inspires interesting technique for local routing within social networks.
“Non-Price Equilibria in Markets of Discrete Goods”, Avinatan Hassidim, Haim Kaplan, Yishay Mansour, Noam Nisan, EC, 2011.
We present a correspondence between markets of indivisible items, and a family of auction based n player games. We show that a market has a price based (Walrasian) equilibrium if and only if the corresponding game has a pure Nash equilibrium. We then turn to markets which do not have a Walrasian equilibrium (which is the interesting case), and study properties of the mixed Nash equilibria of the corresponding games.
HCI
“From Basecamp to Summit: Scaling Field Research Across 9 Locations”, Jens Riegelsberger, Audrey Yang, Konstantin Samoylov, Elizabeth Nunge, Molly Stevens, Patrick Larvie, CHI 2011 Extended Abstracts.
The paper reports on our experience with a basecamp research hub to coordinate logistics and ongoing real-time analysis with research teams in the field. We also reflect on the implications for the meaning of research in a corporate context, where much of the value may be less in a final report, but more in the curated impressions and memories our colleagues take away from the the research trip.
“User-Defined Motion Gestures for Mobile Interaction”, Jaime Ruiz, Yang Li, Edward Lank, CHI 2011: ACM Conference on Human Factors in Computing Systems, pp. 197-206.
Modern smartphones contain sophisticated sensors that can detect rich motion gestures — deliberate movements of the device by end-users to invoke commands. However, little is known about best-practices in motion gesture design for the mobile computing paradigm. We systematically studied the design space of motion gestures via a guessability study that elicits end-user motion gestures to invoke commands on a smartphone device. The study revealed consensus among our participants on parameters of movement and on mappings of motion gestures onto commands, by which we developed a taxonomy for motion gestures and compiled an end-user inspired motion gesture set. The work lays the foundation of motion gesture design—a new dimension for mobile interaction.
Information Retrieval
“Reputation Systems for Open Collaboration”, B.T. Adler, L. de Alfaro, A. Kulshrestra, I. Pye, Communications of the ACM, vol. 54 No. 8 (2011), pp. 81-87.
This paper describes content based reputation algorithms, that rely on automated content analysis to derive user and content reputation, and their applications for Wikipedia and google Maps. The Wikipedia reputation system WikiTrust relies on a chronological analysis of user contributions to articles, metering positive or negative increments of reputation whenever new contributions are made. The Google Maps system Crowdsensus compares the information provided by users on map business listings and computes both a likely reconstruction of the correct listing and a reputation value for each user. Algorithmic-based user incentives ensure the trustworthiness of evaluations of Wikipedia entries and Google Maps business information.
Machine Learning and Data Mining
“Domain adaptation in regression”, Corinna Cortes, Mehryar Mohri, Proceedings of The 22nd International Conference on Algorithmic Learning Theory, ALT 2011.
Domain adaptation is one of the most important and challenging problems in machine learning. This paper presents a series of theoretical guarantees for domain adaptation in regression, gives an adaptation algorithm based on that theory that can be cast as a semi-definite programming problem, derives an efficient solution for that problem by using results from smooth optimization, shows that the solution can scale to relatively large data sets, and reports extensive empirical results demonstrating the benefits of this new adaptation algorithm.
“On the necessity of irrelevant variables”, David P. Helmbold, Philip M. Long, ICML, 2011
Relevant variables sometimes do much more good than irrelevant variables do harm, so that it is possible to learn a very accurate classifier using predominantly irrelevant variables. We show that this holds given an assumption that formalizes the intuitive idea that the variables are non-redundant. For problems like this it can be advantageous to add many additional variables, even if only a small fraction of them are relevant.
“Online Learning in the Manifold of Low-Rank Matrices”, Gal Chechik, Daphna Weinshall, Uri Shalit, Neural Information Processing Systems (NIPS 23), 2011, pp. 2128-2136.
Learning measures of similarity from examples of similar and dissimilar pairs is a problem that is hard to scale. LORETA uses retractions, an operator from matrix optimization, to learn low-rank similarity matrices efficiently. This allows to learn similarities between objects like images or texts when represented using many more features than possible before.
Machine Translation
“Training a Parser for Machine Translation Reordering”, Jason Katz-Brown, Slav Petrov, Ryan McDonald, Franz Och, David Talbot, Hiroshi Ichikawa, Masakazu Seno, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11).
Machine translation systems often need to understand the syntactic structure of a sentence to translate it correctly. Traditionally, syntactic parsers are evaluated as standalone systems against reference data created by linguists. Instead, we show how to train a parser to optimize reordering accuracy in a machine translation system, resulting in measurable improvements in translation quality over a more traditionally trained parser.
“Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation”, Ashish Venugopal, Jakob Uszkoreit, David Talbot, Franz Och, Juri Ganitkevitch, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
We propose a general method to watermark and probabilistically identify the structured results of machine learning algorithms with an application in statistical machine translation. Our approach does not rely on controlling or even knowing the inputs to the algorithm and provides probabilistic guarantees on the ability to identify collections of results from one’s own algorithm, while being robust to limited editing operations.
“Inducing Sentence Structure from Parallel Corpora for Reordering”, John DeNero, Jakob Uszkoreit, Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
Automatically discovering the full range of linguistic rules that govern the correct use of language is an appealing goal, but extremely challenging. Our paper describes a targeted method for discovering only those aspects of linguistic syntax necessary to explain how two different languages differ in their word ordering. By focusing on word order, we demonstrate an effective and practical application of unsupervised grammar induction that improves a Japanese to English machine translation system.
Multimedia and Computer Vision
“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk,Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011.
The paper proposes a principled way for computers to learn how to segment the foreground from the background of an image given a set of training examples. The technology is build upon a specially designed nonlinear segmentation kernel under the recently proposed structured SVM learning framework.
“Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths”, Matthias Grundmann, Vivek Kwatra, Irfan Essa, IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011).
Casually shot videos captured by handheld or mobile cameras suffer from significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. On the other hand, most professionally shot videos usually consist of carefully designed camera configurations, using specialized equipment such as tripods or camera dollies, and employ ease-in and ease-out for transitions. Our stabilization technique automatically converts casual shaky footage into more pleasant and professional looking videos by mimicking these cinematographic principles. The original, shaky camera path is divided into a set of segments, each approximated by either constant, linear or parabolic motion, using an algorithm based on robust L1 optimization. The stabilizer has been part of the YouTube Editor (youtube.com/editor) since March 2011.
“The Power of Comparative Reasoning”, Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin, International Conference on Computer Vision (2011).
The paper describes a theory derived vector space transform that converts vectors into sparse binary vectors such that Euclidean space operations on the sparse binary vectors imply rank space operations in the original vector space. The transform a) does not need any data-driven supervised/unsupervised learning b) can be computed from polynomial expansions of the input space in linear time (in the degree of the polynomial) and c) can be implemented in 10-lines of code. We show competitive results on similarity search and sparse coding (for classification) tasks.
NLP
“Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections”, Dipanjan Das, Slav Petrov, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11), 2011, Best Paper Award.
We would like to have natural language processing systems for all languages, but obtaining labeled data for all languages and tasks is unrealistic and expensive. We present an approach which leverages existing resources in one language (for example English) to induce part-of-speech taggers for languages without any labeled training data. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in a hidden Markov model trained with the Expectation Maximization algorithm.
Networks
“TCP Fast Open”, Sivasankar Radhakrishnan, Yuchung Cheng, Jerry Chu, Arvind Jain, Barath Raghavan, Proceedings of the 7th International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2011.
TCP Fast Open enables data exchange during TCP’s initial handshake. It decreases application network latency by one full round-trip time, a significant speedup for today's short Web transfers. Our experiments on popular websites show that Fast Open reduces the whole-page load time over 10% on average, and in some cases up to 40%.
“Proportional Rate Reduction for TCP”, Nandita Dukkipati, Matt Mathis, Yuchung Cheng, Monia Ghobadi, Proceedings of the 11th ACM SIGCOMM Conference on Internet Measurement 2011, Berlin, Germany - November 2-4, 2011.
Packet losses increase latency of Web transfers and negatively impact user experience. Proportional rate reduction (PRR) is designed to recover from losses quickly, smoothly and accurately by pacing out retransmissions across received ACKs during TCP’s fast recovery. Experiments on Google Web and YouTube servers in U.S. and India demonstrate that PRR reduces the TCP latency of connections experiencing losses by 3-10% depending on response size.
Security and Privacy
“Automated Analysis of Security-Critical JavaScript APIs”, Ankur Taly, Úlfar Erlingsson, John C. Mitchell, Mark S. Miller, Jasvir Nagra, IEEE Symposium on Security & Privacy (SP), 2011.
As software is increasingly written in high-level, type-safe languages, attackers have fewer means to subvert system fundamentals, and attacks are more likely to exploit errors and vulnerabilities in application-level logic. This paper describes a generic, practical defense against such attacks, which can protect critical application resources even when those resources are partially exposed to attackers via software interfaces. In the context of carefully-crafted fragments of JavaScript, the paper applies formal methods and semantics to prove that these defenses can provide complete, non-circumventable mediation of resource access; the paper also shows how an implementation of the techniques can establish the properties of widely-used software, and find previously-unknown bugs.
“App Isolation: Get the Security of Multiple Browsers with Just One”, Eric Y. Chen, Jason Bau, Charles Reis, Adam Barth, Collin Jackson, 18th ACM Conference on Computer and Communications Security, 2011.
We find that anecdotal advice to use a separate web browser for sites like your bank is indeed effective at defeating most cross-origin web attacks. We also prove that a single web browser can provide the same key properties, for sites that fit within the compatibility constraints.
Speech
“Improving the speed of neural networks on CPUs”, Vincent Vanhoucke, Andrew Senior, Mark Z. Mao, Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.
As deep neural networks become state-of-the-art in real-time machine learning applications such as speech recognition, computational complexity is fast becoming a limiting factor in their adoption. We show how to best leverage modern CPU architectures to significantly speed-up their inference.
“Bayesian Language Model Interpolation for Mobile Speech Input”, Cyril Allauzen, Michael Riley, Interspeech 2011.
Voice recognition on the Android platform must contend with many possible target domains - e.g. search, maps, SMS. For each of these, a domain-specific language model was built by linearly interpolating several n-gram LMs from a common set of Google corpora. The current work has found a way to efficiently compute a single n-gram language model with accuracy very close to the domain-specific LMs but with considerably less complexity at recognition time.
Statistics
“Large-Scale Parallel Statistical Forecasting Computations in R”, Murray Stokely, Farzan Rohani, Eric Tassone, JSM Proceedings, Section on Physical and Engineering Sciences, 2011.
This paper describes the implementation of a framework for utilizing distributed computational infrastructure from within the R interactive statistical computing environment, with applications to timeseries forecasting. This system is widely used by the statistical analyst community at Google for data analysis on very large data sets.
Structured Data
“Dremel: Interactive Analysis of Web-Scale Datasets”, Sergey Melnik, Andrey Gubarev, Jing Jing Long, Geoffrey Romer, Shiva Shivakumar, Matt Tolton, Theo Vassilakis, Communications of the ACM, vol. 54 (2011), pp. 114-123.
Dremel is a scalable, interactive ad-hoc query system. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. Besides continued growth internally to Google, Dremel now also backs an increasing number of external customers including BigQuery and UIs such as AdExchange front-end.
“Representative Skylines using Threshold-based Preference Distributions”, Atish Das Sarma, Ashwin Lall, Danupon Nanongkai, Richard J. Lipton, Jim Xu, International Conference on Data Engineering (ICDE), 2011.
The paper adopts principled approach towards representative skylines and formalizes the problem of displaying k tuples such that the probability that a random user clicks on one of them is maximized. This requires mathematically modeling (a) the likelihood with which a user is interested in a tuple, as well as (b) how one negotiates the lack of knowledge of an explicit set of users. This work presents theoretical and experimental results showing that the suggested algorithm significantly outperforms previously suggested approaches.
“Hyper-local, directions-based ranking of places”, Petros Venetis, Hector Gonzalez, Alon Y. Halevy, Christian S. Jensen, PVLDB, vol. 4(5) (2011), pp. 290-30.
Click through information is one of the strongest signals we have for ranking web pages. We propose an equivalent signal for raking real world places: The number of times that people ask for precise directions to the address of the place. We show that this signal is competitive in quality with human reviews while being much cheaper to collect, we also show that the signal can be incorporated efficiently into a location search system.
Systems
“Power Management of Online Data-Intensive Services”, David Meisner, Christopher M. Sadler, Luiz André Barroso, Wolf-Dietrich Weber, Thomas F. Wenisch, Proceedings of the 38th ACM International Symposium on Computer Architecture, 2011.
Compute and data intensive Web services (such as Search) are a notoriously hard target for energy savings techniques. This article characterizes the statistical hardware activity behavior of servers running Web search and discusses the potential opportunities of existing and proposed energy savings techniques.
“The Impact of Memory Subsystem Resource Sharing on Datacenter Applications”, Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, Mary-Lou Soffa, ISCA, 2011.
In this work, the authors expose key characteristics of an emerging class of Google-style workloads and show how to enhance system software to take advantage of these characteristics to improve efficiency in data centers. The authors find that across datacenter applications, there is both a sizable benefit and a potential degradation from improperly sharing micro-architectural resources on a single machine (such as on-chip caches and bandwidth to memory). The impact of co-locating threads from multiple applications with diverse memory behavior changes the optimal mapping of thread to cores for each application. By employing an adaptive thread-to-core mapper, the authors improved the performance of the datacenter applications by up to 22% over status quo thread-to-core mapping, achieving performance within 3% of optimal.
“Language-Independent Sandboxing of Just-In-Time Compilation and Self-Modifying Code”, Jason Ansel, Petr Marchenko, Úlfar Erlingsson, Elijah Taylor, Brad Chen, Derek Schuff, David Sehr, Cliff L. Biffle, Bennet S. Yee, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2011.
Since its introduction in the early 90's, Software Fault Isolation, or SFI, has been a static code technique, commonly perceived as incompatible with dynamic libraries, runtime code generation, and other dynamic code. This paper describes how to address this limitation and explains how the SFI techniques in Google Native Client were extended to support modern language implementations based on just-in-time code generation and runtime instrumentation. This work is already deployed in Google Chrome, benefitting millions of users, and was developed over a summer collaboration with three Ph.D. interns; it exemplifies how Research at Google is focused on rapidly bringing significant benefits to our users through groundbreaking technology and real-world products.
“Thialfi: A Client Notification Service for Internet-Scale Applications”, Atul Adya, Gregory Cooper, Daniel Myers, Michael Piatek,Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP), 2011, pp. 129-142.
This paper describes a notification service that scales to hundreds of millions of users, provides sub-second latency in the common case, and guarantees delivery even in the presence of a wide variety of failures. The service has been deployed in several popular Google applications including Chrome, Google Plus, and Contacts.