We’ve said before that the future of UX is Artificial Intelligence because of the huge possibilities it opens up, but for most projects the benefits of AI have been out of reach, due to the (not insubstantial) cost and technical implications. So, last Wednesday, when Google announced the Cloud Vision API, there was excitement among the more geekily-inclined Real Adventurers.
Suddenly huge AI potential is at our fingertips, for any project.
What is Google Cloud Vision API?
Google is seemingly on a mission to open up AI and related technology, such as Deep Learning, to the masses, through initiatives such as TensorFlow, the fruit of the brilliantly named Google Brain team. The Brain team have flexed their collective cerebral cortex once more and given us Cloud Vision, an API that allows your app or website to understand the content of images.
In Google’s words, ‘it changes the way applications understand images’.
For the less technical among you, an API is essentially a service that sends data back and forth over the Internet. Google’s APIs allow anyone to access the power of their cloud-based supercomputing. Want your app to include maps or directions? Use the Google Maps API. Want to harness the capabilities of Google Search on your website? No problem, use the Search API.
The Cloud Vision API allows your website or app to send images to Google’s servers and, in turn, receive data that describes the content of the image. So send it a photo of a beach, and its computer vision technology will analyse it and tell you that it contains a palm tree and a sun, almost instantaneously.
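To make that concrete, here’s a rough sketch of what such a request could look like, based on the REST shape of the v1 `images:annotate` endpoint: you base64-encode the image, say which features you want (label detection here), and POST the JSON body to Google. This isn’t official client code, and `YOUR_API_KEY` is a placeholder – treat it as an illustration of the idea rather than a drop-in implementation.

```python
import base64
import json

# Placeholder endpoint/key for illustration - a real app would supply its own
# API key from the Google Cloud console.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate?key=YOUR_API_KEY"

def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a LABEL_DETECTION request.

    The API expects the raw image bytes base64-encoded inside the
    'image.content' field, plus a list of requested features.
    """
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
        }]
    }

# In a real app, image_bytes would be the user's photo read from disk or camera;
# you would then POST json.dumps(body) to VISION_ENDPOINT.
body = build_label_request(b"...photo bytes...")
payload = json.dumps(body)
```

The response comes back as JSON too – for a beach photo, a list of `labelAnnotations` with descriptions such as “palm tree” and confidence scores.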
Anyone who has used the brilliant Google Photos app will have experienced the clever tech behind the Cloud Vision API. Google Photos automatically categorises the thousands of photos on your phone (the ‘average’ person takes 1,800 photos a year with their phone), allowing you to sort through your pictures in new ways. For example, Google Photos will automatically group all the photos of your cat together, so you can ‘paw’ over them at your leisure. This is only possible because the software knows what a cat looks like – and that’s the key to the power of the Cloud Vision API – it allows software such as websites and apps to ‘see’.
5 reasons this technology is awesome:
1. Object recognition
The biggest deal has to be the potential uses for object recognition in photos. See something you’d like to buy? Point your camera at it, then find the cheapest price for it online. Maybe you have a healthy eating app – point your camera at a food item and see its likely nutritional values. Is the food safe to eat in pregnancy? The possibilities are vast, and the experience should feel effortless for the user.
2. Facial detection
The API can detect multiple faces within an image, along with key facial attributes such as emotional state or whether the person is wearing headwear. Google has made it clear that the API won’t identify individual faces, for privacy reasons, so it can’t be used for personalisation – but the ability to detect people and their emotions still has great potential. Automated chat AI could harness this capability and respond differently, based on the user’s likely emotional state. Support lines could prioritise support queries from the most irate customers – or make them wait so they can cool down. You decide!
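The “irate customers” idea could be sketched like this. The v1 API returns emotions as likelihood strings (e.g. `VERY_UNLIKELY` through `VERY_LIKELY`) on each face annotation; the sample response below is hand-written for illustration, not real API output.

```python
# A trimmed, hand-written example of a FACE_DETECTION response: one calm,
# happy face and one angry face wearing headwear.
sample_response = {
    "responses": [{
        "faceAnnotations": [
            {"joyLikelihood": "VERY_LIKELY", "angerLikelihood": "VERY_UNLIKELY",
             "headwearLikelihood": "UNLIKELY"},
            {"joyLikelihood": "UNLIKELY", "angerLikelihood": "LIKELY",
             "headwearLikelihood": "VERY_LIKELY"},
        ]
    }]
}

ANGRY = {"LIKELY", "VERY_LIKELY"}

def count_irate_faces(response):
    """Count faces whose anger likelihood suggests an unhappy customer."""
    faces = response["responses"][0].get("faceAnnotations", [])
    return sum(1 for face in faces if face["angerLikelihood"] in ANGRY)

print(count_irate_faces(sample_response))  # → 1
```

A support queue could use a check like this to bump the angriest-looking selfie-submitters to the front of the line – or, as suggested above, to the back.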
3. Make your products faster and more useful
There’s a tendency to think big with these new bits of tech, but I think it’s a good idea to think small as well. AI-powered micro-interactions could be an opportunity to make an interface more useful, faster and a delight to use. For example, if your coffee-themed app has a ‘share your latte art’ feature, why not suggest photos of latte art from the user’s phone or computer, instead of making them wade through all of their photos looking for them? Developers could also use the API to add metadata to their image catalogues to make it easier for people to find what they are looking for.
4. Easier moderation

Moderation isn’t at the top of many people’s lists, but it remains important. On a large community or user-generated content project it can be a costly overhead, particularly if it means people trawling through millions of images looking for photos that break guidelines or terms and conditions.
The Cloud Vision API can tap into Google’s SafeSearch functionality and flag photos with inappropriate content (e.g. pornographic or violent). It can also detect popular product logos in photos, potentially useful in scenarios where logos or brands aren’t allowed in a competition entry. The API even has the ability to detect text within images, along with automatic language identification – another potentially useful tool in the moderation of user-driven content.
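Usefully, the v1 API lets you ask for several of these features in a single `images:annotate` call. Here’s a hedged sketch of a combined moderation request and a simple flagging rule – the request shape follows the documented REST format, while the `flag_for_review` threshold is just one illustrative policy.

```python
import base64

def build_moderation_request(image_bytes):
    """One annotate request asking for several feature types at once:
    SafeSearch flags, logo detection and text detection."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "SAFE_SEARCH_DETECTION"},
                {"type": "LOGO_DETECTION", "maxResults": 3},
                {"type": "TEXT_DETECTION"},
            ],
        }]
    }

RISKY = {"LIKELY", "VERY_LIKELY"}

def flag_for_review(response):
    """Flag an image if SafeSearch rates it likely adult or violent.

    Where to set the threshold is a policy decision for each project;
    this is just one example rule.
    """
    annotation = response["responses"][0].get("safeSearchAnnotation", {})
    return (annotation.get("adult") in RISKY
            or annotation.get("violence") in RISKY)
```

A moderation pipeline could auto-hide flagged images and only send those to a human reviewer, rather than having people trawl through everything.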
5. Beyond apps and websites
Google has made it clear that this technology isn’t just for websites and apps. Drones, robots, automated cars and the whole Internet of Things can benefit from being able to see and understand what they are looking at. A robot could approach someone who is smiling and steer clear of someone who looks aggressive (but let’s avoid building Robocop, please).
Sony is already using the technology to process millions of pictures being taken by its Aerosense drones.
The boring (but important) bit
As always, we need to remember and respect people’s right to privacy and data protection, and let them know that their content will be processed by or stored on Google’s cloud servers. But if we use it in the right way, and lead with the benefits, it should be a no-brainer. Google also needs to unveil the pricing plan for the product – it’s likely to start off free and then have tiered pricing for ‘enterprise’-level access to the API (the more you use it, the more likely it is you will have to pay for it).
We are only just getting started with AI. As more of its power becomes publicly available through APIs and open source software, consumers’ expectations will start to change. What seems pie in the sky today will be commonplace tomorrow. It’s an exciting time to be working in this industry.
Check out this video from the Google Cloud team to get those ideas flowing…