Movidius’ Fathom Neural Compute Stick isn't your conventional PC stick. It is instead designed to analyze pixels and provide the right context for images.
Fathom provides the horsepower that devices like drones, robots and cameras need to run computer vision applications such as image recognition, a workload those devices typically can't handle on their own.
Fathom uses an embedded version of Google's TensorFlow machine learning software for vision processing. The stick can be plugged into the USB port of a host device or a developer board like the Raspberry Pi, which in turn can power a drone or robot. It requires a 64-bit Linux OS and 50MB of hard drive space.
With a Fathom stick, simple robots or drones could do far more than they typically can today. For example, a drone could use Fathom to avoid obstacles and automatically navigate to specific locations. Or a cyclist's helmet camera could automatically start recording video after identifying a certain object, like a street sign.
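To make the helmet-camera example concrete, here is a rough sketch of such a trigger in Python with TensorFlow's Keras API. It is purely illustrative: the capture_frame stub, the off-the-shelf MobileNetV2 classifier and the "street_sign" ImageNet label check are assumptions made for the sketch, not a description of how Fathom itself is programmed.

```python
# Illustrative only: classify incoming frames and flag recording once a
# street sign shows up among the top predictions. MobileNetV2 stands in for
# whatever model a real device would run; capture_frame() is a dummy source.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")

def frame_contains_street_sign(frame_rgb):
    """Return True if a street sign appears in the frame's top-3 predictions."""
    img = tf.image.resize(frame_rgb, (224, 224)).numpy()
    batch = tf.keras.applications.mobilenet_v2.preprocess_input(
        np.expand_dims(img, axis=0))
    preds = model.predict(batch, verbose=0)
    top = tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3)[0]
    # "street_sign" is the ImageNet label name used by Keras' decoder.
    return any(label == "street_sign" for _, label, _ in top)

def capture_frame():
    # Stand-in for pulling a frame from the helmet camera.
    return np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

recording = False
for _ in range(100):  # in practice this would loop over live frames
    if frame_contains_street_sign(capture_frame()):
        recording = True
        print("Street sign recognized -- start recording")
        break
```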
Fathom could also bring a higher level of situational awareness to IP-based home security systems. Connected cameras are expected to be able to differentiate between humans and animals, with the computing handled by a Fathom stick plugged into the USB port of a home security hub.
Other applications for Fathom include 3D modeling and scanning, immersive gaming, augmented reality and gesture recognition.
In a way, the Fathom is a smaller and more power-efficient version of the Nvidia Jetson TX1 developer board, which is also targeted at robots, drones, self-driving cars and Internet of Things devices. Fathom is like a mobile equivalent of the TX1 -- it doesn't have the raw horsepower, but it's very fast at doing specific vision recognition tasks while consuming less power.
Fathom was described as a "discrete deep learning accelerator" by Jack Dashwood, the marketing communications director at Movidius.
Fathom is based on the Myriad 2 processor already in DJI’s flagship Phantom 4 autonomous drone, which can sense obstacles. Dashwood couldn't say if Fathom could be plugged directly into products like GoPro.
Movidius estimated the price of Fathom at under $100. An initial run will ship to researchers, hobbyists and companies that are developing, testing and experimenting with products. Fathom will become commercially available in the fourth quarter of this year.
Google is a major backer of Movidius' vision processing technology. The Myriad 2 chip will be in an upcoming next-generation deep learning device from Google, Dashwood said. He couldn't comment further about the Google device. Movidius processors have been used in a Google Project Tango tablet.
Fathom delivers 150 gigaflops of performance while consuming under 1.2 watts of power. The vision processing happens locally; there's no need for devices to connect to cloud services to recognize and identify images, Dashwood said.
Fathom relies on machine learning to crunch images, and it needs to be trained to analyze pixels and provide the right context for images. That entails the creation of rich data sets against which images can be verified. The learning model is usually developed on a PC and then transferred to work with the TensorFlow software stack on the smaller Fathom.
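As a rough sketch of that develop-on-a-PC, deploy-to-the-device workflow, the snippet below trains a small image classifier with TensorFlow's Keras API and saves it in a portable form. The data set, network and output path are placeholders chosen for the sketch; the article doesn't describe Movidius' actual conversion toolchain, so the final export step is just generic TensorFlow.

```python
# Sketch of the workflow described above: train on a development machine,
# then export the trained model so a device-side TensorFlow stack can load
# it. MNIST and the tiny network are stand-ins for a real vision data set.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = (x_train[..., None] / 255.0).astype("float32")  # add channel axis, scale to [0, 1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=1, batch_size=64)

# Export the trained graph and weights; an embedded runtime would load this
# artifact instead of retraining on the device. The path is hypothetical.
model.save("vision_classifier.h5")
```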
In most cases, many pixels must be analyzed to get a complete understanding of an image -- when a person is happy, for example, the lips take on a different structure. There's no single way to train Fathom to recognize all images, and learning models may differ for cameras, drones, robots and self-driving cars.
The creation of the rich data sets needed for image understanding involves steps like classifying and labeling pixels. Fathom uses a combination of algorithms and pixel association to understand images. In machine learning models, sentiment and face recognition capabilities have become fairly common, while distance measurement and simultaneous localization and mapping -- which involves building a map from images while tracking the camera's position within it -- remain a challenge, Dashwood said.
Fathom also has 12 vector processors that can be programmed to do a variety of tasks, as well as a custom GPU subsystem that is central to vision processing.