BLOG
Check out our progress!
UPDATE 8
Demo, Demo, Demo
MAR 30, 2021
By: Ali Toyserkani

Summary

After completing the mechanical, electrical, and software builds, it was time to test the end-to-end system and see whether it meets our project objectives. The video below shows a compilation of some of the tests we performed. We analyze the performance of the individual subsystems in a controlled setting, as well as the end-to-end system in an outdoor setting.

UPDATE 7
Introducing, EyeMove Technologies
MAR 26, 2021
By: Ali Toyserkani

Summary

At this point, we have completed our system - it is time to showcase EyeMove's product to the world. A brief blurb about the product:

Patients with motor impairments (e.g. ALS, cerebral palsy, muscular dystrophy) often lose control of their hands, robbing them of the ability to use the traditional joysticks on electric wheelchairs. EyeMove's add-on assistive product enables patients to control their wheelchair with eye movements: by simply looking where they want to go, patients make the wheelchair move in that direction. This provides an intuitive and non-intrusive interface for controlling existing powered wheelchairs, an interface that competitor assistive devices such as sip n' puff and head control systems do not provide.

Below is our marketing video which provides a glimpse of what EyeMove Technologies is about. Enjoy!

UPDATE 6
Gaze Tracking Pipeline
MAR 15, 2021
By: Saeejith Nair

Summary

In our previous blog post, we described how the final gaze tracking model did not meet our accuracy specification of being within ±5 degrees of a patient's exact gaze location. However, because the general magnitude and direction of the gaze vector were correct, we decided to change our approach and user experience so that we could achieve robust control even with slightly lower accuracy.


Essentially, we determined that we could accurately map a user's gaze to one of five sections on a virtual plane parallel to the user's face. These sections could then be mapped to various system inputs such as turn left, turn right, drive forward, toggle start/stop, and toggle reverse - all the states required for robust wheelchair operation. Sufficient padding between the sections, as well as timed latching mechanisms, ensured that users wouldn't accidentally keep switching between states. Once the state was calculated, it was converted to a velocity and sent to the motor control interface. The state machine diagram is shown below.

[Image: state machine diagram]
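To make the mapping concrete, here is a minimal Python sketch of the kind of region-to-state logic described above. The region layout, padding, dwell time, and velocity values are illustrative assumptions, not our exact tuning.

import time

def classify_region(x, y, pad=0.05):
    """Map a normalized gaze point (x, y in [0, 1]) on the virtual plane to one of
    five sections; `pad` acts as a dead band between neighbouring sections."""
    if x < 0.33 - pad:
        return "left"
    if x > 0.67 + pad:
        return "right"
    if y < 0.33 - pad:
        return "up"          # drive forward
    if y > 0.67 + pad:
        return "down"        # toggle reverse
    return "center"          # toggle start/stop

class GazeStateMachine:
    DWELL_S = 0.7            # how long a region must be held before a toggle latches (assumed value)

    def __init__(self):
        self.driving = False
        self.reverse = False
        self.region = None
        self.entered = 0.0
        self.latched = False

    def update(self, x, y, now=None):
        """Feed one gaze sample; returns a (linear, angular) velocity command."""
        now = time.time() if now is None else now
        region = classify_region(x, y)
        if region != self.region:
            self.region, self.entered, self.latched = region, now, False
        elif not self.latched and now - self.entered >= self.DWELL_S:
            self.latched = True                  # fire each toggle only once per dwell
            if region == "center":
                self.driving = not self.driving
            elif region == "down":
                self.reverse = not self.reverse
        return self.command(region)

    def command(self, region):
        if not self.driving:
            return (0.0, 0.0)
        v = -0.3 if self.reverse else 0.3        # m/s, illustrative
        if region == "up":
            return (v, 0.0)
        if region == "left":
            return (0.0, 0.5)                    # rad/s, illustrative
        if region == "right":
            return (0.0, -0.5)
        return (0.0, 0.0)

The latched flag plays the role of the timed latching mechanism mentioned above: a toggle fires only once after the gaze has dwelled in a section, so jittery gaze estimates can't rapidly flip the state.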

This approach worked rather well, except for the occasions when the model failed to detect the facial landmarks required for computing the gaze vector. This was seen to happen in low-light conditions, in environments with lots of shadows, in rainy weather, and when there was excessive glare on eyeglasses. Thus, a way was required to interpolate between successful detections during the periods when no gaze was detected. To do this, the gaze vector was passed through a linear Kalman filter to interpolate across missed detections and smooth out the noisy predictions. The Kalman filter took in the point of gaze as a 2D coordinate {x, y} and used a constant-velocity motion model to predict the change in gaze over time. To account for measurement error, the sensor was modelled with an uncertainty of N(0, 10²) pixels, a value that was empirically found to provide a good balance between smoothing out noise and converging quickly. The result was a real-time gaze tracking system that accurately mapped a user's gaze to the corresponding input state.
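For reference, a minimal sketch of such a constant-velocity Kalman filter over the 2D point of gaze is shown below. The measurement noise of N(0, 10²) pixels matches the value above, while the frame interval, process noise, and initial covariance are illustrative assumptions.

import numpy as np

class GazeKalman:
    """Constant-velocity Kalman filter over the 2D point of gaze (pixels)."""
    def __init__(self, dt=1 / 30.0, meas_std=10.0, accel_std=50.0):
        self.x = np.zeros(4)                      # state: [px, py, vx, vy]
        self.P = np.eye(4) * 500.0                # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)  # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)  # we only measure the position
        self.R = np.eye(2) * meas_std ** 2        # N(0, 10^2) pixel measurement noise
        self.Q = np.eye(4) * (accel_std * dt) ** 2  # crude process noise model

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # interpolated gaze when a detection is missed

    def update(self, z):
        z = np.asarray(z, float)
        y = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]

# Per frame: always predict; update only when the landmark model produced a gaze point.
kf = GazeKalman()
for measurement in [(320, 240), None, None, (330, 238)]:   # None = missed detection
    gaze = kf.predict()
    if measurement is not None:
        gaze = kf.update(measurement)
    print(gaze)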


UPDATE 5
Where are you looking?
MAR 9, 2021
By: Saeejith Nair

Summary

While the electrical interfacing and embedded systems work was being done, we were also working on improving the gaze tracking system and overall software pipeline. At first, we trained a machine learning model on the GazeCapture dataset based on the iTracker architecture. Built on a standard CNN architecture, this model was trained to take in cropped images of the left eye, right eye, and face, as well as a binary mask indicating the location of the face in the original image, with the network outputting a point on the screen indicating the gaze location. The following image shows the architecture in detail.

[Image: iTracker architecture]
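As a rough sketch of this kind of multi-input architecture, here's what a PyTorch version could look like; the layer sizes, input resolutions, and pooling choices are illustrative and are not the published iTracker configuration.

import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Toy iTracker-style model: three image towers (left eye, right eye, face)
    plus a flattened face-location grid, fused into an (x, y) screen coordinate."""
    def __init__(self):
        super().__init__()
        def tower():
            return nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten(),
                nn.Linear(64 * 16, 128), nn.ReLU())
        self.left_eye, self.right_eye, self.face = tower(), tower(), tower()
        self.grid = nn.Sequential(nn.Flatten(), nn.Linear(25 * 25, 128), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(128 * 4, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, left, right, face, grid):
        feats = torch.cat(
            [self.left_eye(left), self.right_eye(right), self.face(face), self.grid(grid)], dim=1)
        return self.head(feats)              # predicted (x, y) gaze point on the screen

model = GazeNet()
eyes = torch.randn(1, 3, 64, 64)             # cropped eye image (both eyes share the shape here)
face = torch.randn(1, 3, 64, 64)             # cropped face image
grid = torch.randn(1, 1, 25, 25)             # binary face-location mask in the real dataset
print(model(eyes, eyes, face, grid).shape)   # torch.Size([1, 2])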

This trained model yielded good results on the test dataset; however, it failed to perform as well for our use case. After extensive debugging, it was determined that the root cause was irregular detection of facial landmarks (e.g. the face and eye bounding boxes derived from an input image). Due to the difficulty in detecting consistently sized landmarks, the team decided to pivot from our custom-trained model to a pre-trained model architecture released by NVIDIA: Few-Shot Adaptive Gaze Estimation (FAZE).

[Image: FAZE architecture (courtesy of NVIDIA)]

The above image (courtesy of NVIDIA) shows their novel architecture, which allows further refinement of the network by retraining it on images obtained from a personalized calibration sequence. The network was also advertised as being adaptable to any new person, yielding significant performance gains and an overall accuracy improvement of 19% compared to the iTracker architecture we previously tried. Thus, although it wasn't developed specifically for low-power edge devices (like the NVIDIA Jetson we had selected as our compute platform), we decided to give it a shot. After building an inference pipeline that was able to run this model on the Jetson, however, qualitative testing showed that the accuracy wasn't any better than iTracker's. Even worse, the latency suffered significantly, taking almost 1.5 seconds per inference. This was far outside our response speed specification of at least 10 Hz (100 ms), forcing us to look into two other alternatives:

  • Improve the current Nvidia FAZE model by employing techniques such as layer pruning, model quantization, and deployment using Nvidia's TensorRT framework.
  • Seek another model or architecture that can provide good results while running in real-time.

Due to the extensive development time required for the first approach, we decided to focus on the second approach for the time being, to get an end-to-end system fully working as soon as possible. How did we do this? We'd earlier tried running the demo provided by Antoine Lamé on our Jetson, and found that it was incredibly fast and could run in real time. Inspired by this, we dug into the demo and found that it relied on a pretrained model provided by dlib for estimating facial landmarks and pupil locations. Thus, by extracting the pretrained model, we were able to build a real-time gaze-tracking pipeline around it, which performed as follows.

[Image: gaze-tracking pipeline]
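As a rough sketch of how the pretrained dlib landmark model can be wrapped into such a pipeline (the model path, the landmark indices from the standard 68-point predictor, the camera index, and the pupil thresholding value below are assumptions; the actual demo code differs):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()   # HOG + linear SVM face detector
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # pretrained landmark model

LEFT_EYE = list(range(42, 48))    # landmark indices in the standard 68-point model
RIGHT_EYE = list(range(36, 42))

def eye_box(shape, idxs):
    """Bounding box (x, y, w, h) around one eye's landmarks."""
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in idxs], dtype=np.int32)
    return cv2.boundingRect(pts)

def pupil_center(gray, box):
    """Rough pupil estimate: threshold the dark pupil and take the blob centroid."""
    x, y, w, h = box
    eye = cv2.GaussianBlur(gray[y:y + h, x:x + w], (5, 5), 0)
    _, mask = cv2.threshold(eye, 40, 255, cv2.THRESH_BINARY_INV)  # threshold value is a tunable assumption
    m = cv2.moments(mask)
    if m["m00"] == 0:
        return None
    return (x + int(m["m10"] / m["m00"]), y + int(m["m01"] / m["m00"]))

cap = cv2.VideoCapture(0)                     # camera index is an assumption
ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
for face in detector(gray):
    shape = predictor(gray, face)
    for idxs in (LEFT_EYE, RIGHT_EYE):
        box = eye_box(shape, idxs)
        print(box, pupil_center(gray, box))
cap.release()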

This gaze-tracking pipeline consisted of a series of cascaded blocks. A histogram of oriented gradients (HOG) feature extractor was used as a first stage to detect low-level features from the frame - edges, corners, and their orientations. These features were then passed into a linear support vector machine (SVM), which grouped them and determined whether a face was present in the image. If a face was present, the pipeline output a bounding box around the face, a bounding box around each eye, and the (x, y) coordinates of each pupil. The gaze estimation stage took in the eye bounding box and pupil location and computed the distances between the pupil and the edges of the eye bounding box. By taking the ratio of the distances from the pupil to the left and right edges, the horizontal component of the gaze direction could be estimated; likewise for the vertical gaze component. The following figure depicts this graphically.

[Figure: gaze direction from pupil position within the eye bounding box]
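A minimal sketch of the ratio computation described above; the dead-band value is an assumed tuning parameter, and which extreme counts as "left" versus "right" depends on camera mirroring.

def gaze_ratios(eye_box, pupil):
    """Position of the pupil inside the eye bounding box as horizontal/vertical ratios in [0, 1]."""
    x, y, w, h = eye_box
    px, py = pupil
    return (px - x) / float(w), (py - y) / float(h)

def coarse_direction(h_ratio, v_ratio, dead_band=0.15):
    """Turn the ratios into a coarse gaze direction; dead_band is an assumed tuning value."""
    horiz = "left" if h_ratio < 0.5 - dead_band else "right" if h_ratio > 0.5 + dead_band else "center"
    vert = "up" if v_ratio < 0.5 - dead_band else "down" if v_ratio > 0.5 + dead_band else "center"
    return horiz, vert

# Pupil a quarter of the way across a 60x30 px eye box -> gaze is off to one side.
print(coarse_direction(*gaze_ratios((100, 80, 60, 30), (115, 95))))   # ('left', 'center')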

The face bounding box is not explicitly used in determining the direction of the gaze, but rather as a filtering step to avoid false positives for the eye bounding boxes; the HOG features of a face are more uniquely identifiable than the HOG features that make up an eye. Therefore, constraining the eye bounding boxes to lie inside the face bounding box helps avoid false positives in regions of the image that have HOG features similar to eyes. This pipeline worked well and met our latency specification, as we could detect a patient's eye gaze within 60 ms. However, it did not consistently meet our accuracy specification of being within ±5 degrees of a patient's exact gaze location, although the general gaze direction and magnitude were correct.

How did we get around this challenge? Find out in the next blog post!

UPDATE 4
Mechanical Assembly & System Build
FEB 24, 2021
By: Ali Toyserkani

Summary

To mount the internal and external hardware, a popular 6560 aluminum extrusion was purchased from McMaster-Carr in 1 ft and 2 ft lengths, with a 40x20 mm profile. This profile was selected to maximize rigidity, since symmetrical 20x20 mm profiles are very susceptible to twisting and bending in multiple axes. To join the extrusions at a 90 degree angle, a set of T-slotted gusset brackets was used to rigidly connect the two pieces. At the end of the 2 ft extrusion, separate 3D-printed mounts were made to securely attach each of the sensors to the aluminum extrusion. These mounts are fastened to the aluminum extrusions using four M5 T-nuts. The adapter mounts were 3D printed in a tough PLA plastic at 60% infill. The sensor mounts were made 10 mm thick, enough to keep the sensors rigid with respect to the cantilevered extrusions while remaining thin enough to allow for easy assembly and reconfiguration. The Logitech C920 cameras were secured to the printed sensor mounts through the provided tripod screw interface (a single 1/4” screw), making the attachment process simple. The RealSense D435i was attached through a similar tripod interface, and the RealSense T265 was attached to its sensor mount with two M3 screws that fasten to the back of the tracking sensor.


Since there are a considerable number of compute components, including the Jetson Xavier, Arduino Uno, signal processing board, and power converters, it was important to mount all the electronics securely to the back of the wheelchair while keeping the compute components close together to decrease cable lengths and potential interference. The backplate was developed to mount all the electronics in a layout that minimizes cable length, while also allowing extra components to be mounted later. The piece includes a 5 by 15 matrix of M3 screw holes to easily mount the smaller electronics such as the Arduino Uno, signal processing board, and DB9 breakout board. It also includes additional custom-spaced holes for the larger electronics, including the Jetson Xavier and DC-DC converters. The backplate was fabricated using PLA plastic at 60% infill to ensure that the rigid body can withstand the potential shock forces transmitted through harsh movements of the wheelchair base. The backplate is secured to the wheelchair with five 1/4” screws through the slot in the middle of the backplate, fastening to a mountable aluminum bracket on the back of the wheelchair. This mounting bracket also holds the wheelchair's internal system bus for connecting wheelchair-specific devices (e.g. the motor base and joystick), conveniently placing all the hardware in one place.


After attaching all the electronics to the mechanical mounts and aluminum extrusions, we mounted the assemblies onto the wheelchair. The final system build can be seen below.

[Images: final system build]

UPDATE 3
Embedded and Electrical Interface
FEB 20, 2021
By: Arjun Narayan

Summary

Now that we had a wheelchair, we needed a way to drive it without user input. All powered wheelchairs come with joysticks, but we needed an expandable interface we could integrate with our NVIDIA Jetson compute platform. We reached out to a Chief Engineer at Quantum, who has been extremely helpful throughout our process. He pointed us to the right technical documentation, hardware, and necessary firmware upgrades. In short, the wheelchair internally uses a proprietary CAN-based system to communicate between the base and other devices. However, there are external devices we can utilize that expose a DB9 interface driven by analog voltage signals, which is much easier to work with than CAN. This was the avenue we decided to pursue. The DB9 interface is shown here:

[Image: DB9 interface]

Pin 3 (center) expects the "neutral" voltage, which is half of the power voltage (12 V), so 6 V. This is the voltage at which the wheelchair will be at a standstill. By deviating the voltage on Pins 1 & 2 slightly (< 2 V) from neutral, a proportional change in the wheelchair's forward speed & angular speed (direction) is applied. We found in other sources that the 2 power pins (7 & 9) must be shorted together, and that Pin 8 acts as the ground for any external device. All other pins can remain untouched. Of the potential devices we attempted integrating with the wheelchair, the Quantum QLogic Enhanced Display worked the best for interfacing over DB9. Thus, our embedded interface looks like the following:

[Image: embedded interface diagram]
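As a small sketch of the voltage mapping this interface implies (the linear scaling and clamping below are assumptions based on the 6 V neutral and the 4-8 V operating range discussed in this post):

NEUTRAL_V = 6.0      # half of the 12 V supply: wheelchair stands still at this voltage
MAX_DELTA_V = 2.0    # < 2 V of proportional deflection on Pins 1 & 2
V_MIN, V_MAX = 4.0, 8.0

def command_to_pin_voltage(cmd):
    """Map a normalized command in [-1, 1] (e.g. forward speed or turn rate)
    to the analog voltage expected on Pin 1 or Pin 2."""
    cmd = max(-1.0, min(1.0, cmd))
    return min(V_MAX, max(V_MIN, NEUTRAL_V + MAX_DELTA_V * cmd))

# Example: full forward -> 8 V, stand-still -> 6 V, full reverse -> 4 V.
for cmd in (1.0, 0.0, -1.0):
    print(cmd, command_to_pin_voltage(cmd))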

We initially planned on passing analog voltages directly from our Xavier to the DB9 interface. However, this did not work as planned for several reasons. Primarily, none of our on-hand devices could output analog voltages, only PWM at best. We also did not have easy access to a small DAC for conversion. Furthermore, the voltages expected by the wheelchair were in the 4-8 V range, and so could not be supplied by a 5 V microcontroller directly. All of this meant some light signal processing was needed.


To begin, we originally expected to output these PWM signals from the Xavier. However, JetPack requires the hardware PWM peripherals to be configured before flashing. This would require us to reconfigure them & reflash the device, which would erase several weeks of work carefully setting up the ML environment and dependencies, so using an Arduino as a middle-man was an easier alternative. The Jetson and Arduino talk over a USB serial connection, with the downstream packets containing a "mock" value of the X & Y joystick deflections.


To solve the first problem, a basic passive RC filter can be used to convert the PWM into an analog voltage between 0 and 5 V. The Arduino can output hardware PWM at 62.5 kHz. We referenced several articles for selecting component values, such as these:

Accurate Fast Settling Analog Voltages from Digital PWM Signals

Using PWM to Generate an Analog Output
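As a quick sanity check on the RC filter approach, here is a small sketch of the numbers involved; the R and C values are illustrative assumptions, not the exact components we used.

import math

f_pwm = 62.5e3            # Arduino hardware PWM frequency mentioned above
R, C = 4.7e3, 1.0e-6      # assumed example component values

f_c = 1.0 / (2.0 * math.pi * R * C)                  # first-order cutoff frequency
atten = 1.0 / math.sqrt(1.0 + (f_pwm / f_c) ** 2)    # attenuation of content at 62.5 kHz
fundamental = 2.0 * 5.0 / math.pi                    # ~3.2 V fundamental of a 0-5 V square wave
settle = 5.0 * R * C                                 # ~5 time constants to settle within ~1%

print(f"cutoff: {f_c:.1f} Hz")
print(f"worst-case ripple: ~{2 * fundamental * atten * 1000:.1f} mV peak-to-peak")
print(f"settling time: ~{settle * 1000:.1f} ms")

With these assumed values the ripple is only a few millivolts and the output settles in roughly 25 ms; a larger RC gives less ripple at the cost of a slower response, which is the trade-off the linked articles cover in detail.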

Once the PWM has been filtered into an analog voltage, it must be scaled to fit inside the 4-8 V range. This is done with a simple non-inverting op-amp amplifier that amplifies the input signal by 2x. The final step is to ensure the output stays bounded within the 4-8 V range, which is easily done by limiting the PWM duty cycle in the Arduino software. The final circuit diagram looks like so:

[Image: final circuit diagram]

The Enhanced Display (shown on the left) requires the "neutral" voltage to be the midpoint of the 12 V power voltage, i.e. 6 V. This is the purpose of the voltage divider on the left, with a voltage buffer feeding Pin 3 (the center reference). Pins 1 & 2 are then used for controlling the speed & direction of the wheelchair. The final addition to the signal processing was a 10 uF filter capacitor to remove any excess ripple or noise. Below is our initial breadboard mock-up of this circuit, and our final protoboard version:

[Images: breadboard mock-up and final protoboard version]


This circuit worked well enough for our purposes, but there definitely are improvements to be made. For instance, there is currently no easy hardware kill-switch. If the Arduino were to fail and produce erroneous PWM outputs, there is no easy way to disconnect the hardware components without turning off the chair entirely.


Now that the electrical work is done, we can dive into the low-level embedded interface. Later posts will deal more with the pure software side of the embedded system, as well as with the motion controls. The Arduino can only output PWM at 0-255 resolution, so our downstream message to the Arduino is 1 byte each for the speed & direction values. The Arduino code listens on the serial port for a packet containing 2 bytes; if none arrives, it brings the wheelchair speed back to 0. The wheelchair base already has a velocity controller that handles the speed ramp up and down for us, so this is not a concern. In terms of user control, we initially tested the setup by creating a small teleop interface using the keyboard. This lets us control the speed & direction magnitudes with the arrow keys and sanity check that everything works together. The final step in testing was to mount the wheelchair up on cinderblocks. This allowed us to spin the wheels freely for testing without having the wheelchair run off on us (which it did once). Below is an early test video showcasing the initial breadboard setup and the resulting motion:
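For reference, a minimal sketch of the Jetson-side sender for this 2-byte protocol might look like the following; the serial port, baud rate, update rate, and the 127 "neutral" byte value are assumptions rather than our exact implementation.

import time
import serial  # pyserial

ser = serial.Serial("/dev/ttyACM0", 115200, timeout=0.1)  # port and baud rate assumed

NEUTRAL = 127  # assumed mid-scale value for the 0-255 speed/direction bytes

def send(speed, direction):
    """Send one 2-byte packet: [speed, direction], each in 0-255."""
    ser.write(bytes([speed & 0xFF, direction & 0xFF]))

# Gently drive forward for 2 seconds, then return to neutral.
t_end = time.time() + 2.0
while time.time() < t_end:
    send(NEUTRAL + 30, NEUTRAL)   # small forward deflection, no turn
    time.sleep(0.05)              # ~20 Hz, so the Arduino keeps receiving packets and doesn't zero the speed
send(NEUTRAL, NEUTRAL)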

This progress greatly enables the rest of the team to test on the wheelchair platform, though there is still some cleanup and optimization work to do. Stay tuned for much more!

UPDATE 2
System Architecture
JAN 20, 2021
By: Arjun Narayan

The Problem

Our goal is to enable powered wheelchair users to navigate with only their eyes, without the use of joysticks, head tilt, sip and puff, or other existing solutions that come with a laundry list of potential issues. In order to do this, we have 3 main problems to solve:

1. Accurately Detect Eye Gaze
2. Convert to Real-World Position
3. Send Motion Commands to Wheelchair

The individual solutions to these 3 problems will be the subject of the remainder of the blog posts on this site.

Initial Hardware Summary

The eye tracking solution will make use of a simple, common RGB camera and a machine learning / computer vision based approach. In order to navigate its environment and provide the necessary safety for the user, the wheelchair will make use of modern autonomy solutions such as depth and tracking cameras. This is illustrated below, with the eye tracking example courtesy of Antoine Lamé.

Basic Illustration

[Image: basic system illustration]

Eye Tracking Example

[Image: eye tracking example (courtesy of Antoine Lamé)]

Compute Selection

To begin our hardware summary, we focused on our compute hardware selection. The main requirement early on was something with enough raw AI computational power to support the simultaneous gaze tracking and autonomy stacks. We immediately began researching the NVIDIA Jetson line and focused on the Xavier and Nano devices. Since we would likely be running 2 separate computationally intensive pipelines, computational power was our main concern over power draw or size. Thus, we went with the NVIDIA Xavier AGX. Since these platforms are meant to be general purpose, they are great for this early stage while we iron out exactly what our pipelines will look like and exactly how much power we need.


The Xavier runs JetPack 4.4, NVIDIA's custom distro of Ubuntu 18.04. This drove our choice to use the Robot Operating System (ROS) as our middleware, since it easily supports Ubuntu-based systems. This will allow us to prototype our various software systems separately and easily integrate them later during testing.
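As a minimal sketch of what a ROS node in this setup can look like (the node and topic names here are assumptions, not our actual package layout):

#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist

# Each subsystem can live in its own node and exchange messages over ROS topics;
# this toy node simply publishes velocity commands at a fixed rate.
def main():
    rospy.init_node("gaze_to_cmd_vel")
    pub = rospy.Publisher("cmd_vel", Twist, queue_size=1)
    rate = rospy.Rate(10)        # 10 Hz, matching our response target
    while not rospy.is_shutdown():
        cmd = Twist()
        cmd.linear.x = 0.2       # placeholder values; in the real system these would
        cmd.angular.z = 0.0      # come from the gaze tracking pipeline
        pub.publish(cmd)
        rate.sleep()

if __name__ == "__main__":
    main()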

Sensor Selection

For our sensors, we wanted devices that had easy-to-use open-source drivers and sufficiently strong specifications. For autonomy we went with the Intel RealSense line because of their existing driver libraries and the many good things we have seen others build with them. We opted for a depth camera for obstacle avoidance and navigation, and a tracking camera for easy closed-loop motion control at all levels. For gaze tracking we wanted an RGB camera that was cheap, common enough to purchase, and fast enough. After some research we discovered the common V4L2 (Video for Linux 2) camera interface, and an open-source ROS driver for it too. More research pointed us to the Logitech C920 HD camera, which can do 720p at 60 fps, more than enough for our needs. The selection summary is shown below:

[Image: sensor selection summary]
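A small sketch of how such a V4L2 camera can be sanity-checked from OpenCV (the device index and MJPG pixel format are assumptions):

import time
import cv2

cap = cv2.VideoCapture(0, cv2.CAP_V4L2)                    # device index is an assumption
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 60)

n, start = 0, time.time()
while n < 120:                                             # grab a couple of seconds of frames
    ok, frame = cap.read()
    if not ok:
        break
    n += 1
print(f"captured {n} frames at ~{n / (time.time() - start):.1f} fps")
cap.release()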

UPDATE 1
Wheelchair Acquired
NOV 30, 2020
By: Ali Toyserkani

Summary

In order to get the project started, we needed to purchase a wheelchair that we would be able to prototype on. We chose a Quantum Q6 Edge HD powered chair as it was the best in terms of value, features, and extensibility. As an added benefit, we are receiving engineering help from a contact at Quantum. The wheelchair operates on two 12 V car batteries, which power an internal AAM (Advanced Actuator Module) that controls the motors.

Photos

[Photos: Quantum Q6 Edge HD wheelchair]