Hybrid Inference
Background
In my final year at RHUL my project was on the subject of Autonomous MAVs (Drones/Quadcopters). The original brief was to use planning techniques to perform some task of interest with an MAV, first simulated in Gazebo using ROS, and later demonstrated in real life on a DJI Matrice 100.
While I was at UCSD on my year abroad, I attended a number of guest lectures in the CS department. They were all very interesting, but one stood out to me because it was relevant both to the quadcopter project I was carrying out at the time and to my upcoming final year project. The talk was about a proposed new technique called Hybrid Inference, which aims to unify mathematical models and Machine Learning models. The relevance was that we were using Kalman Filters for pose estimation in our little quadcopter, and one of the applications mentioned in the talk was exactly that, only using plain Kalman Filters.
I proposed to evaluate the technique on the more complicated domain of pose estimation for MAVs, comparing the Extended Kalman Filter against a Hybrid Inference implementation both in relative error and in real-world performance, by running the different filters in simulation using Gazebo and ROS.
Initial Phase - Gathering Information
Initially I had two tracks of work: learning ROS and Gazebo so that I could simulate an MAV for the evaluation work, and implementing the Hybrid Inference paper. I wrote to the professor who had given the talk at UCSD, as the paper had not been published yet, not even on arXiv, and he very kindly sent me a pre-print version of the paper which I could then work from.
ROS and Gazebo
ROS and Gazebo are widely used tools in the world of Robotics. ROS stands for Robot Operating System; it provides a level of abstraction for software developers writing software that controls robots. It models a robot as a distributed system, with sensors and motors each being services/nodes that can consume or provide data. It has bindings for a variety of languages and supports heterogeneous nodes, so you can have your sensor nodes running C++ and your logic nodes running Python if that is what makes sense to you. You just need to define the messages that pass between the nodes and the topics they read from and write to.
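To make that concrete, here is a minimal sketch of a ROS 1 node written with rospy. The node name, topic name and publish rate are purely illustrative and not taken from my project.

```python
#!/usr/bin/env python
# Minimal ROS 1 publisher node (sketch). Node and topic names are placeholders.
import rospy
from geometry_msgs.msg import PoseStamped

def run():
    rospy.init_node('pose_publisher')                     # register with the ROS master
    pub = rospy.Publisher('/pose_estimate', PoseStamped, queue_size=10)
    rate = rospy.Rate(100)                                # publish at 100 Hz
    while not rospy.is_shutdown():
        msg = PoseStamped()
        msg.header.stamp = rospy.Time.now()
        pub.publish(msg)                                  # any subscriber to /pose_estimate receives this
        rate.sleep()

if __name__ == '__main__':
    try:
        run()
    except rospy.ROSInterruptException:
        pass
```

A subscriber node is the mirror image: it calls rospy.Subscriber with the same topic and message type and hands each incoming message to a callback.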
It is a very flexible architecture, but it can take some time to get your head around and become comfortable writing code for. A resource I found very useful was https://www.robotigniteacademy.com/en/course/ros-basics-in-5-days_1_0/ which had an online environment that simplified the onboarding and environment setup, so you could quickly understand the concepts and realities of ROS without wasting too much time on setup at the start.
Unfortunately the documentation for ROS and Gazebo is pretty bad, so it takes a lot of time and experimentation to get things right, or to find useful and reliable resources.
Once I had become familiar with this I set up my local environment. There are a couple of gotchas, and I had to set up my local system a couple of times before I got it right for the packages I was working with. Firstly you need Linux, preferably Ubuntu, as ROS is only really validated on Ubuntu. Each version of ROS is also only validated against a particular version of Ubuntu, so if you need an older version of ROS you will probably also need the matching older version of Ubuntu.
Gazebo comes bundled with the most common way of installing ROS and that is almost always the one you want to use.
Hybrid Inference
Hybrid Inference was proposed to take advantage of both mathematical models and Machine Learning/Deep Learning. The concept is to use a mathematical model to solve the problem and then use a neural network to learn the residual error. The proposed advantages are that you can dramatically reduce the amount of data needed compared to pure Deep Learning approaches, while gaining accuracy over purely mathematical models.
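Concretely, for the Kalman filter setting the mathematical model is the usual linear Gaussian state-space model, and the learned component only has to account for whatever the real system does that this model fails to capture:

$$x_k = F x_{k-1} + w_k, \qquad w_k \sim \mathcal{N}(0, Q)$$
$$z_k = H x_k + v_k, \qquad v_k \sim \mathcal{N}(0, R)$$

Here $x_k$ is the hidden state (pose, velocity, and so on), $z_k$ is the measurement vector, and $F$, $H$, $Q$, $R$ are the transition, measurement and noise matrices that the Kalman filter assumes are known.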
In order to unify the mathematical and Deep Learning models, a common structure is needed, and for this the authors decided to use graphical models. The Kalman Filter was reformulated as a graphical model that uses message passing to arrive at the solution, and the Deep Learning component was formulated as a Graph Neural Network, specifically a Message Passing Neural Network. Because the two share the same graph structure, their messages correspond to one another and the Graph Neural Network can learn the residual error as intended.
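As a rough sketch of the shape this gives the algorithm (this is my paraphrase, not the authors' code; the callables, iteration count and step size are placeholders), each inference pass repeatedly refines the whole state trajectory by adding the analytic graphical-model message and a learned GNN correction:

```python
def hybrid_inference(x_init, z, gm_message, gnn_message, n_iters=50, gamma=5e-3):
    """Iteratively refine a trajectory estimate (sketch only).

    x_init      : (T, state_dim) tensor, initial guess for the trajectory
    z           : (T, meas_dim) tensor, one full measurement vector per step
    gm_message  : callable computing the analytic message from the known
                  motion/measurement model (the "mathematical" half)
    gnn_message : a trained graph neural network producing a learned residual
                  correction (the "deep learning" half)
    """
    x = x_init
    for _ in range(n_iters):
        mu = gm_message(x, z)        # message derived from the graphical model
        eps = gnn_message(x, z)      # learned correction on the same graph
        x = x + gamma * (mu + eps)   # combined update towards the posterior estimate
    return x
```

The important point for what follows is that the GNN's messages sit inside this loop, so its parameters are trained by backpropagating through the iterative procedure.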
The paper is now available here.
The paper is very good and outlines the mathematical basis as well as exploring applications, but I found it quite hard to extract an implementation from it. Thankfully the paper has an associated GitHub page here, which was extremely helpful in understanding the practical aspects of implementing it.
I decided to reformulate the GNN section, which turned out to be a major mistake. My thinking was that the authors' implementation was more complicated than it needed to be and that I could reformulate it so that the outputs would be identical but much simpler. This turns out not to preserve the gradient operations that are critical to effective training.
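I no longer remember the exact change, but the general failure mode is easy to reproduce in PyTorch: two computations can produce identical forward values while one of them silently severs the computation graph, so the loss can no longer influence the parameters upstream of the break. A minimal illustration (not my actual code):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Attached intermediate: the gradient flows all the way back to w.
h = w * 3.0
loss = h * h            # forward value: 36.0
loss.backward()
print(w.grad)           # tensor(36.)  -- d(9 w^2)/dw = 18 w

w.grad = None

# "Equivalent" reformulation with a detached intermediate: the forward value
# is still 36.0, but the path from w to the loss no longer exists, so nothing
# upstream of the detach can ever be trained.
h = (w * 3.0).detach()
loss = h * h
print(loss.requires_grad)   # False -- calling loss.backward() here would raise an error
```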
Simulation - and Integration Hell
In order to test, I used a package from TU Darmstadt for simulating quadcopters in Gazebo. This saved me the time of creating or importing a quadcopter model and simulating it effectively in Gazebo and ROS; however, as the saying goes, "there's no such thing as a free lunch", and that turned out to be very true in this case. The cost was the complexity and difficulty of adapting their code to integrate my new pose estimation module.
The code for the quadcopter was spread across around 5 repositories, and I ended up needing to modify all of them to swap out this one module. I would need to go back through all of that work again to fully explain why each change was necessary.
One of the major adaptations I needed to make was to change the mode of operation of the pose estimation module. The standard EKF takes measurements at different frequencies and makes partial updates to the hidden state as each one arrives. With Hybrid Inference that is not possible: you must have a full measurement vector at every time step you compute for, and you need a window of around 50 measurements before it can compute effectively.
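To bridge that gap I had to assemble full measurement vectors myself. The sketch below shows one way to do it, not the actual code I used (the class and names are hypothetical): hold the last known value for any sensor that has not reported in the current step, and keep a sliding window of the last 50 assembled vectors to feed to Hybrid Inference.

```python
from collections import deque
import numpy as np

WINDOW = 50   # HI needs a full window of measurements before its output is usable

class MeasurementBuffer:
    """Assemble full measurement vectors for Hybrid Inference (sketch)."""

    def __init__(self, dim):
        self.latest = np.zeros(dim)          # last known value per measurement dimension
        self.window = deque(maxlen=WINDOW)   # sliding window handed to HI

    def sensor_update(self, indices, values):
        # A sensor reported: overwrite only the dimensions it provides.
        self.latest[np.asarray(indices)] = values

    def tick(self):
        # Called once per estimation step: snapshot the current full vector.
        self.window.append(self.latest.copy())
        if len(self.window) < WINDOW:
            return None                       # not enough history yet
        return np.stack(self.window)          # (WINDOW, dim) array for HI
```

Holding the last known value is obviously lossy compared to the EKF's native handling of asynchronous sensors, and it is one of the practical costs of the full-measurement-vector requirement.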
The final difference is time, and it follows from the mode of operation of the algorithm. With Kalman Filters there are only two operations per update; with Hybrid Inference each update becomes an optimisation that needs a number of iterations to converge on a new state. These iterations are effectively not parallelisable, so each update step takes 100-200 times as long, making it roughly two orders of magnitude slower than any KF-based system.
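For comparison, the two closed-form operations in a Kalman filter update are just the predict and update (correct) steps:

$$\text{predict:}\quad \hat{x}_{k|k-1} = F\,\hat{x}_{k-1|k-1}, \qquad P_{k|k-1} = F P_{k-1|k-1} F^\top + Q$$
$$\text{update:}\quad K_k = P_{k|k-1} H^\top \big(H P_{k|k-1} H^\top + R\big)^{-1}, \quad \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\big(z_k - H \hat{x}_{k|k-1}\big), \quad P_{k|k} = (I - K_k H)\, P_{k|k-1}$$

Hybrid Inference replaces this pair of closed-form expressions with many message-passing iterations over the whole measurement window, which is where the 100-200x factor comes from.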
Testing and Fallback option
In my testing on a 2015 15-inch MacBook Pro, the algorithm needed around 1-2 seconds of computation in simulation to settle on a new state. In real-life operation this would need to complete in less than 10ms to recalculate the pose at 100Hz, which is a pretty standard update rate. There are two caveats to extrapolating this performance to real-life operation, and they push in opposite directions, though I don't know by how much.
First, the laptop was running other software at the same time: the VM that Gazebo ran in, Gazebo itself and all its associated overheads, as well as other software running directly on the laptop. We can expect this contention to have at least halved the measured performance of the algorithm.
In the other direction, it was running on fairly power-hungry hardware. The CPU was an Intel i7-4780HK, made on the fairly old 22nm process, and the package power draw reported by Intel Power Gadget averaged around 45W. We would expect any MAV to be running on sub-10W hardware, and even accounting for advances in process node technology we would not expect comparable computational power to be available for just the pose estimation part of the MAV's operation.
Due to the extreme time required to run Hybrid Inference in simulation (a roughly 1000x slowdown) and its instability, I decided to omit the live demonstration and focus on the numerical analysis of performance. To enable this I recorded a large dataset of random quadcopter paths in the simulation for training and evaluation.
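I don't recall every detail of the tooling here, but the standard way to capture topics from a running ROS/Gazebo system is rosbag, and reading a recorded bag back into Python for training and evaluation data looks roughly like this (the file and topic names are placeholders, not necessarily the ones I used):

```python
import rosbag

bag = rosbag.Bag('random_paths.bag')      # a previously recorded bag file
for topic, msg, t in bag.read_messages(topics=['/ground_truth/state', '/raw_imu']):
    # msg is the deserialised ROS message, t is the recording timestamp
    print(topic, t.to_sec())
bag.close()
```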
Paper and Presentation - Lessons learned
I wrote up my conclusions into a paper that I presented at DASC 2020. I recorded my presentation ahead of time and came to regret it, as I spent far too much time explaining the basic concepts underlying the paper rather than my own work and findings. This was a major learning moment for me, and though it was pretty painful at the time I think it will serve me well in the future.
I learned several lessons through this process, chief among them to pay attention to gradient flow in neural network applications; it is crucial to the model actually fitting.
Final conclusions
My final conclusions:
First, the technique is interesting in concept but, at this time and in this domain in particular, restricted in its practical applications.