LvTwoComputationalHorsepower

Thu, 29 Mar 2001:

Here's my attempt to estimate the number of floating point operations per second required to implement a reasonable INS on LV2. For the uninterested, the answer looks to be on the order of a million, even with pessimistic margins.

Basic assumptions:

Input consists of 4 accelerometers and 3 rate-gyros, augmented by periodic GPS and altimeter readings
Output consists of full 6 degree of freedom [DOF] position and velocity, therefore 12 numbers
Output update frequency 10 Hz
Accelerometer input sample frequency 2500 Hz
Rate-gyro input sample frequency 625 Hz

Further tentative assumptions:
Calculations which occur less frequently than 10 Hz do not significantly affect the computational load (Earth-rate, gravity map, normalization)
The highest rate integrations are treated adequately as simple sums
Multiplication and addition take the same amount of time
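
As a minimal sketch of what "simple sums" could mean for the accelerometer path, the snippet below accumulates raw samples with one add per accelerometer per sample and scales by the sample period once at the end. The function name, the list-of-lists sample layout, and the rectangle-rule choice are assumptions made for illustration, not the flight code.

```python
ACCEL_RATE_HZ = 2500   # accelerometer sample frequency, from the assumptions above
GYRO_RATE_HZ = 625     # rate-gyro sample frequency, from the assumptions above


def velocity_increment(samples, dt):
    """Rectangle-rule integration of accelerometer samples (hypothetical helper).

    samples: list of per-sample readings, one value per accelerometer.
    dt:      sample period in seconds.
    Returns one velocity increment per accelerometer.
    """
    sums = [0.0] * len(samples[0])
    for sample in samples:
        for axis, accel in enumerate(sample):
            sums[axis] += accel          # one add per accelerometer per sample
    return [s * dt for s in sums]        # scale once per batch, not per sample


# Four accelerometer samples arrive for every rate-gyro sample.
SAMPLES_PER_GYRO_TICK = ACCEL_RATE_HZ // GYRO_RATE_HZ

if __name__ == "__main__":
    batch = [[1.0, 2.0, 3.0, 4.0]] * SAMPLES_PER_GYRO_TICK
    print(velocity_increment(batch, 1.0 / ACCEL_RATE_HZ))
```

Deferring the multiply by the sample period to the slower loop is what keeps the 2500 Hz step down to one operation per accelerometer.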

INS calculation:

Calculation	Input	Output	Clocks	Rate [Hz]
velocity increment	ai	Deltav	4	2500
angle increment	omegai	Deltath	3	625
body transformation	Deltath,v	thL, vL	78	625
coning increment	Deltath,thL	betaL	12	625
sculling increment	a, v	vscul	24	625
summation to m	L values	m values	4	625
rotation vector	am, betam	phim	1	100
rotator update	phim	R[b,b-1]	44	100
navigation transform	R[b,b-1],R[b,n]	R[b,n]	45	100
velocity rotation comp.	am, vm	vrot	20	100
body velocity inc.	v(m,scul,rot)	v^body	3	10
nav velocity	R[b,n], v^body, v_gee	v^nav	47	10
gravity Coriolis inc.	x^nav, v^nav	v_gee	50	10
summation to x	v^nav	x^nav	9	10

Total INS flops == 97715 [floating point operations / second]
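
The total is just clocks times rate, summed over the rows. The sketch below redoes that sum from the table as listed. The tuple layout and function names are my own, and "clocks" is taken to mean floating point operations per update, following the assumption that a multiply and an add cost the same. With the rows as tabulated the sum comes to 97715, i.e. just under 100k flops.

```python
# (calculation, operations per update, update rate in Hz), copied from the table
INS_STEPS = [
    ("velocity increment",       4, 2500),
    ("angle increment",          3,  625),
    ("body transformation",     78,  625),
    ("coning increment",        12,  625),
    ("sculling increment",      24,  625),
    ("summation to m",           4,  625),
    ("rotation vector",          1,  100),
    ("rotator update",          44,  100),
    ("navigation transform",    45,  100),
    ("velocity rotation comp.", 20,  100),
    ("body velocity inc.",       3,   10),
    ("nav velocity",            47,   10),
    ("gravity Coriolis inc.",   50,   10),
    ("summation to x",           9,   10),
]


def flops(steps):
    """Total floating point operations per second over all steps."""
    return sum(ops * rate for _, ops, rate in steps)


def flops_by_rate(steps):
    """Operations per second grouped by update rate."""
    totals = {}
    for _, ops, rate in steps:
        totals[rate] = totals.get(rate, 0) + ops * rate
    return totals


if __name__ == "__main__":
    for rate, load in sorted(flops_by_rate(INS_STEPS).items(), reverse=True):
        print(f"{rate:5d} Hz: {load:6d} flops")
    print("total:", flops(INS_STEPS))
```

Grouping by rate shows where the load sits: the 625 Hz steps account for about three quarters of the total, and the body transformation alone is about half.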

Kalman filter calculation:

In our proposed algorithm the measurements are GPS and altimeter measurements, while the states are INS states (the GPS corrects for INS drift). The number of measurements is therefore 6+1 = 7, for 3 GPS position, 3 GPS velocity, and 1 altitude. Actually the altitude may be folded into the GPS prior to filtering, but that will take some computation too, and this calculation is approximate. The number of modeled states is somewhat selectable, but includes at least the 4 accelerometer biases, 3 gyro biases, 3 IMU pointing errors, and one gravity bias, for 11 in all. Easily a dozen more states could be thrown in, but it's probably enough to assume the terms stated, though perhaps we might add 4 accelerometer scale factors and 3 gyro scale factors. The total number of states would then be 18.

I have done the operation count given these assumptions, and arrived at about 50000 flops. The details are hard to write out in text format, but I would suggest that the calculation is only approximate, because the number of states is somewhat arbitrary: any implementation can always model more states in the quest for greater accuracy. In fact there are good reasons we might want to add states, but the complexity increases, and a cost vs benefit consideration enters. As a very poor rule of thumb the operation count scales like the square of the number of states (I'm sure better estimates are in the literature). So adding 8 more states, for 26 in place of 18, would more than double the operation count.
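
To make the rule of thumb concrete, here is a small sketch that tallies the states listed above and scales the 50000 figure by the square of the state count. The quadratic scaling is only the "very poor rule of thumb" quoted in the text, not a real operation count, and the names are invented for the example.

```python
# State tally from the text: 11 bias/pointing states plus 7 scale factors.
STATES = {
    "accelerometer biases": 4,
    "gyro biases": 3,
    "IMU pointing errors": 3,
    "gravity bias": 1,
    "accelerometer scale factors": 4,
    "gyro scale factors": 3,
}
BASE_STATES = sum(STATES.values())   # 18
BASE_FLOPS = 50_000                  # estimated Kalman load at 18 states


def kalman_flops(n_states, base_states=BASE_STATES, base_flops=BASE_FLOPS):
    """Scale the Kalman load as (n_states / base_states) squared (rule of thumb only)."""
    return base_flops * (n_states / base_states) ** 2


if __name__ == "__main__":
    for n in (BASE_STATES, BASE_STATES + 8):
        print(f"{n} states: about {kalman_flops(n):.0f} flops")
```

Going from 18 to 26 states gives a factor of (26/18)^2, a little over 2, which is where "more than double" comes from.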

The prevailing situation appears to be that we need at least the 50k flops for Kalman operation, and can probably use more. We should certainly not choose a processor that is marginal at the calculated level (150k flops, the INS and Kalman counts together) because the calculations are uncertain, and there will be considerable program overhead involving data manipulation in memory and stuffing the FPU. If the FPU is as fast as we hope, the overhead will probably take about as much time as the FPU work, or more.

However, even if my calculation is off by 2, and the overhead is twice the FPU time, and the Kalman flops double, the load is only:

200k * 3 * 2 == 1200k operations / second

So any of the processors we have been considering should be adequate.
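
The margin arithmetic can be written out as a quick check. The sketch below takes the 200k figure (INS plus a doubled Kalman load), multiplies by 3 for overhead at twice the FPU time, and by 2 for the possible error in the count. Multiplied through, the three factors give 1.2 million operations per second. Comparing that against a 40 MIPS processor treats one operation as one instruction, which is an assumption made here purely for scale.

```python
def worst_case_load(fpu_flops, overhead_ratio, error_factor):
    """Pessimistic load: FPU work plus overhead, times a safety factor.

    overhead_ratio is overhead time divided by FPU time (2 means twice the FPU time).
    """
    return fpu_flops * (1 + overhead_ratio) * error_factor


FPU_FLOPS = 200_000          # about 100k INS plus a doubled 50k Kalman estimate
SLOWEST_CPU_IPS = 40_000_000 # slowest processor under consideration

load = worst_case_load(FPU_FLOPS, overhead_ratio=2, error_factor=2)
utilization = load / SLOWEST_CPU_IPS

if __name__ == "__main__":
    print(f"worst-case load: {load:,} operations / second")
    print(f"fraction of a 40 MIPS processor: {utilization:.1%}")
```

Even on these pessimistic terms the load is a few percent of the slowest candidate, so the conclusion does not hinge on the exact factors chosen.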

It's worth recalling that on LV1b we were consuming a very large part of the CPU clocks bit-banging things at the 2500 Hz rate. If we repeated this performance on LV2 the operation count could increase considerably, but it is still hard to see it consuming 40 million instructions per second, which is the rating of the slowest of the processors being considered.

I feel that even if there are major mistakes in these operation counts, the computational power available is sufficient to make a useful system, perhaps scaling back in some spots, but still capturing the essential characteristics of our desired system.

What I don't want to hear, then, is that there is some stupid gotcha: that the power consumption triples when the FPU runs, or the FPU only runs at 10 MHz, or it takes 10 cycles to load the registers for each single cycle multiply, or that the multiply is single cycle but a floating point add takes 5 clocks, etc.