PSAS/ LvTwoComputationalHorsepower

Thu, 29 Mar 2001:

Here's my attempt to estimate the number of floating point operations per second required to implement a reasonable INS on LV2. For those not interested in the details, the answer looks to be about half a million.

Basic assumptions:

  1. Input consists of 4 accelerometers, and 3 rate-gyros, augmented by periodic GPS and altimeter readings
  2. Output consists of full 6 degree of freedom [DOF] position and velocity, therefore 12 numbers
  3. Output update frequency 10 Hz
  4. Accelerometer input sample frequency 2500 Hz
  5. Rate-gyro input sampled at 625 Hz

Further tentative assumptions:

  6. Calculations which occur less frequently than 10 Hz do not significantly affect the computational load (Earth-rate, gravity map, normalization)
  7. The highest rate integrations are treated adequately as simple sums
  8. Multiplication and addition take the same amount of time
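As an illustration of assumption 7, the highest-rate integration can be sketched as a plain sum of samples times the sample period. This is only a minimal sketch: the function name, the constant 9.8 m/s^2 reading, and the choice to accumulate one velocity increment per rate-gyro sample are assumptions made for the example, not something specified above.

```python
ACCEL_RATE_HZ = 2500  # accelerometer sample rate (assumption 4)
GYRO_RATE_HZ = 625    # rate-gyro sample rate (assumption 5)


def velocity_increment(accel_samples, rate_hz=ACCEL_RATE_HZ):
    """Integrate acceleration samples as a simple sum times the sample period."""
    dt = 1.0 / rate_hz
    return sum(accel_samples) * dt


# Number of accelerometer samples that arrive per rate-gyro sample.
samples_per_gyro_tick = ACCEL_RATE_HZ // GYRO_RATE_HZ

# Hypothetical input: one accelerometer reading a constant 9.8 m/s^2.
dv = velocity_increment([9.8] * samples_per_gyro_tick)
print(samples_per_gyro_tick, dv)
```

Each accelerometer sample costs one addition per channel here, which is where a count of 4 operations at 2500 Hz for 4 accelerometers would come from.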

INS calculation:

| Calculation             | Input                | Output    | Clocks | Rate [Hz] |
|-------------------------|----------------------|-----------|--------|-----------|
| velocity increment      | ai                   | Deltav    | 4      | 2500      |
| angle increment         | omegai               | Deltath   | 3      | 625       |
| body transformation     | Deltath,v            | thL, vL   | 78     | 625       |
| coning increment        | Deltath,thL          | betaL     | 12     | 625       |
| sculling increment      | a, v                 | vscul     | 24     | 625       |
| summation to m          | L values             | m values  | 4      | 625       |
| rotation vector         | am, betam            | phim      | 1      | 100       |
| rotator update          | phim                 | R[b,b-1]  | 44     | 100       |
| navigation transform    | R[b,b-1],R[b,n]      | R[b,n]    | 45     | 100       |
| velocity rotation comp. | am, vm               | vrot      | 20     | 100       |
| body velocity inc.      | v(m,scul,rot)        | v^body    | 3      | 10        |
| nav velocity            | R[b,n],v^body, vgee  | v^nav     | 47     | 10        |
| gravity Coriolis inc.   | x^nav, v^nav         | v_gee     | 50     | 10        |
| summation to x          | v^nav                | x^nav     | 9      | 10        |

Total INS flops == 97615 [floating point operations / second ]
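The total is just the sum of clocks times rate over the rows of the table. The sketch below redoes that sum from the rows as they are listed here; the variable and function names are made up for the example. Summed this way the rows give 97715, which differs from the stated total by 100 flops (the size of the rotation vector row), so either way the INS load is about 100k flops.

```python
from collections import defaultdict

# (calculation, clocks per update, update rate in Hz), copied from the table.
INS_STEPS = [
    ("velocity increment", 4, 2500),
    ("angle increment", 3, 625),
    ("body transformation", 78, 625),
    ("coning increment", 12, 625),
    ("sculling increment", 24, 625),
    ("summation to m", 4, 625),
    ("rotation vector", 1, 100),
    ("rotator update", 44, 100),
    ("navigation transform", 45, 100),
    ("velocity rotation comp.", 20, 100),
    ("body velocity inc.", 3, 10),
    ("nav velocity", 47, 10),
    ("gravity Coriolis inc.", 50, 10),
    ("summation to x", 9, 10),
]


def flops_by_rate(steps):
    """Group the per-second operation count by update rate."""
    totals = defaultdict(int)
    for _name, clocks, rate_hz in steps:
        totals[rate_hz] += clocks * rate_hz
    return dict(totals)


by_rate = flops_by_rate(INS_STEPS)
total_ins_flops = sum(by_rate.values())

for rate_hz in sorted(by_rate, reverse=True):
    print(f"{rate_hz:5d} Hz: {by_rate[rate_hz]:6d} flops")
print(f"total   : {total_ins_flops:6d} flops")
```

The breakdown shows that the 625 Hz work, and the body transformation in particular, dominates the budget, while everything at 10 Hz is negligible.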

Kalman filter calculation:

In our proposed algorithm the measurements are GPS and altimeter measurements, while the states are INS states (the GPS corrects for INS drift). The number of measurements is therefore 6+1: 3 GPS position, 3 GPS velocity, and 1 altitude. Actually the altitude may be folded into the GPS prior to filtering, but that will take some computation too, and this calculation is approximate. The number of modeled states is somewhat selectable, but includes at least the 4 accelerometer biases, 3 gyro biases, 3 IMU pointing errors, and one gravity bias. Easily a dozen more states could be thrown in, but it's probably enough to assume the terms stated, though perhaps we might add 4 accelerometer scale factors and 3 gyro scale factors. The total number of states would then be 18.

I have done the operation count given these assumptions, and arrived at about 50000. The details are hard to write out in text format, but I would stress that the calculation is only approximate: the number of states is somewhat arbitrary, since any implementation can always model more states in the quest for greater accuracy. In fact there are good reasons we might want to add states, but the complexity increases, and a cost vs benefit consideration enters. As a very poor rule of thumb the operation count scales like the square of the number of states (I'm sure better estimates are in the literature). So adding 8 more states would more than double the operation count.
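The state tally and the square-law rule of thumb can be checked with a few lines. This is a sketch of the rule of thumb only, not of a Kalman filter operation count; the dictionary keys and function name are invented for the example.

```python
# State budget as listed in the text.
STATES = {
    "accelerometer biases": 4,
    "gyro biases": 3,
    "IMU pointing errors": 3,
    "gravity bias": 1,
    "accelerometer scale factors": 4,
    "gyro scale factors": 3,
}

BASE_STATES = sum(STATES.values())
BASE_KALMAN_FLOPS = 50_000  # the approximate count quoted in the text


def scaled_kalman_flops(new_states, base_states=BASE_STATES,
                        base_flops=BASE_KALMAN_FLOPS):
    """Rule of thumb: operation count grows as the square of the state count."""
    return base_flops * (new_states / base_states) ** 2


more_states = BASE_STATES + 8
growth = scaled_kalman_flops(more_states) / BASE_KALMAN_FLOPS
print(BASE_STATES, more_states, round(growth, 2))
```

Going from 18 to 26 states gives a growth factor of (26/18)^2, a little over 2, which is the "more than double" quoted above.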

The prevailing situation appears to be that we need at least the 50k flops for Kalman operation, and can probably use more. We should certainly not choose a processor that is marginal at the calculated level (150k flops), because the calculations are uncertain, and there will be considerable program overhead involving data manipulation in memory and stuffing the FPU. If the FPU is as fast as we hope, the overhead will probably be about the same as, or more than, the FPU time.

However, even if my calculation is off by a factor of 2, and the overhead is twice the FPU time, and the Kalman flops double, the load is only:

200k * 3 * 2 == 1200k operations / second

So any of the processors we have been considering should be adequate.

It's worth recalling that on LV1b we were consuming a very large part of the CPU clocks bit-banging things at the 2500 Hz rate. If we repeated this performance on LV2 the operation count could increase considerably, but it would still be hard to see it consuming 40 million instructions per second, which is the speed of the slowest of the processors being considered.
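To put the pessimistic case next to the 40 million instructions per second figure, the three factors named above (calculation off by 2, overhead twice the FPU time, Kalman flops doubled) can be multiplied out explicitly. This is a rough sketch: it rounds the INS count to 100k, treats one floating point operation as one instruction, and uses made-up parameter names.

```python
INS_FLOPS = 100_000     # INS total from the table, rounded up
KALMAN_FLOPS = 50_000   # Kalman estimate quoted in the text
SLOWEST_CPU_IPS = 40_000_000  # slowest processor being considered


def worst_case_ops(ins_flops, kalman_flops,
                   kalman_growth=2, overhead_ratio=2, error_factor=2):
    """Pessimistic operations per second.

    kalman_growth:  multiplier on the Kalman count (states added later)
    overhead_ratio: non-FPU overhead as a multiple of FPU time
    error_factor:   allowance for the whole estimate being off
    """
    fpu_ops = ins_flops + kalman_flops * kalman_growth
    return fpu_ops * (1 + overhead_ratio) * error_factor


ops = worst_case_ops(INS_FLOPS, KALMAN_FLOPS)
fraction = ops / SLOWEST_CPU_IPS
print(f"{ops} ops/s, {fraction:.1%} of the slowest processor")
```

With all three factors applied the load comes to 1.2 million operations per second, about 3% of the slowest processor, so the conclusion does not hinge on any one of the factors.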

I feel that even if there are major mistakes in these operation counts, the computational power available is sufficient to make a useful system, perhaps scaling back in some spots, but still capturing the essential characteristics of our desired system.

What I don't want to hear, then, is that there is some stupid gotcha: that the power consumption triples when the FPU runs, or the FPU only runs at 10 MHz, or it takes 10 cycles to load the registers for each single cycle multiply, or that the multiply is single cycle but a floating point add takes 5 clocks, etc.