GPS lag

I noticed that the GPS data seems to lag a bit (~10 sec) when I put it into a thread versus in the main program. I ran the example GPS code provided by EMLID with a few modifications: 4Hz update rate, velocity solution message, and writing everything to a file. I walked a box pattern outside and it gave what was expected.

Copied and pasted the code into a thread in a data logging script and noticed the GPS data lags. The main program runs at 50Hz and only writes data to a file when the gear switch is high. There are several other threads running as well that poll sensors for their current values. I don’t notice any lag in the data from the other sensor threads. A picture of the example program vs the threaded program GPS track is attached. The red track is the example script and the blue is the threaded data logging script. Each run was started as stopped in the same place (red) except the threaded version seems to have written the data of when I was heading over to the starting point even though the gear switch was low and not recorded the last leg. Its as if there is some kind of data buffer in the threaded version and the main program is pulling oldest data first. Any ideas?

void * GPS_Thread(void *arg){
if(gps.testConnection())
    {
	while(1){
		if (gps.decodeSingleMessage(Ublox::NAV_POSLLH, pos_data) == 1){
                gps_lat=pos_data[2]/10000000;
                gps_lon=pos_data[1]/10000000;
                gps_h=(pos_data[3]/1000)*3.28084;
            	}

		if (gps.decodeSingleMessage(Ublox::NAV_VELNED, pos_data) == 1){
                gps_Nv=(pos_data[1]/100)*3.28084;
		gps_Ev=(pos_data[2]/100)*3.28084;
		gps_Dv=(pos_data[3]/100)*3.28084;
		gps_3ds=(pos_data[4]/100)*3.28084;
                gps_2ds=(pos_data[5]/100)*3.28084;
                gps_course=pos_data[6]/100000;
	        }
		if (gps.decodeSingleMessage(Ublox::NAV_STATUS, pos_data) == 1){
                gps_stat=(int)pos_data[0];
		}
    		}
    } else {

        printf("Ublox test not passed\n");
	error_flag++;
    }
}

That probably happens because of the buffer overflow inside the Ublox receiver, seems like the threaded version isn’t reading the data fast enough.

Is there any way to speed it up?

Of course, for example you can set a small sleep time or call the method using a timer.

I added a usleep(200000) right after the last IF statement inside the while loop and that did not seem to help. GPS plot looked identical to the blue track. Am I putting the sleep function in the correct location?

I meant lowering the sleep value of a thread if there was any sleep. If you want a thread to be called on time, you need to switch its scheduling to SCHED_FIFO and give it some priority (shoudn’t be too high, lower than 50 would do).
Here’s how it works in APM:

I took a look at the scheduler file and tried to incorporate that without any success. GPS track still looks the same. I’m not great with programming so its possible that I’m not implementing this correctly or fully understanding it. I tried this inside the main program where all the other threads are created.

pthread_t GPSThread;    

struct sched_param param;
param.sched_priority = 40;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
pthread_attr_setschedparam(&attr, &param);

if(pthread_create(&GPSThread, &attr, GPS_Thread, (void *)arg) != 0)
{
    error_flag++;
  printf("GPS thread error!\n");
}

pthread_attr_destroy(&attr);

Don’t know if it matters, but at the top of the main script I have this to get it to run in real time to meet the 50Hz update rate.

struct sched_param sp;
memset(&sp, 0, sizeof(sp));
sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
sched_setscheduler(0, SCHED_FIFO, &sp);

Can you post all the code pls? I’m curious about how you use the gps_lat & gps_lon variables with the threaded version. I see no synchronization mechanism in place to notify the main thread when you have new valid data.

All code associated with the GPS has been posted in the previous posts. Just the GPS thread, thread setup, and then fprintf to a file along with other data. I can PM you the entire code if you like, its rather lengthy.

There is no synchronization to notify the main thread when new valid data is available. The way I have it written, based on testing of previous iterations of the code, is that almost every sensor has its own thread in which I assume it runs as fast as possible and data is taken when needed by the main program. The main program is set to run at 50Hz by measuring the time it took to complete the loop and adjusting the sleep time accordingly before the next loop. Data from all of the sensor threads is written to a file through an if statement in the main program when the gear switch is high. I initially had the main program read each sensor individually but found that the 50Hz loop rate was not achievable, so I put them into individual threads to speed things up. I just assumed that putting the GPS in a separate thread wouldn’t cause any problems.

I didn’t implement synchronization because the flight maneuvers I’m performing have very slow dynamics so even if the data is not quite matched up, it still provides a good average of the current conditions since it won’t be more than 0.02sec old. Except the GPS data which is only updated at 4Hz.

I don’t use the lat and lon for any computations, they are only for plotting of the flight path. Only N,E,D velocities, course, and altitude are used in post processing.

Even so, the variables are not atomic, so you might (and very probably will) get partial writes to those doubles if the GPS thread write to them at exactly the same time the main thread reads to write to file. This might result in random data corruption.
Regarding the lag, can you timestamp the data? You could store a timestamp everytime you get a GPS message and in the main thread write to file the message timestamp + the current timestamp. They should coincide plus/minus 2-300 ms.

I time stamped the GPS data the same way that I had been doing it through out the code and it gave some interesting results. After the last if statement, I added:

gettimeofday(&tvgps,NULL);
tgps = 1000000 * tvgps.tv_sec + tvgps.tv_usec;
tgps = (tgps - tstart) / 1000000.0;

where ‘tstart’ is defined in the main program after all of the initializations and just before the program heads into the main data collection loop so that all times can be referenced to it. The elapsed time timer in the main program correctly increments by 0.02 but the GPS timer reads 1225 plus minus 2 over the entire data collection time. I also tried defining a ‘tstart_gps’ variable at the beginning of the GPS thread but it yielded the same results. Does ‘gettimeofday’ work in threads or is there a different way I just be time stamping it?

I went back an added an integer counter that incremented by one at the end of the GPS loop and found that it did not increment every 0.25sec as it theoretically should. Sometimes it would increment by 0.25sec and other times it would take anywhere from half to a full second to increment. At least from what I can tell from it just printing out the integer on the screen since I can’t seem to get the time stamp to work.

gettimeofday is thread safe so it should work.

Do you mean that the GPS timer increments by 1225 ± 2 every iteration or that the result of (tgps - tstart) / 1000000.0 is 1225 ±2 all the time?

You should definitely use different variables in each thread or synchronize access to them using mutexes, otherwise you’ll get data races.

So far the problem you’re having seems like a threading issue - most probably a data race. Is it possible to post the entire code? If not - can you make a simpler sample that reproduces the issues and post it here?
I’d be glad to help pinpointing the issue.

With the help of a friend, we found the problem. The data logging script is using up nearly 100% of the CPU. Running the code in the background and using ‘top’ to check CPU usage, the program is using 97-98% of the CPU. I removed the GPS thread and the usage remained unchanged so I’m fairly confident that the lag was caused by excessive CPU usage. After recording the CPU usage of individual threads, we determined that the PPM decoder thread, and AHRS thread are the most resource hungry, especially the PPM thread. We were able to remove the GPS lag and reduce CPU usage from 97 to 90% just by changing the PPM ‘samplingRate’ from 1usec to 5usec. I’m assuming the interrupt to the read the GPIO pin is whats causing the high CPU usage. How slow can you go on the PPM sampling rate before ‘enough’ resolution is lost? (I’m guessing it is situationally dependent) Maybe its time to upgrade to the Pi2 …

Glad you sorted it out.
Regarding the PPM sample rate - I guess it depends on the minimum resolution you can live with. PPM is a sequence of several PWM signals, each one of max 1ms length. So with a sample rate of 5us you can get at most 200 steps per PWM signal (so per channel). 10us will give you 100 steps and this might be good enough.

What’s your CPU usage if you disable PPM? Maybe there’s something else going on there (like the GPS thread pooling for data).

Jean, with the PPM thread disabled CPU usage drops by 12%. I’ve been doing a bit of research and code restructuring and had a question about thread priorities. Mikhail posted the ardupilot scheduler file which got me thinking about assigning priorities to all of the threads based on their importance. Some questions I had are:

  1. I see all of the ardupilot threads are set as FIFO, would there be any benefit to setting a ‘round robin’ type priority for threads that are of somewhat equal importance so as to give them equal time?

  2. Based on sched(7) - Linux manual page priorities range from 1 (low) to 99 (high). How does one go about choosing priority values for each thread? The ardupilot threads appear to range from 10 to 15. Would this be a good starting point?

@larssoltmann

I see all of the ardupilot threads are set as FIFO, would there be any benefit to setting a ‘round robin’ type priority for threads that are of somewhat equal importance so as to give them equal time?

Main difference is that RR allows to suspend threads when their time slice is expired. In fact most RTOS use RR, but which suits better depends on what you’re working on.

Based on sched(7) - Linux manual page priorities range from 1 (low) to 99 (high). How does one go about choosing priority values for each thread? The ardupilot threads appear to range from 10 to 15. Would this be a good starting point?

51 is the priority of the interrupt handler threads, 99 is the priority of critical Linux stuff such as watchdogs. Simply use any values below 51.

Thanks for the info Mikhail. After some additional testing and adding back necessary sensors, I’ve found that the GPS lag is still present, but much much less. I’ve played around with a few things that bring up a few more questions:

  1. What does the thread priority value actually mean and is it the absolute value that counts or the value as compared to other threads? I.e. is there a difference between thread1 having 45 and thread2 having 44 and thread1 45 and thread2 20.

  2. What percentage of the CPU should be idle to be considered ‘safe’ from overloading it. I’ve got it down to 11-12% idle but like I said, there is still a bit of lag.

Continuing the search, I timed each thread and found that the GPS threads takes by far the most time. It takes exactly the amount of time given by the update rate so for my example the loop takes 0.25sec, give or take. Using the example code and some output statements, it appears the code waits for the message statement to become true instead of moving on. I would have thought that if the statement condition was not met, the loop would move on. Timing the if statements gives (sec).

If 1: 0.104347
If 2: 0.115723
If 3: 0.246686

If 1: 0.107524
If 2: 0.118456
If 3: 0.250111

If 1: 0.107468
If 2: 0.118683
If 3: 0.250055

The if statements correspond the the navigation solution, velocity solution, and status. Time is referenced to the beginning of the while(1) loop. I looked back at the Ublox.cpp file and only saw one 200usec sleep so I’m not sure why the program is waiting for the solution and why its taking so long. Starting to wonder if this might be contributing to the problem. Other threads take the following times (sec):

AHRS: ~0.0024
PPM decode: ~0.000007
MS5805 (similar to MS5611): ~0.042 [device conversion time is 2X0.02sec]

AFAIK values are relative, so in the case you mentioned there should be no difference unless another process will pop up with a priority between those.
10% idle should probably be okay.

I know one thing you could try to speed GPS up. SPI clock in GPS example is set to 200KHz, so that it’s compatible with Navio version with NEO-6T. But NEO7 and NEO8 can work with SPI clock up to 5MHz.

  1. Open Navio/Ublox.cpp at master · emlid/Navio · GitHub
  2. Find all SPIdev::transfer calls
  3. Change value of the last argument from 200000 to 5000000.

Changed all SPIdev transfer calls from 2e5 to 5e6 but that did not make a difference. Lag is still measured at 1.9sec and is constant across the data collection time. I know the IMU is also on the SPI bus so does it matter if the IMU and GPS are communicating at two different speeds?