PANIC: failed to take _spi_sem 100 times in a row, in AP_Compass_AK8963

@igor.vereninov I thought it would be better to create a new topic for this issue.

So I reinstalled everything from scratch, and now ArduPilot does start, but after about 15 minutes of running it terminates with this error:

" PANIC: failed to take _spi_sem 100 times in a row, in AP_Compass_AK8963::_update"

Then I found the code responsible for that message:

    bool AK8963_MPU9250_SPI_Backend::sem_take_nonblocking()
    {
        /**
         * Take nonblocking from a TimerProcess context &
         * monitor for bad failures
         */
        static int _sem_failure_count = 0;
        bool got = _spi_sem->take_nonblocking();
        if (!got) {
            if (!hal.scheduler->system_initializing()) {
                _sem_failure_count++;
                if (_sem_failure_count > 100) {
                    hal.scheduler->panic(PSTR("PANIC: failed to take _spi_sem "
                                              "100 times in a row, in "
                                              "AP_Compass_AK8963::_update"));
                }
            }
            return false; /* not reached if panic() was called */
        } else {
            _sem_failure_count = 0;
        }
        return got;
    }
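
The counter in that function only triggers the panic after more than 100 consecutive failed takes: any successful take resets it to zero, so the panic means the timer callback could not get the SPI semaphore on 101 attempts in a row. A minimal standalone sketch of just that counting logic (the class and method names are hypothetical, not part of ArduPilot):

```cpp
#include <cassert>

// Hypothetical helper mirroring the consecutive-failure logic of
// sem_take_nonblocking(); this is an illustration, not an ArduPilot class.
class ConsecutiveFailureMonitor {
public:
    explicit ConsecutiveFailureMonitor(int limit) : limit_(limit) {
        assert(limit > 0);
    }

    // Record one attempt to take the semaphore. Returns true once the
    // failure streak exceeds the limit, i.e. the point where the
    // original code calls panic().
    bool record(bool got) {
        if (got) {
            failures_ = 0;          // any success resets the streak
            return false;
        }
        ++failures_;
        return failures_ > limit_;  // strictly greater, as in the original
    }

    int failures() const { return failures_; }

private:
    int limit_;
    int failures_ = 0;
};
```

With a limit of 100, the first 100 failures return false and the 101st returns true. The sketch leaves out the system_initializing() check, during which the original does not count failures at all. Also note that the original keeps the counter in a function-level static, so the streak is shared by every instance of the backend class.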

Any ideas on what might be causing the error?

Thanks.

@artmx Could you please post a bit more about your setup: is it Navio or Navio+? Which Raspberry Pi? How is everything powered and connected? Also, did you compile the code yourself or use our binaries? Which command do you use to run APM?

@ivereninov

It’s a Navio+ with an RPi 2 and the precompiled image from the “docs” section.

The RPi is powered by its USB power supply and connected to my laptop’s WiFi hotspot through an Edimax adapter.

Nothing else besides APM is installed and running.

The command I used: sudo ArduCopter-quad -A udp:192.168.137.1:14550. It connects to Mission Planner and seems to work properly for around 15-20 minutes. Then the error pops up.

Got it. This is the first time I’ve seen something like this. Let me discuss it with our team and get back to you tomorrow.

@artmx Meanwhile, could you run another test? Please try to reproduce this issue without APM: leave the accelgyromag example running on the table for a long time. Does it fail?

I noticed that the compass value in MP was drifting constantly, but it seems the issue is gone so far after compass calibration… Sorry, I hadn’t noticed that before.

@artmx

Could you please send us a log of a failed launch?

Logs are kept in /var/APM/logs.

@igor.vereninov
OK, the compass calibration only made the _spi_sem error pop up less often, but it’s still there…
I’ve run the test you suggested and left it running for the whole day; it didn’t fail.

@staroselskiy I’ve checked the directory, and it’s empty. Is there a special option I need to add to the launch command to get logs?

@igor.vereninov
By the way, the previous error, the “Bad Device ID” on APM start, happened because I had changed the overclocking options in raspi-config. It went away after I set it back to “Medium”. Why does that happen?

@artmx

By default, logs are written only after arming, so you might need to adjust the logging parameter in your GCS of choice. In APM Planner it is under Config > Standard params > Log bitmask, where you need to select “All+DisarmedLogging”.

Could you please make a fresh Raspbian install using our image? That way we can rule out any custom adjustments you might have made.

@staroselskiy

Here’s the log after the fresh install. The error popped up after approximately 10 minutes of APM run time.

https://drive.google.com/file/d/0B2tng5FWxLIBRHVUQjV6VkNPODA/view?usp=sharing

@artmx

My guess is that something is seriously wrong with the Raspberry Pi. I’ve never seen the scheduler jitter like this on a Raspberry Pi 2. Do you have the opportunity to test it on another RPi?

There’s nothing on the Navio that could lead to this issue.

@staroselskiy

I’ll try to find another board to test, then. Could you please explain how to judge the amount of jitter from the log?
Could it be a bad SD card?

@artmx

First, I launched APM Planner and loaded the log you provided. Then I looked at the PM (performance monitoring) section of the log, where you can find the NLon and MaxT items.

The first one (NLon) tells you how many loops slipped. As you can see, this value goes as high as 300. NLoop tells you how many loops were run in total (1000 in our case). Thus, the scheduler missed ~300 of 1000 samples.
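
The NLon/NLoop arithmetic is easy to script if you want to scan many PM messages. A minimal sketch, assuming the two fields have already been extracted from each PM message (the struct and function names are hypothetical, not part of ArduPilot or APM Planner):

```cpp
#include <cassert>

// Hypothetical container for the two PM fields discussed above.
// Field names follow the log columns; the struct itself is illustrative.
struct PmSample {
    int nlon;   // NLon: loops that slipped in this sample window
    int nloop;  // NLoop: total loops run in this sample window
};

// Fraction of loops that slipped in one PM message (0.0 .. 1.0).
double slip_fraction(const PmSample& pm) {
    assert(pm.nloop > 0);
    return static_cast<double>(pm.nlon) / pm.nloop;
}
```

With the values quoted above, slip_fraction(PmSample{300, 1000}) is 0.3, i.e. roughly 30% of loops slipped.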

The second one (MaxT) shows how severe the misses are. Normally, MaxT should hover around 10000 µs (i.e. 100 Hz), which corresponds to the Copter’s control loop frequency. In this log the value goes as high as 50000 µs.
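
The MaxT reading can be put in the same terms. Taking the 10000 µs nominal period quoted above, a worst-case loop of 50000 µs corresponds to an instantaneous rate of 1e6 / 50000 = 20 Hz, five times the nominal period. A sketch of that arithmetic (the function names are hypothetical):

```cpp
#include <cassert>

// Convert a loop period in microseconds (as logged in MaxT) to the
// loop rate in Hz it would correspond to if every loop took that long.
double period_us_to_hz(double period_us) {
    assert(period_us > 0.0);
    return 1e6 / period_us;
}

// How many times longer the worst loop took than the nominal period.
double overrun_factor(double max_t_us, double nominal_us) {
    assert(nominal_us > 0.0);
    return max_t_us / nominal_us;
}
```

For example, period_us_to_hz(10000.0) gives 100 Hz and overrun_factor(50000.0, 10000.0) gives 5.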

It might have something to do with the state of the SD card, but I’m not so sure about that.

Thanks a lot for clarification!

Hello @staroselskiy and @igor.vereninov. I searched for the error I am getting and found this thread. I was wondering if you might be able to help, because I am having the same problem.

I was having issues, so I reimaged my disk with the provided RPi Linux image. I have the Navio+ board. I cloned the GitHub repository and did all of the steps directly on my Pi as listed in these instructions: http://docs.emlid.com/navio/Navio-APM/building-from-sources/
I downloaded gcc 4.8.4 and updated the find_tools.mk file as suggested.
After make navio, I received the message “Firmware is in ArduCopter.elf”.

The way to run APM given in the basic instructions (sudo ArduCopter-quad -A …) no longer works here, so, as suggested above, I instead tried sudo ./ArduCopter.elf -A udp:GCSIP:14550, which returned the same error message again: PANIC: failed to take _bus->sem 100 times in a row, in AP_Compass_AK8963

If there is a known reason or a solution to this, please let me know. Thank you!!