[SOLVED] "AK8963: bad DEVICE ID" error during Ardupilot start

Emlid could at least have a go at reproducing this error

Emlid did have a go at reproducing this error, but it happens so rarely that it’s hard to catch.
We will try your approach as soon as we stumble upon that error again.
Device tree will be enabled in the next image release anyway.

The spidev module doesn’t load on my RasPi2 when device tree is disabled.

Maybe something was broken in your system. Spidev does load in our image.

I’d like to explain the nature of that error.
AK8963 is a compass chip built into the MPU9250.
The error can have several causes, for example:

  1. MPU9250 is not configured properly and does not provide access to AK8963 (that’s the bug we are hunting)
  2. MPU9250 is not working.
  3. SPI bus is not accessible.

Because compass initialization is one of the first steps in APM, any problem with SPI or the MPU9250 surfaces as this error.

We will rework the initial checks, so that errors are displayed properly.
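
A minimal sketch of what ordered initial checks could look like, so that each of the three causes listed above gets its own message. This is not the APM code: `InitFault`, `Probes` and `diagnose` are hypothetical names, and the three probes are injected so the logic can be tried without hardware. The expected ID values (0x71 for the MPU9250 WHO_AM_I register, 0x48 for the AK8963 WIA register) are taken from the chip datasheets and are assumptions here.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical fault codes, one per cause from the list above.
enum class InitFault {
    None,
    SpiUnavailable,        // SPI bus is not accessible
    Mpu9250NotResponding,  // MPU9250 is not working
    Ak8963NotReachable     // MPU9250 answers but does not expose AK8963
};

// Injected probes; on real hardware these would wrap SPI transfers.
struct Probes {
    std::function<bool()> spi_open;
    std::function<std::uint8_t()> mpu_whoami;
    std::function<std::uint8_t()> ak_wia;
};

// Checks from the outside in, so the first failing layer is reported.
InitFault diagnose(const Probes& p) {
    if (!p.spi_open()) {
        return InitFault::SpiUnavailable;
    }
    if (p.mpu_whoami() != 0x71) {  // assumed MPU9250 WHO_AM_I value
        return InitFault::Mpu9250NotResponding;
    }
    if (p.ak_wia() != 0x48) {      // assumed AK8963 WIA value
        return InitFault::Ak8963NotReachable;
    }
    return InitFault::None;
}
```

Checking the bus first and the compass last means a dead SPI bus is no longer reported as a bad compass ID.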

@CodeChief could you please try to reproduce this error with a freshly written SD card with our RT Raspbian image? Like:

  1. Write image to SD card
  2. Check that /dev/spidev0.0 and /dev/spidev0.1 are available
  3. Check that AccelGyroMag example is working
  4. Install APM.deb
  5. Run APM, does that error happen?
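
Step 2 can also be checked from code. The sketch below is only an illustration, assuming a Linux system with POSIX `stat()`; `is_char_device` and `missing_nodes` are hypothetical helper names, not part of the Navio examples.

```cpp
#include <string>
#include <vector>
#include <sys/stat.h>

// True if `path` exists and is a character device (as spidev nodes are).
bool is_char_device(const std::string& path) {
    struct stat st {};
    return ::stat(path.c_str(), &st) == 0 && S_ISCHR(st.st_mode);
}

// Returns the entries of `paths` that are not character devices,
// either because they are missing or because they are something else.
std::vector<std::string> missing_nodes(const std::vector<std::string>& paths) {
    std::vector<std::string> missing;
    for (const auto& p : paths) {
        if (!is_char_device(p)) {
            missing.push_back(p);
        }
    }
    return missing;
}
```

Calling `missing_nodes({"/dev/spidev0.0", "/dev/spidev0.1"})` should return an empty list on a correctly configured image.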

Hi
The same error occurs every time I enable overclocking on the Pi2.
I have tried sudo shutdown -h now as suggested by @CodeChief, but it doesn’t help.

If I write a fresh image to the SD card and leave the overclock at its default, no errors occur.
It would be nice to use the Pi2 overclock, but none/700MHz will do for now.

It would be really nice to run the Pi2 overclocked, as this improves gstreamer quality drastically.
@CodeChief did you get APM running with Pi2 overclock?

@CodeChief @Bernt_Christian_Egel

I’ll see what I can do about the overclocking. APM worked okay on an overclocked RPi1, so I wonder why there are problems on the RPi2.

Hi @mikhailavkhimenia,

Now I’m really confused. Today I can only recreate the issue with overclock. It always fails at 1000MHz. Reset to default then it runs again. Perhaps that’s the only problem. Full logs of my testing are available here: http://1drv.ms/1JrIM3E

There are some other points which may need attention:

  1. Some kind of incompatibility reported during the SPI driver load:
    [ 3.369472] bcm2708_spi bcm2708_spi.0: master is unqueued, this is deprecated

  2. Other boot messages maybe specific to your image, perhaps fixed with some standard Linux tweaks:

Wed May 6 19:31:58 2015: [ ok ] Activating swap…done.
Wed May 6 19:31:58 2015: mount: you must specify the filesystem type
Wed May 6 19:31:58 2015: [FAIL] Cannot check root file system because it is not mounted read-only. … failed!
Wed May 6 19:32:04 2015: […] Checking if shift key is held down:Error opening ‘/dev/input/event*’: No such file or directory
Wed May 6 19:32:04 2015: No. Switching to ondemand scaling governor/etc/init.d/raspi-config: 26: /etc/init.d/raspi-config: cannot create /sys/devices/system/cpu/cpufreq/ondemand/up_threshold: Directory nonexistent
Wed May 6 19:32:04 2015: /etc/init.d/raspi-config: 27: /etc/init.d/raspi-config: cannot create /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate: Directory nonexistent
Wed May 6 19:32:04 2015: /etc/init.d/raspi-config: 28: /etc/init.d/raspi-config: cannot create /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor: Directory nonexistent
Wed May 6 19:32:05 2015: Error opening ‘/dev/input/event*’: No such file or directory

  3. The cgroups thing I already documented the fix for (add cgroup_enable=memory in cmdline.txt):
    Wed May 6 19:32:05 2015: [warn] Kernel lacks cgroups or memory controller not available, not starting cgroups. … (warning).

When my other boards arrive I should see if the overclock limitation is specific to my first board and/or RasPi2 (I’ve ordered new RasPi2s to go with the new boards)…

@CodeChief

[ 3.369472] bcm2708_spi bcm2708_spi.0: master is unqueued, this is deprecated

AFAIK that warning is displayed because spidev is enabled; it has something to do with the interface for older kernels.

Wed May 6 19:31:58 2015: […] Activating swap…

This one looks correct.

Wed May 6 19:31:58 2015: […] Cannot check root file system because it is not mounted read-only.

That’s standard for Raspbian.

Wed May 6 19:32:04 2015: […] Checking if shift key is held down:Error opening ‘/dev/input/event*’: No such file or directory

This one appears whenever you touch the overclocking options. An overclocked RPi may fail to boot properly, so the RPi lets you disable overclocking by holding the Shift key on a keyboard during boot. As you probably don’t have a keyboard connected, it displays that warning.

Wed May 6 19:32:04 2015: No. Switching to ondemand scaling governor/etc/init.d/raspi-config: 26: /etc/init.d/raspi-config: cannot create /sys/devices/system/cpu/cpufreq/ondemand/up_threshold: Directory nonexistent
Wed May 6 19:32:04 2015: /etc/init.d/raspi-config: 27: /etc/init.d/raspi-config: cannot create /sys/devices/system/cpu/cpufreq/ondemand/sampling_rate: Directory nonexistent
Wed May 6 19:32:04 2015: /etc/init.d/raspi-config: 28: /etc/init.d/raspi-config: cannot create /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor: Directory nonexistent

I will fix this one in the next release. It appears because performance is the only governor in the system, as we don’t want the frequency to jump around in a real-time OS.

3) The cgroups thing I already documented the fix for (add cgroup_enable=memory in cmdline.txt):

Will include that in the next release too.

Thanks for catching all this!

@CodeChief @Bernt_Christian_Egel

Aha, I’ve tried overclocking a third RPi2 and it produced that error :)
From time to time I see the SPI clock double, and that is the real root of the problem. It seems the SPI clock depends on core_clock. It’s odd that it doesn’t happen all the time, though (I wonder whether core_clock changes dynamically).

The correct way to solve this is to fix the SPI driver so that it always sets the SPI clock dividers according to the overclocking options.
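
To illustrate why a stale divider doubles the clock, here is a simplified model, assuming the SPI clock is simply the core clock divided by an integer divider. `spi_divider` and `actual_spi_hz` are hypothetical helpers, and the real driver has additional constraints on divider values that are not modelled here.

```cpp
#include <cstdint>

// Divider a driver would program if it believes the core runs at
// assumed_core_hz. Rounds up so the result never exceeds wanted_spi_hz.
constexpr std::uint32_t spi_divider(std::uint32_t assumed_core_hz,
                                    std::uint32_t wanted_spi_hz) {
    return (assumed_core_hz + wanted_spi_hz - 1) / wanted_spi_hz;
}

// Clock that actually appears on the bus for a given real core clock.
constexpr std::uint32_t actual_spi_hz(std::uint32_t real_core_hz,
                                      std::uint32_t divider) {
    return real_core_hz / divider;
}

// A driver that assumes 250 MHz and wants 1 MHz picks a divider of 250.
static_assert(spi_divider(250000000u, 1000000u) == 250u, "divider");
// With the core really at 250 MHz the bus runs at the requested 1 MHz...
static_assert(actual_spi_hz(250000000u, 250u) == 1000000u, "nominal");
// ...but with the core at 500 MHz the same divider yields 2 MHz.
static_assert(actual_spi_hz(500000000u, 250u) == 2000000u, "doubled");
```

In this model the fix amounts to computing the divider from the real core clock instead of an assumed constant.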

For now, I recommend not overclocking at all.

@mikhailavkhimenia great, good job. I was on my way to buy another RPi2.
So, the $1000 question will be, how do we fix this SPI driver? :)

Well, we should probably open an issue on the raspberrypi/linux GitHub repository.
But first it’s better to check whether it has already been fixed there. I saw a lot of SPI commits there a couple of weeks ago, but I’m not sure whether they have anything to do with this issue.

Sounds promising and reassuring that it’s a software issue. Got lots of plans for these boards and the more CPU power available the better. :)

This problem does not depend on overclocking alone.
If you restart the firmware often enough, it also leads to this init problem, which persists until the device is powered off.
I will look for the responsible code. I guess it is in the library code.

Yes, overclocking is not the only cause of that error. A few posts earlier I explained that it can also come from an unsuccessful AK8963 initialization or from hardware problems. My guess is that the I2C master in the MPU9250 may not keep up with the pace of initialization and may hit bus errors, so the AK8963 initialization should probably be rewritten with more checks and some retries.

Just to be sure, you expect the problem to be somewhere here: ardupilot/AP_InertialSensor_MPU9250.cpp at navio · emlid/ardupilot · GitHub?
I think I will try to find the problem this weekend.

Best, Daniel

The error is thrown here in the ardupilot/libraries/AP_Compass/AP_Compass_AK8963.cpp file: ardupilot/libraries/AP_Compass/AP_Compass_AK8963.cpp at 77a2b4acf6df6713f73df462120c2761231c2d48 · ArduPilot/ardupilot · GitHub

Looks like it has 5 chances over 500 milliseconds to initialize:

bool AP_Compass_AK8963::init()
{
    hal.scheduler->suspend_timer_procs();
    if (!_backend->sem_take_blocking()) {
        error("_spi_sem->take failed\n");
        return false;
    }


    if (!_backend_init()) {
        _backend->sem_give();
        return false;
    }

    _register_write(AK8963_CNTL2, AK8963_RESET); /* Reset AK8963 */

    hal.scheduler->delay(1000);

    int id_mismatch_count;
    uint8_t deviceid;
    for (id_mismatch_count = 0; id_mismatch_count < 5; id_mismatch_count++) {
        _register_read(AK8963_WIA, 0x01, &deviceid); /* Read AK8963's id */

        if (deviceid == AK8963_Device_ID) {
            break;
        }

        error("trying to read AK8963's ID once more...\n");
        _backend_reset();
        hal.scheduler->delay(100);
        _dump_registers();
    } 

    if (id_mismatch_count == 5) {
        _initialised = false;
        hal.console->printf("WRONG AK8963 DEVICE ID: 0x%x\n", (unsigned)deviceid);
        hal.scheduler->panic(PSTR("AK8963: bad DEVICE ID"));
    }

    _calibrate();

    _initialised = true;

    /* ... excerpt ends here; the rest of init() is in the linked file ... */

I wonder what happens when we increase the loop delay of 100 milliseconds to a delay of 1000 milliseconds. Maybe the scheduler delay above also needs to be extended from 1000 to 6000+ (if the intent of that was to wait until all the initialization is complete). Or should this device always initialize quickly and the problem is somewhere else?
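
One way to experiment with those numbers off the board is to pull the retry loop out into a small helper with the attempt count and delay as parameters. The sketch below is not a patch for AP_Compass_AK8963.cpp: `ProbeResult`, `probe_device_id`, `read_id` and `wait_ms` are hypothetical names, and the register read and the delay are injected so the loop can be exercised with a fake device.

```cpp
#include <cstdint>
#include <functional>

struct ProbeResult {
    bool ok;               // true if the expected ID was read
    int attempts;          // number of reads performed
    std::uint8_t last_id;  // last value returned by read_id
};

// Reads the device ID up to max_attempts times, waiting delay_ms between
// failed attempts (but not after the last one).
ProbeResult probe_device_id(const std::function<std::uint8_t()>& read_id,
                            std::uint8_t expected_id,
                            int max_attempts,
                            int delay_ms,
                            const std::function<void(int)>& wait_ms) {
    ProbeResult result{false, 0, 0};
    for (int i = 0; i < max_attempts; ++i) {
        result.last_id = read_id();
        result.attempts = i + 1;
        if (result.last_id == expected_id) {
            result.ok = true;
            return result;
        }
        if (i + 1 < max_attempts) {
            wait_ms(delay_ms);
        }
    }
    return result;
}
```

With five attempts and a 100 ms delay this waits at most 400 ms in total, so if the chip needs longer than that after a reset, only a larger delay or more attempts would help.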

Not quite sure if this is related, but I found a similar issue with the Navio RAW board and RPi 2. On RPi 1 SPI was working without problems, but on the RPi 2 I ran into problems with SPI.

On the raspberrypi.org forums there was some info regarding dynamic overclock. After investigating myself I saw that the default ARM clock was 600MHz, which is dynamically raised to 900MHz when required (and within temperature limits). This is the default out-of-the-box setting. You can change this by adding force_turbo=1 to /boot/config.txt, which pins the ARM frequency at 900MHz. The complete section should look like this:

[pi2]
kernel=kernel7.rt.img
arm_freq=900
core_freq=250
sdram_freq=450
over_voltage=0
force_turbo=1

You can check the current settings yourself by running the following commands:
sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
sudo cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
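
The same values can be read from a program, for example to log the clock next to SPI errors. This is a small sketch, assuming the cpufreq files hold a single number in kHz; `khz_text_to_mhz` and `read_cpufreq_mhz` are hypothetical helper names.

```cpp
#include <exception>
#include <fstream>
#include <string>

// Converts sysfs text such as "900000\n" (kHz) to MHz.
// Returns -1 if the text does not start with a number.
long khz_text_to_mhz(const std::string& text) {
    try {
        return std::stol(text) / 1000;
    } catch (const std::exception&) {
        return -1;
    }
}

// Reads one cpufreq attribute of cpu0, e.g. "cpuinfo_cur_freq".
// Returns -1 if the file is missing, unreadable or not numeric.
long read_cpufreq_mhz(const std::string& name) {
    std::ifstream file("/sys/devices/system/cpu/cpu0/cpufreq/" + name);
    std::string line;
    if (!file || !std::getline(file, line)) {
        return -1;
    }
    return khz_text_to_mhz(line);
}
```

Reading cpuinfo_cur_freq may need root, as in the commands above, in which case the helper returns -1 for an unprivileged user.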

For me, this seems to fix the SPI issues, but I need to do more testing to confirm this is a permanent solution. Also I am not sure whether the core clock is also dynamically adjusted, but I think this might be the case. I feel there must be a better way to fix the SPI frequency and make it independent of the core clock, but am unsure how.

Note: the raspi-config utility alters these settings, and on the RPi 2 it doesn’t let you go back to the defaults (you can only do that by editing the config file manually), so it’s better not to use it at all.

Okay, the problem seems to be connected only with the core frequency. So ARM overclocking may still work, but core overclocking won’t.

@CodeChief @Bernt_Christian_Egel @benrexwinkel @dgrat

I’ve compiled a new kernel with an updated SPI driver that calculates the clock differently.
On the oscilloscope the SPI clock looks the same regardless of overclocking, and I’ve been running APM on an overclocked system for a whole day. I’m still not sure, though, as the previous problem didn’t happen all the time for me.
Could you guys please check it?

And here is the standalone kernel, if you’d prefer to update an existing system.
Copy boot and lib to the boot and / partitions respectively:
emlid-rpi2-kernel-3.18.11-rt7.tar.xz

Please note that it is not overclocked by default.

@mikhailavkhimenia
Thanks! I’ll test the new kernel this weekend.

Thx.
Will try asap.