Severe track deviation (drift) after extended power-on time

@eRaccon, I have already talked to Alexander about this; unfortunately he is not aware of the problem. Maybe we need more data from your mowers, e.g. the original motor drivers and motors. It might also be worth posting the topic in the Ardumower Skype channel. Otherwise, please continue in the drift thread.

Okay guys, what other information can we summarize?
Users with both brushed and brushless motors have reported the problem in this thread, correct?
Multiple Sunray software releases from the last 12 months are affected?
Maybe the only thing that is the same for all users here is that we all use the Adafruit Grand Central M4 (AGCM4)?

Have we all solved the problem by disabling the spike function?

Maybe let's compare some basic information, as we already started doing a while ago.


Sunray version: 1.0.286 without any modification
Unit: AGCM4
Motors: 2021 brushless from the shop
Drivers: 3x JYQD 2021
IMU: 6050 and 9250 (same problem on both)
Usual mowing speed: 0.31 m/s
Problem solved by disabling the spike function: YES
 
I have the ALFRED as it was delivered to me in July 2022, without any modifications of my own, but with the SW updates that were provided.
Because it was asked here, I am reposting my two posts on the drift topic below (from earlier in this thread), since according to my observations ALFRED is affected as well:
  1. A few days ago I updated Sunray on the Alfred to 1.0.305 (from 1.0.298). The MCU firmware RM18 is version 1.1.15.
    Two days ago I too noticed (by chance) the behavior described in this thread: at times strong deviations from the calculated path. On the PC, deviations of up to 40 cm were shown on long lanes, although GPS was at "fix" and the displayed deviations are normally in the range of +/- 2-4 cm. At the end of the lanes, however, Alfred maneuvered back to the calculated end point quite accurately.

    None of this is really a big problem as long as it does not cause Alfred to get stuck in flower beds or hedges at the end of lanes. (This may have happened to me once, but at that time I was not yet aware of the possible cause described in this thread.)
    Yesterday, however, this problematic path deviation (after several hours of operation and uninterrupted power-on time, since I do not switch off after mowing) seems to have prevented Alfred from maneuvering into the docking station. The deviation from the docking path was such that, when entering the docking station, he drove about 10 - 15 cm too far to the right and hit the garage, detected an obstacle via the bumper and tried again, but simply could not find his way in. On the previous days this had worked without problems.
    After a number of failed docking attempts, I remembered this thread with the unexplained path deviations and the temporary workaround mentioned in it.
    I stopped Alfred by pressing the red button on the robot once, ran a "Reboot Robot" on the PC, then reconnected the PC to Alfred and finally issued the "Dock" command from the PC. And lo and behold: Alfred immediately found his way into the docking station precisely on the first attempt (and has done so 3 more times since).

    Unfortunately I cannot say whether this deviation bug was already present (perhaps unnoticed) in version 1.0.298 or only appeared with version 1.0.305.

  2. A few minutes ago, it occurred again (as described in my first post in this thread): Alfred couldn't dock! In several attempts he drove against the garage, which triggered the bumper. He retreated and made several new approaches with the same unsuccessful result until I sent a "stop" signal via Sunray Desktop.
    Then I "rebooted" Alfred, "connected" again, and sent the "dock" signal (all via Sunray on my Windows PC). His docking was then immediately successful with no problems at all.
I cannot (yet) say whether the problem on ALFRED has already been fixed with the latest releases (Sunray 1.0.309 and Sunray App v1.0.169).
 
Sunray version: 1.0.298 with some modifications in AmRobotDriver.cpp, basically changes for bumper and lift sensor level
Unit: AGCM4
Motors: Original Worx Dunker motors (BL)
Drivers: 2x JYQD 2021, 1x 2017
IMU: 6050
Usual mowing speed: 0.3 m/s
Problem solved by disabling the spike function: YES
 
@bernard thank you very much for your investigation, it works great. I'm really impressed by your deep knowledge of embedded systems; now I can leave my mower switched on the whole time.

I still have two problems to get rid of in the firmware, then I'll have a perfectly working helper.

In the meantime I changed my mowing speed to 0.4 m/s.
 
But could there be a connection between the two problems? Has anyone had both problems and got rid of both by removing
#define SUPER_SPIKE_ELIMINATOR 1 // advanced spike elimination (experimental, comment out to disable)
or by removing the timeout part?
 
I recommend swapping the driver for a JYQD 2017 or 2021; those cause no problems. If you want to support the shop, the 2017 model is also available there.
Things have gone really quiet around the official successor driver; my guess is that it won't happen this year.
 
I just noticed my mower having this problem. I'm planning to try the fix of commenting out the super spike eliminator, but I'm confused about why it is causing issues.

From my understanding, "millis()" is going to return the same value over the duration of the ISR. Calling it multiple times in the same ISR isn't useful. It also is going to keep getting bigger and bigger until about 50 days. Overall, it sounds like ticks are getting missed causing too many rotations.

While it is great that disabling the super spike eliminator will fix the problem, it makes me worried that it isn't clear why it was needed and why it isn't working as intended. Looking at the code there are a few things I could see trying.
  1. avoid redefining the duration variable every ISR
  2. don't call millis() except once at the start of the ISR
  3. Use micros() instead of millis(), but might require some more thought
  4. Avoid the use of math using floats and unsigned longs
I also just wonder if counting the rotations has already been solved by another library that has handled this in a more robust way. Some things I don't understand are:
  • I don't understand why "spikes" are a problem
  • what are "spikes" specifically
  • what this spike eliminator code is really trying to do
  • why the ISR listens for "CHANGE", but then returns when LOW: "if (digitalRead(pinMotorMowRpm) == LOW) return;"
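
A minimal sketch of suggestions 1 to 4 above, written as plain C++ so it can be compiled off the robot. It is not the Sunray code: `TickFilter`, `onRisingEdge` and the 1000 µs window are names and values assumed purely for illustration. The timestamp is read once by the caller (on the robot that would be a single micros() call at the top of the ISR), and everything is unsigned integer math.

```cpp
#include <cstdint>

// Hypothetical tick filter: counts an edge only if at least
// minIntervalUs microseconds have passed since the last counted edge.
struct TickFilter {
  uint32_t minIntervalUs;    // assumed debounce window, e.g. 1000 us
  uint32_t lastTickUs = 0;   // timestamp of the last counted edge
  uint32_t ticks = 0;        // number of counted edges
  bool hasTick = false;      // false until the first edge was counted

  explicit TickFilter(uint32_t minUs) : minIntervalUs(minUs) {}

  // nowUs is sampled exactly once by the caller.
  // Returns true if the edge was counted.
  bool onRisingEdge(uint32_t nowUs) {
    // Unsigned subtraction stays correct across the 32-bit wrap-around.
    if (hasTick && static_cast<uint32_t>(nowUs - lastTickUs) < minIntervalUs) {
      return false;          // too close to the previous tick: treat as bounce
    }
    lastTickUs = nowUs;
    hasTick = true;
    ++ticks;
    return true;
  }
};
```

With a 1000 µs window, an edge 200 µs after a counted tick is ignored and one 1000 µs after it is counted. Whether a fixed window like this is enough for the encoders on the mower is an open question.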
 
If there is an issue with millis() after about 50 days, I'm sure the reason is bad programming style, because a variable overflows.
I am used to this because I have the same issue in another ESP8266 project.
After ~49 days it stops working and I need to reset it.

An unsigned long variable has a range from 0 to 4,294,967,295 (2^32 - 1).
If you divide this number by 1000 (ms per second), 60 (seconds per minute), 60 (minutes per hour) and 24 (hours per day) you get ~49.71 days.

If the code works with differences of time values without accounting for the wrap-around, the result can go negative when the variable overflows, or the program can end up in an undefined state.
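
To make the arithmetic and the overflow concrete, here is a small sketch in plain C++. The function names `expiredFragile` and `expiredSafe` are made up for this example and do not come from Sunray. The first compares absolute timestamps in the `millis() > nextTime` style; the second compares an unsigned difference, which keeps working across the wrap-around.

```cpp
#include <cstdint>

// Milliseconds per day and whole days until a 32-bit ms counter wraps.
constexpr uint32_t kMsPerDay = 1000UL * 60UL * 60UL * 24UL;   // 86,400,000
constexpr uint32_t kDaysUntilWrap = UINT32_MAX / kMsPerDay;   // 49 (49.71 rounded down)

// Fragile pattern: compares absolute timestamps.
// If 'deadline' was computed as start + interval and wrapped past zero,
// this reports "expired" immediately.
inline bool expiredFragile(uint32_t now, uint32_t deadline) {
  return now > deadline;
}

// Wrap-safe pattern: compares an unsigned difference against the interval.
inline bool expiredSafe(uint32_t now, uint32_t start, uint32_t interval) {
  return static_cast<uint32_t>(now - start) >= interval;
}
```

Note that this only matters close to the ~49.71 day mark; on its own it would not explain problems that show up after hours.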

Best regards,
Chris
 
I've made some changes in this PR, not directly related to the issue, but I thought I might as well change the whole thing to microseconds since I'm using them for the tick time anyway.

The super spike thing seems to take the maximum time that an odometry tick takes, multiply it by 0.7, and use that as a timeout: basically an adaptive timeout instead of the fixed 1 millisecond.
My understanding is that spikes are bounces that can happen in the signal and trigger the ISR when they shouldn't.
I haven't seen any issue without it though.

The ISR is indeed set to CHANGE but then discards the LOW call.
No idea why it is this way. My best guess, after trying to figure out how the odometry divider works, is that only RISING or FALLING can be used with the divider, but that got implemented as a condition instead???
 
If there is an issue with millis() after about 50 days, I'm sure the reason is bad programming style, because a variable overflows.
I definitely get that part, but the thing that doesn't make sense is why after only running 12-24h or so it starts having problems. This is what makes me wonder if in general the robot has a memory leak or something that is causing the ISRs to take longer and longer to execute, which would mean millis() gets more and more stale, since millis() doesn't change over the duration of the ISR routine.

I've made some changes in this PR, not directly related to the issue, but I thought I might as well change the whole thing to microseconds since I'm using them for the tick time anyway.
This looks interesting. It totally feels like a sampling problem. I am just curious, where do you infer the 50ms sampling time? I thought the ISR is essentially getting triggered each time the motor encoder detects another tick, which could take a variable amount of time.

I interpreted the existing code (with super spike eliminator disabled) as rejecting ticks that occur less than 1ms since the last tick. Then, the super spike eliminator makes that a bit more variable. However, triggering the ISR on CHANGE, then rejecting the LOW state makes this all harder to reason about.

My understanding is that spikes are bounces that can happen in the signal and trigger the ISR when they shouldn't.
Gotcha, so this is essentially a debouncing problem. There is kind of a minimum amount of time it should take between two ticks and you don't want to count the same tick multiple times. It sounds like the super spike eliminator is resulting in the opposite problem of not counting a number of ticks, which results in the odometry test rotating the wheels more than they should be.
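
To make the discussion concrete, here is a sketch of an adaptive timeout as described in this thread (0.7 times the longest tick interval seen so far). It is only one possible reading of that description, not the Sunray implementation; `AdaptiveTickFilter` and its members are hypothetical. It shows how such a filter can, in principle, drop genuine ticks when the wheel speeds up. Whether that is what happens on the mower is not established here.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical adaptive debounce: an edge is counted only if the time since
// the last counted edge is at least 0.7 * the longest interval seen so far.
struct AdaptiveTickFilter {
  uint32_t lastTickUs = 0;      // timestamp of the last counted edge
  uint32_t maxDurationUs = 0;   // longest counted tick-to-tick interval
  uint32_t ticks = 0;           // number of counted edges
  bool hasTick = false;

  bool onRisingEdge(uint32_t nowUs) {
    if (hasTick) {
      uint32_t duration = nowUs - lastTickUs;   // wrap-safe unsigned difference
      // 0.7 * max in integer math; 64-bit intermediate avoids overflow.
      uint32_t timeoutUs = static_cast<uint32_t>(
          static_cast<uint64_t>(maxDurationUs) * 7u / 10u);
      if (duration < timeoutUs) return false;   // rejected as a spike
      maxDurationUs = std::max(maxDurationUs, duration);
    }
    lastTickUs = nowUs;
    hasTick = true;
    ++ticks;
    return true;
  }
};
```

Feeding it ticks every 20 ms sets the timeout to 14 ms, so a bounce 1 ms after a tick is rejected as intended. If the wheel then speeds up to one tick every 10 ms, every second genuine tick is rejected as well, because 10 ms is below the 14 ms timeout and the maximum in this sketch never shrinks.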
 
Hello all,
I also have big problems with drunken pilot and Drift if my Mower is not rebooted after >10h.
Alexander pointed me to this thread. I have read it; many thanks for your good analysis and explanations. Nevertheless I am not sure I know what to do now ;-(
Can somebody please check my questions below? I am writing in English to reach all of you.
Many thanks
Konny

Sunray version: 1.0.298
Unit: AGCM4
Motors: Original BRUSHED motors
Drivers: standard BRUSHED
IMU: 6050
Usual mowing speed: 0.3 m/s
Problem solved by disabling the spike function: I will test this starting today


Questions:

1)
Why is there no new Ardumower Sunray release available in which the drunken pilot problem is fixed? In the latest version, 1.0.298, it is not fixed, and that version is from Oct 8, 2022, so very old. Is Sunray no longer maintained...?


2) Do I also need to apply these hints from this thread?
Or none of them, and only disable the spike function?

a) Hartmut's solution for BRUSHED Motors:
// motor speed control (PID coefficients) - these values are tuned for Ardumower motors
// general information about PID controllers: https://wiki.ardumower.de/index.php?title=PID_control
#define MOTOR_PID_KP 1.0 // do not change 2.0 (for non-Ardumower motors or if the motor speed control is too fast you may try: KP=1.0, KI=0, KD=0)
#define MOTOR_PID_KI 0.01 // do not change 0.03
#define MOTOR_PID_KD 0.005 // do not change 0.03
// ---- path tracking -----------------------------------
// below this robot-to-target distance (m) a target is considered as reached
#define TARGET_REACHED_TOLERANCE 0.05
// stanley control for path tracking - determines gain how fast to correct for lateral path errors
#define STANLEY_CONTROL_P_NORMAL 1.6 // 3.0 for path tracking control (angular gain) when mowing
#define STANLEY_CONTROL_K_NORMAL 0.2 // 1.0 for path tracking control (lateral gain) when mowing
#define STANLEY_CONTROL_P_SLOW 1.6 // 3.0 for path tracking control (angular gain) when docking tracking
#define STANLEY_CONTROL_K_SLOW 0.1 // 0.1 for path tracking control (lateral gain) when docking tracking

b) or Bernhard's values:
Also in config.h, try to adjust the Stanley settings to match the values you use in the Sunray app:
// ---- path tracking -----------------------------------
// below this robot-to-target distance (m) a target is considered as reached
#define TARGET_REACHED_TOLERANCE 0.05
// stanley control for path tracking - determines gain how fast to correct for lateral path errors
#define STANLEY_CONTROL_P_NORMAL 1.1 // 3.0 for path tracking control (angular gain) when mowing
#define STANLEY_CONTROL_K_NORMAL 0.3 // 1.0 for path tracking control (lateral gain) when mowing
#define STANLEY_CONTROL_P_SLOW 1.1 // 3.0 for path tracking control (angular gain) when docking tracking
#define STANLEY_CONTROL_K_SLOW 0.3 // 0.1 for path tracking control (lateral gain) when docking tracking

c) Where can I change the Stanley settings in the Sunray app anyway?

d) EinEinfach's tip: bfx AGCM4 stack memory waste workaround
// CONSOLE.print (activeOp->getOpChain());

e) Bernhard's tip: asm("dsb"); code to free the interrupt, in AmRobotDriver.cpp
 
1) Yeah, there is still no comment in the forum from Alexander about all these problems... I don't know why.
The community finds the bugs and also the fixes, but there is no reaction from development.
You can use the current code on GitHub, but there are multiple changes per week, so you are better off picking some "random" release from the past; otherwise you run a high risk of new bugs.

2) Disabling the spike function should solve your problem of the mower driving drunk after a while.

a/b) PID and Stanley values have to match your motor drivers. If you have the default Ardumower BL drivers, leave the values at their defaults.
If your mower drives well, don't change anything here.

c) You can check/change the values in the app when you enable debugging. But once you have found the correct values, I would recommend setting them in the config.h file.

PID and Stanley values affect driving in every situation and have nothing to do with the "drunken robot"... it was only an idea that maybe the robot changes the values randomly after a few hours.

2 d) Yes, you can also do this; it's just another bug. It has nothing to do with the drunken robot; it was more of an idea that this situation could lead to the drunk problem.

e) Not sure what exactly you are talking about here. I guess it was the same as above: only an idea for how to maybe solve the bug.
 
I also have big problems with drunken pilot and Drift if my Mower is not rebooted after >10h.
In AmRobotDriver.cpp, replace the odometry ISR functions with this code:
Code:
void OdometryLeftISR(){             
  odomTicksLeft++;
  asm("dsb");     
}

void OdometryRightISR(){           
  odomTicksRight++;
  asm("dsb");   
}

In motor.cpp, use this code for the test part to add a timeout to the 10 rev test.
Code:
void Motor::test(){
  CONSOLE.println("motor test - 10 revolutions");
  motorLeftTicks = 0; 
  motorRightTicks = 0; 
  unsigned long nextInfoTime = 0;
  int seconds = 0;
  int pwmLeft = 200;
  int pwmRight = 200;
  bool slowdown = true;
  unsigned long stopTicks = ticksPerRevolution * 10;
  unsigned long nextControlTime = 0;
  while (motorLeftTicks < stopTicks || motorRightTicks < stopTicks){
    if (millis() > nextControlTime){
      nextControlTime = millis() + 20;
      if ((slowdown) && ((motorLeftTicks + ticksPerRevolution  > stopTicks)||(motorRightTicks + ticksPerRevolution > stopTicks))){  // slow down for the final revolution
        pwmLeft = pwmRight = 80;
        slowdown = false;
      }   
      if (millis() > nextInfoTime){     
        nextInfoTime = millis() + 1000;   
        if (seconds > 120)
        {
          CONSOLE.println("Error can't reach the 10 rev in less than 2 minutes");
          break;
        }         
        dumpOdoTicks(seconds);
        seconds++;     
      }   
      if(motorLeftTicks >= stopTicks)
      {
        pwmLeft = 0;
      } 
      if(motorRightTicks >= stopTicks)
      {
        pwmRight = 0;     
      }
      
      speedPWM(pwmLeft, pwmRight, 0);
      sense();
      //delay(50);         
      watchdogReset();     
      robotDriver.run();
    }
  } 
  speedPWM(0, 0, 0);
  CONSOLE.println("motor test done - please ignore any IMU/GPS errors");
}

Now repeat the 10 rev test (AT+E) to set TICKS_PER_REVOLUTION correctly, and change the value in config.h.
Code:
// ...for the older 42mm diameter motor (white connector)  https://wiki.ardumower.de/images/d/d6/Ardumower_chassis_inside_ready.jpg
#define TICKS_PER_REVOLUTION   960   // odometry ticks per wheel revolution
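
One way to turn the test result into a new value, as a rough sketch: the posted test stops once `ticksPerRevolution * 10` ticks have been counted, so if you count by hand how many revolutions the wheel really made, the corrected value is the counted ticks divided by the observed revolutions. `correctedTicksPerRevolution` is a hypothetical helper for a PC, not part of Sunray.

```cpp
#include <cmath>

// Hypothetical helper: derive a corrected TICKS_PER_REVOLUTION from the
// 10-revolution test. Assumes the test stopped after configuredTicks * 10
// counted ticks and that observedRevolutions was measured by hand.
long correctedTicksPerRevolution(long configuredTicks, double observedRevolutions) {
  if (observedRevolutions <= 0.0) return configuredTicks;   // nothing measured
  double countedTicks = static_cast<double>(configuredTicks) * 10.0;
  return std::lround(countedTicks / observedRevolutions);
}
```

For example, with 960 configured and the wheel observed to turn 12 times instead of 10, the helper returns 800.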

If you use the AGCM4, you can disable outputConsole in comm.cpp to avoid the stack memory error by adding "return;" at the beginning of the function.

Code:
// output summary on console
void outputConsole(){
  return;
  if (millis() > nextInfoTime){       
    bool started = (nextInfoTime == 0);
 