Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Buggy time handling code (doesn't consider timer rollover) #2832

Closed
rondlh opened this issue Sep 7, 2023 · 33 comments · Fixed by #2889
Closed

[BUG] Buggy time handling code (doesn't consider timer rollover) #2832

rondlh opened this issue Sep 7, 2023 · 33 comments · Fixed by #2889
Labels
bug Something isn't working

Comments

@rondlh
Copy link

rondlh commented Sep 7, 2023

While looking through the TFT source code I noticed some issues, the time handling code does not handle timer rollovers correctly. This happens in many places in the code (just search for "OS_GetTimeMs"), luckily rollovers are not frequent (it takes 50 days), but this issue will probably cause a hiccup of the TFT, perhaps even a lockup.

Here an example (common.c):

// Check time elapsed against the time specified in milliseconds for displaying/updating info on screen
// Use this for timed screen updates in menu loops only
bool nextScreenUpdate(uint32_t duration)
{
  static uint32_t lastTime = 0;
  uint32_t curTime = OS_GetTimeMs();

  if (curTime > (lastTime + duration)) // BUG!!!
  {
    lastTime = curTime;
    return true;
  }
  else
  {
    return false;
  }
}

The problem here is that "lastTime + duration" does rollover before curTime does, causing the if statement guard to become true immediately, which will happen again and again until curTime also rolls over.

The correct code would be this:

if (curTime - lastTime > duration)

This code is rollover safe.

Another example (menu.c)

  reminder.time = OS_GetTimeMs() + STATUS_BAR_REFRESH_TIME; // BUG!!!

  if (menuType != MENU_TYPE_FULLSCREEN) drawReminderMsg();
}

void loopReminderManage(void)
{
  if (reminder.status == SYS_STATUS_IDLE || OS_GetTimeMs() < reminder.time) return; // BUG!!!

This should be:

  reminder.time = OS_GetTimeMs();

  if (menuType != MENU_TYPE_FULLSCREEN) drawReminderMsg();
}

void loopReminderManage(void)
{
  if (reminder.status == SYS_STATUS_IDLE || OS_GetTimeMs() - reminder.time < STATUS_BAR_REFRESH_TIME) return;

So this makes me think that timer rollovers were not considered, but then I came across this piece of code in timer.c

void TIM7_IRQHandler(void)
{
  if ((TIM7->SR & 0x01) != 0)
  { // update interrupt flag
    TIM7->SR &= (uint16_t)~(1<<0);  // clear interrupt flag

    os_counter++;

    updatePrintTime(os_counter);

    loopTouchScreen();

    if (os_counter == (uint32_t)(~0))
    {
      os_counter = 0;
    }
  }
}

The last lines seem very odd to me, it looks like someone tried to prevent a rollover from happening.
This code is executed 1000x per second, waiting until 1ms before a rollover, then prevents a rollover. If you just let the rollover happen, then everything (timing) will be correct again. The code seems to only slows down the TFT, or am I missing something?

Also the updatePrintTime is quite odd if you consider what happens in updatePrintTime:

void updatePrintTime(uint32_t osTime)
{
  if (osTime % 1000 == 0)
  {
    if (infoPrinting.printing && !infoPrinting.paused)
    {
      infoPrinting.elapsedTime++;

      if (infoPrinting.remainingTime > 0 && !heatHasWaiting())
        infoPrinting.remainingTime--;
    }
  }
}

The following will be much more efficient (remove odd rollover code, don't call updatePrintTime when not needed):

void TIM7_IRQHandler(void)
{
  if ((TIM7->SR & 0x01) != 0)
  { // update interrupt flag
    TIM7->SR &= (uint16_t)~(1<<0);  // clear interrupt flag

    os_counter++;

    if (osTime % 1000 == 0)    
      updatePrintTime();

    loopTouchScreen();

  }
}

void updatePrintTime(void) // WARNING, TIMER INTERRUPT ROUTINE CALLED ONCE A SECOND
{
    if (infoPrinting.printing && !infoPrinting.paused)
    {
      infoPrinting.elapsedTime++;

      if (infoPrinting.remainingTime > 0 && !heatHasWaiting())
        infoPrinting.remainingTime--;
    }
}

UPDATE: I did a quick scan to find the bugs, here the results
Bug is present in:

  1. void loopFan(void)
  2. void OS_TaskLoop(OS_TASK *task_t)
  3. void OS_TaskEnable(OS_TASK *task_t, uint8_t is_exec, uint8_t is_repeat)
  4. bool FIL_NormalRunoutDetect(void)
  5. bool FIL_SmartRunoutDetect(void)
  6. void FIL_FE_CheckRunout(void)
  7. inline static bool nextUpdate(void)
  8. void setReminderMsg(int16_t inf, SYS_STATUS status)
  9. void loopReminderManage(void)
  10. void drawBusySign(void)
  11. void loopBusySignClear(void)
  12. void drawToast(bool redraw)
  13. void loopToast(void)
  14. void loopPrintFromOnboard(void)
  15. void probeHeightQueryCoord(void)
  16. static void m291_loop(void)
  17. void ParseACKJsonParser::endDocument()
  18. void rrfStatusQuery(void)
  19. void loopSpeed(void)
  20. void updateNextHeatCheckTime(void)
  21. void loopCheckHeater(void)
  22. uint8_t Touch_Enc_ReadPos(void) (strange code, guard is always true except at timer rollover)
  23. void Scroll_DispString(SCROLL * para, uint8_t align)
  24. uint16_t KNOB_GetRV(GUI_RECT *knob)
  25. bool nextScreenUpdate(uint32_t duration)
  26. static inline void pidStart(void)
  27. void menuPid(void)

Correct timer handling in:

  1. void LCD_CheckDimming(void)
  2. void Mode_Switch(void)
  3. bool Touch_Enc_ReadPen(uint16_t interval)
  4. bool Touch_Enc_ReadBtn(uint16_t interval)
  5. bool LCD_Enc_ReadBtn(uint16_t interval)
@rondlh rondlh added the bug Something isn't working label Sep 7, 2023
@digant73
Copy link
Contributor

digant73 commented Sep 8, 2023

I don't think the rollover is a real issue (printer powered on for more than 50 days).
I would consider only the optimization on TIM7_IRQHandler() and updatePrintTime() just to avoid a useless invokation of updatePrintTime() during printing

@rondlh
Copy link
Author

rondlh commented Sep 8, 2023

I don't think the rollover is a real issue (printer powered on for more than 50 days). I would consider only the optimization on TIM7_IRQHandler() and updatePrintTime() just to avoid a useless invokation of updatePrintTime() during printing

The issue could easily be solved by adding a simple function:

// Returns true if the time has passed, otherwise returns false
bool checkTimeElapsed(uint32_t previousTimestamp, uint32_t duration)
{
  return (OS_GetTimeMs() - previousTimestamp) > duration;
}

In many cases just rearranging the guard in the correct way would solve the issue, like:

if (curTime > (lastTime + duration))

should be changed into:

if (curTime - lastTime > duration)

Anyway, I Agree, It's only an issue if your printers are always on, and for print farms etc. The issue can just be added to the known bugs list and we can recommend to restart the TFT once a month to prevent the bug.

BTW: Do you understand what is happening with the timestamps in uint8_t Touch_Enc_ReadPos(void)?

@digant73
Copy link
Contributor

digant73 commented Sep 8, 2023

BTW: Do you understand what is happening with the timestamps in uint8_t Touch_Enc_ReadPos(void)?

What do you mean?

@rondlh
Copy link
Author

rondlh commented Sep 8, 2023

Here is the function (from Touch_Encoder.c)

1. uint8_t Touch_Enc_ReadPos(void)
2. {
3.   static uint32_t nowTime = 0;
4.   static bool move = false;
5.   static uint16_t sy;
6.   uint16_t ex, ey;
7. 
8.   ex = ey = 0;
9. 
10.   if (!XPT2046_Read_Pen() && OS_GetTimeMs() >= nowTime)
11.   {
12.     TS_Get_Coordinates(&ex, &ey);
13. 
14.     if (!move)
15.       sy = ey;
16. 
17.     move = true;
18. 
19.     if (ex > LCD_FREE_WIDTH)  // if touched navigation area, stop mode switching
20.       modeSwitching = true;
21.     else
22.       modeSwitching = false;
23. 
24.     if (ex > LCD_FREE_WIDTH)
25.     {
26.       if (sy > ey && ey != 0)
27.       {
28.         if (sy - ey > LCD_HEIGHT / 9 && sy - ey < LCD_HEIGHT / 7)  // 7 - 5
29.         {
30.           sy = ey;
31. 
32.           return 2;
33.         }
34.       }
35.       else
36.       {
37.         if (ey - sy > LCD_HEIGHT / 9 && ey - sy < LCD_HEIGHT / 7)
38.         {
39.           sy = ey;
40. 
41.           return 3;
42.         }
43.       }
44.     }
45.   }
46.   else
47.   {
48.     nowTime = OS_GetTimeMs();
49.     move = false;
50.     sy = ey = 0;
51.     modeSwitching = false;  // resume mode switching
52.   }
53. 
54.   return 0;
55. }

In line 10 there is a guard: OS_GetTimeMs() >= nowTime

This is "always" true because nowTime is equal to 0 or equal to a previous OS_GetTimeMs(), and OS_GetTimeMs() is always >= 0.
The guard is only false for a very short time every 50 days when OS_GetTimeMs() just did a rollover but nowTime has not yet done this... very odd code.

@digant73
Copy link
Contributor

digant73 commented Sep 8, 2023

yes, it is useless (I probably made some testing using timer but then I forgot that useless piece of code in the past, so I will remove it now). Touch_Enc_ReadPos() function is used in Marlin menu to handle the scroll Up/Down (touch, hold and move the finger on right part of the screen).
Condition if (!XPT2046_Read_Pen() && OS_GetTimeMs() >= nowTime) is matched only when holding the touch on screen

@rondlh
Copy link
Author

rondlh commented Sep 8, 2023

I saw you updatedTIM7_IRQHandler() to improve the updatePrintTime handling, but you left in:

    if (os_counter == (uint32_t)(~0))
    {
      os_counter = 0;
    }

Do you believe this code has any purpose? I believe it does nothing except taking up time and making a small error in time calculation every 50 days because it skips a count.

@digant73
Copy link
Contributor

digant73 commented Sep 8, 2023

I saw you updatedTIM7_IRQHandler() to improve the updatePrintTime handling, but you left in:

    if (os_counter == (uint32_t)(~0))
    {
      os_counter = 0;
    }

Do you believe this code has any purpose? I believe it does nothing except taking up time and making a small error in time calculation every 50 days because it skips a count.

? It is no more present on my push (it has been removed from the code)

@rondlh
Copy link
Author

rondlh commented Sep 8, 2023

? It is no more present on my push (it has been removed from the code)

Sorry, you are right, it's removed, well done, the TFT is getting faster and faster :D

BTW: I fixed all the Timer rollover bugs, it only took me an hour or so. It's really trivial if you understand 32-bit modulus calculations.

@digant73
Copy link
Contributor

digant73 commented Sep 8, 2023

BTW: I fixed all the Timer rollover bugs, it only took me an hour or so. It's really trivial if you understand 32-bit modulus calculations.

Eventually post the files but I would provide the changes in a specific PR (not urgent)

@rondlh
Copy link
Author

rondlh commented Sep 8, 2023

Eventually post the files but I would provide the changes in a specific PR (not urgent)

Sure, I will do some extensive testing first, also for the DMA serial writing, which seems to be working very well, it causes a significant increase in the scan frequency. Unfortunately I only have the STM32F2_4 implementation, I don't have any other hardware, I only have the BTT TFT35 V3.0 and MKS TFT35 V1.0.

@kisslorand
Copy link
Contributor

kisslorand commented Sep 8, 2023

It's really trivial if you understand 32-bit modulus calculations.

Beware that not all ARM Cortex-M MCUs have a dedicated hardware divider. (ex Cortex-M3)

@kisslorand
Copy link
Contributor

kisslorand commented Sep 9, 2023

In line 10 there is a guard: OS_GetTimeMs() >= nowTime

This is "always" true because nowTime is equal to 0 or equal to a previous OS_GetTimeMs(), and OS_GetTimeMs() is always >= 0. The guard is only false for a very short time every 50 days when OS_GetTimeMs() just did a rollover but nowTime has not yet done this... very odd code.

Line 10 should be

if (!XPT2046_Read_Pen() && OS_GetTimeMs() > nowTime)

so the touch coordinate is not read faster than every 1 ms. Altough it is a very small time, it is for debouncing purposes. (As you move your fingers on the TFT there can be fractions of a time where there's no physical touch detected. It is better understood if the output is monitored with an oscilloscope. I use debounce always on physical switches/contacts, otherwise you think you pressed the button once but the MCU can see actually hundreds of ON/OFF toggles.)

To better serve its purpose I would change it to:

if (!XPT2046_Read_Pen() && OS_GetTimeMs() >= nowTime + TOUCH_DEBOUNCE_TIME)

where TOUCH_DEBOUNCE_TIME could be defined somewhere between 5~15 (ms).

...or as per your anti-rollover method:

if (!XPT2046_Read_Pen() && OS_GetTimeMs() - nowTime >= TOUCH_DEBOUNCE_TIME)

:)

@kisslorand
Copy link
Contributor

the TFT is getting faster and faster :D

I made optimizations on the TIM7_IRQHandler() and updatePrintTime(), made them to be even more faster than our dear colleague did (or so I think) and spent half the day making all kind of tests and measurement. I could not measure nor detect any kind of speed improvement during print even with the "hesitation guard" on or off.

I might make a PR out of it but do not expect any tangible gain out of it.

@rondlh
Copy link
Author

rondlh commented Sep 10, 2023

> if (!XPT2046_Read_Pen() && OS_GetTimeMs() >= nowTime + TOUCH_DEBOUNCE_TIME)

This is another example of a Timer rollover bug, please stop doing this. Here the correct version:

if (!XPT2046_Read_Pen() && OS_GetTimeMs() - nowTime >= TOUCH_DEBOUNCE_TIME)

I made optimizations on the TIM7_IRQHandler() and updatePrintTime(), made them to be even more faster than our dear colleague did (or so I think) and spent half the day making all kind of tests and measurement. I could not measure nor detect any kind of speed improvement during print even with the "hesitation guard" on or off.

How about showing your improvements here? Let everyone review and discuss... I don't see any obvious relevant improvement anymore in TIM7/updatePrintTime, so please surprise me. @digant73 could add it to #2824 and I'm sure you would be credited.
Also note that the speed improvement discussed here (which were added to #2824), will NOT improve printing speed, but it will improve overall speed and responsiveness of the TFT, especially for low-end TFTs. By how much you ask? Not much I guess, but it's an easy and free improvement for everybody and improving/removing buggy code from an interrupt routine that is executed 1000 times a second is usually a good idea, there is no need to spend much time in trying to benchmark the speed improvement.

@kisslorand
Copy link
Contributor

kisslorand commented Sep 10, 2023

...please stop doing this. Here the correct version:

if (!XPT2046_Read_Pen() && OS_GetTimeMs() - nowTime >= TOUCH_DEBOUNCE_TIME)

I already gave literally the same example.

How about showing your improvements here? Let everyone review and discuss...

It can be discussed in a PR also.

I don't see any obvious relevant improvement anymore in TIM7/updatePrintTime, so please surprise me.

I do not have such goals (to surprise anyone here), but sure, I'll make a PR.

@digant73 could add it to #2824 and I'm sure you would be credited.

He has no problem to copy changes from an existing PR and whenever he did, usually no credits were given.

Also note that the speed improvement discussed here (which were added to #2824), will NOT improve printing speed, but it will improve overall speed and responsiveness of the TFT, especially for low-end TFTs.

I was well aware of this, I never stated that TFT speed increase is not beneficial (especially if it is for free), I just stated that print speed will not benefit from it, a conclusion that I came to by tests, not by theory.

there is no need to spend much time in trying to benchmark the speed improvement

I did it out of curiosity.

@rondlh
Copy link
Author

rondlh commented Sep 10, 2023

I already literally gave the same example.

Yes, you did, you provide 2 wrong lines of code, and corrected one. Better teach everyone here to use the correct version, it's not an option or irrelevant thing. Please note, the correct way of handling timestamps doesn't prevent rollovers, it's not a "anti-rollover method", it just handles rollovers correctly so you don't have a glitch at rollover.

It can be discussed in a PR also.

Of course you can, but it slows down progress here and decreases your chances of getting your proposals merged.

He has no problem to copy changes from an existing PR and whenever he did, usually no credits were given.

"He" is @digant73 I guess. I think you real pain is here... getting credit.
I can see that @digant73 has spend a lot of time and effort to make the TFT FW better, recently he solved the hesitation bug and ADVANCED_OK is ready for merge (a big improvement, great progress).
On the other hand, I spend a lot of time on your closed source hesitation bug solution (which seems to work), but you have not provide any insight into your approach, nor did you contribute to the discussion to support @digant73 and me to solve the issue. I only saw some vague statements, nothing concrete that was helpful. More code and less talk please.

I just stated that print speed will benefit from it, a conclusion that I came to by tests, not by theory.

That's quite surprising to me. I can imagine gaining a ms left and right, which could add up to a few seconds maybe if you don't use ADVANCED_OK, but tell me more... tell me more... any numbers?

@kisslorand
Copy link
Contributor

I just stated that print speed will benefit from it, a conclusion that I came to by tests, not by theory.

I am sorry, I made a typo. What I meant to say is that " I just stated that print speed will NOT benefit from it"

@kisslorand
Copy link
Contributor

kisslorand commented Sep 10, 2023

It can be discussed in a PR also.

Of course you can, but it slows down progress here and decreases your chances of getting your proposals merged.

Normally it would speed it up and increase the chances of getting it merged. The maintainers of this repository nicely and explicitly asked us to provide clean pull requests with separate topics, not cluttered ones with many unrelated changes.


Pull request Standards

  • We recommend creating separate PRs for unrelated fixes, do not gather them in one single PR. By doing so it will make it easier for our team to focus on eventual issues and to review them.

It can be found here: https://github.com/bigtreetech/BIGTREETECH-TouchScreenFirmware/blob/master/coding_standard.md

I can see that @digant73 has spend a lot of time and effort to make the TFT FW better,

So? He is not the only one! For example @guruathwal did a ton of heavy lifting for this FW to be a better one. You can check it here if you're not familiar with his work amount.

but you have not provide any insight into your approach

So?


nor did you contribute to the discussion to support @digant73 and me to solve the issue

The issue is solved for more than a month now. The path you two have taken is one where I've already been. If you remember, I already implemented ADVANCED_OK a few months ago. Implementing it was an eye opener for the fact (for me at least) that ADVANCED_OK is not the answer, it just mitigates the outcome of an issue and it is not universally applicable, it serves for a few setup only. So I really do not see how can I be any help in a matter that I already believe it's not the answer for the issue.
I do not have to beg to be believed as I have the living proof of it: my precompiled files which do not exhibit the issues of the "improved" serial receiver, do not need at all big buffers on Marlin, do not need any user configuration on the TFT side nor any new configuration on Marlin side and so on... There's a saying: "less is more".

@rondlh
Copy link
Author

rondlh commented Sep 10, 2023

@kisslorand
Where's your ADVANCED_OK code?
What insights did you learn when you implemented it? Where did you share your findings? You talk about "an issue", how about share your findings and be helpful in moving the firmware to the next level?

Where's your hesitation bugfix? Do you agree that the current bugfix solves the hesitation issue? Or are there other issues you solved in your closed source solution?

Of course a PRs should not cover too many things, but @digant73 has already added an (very low risk) improvements to #2824, it would make sense to add your additional improvements now too. Otherwise you risk that your improvement is too "lightweight" to justify a merge. It's up to you...

@kisslorand
Copy link
Contributor

kisslorand commented Sep 10, 2023

Where's your ADVANCED_OK code?

On my PC.

What insights did you learn when you implemented it?

I found out it creates more problems, it is not universally viable for all setups, it mitigates the outcome of the hesitation problem not fixes it.

Where did you share your findings?

Here: #2761

Later edit: sorry I remembered it wrong, I didn't share any findings there, only my concerns regarding ADVANCED_OK.

You talk about "an issue", how about share your findings and be helpful in moving the firmware to the next level?

I highly doubt it's me who's holding up this firmware. I have numerous bugfixes not merged. However the next level firmware is already available for more than a month now in my repository.

Where's your hesitation bugfix?

It's implemented for more than a month now in my repository.

Do you agree that the current bugfix solves the hesitation issue?

No. I already stated numerous time it just mitigates the outcome, doesn't fix anything. A painkiller is not fixing a rotten teeth, you need to go to the dentist to fix the teeth itself.
ADVANCED_OK was never ever implemented by no one for a good reason. I guess you also agree that there are many brilliant minds out there who could have done it at a snap of the finger. The one single guy who did something to use ADVANCED_OK is Chen who made the BufferBuddy plugin for Octoprint. His own conclusions also are:
"So, it looks like BufferBuddy doesn't fully address the problem, but it still mitigates against potential planner underruns by keeping the command buffers full, which still should help against the occasional blip of load on lower-powered devices."

For me the ADVANCED_OK is in the past. I always wanted to implement it, I eventually did it and than ditched it. End of story!

I totally understand your enthusiasm towards ADVANCED_OK. I was in your shoes once. I wanted it also really badly and waited and waited for someone to implement it. Garry from PrintEngineering implemented it 4 or 5 years ago for the Artillery printers but he didn't make the source code available. I was really frustrated. Anyway there was no visible change with or without his ADVANCED_OK, we were printing at low speeds back in the time (80mm/s).
Than I implemented it too and... you know the rest.

Or are there other issues you solved in your closed source solution?

I am not sure what you're asking here. Of course I solved other issues too. The changelog can be read in my repository.

Otherwise you risk that your improvement is too "lightweight" to justify a merge. It's up to you...

I do not risk anything, my printers are working just fine. ;)
Anyway, I am confident that our good buddy will not hesitate to copy & paste the ideas if he thinks they worth something.

@rondlh
Copy link
Author

rondlh commented Sep 10, 2023

Good news! Some code has appeared (#2833), that's what I like to discuss here:

@kisslorand Brilliant improvement for updatePrintTime, but why did you make void loopTouchScreen 2 to 3 times slower than before? Sabotage?

@kisslorand
Copy link
Contributor

@rondlh
LoL.
You're joking, right? :)

@rondlh
Copy link
Author

rondlh commented Sep 10, 2023

@rondlh LoL. You're joking, right? :)

I'm only joking about the sabotage part, your implementation of loopTouchScreen is about 2-3 times slower than the old one, so I'm wondering why you did that? Code readability?

UPDATE: @kisslorand My tests show that your loopTouchScreen is only 25% slower than the original code (STM32F207), not the 2-3 times I theoretically expected. Only code readability has decreased by a factor of 2-3.

Here an easy way to make loopTouchScreen a tiny bit faster. Drop the guard inversion and swap the "normal" and "else" case.

Original code:

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  if (!XPT2046_Read_Pen())
  {
    if (touch >= 20)  // 20ms
    {
      touchScreenIsPress = true;
    }
    else
    {
      touch++;
    }
  }
  else
  {
    touchScreenIsPress = false;
    touch = 0;
  }
}

Improved code:

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  if (XPT2046_Read_Pen())
  {
    touchScreenIsPress = false;
    touch = 0;
  }
  else
  {
    if (touch >= 20)  // 20ms
    {
      touchScreenIsPress = true;
    }
    else
    {
      touch++;
    }
  }
}

If you want to make it 25% slower and obfuscate the code then replace it with this (Credit to @kisslorand):

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);
}

@kisslorand
Copy link
Contributor

OK. Let's keep the discussion in one place: #2833

@rondlh
Copy link
Author

rondlh commented Sep 11, 2023

@kisslorand

"So, it looks like BufferBuddy doesn't fully address the problem, but it still mitigates against potential planner underruns by keeping the command buffers full, which still should help against the occasional blip of load on lower-powered devices."

For me the ADVANCED_OK is in the past. I always wanted to implement it, I eventually did it and than ditched it. End of story!

What do you mean "end of story"!? This is about a problem in Cura 4.7.1 (too many small segments) in combination with low-powered devices. And ADVANCED_OK is reported to improve the situation significantly, but cannot fully solve the problem (buffer underruns). To fully address the problem you need to upgrade Cura or/and new/faster hardware.

I guess you will never wear a raincoat or use an umbrella, because you will still get wet, right?

@kisslorand
Copy link
Contributor

I guess you will never wear a raincoat or use an umbrella, because you will still get wet, right?

I will use a raincoat and/or umbrella because I cannot fix the root of the problem.

What do you mean "end of story"!?

I mean that I'm done with believing that ADVANCED_OK is a magic wand.

I think we can put an end to the debate about ADVANCED_OK. As I said, I fully understand your enthusiasm towards it. For the moment it's the best fix you two could come up with. I've already been there, I get it.

The tone of the discussion starts to deviate from a friendly one so with your permission I will excuse myself from it to save what's left and keep it from degenerate further.

@rondlh
Copy link
Author

rondlh commented Sep 11, 2023

@digant73 Your monitoring update is really nice (#2824), I've abused it to do some benchmarks lately, potentially you could add a few blank (commented out) lines for general purpose use. I would for example be interested to see how much free memory is available, and perhaps MCU loading during printing would be interesting too.

@digant73
Copy link
Contributor

@digant73 Your monitoring update is really nice (#2824), I've abused it to do some benchmarks lately, potentially you could add a few blank (commented out) lines for general purpose use. I would for example be interested to see how much free memory is available, and perhaps MCU loading during printing would be interesting too.

that menu was meant to be extended in the future adding any KPI that can provide useful info. I didn't add other info at the moment simply because the menu was initially (and enabled at compile time) focused only to monitor resources for "advanced ok" feature.

@rondlh
Copy link
Author

rondlh commented Sep 11, 2023

that menu was meant to be extended in the future adding any KPI that can provide useful info. I didn't add other info at the moment simply because the menu was initially (and enabled at compile time) focused only to monitor resources for "advanced ok" feature.

Sure, I understood from your previous comments on ADVANCED_OK. I like it, it's very useful and easy to use and abuse :D

@digant73
Copy link
Contributor

digant73 commented Sep 11, 2023

@kisslorand

"So, it looks like BufferBuddy doesn't fully address the problem, but it still mitigates against potential planner underruns by keeping the command buffers full, which still should help against the occasional blip of load on lower-powered devices."
For me the ADVANCED_OK is in the past. I always wanted to implement it, I eventually did it and than ditched it. End of story!

What do you mean "end of story"!? This is about a problem in Cura 4.7.1 (too many small segments) in combination with low-powered devices. And ADVANCED_OK is reported to improve the situation significantly, but cannot fully solve the problem (buffer underruns). To fully address the problem you need to upgrade Cura or/and new/faster hardware.

I guess you will never wear a raincoat or use an umbrella, because you will still get wet, right?

Not sure about the sentences from kisslorand about extra resources on Marlin side to support ADVANCED_OK. In my printers I even reduced BUFSIZE to 8 (from 32) while maintaining even RX_BUFFER_SIZE disabled.
Maybe his implementation was different and needed extra resources.

@rondlh
Copy link
Author

rondlh commented Sep 12, 2023

Not sure about the sentences from kisslorand about extra resources on Marlin side to support ADVANCED_OK. In my printers I even reduced BUFSIZE to 8 (from 32) while maintaining even RX_BUFFER_SIZE disabled.
Maybe his implementation was different and needed extra resources.

That user seems to have quite outdated and incorrect insights and information, his arguments do not seem objective, very little code but lots of talk. I like to talk about code or at least strategy, just saying things are bad is useless if you don't give more pointers afterwards, or even better, show some draft sample code to discuss.
The only extra thing that enabling ADVANCED_OK in Marlin does is changing the default response from "ok\n", to "ok Pxx By\n". There is virtually no overhead, but the benefits of ADVANCED_OK are huge, I don't have to tell you that.

@rondlh
Copy link
Author

rondlh commented Sep 12, 2023

I have tested what actually happens when a timer rollover happens (by using a time machine moving 49 days into the future).
Various things can happen because this bug is all over the code, and it depends on which code is running at the time of the rollover. Various types of timer calculation mistake are in the code which can cause a section of code to be skipped completely or it gets continuously executed for a second or so, locking out all other code.
Practically I found that the TFT can lockup for a second and start beeping like crazy, but I did not see any TFT freezes, the TFT always recovered, which is of course good news. It seems worst case you will get a hesitation of about 1 second in the print. Note that your printer must be on for 49 days before this happens, so most users do not need to worry.
ADVANVED_OK #2824 can mostly mitigate this issue, so let's hope this code gets merged soon.
I will do some more tests and then work with @digant73 to create a PR or try out a PR myself.

Copy link

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked and limited conversation to collaborators May 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants