delayMicroseconds() #18

Closed
LaZsolt opened this issue Mar 10, 2020 · 46 comments

@LaZsolt
Collaborator

LaZsolt commented Mar 10, 2020

In wiring.c, the delayMicroseconds() function uses the SBIW and BRNE instructions for timing.
On AVR MCUs one pass of this loop usually takes 4 clock cycles, while on LGT MCUs it usually takes 3 clock cycles.
The function therefore delays less than expected.
I found a better version on a Chinese site. The author corrected only the 16 and 32 MHz cases.
https://www.geek-workshop.com/thread-38486-1-1.html

@LaZsolt
Collaborator Author

LaZsolt commented Mar 10, 2020

I think the best solution for LGT MCUs is to put a NOP between SBIW and BRNE.
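To put numbers on the shortfall, here is a small host-side C sketch (not code from wiring.c). It assumes the loop count was chosen for 4 cycles per iteration, as on classic AVR, and uses the 3-cycle LGT figure quoted above. The helper name `loop_time_ns` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: time spent in an SBIW/BRNE busy loop, in nanoseconds.
 * The loop runs `iterations` times at `cycles_per_iter` cycles each.
 * The final, not-taken BRNE is one cycle shorter; that is ignored here. */
static uint64_t loop_time_ns(uint32_t iterations, uint32_t cycles_per_iter,
                             uint32_t f_cpu_hz)
{
    assert(f_cpu_hz > 0);
    return (uint64_t)iterations * cycles_per_iter * 1000000000ULL / f_cpu_hz;
}
```

For a 50 µs request at 16 MHz with 200 iterations, `loop_time_ns(200, 4, 16000000)` gives 50000 ns, while the same count at 3 cycles per iteration gives 37500 ns. Adding a NOP brings the LGT loop back to 4 cycles.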

@LaZsolt
Collaborator Author

LaZsolt commented Mar 15, 2020

mybetter_delayMicroseconds.c.txt

@LaZsolt
Collaborator Author

LaZsolt commented May 5, 2020

Edit: I found that this code is not precise, because the compiler uses only the integer part of F_CPU/3000000L. The corrected code is in a later comment.

If you never need a zero-length delay and every timing is a constant (not a variable), this compact and precise LGT8Fx-specific code is for you. (It must be in Arduino.h.) Link-time optimization (LTO) does not matter.

#define delayMicroseconds(us)            \
(__extension__({                         \
  __asm__ __volatile__ (                 \
    "usL_%=:" "sbiw %0,1" "\n\t"         \
              "brne usL_%="              \
              :  /* no outputs */        \
              :"w" ((uint16_t) ((F_CPU/3000000L) * (uint16_t)us))  \
  );                                     \
}))

@seisfeld
Contributor

seisfeld commented May 5, 2020

Would this work for any selected speed? I'm asking because of the F_CPU/3000000L. And what do you mean by "every timing is a constant (not a variable)"? Sorry if these are stupid questions.

@LaZsolt
Collaborator Author

LaZsolt commented May 5, 2020 via email

@LaZsolt
Collaborator Author

LaZsolt commented May 5, 2020

When I write that every timing is a constant, I mean that the compiler itself calculates (F_CPU/3000000L * microseconds), as in this example at 16 MHz:

    delayMicroseconds(3);

Compiled to:
 2b8:   0f e0           ldi     r24, 0x0F
 2ba:   10 e0           ldi     r25, 0x00
000002bc <usL_217>:
 2bc:   01 97           sbiw    r24, 0x01
 2be:   f1 f7           brne    .-4        ; 0x2bc <usL_217>

If you call the delayMicroseconds() macro with a variable, the MCU has to do the calculation at run time. That takes longer than loading two registers, so the delay will not be precise. Don't use my macro like this:

  uint16_t nnn=12;
  for( int i = 0; i < 118; i++) {
    delayMicroseconds(nnn);        // nnn is a variable
    nnn = 3.14*nnn+1;
  }  

@seisfeld
Contributor

seisfeld commented May 5, 2020

All right, thank you for the explanation! Really appreciate it. :)

@LaZsolt
Collaborator Author

LaZsolt commented May 6, 2020

Now this code is really precise.
If you never need a zero-length delay and every timing is a constant (not a variable), this compact and precise LGT8Fx-specific code is for you. (It must be in Arduino.h.) Link-time optimization (LTO) does not matter. (3000000L is the LGT8Fx-specific value. The Atmel-specific value is 4000000L.)

#define delayMicroseconds(us)     \
(__extension__({                  \
  __asm__ __volatile__ (          \
    "usL_%=:" "sbiw %0,1" "\n\t"  \
              "brne usL_%="       \
              :  /* no outputs */ \
              :"w" ( (uint16_t) ((F_CPU * (uint32_t) us) / 3000000L) )  \
  );                              \
}))
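The difference between the two count expressions can be checked on a PC. The sketch below is a host-side illustration only: the function names are hypothetical, and it assumes the product is evaluated in 32-bit unsigned arithmetic as it would be on the AVR/LGT target.

```c
#include <assert.h>
#include <stdint.h>

/* First macro: F_CPU/3000000L is truncated before the multiplication. */
static uint16_t loops_divide_first(uint32_t f_cpu, uint16_t us)
{
    assert(us > 0);   /* the macro is not meant for zero-length delays */
    return (uint16_t)((f_cpu / 3000000UL) * us);
}

/* Corrected macro: multiply first, then divide. */
static uint16_t loops_multiply_first(uint32_t f_cpu, uint32_t us)
{
    assert(us > 0);
    uint32_t product = f_cpu * us;   /* 32-bit arithmetic, as on the target */
    return (uint16_t)(product / 3000000UL);
}
```

At 16 MHz and 3 µs the first form yields 15 iterations (the 0x0F seen in the disassembly earlier in the thread), while the second yields 16, which is 48 cycles at 3 cycles per iteration.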

@dbuezas
Owner

dbuezas commented Jul 10, 2020

I'll add the new delayMicroseconds version you posted first in the next version. Thanks for that!

@dbuezas
Owner

dbuezas commented Jul 21, 2020

I forgot about this one in the last release.
I'll put it in in the next one

@jayzakk
Collaborator

jayzakk commented Aug 12, 2020

( if anyone read the comment I just deleted - it's just too hot and I made a mistake ^^ )

@LaZsolt
Collaborator Author

LaZsolt commented Aug 12, 2020

If you want to try it at 32 MHz, you must use my delayMicroseconds code.
I wrote two different types of delayMicroseconds. Which one did you choose?

@jayzakk
Collaborator

jayzakk commented Aug 12, 2020

@LaZsolt, I used the macro version, but accidentally put it into wiring.c instead of Arduino.h.

@LaZsolt
Collaborator Author

LaZsolt commented Aug 21, 2020

When the macro version of delayMicroseconds() is used with higher values, the multiplication done by the compiler overflows a 32-bit number, so the timing will not be correct. I have an idea how to solve this problem, but I need to sleep now.
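As an illustration of where the overflow starts, the following host-side C sketch evaluates the macro's `(F_CPU * us) / 3000000` expression with 32-bit wrap-around. The function names are hypothetical, and the sketch assumes the product is computed in 32-bit unsigned arithmetic, as it would be on the target.

```c
#include <assert.h>
#include <stdint.h>

/* Largest `us` for which f_cpu * us still fits in 32 bits. */
static uint32_t max_us_without_overflow(uint32_t f_cpu)
{
    assert(f_cpu > 0);
    return UINT32_MAX / f_cpu;
}

/* The macro's count expression, evaluated with 32-bit wrap-around. */
static uint16_t loops_32bit(uint32_t f_cpu, uint32_t us)
{
    uint32_t product = f_cpu * us;            /* wraps modulo 2^32 */
    return (uint16_t)(product / 3000000UL);
}
```

Under these assumptions, at 32 MHz the product still fits for 134 µs (1429 iterations), but a request of 135 µs wraps and produces only 8 iterations instead of 1440.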

@LaZsolt
Collaborator Author

LaZsolt commented Aug 22, 2020

I compiled the new code several times. There is no 32-bit overflow within the parameter limits.

#define delayMicroseconds(us)     \
(__extension__({                  \
  __asm__ __volatile__ (          \
    "usL_%=:" "sbiw %0,1" "\n\t"  \
              "brne usL_%="       \
              :  /* no outputs */ \
              :"w" ( (uint16_t) (( (F_CPU/1000) * (uint32_t) us ) / 3000L) )  \
  );                              \
}))

Macro parameter limitations:

Freq     Min. delay (µs)   Max. delay (µs)   Delay is max.+1 µs when the macro parameter is
32 MHz   1                 6143              0
16 MHz   1                 12287             0
8 MHz    1                 24575             0
4 MHz    1                 49151             0
2 MHz    2                 98303             0 or 1
1 MHz    3                 196607            0 or 1 or 2

Timing in microseconds at 1, 2 and 4 MHz, when the parameter is:

Parameter   1        2        3      4     5      6      7     8      9
4 MHz       1        1.75     3.25   4     4.75   6.25   7     7.75   9.25
2 MHz       98304    2        3.5    3.5   5      6.5    6.5   8      9.5
1 MHz       196608   196608   4      4     4      7      7     7      10
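The parameter limits can be reproduced from the count expression alone. The sketch below is a host-side check, not part of the core; `loop_count`, `min_us` and `max_us` are hypothetical names, and the only assumption is that the count is truncated to 16 bits exactly as the `(uint16_t)` cast in the macro does.

```c
#include <assert.h>
#include <stdint.h>

/* Loop count the macro hands to SBIW: ((F_CPU/1000) * us) / 3000 as uint16_t. */
static uint16_t loop_count(uint32_t f_cpu, uint32_t us)
{
    uint32_t scaled  = f_cpu / 1000UL;
    uint32_t product = scaled * us;           /* 32-bit, as on the target */
    return (uint16_t)(product / 3000UL);
}

/* Smallest parameter that yields a non-zero count. */
static uint32_t min_us(uint32_t f_cpu)
{
    assert(f_cpu >= 1000UL);
    uint32_t us = 1;
    while (loop_count(f_cpu, us) == 0)
        us++;
    return us;
}

/* Largest parameter whose count still fits in 16 bits:
 * count <= 65535  <=>  (f_cpu/1000) * us < 65536 * 3000 */
static uint32_t max_us(uint32_t f_cpu)
{
    assert(f_cpu >= 1000UL);
    return (65536UL * 3000UL - 1UL) / (f_cpu / 1000UL);
}
```

A count of 0 makes the SBIW loop wrap around and run 65536 times, which is why parameters below the minimum produce the longest possible delay.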

@LaZsolt
Collaborator Author

LaZsolt commented Aug 31, 2020

@dbuezas,
I found a library (Adafruit DHT temperature sensors) which calls delayMicroseconds() with a variable parameter. This means the macro version of delayMicroseconds() is not fully compatible, so don't put the macro version in the next release.
But I have a new idea based on the old-style function: void __attribute__ ((noinline)) delayMicroseconds() { ... }
I am still testing it.

@dbuezas
Owner

dbuezas commented Sep 20, 2020

with #51 merged, this can be closed, right?

@LaZsolt
Collaborator Author

LaZsolt commented Sep 20, 2020

Not yet. I would like to keep posting my ideas here for a while.

@XGIACOMO

Hi, well done on your work!!
I'm using the LGT8F with your libraries, but I ran into problems with bit banging because of the lack of precision in microsecond delays.
(This is the bus that I'd like to use with the LGT8F: https://www.pjon.org/ )
Are you planning to fix the microsecond issues, or are you no longer pursuing it?
Thank you very much
Giacomo

@LaZsolt
Collaborator Author

LaZsolt commented Oct 26, 2020

@XGIACOMO
What type of precision problems have you faced?

Anyway, delayMicroseconds() is not modified in release v1.0.6. The current version of delayMicroseconds() from this branch will be in the next release.

If you want to use my better delayMicroseconds() now, you need to copy these two files

https://github.com/dbuezas/lgt8fx/blob/master/lgt8f/cores/lgt8f/Arduino.h
https://github.com/dbuezas/lgt8fx/blob/master/lgt8f/cores/lgt8f/wiring.c

into this directory on your hard drive:

C:\Users\__yourusername__\AppData\Local\Arduino15\packages\LGT8fx Boards\hardware\avr\1.0.5\cores\lgt8f\

@XGIACOMO

Hi LaZsolt, thank you for your answer!
This is what I got with a simple delayMicroseconds(50) sketch.
The values are far from what they should be!
I'll try your new libraries. Thank you, I'll keep you posted!

[Image: 50us@16mhz]
[Image: 50us@32mhz]

@LaZsolt
Collaborator Author

LaZsolt commented Oct 26, 2020

Be aware that digitalWrite() takes 2 to 5 microseconds.
Calling bitSet() or bitClear() directly on a port is much faster than digitalWrite(), but you still need to include its execution time when calculating the pulse time.

@dbuezas
Owner

dbuezas commented Oct 26, 2020

If you want to get as efficient, precise, and fast as possible at having something happening at regular intervals, there is a trick you can do using counters. It is the same as using interrupts but you just busy-wait for the counter to reach its target instead of consuming the 50+ cycles of the whole interrupt prelude, return & stuff.

So let's count clock cycles at run time:

// setting up timer 1
// secPerSample means "seconds per sample"
// this will work up to a max wait of 2ms (i.e secPerSample=0.002)
void startCPUCounter(float secPerSample) {
  TCCR1A = 0;
  TCCR1B = (1 << WGM13) | (1 << WGM12)   // CTC mode, counts to ICR1
        | (1 << CS10); // prescaler set to 1
  ICR1 = secPerSample * F_CPU - 1;
 
  TCNT1 = 0;
  setBit(TIFR1, OCF1A);  // clear overflow bit
}

__attribute__((always_inline)) inline void myDelay(){
   loop_until_bit_is_set(TIFR1, OCF1A); // this can be off by at most 3 clock cycles, but error won't accumulate because the timer will keep counting
  TIFR1 = 255;  // setBit(TIFR1, OCF1A); is actually enough, but clearing all flags at once is quicker and I'm not using the other timer flags anyway.
}

And then you busy-wait for a very precise timing that doesn't accumulate error:

void setup(){
    startCPUCounter(1.0/1000000); // 1us cycle
    
}
void loop(){
  noInterrupts(); // if you don't do this, it will be a bit off some times, but error won't accumulate.
  for (int i = 0; i< 10000;i++){
    doTheThingThatNeedsToHappenAtVeryPreciseAndShortIntervals();
    myDelay();
  }
  interrupts();
}

It is the trick I ended up using in the oscilloscope project to make the oscilloscope go very fast, even while handling multiple channels and checking for triggering conditions. Only there I'm using Timer3 and fiddling with prescalers to get longer waiting intervals when necessary. It is obvious in hindsight, but I did feel really clever about this.

The good thing about all this nonsense is that the counter keeps track of time while you are doing something else, so error never accumulates.
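The 16-bit limit of the counter can be checked before programming the timer. The following host-side C sketch is only an illustration: `timer_top` is a hypothetical helper, it uses integer nanoseconds instead of the float seconds in the code above, and it assumes the prescaler stays at 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute the 16-bit TOP value for a period given in
 * nanoseconds, with the prescaler fixed at 1. Returns false when the period
 * is shorter than one tick or does not fit into the 16-bit counter. */
static bool timer_top(uint32_t f_cpu_hz, uint32_t period_ns, uint16_t *top)
{
    assert(top != NULL);
    uint64_t ticks = (uint64_t)f_cpu_hz * period_ns / 1000000000ULL;
    if (ticks < 1 || ticks > 65536ULL)
        return false;
    *top = (uint16_t)(ticks - 1);   /* the counter runs from 0 to TOP */
    return true;
}
```

With these assumptions a 1 µs period at 16 MHz gives TOP = 15, and at 32 MHz a 2 ms period (64000 ticks) still fits, while 2.1 ms does not, which matches the roughly 2 ms limit mentioned in the code comment.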

@XGIACOMO

Hi dbuezas, thank you for your answer!!
In my caravan I'm building a battery charger with a 15 V power supply, where I work on the feedback regulation to create the target charge stages. I find the LGT8F very powerful thanks to its integrated DAC and its differential amplifier with 32x gain: with this chip I don't need any other external components, and it works very well!!
Unfortunately I need to communicate with the central unit using a bit-bang protocol,
https://www.pjon.org/
the same one that I'm using to connect all the devices in my caravan, but I'm not able to make the LGT8F compatible with the protocol because of the different bit lengths.
In the coming days I'll try LaZsolt's new libraries. I hope to succeed!
Thank you again

@XGIACOMO

XGIACOMO commented Nov 2, 2020

great job @LaZsolt!!!!!! much better than before ;-)

[Image: 50microsecondsLgt8f@32mhz]

@dbuezas
Owner

dbuezas commented Nov 2, 2020

It's about time we make a new release including all these improvements from @LaZsolt et al.

@LaZsolt
Collaborator Author

LaZsolt commented Nov 3, 2020

@XGIACOMO
I am also happy with the better result.

@LaZsolt
Collaborator Author

LaZsolt commented Nov 3, 2020

@dbuezas
I am still working on a clock-tick-precise version of delayMicroseconds(), although in most cases clock-tick precision is not needed. At lower frequencies, like 1 MHz, clock-tick-precise timing could be useful.
This task is more complex than I thought at first, and the code has become complex too. (I found posts about correcting delayMicroseconds() that are more than 15 years old.)
I have ideas, half-finished code, and test code, but none of it is good enough to publish yet.

@dbuezas
Owner

dbuezas commented Nov 3, 2020

@LaZsolt that's awesome!
Should I wait then?

@LaZsolt
Collaborator Author

LaZsolt commented Nov 3, 2020

@dbuezas only a few days.
I think we need to discuss the delayMicroseconds() code before we reject it. ;)

@SuperUserNameMan
Contributor

SuperUserNameMan commented Nov 3, 2020

@LaZsolt :
Since you're still working on this, just throwing an idea that is bugging me since I discovered LGT8Fx boards : do you think it would be possible to make clock speed software defined so it would be possible to change the speed of the board at runtime and have delay(), delayMicroseconds(), millis(), micros(), Serial(), I2C, SPI timings etc etc to adapt themselves at runtime too ?

@LaZsolt
Collaborator Author

LaZsolt commented Nov 4, 2020

@SuperUserNameMan

do you think it would be possible to make clock speed software defined ... ?

I could probably write a version for clock speeds that change at run time, but the delay count calculation would become more complex, so short delays would be less accurate. A better idea is for the caller to do the calculation before calling delayMicroseconds(), or to duplicate the bit-banging routines for two (or more) different clock speeds.
delay() is already independent of the clock speed, because it calls micros(), which reads the timer. If the timer is set up correctly, delay() will measure the correct number of milliseconds.

@LaZsolt
Collaborator Author

LaZsolt commented Nov 4, 2020

@dbuezas

#define delayMicroseconds(us)       \
    if (__builtin_constant_p(us)) { \
        delayMicroseconds_c(us);    \
    } else {                        \
        delayMicroseconds_v(us);    \
    }
#define delayMicroseconds_c(us) lgt8fx_delay_cycles((double)us*F_CPU/1000000)    // for constant case
 void   delayMicroseconds_v(unsigned int us) __attribute__ ((noinline));         // for variable case which is same as earlier

More sources in the coming days.
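The dispatch idea can be tried on a PC with GCC or Clang. The sketch below is a host-side illustration with hypothetical names (`DELAY_PATH`, `cycles_for_us`); it reports which branch would be chosen instead of delaying, and it uses integer arithmetic where the macro above uses a double.

```c
#include <assert.h>
#include <stdint.h>

enum delay_path { DELAY_CONSTANT = 1, DELAY_VARIABLE = 2 };

/* Evaluates to DELAY_CONSTANT when the compiler can prove that `us` is a
 * compile-time constant, otherwise to DELAY_VARIABLE (GCC/Clang builtin). */
#define DELAY_PATH(us) \
    (__builtin_constant_p(us) ? DELAY_CONSTANT : DELAY_VARIABLE)

/* Cycle budget for the constant path: us * F_CPU / 1000000, in integers. */
static uint32_t cycles_for_us(uint32_t f_cpu, uint32_t us)
{
    assert(f_cpu % 1000000UL == 0);   /* this sketch only handles whole MHz */
    return (f_cpu / 1000000UL) * us;
}
```

A literal such as `DELAY_PATH(3)` selects the constant path, while a `volatile` variable always selects the variable path. For ordinary variables the result can depend on the optimization level, because the optimizer may propagate a known value.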

@LaZsolt
Collaborator Author

LaZsolt commented Nov 6, 2020

Just finished, but not tested yet: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F

@LaZsolt
Collaborator Author

LaZsolt commented Nov 7, 2020

@SuperUserNameMan
Contributor

@LaZsolt :
I've seen your code includes a trick to prevent the linker from discarding a function you want to make available for debugging.

I don't know if it could be useful to you, but if you want to protect some of your functions from compiler optimization, you can encapsulate them this way :

#pragma GCC push_options
#pragma GCC optimize ("keep-static-functions") 
static void foo( int a )
{
  // code i want to protect from compiler optimization
}
#pragma GCC pop_options

You can also specify a level of optimization O0, O1, O2, O3, Ofast, or Os this way #pragma GCC optimize ("O0", "keep-static-functions").

More info here :

@LaZsolt
Collaborator Author

LaZsolt commented Nov 11, 2020

Finished, tested.

@LaZsolt
Collaborator Author

LaZsolt commented Nov 21, 2020

An interesting trick:

As we know, this code waits x*4 - 1 clock cycles.

__asm__ __volatile__ (
    "1: sbiw %0,1   \n\t"          // 1 cycle in LGT
    "   nop         \n\t"          // 1 cycle
    "   brne 1b"                   // 2 cycles ( 1 cycle when counter became 0 )
    : "=w" (x)                     // Not a real output, but it informs the compiler that the register is modified
    : "0"  (x)
);



But this code waits exactly x*4 clock cycles.

__asm__ __volatile__ (
    "1: sbiw %0,1   \n\t"          // 1 cycle in LGT
    "   breq .+0    \n\t"          // 1 cycle  ( 2 cycle when counter became 0 )
    "   brne 1b"                   // 2 cycles ( 1 cycle when counter became 0 )
    : "=w" (x)                     // Not a real output, but it informs the compiler that the register is modified
    : "0"  (x)
);

This little trick, at the same code size, can make the delay calculations inside delayMicroseconds_v() a bit more readable.
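The two cycle counts can be verified with a simple model on a PC. The sketch below is a host-side simulation with hypothetical function names; the per-instruction timings are the LGT8F figures from the code comments above (SBIW 1 cycle, NOP 1 cycle, taken branch 2 cycles, not-taken branch 1 cycle).

```c
#include <assert.h>
#include <stdint.h>

/* Loop with a NOP between SBIW and BRNE. */
static uint32_t cycles_nop_loop(uint16_t x)
{
    uint32_t cycles = 0;
    do {
        x--;                         /* sbiw: wraps from 0 to 65535 */
        cycles += 1;
        cycles += 1;                 /* nop */
        cycles += (x != 0) ? 2 : 1;  /* brne 1b */
    } while (x != 0);
    assert(cycles % 4 == 3);         /* always one cycle short of x*4 */
    return cycles;
}

/* Loop with BREQ .+0 in place of the NOP. */
static uint32_t cycles_breq_loop(uint16_t x)
{
    uint32_t cycles = 0;
    do {
        x--;                         /* sbiw */
        cycles += 1;
        cycles += (x == 0) ? 2 : 1;  /* breq .+0: taken only on the last pass */
        cycles += (x != 0) ? 2 : 1;  /* brne 1b */
    } while (x != 0);
    assert(cycles % 4 == 0);         /* every pass costs exactly 4 cycles */
    return cycles;
}
```

On the last pass the BREQ is taken (2 cycles) and the BRNE is not (1 cycle), so the final pass also costs 4 cycles and the missing cycle disappears.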

@LaZsolt
Collaborator Author

LaZsolt commented Jan 26, 2021

@SuperUserNameMan

do you think it would be possible to make clock speed software defined so it would be possible to change the speed of the board at runtime ... ?

Yes, I found a solution for delayMicroseconds(). The compiler can select the best code for different clock speeds. The source is not uploaded yet.

The other thing is that the delay calculations and the delay loop are all written in assembly language, so they are not affected by compiler or linker optimizations.

@SuperUserNameMan
Contributor

@LaZsolt : sounds great !

@LaZsolt
Collaborator Author

LaZsolt commented Jan 26, 2021

When I tested, the compiler handled the different speeds.

namespace CPU32 {
  #undef  F_CPU
  #define F_CPU 32000000
  #include "delayMicroseconds.h"
}
namespace CPU16 {
  #undef  F_CPU
  #define F_CPU 16000000
  #include "delayMicroseconds.h"
}
namespace CPU4 {
  #undef  F_CPU
  #define F_CPU 4000000
  #include "delayMicroseconds.h"
}

But after I put the final source into the core (without namespace), I get redefinition errors in every namespace.

@SuperUserNameMan
Contributor

Oh ! I've just noticed I misread your message in my first answer that i've just deleted.

When I tested, the compiler handled the different speeds.
But after I put the final source into the core (without namespace), I get redefinition errors in every namespace.

So the namespace trick works if the delayMicroseconds() function is stored in a separate header. If that's just a matter of copy/pasting, I don't think it's a problem.

Your namespace trick is more elegant than what I proposed in the answer I deleted.

Thanks for the idea !

@LaZsolt
Collaborator Author

LaZsolt commented Jan 31, 2021

A brand-new delayMicroseconds() is finished and tested again.
This version has better accuracy than before. The lowest frequency for a clock-tick-accurate delay is:

  • 4 MHz when the parameter is a variable (or 3 MHz when the resonator is 12 MHz on a Wavgat board)
  • 1 MHz when the parameter is constant.

You can use it with different clock frequencies in one source. This example shows how to use it at the menu-selected frequency and at other frequencies together:

#define FCPUSAVE F_CPU
namespace CPU32 {
  #undef  F_CPU
  #define F_CPU 32000000
  #include "delayus.h"
}
namespace CPU1 {
  #undef  F_CPU
  #define F_CPU 1000000
  #include "delayus.h"
}
#undef  F_CPU
#define F_CPU FCPUSAVE

void setup() {
  // put your setup code here, to run once:
}

void loop() {
  delayMicroseconds(3);            // Default speed from Arduino IDE
  // set speed to 32 MHz
  CPU32 :: delayMicroseconds(3);   // 32 MHz speed
  // set speed to 1 MHz
  CPU1  :: delayMicroseconds(3);   // 1 MHz speed
  // set speed back to the basic speed
}

The source is here: https://github.com/LaZsolt/delayMicroseconds/tree/master/for_LGT8F
How to install:

  • Replace the old delayMicroseconds() definition in Arduino.h in your directory C:\Users\__yourusername__\AppData\Local\Arduino15\packages\LGT8fx Boards\hardware\avr\1.0.5\cores\lgt8f\ with the whole source from my Arduino.h.
  • Edit your wiring.c and delete the delayMicroseconds() function.
  • Copy my delayus.h into your sketch directory.




The previous attempt failed because I wanted to define delayMicroseconds() as a macro, but the compiler found it too complex when I tried to compile it in different namespaces. Perhaps I ran into a compiler issue. The bad macro is the one I mentioned earlier: #18 (comment)

@DurandA

DurandA commented Feb 26, 2021

@LaZsolt sorry to hijack this issue. I am trying to port waiting functions from the AVR Transistortester (in wait.S) to the LGT8F328P. Do you know if these figures are still valid with the LGT8F328P?

  • rcall needs 3 clock cycles
  • ret needs 4 clock cycles

@SuperUserNameMan
Contributor

@DurandA : according to the Chinese datasheet, rcall needs 1 cycle, and ret 2 cycles.

See page 252 : https://github.com/raw/dbuezas/lgt8fx/master/docs/LGT8FX8P_databook_v1.0.4.ch.pdf

@DurandA

DurandA commented Feb 26, 2021

@SuperUserNameMan Thank you very much. I missed it in the translated datasheet.
