Autogain delays crowd out other events and lead to connection losses #7

phrxmd · 2019-01-11T08:51:39Z

When it's dark, the autogain readout can take several seconds. During this time the system is busy mostly in the following section of code:

void waitNext( Tsl2561::exposure_t exposure ) {
  switch( exposure ) {
    case Tsl2561::EXP_14:  delay(Tsl2561Util::DELAY_EXP_14);  break;
    case Tsl2561::EXP_101: delay(Tsl2561Util::DELAY_EXP_101); break;
    default:               delay(Tsl2561Util::DELAY_EXP_402); break;
  }
}

The problem is that delay() is not very well-behaved, so that other code that depends on regular calls to maintain connections and event loops is being crowded out. While delay() does the basics such as feeding watchdogs, a typical problem example is MQTT using the PubSubClient library (as outlined in this article). I regularly see lost MQTT connections and missed messages when it's dark because the system cannot respond to them.

I see two main ways out of this:

for myself I have refactored the autoGain() routine into a state machine. I call it regularly from the main loop() and it maintains timers internally so that there is no use of delay()and no lockups at all. That's a rather heavyweight solution, though.
or as a more lightweight solution there could be a hook that makes it possible to add user-specified alternative delay() functions that makes the necessary callbacks.

The text was updated successfully, but these errors were encountered:

joba-1 · 2019-01-14T11:51:14Z

At the time I coded this, I did not know of a way to call things on a regular basis inside tasmota to check for passed waiting intervals. I dont like delays either, so I'm prepared to change that. Do you have the refactored code for review somewhere? Whatever works, but a pull request would be ideal...

phrxmd · 2019-01-23T12:56:12Z

OK, so I haven't gotten around to breaking it out into a proper pull request, but here is an example from a sketch I'm currently working on (a lighting controller for an IKEA VÄXER hydroponic growing unit). I'll commit that into a separate repository as soon as it's presentable.

The basic idea is the following: control the sensor through a state machine, which is a routine called regularly from loop(). Where traditionally one would use delay() to wait for things, here instead one uses the millisecond counter and some timer variables. It's a basic form of multitasking that works well enough if loop() gets called often (the more often the better, so it helps to keep code paths short).

// Various timing loop lengths, all in milliseconds
#define INTERVAL_TSL 1000   // Default time between luminosity readouts)
...
// Variables needed to control the various timing loops
unsigned long lastTSLReadout;
...
// Data availability (used e.g. for MQTT logging)
boolean freshTSL = false;
...
// Data we're collecting
unsigned long luminosity;     // millilux
...
/*************************************************************************/
/* State machine                                                         */
/* TSL2561 auto-gain readout                                             */
/*************************************************************************/
/* States:
 *  0 - Base state
 *      0 -> 1: measurement interval timer has run out
 *  1 - Measurement initiated, do something
 *      1 -> 2: Wait for sensor to give meaningful data for the first iteration 
 *  2 - Autogain measurement waiting for next iteration
 *      2 -> 3: start next iteration
 *  3 - Autogain measurement processing iteration
 *      3 -> 0: I2C errors broke our measurement, start over
 *      3 -> 4: our gain is enough, we have meaningful data
 *      3 -> 2: our light is so low that we need to increase gain and measure again
 *  4 - Measurement finished, clean up
 *      4 -> 0: reset measurement interval timer, start over      
 */

#define STATE_TSL_NO_ACTION   0 // Base state
#define STATE_TSL_MEASUREMENT 1 // Autogain measurement started
#define STATE_TSL_WAITING     2 // Waiting for next measurement iteration
#define STATE_TSL_ITERATION   3 // Measurement iteration
#define STATE_TSL_FINISHED    4 // Measurement finished, write data or log error
uint8_t stateTSL = STATE_TSL_NO_ACTION;

unsigned long timerTSLMeasurement = 0;    // Timer between measurements (state 0 -> 1)
unsigned long timerTSLIteration = 0;      // Timer for autogain iterations (state 1 -> 2)
unsigned long tslInterval = INTERVAL_TSL; // Interval between measurements
unsigned long tslWaitNext = Tsl2561Util::DELAY_EXP_402; // Delay between iterations 

// Lots of formerly internal variables that need to be reset after each measurement
uint16_t scaledFull = 0xffff, scaledIr = 0xffff;
uint32_t full = 0xffffffff, ir = 0xffffffff, milliLux = 0xffffffff;
bool gain = false;
Tsl2561::exposure_t exposure = Tsl2561::EXP_OFF;
uint8_t retryOnSaturated = 10;

// Internal variable that needs no resetting
uint8_t curr; // Sensitivity index
    
void tslAutogainStateMachine() {
  static const struct {
    bool                gain;
    Tsl2561::exposure_t exposure;
  } sensitivity[] = {
    { false, Tsl2561::EXP_14, }, // min
    { false, Tsl2561::EXP_101 },
    { true,  Tsl2561::EXP_14  },
    { false, Tsl2561::EXP_402 },
    { true,  Tsl2561::EXP_101 },
    { true,  Tsl2561::EXP_402 }  // max
  };
  
  switch(stateTSL) {
    case STATE_TSL_NO_ACTION:
      // Check if we need to measure
      if (millis() - timerTSLMeasurement > tslInterval) {
        stateTSL = STATE_TSL_MEASUREMENT;
        printlnV("[TSL] 0 -> 1: Initiate measurement");
        break;
      }
      break;
    case STATE_TSL_MEASUREMENT:
      // Initiate a measurement
      digitalWrite(PIN_RGB_GREEN, LOW);
      // if( Tsl2561Util::autoGain(Tsl, gain, exposure, scaledFull, scaledIr) ) {

      // get current sensitivity
      if( !Tsl.getSensitivity(gain, exposure) ) {
        printlnD("[TSL] 1 -> 0: I2C error");
        stateTSL = STATE_TSL_NO_ACTION;
        timerTSLMeasurement = millis();
        break;
      }

      // find index of current sensitivity
      curr = 0;
      while( curr < sizeof(sensitivity)/sizeof(sensitivity[0]) ) {
        if( sensitivity[curr].gain == gain && sensitivity[curr].exposure == exposure ) {
          break;
        }
        curr++;
      }
      if( curr == sizeof(sensitivity)/sizeof(sensitivity[0]) ) {
        printlnW("[TSL] 1 -> 0: Weird error determining current sensitivity index");
        stateTSL = STATE_TSL_NO_ACTION;
        timerTSLMeasurement = millis();
        break;
      }
    
      //  Now start the autogain iterations. 
      //  waitNext(exposure);
      timerTSLIteration = millis();
      switch( exposure ) {
        case Tsl2561::EXP_14:  tslWaitNext = Tsl2561Util::DELAY_EXP_14; 
        case Tsl2561::EXP_101: tslWaitNext = Tsl2561Util::DELAY_EXP_101;
        default:               tslWaitNext = Tsl2561Util::DELAY_EXP_402; 
      }
      printlnV("[TSL] 1 -> 2: Waiting for sensor");
      stateTSL = STATE_TSL_WAITING;
      break;
    case STATE_TSL_WAITING:
      // Wait until the sensor can return samples again
      // Because we're not using delay(), the CPU can do other things
      // Check if iteration wait timer has passed
      if (millis() - timerTSLIteration > tslWaitNext) {
        stateTSL = STATE_TSL_ITERATION;
        printlnV("[TSL] 2 -> 3: Waited enough, starting next autogain iteration");
        break;
      }
      break;
    case STATE_TSL_ITERATION:
      // We've waited enough for the sensor to return useful data for another iteration
      // Get samples and adjust sensitivity if needed
      if( !Tsl.fullLuminosity(scaledFull) || !Tsl.irLuminosity(scaledIr) ) {
        printlnW("[TSL] 3 -> 0 :I2C error reading luminosity");
        stateTSL = STATE_TSL_NO_ACTION;
        timerTSLMeasurement = millis();
        break;
      }
      uint16_t limit = Tsl2561Util::getLimit(exposure);
      if( scaledFull >= 1000 && scaledFull <= limit ) {
        // new value within limits, quit measuring
        printlnV("[TSL] 3 -> 4: We've got a reasonable value");
        stateTSL = STATE_TSL_FINISHED;
        break;
      }

      if( (scaledFull < 1000 && ++curr < sizeof(sensitivity)/sizeof(sensitivity[0]))
       || (scaledFull > limit && curr-- > 0) ) {
        if( !Tsl.setSensitivity(sensitivity[curr].gain, sensitivity[curr].exposure) ) {
          printlnW("[TSL] 3 -> 0: I2C error setting sensitivity");
          stateTSL = STATE_TSL_NO_ACTION;
          timerTSLMeasurement = millis();
          break;
        }
        // Set gain and exposure for next measuring round
        gain = sensitivity[curr].gain;
        exposure = sensitivity[curr].exposure;
      }
      else {
        if( ++curr > 0 && retryOnSaturated-- == 0 ) {
          // saturated, but best we can do
          printlnD("[TSL] 3 -> 4: We're saturated, but that's all we can do");
          stateTSL = STATE_TSL_FINISHED;
          timerTSLMeasurement = millis();
          break;
        }
      }

      // No value yet, let's iterate again
      timerTSLIteration = millis();
      switch( exposure ) {
        case Tsl2561::EXP_14:  tslWaitNext = Tsl2561Util::DELAY_EXP_14; 
        case Tsl2561::EXP_101: tslWaitNext = Tsl2561Util::DELAY_EXP_101;
        default:               tslWaitNext = Tsl2561Util::DELAY_EXP_402; 
      }
      stateTSL = STATE_TSL_WAITING;
      printlnV("[TSL] 3 -> 2: Waiting for sensor");
      break;
    case STATE_TSL_FINISHED:
      // We've got data, let's check it
      if( Tsl2561Util::normalizedLuminosity(gain, exposure, full = scaledFull, 
                                            ir = scaledIr) ) {
        if( Tsl2561Util::milliLux(full, ir, milliLux, Tsl2561::packageCS(tslId), 5) ) {
          debugD("[Data] Luminosity: %d.%d lx", (unsigned long)milliLux/1000, 
                (unsigned long)milliLux%1000);
          luminosity=milliLux;
          freshTSL = true; // mark data for MQTT publishing
        }
        else {
          debugW("Tsl2561Util::milliLux(full=%lu, ir=%lu) error\n", 
                 (unsigned long)full, (unsigned long)ir);
        }
      } else {
        debugW("Tsl2561Util::normalizedLuminosity(gain=%u, exposure=%u, sfull=%u, sir=%u, full=%lu, ir=%lu) error\n",
          gain, exposure, scaledFull, scaledIr, (unsigned long)full, (unsigned long)ir);
      }

      // Reset variables for next measurement
      scaledFull = 0xffff; scaledIr = 0xffff;
      full = 0xffffffff; ir = 0xffffffff; milliLux = 0xffffffff;
      gain = false;
      exposure = Tsl2561::EXP_OFF;
      retryOnSaturated = 10;
      
      // Measure less frequently if it's really dark, to keep the autogain routine 
      // from clogging the system 
      if (luminosity < 20000) {
        tslInterval = 10000;
      } else if (luminosity < 4000) {
        tslInterval = 20000;
      } else {
        tslInterval = INTERVAL_TSL;
      }
      
      // Finish measuring, get ready for next measurement
      printlnV("[TSL] 4 -> 0: Finished, wait for next measurement");
      stateTSL = STATE_TSL_NO_ACTION; 
      timerTSLMeasurement = millis();
      digitalWrite(PIN_RGB_GREEN, HIGH);
      break;
    default:
      printlnE("[TSL] *ERROR* State machine is in invalid state");
      // Force death and reboot of the ESP8266 via the hardware watchdog
      ESP.wdtDisable();
      while (1){};
    }      
  }
}
...


void loop() {
  debugHandle();
  // Redraw the UI on the OLED display 
  int remainingTimeBudget = ui.update();
  if (remainingTimeBudget > 0) {
    // The amount of time remaining depends on the framerate
    // ESP8266 at 160 MHz and 30 fps leaves plenty of time

    // Update MQTT via state machine (reconnect, upload etc.)
    mqttStateMachine();
    
    // Update time via NTP state machine if necessary
    ntpStateMachine();

    // Run state machine for DHT measurements (temperature, humidity)
    dhtStateMachine();

    // Run state machine for TSL measurements (brightness)
    tslAutogainStateMachine();

    // Update status icons depending on the state of the system
    iconUpdate();
  }
}

The debugging macros are those from Joao Lopes' SerialDebug library (link).

Anyway, that gets you the idea. This is not too beautiful yet and I suspect some of the variables can be done away with. Ideally I'd wrap this into a class (and other state machines for reading other sensors into other classes with a consistent interface).

joba-1 · 2019-02-02T00:15:00Z

Looks like a good way to handle this in general.

Problem for tasmota is, afaik the handler can't be called often.
There is only a coarse 1-second-interval interface.

phrxmd · 2019-02-04T05:15:18Z

Really? I've never looked at Tasmota in detail, but it must have a finer-grained loop internally, otherwise it would be impossible to drive things like displays. I'll take a look at Tasmota and maybe I can port part of my hydroponic control system's code to it. There are some other things that might be possibly useful, e.g. some user interface code. I've also started to put together an alternative backend for Tsl2561 to use the Brzo I2C library, which on ESP8266 is much faster than Wire.h and gives better I2C signals. I'll produce a pull request at some point over the next two weeks.

…

On Febr. 2 2019, at 6:15 am, joba-1 ***@***.***> wrote: Looks like a good way to handle this in general. Problem for tasmota is, afaik the handler can't be called often. There is only a coarse 1-second-interval interface. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub ***@***.***/0?redirect=https%3A%2F%2Fgithub.com%2Fjoba-1%2FJoba_Tsl2561%2Fissues%2F7%23issuecomment-459911445&recipient=cmVwbHkrMDIxNmYxOGJmYTVlYzk3ZmQ5Yzk4ODkxMDFjZDFiYTVlODliMGFiYjA5M2RiYmYyOTJjZjAwMDAwMDAxMTg2Y2EzMDQ5MmExNjljZTE3YmJlMjA3QHJlcGx5LmdpdGh1Yi5jb20%3D), or mute the thread ***@***.***/1?redirect=https%3A%2F%2Fgithub.com%2Fnotifications%2Funsubscribe-auth%2FAhbxiwLHCG9auY_00tsrl0JeHWNRyOE1ks5vJNiEgaJpZM4Z7CJz&recipient=cmVwbHkrMDIxNmYxOGJmYTVlYzk3ZmQ5Yzk4ODkxMDFjZDFiYTVlODliMGFiYjA5M2RiYmYyOTJjZjAwMDAwMDAxMTg2Y2EzMDQ5MmExNjljZTE3YmJlMjA3QHJlcGx5LmdpdGh1Yi5jb20%3D).

joba-1 · 2019-03-13T08:17:02Z

tasmota now has a 50ms hook for drivers. I plan to use this. Not ready, but if you want to have a look: branch async-autogain

phrxmd changed the title ~~Autogain delays crowd out other events and leads to connection losses~~ Autogain delays crowd out other events and lead to connection losses Jan 11, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Autogain delays crowd out other events and lead to connection losses #7

Autogain delays crowd out other events and lead to connection losses #7

phrxmd commented Jan 11, 2019

joba-1 commented Jan 14, 2019 •

edited

Loading

phrxmd commented Jan 23, 2019 •

edited

Loading

joba-1 commented Feb 2, 2019

phrxmd commented Feb 4, 2019 via email

joba-1 commented Mar 13, 2019

Autogain delays crowd out other events and lead to connection losses #7

Autogain delays crowd out other events and lead to connection losses #7

Comments

phrxmd commented Jan 11, 2019

joba-1 commented Jan 14, 2019 • edited Loading

phrxmd commented Jan 23, 2019 • edited Loading

joba-1 commented Feb 2, 2019

phrxmd commented Feb 4, 2019 via email

joba-1 commented Mar 13, 2019

joba-1 commented Jan 14, 2019 •

edited

Loading

phrxmd commented Jan 23, 2019 •

edited

Loading