Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serial output using ESP32-S3 usb_serial_jtag controller is unreliable (IDFGH-7781) #9318

Closed
2 tasks done
davidzuhn opened this issue Jul 9, 2022 · 65 comments
Closed
2 tasks done
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally

Comments

@davidzuhn
Copy link
Contributor

davidzuhn commented Jul 9, 2022

Environment

  • Development Kit: ESP32-S3-DevKit-C-1
  • Kit version (for WroverKit/PicoKit/DevKitC): v1
  • Module or chip used: ESP32-S3-WROOM-1
  • IDF version (run git describe --tags to find it): v5.0-dev-4001-g495d35949d
  • Build System: CMake
  • Compiler version: (crosstool-NG esp-2022r1-RC1) 11.2.0
  • Operating System: mac OS 12
  • Using an IDE?: No
  • Power Supply: USB

Problem Description

My application uses the console component, and provides a number of commands. When I run 'help' at the console when using the UART for the console (via the CP2102N), the output is completely repeatable and formatted as expected. When I use the USB Serial/JTAG controller for the console (as I expect to do when my "real" hardware arrives), the output is janky [that's the technical term :)]. Some newlines seem to be eaten, and some lines get omitted altogether.

Expected Behavior

When I use the UART console (connected via /dev/cu.SLAB_USBtoUART), this is what I get every time I run 'help'

>help
fs 
  data fs command

haptic 
  Haptic feedback

state 
  Display current connection state values

throttle 
  throttle support

wifi 
  wifi support

battery 
  notify battery level

tasks 
  Show tasks

event 
  Print event loop debug

reset 
  Reset the device (optionally to factory default)

version 
  Display version information

board 
  read board values

knob 
  View/set the knob position levels

help 
  Print the list of registered commands

Actual Behavior

When I run using the USB Serial/JTAG interface (via /dev/cu.usbmodem1101), and run the command again (and again), I see results like this:

>help
fs 
  data fs command

haptic 
  Haptic feedback

state 
  Display current connection state values

throttle 
  throttle support

wifi 
  wifi support

battery 
  notify battery level
tasks 
  Show tasks

event 
  Print event loop debug

reset 
  Reset the device (optionally to factory default)

version 
  Display version information

board 
  read board values

knob 
  View/set the knob position levels

help 
  Print the list of registered commands
>help
fs 
  data fs command

haptic 
  Haptic feedback

state 
  Display current connection state values

throttle 
  throttle support

  wifi support

battery 
  notify battery level

tasks 
  Show tasks

event 
  Print event loop debug

reset 
  Reset the device (optionally to factory default)

version 
  Display version information

board 
  read board values

knob 
  View/set the knob position levels

help 
  Print the list of registered commands


>help
fs 
  data fs command

haptic 
  Haptic feedback

  Display current connection state values

throttle 
  throttle support

wifi 
  wifi support

battery 
  notify battery level
tasks 
  Show tasks

event 
  Print event loop debug

reset 
  Reset the device (optionally to factory default)
  Display version information

board 
  read board values
  View/set the knob position levels

help 
  Print the list of registered commands

>help
fs 
  data fs command

haptic 
  Haptic feedback
state 
  Display current connection state values

throttle 
  throttle support

wifi 
  wifi support

  notify battery level

tasks 
  Show tasks

event 
  Print event loop debug

reset 
  Reset the device (optionally to factory default)

version 
  Display version information

board 
  read board values

  View/set the knob position levels

  Print the list of registered commands

Note these are not identical results for each time, and none of them match the "correct" text as seen on the UART.

Steps to reproduce

  1. Run my program on ESP32-S3-DevKitC, using (alternately) UART & USB console. I've tried to reproduce this on a small example, but haven't managed to see the same problems. Two binaries are in the attached zip file, both will run on the ESP32-S3-DevKitC board with no further hardware attached.

Code to reproduce this issue

Debug Logs

Other items if possible

  • sdkconfig file (attach the sdkconfig file from your project folder)
  • elf file in the build folder
  • [ N/A ] coredump (This provides stacks of tasks.)

bug-report.zip

@espressif-bot espressif-bot added the Status: Opened Issue is new label Jul 9, 2022
@github-actions github-actions bot changed the title serial output using ESP32-S3 usb controller is unreliable serial output using ESP32-S3 usb controller is unreliable (IDFGH-7781) Jul 9, 2022
@igrr igrr changed the title serial output using ESP32-S3 usb controller is unreliable (IDFGH-7781) serial output using ESP32-S3 usb_serial_jtag controller is unreliable (IDFGH-7781) Jul 9, 2022
@moefear85
Copy link

i'm not an expert on this but did you try latest master? there was an issue with usb-cdc lines getting eaten, which was fixed about 2 weeks ago.

@davidzuhn
Copy link
Contributor Author

The version listed is the latest master (even still as of today - 09 July 2022).

@moefear85
Copy link

moefear85 commented Jul 10, 2022

ok, i have a hunch. can you apply this patch? it might solve the problem.

remove the .txt extension, place it in

$IDF_PATH/components/tinyusb/tinyusb

and run this command (in that folder):

git apply usb.patch

usb.patch.txt

@igrr
Copy link
Member

igrr commented Jul 10, 2022

@moefear85 I'm afraid the file you are suggesting to modify is related to the software USB CDC driver for the USB_OTG peripheral. The reported issue is related to the USB_SERIAL_JTAG peripheral which has a hardware CDC implementation.

(I have modified the issue title to avoid confusion.)

@moefear85
Copy link

oh ok, i don't have any experience with esp32-s3 yet. I have some modules lying around, but haven't broken them out yet (haven't found the time).

@egnor
Copy link

egnor commented Jul 14, 2022

To reduce variables, instead of your application, can you try the standard "basic console" example project? (At esp-idf/examples/system/console/basic.) Make sure to run idf.py menuconfig and set ESP_CONSOLE_SERIAL_JTAG.

That has its own "help" output, which gives me:

esp32s3> help
free 
  Get the current size of free heap memory

heap 
  Get minimum size of free heap memory that was available during program
  execution

version 
  Get version of chip and SDK

restart 
  Software reset of the chip

tasks 
  Get information about running tasks

deep_sleep  [-t <t>] [--io=<n>] [--io_level=<0|1>]
  Enter deep sleep mode. Two wakeup modes are supported: timer and GPIO. If no
  wakeup option is specified, will sleep indefinitely.
  -t, --time=<t>  Wake up time, ms
      --io=<n>  If specified, wakeup using GPIO with given number
  --io_level=<0|1>  GPIO level to trigger wakeup

light_sleep  [-t <t>] [--io=<n>]... [--io_level=<0|1>]...
  Enter light sleep mode. Two wakeup modes are supported: timer and GPIO.
  Multiple GPIO pins can be specified using pairs of 'io' and 'io_level'
  arguments. Will also wake up on UART input.
  -t, --time=<t>  Wake up time, ms
      --io=<n>  If specified, wakeup using GPIO with given number
  --io_level=<0|1>  GPIO level to trigger wakeup

join  [--timeout=<t>] <ssid> [<pass>]
  Join WiFi AP as a station
  --timeout=<t>  Connection timeout, ms
        <ssid>  SSID of AP
        <pass>  PSK of AP

nvs_set  <key> <type> -v <value>
  Set key-value pair in selected namespace.
Examples:
 nvs_set VarName i32 -v
  123 
 nvs_set VarName str -v YourString 
 nvs_set VarName blob -v
  0123456789abcdef 

         <key>  key of the value to be set
        <type>  type can be: i8, u8, i16, u16 i32, u32 i64, u64, str, blob
  -v, --value=<value>  value to be stored

nvs_get  <key> <type>
  Get key-value pair from selected namespace. 
Example: nvs_get VarName i32
         <key>  key of the value to be read
        <type>  type can be: i8, u8, i16, u16 i32, u32 i64, u64, str, blob

nvs_erase  <key>
  Erase key-value pair from current namespace
         <key>  key of the value to be erased

nvs_namespace  <namespace>
  Set current namespace
   <namespace>  namespace of the partition to be selected

nvs_list  <partition> [-n <namespace>] [-t <type>]
  List stored key-value pairs stored in NVS.Namespace and type can be specified
  to print only those key-value pairs.
Following command list variables stored
  inside 'nvs' partition, under namespace 'storage' with type uint32_tExample:
  nvs_list nvs -n storage -t u32 

   <partition>  partition name
  -n, --namespace=<namespace>  namespace name
  -t, --type=<type>  type can be: i8, u8, i16, u16 i32, u32 i64, u64, str, blob

nvs_erase_namespace  <namespace>
  Erases specified namespace
   <namespace>  namespace to be erased

help 
  Print the list of registered commands

It doesn't seem corrupt, but I haven't done a line by line check.

@moefear85
Copy link

good idea. although that wouldn't catch any hidden variables that manifest themselves under stress, (which may be at play here, otherwise everybody would have faced the issue), it can rule out systematic problems, such as bad connectors/cables or buggy drivers.

@egnor
Copy link

egnor commented Jul 14, 2022

Don't be so sure it would have come up already! (Though somehow the bootloader works...?) I've reported and worked through more broken things in widely-used software. People put up and work around a surprising amount of brokenness.

But yes, if the "basic console" demo works for you, that doesn't mean your bug is invalid! But maybe it gives us a point to start triangulating from to figure out the trigger conditions. Maybe we can come up with a minimal repro example, which will surely help get a fix faster.

[edit: as sweetly pointed out, I was confused about who I was talking to]

@moefear85
Copy link

moefear85 commented Jul 14, 2022

@egnor

it's premature to speak of silicon errata. The chance of you catching one here is small. Something like this would be reported by many users, this isn't something one can easily workaround, especially given all the tiny things people rush to the issues sections to complain about.

hate to break/dump it on you, but obviously you're confused, since I'm not the one who reported the bug. read closely. and repeating the obvious doesn't help much. you may think you're a hotshot somewhere else, but that has little relevance to answering the question. stick to the topic instead.

@davidzuhn
Copy link
Contributor Author

The small example I mentioned in the main ticket was indeed the basic console example from the IDF. I have not been able to reproduce this in a small test case.

Note that in the app I have, I am not only dropping lines from the 'help' command -- many operations will emit several lines of text (usually via ESP_LOGx), and that output is also irregular (subsequent examples will show differing results, sometimes even "correctly").

The 'help' output is just the easiest and most obvious example.

@moefear85 : Just to be able to rule out certain systemic issues, I have seen this on all of my ESP32-S3 devices -- DevKitC, UM Feather-S3, Adafruit ESP32-S3 Feather. I've used a variety of cables (as the DevKit C are USB-micro, and the other two are USB-C) using multiple upstream host ports (USB-C & USB-A) so I am fairly confident in ruling out a specific connector or cable failure.

It's also the case that any given line that does get output seems to always be correct. The errors entail entire lines being dropped (some of which are just blank lines). There are no garbled or missing characters within a line.

As far as drivers go, when I use my Pi4 as the development and USB host, I see the same problems as well.

@egnor
Copy link

egnor commented Jul 14, 2022

The small example I mentioned in the main ticket was indeed the basic console example from the IDF. I have not been able to reproduce this in a small test case.

Ah, bummer. I know you posted a binary, is it possible to share source that reproduces the problem? Or is that too proprietary?

(Mostly I'm trying to rule out that something else in your app is messing with, say, interrupt handling in a way that's causing lost data.)

@igrr
Copy link
Member

igrr commented Jul 14, 2022

@davidzuhn We have managed to reproduce a very similar looking issue with an ESP32-S3 development board and with USB_SERIAL_JTAG. However the issue only happens with idf.py monitor, and is not observed with either screen or miniterm (python3 -m serial.tools.miniterm).

Could you please give screen or miniterm a try and see if you observe the issue there? If no, then it might be related to #9343 (comment).

@davidzuhn
Copy link
Contributor Author

Using minicom and miniterm, I see similar results as when using 'idf.py monitor'.

As for sharing the source to the app, I cannot distribute it in any public fashion.

@pjsg
Copy link

pjsg commented Aug 3, 2022

I suspect that this is a problem in the monitor code. In particular, if the monitor reads "\r\n" from the (virtual) serial port, then it outputs "\r\n". However, if it reads "\r" and then "\n", it only outputs "\r".

I think that the root cause of this is https://github.com/espressif/esp-idf/blob/release/v4.4/tools/idf_monitor_base/serial_handler.py#L85

However, getting rid of that line causes extra newlines in the output, but does fix the missing newlines.

I'm going to try and figure out what is going on.....

@pjsg
Copy link

pjsg commented Aug 4, 2022

I think that this is the fix:

diff --git a/tools/idf_monitor_base/serial_handler.py b/tools/idf_monitor_base/serial_handler.py
index af267088f2..957ba9e13b 100644
--- a/tools/idf_monitor_base/serial_handler.py
+++ b/tools/idf_monitor_base/serial_handler.py
@@ -78,12 +78,8 @@ class SerialHandler:
             # add unprocessed part from previous "data" to the first line
             sp[0] = self._last_line_part + sp[0]
             self._last_line_part = b''
-        if sp[-1] != b'':
-            # last part is not a full line
-            self._last_line_part = sp.pop()
+        self._last_line_part = sp.pop()
         for line in sp:
-            if line == b'':
-                continue
             if self._serial_check_exit and line == console_parser.exit_key.encode('latin-1'):
                 raise SerialStopException()
             if gdb_helper:

@davidzuhn
Copy link
Contributor Author

While that may indeed be an issue, the problem that I see does not lie solely in the monitor.py program. I see the same unusual results when using minicom as when I use monitor.py.

I will also note that I have updated my development tree to ESP-IDF v5.0-dev-4547-gfde4afc67a and I see the same issues

When I have your patch applied (to that version of the esp-idf), I see different flawed output. Using the same example, which has seen some changes since the initial report, but nothing in the area of stdio & console config, I now see some lines merged. In the initial problem, lines (including blank lines) are sometimes dropped. Now I see output like this:

throttle 
  throttle support
wifi 
  wifi supporttasks 
  Show tasks

Compared to the earlier "good text", the lines about the 'battery' command are no longer expected, but the 'wifi support' text is immediately followed by the 'tasks' command header.

@pjsg
Copy link

pjsg commented Aug 4, 2022

Are you using the driver to write to the serial jtag or not? I.e. do you have

esp_vfs_usb_serial_jtag_use_driver();

or

esp_vfs_usb_serial_jtag_use_nonblocking();

It appears that you can overrun the output buffer if you are not using the driver and this can cause dropped characters.

@davidzuhn
Copy link
Contributor Author

My code calls esp_vfs_usb_serial_jtag_use_driver.

@pjsg
Copy link

pjsg commented Aug 5, 2022

I'm not sure that I can help then. My debugging approach was to run the idf.py monitor under linux strace to see what it was actually reading from the USB CDC device. That showed me that the data arriving on my linux box was correct. It sounds as though that isn't the case for you -- but I'd still verify it anyway.

Good luck tracking it down!

@davidzuhn
Copy link
Contributor Author

I hadn't thought about strace. I ran minicom under strace (strace -o minicom.log2 -s1024 minicom -D /dev/ttyACM0 ) and then ran my help command. I saw this:

BKT -1 #0>help
  data fs command
i2c 
  read/write/scan i2c values
adc 
  read ADC values

state 
  Display current connection state values

  throttle support

wifi 
  wifi support

tasks 
  Show tasks

event 
  Print event loop debug

reset 
  Reset the device (optionally to factory default)

version 
  Display version information

board 
  read board values

knob 
  View/set the knob position levels
help 
  Print the list of registered commands

battery 
  notify battery level

Here are some of the corresponding lines from the strace output:

read(3, "fs command\r\r\ni2c \r\n  read/write/scan i2c values\r\r\nadc \r\n  read ADC values\r\n\r\nstate \r\n  Display c", 127) = 96
write(1, "fs command\r\ni2c \r\n  read/write/scan i2c values\r\nadc \r\n  read ADC values\r\n\nstate \r\n  Display c", 93) = 93
ioctl(3, TIOCMGET, [TIOCM_DTR|TIOCM_RTS|TIOCM_CTS]) = 0
ioctl(3, TCGETS, {B115200 -opost -isig -icanon -echo ...}) = 0
pselect6(4, [0 3], NULL, NULL, {tv_sec=1, tv_nsec=0}, NULL) = 1 (in [3], left {tv_sec=0, tv_nsec=999968242})
read(3, "urrent connection state values\r\n\r\nthrottle \r  throttle support\r\n\r\nwifi \r\n  wifi support\r\n\r\ntasks \r\n  Show tasks\r\n", 127) = 113
write(1, "urrent connection state values\r\n\nthrottle \r  throttle support\r\n\nwifi \r\n  wifi support\r\n\ntasks \r\n  Show tasks\r\n", 110) = 110

Notice the lack of a \n reading the line "throttle \r". This is a line generated within the console help routine. This is the missing header before the " throttle support" text.

I ran this cycle several times, and here's the output from three log files:

$ grep "event loop" minicom.log* | grep read
minicom.log:read(3, " tasks\r\n\r\nevent \r\n  Print event loop debug\r\n\r\nreset \r\n  Reset the device (optionally to factory default)\r\n\r\nversion \r  Display ", 127) = 127
minicom.log2:read(3, "\r\nevent \r\n  Print event loop debug\r\n\r\nreset \r\n  Reset the device (optionally to factory default)\r\n\r\nversion \r\n  Display version", 127) = 127
minicom.log3:read(3, "hrottle \r\n  throttle support\r\r\nwifi \r\n  wifi support\r\n\r\ntasks \r\n  Show tasks\r\n\r\nevent \r  Print event loop debug\r\r\nreset \r\n  Res", 127) = 127

Notice that 'event ' header lines are inconsistent. Sometimes it has \r\n, and sometimes it's just the \r. When the \n is missing, the text doesn't display properly (pretty sure the first line is being output, but then is overwritten because of the \r without the \n).

But the key thing is that the data coming in from the ESP32-S3 is inconsistent, even though the device has not been refreshed (or even restarted) between these runs.

@pjsg
Copy link

pjsg commented Aug 5, 2022

The interesting thing about losing \n is that the output function handles \n specially. In particular in usb_serial_jtag_write it calls usb_serial_jtag_ll_txfifo_flush(); immediately after adding the \n to the buffer. I'd try commenting out that line, and see if the problem changes. I made a whole bunch of changes to that function (I was also losing \n but that was due to the idf monitor) during my debugging.

The inconsistency makes me think that it is related to the timing of the linux box polling the cdc device and precisely when characters get added to the output buffer. I imagine that the USB polling causes interrupts which happen at different times on each run.

The other thing to try is that Wireshark supports tracing USB transactions via the usbmon kernel module. Tracing the USB CDC device might reveal whether the missing newline is at the start of a packet, in the middle or at the end.

@davidzuhn
Copy link
Contributor Author

When I disable the usb_serial_jtag_ll_txfifo_flush call in usb_serial_jtag_write, the entire console command line interface hangs, so I don't get my prompts, I don't get any further debug output, and I can't run any commands. So disabling the call is very much a no-go situation.

I've done a trace under Wireshark, but I don't know how to read USB CDC traffic yet, so I haven't any further information on that front. Capture file attached in case you already know what you're looking at.

usb-cdc-monitor.zip

This is a capture of connecting to the device, running several 'help' commands (during which time some of the lines were lost), and then disconnecting. There are no other USB devices on this computer, so the capture contains only the ESP32-S3 traffic.

@chipweinberger
Copy link
Contributor

chipweinberger commented Jan 2, 2023

@davidzuhn

When I disable the usb_serial_jtag_ll_txfifo_flush call in usb_serial_jtag_write, the entire console command line interface hangs

This PR might fix the hang. I'd need to look deeper to be sure.

#10208

I'd first try to remove the conditional and always flush.

    //if (c == '\n') {
        //Make sure line doesn't linger in fifo
        usb_serial_jtag_ll_txfifo_flush();
    //}

or yes, ideally never force a flush and let the flush happen naturally when the USB Host polls. I dont understand what this flush would even do since the bus is Host controlled.

@chipweinberger
Copy link
Contributor

chipweinberger commented Jan 2, 2023

I am also hitting the same issue (or similar) when using python's serial console for reading. Characters seem out of order v.s. uart. I need to look deeper tho. I think entire lines are being swallowed.

Have we come to any conclusions?

@chipweinberger
Copy link
Contributor

Id also try changing this:
esp_vfs_dev_uart_port_set_tx_line_endings(CONSOLE_UART_NUM, ESP_LINE_ENDINGS_CRLF);

to
esp_vfs_dev_uart_port_set_tx_line_endings(CONSOLE_UART_NUM, ESP_LINE_ENDINGS_LF);

@chipweinberger
Copy link
Contributor

And would also try increasing tx buffer size:

usb_serial_jtag_driver_config_t config = {
    .tx_buffer_size = 4096,
    .rx_buffer_size = 256
};

@jjsch-dev
Copy link

jjsch-dev commented Aug 22, 2023

I have a smiliar problem, and upon examining the code, I noticed that the usbjtag_tx_char_via_driver function contains a comparison that checks whether an immediate send operation succeeds. However, the return value from usb_serial_jtag_write_bytes is always 1, which prevents the intended logic to try blocking for 50 mS, but also a return that always skyp the bloking try.

Proposed Solution:
To address this issue, I suggest considering the following adjustments:

Modify the usb_serial_jtag_write_bytes function to return -1 on failure and 0 on success.

int usb_serial_jtag_write_bytes(const void* src, size_t size, TickType_t ticks_to_wait)
{
    ESP_RETURN_ON_FALSE(size != 0, ESP_ERR_INVALID_ARG, USB_SERIAL_JTAG_TAG, "size should be larger than 0");
    ESP_RETURN_ON_FALSE(src != NULL, ESP_ERR_INVALID_ARG, USB_SERIAL_JTAG_TAG, "Invalid buffer pointer.");
    ESP_RETURN_ON_FALSE(p_usb_serial_jtag_obj != NULL, ESP_ERR_INVALID_ARG, USB_SERIAL_JTAG_TAG, "The driver hasn't been initialized");

    const uint8_t *buff = (const uint8_t *)src;
    // Blocking method, Sending data to ringbuffer, and handle the data in ISR.
    BaseType_t result = xRingbufferSend(p_usb_serial_jtag_obj->tx_ring_buf, (void*) (buff), size, ticks_to_wait);

    // Now trigger the ISR to read data from the ring buffer.
    usb_serial_jtag_ll_ena_intr_mask(USB_SERIAL_JTAG_INTR_SERIAL_IN_EMPTY);
       
    // Return -1 on failure, 0 on success.
    return (result == pdFALSE) ? -1 : 0;
}

Update the usbjtag_tx_char_via_driver function to accommodate the adjusted return values and logic.

static void usbjtag_tx_char_via_driver(int fd, int c)
{
    char ch = (char) c;
    TickType_t ticks = (TX_FLUSH_TIMEOUT_US / 1000) / portTICK_PERIOD_MS;
    // Attempt to send the character immediately without blocking.
    if (usb_serial_jtag_write_bytes(&ch, 1, 0) != 0) {
        s_ctx.tx_tried_blocking = false;
    } else {
        return;
    }

    // If immediate send fails, try blocking with a timeout.
    if (s_ctx.tx_tried_blocking == false) {
        if (usb_serial_jtag_write_bytes(&ch, 1, ticks) != 0) {
            return;
        } else {
            s_ctx.tx_tried_blocking = true;
        }
    } 
}

@chipweinberger
Copy link
Contributor

@jjsch-dev thats is a great observation!

Can you create a separate issue and open a Pr?

@jjsch-dev
Copy link

@chipweinberger thanks for the comment, and I will do a new issue and open the PR.

@davidzuhn
Copy link
Contributor Author

I find that this problem is not fixed, and in fact is worse now with the latest version of the IDF (git commit 62ee413).

This is what I get when I run my help command (which should look like the content shown earlier in this bug report).

BKT-1 #0>BKT-1 #0>I (4918) wc.event: found ssid:Clearwater rssi:-16 known:noI (4918) wc.event: found ssid:Palouse rssi:-44 known:no
I (4918) wc.event: found ssid:Clearwater rssi:-61 known:no
I (4918) wc.event: found ssid:myownlittleidaho rssi:-74 known:no
I (4928) wc.event: found ssid:BarbraHaugen rssi:-91 known:noown:no
I (4928) wc.event: found ssid:CenturyLink1349 rssi:-96 known:no
BKT-1 #0>helpfs 
  data fs command
pilot 
  set pilot light on/off times
i2c 
  read/write/scan i2c valuadc 
  read ADC valustate 
  Display current connection state values
throttle 
  throttle support

wi  wifi support
heap 
  show heaptasks 
  Show tasks
me  Memory debug

event 
  Print event loop debug
reset 
  Reset the device (optionally to factory defaulversion 
  Display version information
board 
  read board valuknob 
  View/set the knob position levels
help  [<string  Print the summary of all registered commands if no arguments are given,
  otherwise print summary of given command.
      <string>  Name of commabattery 
  notify battery level

Instead of losing lines, I lose large chunks of the text in arbitrary places.

@jjsch-dev
Copy link

Hi @davidzuhn, I think the master (as of 4th Sep) doesn't have this fix. Does your code use printf in more than one thread?

@davidzuhn
Copy link
Contributor Author

The tree I'm using has the fix that was claimed to close this issue (commit c926153 of Aug 23).

My code uses a log of ESP_LOG across many threads, and a little bit of printf (which is almost all from the thread that handles the command line processing).

@jjsch-dev
Copy link

Sorry with the fix date time, I made a mistake with the version. Although ESP_LOG is thread safe, printf is not, perhaps you can create a function that uses a mutex to serialize printf and if possible disable ESP_LOG.

@chipweinberger
Copy link
Contributor

chipweinberger commented Sep 5, 2023

@jjsch-dev perhaps we could add a warning? something like:

"printf is not thread safe. you should only call printf from a single thread"

which would be logged if you call printf from multiple threads?

I have been using ESP-IDF many years and never knew printf was not thread safe!

@jjsch-dev
Copy link

@chipweinberger, You have right because while printf() is reentrant, meaning it can safely be called from multiple threads without corrupting its internal data structures, it doesn't inherently guarantee that output will be serialized correctly. This can result in interleaved or mixed output when multiple threads use printf() simultaneously.

I might be mistaken, but in these scenarios, I believe the issue of mixed output occurs when using a console to receive user commands. For instance, when utilizing linenoise(), which relies on write() and read() functions. If the formatted response is generated using printf() from a different thread than the one handling linenoise(), this problem can arise. It becomes even more challenging if you have ESP_LOG macros enabled. To resolve this issue, you would need to make changes to the IDF to establish a mutex that can be shared for write(), printf(), and ESP_LOG.

The straightforward solution is to disable the ESP_LOG macros and ensure that printf is called from the same thread that runs linenoise.

@davidzuhn
Copy link
Contributor Author

In my scenario, the printf output is not being interleaved between data from multiple threads, nor is there any interleave of output from ESP_LOG. The block of text with the missing/corrupted data is generated and printed by the help command that comes from esp_console_register_help_command().

I completely understand that there is the chance of interleave between portions of the help text and log messages from ESP_LOG. But that's not happening. What's happening is that the blob of text from help is missing characters when being read from the host. Earlier it was missing the carriage return, allowing the terminal to overwrite one line with the next giving the appearance of missing lines of text. With the latest ESP-IDF, much more than that is going missing.

@jjsch-dev
Copy link

@davidzuhn thanks for the firmware explanation, is the host program in python?

@davidzuhn
Copy link
Contributor Author

I get similar results when I use 'idf.py monitor' or when I use minicom.

@jjsch-dev
Copy link

ok, can you share the source code? I have some time and maybe I can help you find the problem.

@chipweinberger
Copy link
Contributor

I could repro it with simple code, when I tried.

    int v = 0;
    while (true) {
        for(int i = 0; i < 1024 * 10; i++) {
            printf("%i,%i\n", v, i);
        }
        vTaskDelay(1);
        v++;
    }

@jjsch-dev
Copy link

@chipweinberger to run this test, you disabled the WDT or put vTaskDelay(1); inside the for()?

@chipweinberger
Copy link
Contributor

I recall disabling WDT

@jjsch-dev
Copy link

@chipweinberger I tested your code on an esp32-s3 with WDT disabled at 160Mhz with channel for console output = USB Serial/JTAG Controller and can't reproduce the issue.

image

I use the ESP-IDF v5.2-dev-2319-gcbbc7ff9df-dirty
image

I create the app with the idf.py create-project

I use this main.

#include <stdio.h>
#include <stdbool.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>

void app_main(void)
{
    int v = 0;
    while (true) {
        for(int i = 0; i < 1024 * 10; i++) {
            printf("%i,%i\n", v, i);
        }
        v++;
        vTaskDelay(1);
    }
}

To test I used @igrr's procedure with the monitor Ctr+T Ctr+L and the python script. To check if python works I modified some values, you can see the pictures with the report.
log.issue_IDFGH-7781.20230906203007.zip

image

@chipweinberger
Copy link
Contributor

chipweinberger commented Sep 7, 2023

that's great! I typically use 240mhz, btw.

I do recall thinking it might have been a bug in macOS.

@davidzuhn maybe you could write a reproducing minimal example?

espressif-bot pushed a commit that referenced this issue Sep 9, 2023
CommanderRedYT pushed a commit to CommanderRedYT/esp-idf that referenced this issue Sep 24, 2023
espressif-bot pushed a commit that referenced this issue Oct 10, 2023
espressif-bot pushed a commit that referenced this issue Oct 18, 2023
@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: Opened Issue is new labels Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

10 participants