Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop X509 context after successful server verification to save heap space #6065

Merged
merged 5 commits into from
May 15, 2019
Merged

Conversation

sislakd
Copy link
Contributor

@sislakd sislakd commented May 10, 2019

After completing handshake in BSSL, server is already verified and X509 context is no longer needed. Depending on verification method it save more or less heap space.

#6005

Copy link
Collaborator

@earlephilhower earlephilhower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! We're in a release freeze, so this will get in right after.

@earlephilhower earlephilhower added this to the 2.6.0 milestone May 10, 2019
@sislakd
Copy link
Contributor Author

sislakd commented May 10, 2019

Added the change for reporting not connected if TLS session is broken and there is no more buffered decrypted data. TLS can be broken if message authentication (MAC) cannot be verified. BearSSL enters BR_SSL_CLOSED state when processing invalid encrypted application data fragment. In such situation the current implementation get stuck forever unless user has own timeout mechanism build on top of WiFiClientSecureBearSSL. This change introduce fail fast via connected() returning false. Further it imply return -1 from read methods indicating broken channel upon which user should perform reconnect if needed.

@sislakd
Copy link
Contributor Author

sislakd commented May 10, 2019

It seems that I cannot create another PR for the change in the same file unless previous change is merged. Hopefully, this second change can be included in this open PR.

@earlephilhower
Copy link
Collaborator

Yes, GH works on PRs per-branch, so you'd need to make a new branch off master for each separate PR.

It's fine the way you've done it, we'll push them all at the same time. We'll adjust the title to reflect the two (or more) different updates.

@mnunespt
Copy link

Added the change for reporting not connected if TLS session is broken and there is no more buffered decrypted data. TLS can be broken if message authentication (MAC) cannot be verified. BearSSL enters BR_SSL_CLOSED state when processing invalid encrypted application data fragment. In such situation the current implementation get stuck forever unless user has own timeout mechanism build on top of WiFiClientSecureBearSSL. This change introduce fail fast via connected() returning false. Further it imply return -1 from read methods indicating broken channel upon which user should perform reconnect if needed.

Thank you very much!

earlephilhower added a commit to earlephilhower/Arduino that referenced this pull request May 20, 2019
Changes since 2.5.1 (to 2.5.2)

Core
----
* Add explicit Print::write(char) (esp8266#6101)

Build system
----
* Fix typo in elf2bin for QOUT binary generation (esp8266#6116)
* Support PIO Wl-T and Arduino -T linking properly (esp8266#6095)
* Allow *.cc files to be linked into flash by default (esp8266#6100)
* Use custom "ElfToBin" builder for PIO (esp8266#6091)
* Fail if generated JSON file cannot be read (esp8266#6076)
* Moved 'Dropping' print from stdout to stderr in drop_versions.py (esp8266#6071)
* Fix PIO issue when build environment contains spaces (esp8266#6119)

Libraries
----
* Remove deadlock when server is not acking our data (esp8266#6107)
* Bugfix for stuck in write method of WiFiClient and WiFiClientSecure until the remote peer closed connection (esp8266#6104)
* Re-add original SD FAT info access methods (esp8266#6092)
* Make FILE_WRITE append in SD.h wrapper (esp8266#6106)
* Drop X509 after connection, avoid hang on TLS broken (esp8266#6065)
@earlephilhower earlephilhower mentioned this pull request May 20, 2019
earlephilhower added a commit that referenced this pull request May 20, 2019
Changes since 2.5.1 (to 2.5.2)

Core
----
* Add explicit Print::write(char) (#6101)

Build system
----
* Fix typo in elf2bin for QOUT binary generation (#6116)
* Support PIO Wl-T and Arduino -T linking properly (#6095)
* Allow *.cc files to be linked into flash by default (#6100)
* Use custom "ElfToBin" builder for PIO (#6091)
* Fail if generated JSON file cannot be read (#6076)
* Moved 'Dropping' print from stdout to stderr in drop_versions.py (#6071)
* Fix PIO issue when build environment contains spaces (#6119)

Libraries
----
* Remove deadlock when server is not acking our data (#6107)
* Bugfix for stuck in write method of WiFiClient and WiFiClientSecure until the remote peer closed connection (#6104)
* Re-add original SD FAT info access methods (#6092)
* Make FILE_WRITE append in SD.h wrapper (#6106)
* Drop X509 after connection, avoid hang on TLS broken (#6065)
@mnunespt
Copy link

mnunespt commented May 27, 2019

Hi,
after more tests i've found that I have the same problem, even with this 2.5.2 version

I have this setup: Internet Router <=> WIFI AP <=> NodeMCU 12E
If i disconnect the cable from InternetRouter to WifiAP, the connection on WifiClientSecure hangs, with no time out.

The ESP8266 hangs in the last line of this debug output:
:ur 1 :del :ref 1 :wr 229 0 :wrc 229 229 0 :ack 229 :rn 536 :rch 536, 536 :rch 1072, 536 :rch 1608, 536 :rd 5, 2144, 0 :rdi 536, 5 :rd 93, 2144, 5 :rdi 531, 93 :rd 5, 2144, 98 :rdi 438, 5 :rd 2041, 2144, 103 :rdi 433, 433 :c 433, 536, 2144 :rdi 536, 536 :c 536, 536, 1608 :rdi 536, 536 :c 536, 536, 1072 :rdi 536, 536 :c0 536, 536

or

:wrc 45 45 0

i'm posting HTTPS and receiving data while the connection is broken.
It stops always in the :c0 536, 536 and some times on :wrc 45 45 0

have no clue... :(

Thanks for any help you can provide.

@mnunespt
Copy link

mnunespt commented May 29, 2019

Hi,
I've managed to solve my problem adding some code to the ClientContext.h file
I added a counter variable to the _write_some_from_cb() function to check for the loop and timeout after a few tries.
Here the code I added. For me is enought to restart the ESP, but we can just send a :wustmo timeout instead.

//line 545
    void _write_some_from_cb()
    {
		rcont++;
		DEBUGV("retry: %d\n",rcont);
		if (rcont > 30)
		{
			DEBUGV(" >>>>  Connection hang - will restart!!!");
			ESP.restart();
			rcont=0; 
		}
        if (_send_waiting) {
            _send_waiting = false;
            esp_schedule();
        }
    }

need to declare the global rcont as int

//line 671
private:   
    tcp_pcb* _pcb;
    pbuf* _rx_buf;
    size_t _rx_buf_offset;
    discard_cb_t _discard_cb;
    void* _discard_cb_arg;
    DataSource* _datasource = nullptr;
    size_t _written = 0;
    uint32_t _timeout_ms = 5000;
    uint32_t _op_start_time = 0;
    bool _send_waiting = false;
    uint8_t _connect_pending = 0;
    int8_t _refcnt;
    ClientContext* _next;
    bool _sync;
    int rcont = 0;
};

@d-a-v
Copy link
Collaborator

d-a-v commented May 29, 2019

@mnunespt This hack is dangerous. ClientContext is used by WiFiClient, and your restart impacts long transfers (even those unrelated with SSL) because rcont is never reset.
There were timeout fixes released in 2.5.2. If they are not sufficient, please open a new issue with a MCVE reproducing your issue.

@mnunespt
Copy link

mnunespt commented May 29, 2019

Yes you are right.
I forget to past the code here rcont is restarted.

rcont is reinitiated in the function:

err_t _acked(tcp_pcb* pcb, uint16_t len)
{
(void) pcb;
(void) len;
DEBUGV(":ack %d\r\n", len);
_write_some_from_cb();
rcont=0;
return ERR_OK;
}

It's not a elegant solution, but in my case, my specific sketch, it works very well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants