Add ETL process to compress relevant data #9
Because many requests are captured, the login_address table grows quickly
on large installations. However, the individual rows are not required for
training: as long as we know when each (ip, uid) pair was used first and last,
we can still separate training from test data while keeping the database compact.
This will migrate and compress the existing data in batches of 25k rows per hour.
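For illustration, a minimal sketch of what the hourly batch job could look like, assuming a recent Nextcloud TimedJob/query-builder API and hypothetical table and column names (login_address with id, uid, ip, created_at; login_address_aggregated with uid, ip, first_seen, last_seen):

```php
<?php

use OCP\AppFramework\Utility\ITimeFactory;
use OCP\BackgroundJob\TimedJob;
use OCP\IDBConnection;

class AggregateLoginAddresses extends TimedJob {
	private const BATCH_SIZE = 25000;

	/** @var IDBConnection */
	private $db;

	public function __construct(ITimeFactory $time, IDBConnection $db) {
		parent::__construct($time);
		$this->db = $db;
		// Run at most once per hour.
		$this->setInterval(3600);
	}

	protected function run($argument): void {
		// Fetch one batch of raw rows, oldest first.
		$select = $this->db->getQueryBuilder();
		$select->select('id', 'uid', 'ip', 'created_at')
			->from('login_address')
			->orderBy('id', 'ASC')
			->setMaxResults(self::BATCH_SIZE);
		$rows = $select->executeQuery()->fetchAll();

		foreach ($rows as $row) {
			// Fold this row into the (uid, ip) aggregate in PHP,
			// then drop the raw row. This per-row round trip is
			// exactly the inefficiency noted below.
			$this->upsertAggregate($row['uid'], $row['ip'], (int)$row['created_at']);

			$delete = $this->db->getQueryBuilder();
			$delete->delete('login_address')
				->where($delete->expr()->eq('id', $delete->createNamedParameter($row['id'])));
			$delete->executeStatement();
		}
	}

	private function upsertAggregate(string $uid, string $ip, int $seenAt): void {
		// Widen first_seen/last_seen for an existing (uid, ip) aggregate,
		// or insert a new aggregate row. Details elided in this sketch.
	}
}
```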
In my defense: I know this way of transforming the data is inefficient. I would have liked to use the query builder to do it all in SQL and thus not pipe the data through PHP, but I couldn't get that working.
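For reference, a sketch of the single-statement aggregation I had in mind (again with hypothetical table and column names); the INSERT … SELECT with GROUP BY is the part I couldn't express through the query builder:

```php
<?php

use OCP\IDBConnection;

function aggregateInSql(IDBConnection $db): void {
	// Both statements run in one transaction so that rows inserted
	// in between are not lost by the DELETE.
	$db->beginTransaction();
	try {
		// Collapse all raw rows into one aggregate row per (uid, ip)
		// pair, keeping only the first and last time it was seen.
		$db->executeStatement(
			'INSERT INTO login_address_aggregated (uid, ip, first_seen, last_seen)
			 SELECT uid, ip, MIN(created_at), MAX(created_at)
			 FROM login_address
			 GROUP BY uid, ip'
		);
		// The raw rows are no longer needed once aggregated.
		$db->executeStatement('DELETE FROM login_address');
		$db->commit();
	} catch (\Throwable $e) {
		$db->rollBack();
		throw $e;
	}
}
```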