Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate mimetype insertion during scan #5199

Closed
PVince81 opened this issue Oct 8, 2013 · 28 comments · Fixed by #5442
Closed

Duplicate mimetype insertion during scan #5199

PVince81 opened this issue Oct 8, 2013 · 28 comments · Fixed by #5442

Comments

@PVince81
Copy link
Contributor

PVince81 commented Oct 8, 2013

Steps to reproduce:

  1. Set up an external storage
  2. Go to the files app
  3. Go back to admin page
  4. Change external storage
  5. Go to the files app

These aren't exact steps and it doesn't always happen, might be a race condition.
It does seem to happen more often when changing the config though.

It seems that the files scanning is trying to insert the same mimetype multiple times into the DB:

An exception occurred while executing 'INSERT INTO `oc_mimetypes`(`mimetype`) VALUES(?)': SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry '' for key 'mimetype_id_index'
#0 /srv/www/htdocs/owncloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Statement.php(140): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(PDOException), 'INSERT INTO `oc...', Array)
#1 /srv/www/htdocs/owncloud/lib/private/db/statementwrapper.php(70): Doctrine\DBAL\Statement->execute(Array)
#2 /srv/www/htdocs/owncloud/lib/private/db.php(282): OC_DB_StatementWrapper->execute(Array)
#3 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(72): OC_DB::executeAudited('INSERT INTO `*P...', Array)
#4 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(264): OC\Files\Cache\Cache->getMimetypeId(false)
#5 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(210): OC\Files\Cache\Cache->buildParts(Array)
#6 /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php(143): OC\Files\Cache\Cache->put('', Array)
#7 /srv/www/htdocs/owncloud/lib/private/files/view.php(879): OC\Files\Cache\Scanner->scanFile('')
#8 /srv/www/htdocs/owncloud/lib/private/files/filesystem.php(734): OC\Files\View->getDirectoryContent('', '')
#9 /srv/www/htdocs/owncloud/apps/files/lib/helper.php(72): OC\Files\Filesystem::getDirectoryContent('')
#10 /srv/www/htdocs/owncloud/apps/files/index.php(77): OCA\Files\Helper::getFiles('')
#11 /srv/www/htdocs/owncloud/lib/base.php(727): require_once('/srv/www/htdocs...')
#12 [internal function]: OC::loadAppScriptFile(Array)
#13 /srv/www/htdocs/owncloud/lib/private/router.php(127): call_user_func(Array, Array)
#14 /srv/www/htdocs/owncloud/lib/base.php(655): OC_Router->match('/apps/files')
#15 /srv/www/htdocs/owncloud/index.php(30): OC::handleRequest()
#16 {main}

I'll provide more detailed step when I have them.
@PVince81
Copy link
Contributor Author

PVince81 commented Oct 8, 2013

@icewind1991 @schiesbn @DeepDiver1975 @karlitschek

@PVince81
Copy link
Contributor Author

PVince81 commented Oct 8, 2013

Ok, here are my steps with MySQL:

  1. Start with an empty DB and installed => false
  2. Delete data/mount.json if it exists
  3. Run the setup through the web interface and select the MySQL DB
  4. Activate the files_external app
  5. Go to the admin page
  6. Prepare a local directory and put one big picture inside, at least 3 MB big to slow down scanning.
  7. Add an external mountpoint: local folder "sftp", type: SFTP, host: localhost, path to the dir from step 6, and "all users"
  8. Select the "Files" app
  9. Click on the "sftp" subdir to navigate to it. It should take a while to scan.
  10. Open the network console of the debugger and see the response of the "list.php" call.

Expected result

Navigates to the 'sftp' subdir

Actual result

Stays in the current dir and the network console has this error in a HTML page:

An exception occurred while executing 'INSERT INTO `oc_mimetypes`(`mimetype`) VALUES(?)':

SQLSTATE[23000]: Integrity constraint violation: 1062 Duplicate entry 'image' for key 'mimetype_id_index'
0 /srv/www/htdocs/owncloud/3rdparty/doctrine/dbal/lib/Doctrine/DBAL/Statement.php(140): Doctrine\DBAL\DBALException::driverExceptionDuringQuery(Object(PDOException), 'INSERT INTO `oc...', Array)
    1 /srv/www/htdocs/owncloud/lib/private/db/statementwrapper.php(70): Doctrine\DBAL\Statement->execute(Array)
    2 /srv/www/htdocs/owncloud/lib/private/db.php(282): OC_DB_StatementWrapper->execute(Array)
    3 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(72): OC_DB::executeAudited('INSERT INTO `*P...', Array)
    4 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(264): OC\Files\Cache\Cache->getMimetypeId('image')
    5 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(241): OC\Files\Cache\Cache->buildParts(Array)
    6 /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php(187): OC\Files\Cache\Cache->update('5', Array)
    7 /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php(143): OC\Files\Cache\Cache->put('P1060370.JPG', Array)
    8 /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php(202): OC\Files\Cache\Scanner->scanFile('P1060370.JPG', 3, true)
    9 /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php(168): OC\Files\Cache\Scanner->scanChildren(false, false, 3)
    10 /srv/www/htdocs/owncloud/lib/private/files/view.php(848): OC\Files\Cache\Scanner->scan(false, false)
    11 /srv/www/htdocs/owncloud/lib/private/files/filesystem.php(734): OC\Files\View->getDirectoryContent('/sftp', '')
    12 /srv/www/htdocs/owncloud/apps/files/lib/helper.php(72): OC\Files\Filesystem::getDirectoryContent('/sftp')
    13 /srv/www/htdocs/owncloud/apps/files/ajax/list.php(37): OCA\Files\Helper::getFiles('/sftp')
    14 /srv/www/htdocs/owncloud/lib/base.php(727): require_once('/srv/www/htdocs...')
    15 [internal function]: OC::loadAppScriptFile(Array)
    16 /srv/www/htdocs/owncloud/lib/private/router.php(127): call_user_func(Array, Array)
    17 /srv/www/htdocs/owncloud/lib/base.php(655): OC_Router->match('/apps/files/aja...')
    18 /srv/www/htdocs/owncloud/index.php(30): OC::handleRequest()
    19 {main}

Sometimes it is another exception about constraint violation when insert a file into the DB.

It doesn't seem to happen for smaller image files. It seems to be a race condition.
Also, it only happens the first time.
If after that you refresh the page in the root dir, then navigate to the subdir it will work fine.

@PVince81
Copy link
Contributor Author

PVince81 commented Oct 8, 2013

Okay, it seems to be a conflict between the scanning triggered by scan.php and the list retrieval from list.php, which also triggers a scanning.

Before navigating to the subdir, if you observe the network console and wait for the scan.php call to finish, then click on the subdir, it will all work fine.

@PVince81
Copy link
Contributor Author

PVince81 commented Oct 8, 2013

@icewind1991 @DeepDiver1975 is scanning supposed to be exclusive / locking other scans out or is it allowed for multiple scan processes to run at the same time ?

@PVince81
Copy link
Contributor Author

Getting closer, it looks like there are two server calls that conflict:

First call: Initiates the background scan
0 OC\Files\Cache\Cache->put() /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php:219
1 OC\Files\Cache\Scanner->scanFile() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:143
2 OC\Files\Cache\Scanner->scanChildren() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:199
3 OC\Files\Cache\Scanner->scan() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:166
4 OC\Files\Cache\Scanner->backgroundScan() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:253
5 OC\Files\Utils\Scanner->backgroundScan() /srv/www/htdocs/owncloud/lib/private/files/utils/scanner.php:80
6 require_once() /srv/www/htdocs/owncloud/apps/files/ajax/scan.php:29
7 OC::loadAppScriptFile() /srv/www/htdocs/owncloud/lib/base.php:726
8 call_user_func() /srv/www/htdocs/owncloud/lib/private/router.php:127
9 OC_Router->match() /srv/www/htdocs/owncloud/lib/private/router.php:127
10 OC::handleRequest() /srv/www/htdocs/owncloud/lib/base.php:654
11 {main} /srv/www/htdocs/owncloud/index.php:30

Second call: Scans as part of getDirectoryContent():
0 OC\Files\Cache\Cache->put() /srv/www/htdocs/owncloud/lib/private/files/cache/cache.php:219
1 OC\Files\Cache\Scanner->scanFile() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:143
2 OC\Files\Cache\Scanner->scanChildren() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:199
3 OC\Files\Cache\Scanner->scan() /srv/www/htdocs/owncloud/lib/private/files/cache/scanner.php:166
4 OC\Files\View->getDirectoryContent() /srv/www/htdocs/owncloud/lib/private/files/view.php:853
5 OC\Files\Filesystem::getDirectoryContent() /srv/www/htdocs/owncloud/lib/private/files/filesystem.php:734
6 OCA\Files\Helper::getFiles() /srv/www/htdocs/owncloud/apps/files/lib/helper.php:72
7 require_once() /srv/www/htdocs/owncloud/apps/files/ajax/list.php:37
8 OC::loadAppScriptFile() /srv/www/htdocs/owncloud/lib/base.php:726
9 call_user_func() /srv/www/htdocs/owncloud/lib/private/router.php:127
10 OC_Router->match() /srv/www/htdocs/owncloud/lib/private/router.php:127
11 OC::handleRequest() /srv/www/htdocs/owncloud/lib/base.php:654
12 {main} /srv/www/htdocs/owncloud/index.php:30

The second call tries to insert the same file into the DB which causes a duplicate key exception.

After both calls are done, the entry for that file appears once in the DB.

I'll try to pinpoint what is triggering this call and why the second call doesn't find the file in the DB.

@PVince81
Copy link
Contributor Author

Ok, the first call is from scan.php which triggers the background scan and the second one from list.php to get the directory contents.

If at the time where the second call is running, the background hasn't finished yet or hasn't started, the second call will believe the scan is incomplete here in lib/private/files/view.php:

if ($cache->getStatus($internalPath) < Cache\Cache::COMPLETE) {
    $scanner = $storage->getScanner($internalPath);
    $scanner->scan($internalPath, Cache\Scanner::SCAN_SHALLOW);
} else {
    $watcher = $storage->getWatcher($internalPath);
    $watcher->checkUpdate($internalPath);
}

and when the scan is incomplete, getDirectoryContent() will then call $scanner->scan() which itself is checking whether we have data in the cache in lib/private/files/cache/scanner.php in the scanFile() method:

$cacheData = $this->cache->get($file);
if ($cacheData) {
    // ...
}
if (!empty($newData)) {
    $this->cache->put($file, $newData);
}

When no cache data is available, it will put/insert the scanned file into the DB. If between this the first call has inserted the row already, the duplicate key error will occur.

I'm not sure yet how the two scanning processes are supposed to lock each other out.
Maybe instead of checking for Cache::COMPLETE above we should have a state Cache::RUNNING which would make the call getDirectoryContent() block until the scanner is done.

I'll look into the getWatcher() call above which might be what I'm looking for.

@DeepDiver1975
Copy link
Member

@icewind1991 can you please have a look? THX

@bantu
Copy link

bantu commented Oct 16, 2013

@PVince81 The query is only inserting a single mimetype. So you can just go to where the insert happens, catch the duplicate key exception and ignore it?

@PVince81
Copy link
Contributor Author

Yes, @karlitschek suggested to do that as a workaround.
Note that it can also happen while inserting files, so the exception should be caught there as well.

@bantu
Copy link

bantu commented Oct 17, 2013

@PVince81 Proper design would suggest that there is only a single place where mimetype insertion happens.

@karlitschek
Copy link
Contributor

@bantu The point is that this can happen if several php process run at the same time and try to insert the same primary key into the db. Catching the duplicate key exception should be fine

@tanghus
Copy link
Contributor

tanghus commented Oct 17, 2013

Or use OCP\DB::insertIfNotExist(); https://github.com/owncloud/core/blob/master/lib/public/db.php#L65

@karlitschek
Copy link
Contributor

@tanghus Excellent point. I used this in the past with MySQL. I wasn't sure that this works with all databases :-)

@tanghus
Copy link
Contributor

tanghus commented Oct 17, 2013

@karlitschek amazingly it works even with Oracle ;) AFAIRC performance with SQLite isn't optimal, but when is it ever that.

@bantu
Copy link

bantu commented Oct 17, 2013

@karlitschek You seem to have misunderstood or I wasn't clear enough.
My comment

Proper design would suggest that there is only a single place where mimetype insertion happens.

was in reference to

Note that it can also happen while inserting files, so the exception should be caught there as well.

My point is that there should be only a single place where the mimetype insert SQL is executed and thus only a single place where the duplicate key error has to be ignored.

@PVince81
Copy link
Contributor Author

There are at least two code paths that reach that single place: the scanner from scan.php and the getDirectoryContent() from list.php

I'll have a try catching and ignoring the exception directly where the insert is done.
There are two places:

  1. insertion of mime type, which happens when the mime type is queried from the DB and doesn't exist yet
  2. insertion of new files into the DB

Should this conflict/exception ignoring be logged ?

@karlitschek
Copy link
Contributor

I would log it with level DEBUG especially until we are 100% that we found and fixed the problem. In normal operation, especially in a heavy load setup, this should not be logged. It's just normal

@PVince81
Copy link
Contributor Author

From what I see catching the exception isn't enough, we need to return an id.
In the exception handler I'll add a SELECT or call to ::get() to make sure we return the expected id. (for both mimetype insertion and file insertion)

@karlitschek
Copy link
Contributor

see here: http://stackoverflow.com/questions/1132571/implementing-update-if-exists-in-doctrine-orm
Perhaps REPLACE INTO is the solution

@bantu
Copy link

bantu commented Oct 21, 2013

@karlitschek Probably not. That performs a delete+insert which changes the autoincrement id according to the link you posted.

@bantu
Copy link

bantu commented Oct 21, 2013

@PVince81 Please link to work-in-progress patch.

@PVince81
Copy link
Contributor Author

After discussing we found out that REPLACE INTO wouldn't work with Oracle DB and others.
I'm currently trying by catching the exception then running a select to get the id.
But it doesn't seem to work. The select doesn't return anything, even though I can get results for the same query from a separate client.

@PVince81
Copy link
Contributor Author

@bantu here's my current code: #5442

@PVince81
Copy link
Contributor Author

CC @butonic FYI

@PVince81
Copy link
Contributor Author

Ok, i guess it's probably a transactions thing.

Transaction 1: (scan.php)

  1. inserts entry with key "2-abc"

In parallel, Transaction 2 is started: (list.php)
2) Select entry with key "2-abc". Even if already added, since the transaction 2 started after 1 at a time where it didn't exist, the select returns nothing.
3) Insert duplicate entry with key "2-abc". Gives constrain violation because even though the state from 2) is not up to date, it can't allow a duplicate entry to appear.

Which means that the current solution with try{ insert } catch () { select id } can't be used.
Will try to find another approach.

@VicDeo
Copy link
Member

VicDeo commented Oct 21, 2013

@PVince81 handler for this exception needs to refresh the mime cache and retry https://github.com/owncloud/core/blob/master/lib/private/files/cache/cache.php#L88

@PVince81
Copy link
Contributor Author

I've moved the duplicate exception catching to scanFile() as it's the place we expect it to happen while two scan processes happen.

Now the problem with duplicate mimetype insertion remains. Ignoring the exception cause image previews to not be rendered until the browser is refreshed.

PVince81 pushed a commit that referenced this issue Oct 23, 2013
When two scanning processed run at the same time, for example when
scan.php and list.php were called at the same time on an external
storage mountpoint, duplicate insertion errors can occurs.

These errors are now logged and ignored.

Since both scans are running in parallel transactions, they don't see
each other's changes directly in the DB which can cause duplicate
insertion errors for mime types as well, but those mime types can't be
selected yet. The solution to this is to force-reload the mimetypes list
after the transaction is finished.

Fixes #5199
@PVince81
Copy link
Contributor Author

Ok, I found out that there is an explicit DB transaction is started when scanning a folder.
During the transaction, if you try and select all mime types, you only get the ones that were present before the transaction was started. Which means that if the parallel running scan added values, you won't see them, but inserting duplicates will still throw an exception.

To make sure that we still get the mime type id after the scan is done (by list.php), we force-reload the mimetype id after the transaction is finished.

I've updated the PR.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants