Unserialized objects use significantly more memory than ones created with the normal constructor #10126

Open
czirkoszoltan opened this issue Dec 18, 2022 · 6 comments

Comments

@czirkoszoltan

Description

Consider the following code:

class SomeClass {
    public $a = 1;
    public $b = 2;
    public $c = 3;
    public $d = 4;
    public $e = 5;
    public $f = 6;
}

$s1 = serialize(new SomeClass());

function new_object() {
    return new SomeClass();
}

function unserialize_object() {
    global $s1;
    return unserialize($s1);
}


class OtherClass {
    public $a = 1;
    public $b = 2;
    public $c = 3;
    public $d = 4;
    public $e = 5;
    public $f = 6;
    
    public function __unserialize(array $attrs) {
        foreach ($attrs as $name => $value) 
            $this->$name = $value;
    }
}

$s2 = serialize(new OtherClass());

function new_object_other() {
    return new OtherClass();
}

function unserialize_object_other() {
    global $s2;
    return unserialize($s2);
}


foreach ([
    'new_object', 'unserialize_object',
    'new_object_other', 'unserialize_object_other',
] as $f) {
    $arr = [];
    for ($i = 0; $i < 50000; ++$i) {
        $arr[] = $f();
    }

    echo $f, " ", memory_get_usage() >> 20, "\n";
}

SomeClass contains some data. new_object() creates an instance with the constructor, while unserialize_object() unserializes an instance with exactly the same state (see $s1). When an array of 50k elements is built with one or the other, memory usage is as follows (in MiB, since the script shifts memory_get_usage() right by 20 bits):

new_object 10
unserialize_object 43

I.e., the 50k instances created with the constructor use about 10 MiB of memory, while the unserialized ones use about 43 MiB. It seems that some data structures used during unserialization are not freed. However, unsetting $arr to destroy the instances drops memory consumption to about 1 MiB, as expected.

Now consider OtherClass, which has an __unserialize method. Nothing fancy: it just sets the attributes as the default handler would. new_object_other() and unserialize_object_other() replicate the logic explained above for the new class. The output in this case is:

new_object_other 10
unserialize_object_other 10

That is, memory consumption is the same regardless of how the objects are created.

Surplus memory usage grows with the number of attributes.
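
To check how the surplus scales, the measurement can be repeated for classes with different property counts. The following is only a rough sketch: the helper names define_class() and bytes_per_object(), the class names Sample2 through Sample24, and the chosen property counts are all made up for illustration, and the figures it prints will depend on the PHP version and build.

```php
<?php
// Hypothetical helper: defines a class with $props public properties
// (p0, p1, ...) via eval(), so that the property count can be varied.
function define_class(string $name, int $props): void
{
    $body = '';
    for ($i = 0; $i < $props; ++$i) {
        $body .= "public \$p{$i} = {$i};";
    }
    eval("class {$name} { {$body} }");
}

// Hypothetical helper: average memory_get_usage() growth per object
// while $count objects produced by $factory are kept alive in an array.
function bytes_per_object(callable $factory, int $count = 20000): float
{
    $before = memory_get_usage();
    $objects = [];
    for ($i = 0; $i < $count; ++$i) {
        $objects[] = $factory();
    }
    return (memory_get_usage() - $before) / $count;
}

foreach ([2, 6, 12, 24] as $props) {
    $class = "Sample{$props}";
    define_class($class, $props);
    $payload = serialize(new $class());

    $fresh = bytes_per_object(function () use ($class) {
        return new $class();
    });
    $restored = bytes_per_object(function () use ($payload) {
        return unserialize($payload);
    });

    printf("%2d props: new %.0f B, unserialize %.0f B\n", $props, $fresh, $restored);
}
```

The numbers include the per-slot cost of the holding array, which is the same for both factories, so the difference between the two columns is the part attributable to unserialization.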

Tested with 8.1.2 and 8.2.0. This can also be reproduced on https://3v4l.org/.

PHP Version

8.2.0

Operating System

No response

@MorganLOCode

One thing to note from the 3v4l results (and my own testing) is that from at least 5.2.1 through 7.3.11, unserialize_object_other has the same memory usage as unserialize_object (i.e., over-sized); only from 7.4.0 onward does it match new_object[_other].

@cmb69
Member

cmb69 commented Dec 19, 2022

Usually, declared object properties are stored in a C array (.properties_table), but unserializing materializes the more general HashTable (.properties), and this HashTable accounts for the additional memory usage. Basically the same happens when you var_dump() the object; for performance reasons, the engine deliberately doesn't delete/purge the .properties HashTable once it has been materialized.
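
The var_dump() effect described above can be observed from userland with a small measurement. This is a sketch under assumptions: the class Probe and the helper growth_after() are invented for the illustration, and whether (and by how much) the second figure exceeds the first depends on the PHP version, since it relies on var_dump() materializing the properties HashTable as described in this comment.

```php
<?php
// Hypothetical class with six declared properties, mirroring the report.
class Probe
{
    public $a = 1;
    public $b = 2;
    public $c = 3;
    public $d = 4;
    public $e = 5;
    public $f = 6;
}

// Hypothetical helper: creates $count objects, applies $touch to each one,
// and returns how much memory_get_usage() grew while they are still alive.
function growth_after(callable $touch, int $count = 20000): int
{
    $objects = [];
    for ($i = 0; $i < $count; ++$i) {
        $objects[] = new Probe();
    }
    $before = memory_get_usage();
    foreach ($objects as $object) {
        $touch($object);
    }
    return memory_get_usage() - $before;
}

// Baseline: the objects are not inspected at all.
$untouched = growth_after(function (Probe $o): void {
});

// var_dump() each object, discarding the output through an output buffer.
$dumped = growth_after(function (Probe $o): void {
    ob_start();
    var_dump($o);
    ob_end_clean();
});

printf("untouched: %d bytes, after var_dump(): %d bytes\n", $untouched, $dumped);
```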

@cmb69 closed this as not planned on Dec 19, 2022
@MorganLOCode

MorganLOCode commented Dec 20, 2022

> but unserializing materializes the more general HashTable (.properties), and this HashTable accounts for the additional memory usage.

And using foreach in the __unserialize method above effectively bypasses the HashTable by populating the C array directly (which is slower, since it is done property by property)?

@cmb69
Member

cmb69 commented Dec 20, 2022

> And using foreach in the __unserialize method above effectively bypasses the HashTable by directly populating the C form instead (slower, since done property by property)?

No, not really. Re-opening, since this might deserve closer investigation.

> One thing to note from the 3v4l results (and my own testing) is that from at least 5.2.1 through 7.3.11, unserialize_object_other has the same memory usage as unserialize_object (i.e., over-sized); only from 7.4.0 onward does it match new_object[_other].

This is because ::__unserialize() is only supported as of PHP 7.4.0. Previously, the function was not called.

@czirkoszoltan
Author

Hi @cmb69, thanks for considering this. In my use case, it is the difference between 500M and 1000M of memory usage for the script.

@MorganLOCode The workaround is pretty fast, fortunately. It does not slow down the script considerably; the difference is marginal.
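
For classes that need the workaround in several places, the __unserialize method from the report could be packaged as a trait. This is a minimal sketch, not an established API: the names CompactUnserialize and Point are hypothetical. It also assumes that, when a class defines __unserialize but not __serialize, non-public property names may arrive in their mangled form (with embedded NUL bytes), so it strips that prefix before assigning. It requires PHP 7.4 or later and does not handle private properties declared in parent classes.

```php
<?php
// Hypothetical trait: reuses the __unserialize workaround from the report.
trait CompactUnserialize
{
    public function __unserialize(array $data): void
    {
        foreach ($data as $name => $value) {
            $name = (string) $name;
            // Assumption: protected/private names may be mangled as
            // "\0*\0prop" or "\0Class\0prop"; keep only the part after
            // the last NUL byte.
            $pos = strrpos($name, "\0");
            if ($pos !== false) {
                $name = substr($name, $pos + 1);
            }
            $this->$name = $value;
        }
    }
}

// Hypothetical example class using the trait.
final class Point
{
    use CompactUnserialize;

    public $x = 0;
    protected $y = 0;
    private $z = 0;

    public function __construct(int $x = 0, int $y = 0, int $z = 0)
    {
        $this->x = $x;
        $this->y = $y;
        $this->z = $z;
    }

    public function toArray(): array
    {
        return [$this->x, $this->y, $this->z];
    }
}

$copy = unserialize(serialize(new Point(1, 2, 3)));
var_dump($copy->toArray());
```

Because the trait method runs in the scope of the class that uses it, it can assign that class's own protected and private properties; anything beyond that (inherited private state, validation of untrusted input) would need extra handling.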

@nielsdos
Member

nielsdos commented May 7, 2023

There are two places where the unserializer will allocate more memory when not using __unserialize.

First place:

zend_hash_extend(ht, zend_hash_num_elements(ht) + elements, HT_IS_PACKED(ht));

The table is extended beforehand because a nested deserialisation could otherwise trigger an array resize, which would invalidate all pointers to the data. Resizing up front keeps those pointers valid.
One solution could be to shrink the hash table after the deserialisation, but there is unfortunately no API for that.

Second place:

ht = Z_OBJPROP_P(rval);

This was already hinted at by Christoph. The properties hash table is rebuilt and will be kept in memory.

It seems that at least the first one should be fixable. The second one exists for performance (speed) reasons.

It doesn't happen with __unserialize because that code path overrides the default behaviour and instead collects the properties into an array:

if (has_unserialize) {
    zval ary, *tmp;
    if (elements >= HT_MAX_SIZE) {
        return 0;
    }
    array_init_size(&ary, elements);
    /* Avoid reallocation due to packed -> mixed conversion. */
    zend_hash_real_init_mixed(Z_ARRVAL(ary));
    if (!process_nested_array_data(UNSERIALIZE_PASSTHRU, Z_ARRVAL(ary), elements)) {
        ZVAL_DEREF(rval);
        GC_ADD_FLAGS(Z_OBJ_P(rval), IS_OBJ_DESTRUCTOR_CALLED);
        zval_ptr_dtor(&ary);
        return 0;
    }
    /* Delay __unserialize() call until end of serialization. We use two slots here to
     * store both the object and the unserialized data array. */
    ZVAL_DEREF(rval);
    tmp = tmp_var(var_hash, 2);
    ZVAL_COPY(tmp, rval);
    Z_EXTRA_P(tmp) = VAR_UNSERIALIZE_FLAG;
    tmp++;
    ZVAL_COPY_VALUE(tmp, &ary);
    return finish_nested_data(UNSERIALIZE_PASSTHRU);
}


5 participants