Enable mmap for reading model from cache #26696

Open · wants to merge 6 commits into base: master
49 changes: 48 additions & 1 deletion src/core/dev_api/openvino/runtime/shared_buffer.hpp
@@ -8,7 +8,7 @@

namespace ov {

/// \brief SharedBuffer class to store pointer to pre-allocated buffer.
/// \brief SharedBuffer class to store pointer to pre-allocated buffer. Owns the shared object.
template <typename T>
class SharedBuffer : public ov::AlignedBuffer {
public:
@@ -28,4 +28,51 @@ class SharedBuffer : public ov::AlignedBuffer {
T _shared_object;
};

/// \brief SharedStreamBuffer class to store pointer to pre-allocated buffer and provide a streambuf interface.
class SharedStreamBuffer : public std::streambuf {
Contributor:

Consider:

  • putting this buffer in a separate file, like the string buffer
  • adding OPENVINO_API
  • splitting the implementation into hpp/cpp
  • adding some unit tests (like for the string buffer) that exercise the overridden functions the way they are used (see the test sketch after the alternative below)

Just another alternative is to integrate it with the ov buffer interface.
If multiple inheritance is an issue, composition can be used.

class SharedStreamBuffer : public SharedBuffer<std::shared_ptr<void>>, public std::streambuf {
public:
    SharedStreamBuffer(char* data, size_t size, const std::shared_ptr<void>& shared_object)
        : SharedBuffer<std::shared_ptr<void>>(data, size, shared_object),
          std::streambuf(),
          m_offset(0) {}

protected:
    std::streamsize xsgetn(char* s, std::streamsize count) override {
        auto real_count = std::min<std::streamsize>(showmanyc(), count);
        std::memcpy(s, get_ptr(m_offset), real_count);
        m_offset += real_count;
        return real_count;
    }

    int_type underflow() override {
        return (size() == m_offset) ? traits_type::eof()
                                    : traits_type::to_int_type(*(get_ptr<const char>() + m_offset));
    }

    int_type uflow() override {
        return (size() == m_offset) ? traits_type::eof()
                                    : traits_type::to_int_type(*(get_ptr<const char>() + m_offset++));
    }

    std::streamsize showmanyc() override {
        return size() - m_offset;
    }

private:
    size_t m_offset;
};
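
A minimal gtest-style sketch of such a unit test, driving the overridden functions through std::istream the way the cache reader does (gtest, the include path and the test name are assumptions, not part of this PR):

#include <gtest/gtest.h>

#include <istream>
#include <string>

#include "openvino/runtime/shared_buffer.hpp"

TEST(SharedStreamBufferTest, ReadThroughIstream) {
    char data[] = "header\npayload";
    ov::SharedStreamBuffer buf(data, sizeof(data) - 1);
    std::istream stream(&buf);

    // std::getline pulls characters one by one via underflow()/uflow()
    std::string line;
    std::getline(stream, line);
    EXPECT_EQ(line, "header");

    // unformatted reads go through xsgetn()
    char payload[8] = {};
    stream.read(payload, 7);
    EXPECT_EQ(stream.gcount(), 7);
    EXPECT_EQ(std::string(payload, 7), "payload");

    // in_avail() falls back to showmanyc(): everything has been consumed
    EXPECT_EQ(buf.in_avail(), 0);
}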

public:
SharedStreamBuffer(char* data, size_t size) : m_data(data), m_size(size), m_offset(0) {}
Contributor:

'data' is not passed to std::streambuf, so operator>> for std::istream should not work in core_impl and a HeaderException should be thrown. Actually, it might only happen to work on Linux, because std::streambuf has different implementations depending on the OS.
Another way is to define operator>>, but then you would at least have to handle end-of-line differences across OSes.
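
A standalone illustration of that point (not OpenVINO code; the class names are made up): formatted extraction pulls characters through the streambuf's virtual get interface, so a streambuf that neither sets up a get area nor overrides underflow()/uflow() yields EOF immediately, which is exactly what would push core_impl into the HeaderException path.

#include <cstddef>
#include <iostream>
#include <streambuf>
#include <string>

// Mimics the earlier revision: the data stays private, nothing is exposed to std::streambuf.
struct OpaqueBuf : std::streambuf {
    explicit OpaqueBuf(const char* data) : m_data(data) {}
    const char* m_data;
};

// Mimics the current revision: underflow()/uflow() serve bytes on demand.
struct ServingBuf : std::streambuf {
    ServingBuf(const char* data, std::size_t size) : m_data(data), m_size(size) {}

protected:
    int_type underflow() override {
        return m_off == m_size ? traits_type::eof() : traits_type::to_int_type(m_data[m_off]);
    }
    int_type uflow() override {
        return m_off == m_size ? traits_type::eof() : traits_type::to_int_type(m_data[m_off++]);
    }

private:
    const char* m_data;
    std::size_t m_size;
    std::size_t m_off = 0;
};

int main() {
    const char raw[] = "2024.4.0 <cnndata/>";

    OpaqueBuf opaque(raw);
    std::istream broken(&opaque);
    std::string token;
    broken >> token;                                       // default underflow() reports eof()
    std::cout << std::boolalpha << broken.fail() << "\n";  // true

    ServingBuf serving(raw, sizeof(raw) - 1);
    std::istream working(&serving);
    working >> token;                                      // reads "2024.4.0"
    std::cout << token << "\n";
    return 0;
}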


protected:
std::streamsize xsgetn(char* s, std::streamsize count) override {
auto real_count = std::min<std::streamsize>(m_size - m_offset, count);
std::memcpy(s, m_data + m_offset, real_count);
m_offset += real_count;
return real_count;
}

int_type underflow() override {
return (m_size == m_offset) ? traits_type::eof() : traits_type::to_int_type(*(m_data + m_offset));
}

int_type uflow() override {
return (m_size == m_offset) ? traits_type::eof() : traits_type::to_int_type(*(m_data + m_offset++));
}

std::streamsize showmanyc() override {
return m_size - m_offset;
}

char* m_data;
size_t m_size;
size_t m_offset;
};

/// \brief OwningSharedStreamBuffer is a SharedStreamBuffer which owns its shared object. It can return an AlignedBuffer
/// over the shared memory.
class OwningSharedStreamBuffer : public SharedStreamBuffer {
public:
template <typename T>
OwningSharedStreamBuffer(char* data, size_t size, const T& shared_object)
: SharedStreamBuffer(data, size),
m_alligned_buffer(std::make_shared<SharedBuffer<T>>(data, size, shared_object)) {}

std::shared_ptr<AlignedBuffer> get_aligned_buffer() {
Contributor:

Why is this required, and why not just return the internal buffer? When the shared buffer is used, it will just increment the reference count of the stored shared object.

return m_alligned_buffer;
}

protected:
std::shared_ptr<AlignedBuffer> m_alligned_buffer;
Contributor:

Why is a shared pointer required? ov::SharedBuffer will manage the ownership.
Why use an aligned buffer rather than a shared one?

Suggested change
std::shared_ptr<AlignedBuffer> m_alligned_buffer;
AlignedBuffer m_buffer;

Should be sufficient

};

} // namespace ov
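
For orientation, a minimal usage sketch of the two classes above, mirroring how the cache-manager change further down wires an mmap-ed blob into an std::istream (the free function is illustrative; ov::load_mmap_object is the existing utility used later in this diff):

#include <istream>
#include <memory>
#include <string>

#include "openvino/runtime/shared_buffer.hpp"
#include "openvino/util/mmap_object.hpp"

std::shared_ptr<ov::AlignedBuffer> read_blob_via_mmap(const std::string& blob_path) {
    auto mmap = ov::load_mmap_object(blob_path);  // shared_ptr keeps the mapping alive
    ov::OwningSharedStreamBuffer buf(mmap->data(), mmap->size(), mmap);
    std::istream stream(&buf);                    // zero-copy stream over the mapped file

    // Consumers that only understand std::istream read as usual ...
    std::string header_line;
    std::getline(stream, header_line);

    // ... while mmap-aware consumers can take the shared memory directly;
    // the returned AlignedBuffer co-owns the mapping, so it outlives `buf`.
    return buf.get_aligned_buffer();
}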
6 changes: 6 additions & 0 deletions src/inference/dev_api/openvino/runtime/iplugin.hpp
@@ -212,6 +212,12 @@ class OPENVINO_RUNTIME_API IPlugin : public std::enable_shared_from_this<IPlugin
*/
const std::shared_ptr<ov::threading::ExecutorManager>& get_executor_manager() const;

/**
* @brief Check if the plugin supports mmap for reading cached models. Returns false if the method is not overridden by the plugin.
* @return true if mmap is supported, false otherwise
*/
virtual bool support_mmap_for_caching() const;
Contributor:

Use a property instead of adding functions?
util::contains(plugin.get_property(ov::supported_properties), ov::enable_mmap.name()) to check whether it is supported, and if yes, then get its value.

And add ENABLE_MMAP to the supported properties in the GPU plugin.
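
A sketch of the property-based check this comment describes, on the core side (this is the reviewer's alternative, not code from this PR; the helper name and the internal include path are illustrative):

#include "openvino/runtime/properties.hpp"   // ov::supported_properties, ov::enable_mmap
#include "openvino/util/common_util.hpp"     // ov::util::contains

#include "dev/plugin.hpp"                    // internal ov::Plugin wrapper (path illustrative)

bool plugin_wants_mmap_cache(const ov::Plugin& plugin) {
    // First ask whether the plugin exposes ov::enable_mmap at all ...
    const auto supported = plugin.get_property(ov::supported_properties);
    if (!ov::util::contains(supported, ov::enable_mmap.name()))
        return false;
    // ... and only then honor its current value.
    return plugin.get_property(ov::enable_mmap);
}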

Contributor Author (@olpipi, Oct 8, 2024):

I would not like to expose this "property" via the public API.
It just says whether mmap should be used for a particular plugin. In the future, if all plugins support mmap, we can simply remove this method.


virtual ~IPlugin() = default;

protected:
17 changes: 13 additions & 4 deletions src/inference/src/cache_manager.hpp
@@ -14,7 +14,9 @@
#include <memory>
#include <string>

#include "openvino/runtime/shared_buffer.hpp"
#include "openvino/util/file_util.hpp"
#include "openvino/util/mmap_object.hpp"

namespace ov {

Expand Down Expand Up @@ -78,7 +80,7 @@ class ICacheManager {
* @param id Id of cache (hash of the model)
* @param reader Lambda function to be called when input stream is created
Contributor:

Add the new parameter to the doxy comment.

Contributor Author:

There is no new parameter anymore.

Contributor:

The parameter bool enable_mmap is still here (should it be removed?); please sync the doxy comment with the function signature.

*/
virtual void read_cache_entry(const std::string& id, StreamReader reader) = 0;
virtual void read_cache_entry(const std::string& id, bool enable_mmap, StreamReader reader) = 0;

/**
* @brief Callback when OpenVINO intends to remove cache entry
Expand Down Expand Up @@ -129,13 +131,20 @@ class FileStorageCacheManager final : public ICacheManager {
writer(stream);
}

void read_cache_entry(const std::string& id, StreamReader reader) override {
void read_cache_entry(const std::string& id, bool enable_mmap, StreamReader reader) override {
// Fix the bug caused by pugixml, which may return unexpected results if the locale is different from "C".
ScopedLocale plocal_C(LC_ALL, "C");
auto blobFileName = getBlobFile(id);
if (ov::util::file_exists(blobFileName)) {
std::ifstream stream(blobFileName, std::ios_base::binary);
reader(stream);
if (enable_mmap) {
auto mmap = ov::load_mmap_object(blobFileName);
OwningSharedStreamBuffer buf(mmap->data(), mmap->size(), mmap);
std::istream stream(&buf);
reader(stream);
} else {
std::ifstream stream(blobFileName, std::ios_base::binary);
reader(stream);
}
}
}

72 changes: 38 additions & 34 deletions src/inference/src/dev/core_impl.cpp
@@ -1354,7 +1354,7 @@ bool ov::CoreImpl::device_supports_internal_property(const ov::Plugin& plugin, c
}

bool ov::CoreImpl::device_supports_model_caching(const ov::Plugin& plugin) const {
return plugin.supports_model_caching();
return plugin.supports_model_caching() == ov::Plugin::CachingMode::unsupported ? false : true;
}

bool ov::CoreImpl::device_supports_cache_dir(const ov::Plugin& plugin) const {
@@ -1401,48 +1401,52 @@ ov::SoPtr<ov::ICompiledModel> ov::CoreImpl::load_model_from_cache(
ov::Plugin& plugin,
const ov::AnyMap& config,
const ov::SoPtr<ov::IRemoteContext>& context,
std::function<ov::SoPtr<ov::ICompiledModel>()> compile_model_lambda) {
std::function<ov::SoPtr<ov::ICompiledModel>()> compile_model_lambda) const {
ov::SoPtr<ov::ICompiledModel> compiled_model;
struct HeaderException {};

OPENVINO_ASSERT(cacheContent.cacheManager != nullptr);
try {
cacheContent.cacheManager->read_cache_entry(cacheContent.blobId, [&](std::istream& networkStream) {
OV_ITT_SCOPE(FIRST_INFERENCE,
ov::itt::domains::LoadTime,
"Core::load_model_from_cache::ReadStreamAndImport");
try {
ov::CompiledBlobHeader header;
networkStream >> header;
if (header.get_file_info() != ov::ModelCache::calculate_file_info(cacheContent.modelPath)) {
// Original file is changed, don't use cache
OPENVINO_THROW("Original model file is changed");
}
if (util::contains(plugin.get_property(ov::internal::supported_properties),
ov::internal::compiled_model_runtime_properties_supported.name())) {
ov::AnyMap compiled_model_runtime_properties = {
{ov::internal::compiled_model_runtime_properties.name(),
std::string(header.get_runtime_info())}};
auto res = plugin.get_property(ov::internal::compiled_model_runtime_properties_supported.name(),
compiled_model_runtime_properties);
if (!res.as<bool>()) {
OPENVINO_THROW("Original model runtime properties have been changed, not supported anymore!");
cacheContent.cacheManager->read_cache_entry(
cacheContent.blobId,
coreConfig.get_enable_mmap() && plugin.supports_model_caching() == ov::Plugin::CachingMode::mmap,
[&](std::istream& networkStream) {
OV_ITT_SCOPE(FIRST_INFERENCE,
ov::itt::domains::LoadTime,
"Core::load_model_from_cache::ReadStreamAndImport");
try {
ov::CompiledBlobHeader header;
networkStream >> header;
if (header.get_file_info() != ov::ModelCache::calculate_file_info(cacheContent.modelPath)) {
// Original file is changed, don't use cache
OPENVINO_THROW("Original model file is changed");
}
} else {
if (header.get_openvino_version() != ov::get_openvino_version().buildNumber) {
// Build number mismatch, don't use this cache
OPENVINO_THROW("Version does not match");
if (util::contains(plugin.get_property(ov::internal::supported_properties),
ov::internal::compiled_model_runtime_properties_supported.name())) {
ov::AnyMap compiled_model_runtime_properties = {
{ov::internal::compiled_model_runtime_properties.name(),
std::string(header.get_runtime_info())}};
auto res = plugin.get_property(ov::internal::compiled_model_runtime_properties_supported.name(),
compiled_model_runtime_properties);
if (!res.as<bool>()) {
OPENVINO_THROW(
"Original model runtime properties have been changed, not supported anymore!");
}
} else {
if (header.get_openvino_version() != ov::get_openvino_version().buildNumber) {
// Build number mismatch, don't use this cache
OPENVINO_THROW("Version does not match");
}
}
} catch (...) {
throw HeaderException();
}
} catch (...) {
throw HeaderException();
}

ov::AnyMap update_config = config;
update_config[ov::loaded_from_cache.name()] = true;
compiled_model = context ? plugin.import_model(networkStream, context, update_config)
: plugin.import_model(networkStream, update_config);
});
ov::AnyMap update_config = config;
update_config[ov::loaded_from_cache.name()] = true;
compiled_model = context ? plugin.import_model(networkStream, context, update_config)
: plugin.import_model(networkStream, update_config);
});
} catch (const HeaderException&) {
// For these exceptions just remove old cache and set that import didn't work
cacheContent.cacheManager->remove_cache_entry(cacheContent.blobId);
4 changes: 2 additions & 2 deletions src/inference/src/dev/core_impl.hpp
@@ -149,12 +149,12 @@ class CoreImpl : public ov::ICore, public std::enable_shared_from_this<ov::ICore
const ov::SoPtr<ov::IRemoteContext>& context,
const CacheContent& cacheContent) const;

static ov::SoPtr<ov::ICompiledModel> load_model_from_cache(
ov::SoPtr<ov::ICompiledModel> load_model_from_cache(
const CacheContent& cacheContent,
ov::Plugin& plugin,
const ov::AnyMap& config,
const ov::SoPtr<ov::IRemoteContext>& context,
std::function<ov::SoPtr<ov::ICompiledModel>()> compile_model_lambda);
std::function<ov::SoPtr<ov::ICompiledModel>()> compile_model_lambda) const;

bool device_supports_model_caching(const ov::Plugin& plugin) const;

4 changes: 4 additions & 0 deletions src/inference/src/dev/iplugin.cpp
@@ -67,6 +67,10 @@ std::shared_ptr<ov::ICore> ov::IPlugin::get_core() const {
return m_core.lock();
}

bool ov::IPlugin::support_mmap_for_caching() const {
return false;
}

const std::shared_ptr<ov::threading::ExecutorManager>& ov::IPlugin::get_executor_manager() const {
return m_executor_manager;
}
14 changes: 9 additions & 5 deletions src/inference/src/dev/plugin.cpp
@@ -101,10 +101,14 @@ ov::Any ov::Plugin::get_property(const std::string& name, const AnyMap& argument
return {m_ptr->get_property(name, arguments), {m_so}};
}

bool ov::Plugin::supports_model_caching() const {
bool supported(false);
supported = util::contains(get_property(ov::supported_properties), ov::device::capabilities) &&
util::contains(get_property(ov::device::capabilities), ov::device::capability::EXPORT_IMPORT) &&
util::contains(get_property(ov::internal::supported_properties), ov::internal::caching_properties);
ov::Plugin::CachingMode ov::Plugin::supports_model_caching() const {
ov::Plugin::CachingMode supported = ov::Plugin::CachingMode::unsupported;
if (util::contains(get_property(ov::supported_properties), ov::device::capabilities) &&
util::contains(get_property(ov::device::capabilities), ov::device::capability::EXPORT_IMPORT) &&
util::contains(get_property(ov::internal::supported_properties), ov::internal::caching_properties)) {
bool support_mmap = false;
OV_PLUGIN_CALL_STATEMENT(support_mmap = m_ptr->support_mmap_for_caching(););
supported = support_mmap ? ov::Plugin::CachingMode::mmap : ov::Plugin::CachingMode::legacy;
}
return supported;
}
8 changes: 7 additions & 1 deletion src/inference/src/dev/plugin.hpp
@@ -74,7 +74,13 @@ class Plugin {
T get_property(const ov::Property<T, M>& property, const AnyMap& arguments) const {
return get_property(property.name(), arguments).template as<T>();
}
bool supports_model_caching() const;

enum class CachingMode {
legacy,
mmap,
unsupported
};
CachingMode supports_model_caching() const;
};

} // namespace ov
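
A consolidated sketch of how the core branches on the new enum (it restates the two call sites in core_impl.cpp above in one place; the helper name and the internal include path are illustrative):

#include "dev/plugin.hpp"  // internal ov::Plugin wrapper (path illustrative)

bool use_mmap_for_cache(const ov::Plugin& plugin, bool core_enable_mmap) {
    // Caching is available at all only when the mode is not 'unsupported' ...
    const auto mode = plugin.supports_model_caching();
    if (mode == ov::Plugin::CachingMode::unsupported)
        return false;
    // ... and the mmap-backed cache path additionally needs the core-level switch.
    return core_enable_mmap && mode == ov::Plugin::CachingMode::mmap;
}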
@@ -64,6 +64,7 @@ class Plugin : public ov::IPlugin {
const ov::AnyMap& properties) const override;
ov::SoPtr<ov::IRemoteContext> create_context(const ov::AnyMap& remote_properties) const override;
ov::SoPtr<ov::IRemoteContext> get_default_context(const ov::AnyMap& remote_properties) const override;
bool support_mmap_for_caching() const override;
};

} // namespace intel_gpu
4 changes: 4 additions & 0 deletions src/plugins/intel_gpu/src/plugin/plugin.cpp
@@ -219,6 +219,10 @@ ov::SoPtr<ov::IRemoteContext> Plugin::get_default_context(const AnyMap& params)
return get_default_context(device_id);
}

bool Plugin::support_mmap_for_caching() const {
return true;
}

void Plugin::set_property(const ov::AnyMap &config) {
auto update_config = [](ExecutionConfig& config, const ov::AnyMap& user_config) {
config.set_user_property(user_config);