Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recording triggerable snapshot design proposal #755

Closed

Conversation

kielczykowski
Copy link

Description

As suggested in #663, this PR contains a design proposal for rosbag2 triggerable snapshot feature implementation.

For now I found multiple inconveniences with design decision making and I'm looking for a feedback and tips on what are the expectations for this feature design.

@kielczykowski kielczykowski requested a review from a team as a code owner April 23, 2021 10:08
@kielczykowski kielczykowski requested review from emersonknapp and piraka9011 and removed request for a team April 23, 2021 10:08

### Outside rosbag2

First approach implements `ros2 bag snapshot` as a different package. Mostly it requires rewriting `Recorder` node completly. Completly means in this case diving into implementation of `Writer`, `SequentialWriter` and `BaseWriterInterface` to copy and adjust writing to disk algorithms.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this makes sense, and I would honestly remove it from the proposal - you might add a section at the end called "alternate approaches considered" - but top-level I like to see a design being a single proposal of what should be implemented, not a selection of choices.

I think everyone would agree that this is a useful feature for rosbag2 to do - somebody could implement something in a separate repository/package without doing any design here - I think our goal is to determine a design that does fit into rosbag2 directly.

Second approach would base on making changes to `Recorder`, writers and their interfaces. This approach bases on adding service to `Recorder` which handles triggering buffer save. Although current design of writers bases on `BaseWriterInterface` which would need to be changed to enable some kind of `save()/flush()` functionality. I believe this is not an elegant approach but I haven't find other way to trigger buffer save from `Recorder`, through `Writer` to `SequentialWriter`. For sure it requires implementing new `SnapshotWriter` since it needs to take care of buffering the messages from subscribers instead of passing them straight to `MessageCache` and `CacheConsumer`.

Pros:
* Maintaining snapshot inside `ros2 bag snapshot ...` (or even `ros2 bag record snapshot ...`) and `Recorder` functionality
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this makes sense as a separate verb. I would think that the circular-buffer + snapshot approach should be considered a behavioral modification of recording, just like rosbag2 play --loop

I'd like to see a specific section for "User Interface" - how do I interact with this feature (without knowing how it is built)?

I think maybe something like the following:

# Recording is invoked with this feature via an extra option to "record"
ros2 bag record --buffer-snapshot

# Snapshot is exposed as a Service on the Recorder node (so it can be triggered remotely)
ros2 service call /rosbag2_recorder rosbag2_interfaces/Snapshot

# C++ API
* RecordingOptions has a new value "buffer_snapshot"
* Recorder has a new API `Recorder::snapshot`


### Custom `Recorder` implementation

Last but not least it is possible to write new alternative to `Recorder` node inside `rosbag2_transport` package. `Snapshoter` would implement functionalities of subscribing the messages, buffering them at high level node and most probably passing data straight to `MessageCache` and `CacheConsumer`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the Recorder actually won't know much about this behavior. The SequentialWriter has the message_cache_ and is the entity that will probably implement this behavior. That's the sort of proposal I'm looking for here, like "what owns what functionality, logically". The Recorder itself I think should just pass through configuration, its responsibility is the transport layer, e.g. Subscriptions, and the Writer should handle behavior around writing.

@emersonknapp
Copy link
Collaborator

emersonknapp commented Apr 23, 2021

Thanks for taking this on!

Hopefully the above comments help clarify the goal - overall I'd structure it like:

  1. What's the feature, what does it do?
  2. How does a user interact with it? CLI, ros communications, language APIs
  3. What are the major components involved and what portion of the functionality do they own
  4. (optional) alternative high level approaches considered and why they were ruled out

Hashing out the above things via this document makes it feel to me like there won't be any big surprises or rabbit-holes in the implementation/review stage.

@kielczykowski
Copy link
Author

After long time I found some time to design the feature in more details. Still haven't figured out some of the concerns, but I believe code-like documentation will help dispel them.

I am waiting for more suggestions and opinions.

@kielczykowski kielczykowski marked this pull request as draft May 25, 2021 20:56
@kielczykowski
Copy link
Author

It feels like at this point of development a validation of my idea would be helpful.
@emersonknapp could you take a look at current design and share your thoughts about it?

@kielczykowski kielczykowski marked this pull request as ready for review June 2, 2021 09:02
@kielczykowski
Copy link
Author

Managed to get some community review, but would love maintainers feedback (@emersonknapp @Karsten1987)

Copy link
Collaborator

@emersonknapp emersonknapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for going through the iterations, I appreciate the effort. I think these design reviews are valuable because they help clarify what features are useful and what interfaces make sense. However, I can see that you're inclined to get into code, which I understand. I would be willing to review a PR that starts with just the API changes, so that we can review the interfaces first, like we are doing here.

My main comment across this review is:

We should try to keep the interface as simple as possible. We already have --max-cache-size argument, which could be used to implement the first version of this feature. I would rather review a simplified feature that does one thing well than everything you might want it to do. It seems likely that we could maintain bag splitting and everything, just changing the MessageCache to have a different implementation, so that it doesn't provide a queue to the Consumer/Writer until triggered, rather than the current implementation where it provides the buffer once it is full.


## In-depth structure changes

```diff
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if I accidentally implied that you should do this diff thing - but please remove the entire diff section. The design review is not the place to put actual code - this will be out of date as soon as we actually implement it. The design should be the place for high level concepts that don't get out of date.

If you would like to open a separate PR with all these changes actually implementing this feature, that would be great! We can link them together in the descriptions as companions of each other.

* Save data to rosbag on trigger
* Enable triggering buffer save to disk with service call

* Pause and resume buffering process
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this feature very well - my understanding for the intention of this feature is:

  • rosbag2 is always keeping a circular buffer of the latest information - but not writing anything to disk
  • when we trigger a snapshot - dump the entire buffer to disk

Why does a user want to pause buffering? This sounds to me like it makes rosbag2 just drop data, such that you wouldn't be able to take a snapshot of it.


| parameter name | description | behavior |
| -------------- | ------------- | --------- |
| buffer size | maximum memory per topic | <ul><li> defined by new `--max-buffer-size` argument, default value is 0 [bytes] </li><li>if not set/set to non-positive value only duration parameter will be relevant (*note: either buffer size or duration must be specified*) </li></ul> |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would "buffer size" be different than the "cache size" that is currently used for double-buffered writing? It seems to me that we already have a configuration for an in-memory buffer of data. When snapshot is enabled, we would use this buffer circularly rather than triggering a write-to-disk when it reaches the configured size.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, why would this be per-topic. It seems better to use this as a total value so that the user has a guarantee of maximum memory usage. Regardless, we don't currently have per-topic caches, so that would require a rewrite of the caching system.

| parameter name | description | behavior |
| -------------- | ------------- | --------- |
| buffer size | maximum memory per topic | <ul><li> defined by new `--max-buffer-size` argument, default value is 0 [bytes] </li><li>if not set/set to non-positive value only duration parameter will be relevant (*note: either buffer size or duration must be specified*) </li></ul> |
| duration | maximum time difference between newest and oldest message | <ul><li> defined by new `--max-buffer-duration` argument, default value is 10.0 [s] </li><li> if not set/set to non-positive value only buffer size parameter will be relevant (*note: either buffer size or duration must be specified*) </li></ul> |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment - I think we have a feature request for a maximum duration of the write cache, this should probably be the same parameter, to minimize the number of parameters.

ros2 service call /rosbag2_recorder/snapshot/duration rosbag2_interfaces/srv/SetDuration "{duration: 50.0}"

# Clear snapshot buffer
ros2 service call /rosbag2_recorder/snapshot/clear_buffer rosbag2_interfaces/srv/ClearBuffer "{ }"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused by this feature (and it doesn't seem to be mentioned in the introduction) - a snapshot write should clear the buffers, but why would we want to do it without a write?


#### TODO

* Deleting empty rosbag after node closure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a separate feature that already has an issue open, you don't need to worry about it for this design.


* Deleting empty rosbag after node closure
* Resolve how to implement pause/resume functionality
* Think about configuration change during runtime
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this feature can be intentionally excluded from this design. Maybe add an "Out of scope" section near the top, something like

## Out of scope

* Reconfiguration of recording settings during runtime

@danthony06
Copy link

danthony06 commented Jun 15, 2021

I'll throw in my two cents here, I think my comments are high level enough that I'm not going to insert them into the review process.

  • I really like the overall idea. I have not seen the rosbag_snapshot package before, but I wish I had, because the functionality looks very useful, and ROS 2 will benefit from the same type of system.
  • The ROS 1 implementation used a separate node that cached all of the requested topics, and then wrote them out on command to a bag file using the ROS 1 bag interface. Looking through the comments here, I can see why. Fully implementing the functionality adds a lot of configurable options to the command line interface, and added a good amount of code to the rosbag codebase. I see why moving this feature into a separate node is attractive, because the maintenance burden could be high.
  • With that being said, does the rosbag2 code offer the functionality to reimplement this functionality? Very crude pseudocode would be something like:
bag = rosbag2.open(file_path)
for message in message_cache:
    bag.write(message)
bag.close
  • The main advantage that I can possibly see in embedding this functionality in the main rosbag2 codebase is re-using the message caches to reduce the memory footprint of the system. If done cleverly enough, we could avoid duplicating the cache of snapshot messages and the messages to be written out to a bag file, but this could get annoyingly complicated when things like splitting bag files, limiting bag file sizes, and running a snapshot while bag recording is paused, are considered.

@emersonknapp
Copy link
Collaborator

The main advantage that I can possibly see in embedding this functionality in the main rosbag2 codebase is re-using the message caches to reduce the memory footprint of the system. If done cleverly enough, we could avoid duplicating the cache of snapshot messages and the messages to be written out to a bag file, but this could get annoyingly complicated when things like splitting bag files, limiting bag file sizes, and running a snapshot while bag recording is paused, are considered.

The main point I am trying to make is that this shouldn't be annoyingly complicated. All of the functionality to do splitting, bag sizes, etc is already handled by the Writer functionality - and there should be a strong separation of concerns between the Writer and the in-memory MessageCache. Instead of trying to mix it all up, you should be able to get the advantage of all the Writer functionality without any changes whatsoever, by providing a different CircularTriggeredMessageCache that provides a different way to talk to the writer.

I think that strong separation of concerns makes for much more simple implementations - I think we can manage that here.

@emersonknapp
Copy link
Collaborator

Closing in favor of #837

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants