metalog:

* metalog.info('foo', {...}) for progress recording -- sent to log by default
* metalog.minor('foo', {...}) for verbose recording -- sent nowhere by default
* metalog.event('foo', {...}) cubifies the event and sends it to info
  - metalog.event('foo', {...}, 'silent') to cubify but not log
* retargetable:
  - metalog.loggers.minor = metalog.log to log minor events
  - metalog.loggers.info = metalog.silent to quash logging

Also separated the db test helpers into their own file and added fixture ability.
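The retargetable-logger idea above can be sketched in a few lines of plain JavaScript. This is a minimal illustration of the pattern, not cube's actual implementation; the captured `lines` array stands in for the real log sink, and details may differ in the real metalog module.

```javascript
// Minimal sketch of the retargetable metalog pattern (names follow the
// commit message above; the real cube implementation may differ).
const lines = [];  // captured output, stands in for the real log sink

const metalog = {
  log:    (name, details) => lines.push(name + ' ' + JSON.stringify(details)),
  silent: () => {},
  loggers: {}
};

// Defaults: info is logged, minor goes nowhere.
metalog.loggers.info  = metalog.log;
metalog.loggers.minor = metalog.silent;

metalog.info  = (name, details) => metalog.loggers.info(name, details);
metalog.minor = (name, details) => metalog.loggers.minor(name, details);

metalog.minor('verbose_thing', { n: 1 });   // dropped by default
metalog.info('progress', { n: 2 });         // logged by default

metalog.loggers.minor = metalog.log;        // retarget: now log minor events
metalog.minor('verbose_thing', { n: 3 });   // now captured

metalog.loggers.info = metalog.silent;      // quash info logging
metalog.info('progress', { n: 4 });         // now dropped
```

Because the dispatch goes through `metalog.loggers`, reassigning one field changes routing for all later calls without touching call sites.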
* authentication.authenticator(name) gives you the requested authenticator
  - server.js uses options['authenticator']
  - 'allow_all' is the default in bin/collector-config.js etc.
* authenticator.check(request, auth_ok, auth_no)
  - calls auth_ok if authenticated, auth_no if rejected
  - staples an 'authorized' member to the request object:
    - e.g. we use 'request.authorized.admin' to govern board editing
    - in the mongo_cookie authenticator, it's the user record
* the mongo_cookie authenticator compares a bcrypted cookie to a stored hashed secret
  - you must set the cookie and store that db record in your host client; see 'test/authenticator-test.js' for the format. Rails+devise snippet available on request.
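The `check(request, auth_ok, auth_no)` contract above can be illustrated with a self-contained sketch. The `allow_all` and `read_only` behaviors here are assumptions inferred from their names, and the shape of `request.authorized` is illustrative only; the real cube code may attach different fields.

```javascript
// Hypothetical sketch of the authenticator contract described above.
const authenticators = {
  allow_all: {
    check: (request, auth_ok, auth_no) => {
      request.authorized = { admin: true };    // everyone is fully trusted
      auth_ok(request);
    }
  },
  read_only: {
    check: (request, auth_ok, auth_no) => {
      if (request.method === 'GET') {
        request.authorized = { admin: false }; // reads allowed, no board editing
        auth_ok(request);
      } else {
        auth_no(request);                      // reject writes
      }
    }
  }
};

// Mirrors authentication.authenticator(name), defaulting to 'allow_all'.
function authenticator(name) {
  return authenticators[name] || authenticators.allow_all;
}
```

Downstream handlers can then branch on `request.authorized.admin` without caring which authenticator ran.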
* pr2_authentication: Added authentication: allow_all, read_only, or mongo_cookie
* marsup/mongodb-native-parser: Upgrade mongodb driver and use native BSON parser

Conflicts: package.json
Moved test.js to test_helper.js and renamed it everywhere it is used.
Some combination of these changes and the upgraded mongodb npm package seems to have made the intermittent failures in test/metrics-test.js go away. Untangled and commented the code, so if you are still seeing those bugs, this may be a better stepping-off point.
…ting server.
* test_helper.with_server -- starts the server, runs the tests once the server is up, and stops the server when the tests are done
* merged test/test_db.js into test/test_helper.js and documented it thoroughly
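The `with_server` lifecycle described above can be sketched with a fake server object. This is a shape-only illustration; the real helper starts and stops an actual cube server, and its signature may differ.

```javascript
// Minimal sketch of the test_helper.with_server pattern described above.
function with_server(server, run_tests, done) {
  server.start(() => {
    run_tests(() => {      // tests invoke this callback when finished
      server.stop(done);   // server is stopped only after tests complete
    });
  });
}

// Fake server that records the lifecycle, standing in for a cube server.
const trace = [];
const fakeServer = {
  start: (cb) => { trace.push('start'); cb(); },
  stop:  (cb) => { trace.push('stop');  cb(); }
};

with_server(
  fakeServer,
  (done) => { trace.push('tests'); done(); },
  ()     => { trace.push('done'); }
);
```

The point of the pattern is that teardown is guaranteed to run after the tests signal completion, so suites don't leak running servers.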
…inct, all tiers cascade to tensec, horizons)
For the record, I still experience a partially unresponsive evaluator if I ask for data past the horizon. I think the computation never gets done because the start and end dates are equal, so any further request on the same collection will stall. Another thing I noticed is pretty poor performance on grouped queries; a mongodb distinct on several million un-indexed rows is supposed to take a while, but still...
I hear that. I'm running this with personal data but I haven't thrown…
Have you given any thought to my new configuration proposal?
@RandomEtc After running the service for a while, I was forced to remove a "feature" infochimps added: cascading the cache down to the lowest tiers is a very bad idea. It eats MongoDB storage like hell, and is way too expensive for insignificant benefit considering a few seconds/minutes is not that long to re-compute, so I came back to the way things were in the current cube release. You might want to consider this before doing a release...
I understand your feedback and that's definitely a valid concern, but I want to clarify a little. First, as far as speed, a response time of seconds to minutes per query, given 20 to 30 queries on a page, wasn't acceptable for our UI requirements. So stored metrics did offer some speed improvements. Although that was an added bonus, our intent was not to cache calculations for speed, but to store data. Hopefully I can offer some insight into why we did it that way.

For our use, event data vastly outsized metric data, so we purposefully capped the events collection to make event records fall out. To preserve the data contained in those events, we saved the metrics at the lowest tier. With the lowest tier, we could build a higher-tier metric, like 5 minutes or 1 hour, back up from those low-tier 10-second metrics. We stored our permanent data in metrics with ephemeral events, as opposed to the previous situation of permanent events with ephemeral metric caches.

So, assuming one has large event records with many events per 10-second tier, storing only the metrics should use much less data. In our use case, it meant we were able to roll up thousands of multi-kB events into a handful of small, sub-kB metrics per query. We also had a separate "cleaner" cron job to remove metrics older than a day or so, to keep the data size down. We wrote our version with the intent of optimizing for high throughput while keeping a small, unsharded mongo. Storing metrics actually ended up being significantly more storage efficient for us.

I can see how for other data shapes / use cases, storing all metrics may not make sense. We definitely cut a couple of corners to fulfill our use case, because not everyone wants to predefine queries, drop events, and handle dense data. We lost some of the flexibility offered by cube in order to meet our needs. It sounds like our version didn't fit your use case; I'm glad you were able to change it to better meet your needs.
I hope that cleared up our intentions. If not, I'm happy to clarify further. |
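The "build a higher tier back up from 10-second metrics" idea described above can be sketched with a small rollup function. The sum reducer and tier sizes are assumptions chosen for illustration; cube supports other reducers, and this is not its actual code.

```javascript
// Hypothetical sketch of rebuilding a higher-tier metric from stored
// lowest-tier (10-second) metrics, as described in the comment above.
const TENSEC   = 10000;   // lowest tier, in ms
const FIVE_MIN = 300000;  // higher tier: 30 tensec slots

// metrics: array of { time: msSinceEpoch, value: number } at a lower tier
function rollUp(metrics, tier) {
  const buckets = new Map();
  for (const m of metrics) {
    const slot = Math.floor(m.time / tier) * tier;  // floor to tier boundary
    buckets.set(slot, (buckets.get(slot) || 0) + m.value);
  }
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([time, value]) => ({ time, value }));
}
```

Because every higher tier boundary aligns with tensec boundaries, the lowest tier is sufficient to reconstruct any coarser tier even after the raw events have fallen out of a capped collection.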
Hello Houston! First, reading it back, my previous comment seems more accusing than it was meant to be, so sorry for that. Now I can see why you would do that. In my case I keep everything, no capped collection at all, and I also have many events for any given time, so our situations should be similar; you might understand my pain seeing metrics grow horribly fast :) The difference might be that I'm on a sharded mongo, with many evaluators answering queries at the same time, and we mostly stream metrics (which is a difference in our fork), so full time ranges are not queried that often. Your version definitely improved many things and I'm grateful for that; I'm just worried such a default setting would disappoint newcomers, as it fills up several GB for only a few days/weeks of metrics. I think ideally this cascading aspect should be configurable, but that'll be the subject of another pull request ;) Anyway, it's nice to see you're still following things here!
Thanks both for keeping the discussion going. I'm sorry I've been silent on most matters. I haven't had a chance to try this new branch on real data, so I'm hesitant to express too strong an opinion about it. I'm open to landing it as a 0.3-pre branch here so there's a clearer target for new contributions/optimizations/docs. What do you think?
Well, this thread has lasted long enough, I think :) You have raised many concerns along the way, so I would say create that branch and close this pull request, but let's not forget anything here; maybe create a bunch of separate issues to track every doubt/task that needs to be dealt with before the final release.
Sounds like a plan. Thanks for your help triaging issues!
When is this planned to be merged?
This merge is (in my opinion) very important; why hasn't it been merged yet?
Hi @jeffhuys - this branch is stalled mainly because I ran out of time/bandwidth for the project, but also because we only have one person (@Marsup) who has run this code to date. I'd like to run it before I merge it, but I haven't had time.

If you've followed our discussion above, and the related issue where I asked for community input into the merge, you can see I was very optimistic about bringing all the Infochimps changes into the Square cube repo, but the actual process of doing this was a lot more complex and time consuming than I imagined. We don't run Cube in an official/production capacity at Square any more, so it's more or less a volunteer side-project for me. Major apologies to @Marsup for not using this integration work yet.

The next step remains setting up a new branch here for this work, and getting a few more people to try it out for their use case. I'll try to get that done soon, including trying it out at Square. Until then, please comment on this thread if you've run @Marsup's version and let us know how it goes.
Hi, just as a side note: the revert to a plain js object for the config in that branch made my life a lot easier.
I'm confident as well; I've had this branch running in production since September (with a few modifications since then, as the commit history will tell you), but beware that it doesn't only contain infochimps modifications, so not everything is documented as it should be. I'm also glad I came to reason on the config; imposing cfg in cube's core was not very clever, even though it's a very nice module and I still use it for my cube runners.
I'm going to test this branch with 150K events/day. @Marsup, can you tell me what the job of the "warmer" is, and whether there is any way to use only one config, instead of three that are almost the same?
I'm not the one who conceived it, so I'll describe it as best I can; I'm not using it either. As for configuration: no, afraid not, but you can still use cube as a module and apply the slight variations programmatically.
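The "one base config, slight variations" approach suggested above can be sketched in plain JavaScript. The option names below are invented for illustration; check the real bin/*-config.js files for the actual keys cube expects.

```javascript
// Hypothetical sketch: share one base config and derive the
// collector/evaluator/warmer variants programmatically.
const base = {
  'mongo-host': '127.0.0.1',
  'mongo-port': 27017,
  'mongo-database': 'cube'
};

// Each runner overrides only what differs from the shared base.
const collectorConfig = Object.assign({}, base, { 'http-port': 1080 });
const evaluatorConfig = Object.assign({}, base, { 'http-port': 1081 });
const warmerConfig    = Object.assign({}, base, { 'warmer-interval': 30000 });
```

Keeping the shared keys in one object means a change to, say, the mongo host only has to be made once.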
Hello. I've been using the current Cube version for some time. Great work, and thanks for open sourcing this project.

Question 1: What's the level of confidence and timeline that the InfoChimps branch will be merged? I need to add some additional features in our project (authentication). Since this contains a pluggable authentication system, it makes sense for me to go ahead and use what's here if it will be mainlined. It looks like there's been a lot of energy put into this branch, but it's long-running and hard for me to judge what's going on from the outside looking in. :)

Question 2: How can I override the configuration when using Cube as a library now? It looks like this line will always include the configuration file from Cube. Maybe I'm missing something.

Thanks.
@ticean my intention was to make a merge branch and update our readme to encourage people to try it out. Unfortunately Cube has become less and less of my day-job here at Square and since starting this merge, despite the heroic efforts of @Marsup (thank you!) I haven't carved out the time to make much progress. Also since we started this project infochimps was acquired, so I suspect they haven't been able to give it the attention they wanted either. I still have this on a TODO list, and hope to get to it one day soon, though I realize we are very likely to be losing goodwill and attention by letting this branch linger. Enough excuses... For using cube as a library, here's an example collector script that we have in our internal cube repo:
The require for
Hope that gets you started. If you have a chance to check out @Marsup's branch, please do; any feedback on that will help others work out which version to use. Until then, we now have 3 versions...
Hi @RandomEtc. Thanks for the help and the quick reply! You helped me realize that I was testing with the wrong branch. This PR is based on Marsup:infochimps-merge, but I'd mistakenly branched square:infochimps-merge for testing. So my bad there. 😁

Now that I'm using @Marsup's branch, I'm able to override the config like I need to (that wasn't possible in square:infochimps-merge).

I had some problems with the horizon feature not returning results when the request is "past_horizon". I see metalogging output, but the server doesn't return a response and hangs. I don't think I'm interested in this feature anyway, so I was able to work around it by removing the horizon configuration, which disables the feature. Some docs about horizons would be helpful, but you've already mentioned this a few times in the thread, so I know you know that. :)

Things are otherwise working well now that I'm using this branch. I'll keep testing and let you know if anything else comes up.
Ok, after more hands-on time with this code I found some issues.
Beware that full-merge is not exactly the same thing as this pull request; I've piled up other modifications for my own needs.
Hey guys, can someone explain the status of this merge for those of us wanting to start using cube with all of the infochimps work merged in? What would be the best place to start: clone this branch and go from there, or use the Marsup fork? GitHub is kind of useless right now in situations like this one, where the repo network gets complicated :(
I am declaring Cube-maintenance-failure for myself. I have updated the README here to indicate that nobody at Square is actively developing or maintaining Cube. Since I have failed to make progress on this branch I encourage people to help @Marsup with his integration branch and fork if you have any new features or bug fixes. I will be closing all issues here in a moment. |
For the record, I'm up and running with the @Marsup branch. Working great so far. |
As discussed in #123, here is a full merge from infochimps-labs/cube@master to the current master.
Considering the size of the merge, I hope to involve more people in the review so that we don't miss any critical part, even though the tests continue to pass (minus the one that was already failing).
I especially hope people from @infochimps-labs can give their feedback, since they know their codebase much better than I do.