rjrudin edited this page Aug 14, 2017 · 32 revisions

Version 3.0.0 of ml-gradle adds several new tasks that use the Data Movement SDK (DMSDK) introduced in version 4 of the MarkLogic Java Client API.

The problem

The goal of these new tasks is to solve a common problem: you need to perform some kind of update operation on thousands, tens of thousands, or more documents, and the operation times out in qconsole.

So you either break the operation up and run it multiple times in qconsole, or you create a new CoRB job for an ad hoc operation that you may never perform again. CoRB is an important tool on MarkLogic projects for running transforms, particularly as part of a deployment (like migrating your data as part of a new release). But for an ad hoc task like removing one million documents from a particular collection, it'd be much simpler to run a single command, without creating and deploying new modules that you may never use again and without running the same thing over and over in qconsole.

The solution

ml-gradle now provides a better solution to this common problem by using DMSDK to perform all the updates, which lets the work scale to very large numbers of documents, and by exposing it through a simple command line interface. The tasks in 3.0.0 focus on common update operations on document collections and permissions, and they use collections and URI patterns to select the documents to update. There's also support for easily creating your own tasks that use DMSDK to perform any kind of update on any set of documents.

So while you'll almost certainly keep using CoRB and Gradle together for large transforms that need to be repeated often or that benefit from custom code, you can use these new DMSDK-based Gradle tasks for simple operations that don't need custom code and can be knocked out quickly via the command line and a few parameters.

Trying it out

To see all the new tasks, just run the following:

gradle tasks

And look for the new "Data Movement Tasks" group.

Here are a few examples to give you an idea of how the tasks work.

Let's say we have 1 million documents in a collection named "red". We can easily add those documents to another collection. Note that "sourceCollections" is the comma-separated list of collections whose documents we want to modify, and "collections" is the comma-separated list of collections to add to each selected document:

gradle mlAddCollections -Pcollections=blue -PsourceCollections=red

We can also explicitly set the full list of collections on each selected document:

gradle mlSetCollections -Pcollections=red,blue,green -PsourceCollections=red

And then remove collections to get back to our original state:

gradle mlRemoveCollections -Pcollections=blue,green -PsourceCollections=red
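
To make the round trip above concrete, here's a small sketch that models what the three commands do to the collections of a single document. It only models the set arithmetic locally and assumes that "set" replaces the whole list; it doesn't talk to MarkLogic, and the class and method names are made up for illustration rather than taken from ml-gradle:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical local model of add/set/remove on one document's collections.
class CollectionOpsSketch {

    // Splits a comma-separated value such as "red,blue,green" into a set.
    static Set<String> parse(String csv) {
        Set<String> result = new LinkedHashSet<>();
        for (String token : csv.split(",")) {
            String name = token.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    // Models mlAddCollections: keep the existing collections, add the new ones.
    static Set<String> add(Set<String> current, String csv) {
        Set<String> result = new LinkedHashSet<>(current);
        result.addAll(parse(csv));
        return result;
    }

    // Models mlSetCollections (assumed semantics): replace the whole list.
    static Set<String> set(Set<String> current, String csv) {
        return parse(csv);
    }

    // Models mlRemoveCollections: drop only the named collections.
    static Set<String> remove(Set<String> current, String csv) {
        Set<String> result = new LinkedHashSet<>(current);
        result.removeAll(parse(csv));
        return result;
    }

    public static void main(String[] args) {
        Set<String> doc = parse("red");        // original state
        doc = add(doc, "blue");                // red, blue
        doc = set(doc, "red,blue,green");      // red, blue, green
        doc = remove(doc, "blue,green");       // back to red
        System.out.println(doc);
    }
}
```

Running the three steps in order leaves the document in "red" only, which is the original state that the remove command is meant to restore.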

We can also select documents via a URI pattern (which is processed under the hood by cts:uri-match):

gradle mlAddCollections -Pcollections=xmlDocuments -PuriPattern=**.xml
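
To get a feel for which URIs a pattern like **.xml selects, here's a rough local approximation. It assumes the usual MarkLogic wildcard rules, where * matches zero or more non-space characters and ? matches exactly one. The class and method names are made up for illustration, and the real matching happens on the server in cts:uri-match, which may differ in details such as collation:

```java
import java.util.regex.Pattern;

// Hypothetical local approximation of wildcard URI matching; not the server's implementation.
class UriPatternSketch {

    // Translates a wildcard pattern into a regular expression.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append("\\S*");          // zero or more non-space characters
            } else if (c == '?') {
                regex.append("\\S");           // exactly one non-space character
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));  // literal character
            }
        }
        return Pattern.compile(regex.toString());
    }

    // True when the whole URI matches the wildcard pattern.
    static boolean matches(String wildcard, String uri) {
        return toRegex(wildcard).matcher(uri).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("**.xml", "/docs/a.xml"));
        System.out.println(matches("**.xml", "/docs/a.json"));
    }
}
```

Under these assumptions the first URI matches and the second doesn't, since only the first ends in ".xml".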

And just like collections, we can set permissions too, using the common "role,capability,role,capability" syntax for specifying permissions:

gradle mlAddPermissions -Ppermissions=rest-reader,read,rest-writer,update -PuriPattern=**.json
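
If the "role,capability,role,capability" syntax is new to you, here's a minimal sketch of how such a string pairs up. The parser below is purely illustrative and isn't ml-gradle's own code; the class name and the error handling are assumptions:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical parser for a "role,capability,role,capability" string.
class PermissionsSketch {

    // Groups capabilities by role, preserving the order in which roles appear.
    static Map<String, Set<String>> parse(String csv) {
        String[] tokens = csv.split(",");
        if (tokens.length % 2 != 0) {
            // Every role needs a capability right after it.
            throw new IllegalArgumentException("Expected role,capability pairs: " + csv);
        }
        Map<String, Set<String>> byRole = new LinkedHashMap<>();
        for (int i = 0; i < tokens.length; i += 2) {
            String role = tokens[i].trim();
            String capability = tokens[i + 1].trim();
            byRole.computeIfAbsent(role, r -> new LinkedHashSet<>()).add(capability);
        }
        return byRole;
    }

    public static void main(String[] args) {
        System.out.println(parse("rest-reader,read,rest-writer,update"));
    }
}
```

The tokens are read two at a time, so "rest-reader,read,rest-writer,update" means rest-reader gets read and rest-writer gets update. A role that needs several capabilities is simply repeated, once per capability.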

As you'd probably expect by now, you can use mlRemovePermissions and mlSetPermissions to remove and set document permissions too.

And of course, sometimes you just need to delete entire collections that contain tens of millions of documents - no problem now:

gradle mlDeleteCollections -Pcollections=red,blue