Configuration Options

Shane Harvey edited this page Apr 10, 2017 · 20 revisions

You can use a custom configuration file to specify some options to mongo-connector.

This page details all the options that can be specified in Mongo Connector's configuration file. You can also look at an example, and the tests may help clarify how individual options behave.

Mongo Connector uses JSON as the format for its configuration file. We'll use MongoDB "dot-notation" for the configuration option names themselves. For example, we'll use the name authentication.password to mean:

{"authentication": {"password": XXX}}
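To make the dot-notation convention concrete, here is a minimal Python sketch that expands dotted option names into the nested JSON objects they stand for. The helper name expand_dotted and the placeholder values are hypothetical and are not part of Mongo Connector; the sketch only handles nested objects, not array indexes.

```python
import json


def expand_dotted(options):
    """Expand {"a.b": value} style keys into nested dictionaries."""
    nested = {}
    for dotted_name, value in options.items():
        *parents, leaf = dotted_name.split(".")
        node = nested
        for key in parents:
            # Create intermediate objects on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


config = expand_dotted({
    "authentication.adminUsername": "admin",  # placeholder value
    "authentication.password": "XXX",         # placeholder value
    "logging.type": "file",
})
print(json.dumps(config, indent=2))
```

Both authentication options end up inside a single "authentication" object, which is the shape the configuration file expects.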

Note that any option whose name starts with __ is ignored. For example, in

"namespaces": {
  "__include": ["test.talks"]
},

the __include option is ignored.

You can tell mongo-connector what configuration file to use via the -c option (this will also be shown with --help). To invoke mongo-connector with a configuration file option, run:

mongo-connector -c config.json

(This assumes your configuration file is named config.json and is in the directory from which you invoke mongo-connector.)

Comment Syntax

Although JSON itself doesn't provide a syntax for comments, Mongo Connector allows its JSON configuration file to have comments, which are defined as any key in an object that is prefixed by two underscores (__). For example:

{
    "__comment": "this is a comment"
}
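The effect of the comment rule can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Mongo Connector's own code; the function name strip_comments is hypothetical, and descending into arrays is an assumption.

```python
import json


def strip_comments(value):
    """Recursively drop every object key that starts with two underscores."""
    if isinstance(value, dict):
        return {
            key: strip_comments(child)
            for key, child in value.items()
            if not key.startswith("__")
        }
    if isinstance(value, list):
        # Assumption: objects nested inside arrays are cleaned as well.
        return [strip_comments(item) for item in value]
    return value


raw = """
{
    "__comment": "replicate from the local replica set",
    "mainAddress": "localhost:27017",
    "namespaces": {
        "__include": ["test.talks"]
    }
}
"""
effective = strip_comments(json.loads(raw))
print(effective)
```

After stripping, only mainAddress and an empty namespaces object remain, so a commented-out option behaves as if it had never been written.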

Global Configuration Options

mainAddress

Command-line equivalent: -m, --main

Default: localhost:27017

The address of the replica set or sharded cluster from which to replicate. This may be any MongoDB connection string.

oplogFile

Command-line equivalent: -o, --oplog-ts

Default: oplog.timestamp

The path to the oplog progress file. Note: backslashes must be escaped, e.g., "C:\\path\\to\\oplog.timestamp".
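If you generate the configuration file from a script, a JSON serializer takes care of the escaping. The short Python sketch below uses the standard json module; the path is the example path from the note above.

```python
import json

# Raw string: each backslash below is a single literal backslash.
windows_path = r"C:\path\to\oplog.timestamp"

# json.dumps doubles each backslash when it serializes the string.
serialized = json.dumps({"oplogFile": windows_path})
print(serialized)

# Parsing the JSON text restores the original single-backslash path.
assert json.loads(serialized)["oplogFile"] == windows_path
```

Without the doubling, a sequence such as \t would be read as a tab character and \p would be rejected as an invalid escape.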

noDump

Command-line equivalent: --no-dump

Default: false

Do not dump collections from MongoDB to the remote system prior to tailing the MongoDB oplog. With this option, mongo-connector starts tailing the oplog from the oldest entry in the oplog.

batchSize

Command-line equivalent: --batch-size

Default: -1

Number of records processed from the oplog before updating the timestamp file.

verbosity

Command-line equivalent: -v, --verbose

Default: 1 (only output warnings and errors)

The verbosity of Mongo Connector. Note that the command-line option only turns on/off debug-level logging. In the config file, verbosity may be set according to the following table:

Verbosity   Log Level
0           ERROR
1           WARNING
2           INFO
3           DEBUG
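The table above corresponds directly to the levels of Python's standard logging module. The sketch below shows one way to express that mapping; the names VERBOSITY_TO_LEVEL and level_for and the logger name are hypothetical, and Mongo Connector's internal handling may differ.

```python
import logging

# Verbosity values from the table above, mapped to standard logging levels.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,
    1: logging.WARNING,
    2: logging.INFO,
    3: logging.DEBUG,
}


def level_for(verbosity):
    """Return the logging level for a config-file verbosity value."""
    try:
        return VERBOSITY_TO_LEVEL[verbosity]
    except KeyError:
        raise ValueError("verbosity must be one of 0, 1, 2, 3") from None


logger = logging.getLogger("verbosity_demo")  # hypothetical logger name
logger.setLevel(level_for(1))                 # 1 is the documented default
print(logging.getLevelName(logger.level))
```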

continueOnError

Command-line equivalent: --continue-on-error

Default: false

Whether to continue tailing the oplog after an error occurs while dumping a collection. This doesn't affect the connector's behavior while it is already tailing the oplog.

fields

Command-line equivalent: -i, --fields

Default: all fields

Comma-separated list of fields to read from MongoDB documents. This option can be used to select just a few fields out of every document. Note that the _id field, and the ns and _ts fields for Solr, will always be included. This option is mutually exclusive with the exclude_fields option.

exclude_fields

Command-line equivalent: -e, --exclude_fields

Default: empty

Comma-separated list of fields to exclude from MongoDB documents. This option can be used to omit a few fields from every document. Note that the _id field, and the ns and _ts fields for Solr, will always be included. This option is mutually exclusive with the fields option.

timezoneAware

Command-line equivalent: --tz-aware

Default: false

Whether Dates read from MongoDB should be timezone-aware.

Configure Logging

logging.type

Command-line equivalents: --logfile, -s, --enable-syslog

Default: file

Where to direct Mongo Connector logs. This may be one of "file", "syslog", or "stream".

logging.filename

Command-line equivalent: --logfile

Default: mongo-connector.log

The path to Mongo Connector's log file. This option only applies if logging.type is "file". Note: backslashes must be escaped, e.g., "C:\\path\\to\\mongo-connector.log".

logging.rotationWhen

Command-line equivalent: --logfile-when

Default: midnight

The type of period defining when Mongo Connector should rotate its log file. This must be one of:

  • S (second)
  • M (minute)
  • H (hour)
  • D (day)
  • W0 - W6 (days of the week, numbered 0 - 6)
  • midnight

For more details, see the Python documentation for TimedRotatingFileHandler.

This option only applies if logging.type is "file".

logging.rotationInterval

Command-line equivalent: --logfile-interval

Default: 1

How frequently the log file should be rotated. Specifically, how many units of logging.rotationWhen should occur before rotation. This option cannot be used if logging.rotationWhen is any of W0 - W6.

For more details, see the Python documentation for TimedRotatingFileHandler.

This option only applies if logging.type is "file".

logging.rotationBackups

Command-line equivalent: --logfile-backups

Default: 7

How many rotated log files to keep around.

This option only applies if logging.type is "file".
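The three rotation options line up with the when, interval, and backupCount parameters of Python's logging.handlers.TimedRotatingFileHandler. The sketch below shows that correspondence under the defaults listed on this page; build_file_handler is a hypothetical helper, the log file name is a placeholder, and Mongo Connector's own wiring may differ.

```python
import logging
import logging.handlers
import os
import tempfile


def build_file_handler(logging_cfg):
    """Create a rotating file handler from a "logging" config object."""
    return logging.handlers.TimedRotatingFileHandler(
        logging_cfg.get("filename", "mongo-connector.log"),
        when=logging_cfg.get("rotationWhen", "midnight"),
        interval=logging_cfg.get("rotationInterval", 1),
        backupCount=logging_cfg.get("rotationBackups", 7),
        delay=True,  # do not open the file until the first record is logged
    )


# Example: rotate every 6 hours and keep 4 rotated files.
log_path = os.path.join(tempfile.gettempdir(), "mongo-connector-demo.log")
handler = build_file_handler({
    "type": "file",
    "filename": log_path,
    "rotationWhen": "H",
    "rotationInterval": 6,
    "rotationBackups": 4,
})
# The handler stores the rotation period in seconds (6 hours here).
print(handler.when, handler.interval, handler.backupCount)
handler.close()
```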

logging.host

Command-line equivalent: --syslog-host

Default: localhost:512

Address of the syslog. This can include a host and port like "localhost:512" or, on Unix/Linux, be a Unix domain socket such as "/dev/log".

This option only applies if logging.type is "syslog".

logging.facility

Command-line equivalent: --syslog-facility

Default: user

The syslog facility to use.

This option only applies if logging.type is "syslog".

Configure Authentication

authentication.adminUsername

Command-line equivalent: -a, --admin-username

Default: (no default)

The username that Mongo Connector should use to log into MongoDB.

authentication.password

Command-line equivalent: -p, --password

Default: (no default)

The password for authentication.adminUsername. This option cannot be used with authentication.passwordFile.

authentication.passwordFile

Command-line equivalent: -f, --password-file

Default: (no default)

A path to a file that contains the password for authentication.adminUsername. This option cannot be used with authentication.password.

Configure SSL

ssl.sslCertfile

Command-line equivalent: --ssl-certfile

Default: (no default)

A path to the SSL certificate that Mongo Connector should use to identify the local connection to MongoDB.

ssl.sslKeyfile

Command-line equivalent: --ssl-keyfile

Default: (no default)

A path to the private key for ssl.sslCertfile. This option isn't necessary if ssl.sslCertfile already has the private key included.

ssl.sslCertificatePolicy

Command-line equivalent: --ssl-certificate-policy

Default: ignored

Policy for validating SSL certificates provided from the other end of the connection (i.e., to MongoDB). Must be one of:

  • required - Require and validate the remote certificate.
  • optional - The same as required, unless the server was configured to use anonymous ciphers.
  • ignored - Remote SSL certificates are ignored completely.

Configure Namespaces

namespaces

Default: Include all namespaces except system and GridFS collections.

NEW in 2.5.0: The namespaces configuration option is used to control how and which MongoDB namespaces are replicated. By default, Mongo Connector will replicate all namespaces except for system and GridFS collections. Namespaces should be given as database_name.collection_name. Each namespace may contain a single wildcard (*) which matches any characters. For example, db_*.foo matches db_bar.foo and db_a.foo.
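The wildcard rule can be illustrated with a small Python sketch that turns a namespace pattern into a regular expression. This is not necessarily how Mongo Connector implements matching; the function names namespace_regex and matches are hypothetical.

```python
import re


def namespace_regex(pattern):
    """Compile a namespace pattern with at most one '*' into a regex."""
    if pattern.count("*") > 1:
        raise ValueError("a namespace may contain only one wildcard")
    # Escape everything literally, then turn the escaped '*' into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(r"\A" + body + r"\Z")


def matches(pattern, namespace):
    """Return True if the namespace matches the wildcard pattern."""
    return namespace_regex(pattern).match(namespace) is not None


print(matches("db_*.foo", "db_bar.foo"))  # True
print(matches("db_*.foo", "db_a.foo"))    # True
print(matches("db_*.foo", "db_bar.baz"))  # False
```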

Excluding Namespaces

Command-line equivalent: -x, --exclude-namespace-set

To prevent replication of a set of namespaces, add "db.collection": false to the "namespaces" config object.

Config file usage:

{
  "namespaces": {
    "db.excluded_collection": false,
    "excluded_database.*": false,
    "*.exclude_collection_from_every_database": false
  }
}

Command line usage: -x 'db.excluded_collection,excluded_database.*,*.exclude_collection_from_every_database'

Including Namespaces

Command-line equivalent: -n, --namespace-set

To replicate only a specific set of namespaces, add "db.collection": true, "db.collection": "db.collection", or "db.collection": {} to the "namespaces" config object. Included namespaces support additional options such as renaming, GridFS, and filtering fields in documents.

Config file usage:

{
  "namespaces": {
    "db.included_collection1": true,
    "db.included_collection2": {},
    "included_wildcard_db.*": true
  }
}

Command line usage: -n 'db.included_collection1,db.included_collection2,included_wildcard_db.*'

Renaming Namespaces

To rename a namespace, add "db.collection": "db.new_collection" or "db.collection": {"rename": "db.new_collection"}. By default, no renaming will occur. Renaming works with wildcard (*) namespaces with the following limitation: if the source namespace contains a wildcard in the collection name, then the destination must also contain a wildcard in the collection name. The same is true for a wildcard in a database name.

Renamed namespaces can also specify fields to include or exclude.

Note: mongo-connector 2.5.0 does not support renaming GridFS collections.

Config file usage:

{
  "namespaces": {
    "renamed_database.collection1": "new_database.new_collection1",
    "renamed_database.collection2": {
      "rename": "new_database.new_collection2"
    },
    "renamed_wildcard_db.*": {
      "rename": "new_database_name.*"
    }
  }
}
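To show how a wildcard rename carries the matched text over to the destination, here is a minimal Python sketch of the rule described above. The function rename_namespace and the collection name users are hypothetical; the sketch illustrates the documented behavior and is not Mongo Connector's implementation.

```python
import re


def rename_namespace(namespace, source_pattern, dest_pattern):
    """Map a namespace through one rename rule.

    Returns the destination namespace, or None when the namespace
    does not match source_pattern.
    """
    src_db, _, src_coll = source_pattern.partition(".")
    dst_db, _, dst_coll = dest_pattern.partition(".")
    # A wildcard in the source database (or collection) name requires
    # a wildcard in the same part of the destination.
    if ("*" in src_db and "*" not in dst_db) or (
        "*" in src_coll and "*" not in dst_coll
    ):
        raise ValueError("wildcard missing from the destination namespace")
    if "*" not in source_pattern:
        return dest_pattern if namespace == source_pattern else None
    # Capture whatever the single '*' matched in the source namespace.
    body = re.escape(source_pattern).replace(r"\*", "(.*)")
    found = re.match(r"\A" + body + r"\Z", namespace)
    if found is None:
        return None
    return dest_pattern.replace("*", found.group(1), 1)


print(rename_namespace(
    "renamed_wildcard_db.users",  # hypothetical collection name
    "renamed_wildcard_db.*",
    "new_database_name.*",
))
```

With the rule from the configuration above, the hypothetical renamed_wildcard_db.users namespace maps to new_database_name.users.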

Note: when replicating to Elasticsearch, the MongoDB database name, which will become the Elasticsearch index name, is always made lowercase.

GridFS Namespaces

Command-line equivalent: --gridfs-set

GridFS collections are not replicated by default. To include a GridFS collection, add "gridfs": true to the options for that namespace. For example, if GridFS metadata is stored in the test.fs.files collection, and chunks are stored in the test.fs.chunks collection, add "test.fs": {"gridfs": true}. To include all GridFS collections in the test database, add "test.*": {"gridfs": true}.

Config file usage:

{
  "namespaces": {
    "gridfs_db.collection": {"gridfs": true},
    "gridfs_wildcard_db.*": {"gridfs": true}
  }
}

Command line usage: --gridfs-set 'gridfs_db.collection,gridfs_wildcard_db.*'

Filtering Documents per Namespace

Command-line equivalent: None

By default, all fields in all documents are replicated in each included namespace. The "includeFields" and "excludeFields" options can be used to limit the fields replicated per namespace. To include only a specific set of fields in a namespace, add "includeFields": <list of fields to include> to the options. To exclude a specific set of fields in a namespace, add "excludeFields": <list of fields to exclude> to the options.

Note: the _id field will always be included. It is not possible to both include and exclude fields on the same namespace.

Note: mongo-connector does not support filtering fields inside arrays; you can only include or exclude the entire array field.

Config file usage:

{
  "namespaces": {
    "db.included_collection": true,
    "db.filtered_collection1": {
      "includeFields": ["included_field", "included.nested.field"]
    },
    "db.filtered_collection2": {
      "excludeFields": ["excluded_field", "excluded.nested.field"]
    },
    "filtered_database.*": {
      "includeFields": ["included_field", "included.nested.field"]
    },
    "filtered_renamed_database.*": {
      "rename": "new_filtered_database.*",
      "includeFields": ["included_field", "included.nested.field"]
    }
  }
}
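The following Python sketch illustrates what including or excluding dotted field paths means for a single document. It is an approximation for explanation only: the function names and the sample document are hypothetical, and Mongo Connector's actual filtering code may behave differently in edge cases.

```python
import copy


def include_fields(doc, fields):
    """Keep only the listed (possibly nested) fields, plus _id."""
    result = {"_id": doc["_id"]} if "_id" in doc else {}
    for path in fields:
        keys = path.split(".")
        source = doc
        for key in keys:
            if not isinstance(source, dict) or key not in source:
                break  # path is absent from this document
            source = source[key]
        else:
            # Path fully resolved: copy the value to the same nested spot.
            target = result
            for key in keys[:-1]:
                target = target.setdefault(key, {})
            target[keys[-1]] = copy.deepcopy(source)
    return result


def exclude_fields(doc, fields):
    """Drop the listed (possibly nested) fields, but never _id."""
    result = copy.deepcopy(doc)
    for path in fields:
        if path == "_id":
            continue  # _id is always included
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, dict):
            node.pop(leaf, None)
    return result


def filter_document(doc, options):
    """Apply one namespace's includeFields or excludeFields option."""
    include = options.get("includeFields")
    exclude = options.get("excludeFields")
    if include and exclude:
        raise ValueError("includeFields and excludeFields are mutually exclusive")
    if include:
        return include_fields(doc, include)
    if exclude:
        return exclude_fields(doc, exclude)
    return doc


sample = {  # hypothetical document
    "_id": 1,
    "included_field": "a",
    "included": {"nested": {"field": "b", "other": "c"}},
    "excluded_field": "d",
}
print(filter_document(
    sample, {"includeFields": ["included_field", "included.nested.field"]}
))
print(filter_document(
    sample, {"excludeFields": ["excluded_field", "included.nested.other"]}
))
```

For this sample document both calls produce the same result: _id, included_field, and included.nested.field are kept, and everything else is dropped.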

Configure Namespaces pre 2.5.0

namespaces.include

Command-line equivalent: -n, --namespace-set

Default: all namespaces

DEPRECATED in 2.5.0: List of collections to read from MongoDB. Collection names should be given as database_name.collection_name. By default, Mongo Connector will replicate all namespaces except for system and GridFS collections.

Usage Examples: -n test.test,alpha.bar,db_1.foo on the command line or ["test.test", "alpha.bar", "db_1.foo"] in a config file.

namespaces.exclude

Command-line equivalent: -x, --exclude-namespace-set

Default: no namespaces

DEPRECATED in 2.5.0: List of collections not to read from MongoDB. Collection names should be given as database_name.collection_name. By default, Mongo Connector will not exclude any namespace.

Usage Examples: -x test.test,alpha.bar,db_1.foo on the command line or ["test.test", "alpha.bar", "db_1.foo"] in a config file.

namespaces.mapping

Command-line equivalent: -g, --dest-namespace-set

Default: no mapping

DEPRECATED in 2.5.0: Comma-separated list of new names to use for each collection. Each namespace provided in namespaces.include will be renamed at the destination to the corresponding entry in this list. This option may only be used with namespaces.include, and both options must include the same number of names. By default, no renaming will occur. For example:

{
  "namespaces": {
    "include": ["company.employees"],
    "mapping": {
      "company.employees": "company.new_employees"
    }
  }
}

Command line usage: -n company.employees -g company.new_employees

The company.employees collection from MongoDB will be renamed and sent to the target system as company.new_employees instead.

Note that when replicating to Elasticsearch, the MongoDB database name, which will become the Elasticsearch index name, is always made lowercase.

namespaces.gridfs

Command-line equivalent: --gridfs-set

Default: empty

DEPRECATED in 2.5.0: Comma-separated list of GridFS root collections. For example, if GridFS metadata is stored in the test.fs.files collection, and chunks are stored in the test.fs.chunks collection, pass test.fs as the namespace.

Configure DocManagers

Mongo Connector may use more than one DocManager at a time to support replicating to more than one location simultaneously. An array of DocManagers should be provided, even if that array only contains one DocManager configuration. Here we use <index> in the configuration key name to mean "at any index within the array". For example, docManagers.0.docManager means:

{"docManagers": [{"docManager": XXX}]}

docManagers.<index>.docManager

Command-line equivalent: -d, --doc-manager

Default: doc_manager_simulator

Module name of the DocManager to use. Included in Mongo Connector are mongo_doc_manager, solr_doc_manager, and doc_manager_simulator. To write your own DocManager, see Writing Your Own DocManager.

The elastic_doc_manager is included in mongo-connector versions < 2.3, and only supports Elastic 1.x. For mongo-connector versions >= 2.3, doc managers for Elastic 1.x and 2.x are available as plugins.

Elastic 1.x doc manager: https://github.com/mongodb-labs/elastic-doc-manager

Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

docManagers.<index>.targetURL

Command-line equivalent: -t, --target-url

Default: (no default)

URL to pass to the DocManager. For example, this should point to the base REST endpoint for a Solr core, or should be a MongoDB connection string, or the base REST endpoint for Elasticsearch.

docManagers.<index>.uniqueKey

Command-line equivalent: -u, --unique-key

Default: _id

What to call the _id field from the MongoDB document in the target system. This is useful for certain systems that call their primary key something else (e.g., Solr uses id instead) or when the primary key field is configurable (e.g., Elasticsearch's _id path mapping).

docManagers.<index>.autoCommitInterval

Command-line equivalent: --auto-commit-interval

Default: no auto commit

Interval, in seconds, at which the DocManager forces the target system to flush changes. This doesn't apply to every system.

docManagers.<index>.bulkSize

Command-line equivalent: (none)

Default: 1000

The number of documents that are sent in a single batch to the remote system.

docManagers.<index>.args

Command-line equivalent: (none)

Default: (no default)

Any arbitrary keyword arguments to pass to the constructor of the DocManager. What arguments can be passed should be documented by the author of the DocManager.
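Pulling the options in this section together, the Python sketch below builds a configuration with two DocManagers and resolves the docManagers.<index> notation against it. The target URLs, interval, and batch size are placeholder assumptions, and lookup is a hypothetical helper rather than part of Mongo Connector.

```python
import json

config = {
    "mainAddress": "localhost:27017",
    "docManagers": [
        {
            "docManager": "solr_doc_manager",
            "targetURL": "http://localhost:8983/solr",  # assumed Solr endpoint
            "uniqueKey": "id",
            "autoCommitInterval": 10,                    # assumed value
        },
        {
            "docManager": "mongo_doc_manager",
            "targetURL": "localhost:27018",              # assumed second MongoDB
            "bulkSize": 500,                             # assumed value
        },
    ],
}


def lookup(cfg, dotted_name):
    """Resolve a name such as docManagers.0.docManager against a config."""
    node = cfg
    for part in dotted_name.split("."):
        # Inside an array, the name segment is a numeric index.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


print(lookup(config, "docManagers.0.docManager"))
print(lookup(config, "docManagers.1.targetURL"))

# Serialize in the JSON form that a file passed via -c would contain.
print(json.dumps(config, indent=2))
```

Note that docManagers stays an array even when only one DocManager is configured.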