Luke Lovett edited this page Sep 13, 2016 · 26 revisions

This document answers many of the frequently asked questions (FAQs) about Mongo Connector.

How do I re-sync all data from scratch?

  1. Stop mongo-connector.
  2. Delete the oplog progress file.
  3. Restart mongo-connector.
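
Step 2 can be scripted. The following is a minimal Python sketch, assuming the progress file uses the default name oplog.timestamp and sits in the current working directory. The function name delete_progress_file is hypothetical, and stopping and restarting mongo-connector (steps 1 and 3) is left to whatever process manager you use.

```python
import os


def delete_progress_file(path="oplog.timestamp"):
    """Remove the oplog progress file; return True if a file was deleted."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # No progress file at this path, so there is nothing to delete.
        return False
```

If you start mongo-connector with a custom progress file location, pass that path instead of relying on the default.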

What versions of MongoDB are supported by Mongo Connector?

Mongo Connector is compatible with MongoDB 2.4.x and later. It may work with versions of MongoDB prior to 2.4.x, but this has not been tested.

My oplog progress file always seems really out of date. What's going on?

Mongo Connector updates the oplog progress file (called oplog.timestamp, by default) whenever its cursor into the MongoDB oplog is closed. Note that this may come long after Mongo Connector has read and processed all entries currently in the oplog. This is due to the connector's use of a tailable cursor, which can be re-used to retrieve documents that arrive in the oplog even after the cursor is created. Thus, you cannot rely on the progress file being updated automatically after the oplog is exhausted.

Instead, Mongo Connector provides the --batch-size option with which you can specify the maximum number of documents Mongo Connector may process before having to record its progress. For example, if you wanted to make sure that Mongo Connector records its progress at least every 100 operations in the oplog, you could run:

mongo-connector -m <source host/port> -t <destination host/port> --batch-size=100
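
To make the effect of --batch-size concrete, here is an illustrative model in Python of the checkpointing behavior described above. It is a sketch under stated assumptions, not Mongo Connector's actual implementation: the function name replay_with_checkpoints and the "ts" key on each entry are hypothetical stand-ins.

```python
def replay_with_checkpoints(entries, batch_size=None):
    """Return the timestamps at which progress would be recorded.

    entries: iterable of dicts with a "ts" key (stand-ins for oplog entries).
    batch_size: record progress after this many entries; None means
        progress is recorded only when the cursor is closed.
    """
    recorded = []
    last_ts = None
    for count, entry in enumerate(entries, start=1):
        last_ts = entry["ts"]  # the entry would be replicated here
        if batch_size and count % batch_size == 0:
            recorded.append(last_ts)
    # Model the cursor being closed: record the final position once.
    if last_ts is not None and (not recorded or recorded[-1] != last_ts):
        recorded.append(last_ts)
    return recorded
```

With 250 entries and a batch size of 100, this model records progress after entries 100 and 200, and once more when the cursor closes. Without a batch size, it records progress only at the end.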

Why are some fields in my MongoDB documents not appearing in Solr?

Documents that are missing fields required by the Solr collection schema, or that have fields not defined in it, cannot be inserted, and Solr will log an exception. Thus, Mongo Connector tries to read your Solr collection's schema before replicating any operations to Solr, in order to avoid sending invalid requests. Documents replicated to Solr from MongoDB may need to be altered to remove fields that aren't in the schema, and the result may look as if your documents are missing certain fields.

The solution to this is to update your schema.xml file and reload the relevant Solr cores.

What is the mongodb_meta index in Elasticsearch?

Mongo Connector creates a mongodb_meta index in Elasticsearch in order to keep track of when documents were last modified. This is used to resolve conflicts in the event of a replica set rollback, but is kept in a separate index so that it can be removed easily if necessary.

Why are my documents empty in Elasticsearch? Why are updates not happening in Elasticsearch?

Mongo Connector needs _source to be enabled in order to apply update operations. Make sure that you have this enabled.
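
One way to check this is to inspect the mapping for the type you are replicating into. The sketch below assumes you have already fetched that mapping as a Python dict (for example, from Elasticsearch's get-mapping API). The function name source_enabled is hypothetical. Elasticsearch enables _source by default, so only an explicit "enabled": false turns it off.

```python
def source_enabled(mapping):
    """Return False only if the mapping explicitly disables _source."""
    return mapping.get("_source", {}).get("enabled", True)


# A mapping fragment that would prevent updates from being applied.
disabled_mapping = {"_source": {"enabled": False}}

# A mapping that says nothing about _source keeps the default (enabled).
default_mapping = {"properties": {"title": {"type": "string"}}}
```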

How many threads does Mongo Connector start?

Mongo Connector starts one thread for each oplog (i.e., each replica set), and an additional thread to monitor them. Thus, if you have a three-shard cluster, where each shard is a replica set, you will have:

  • 1 Connector thread (starts OplogThreads and monitors them)
  • 3 OplogThreads (one for each shard)
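
The same count, written as a small Python helper (the function name is hypothetical):

```python
def expected_threads(replica_set_count):
    """One OplogThread per replica set, plus one Connector thread."""
    return replica_set_count + 1
```

For the three-shard cluster above, expected_threads(3) gives 4.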

How do I increase the speed of Mongo Connector?

  1. Increase the value for --auto-commit-interval (or, even better, don't specify it at all and let it be None). Setting this value higher means we don't need to refresh the remote system as often and can save time. Leaving this option out entirely leaves when to refresh indexes up to the remote indexing system itself. Most indexing systems have some way to configure this.
  2. If you only need to replicate certain collections, use the --namespace-set option to specify them. You can also run separate instances of Mongo Connector, each with a single namespace to replicate, so that those namespaces are replicated in parallel. Note that some collections may then be further ahead of or behind others, especially if the number of operations is unbalanced across these collections.
  3. You can increase the value for --batch-size, or leave it out, so that Mongo Connector records its timestamp less frequently.
  4. You can increase the value for the bulkSize for your DocManagers, so that more documents are sent in each request to the remote end.

Does Mongo Connector support dynamic schemas for Solr?

Mongo Connector does not currently support this. However, restarting Mongo Connector will cause it to re-read the schema definition.

How can I load several Solr cores with Mongo Connector?

There are two options:

  1. Use multiple solr_doc_managers. When you do this, all MongoDB collections go to all cores. This isn't a very common use case.
  2. Use multiple instances of mongo-connector, passing the base URL of the core to docManagers.XXX.targetURL. This allows you to refine what collections and what fields from each document get sent to each core.
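
The second option can be sketched as one configuration per core, each used by its own mongo-connector instance. The Python below only builds the JSON text. The core names and URLs are hypothetical, and the "docManager" key name is an assumption about the configuration file layout, so check it against the sample configuration shipped with your version of Mongo Connector.

```python
import json


def core_config(core_url):
    """Build a config dict for one mongo-connector instance feeding one core."""
    return {
        "docManagers": [
            {
                "docManager": "solr_doc_manager",  # key name is an assumption
                "targetURL": core_url,  # base URL of this instance's core
            }
        ]
    }


# Hypothetical core URLs; substitute the base URLs of your own cores.
CORE_URLS = {
    "products": "http://localhost:8983/solr/products",
    "reviews": "http://localhost:8983/solr/reviews",
}

# One JSON document per core, ready to be written to separate config files.
configs = {
    name: json.dumps(core_config(url), indent=2)
    for name, url in CORE_URLS.items()
}
```

Each instance would also need its own namespace and field settings, which is where the per-core refinement mentioned above comes in.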

I can't install Mongo Connector! I'm getting the error "README.rst: No such file or directory"

Make sure you have a recent version of setuptools installed. Any version after 0.6.26 should do the trick:

pip install --upgrade setuptools

Can I run more than one instance of mongo-connector at the same time?

The short answer is yes. However, care must be taken so that multiple connectors operate on mutually exclusive sets of namespaces. This is fine:

mongo-connector -n a.b -g A.B
mongo-connector -n c.d -g C.D

However, the following should be avoided:

mongo-connector -n a.b -g A.B
mongo-connector -n c.d -g A.B

as well as:

mongo-connector -n a.b -g A.B
mongo-connector -n a.b -g C.D
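
Before starting several instances, you can check their namespace pairs for overlap. This is a minimal sketch, assuming each instance is described by one (source, destination) pair, as in the examples above. The function name find_namespace_conflicts is hypothetical.

```python
from collections import Counter


def find_namespace_conflicts(mappings):
    """Return (duplicated sources, duplicated destinations).

    mappings: list of (source_namespace, destination_namespace) pairs,
    one pair per mongo-connector instance.
    """
    sources = Counter(src for src, _ in mappings)
    dests = Counter(dst for _, dst in mappings)
    dup_sources = sorted(ns for ns, n in sources.items() if n > 1)
    dup_dests = sorted(ns for ns, n in dests.items() if n > 1)
    return dup_sources, dup_dests
```

Both returned lists are empty for the first pair of commands above. The second pair reports A.B as a duplicated destination, and the third reports a.b as a duplicated source.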

How can I install Mongo Connector without internet access?

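On a server that does have internet access (pip 10 and later no longer accept --download; use python -m pip download -d /path/to/some/dir mongo-connector there instead):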

python -m pip install --download /path/to/some/dir mongo-connector

Then, on the offline server (which is connected to the first server):

python -m pip install  --ignore-installed --no-index --find-links /path/to/some/dir mongo-connector

N.B. pip is included in standard Python builds. However, if your offline server runs CentOS 6.x or any older Linux distro with Python 2.6.x, you might have an outdated version of pip that doesn't use the wheel package format. Some users have reported difficulty installing packages offline without the wheel package format, so you should either upgrade pip or run the first command several times, as follows.

On the server with internet access, use the same outdated pip (1.3.1) to fetch the dependencies. Run pip install --download /path/to/some/dir mongo-connector several times (probably 2 or 3 times, so that it fetches all the needed dependencies; each time you'll see additional .gz files in the folder). After that, you can run tar zcvf mongo-connector.tar.gz /path/to/grabbed/dependencies and transfer the archive to the offline server, where you run pip install --no-index --find-links /path/to/some/dir mongo-connector. This is not an issue with pip 7.x.x, as it uses wheels.

Why is the last entry reported as already processed (up to date) when using namespace command-line arguments, even though collections are not synced to the destination?

Mongo Connector works by tailing the oplogs in MongoDB. When namespaces are specified, it looks only for oplog entries tagged with the given namespace. For example, if you have used mongorestore to restore a whole database with multiple collections in it, MongoDB writes only one entry, tagged with the database name, to the oplog. Trying to use mongo-connector on one specific collection therefore won't sync anything, because there are no oplog entries for that collection. Make sure that some operations are performed on the namespace you are trying to use.
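
The filtering can be pictured with a few made-up oplog entries. The sketch below is illustrative only: the entries are simplified stand-ins (real oplog entries carry more fields), the database and collection names are hypothetical, and the function is not Mongo Connector's own code.

```python
def entries_for_namespace(oplog_entries, namespace):
    """Keep only the entries whose "ns" field equals the given namespace."""
    return [e for e in oplog_entries if e.get("ns") == namespace]


# Made-up oplog contents after a whole-database restore: a single
# database-level entry, and nothing tagged with an individual collection.
oplog_after_restore = [
    {"op": "c", "ns": "mydb.$cmd"},
]

# The same oplog after one insert into mydb.users.
oplog_after_insert = oplog_after_restore + [
    {"op": "i", "ns": "mydb.users", "o": {"_id": 1}},
]
```

Filtering oplog_after_restore for mydb.users yields nothing, which matches the "up to date" behavior described above. Once an operation touches mydb.users, the filter finds an entry to replicate.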

Why can't I use Mongo Connector with only the mongos?

Mongo Connector must be able to read the oplogs, which live in the "local" database of each shard. The "local" database is not accessible through mongos; it can only be accessed by connecting directly to each shard. If you are getting the error "OperationFailure: not authorized on local to execute command { find: "oplog.rs", filter: {}, limit: 1, singleBatch: true }", then you are not able to connect to your shards. To test this yourself, connect to the mongos and run sh.status() to list the shard addresses, then try connecting directly to each of them. If you cannot connect, mongo-connector will not be able to run.

If you are running your cluster through Compose or another hosting tool, make sure that you are able to directly access your shards.

Using Mongo Connector with Docker

We are collecting information from various users about their experiences using Mongo-Connector with Docker. Please check here before filing a new ticket.

  • ServerSelectionTimeoutError: Could not reach any servers in [(u'344da2f17060', 27017)]. Replica set is configured with internal hostnames or IPs? See issue 391.
  • ImportError: No module named 'mongo_connector.doc_managers.elastic_doc_manager'. See issue 436.
  • Last entry no longer in oplog cannot recover! See issue 287.
  • ConnectionFailed: ConnectionError(('Connection aborted.', BadStatusLine("''",))) caused by: ProtocolError(('Connection aborted.', BadStatusLine("''",))) See issue 251.

InvalidBSON: date value out of range

This happens when decoding a document that contains a date value outside the range that Python datetimes can represent, for example, a year greater than 9999. Since the issue is with Python datetimes, there isn't much that mongo-connector itself can do about it. To work around the issue, install the PyMongo C extensions. The Python C API allows the creation of datetimes to represent a wider range of dates.
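
The limit is easy to reproduce with the standard library alone. BSON stores dates as milliseconds since the Unix epoch, and the sketch below converts such a value to a Python datetime. It illustrates the range limit only and is not PyMongo's decoding code. The function names are hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def millis_to_datetime(millis):
    """Convert milliseconds since the Unix epoch to a naive UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def fits_in_datetime(millis):
    """Return True if the value can be represented as a Python datetime."""
    try:
        millis_to_datetime(millis)
        return True
    except OverflowError:
        # Raised when the result falls outside datetime.min..datetime.max.
        return False
```

The last representable millisecond is 253402300799999 (the end of the year 9999). One millisecond later, the conversion fails.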
