FAQ
This document encompasses many of the frequently asked questions (FAQs) about Mongo Connector.
- My oplog progress file always seems really out of date. What's going on?
- Why are some fields in my MongoDB documents not appearing in Solr?
- What is the `mongodb_meta` index in Elasticsearch?
- Why are my documents empty in Elasticsearch? Why are updates not happening in Elasticsearch?
- How many threads does Mongo Connector start?
- How do I increase the speed of Mongo Connector?
- Does Mongo Connector support dynamic schemas for Solr?
- How can I load several Solr cores with Mongo Connector?
- I can't install Mongo Connector! I'm getting the error "README.rst: No such file or directory"
Mongo Connector updates the oplog progress file (called `config.txt` by default) whenever its cursor into the MongoDB oplog is closed. Note that this may come long after Mongo Connector has read and processed all entries currently in the oplog. This is due to the connector's use of a tailable cursor, which can be re-used to retrieve documents that arrive in the oplog even after the cursor is created. Thus, you cannot rely on the progress file being updated automatically after the oplog is exhausted.
Instead, Mongo Connector provides the `--batch-size` option, with which you can specify the maximum number of documents Mongo Connector may process before having to record its progress. For example, if you wanted to make sure that Mongo Connector records its progress at least every 100 operations in the oplog, you could run:
mongo-connector -m <source host/port> -t <destination host/port> --batch-size=100
Documents containing fields that are not defined in the Solr collection's schema cannot be inserted, and Solr will log an exception. To avoid sending invalid requests, Mongo Connector reads your Solr collection's schema before replicating any operations to Solr. Documents replicated to Solr from MongoDB may therefore be altered to remove fields that aren't in the schema, and the result may look as if your documents are missing certain fields.
The solution is to update your `schema.xml` file and reload the relevant Solr cores.
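As a sketch, if your documents have a field that Solr is dropping, you could declare it in `schema.xml` along these lines. The field names and types here are illustrative, not taken from your schema:

```xml
<!-- Illustrative field declaration; adjust name and type to your data -->
<field name="subtitle" type="string" indexed="true" stored="true"/>
<!-- A dynamic field can catch a whole family of similarly-named fields -->
<dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
```

After editing the schema, reload the core (for example, via the Core Admin `RELOAD` command) so the changes take effect.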
Mongo Connector creates a `mongodb_meta` index in Elasticsearch in order to keep track of when documents were last modified. This is used to resolve conflicts in the event of a replica set rollback, but is kept in a separate index so that it can be removed easily if necessary.
Mongo Connector needs the `_source` field to be enabled in Elasticsearch in order to apply update operations. Make sure that you have it enabled.
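For reference, `_source` is enabled by default in Elasticsearch; updates can only fail this way if your mapping explicitly disables it. A minimal sketch of a mapping that keeps `_source` enabled (the type name `my_type` is a placeholder):

```json
{
  "mappings": {
    "my_type": {
      "_source": { "enabled": true }
    }
  }
}
```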
Mongo Connector starts one thread for each oplog (i.e., each replica set), and an additional thread to monitor them. Thus, if you have a three-shard cluster, where each shard is a replica set, you will have:
- 1 Connector thread (starts OplogThreads and monitors them)
- 3 OplogThreads (one for each shard)
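The arithmetic above can be sketched as a tiny helper. The function below is purely illustrative; it is not part of Mongo Connector's API:

```python
def connector_thread_count(num_replica_sets):
    """Total threads: one OplogThread per replica set, plus one monitor thread.

    Based on Mongo Connector's model of one thread per oplog being tailed.
    """
    monitor_threads = 1
    return monitor_threads + num_replica_sets

# A three-shard cluster (each shard a replica set) uses 4 threads.
print(connector_thread_count(3))  # -> 4
```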
- Increase the value for `--auto-commit-interval` (or, even better, don't specify it at all and let it default to `None`). Setting this value higher means the remote system doesn't need to be refreshed as often, which saves time. Leaving this option out entirely leaves when to refresh indexes up to the remote indexing system itself. Most indexing systems have some way to configure this.
- If you only need to replicate certain collections, use the `--namespace-set` option to specify them. You can also run separate instances of Mongo Connector, each with a single namespace to replicate, so that those namespaces are replicated in parallel. Note that this may mean some collections end up further ahead or behind others, especially if the number of operations is unbalanced across these collections.
- You can increase the value for `--batch-size`, or leave it out entirely, so that Mongo Connector records its timestamp less frequently.
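Putting these options together, an invocation might look like the following. The host placeholders are as above, and the namespace `mydb.mycoll` is just an example:

```
mongo-connector -m <source host/port> -t <destination host/port> --namespace-set mydb.mycoll --batch-size=500
```

Here `--auto-commit-interval` is deliberately omitted, so the remote indexing system controls its own refreshes.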
Mongo Connector does not currently support this. However, restarting Mongo Connector will cause it to re-read the schema definition.
There are two options:
- Use multiple `solr_doc_manager`s. When you do this, all MongoDB collections go to all cores. This isn't a very common use case.
- Use multiple instances of `mongo-connector`, passing the base URL of each core to `docManagers.XXX.targetURL`. This allows you to refine which collections, and which fields from each document, get sent to each core.
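As a sketch, a Mongo Connector JSON configuration file pointing a doc manager at a specific core might look like this. The URL and core name are placeholders; consult the connector's configuration documentation for the full format:

```json
{
  "docManagers": [
    {
      "docManager": "solr_doc_manager",
      "targetURL": "http://localhost:8983/solr/core1"
    }
  ]
}
```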
Make sure you have a recent version of `setuptools` installed. Any version after 0.6.26 should do the trick:
pip install --upgrade setuptools