This repository has been archived by the owner on Feb 12, 2024. It is now read-only.

Create an example for configuring the ipfs repo #1303

Merged 7 commits from docs/repo-example into master on Apr 30, 2018

Conversation

@jacobheun (Contributor)

Also demonstrates how to customize the repo lock, which is a new addition

depends on ipfs/js-ipfs-repo#162

// Initialize our repo and IPFS node
const myRepo = new Repo('/tmp/custom-repo/.ipfs', customRepositoryOptions)
const node = new IPFS({
  repo: myRepo
})
Member

@jacobheun I think it makes sense to remove the step of requiring 'ipfs-repo' and setting it up manually, by just adding a repoOptions object to the IPFS constructor options that lets us pass the storage backends directly.

@daviddias (Member)

@Mr0grog mind reviewing this tutorial?

@daviddias (Member) commented Apr 9, 2018

@kumavis @hermanjunge with this addition, a user should be able to plug the Parity-to-IPFS bridge in pretty easily. @vmx, you have experience shipping npm modules that pull Rust code, compile it, and expose bindings to it. Can you help us get the Parity bridge in a nice package?

Parity to IPFS issue #763

@jacobheun (Contributor, Author)

@diasdavid I updated the IPFS constructor to accept repoOptions and the example is updated to leverage that. Let me know if this is what you were thinking; I think it's cleaner.

@@ -0,0 +1,24 @@
# Customizing the IPFS Repo

> This example shows you how to customize your repository, including where your data is stored and how the repo locking is managed.
Contributor

Why is this formatted as a quote?

It might be helpful to provide some short examples of why or when you’d want to do this, for example:

  • If you want to store data somewhere that’s not on your local disk, like S3, a Redis instance, a different machine on your local network, or in your own database system, like MongoDB or Postgres, you might use a custom datastore.

  • If you have multiple browser windows or workers sharing the same IPFS storage, you might want to use a custom lock to coordinate between them. (Locking is used to ensure only one IPFS instance can access a repo at a time.)

I think the parenthetical on that last one is important (maybe it should be called out separately?), since otherwise you have to do some digging to see what operations hold locks — the only one is repo.open() as far as I can tell.

Since you’ve done some thinking here, you might have some better ideas or more concrete examples you could list :)
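To make the locking discussion above concrete, here is a minimal in-memory lock written along the lines the thread describes (lock/locked with Node-style callbacks). This is an illustrative sketch under those assumptions, not js-ipfs-repo's actual implementation.

```javascript
// Sketch of a repo lock: one holder per repo directory at a time.
// The lock/locked callback shapes are assumptions based on this thread,
// not verbatim library code.
class MemoryLock {
  constructor () {
    this.held = new Set()
  }

  // Acquire the lock for a repo directory; on success, calls back with a
  // "closer" object whose close() releases the lock.
  lock (dir, callback) {
    this.locked(dir, (err, isLocked) => {
      if (err) return callback(err)
      if (isLocked) return callback(new Error('Repo is locked'))
      this.held.add(dir)
      callback(null, {
        close: (cb) => {
          this.held.delete(dir)
          cb(null)
        }
      })
    })
  }

  // Report whether the lock for `dir` is currently held.
  locked (dir, callback) {
    callback(null, this.held.has(dir))
  }
}

// Usage: a second acquire on the same directory fails until close() runs
const lock = new MemoryLock()
lock.lock('/tmp/repo-a', (err, closer) => {
  lock.lock('/tmp/repo-a', (err2) => {
    console.log(err2.message) // prints "Repo is locked"
    closer.close(() => {
      lock.locked('/tmp/repo-a', (e, isLocked) => {
        console.log('locked after close:', isLocked) // prints "locked after close: false"
      })
    })
  })
})
```

The same shape extends naturally to browser coordination (e.g. backing the Set with a shared store) or to remote locks like the S3 one discussed later in this thread.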

Contributor

It would also be helpful to say a little bit here about how to set those things and link to those lines in the index.js file. Linking to the relevant docs at https://github.com/ipfs/js-ipfs-repo might be good here, too.

Contributor (Author)

Yes, I think some examples of why would be very helpful, I'll get some added along with the ones you've mentioned as I think they are both likely to be common scenarios.

Also, your feedback is great and really appreciated, thank you.

Contributor

No problem! Glad it was helpful :)

## Other Options

### Custom Repo Lock
> This example sets the repo locker to `false`, preventing any locking from happening. If you would like to control how locking happens, such as with a centralized S3 ipfs repo, you can pass in your own custom locker. See [custom-locker.js](./custom-locker.js) for an example of a custom locker that can be used for [datastore-s3](https://github.com/ipfs/js-datastore-s3).
Contributor

Why is this formatted as a quote?

I know the internals of js-ipfs-repo use the term “locker,” but from the perspective of a native English speaker (and typical programming terminology), this is weird wording. The thing that locks and unlocks is the “lock,” while the thing that holds stuff and has a built-in lock is the “locker” (that would be the whole repo in this case). I feel like this would be clearer as custom-lock.js and class S3Lock.


const PATH = require('path')

class S3Locker {
Contributor

I think it would be helpful to have a class-level description of what this whole thing is doing. Maybe something like this?

Uses an object in an S3 bucket as a lock to signal that an IPFS repo is in use. When the object exists, the repo is in use. You would normally use this to make sure multiple IPFS nodes don’t use the same S3 bucket as a datastore at the same time.

lock (dir, callback) {
this.lockPath = this.getLockfilePath(dir)

this.s3.put(this.lockPath, Buffer.from(''), (err, data) => {
Contributor

Shouldn’t this call this.locked(dir, (error, inUse) => ...) first and call callback(new Error('Repo is locked')) (or some similar message) if the lock is already held by someone else?

It looks like the memory lock in js-ipfs-repo doesn’t do this, which I think is a bug (@diasdavid / @dignifiedquire ?) — the fs lock does (via the underlying lock-me library).

* @returns {void}
*/
lock (dir, callback) {
this.lockPath = this.getLockfilePath(dir)
Contributor

@diasdavid or @dignifiedquire might be able to say more on this, but it seems like this is contrary to the existing lock API, where you ought to be able to request two different locks simultaneously:

MyLock.lock('first-directory', (error, firstLock) => {
  MyLock.lock('second-directory', (error, secondLock) => {
    // do some stuff
    secondLock.close()
    firstLock.close()
  })
})

Contributor (Author)

This is a good catch. While the system currently only manages a single lock file, the existing APIs in memory and fs don't prohibit multiple locks. I'll get this and the locked call added.


/**
* Storage Backends are fully customizable. Each backend can be stored in separate services,
* or in a single service. Options can be passed into the datastores via the storageBackendOptions
Contributor

It’s not really the backend that’s stored in a service, right? Maybe something like this would be a little clearer?

IPFS nodes store different information in separate storageBackends, or datastores. Each storage backend can use the same type of datastore or a different one — you could store your keys in a levelDB database while everything else is in files, for example. (See https://github.com/ipfs/interface-datastore for more about datastores.)
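For readers following along, a mixed-backend configuration along those lines might look like the sketch below. The datastore module names are assumptions drawn from the js-ipfs ecosystem of this era, not verbatim from this PR's diff.

```javascript
// Hypothetical repo options mixing datastore implementations: keys and
// blocks on the filesystem, everything else in a LevelDB database.
const repoOptions = {
  storageBackends: {
    root: require('datastore-fs'),         // version and config data
    blocks: require('datastore-fs'),       // the raw IPFS block data
    keys: require('datastore-fs'),         // keychain data
    datastore: require('datastore-level')  // LevelDB store for the rest
  },
  storageBackendOptions: {
    blocks: { sharding: false }            // per-backend options go here
  }
}
```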

* property shown below.
*/
storageBackends: {
root: require('datastore-fs'), // version and config data will be saved here
Contributor

This little description is super helpful. Maybe you could add something similar for the rest of these? Alternatively, maybe you could add some of this info to the docs for js-ipfs-repo.

}
},

// false will disable locking, you can also pass in a custom locker
Contributor

This comma should be a semicolon.

console.log('\nAdded file:', filesAdded[0].path, filesAdded[0].hash)
fileMultihash = filesAdded[0].hash
cb()
}),
Contributor

If this were split into steps and you used async.seq instead of async.series:

(cb) => node.version(cb),
(version, cb) => {
  console.log('Version:', version.version)
  cb()
},
(cb) => node.files.add({
  path: 'test-data.txt',
  content: Buffer.from('We are using a customized repo!')
}, cb),
(filesAdded, cb) => {
  console.log('\nAdded file:', filesAdded[0].path, filesAdded[0].hash)
  cb(null, filesAdded[0].hash)
},
(fileMultihash, cb) => node.files.cat(fileMultihash, cb),
// etc.

You could get rid of the error handling logic so it reads a bit clearer for people trying to follow it. This would remove the need for the fileMultihash global, too.

Alternatively, using the promise-based return values would let you do this without a special dependency:

node.on('ready', () => {
  node.version()
    .then(version => console.log('Version:', version.version))
    .then(() => node.files.add({
      path: 'test-data.txt',
      content: Buffer.from('We are using a customized repo!')
    }))
    .then(filesAdded => {
      console.log('\nAdded file:', filesAdded[0].path, filesAdded[0].hash)
      return filesAdded[0].hash
    })
    .then(fileMultihash => node.files.cat(fileMultihash))
    // etc.
})

Contributor

It might also be good to catch any errors at the end whether you use series, seq, or promises, just to demonstrate good practice.
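A runnable sketch of that trailing-catch pattern, using a stubbed node object so the chain runs without IPFS installed (the stub, including its version string, is an assumption for illustration, not the js-ipfs API):

```javascript
// Stub standing in for an IPFS node so this snippet runs anywhere.
const node = {
  version: () => Promise.resolve({ version: '0.28.0' }),
  stop: () => Promise.resolve()
}

node.version()
  .then(version => console.log('Version:', version.version))
  // A trailing catch keeps one failed step from crashing the process
  .catch(err => console.log('File Processing Error:', err))
  // This runs whether or not an error occurred above
  .then(() => node.stop())
  .then(() => console.log('Node stopped'))
```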

Contributor (Author)

I think the promise structure is much easier to consume and that's a great point about the error catching. I'll include those changes in the next update I push.

@@ -43,7 +43,7 @@ class IPFS extends EventEmitter {

if (typeof options.repo === 'string' ||
options.repo === undefined) {
this._repo = defaultRepo(options.repo)
this._repo = defaultRepo(options.repo, options.repoPath)
@Mr0grog (Contributor) commented Apr 12, 2018

This is a typo, right? In the examples, you set options.repoOptions.

If you are going to change the API by adding a new option, would you please document it with the rest of the constructor options in the README?

Personally, I’m also a little unsure about the value of this. It’s basically just replacing:

new IPFS({
  repo: new Repo('/some/path', {options})
})

with

new IPFS({
  repo: '/some/path',
  repoOptions: {options}
})

You don’t have to explicitly import ipfs-repo anymore to get a fully customizable repo, but the whole behavior is more complicated as a result (repoOptions won’t do anything if you don’t also set repo or if you set repo to an actual repo instance). There are also now multiple ways to do the same repo customization. I think this makes the API a little less clear.

That said, I’m more here to focus on docs, so I’ll try not to push or say any more on this.

Contributor (Author)

This is a really good point and I agree. The system has a lot of complexity, so avoiding more options in the API is a good thing in my opinion. I think it also helps delineate using the default repo, versus a custom repo, more clearly.

console.log('\nFetched file content:')
process.stdout.write(data)
cb()
})
Contributor

I wasn’t thinking about this yesterday, but it might also be useful to inspect the filesystem after all these operations and demonstrate that the custom repo options were used. You’d probably want to set options at the top here that aren’t the same as the defaults so their effects are visible, e.g. extension: '.customblocks' or something.

@jacobheun (Contributor, Author)

@Mr0grog I've incorporated your feedback and I've included the latest lock changes from ipfs/js-ipfs-repo#162.

@Mr0grog (Contributor) left a comment

Looks great, except for one part that didn’t get updated with the rest! (And one other minor note inline that’s not as big a deal.)

## Other Options

### Custom Repo Lock
This example sets the repo locker to `false`, preventing any locking from happening. If you would like to control how locking happens, such as with a centralized S3 IPFS Repo, you can pass in your own custom lock. See [custom-lock.js](./custom-lock.js) for an example of a custom lock that can be used for [datastore-s3](https://github.com/ipfs/js-datastore-s3).
Contributor

This feature (no lock at all) got removed from your repo PR, right? I think the first sentence here needs updating ;)

Contributor (Author)

@Mr0grog good catch! I've pushed up this change along with the node.stop call.

// Log out the error, if there is one
.catch((err) => {
console.log('File Processing Error:', err)
})
Contributor

Might be nice to have a final .then(node.stop) that stops the IPFS node so the user doesn’t need to ctrl+c to kill the process.

@jacobheun (Contributor, Author)

@Mr0grog when you have some time can you check the final adjustments I made per your feedback? Thanks!

console.log('\nFetched file content:')
process.stdout.write(data)
console.log('\n\nStopping the node')
return node.stop()
Contributor

Minor technical nit: if you put this after the catch, then it will still run if there was an error, which is probably desirable:

.then((data) => {
  console.log('\nFetched file content:')
  process.stdout.write(data)
})
// Log out the error, if there is one
.catch((error) => {
  console.log('File Processing Error:', error)
})
.then(() => {
  console.log('\n\nStopping the node')
  return node.stop()
})

Otherwise, this all looks lovely! 👍 👍 👍

Contributor (Author)

Good point, I've squashed the change into the previous commit and added a comment about not needing to catch errors on the stop.

},
blocks: {
sharding: false, // Used by IPFSRepo Blockstore to determine sharding; Ignored by datastore-fs
extension: '.ipfsblock',
Member

It's currently just `.data`; let's avoid confusing users.

createIfMissing: true
},
keys: {
extension: '.ipfskey',
Member

Does the keys repo use an extension?

*/
storageBackendOptions: {
root: {
extension: '.ipfsroot', // Used by datastore-fs; Appended to all files
Member

No extension here. Check:

> ls ~/.jsipfs
blocks    config    datastore keys      repo.lock version

createIfMissing: true
},
datastore: {
extension: '.ipfsds',
Member

No extension used by default

Contributor (Author)

The reason I added the non-default options was to demonstrate the configuration options to users. Perhaps I could add some comments about the defaults here and mention that they are being overridden for demonstration purposes only?

Contributor

Yeah, it would probably be useful to call that out here.

It might also be good to, in the last step of the script, do something like:

.then(() => {
  console.log('Check "/tmp/custom-repo/.ipfs" to see what your customized repository looks like on disk.')
})

@daviddias (Member) left a comment

It's missing how to plug in the S3 backend

@jacobheun (Contributor, Author)

@diasdavid js-datastore-s3 has a working example of how to use S3 as a full backend, and we reference that in the comments in this example. I think it would be good to avoid duplicating that example so it doesn't go out of date as the project evolves, but maybe I can call out that example more prominently in the README?

@daviddias (Member)

@jacobheun please do, it was not evident to me and it is one of the main reasons on why to upgrade all repos are configured

remove ability to add repoOptions in favor of just supplying a repo

chore: remove outdated options
automatically stop the node in the custom repo example

docs: update custom repo example
chore: bump custom repo example ipfs-repo version
@Mr0grog (Contributor) commented Apr 23, 2018

[the S3 repo] is one of the main reasons on why to upgrade all repos are configured

I don’t think that’s correct — it’s not even used in the example at all. The point of changing all the repo configurations is to make it obvious that what you did affected the output on disk. You can’t see the results if you just set everything to their defaults.

fix: resolve bugs in the custom s3 lock
@jacobheun (Contributor, Author)

I've added in some additional comments and updated the readme to try and spell out the differences with the defaults more clearly.

I also updated the full s3 example in the datastore-s3 repo, ipfs/js-datastore-s3#5, to use the S3Lock instead of memory, and fixed a bug with that lock here. That new update has been tested against one of my S3 buckets.

@daviddias (Member)

@Mr0grog what I meant was that this endeavor, "upgrading the ipfs-repo module so that it supports custom locking and update js-ipfs to support custom repos" was started because we wanted to enable users to be able to use an S3 bucket as the storage backend.

Most users won't really need to touch the repo if they are just going to use regular fs. Most users, if not all, will trust the defaults. What is really interesting is to have multiple storage backends, from S3 (requested multiple times by IPFS users) and things like #763.

@daviddias (Member) left a comment

LGTM. @Mr0grog any last remarks?

@Mr0grog (Contributor) commented Apr 24, 2018

what I meant was that this endeavor, upgrading the ipfs-repo module

Ah, sorry, I misunderstood the context here.

Most users won't really need to touch the repo if they are just going to use regular fs. Most users, if not all, will trust the defaults. What is really interesting is to have multiple storage backends

100% agree with all that! I assumed the intent here was not to demo switching the repo implementation (to S3 or whatever) because it wasn't actually included in the example. So in the absence of that, it seemed important to do something that had an observable effect.

Anyway, looks good to me.

@daviddias mentioned this pull request Apr 30, 2018
@daviddias merged commit 61e7f86 into master Apr 30, 2018
@ghost removed the status/in-progress label Apr 30, 2018
@daviddias deleted the docs/repo-example branch April 30, 2018 14:36