This repository has been archived by the owner on Mar 25, 2022. It is now read-only.

CiteSeerX instance #123

Closed
davidar opened this issue Nov 22, 2015 · 8 comments

Comments

@davidar
Member

davidar commented Nov 22, 2015

Apparently, the best way forward for mirroring CSX's PDF collection is to set up our own CSX instance and then mirror that to IPFS. They'll give us a copy of their data to get started, but we'll be responsible for handling DMCA takedowns, etc.

@lgierth Thoughts?

Cc: @jbenet

@davidar
Member Author

davidar commented Nov 26, 2015

@mekarpeles Is this something @ArchiveLabs could do?

@mekarpeles

For the public access stuff, I believe so.

@ghost

ghost commented Dec 4, 2015

Sorry for keeping you waiting so long. CSX looks pretty heavy; can you set it up on pollux, or on another separate host? How much storage do you think it will need?

@davidar
Member Author

davidar commented Dec 4, 2015

@lgierth I think the database is ~4TB compressed, plus whatever extra overhead CSX requires.

@cleegiles Help? :)

@cleegiles

It's a bit larger, more like 5 TB, but most of that is the compressed PDFs (6.8M).

The database, XML, and extracted text are much smaller: 20 GB, 30 GB, and 100 GB compressed, respectively.

@davidar
Member Author

davidar commented Dec 6, 2015

@lgierth Here's a Docker image with most of the dependencies required for CiteSeerX. I'm having trouble getting mysqld to start, though; would you mind taking a look?

@cleegiles

It would be best to take this up here:

https://github.com/SeerLabs/CiteSeerX/


@ghost

ghost commented Nov 3, 2016

@davidar, do you still want to work on this? We could just get you a host with a big disk, and you'd have root.

ghost added the storage label Nov 3, 2016
ghost closed this as completed Aug 6, 2018