Joyent Public Cloud shutting down #1827

Closed
rvagg opened this issue Jun 7, 2019 · 23 comments

Comments

@rvagg
Member

rvagg commented Jun 7, 2019

(Edit 28-Jun-2019: this doesn't appear to be a problem that will impact Node.js directly. As Colin points out we should still be part of their "private cloud offering". So, panic averted.)

https://docs.joyent.com/joyent-public-cloud-eol

The Joyent Public Cloud (Triton Cloud) will reach End-of-Life (EOL) on November 9, 2019. The Joyent Public Cloud / Triton Cloud, including the my.joyent.com user login and customer-facing APIs, will no longer accept new customers as of June 6, 2019, and will discontinue serving existing customers upon EOL on November 9th.

By server count, Joyent is our third-largest donor. We've also had to worry less about their long-term commitment than that of other donors, given their history and ongoing interest in Node.

So we're going to need to either find new resources or downscale our infrastructure commitments. I've been advocating doing more of the latter for some time now anyway, because I think it's clear that we've overstretched ourselves and the last couple of years have seen the negative results of that. Our infrastructure commitments should be in proportion to the ability of our team to manage them.

When looking at what we do with our Joyent infrastructure, it turns out that most of what we do there is maintaining (testing and releasing) SmartOS builds. Without alternative arrangements from Joyent it seems to me that we will need to entirely cease SmartOS testing and releasing across all release lines by November.

Here's a breakdown:

Testing

  • test-joyent-smartos14-x64-1
  • test-joyent-smartos14-x64-2
  • test-joyent-smartos14-x86-1
  • test-joyent-smartos14-x86-2
  • test-joyent-smartos15-x64-1
  • test-joyent-smartos15-x64-2
  • test-joyent-smartos16-x64-1
  • test-joyent-smartos16-x64-2
  • test-joyent-smartos17-x64-1
  • test-joyent-smartos17-x64-2
  • test-joyent-smartos18-x64-1
  • test-joyent-smartos18-x64-2
  • test-joyent-freebsd10-x64-1 - we have 2 more FreeBSD machines not on Joyent that can pick up this slack, one on DigitalOcean and one on Rackspace. They were (are?) being used for linter jobs. I'd be fine with dropping FreeBSD 10 from our infra entirely if we can remove dependence on it.
  • test-joyent-freebsd10-x64-2
  • test-joyent-ubuntu1604_arm_cross-x64-1 - redundant cross compiler for ARMv6 and ARMv7 testing binaries, we have another one of these on Azure, we'll need to decide how important redundancy is
  • test-joyent-ubuntu1604_docker-x64-1 - redundant Docker host, we have two more on DigitalOcean and one on SoftLayer, we may be able to pick up the slack with the others and we have the option to scale up the other 3 servers to enable more concurrent containers if needed.
  • test-joyent-ubuntu1804-x64-1 - redundant Ubuntu 18.04 test machine, the second one is on DigitalOcean. An important target, we should maintain redundancy somehow.

Release

  • release-joyent-smartos14-x64-1
  • release-joyent-smartos15-x64-1
  • release-joyent-smartos17-x64-1
  • release-joyent-smartos18-x64-1
  • release-joyent-ubuntu1604_arm_cross-x64-1 - this is our only ARMv6 and ARMv7 cross compiler for release, we'll need a new one of these somewhere

Infra

  • infra-joyent-smartos15-x64-1 "backup" - maintains daily backups of some of our most important pieces of infrastructure, www, ci, ci-release. We'll need a new backup strategy and keeping some history here might also be useful so we can't let this just disappear.
  • infra-joyent-ubuntu1604-x64-1 "unencrypted" - runs unencrypted.nodejs.org and also our secondary www machine (maintains a regular sync with the primary) which picks up the slack from CloudFlare when our primary server is down. This has been very important for the smooth running of nodejs.org over the last couple of years and is why we haven't seen any major downtime in that time. So yeah, we need a new strategy for this.
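
To put rough numbers on the breakdown above, here is a minimal sketch that parses the host names listed in this issue and tallies them by role and OS family. The `<role>-<provider>-<os>-<arch>-<index>` pattern is an assumption inferred from the names themselves, and `parse_host` and `count_by` are hypothetical helpers, not part of any Build WG tooling.

```python
import re
from collections import Counter

# Assumed naming pattern, inferred from the host names in this issue:
#   <role>-<provider>-<os>-<arch>-<index>
HOST_RE = re.compile(
    r"^(?P<role>[a-z]+)-(?P<provider>[a-z]+)-(?P<os>[a-z0-9_]+)"
    r"-(?P<arch>[a-z0-9]+)-(?P<index>\d+)$"
)

# The Joyent-hosted machines exactly as listed above.
HOSTS = """
test-joyent-smartos14-x64-1 test-joyent-smartos14-x64-2
test-joyent-smartos14-x86-1 test-joyent-smartos14-x86-2
test-joyent-smartos15-x64-1 test-joyent-smartos15-x64-2
test-joyent-smartos16-x64-1 test-joyent-smartos16-x64-2
test-joyent-smartos17-x64-1 test-joyent-smartos17-x64-2
test-joyent-smartos18-x64-1 test-joyent-smartos18-x64-2
test-joyent-freebsd10-x64-1 test-joyent-freebsd10-x64-2
test-joyent-ubuntu1604_arm_cross-x64-1
test-joyent-ubuntu1604_docker-x64-1
test-joyent-ubuntu1804-x64-1
release-joyent-smartos14-x64-1 release-joyent-smartos15-x64-1
release-joyent-smartos17-x64-1 release-joyent-smartos18-x64-1
release-joyent-ubuntu1604_arm_cross-x64-1
infra-joyent-smartos15-x64-1 infra-joyent-ubuntu1604-x64-1
""".split()


def parse_host(name):
    """Split a host name into its (assumed) components."""
    match = HOST_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected host name: {name!r}")
    return match.groupdict()


def os_family(os_label):
    """Reduce e.g. 'smartos14' or 'ubuntu1604_docker' to 'smartos' / 'ubuntu'."""
    return re.match(r"[a-z]+", os_label).group()


def count_by(hosts, key):
    """Count hosts by an arbitrary key derived from the parsed name."""
    return Counter(key(parse_host(h)) for h in hosts)


by_role = count_by(HOSTS, lambda h: h["role"])
by_family = count_by(HOSTS, lambda h: os_family(h["os"]))

for label, counts in (("role", by_role), ("OS family", by_family)):
    print(label, dict(counts))
```

Grouping by OS family is what shows how much of the Joyent footprint is SmartOS-specific, as opposed to generic Ubuntu or FreeBSD capacity that other providers could absorb.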

So there's a few important pieces here. It's not obvious to me yet where we're going to get these resources from without shedding more resources from our other main providers. Something to discuss further in the next couple of months but we can't let this creep up on us too quickly!

/cc @cjihrig if there's anything else from the Joyent side on this, especially as it relates to our ability to test and release for SmartOS.
@mhdawson @sam-github you should probably give the TSC a heads-up on this. The dropping of SmartOS during the lifecycle of the majors we have going is certainly something they'll need to know about because we've never done that before.

@sam-github
Contributor

Joyent still does managed private clouds, so, would they really be OK with us ending support for SmartOS because their public cloud offering went away? Any way to find out? @nodejs/platform-smartos

Since they still do managed private cloud, and have a bunch of partners who might care about Node.js on SmartOS (https://docs.joyent.com/joyent-public-cloud-eol/partners), perhaps they will be willing to provide hosts for SmartOS test&release through some mechanism other than their cloud.

@rvagg
Member Author

rvagg commented Jun 7, 2019

Well, here's the thing about that: as I stated in the 12.x round of supported platforms consolidation, we are continuing to maintain SmartOS support primarily because of the significant additional infrastructure support that Joyent provides (as demonstrated above). It's a mutually beneficial arrangement and we're always happy to find ways to provide benefits for our donors as long as it doesn't undermine the integrity of our organisation.
If we end up with only the cost of maintaining SmartOS and no benefit from doing so, then I'm going to advocate that we drop it anyway. The cost is arguably greater than for FreeBSD: SmartOS is far more different from our other platforms than even BSD is, it has far fewer online help resources, and there's just way too much XML and too many obscure management tools. There are also far fewer users of Node on SmartOS than on FreeBSD, so it's hard to argue for it under our supported platforms strategy without the mutually beneficial arrangement.
But perhaps they could help with a path to a more stable phase-out, such that we continue support for <=12 but drop it in 13 rather than just dump it all at once. I don't think I mind either option: we have the tools to provision all of the machine types we need if they have images for them all available, and consistency of our supported platforms strategy is nice, but I'd also be happy to remove the burden of support sooner rather than later.

@targos
Member

targos commented Jun 7, 2019

About getting resources from other providers, have we already asked Google if they're willing to donate something? I don't think we have any VM running on GCP yet?

@sam-github
Contributor

@targos GCP does smartos? Or, you are suggesting we could perhaps use GCP to replace some of the generic infrastructure running on smartos at the moment?

@targos
Member

targos commented Jun 7, 2019

I'm talking about the other important things like website / Ubuntu

@rvagg
Member Author

rvagg commented Jun 7, 2019

Google is willing I believe, but they are very particular about what they offer. If we could do everything in containers then it'd probably be fine. They could probably take on some of our critical needs, like www-backup, but unfortunately I suspect there's going to be some very non-trivial cost in adjusting to the Google way ™️ just to get simple things done, like storing files long-term in something that acts like a normal filesystem. I've said to @MylesBorins in the past that it'd be nice to have someone Googly on the Build team to make something happen. Otherwise we're stuck with the choice of whether we personally want to invest in learning an entirely new stack just for this.

@bradleythughes
Member

Regarding the FreeBSD 10.x hosts, these can most certainly be retired, since upstream support of the 10.x series ended on October 31, 2018.

@cjihrig
Contributor

cjihrig commented Jun 7, 2019

My understanding is that nothing is going to change in the arrangement between Joyent and the Node.js project. Public cloud is going away, but as Sam pointed out, Joyent is still doing private cloud and will still have the resources to donate. The Build WG resources should not be impacted. If they are, please let me know.

@jbergstroem
Member

@cjihrig said: The Build WG resources should not be impacted. If they are, please let me know.

How would we know if we are impacted? Would it pop up as a notice in the dashboard?

@cjihrig
Contributor

cjihrig commented Jun 7, 2019

@jbergstroem said: How would we know if we are impacted? Would it pop up as a notice in the dashboard?

For example, in late February / early March instances needed to be migrated. The Joyent team reached out to @mhdawson (and I believe @rvagg) to get them migrated. The DCs where the resources currently exist are not going away. They won't be going away anytime soon either - new private cloud customers are still coming in, and Joyent is still owned by Samsung, who they must support.

@jbergstroem
Member

@cjihrig said: The Joyent team reached out

Thanks for clarifying.

@MylesBorins
Contributor

I'm very happy to help get Google Cloud Resources. Are there specific resources that would be helpful?

@mhdawson
Member

@MylesBorins in line with our current approach, virtual machines are what would fit into our existing infrastructure. Does Google Cloud support that?

@MylesBorins
Contributor

@mhdawson I can likely get us credits for GCP. We have a VM offering with a number of different images including the ability to bring your own. Would that suffice?

@rvagg
Member Author

rvagg commented Jun 11, 2019

@cjihrig sounds good! Thanks for the clarification. I'm still unclear on what "public" and "private" are in this context--I was imagining that "private" is on-prem, but is it more of an "invite-only" situation where the resources are still being managed by Joyent?

@MylesBorins perhaps if we could get some credits we could have a look, poke around and see what we could use. There's plenty of scope to do container work (we're already doing plenty and can expand), but VMs are the core of what we need since we need OS and even Linux kernel variability.

@cjihrig
Contributor

cjihrig commented Jun 11, 2019

@rvagg said: I'm still unclear on what "public" and "private" are in this context--I was imagining that "private" is on-prem, but is it more of an "invite-only" situation where the resources are still being managed by Joyent?

So in this context, public means anyone with a credit card can sign up and start using the cloud. Private doesn't necessarily mean invite-only, but there is some business agreement before you can begin provisioning things.

@mhdawson
Member

@MylesBorins as @rvagg mentioned, given that there are VMs with different images, if we can get some ongoing credits we can try them out and see how we can use the VMs.

@sam-github
Contributor

Speaking of linter jobs, and trimming down the jobs that have to be maintained, why do we have a specific job for this? If lint fails, every platform fails on the test target, so why have one specifically for lint?

@rvagg
Member Author

rvagg commented Jun 12, 2019

@sam-github the original idea was that we'd run linting first and if it fails we wouldn't bother with the other jobs, saving resources and short-circuiting the feedback process for the user. Now, in node-test-pull-request-lite-pipeline, lint runs in parallel with the normal test runs. This removes the shortcut, but the parallelism makes total run times for lint-passing jobs shorter. If you look in Makefile you should find that test-ci doesn't lint, so we shave that time off all the non-lint test runs.
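
To make the trade-off concrete, here is a rough sketch of the two schedules. The durations are made-up placeholders rather than CI measurements, and both functions are hypothetical models, not anything taken from the Jenkins pipeline.

```python
def sequential_minutes(lint, tests, lint_passes):
    """Lint first; the test jobs only start if lint passes."""
    return lint + tests if lint_passes else lint


def parallel_minutes(lint, tests):
    """Lint and tests start together; the run ends when the slower one ends."""
    return max(lint, tests)


# Placeholder durations in minutes (assumptions, not real CI numbers).
LINT, TESTS = 5, 40

print("sequential, lint passes:", sequential_minutes(LINT, TESTS, True))
print("sequential, lint fails: ", sequential_minutes(LINT, TESTS, False))
print("parallel, either way:   ", parallel_minutes(LINT, TESTS))
```

Under this model the sequential schedule only wins when lint fails, since the test machines are never started. The parallel schedule is faster for every lint-passing run, at the price of spending test resources on runs that lint would have rejected.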

@sam-github
Contributor

@rvagg makes sense, thanks for the background

@rvagg
Member Author

rvagg commented Jun 28, 2019

Will close this since it seems mostly irrelevant if we are maintaining our relationship with Joyent (thanks again Joyent, it's very much appreciated!). Will put a note in the OP to avoid confusion for anyone reading this from the top.

But this is illustrative of the kind of concern we have with any of our major providers. We're nicely spread across all of them, so if you look across any of our infra providers, taking any of them out is going to leave a hole, and in most cases a very large and/or awkward hole.

@rvagg rvagg closed this as completed Jun 28, 2019
@nwilkens

nwilkens commented Sep 25, 2019

A late note, but https://mnx.io is now home to a number of Joyent refugees -- we'd be glad to sponsor some infrastructure for Node, and we have SmartOS instances too. Reach me directly at nick at mnx io if this is of interest.

@rvagg
Member Author

rvagg commented Sep 25, 2019

thanks for the info @nwilkens, I'll be in touch when I have some time

9 participants