Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial CLI rewrite #412

Merged
merged 6 commits into from
Jan 30, 2024
Merged

Initial CLI rewrite #412

merged 6 commits into from
Jan 30, 2024

Conversation

cretz
Copy link
Member

@cretz cretz commented Jan 11, 2024

CLI Rewrite

This is the initial framework for adding commands, and the following commands are already implemented:

  • temporal env delete
  • temporal env get
  • temporal env list
  • temporal env set
  • temporal server start-dev
  • temporal task-queue describe
  • temporal workflow describe
  • temporal workflow execute
  • temporal workflow list
  • temporal workflow start

Please review.

Why

Address tech debt before we can't

Known improvements

  • Cobra (any arg at any place)
  • Customize path to env file
  • Global log-level customization
  • Global json vs text data output customization
  • Markdown-based code generation
  • Solid test framework
  • Added --input-encoding to support more payload types (e.g. can support proto JSON and even proto binary)
  • Library available for docs team to write doc generator with
  • JSON output is reasonable for tool use
  • Properly gives failing status code if workflow fails on "execute" but JSON output is set
  • --color is available to disable any coloring
  • Dev server reuses logger meaning it is on stderr by default
  • workflow execute now streams the event table instead of waiting
  • Use shorthand JSON payloads by default on (non-history) JSON output which makes payloads much more readable
  • workflow describe now doesn't force users to see JSON, there is a non-JSON text form

Known incompatibilities

NOTE: All of these incompatibilities are intentional and almost all decisions can be reverted if decided.

  • Removed --memo-file from workflow args
  • --color not currently implemented everywhere (like for logs)
  • Removed paging by default (i.e. basically --no-pager behavior)
  • Duration arguments require trailing unit (i.e. --workflow-timeout 5 is now --workflow-timeout 5s)
  • --output table and --output card blended to --output text (the default), but we may let table options be applied as separate params
  • TEMPORAL_CLI_SHOW_STACKS - no stack trace-based errors
  • --tls-ca-path cannot be a URL
  • Not explicitly setting TLS SNI name from host of URL
  • JSON output for things like workflow start use more JSON-like field names
  • Workflow history JSON not dumped by default as part of workflow execute when JSON set
  • Concept of --fields long is gone, now whether some more verbose fields are emitted is controlled more specifically
  • To get accurate workflow result, workflow follows continue as new for workflow execute
  • Removed the -f alias for --follow on workflow show
  • server start-dev will reuse the root logger which means:
    • Default is text (or "pretty") instead JSON
    • No way to set level to "fatal" only
    • All panic and fatal logs are just error logs
    • Goes to stderr instead of stdout
  • server start-dev --db-filename no longer auto-creates directory path of 0777 dirs if not present
  • workflow execute when using --event-details (equivalent of --fields long) now shows full proto JSON attributes instead of wrapped partial table
  • workflow start and workflow execute no longer succeeds by default if the workflow exists already, but --allow-existing exists
  • The text version of workflow describe does not have all the things the JSON version does (and in the past there was only JSON)
  • workflow list in JSON now does one object at a time instead of before where it was 100 objects at a time
  • workflow list in text now does a page at a time before realigning table (sans headers) instead of 100 at a time
  • workflow list in text no longer includes --fields long
  • Removed --context-timeout since it is confusing when you might want to customize it (can re-add timeout concepts if needed)

Notes about approach taken

  • Did not spend time trying to improve documentation, so all of the inconsistent documentation remains and should be cleaned up separately
  • Did not spend (much) time trying to completely change behavior or commands
  • Compatibility intentionally retained in most reasonable ways
  • File-level copyright notices retained on places with DataDog
  • Expecting better formatting to come later

TODO

In addition to all the known missing commands, I have opened the following issues:

@cretz cretz force-pushed the cli-rewrite-first branch 7 times, most recently from 1d5a35a to 9f307ef Compare January 11, 2024 22:55
@cretz cretz requested review from dnr and a team January 11, 2024 23:08
@cretz cretz marked this pull request as ready for review January 11, 2024 23:09
Copy link
Member Author

@cretz cretz Jan 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, much of this documentation was copied verbatim. Please do not use this review to review CLI docs. Documentation was an explicit non-goal of this initial framework. Feel free to submit a PR after this PR is merged to improve docs.

Copy link
Member

@Sushisource Sushisource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM! Nothing blocking.

I might've done the inverse with the command definitions here - generating the docs from the code (with docstrings) rather than the other way around, but, you've already done all this and I think it works fine too.

One thing I noticed while running in this branch: the UI isn't properly parsing some stuff. Event names for one. I wonder if there's some mismatch with proto versions?

The server output now also looks like this on a per-line basis by default in text mode:
time=2024-01-17T11:18:32.539 level=ERROR msg="service failures" operation=PollWorkflowTaskQueue wf-namespace=default error="last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:38383: connect: connection refused\""

IMO if this is changing by default (I think it should - it's awful right now in JSON mode) then we should make it even friendlier and look fairly standard like:
2024-01-17T11:18:32.539 [ERROR]: "service failures" operation=PollWorkflowTaskQueue wf-namespace=default error="last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:38383: connect: connection refused\""

Identity: clientIdentity(),
// We do not put codec on data converter here, it is applied via
// interceptor. Same for failure conversion.
// XXX: If this is altered to be more dynamic, have to also update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The heck is XXX?

Copy link
Member Author

@cretz cretz Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A common type of callout like "TODO" but more like "note". Many IDE's recognize it and highlight it (it's an old Java thing). I was just making a callout without a TODO. I'm not gonna try to search it or its origins though, heh.

"github.com/temporalio/cli/temporalcli/internal/printer"
)

func (c *TemporalEnvDeleteCommand) run(cctx *CommandContext, args []string) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking but it might be nice to have a dry run option for these env commands that just prints the config which would be written

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Would suggest feature requests to improve beyond parity be opened separately though.

func (c *TemporalEnvDeleteCommand) run(cctx *CommandContext, args []string) error {
keyPieces := strings.Split(args[0], ".")
if len(keyPieces) > 2 {
return fmt.Errorf("env property key to get cannot have more than one dot")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("env property key to get cannot have more than one dot")
return fmt.Errorf("env property key to delete cannot have more than one dot")

Copy link
Member Author

@cretz cretz Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

// related to env config stuff above.
LookupEnv func(string) (string, bool)

// These two default to OS values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "These two" won't really make sense when peeking a docstring for just this field

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the godoc keeps the spacing for struct fields and therefore it should work. But I can change it to "These two fields below"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

return flagErr
}

// TODO(cretz): Make it clear this logs error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I will godoc this and other parts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Comment on lines +72 to +73
} else if canAttr := closeEvent.GetWorkflowExecutionContinuedAsNewEventAttributes(); canAttr != nil {
closeEvent, runID = nil, canAttr.NewExecutionRunId
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking: Probably makes sense to have a --dont-follow-continue-as-new option too, but that seems to not be present in the existing version either so no biggie.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 We should track all of the improvements somewhere (can just create issue)

}
histProto.Events = append(histProto.Events, event)
}
// Do proto serialization here that would never do shorthand payloads
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does "shorthand payloads" mean collapsing long payload strings?

Suggested change
// Do proto serialization here that would never do shorthand payloads
// Do proto serialization here that won't shorten payloads

Copy link
Member Author

@cretz cretz Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically "lift" payload JSON out of the "data" to be pure JSON. Described in detail at https://github.com/temporalio/proposals/blob/master/api/http-api.md#payload-formatting. I can clarify in comment though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

false,
enums.HISTORY_EVENT_FILTER_TYPE_CLOSE_EVENT,
)
// TODO(cretz): Ok to error here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why not

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@@ -0,0 +1,785 @@
// Code generated. DO NOT EDIT.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to check this in? Can it not be autogenerated as part of the Go build?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you cannot auto-generate anything in Go build really (can't auto-run any code on build). It's the Go way to do it this way (proto, mocks, this type of thing, etc). This is great for code readers in GH of course.


type Colorer func(string, ...interface{}) string

type Printer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this might make more sense as an interface with JSON and Text implementations? A lot of the functions look to just be checking the JSON flag and returning early on a different path based on that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how I originally had it, but the abstraction was leaky and there'd only be two implementations. And callers often have to know whether they're using JSON or not anyways. So there was no value in extra layer of indirection, so it being right on the struct was fine (there is also a bit of shared code).

@cretz
Copy link
Member Author

cretz commented Jan 19, 2024

I might've done the inverse with the command definitions here - generating the docs from the code (with docstrings) rather than the other way around

Markdown is a clearer place to write multiline English than Go I think and usually you wouldn't parse Go to generate Go code because it can get to a chicken-egg issue on compilation errors (but that could be worked around).

One thing I noticed while running in this branch: the UI isn't properly parsing some stuff. Event names for one. I wonder if there's some mismatch with proto versions?

Yes, the UI is a custom commit by me to just hack it in. I can't really merge this until UI server gogo stuff is fixed (I think temporalio/ui#1775 is doing this)

IMO if this is changing by default (I think it should - it's awful right now in JSON mode) then we should make it even friendlier and look fairly standard like

Yes, I chose to use the Go text default w/ a slight timestamp change, but totally support improving the test log handler

Copy link
Collaborator

@josh-berry josh-berry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review; I have not read any of the test code, nor have I read the command generator or much of the internal util/tooling code. I did do some basic manual testing and reported things that seemed weird or broken in my review comments.

CONTRIBUTING.md Show resolved Hide resolved
temporalcli/commands.gen.go Show resolved Hide resolved
var props []prop
// User can ask for single flag or all in env
if len(keyPieces) > 1 {
props = []prop{{Property: keyPieces[1], Value: env[keyPieces[1]]}}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if out of scope for this branch, but i think it would be nicer if we just print the value here. This will be more script-friendly than printing something structured (modulo JSON output of course, where we should print just the value as JSON).

Copy link
Member Author

@cretz cretz Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit out of scope since this was meant to match what happens today (assuming it does happen today). I think the original goal was to show the same formatted output whether you wanted a single value or multiple.

temporalcli/internal/printer/printer.go Show resolved Hide resolved
temporalcli/commands.env.go Show resolved Hide resolved
temporalcli/commands.go Outdated Show resolved Hide resolved
temporalcli/commands.go Outdated Show resolved Hide resolved
temporalcli/commands.taskqueue.go Show resolved Hide resolved
temporalcli/commands.workflow_exec.go Show resolved Hide resolved
} else if ref.IsValid() && ((ref.Kind() == reflect.Struct && ref.CanInterface()) || ref.Type().Implements(jsonMarshalerType)) {
b, err := p.jsonVal(v, "", true)
if err != nil {
return fmt.Sprintf("<failed converting to string: %v>", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason I haven't fully-debugged, this error fires with any workflow result that includes a nested JSON object. For example, changing this line of the hello-world TypeScript sample to this:

  return {greeting: {}};

Causes the command ./temporal workflow execute -w foo --type example -t hello-world -i '"Josh"' to emit:

...
Results:
  RunTime  60ms
  Status   COMPLETED
  Result   <failed converting to string: json: error calling MarshalJSON for type json.RawMessage: invalid character '}' looking for beginning of value>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigating...

Copy link
Member Author

@cretz cretz Jan 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to be an issue with the payload shorthand in our Go API library. I have opened temporalio/api#349 and notified @tdeebswihart.

WorkflowExecutionTimeout: w.ExecutionTimeout,
WorkflowTaskTimeout: w.TaskTimeout,
CronSchedule: w.Cron,
WorkflowExecutionErrorWhenAlreadyStarted: !w.AllowExisting,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm this is the default behaviour here is the existing behaviour of the CLI?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not. From "Known Incompatibilities" above:

workflow start and workflow execute no longer succeeds by default if the workflow exists already, but --allow-existing exists

Today you can't even error on existing I don't think. There was some discussion here on whether to do this or make the the fail-on-existing behavior opt-in and we accepted the current behavior. But I am totally willing to flip that decision if compatibility here can cause issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I missed that note. I may have missed that discussion. I would keep the behaviour the same as the old CLI and add an option to not allow existing instead:

  • It matches our most popular SDKs behaviour
  • Makes migrating to the new CLI easier

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Will bring it up internally for discussion, but I am kinda leaning your direction too. If we never had a previous CLI, I would hard-lean towards start failing if...well...start fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@josh-berry josh-berry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have read the remaining code (including skimming the tests) and I don't think there's anything new, really, that hasn't already been covered offline.

Comment on lines +10 to +14
// TODO(cretz): To test:
// * Env var actually sets CLI arg
// * Get single and all for env
// * Delete single and all for env
// * List envs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's either actually test these as part of this PR, or open Jira tickets to capture the fact that we have some testing debt we need to pay down. (Same thing goes for the TODOs in the other tests.)

Copy link
Member Author

@cretz cretz Jan 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Would like to avoid as part of this PR due to the already-existing size/scope and that it's blocking any other people from working on it, but can definitely open issues (unsure if they are JIRA or GH) for filling out the tests on these TODOs once we merge here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine; I have no opinion on whether Jira or GH is best. Jira's probably fine since it's more internal noise the community is unlikely to want to participate in, but there's no reason they can't be in GH if preferred.

temporalcli/internal/printer/printer.go Show resolved Hide resolved
Copy link
Collaborator

@josh-berry josh-berry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; let's make sure any outstanding issues get filed as tickets somewhere so we don't lose them after merge.

return
}
}
h.Fail("Pieces not found on any line together")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return true with the pieces in any order, but I don't think that's what we want. I think we should consider using a ContainsLineMatchingRegex utility for many of the assertions.

Copy link
Member Author

@cretz cretz Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with any order as I'm trying for very loose assertion here of stdout (otherwise it becomes brittle). But I expect as commands and tests are developed, more and more assertion helpers are written. There is already https://pkg.go.dev/github.com/stretchr/testify/require#Assertions.Regexp, so s.Regexp could be used to test if things are on the same line in a more specific way if the author wants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with any order as I'm trying for very loose assertion here of stdout

Can you give an example of when it would be appropriate to not care about the order of the substrings in the output? If there aren't examples of that then please change to requiring order. It's very unusual to make substring assertions but not care about the order of the substrings.

Copy link
Member Author

@cretz cretz Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote these tests I didn't care about order, so the examples are there. I also didn't care about spacing between them, what their indentation was, what the line above/below them is, etc. It can be said it's unusual to make assertions about data on a line without caring about the spacing or indentation too.

However, I don't see a harm in requiring sequence order in this case so I can add it on next PR. Though the function is more like "ContainsOnSameLineInOrder" instead of just contains anywhere.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote these tests I didn't care about order, so the examples are there

Right, but I asked for examples of when it was appropriate to ignore order, and I would say that in those examples it wasn't :) IMO all these clearly should be asserting order:

	s.ContainsOnSameLine(out, "1", "WorkflowExecutionStarted")
	s.ContainsOnSameLine(out, "2", "WorkflowTaskScheduled")
	s.ContainsOnSameLine(out, "3", "WorkflowTaskStarted")
	s.ContainsOnSameLine(out, "Status", "COMPLETED")
	s.ContainsOnSameLine(out, "Result", `{"foo":"bar"}`)

I also didn't care about spacing between them, what their indentation was, what the line above/below them is, etc. It can be said it's unusual to make assertions about data on a line without caring about the spacing or indentation too.

Disagree. Order of words has vastly more impact on semantics than spacing and indentation.

// Confirm text has key/vals as expected
out := res.Stdout.String()
s.ContainsOnSameLine(out, "WorkflowId", "my-id1")
s.Contains(out, "RunId")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our uses of Contains and ContainsOnSameLine are all doing the same thing: they're asserting that there exists a line in which a sequence of strings occurs. The "sequence" might just be one string, and the strings may be separated by unspecified characters. So I think we should use the same assertion utility for sequences of length 1 and length > 1.

Copy link
Member Author

@cretz cretz Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do the same thing just like s.True(strings.Contains(out, "RunId")) but the invocation signals intent. ContainsOnSameLine is our helper, https://pkg.go.dev/github.com/stretchr/testify/require#Assertions.Contains already exists. We can't really stop people from using the existing one if it makes sense to them nor should we.

@cretz
Copy link
Member Author

cretz commented Jan 30, 2024

Merging with existing approvals. Opened 8 issues for future work in addition to the known missing commands. People can now work against the cli-rewrite branch.

@cretz cretz merged commit fb8f023 into cli-rewrite Jan 30, 2024
5 checks passed
@cretz cretz deleted the cli-rewrite-first branch January 30, 2024 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants