
Implement WebSub publisher + hub and subscriber #1

Closed
MaryamZi opened this issue Oct 14, 2019 · 17 comments

Comments

@MaryamZi
Collaborator

Creating this issue as an overview of the tasks to be completed and to track the points/properties to be decided upon in relation to the subject of this issue.

Publisher + Hub

Publisher

  • Is a separate publisher (a service between the upstream publisher and the hub) required, or can the upstream publisher publish directly to the hub?
  • Should discovery be supported (this requires a resource URL that responds with the hub and topic URLs on GET/HEAD), or can it be assumed that whoever is interested in the events already knows the hub and topic URLs?
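For reference, WebSub discovery works by having the topic's resource URL advertise the hub URL and its own canonical ("self") URL in `Link` headers. A minimal Python sketch of building and parsing such a header (Python is used only for illustration, since the project itself is in Ballerina; the function names are hypothetical and the parser handles only the simple form shown):

```python
import re


def discovery_link_header(hub_url: str, topic_url: str) -> str:
    """Link header value a discovery endpoint could return on GET/HEAD."""
    return f'<{hub_url}>; rel="hub", <{topic_url}>; rel="self"'


def parse_link_header(value: str) -> dict:
    """Map each rel (e.g. "hub", "self") to its URL; simple cases only."""
    pairs = re.findall(r'<([^>]+)>\s*;\s*rel="([^"]+)"', value)
    return {rel: url for url, rel in pairs}
```

With the hub and topic URLs used later in this thread, a subscriber that only knew the resource URL could recover both values from the header.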

Hub

  • Authentication/authorization
  • Persistence of topics and subscribers - based on MySQL
  • Client configuration - the configuration for the client delivering content.
public type ClientConfiguration record {|
   string httpVersion = HTTP_1_1;
   ClientHttp1Settings http1Settings = {};
   ClientHttp2Settings http2Settings = {};
   int timeoutInMillis = 60000;
   string forwarded = "disable";
   FollowRedirects? followRedirects = ();
   PoolConfiguration? poolConfig = ();
   ClientSecureSocket? secureSocket = ();
   CacheConfig cache = {};
   Compression compression = COMPRESSION_AUTO;
   OutboundAuthConfig? auth = ();
   CircuitBreakerConfig? circuitBreaker = ();
   RetryConfig? retryConfig = ();
|};
  • The hub needs to know its publicly accessible URL - used in content delivery requests
  • Lease period - a subscription is removed once the lease period expires - 2 days?
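A minimal sketch of the lease bookkeeping, assuming an in-memory map in place of the MySQL persistence mentioned above (the class and method names are hypothetical, and Python is used only for illustration). Two days is 2 × 24 × 60 × 60 = 172800 seconds, which matches the `leaseSeconds` value used in the subscriber code later in this thread:

```python
LEASE_SECONDS = 2 * 24 * 60 * 60  # 2 days = 172800 seconds


class SubscriptionStore:
    """In-memory stand-in for the MySQL-backed store: (topic, callback) -> expiry time."""

    def __init__(self):
        self._expiry = {}

    def add(self, topic, callback, now, lease_seconds=LEASE_SECONDS):
        # Re-subscribing with the same topic and callback simply renews the lease.
        self._expiry[(topic, callback)] = now + lease_seconds

    def subscribers(self, topic, now):
        """Callbacks whose lease is still active at time `now` (in seconds)."""
        return [cb for (t, cb), exp in self._expiry.items() if t == topic and exp > now]

    def purge_expired(self, now):
        """Remove and return the subscriptions whose lease has run out."""
        expired = [key for key, exp in self._expiry.items() if exp <= now]
        for key in expired:
            del self._expiry[key]
        return expired
```

Passing `now` explicitly (for example `time.time()` in the hub) keeps the expiry logic easy to test.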

Subscriber

  • Do we need a custom subscriber service, or should we go ahead with the generic subscriber service? With the custom subscriber service, the subscriber would get a record parsed from the JSON payload.
  • Will the subscriber send the subscription request (on startup) or will someone else initiate the subscription process? Either way a publicly accessible URL needs to be known to include in the subscription request.
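Whoever initiates it, a WebSub subscription request is a form-encoded (`application/x-www-form-urlencoded`) POST to the hub carrying `hub.mode`, `hub.topic` and `hub.callback`, plus the optional `hub.lease_seconds` and `hub.secret`. A Python sketch of building that body (illustrative only; the function name is hypothetical and the callback URL in any call would be the publicly accessible one):

```python
from urllib.parse import urlencode


def subscription_request_body(callback, topic, lease_seconds=None, secret=None):
    """Form-encoded body for a WebSub subscribe request sent to the hub URL."""
    params = {"hub.mode": "subscribe", "hub.topic": topic, "hub.callback": callback}
    if lease_seconds is not None:
        params["hub.lease_seconds"] = str(lease_seconds)
    if secret is not None:
        # The hub uses this to sign content deliveries (X-Hub-Signature).
        params["hub.secret"] = secret
    return urlencode(params)
```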
@sanjiva
Contributor

sanjiva commented Oct 20, 2019

Answers to your questions:

Publisher

  • No, a separate service is not needed - we can have the results publisher publish directly to the hub.
  • Discovery is not needed.

Hub

  • Authn/authz: the publisher should ideally use mTLS
  • subscribers will be issued credentials - let's discuss more precisely

Subscriber

  • Generic data is fine - it's a fairly simple JSON payload
  • Subscriptions should ideally be managed as a set up process (in the system) and not by the subscribers themselves

@sanjiva
Copy link
Contributor

sanjiva commented Oct 20, 2019

Actually I'm not sure whether my answer for the separate publisher is right.

In addition to delivering results to subscribed media orgs, we need to do the following:

  • send an SMS to the subscribed media orgs' contacts saying "result for xyz is coming"
  • push the results files to the ftp site
  • update the results data archive website with the new result (this site is for media people to see all the released results in raw data form)

I guess we could make all of these subscribers of the WebSub engine too, but that seems like overkill?

If we're not doing that, then we need to have a separate service for the results publisher to deliver to and then have that do the above items PLUS publish to the websub hub.

Thoughts?

@MaryamZi
Collaborator Author

Yeah, +1 for a separate service. I don't think doing this via the WebSub hub would add value, given that it would just result in an additional topic with one subscriber, or three subscribers for not-so-independent tasks. We can have this intermediary service start up the hub, and it could have direct access to the hub.

Follow up questions/notes regarding the subscriptions:

> • Subscriptions should ideally be managed as a set up process (in the system) and not by the subscribers themselves

While this could be done as part of the setup, the subscriber would still have to respond to an intent verification request sent by the hub (the request is sent to the callback URL specified in the subscription request). The request carries a hub.challenge query param (among other params, such as the topic), which the subscriber has to echo back to complete the subscription.

Can we expect the subscribers to echo the challenge? The ballerina/websub module does not allow turning off this intent verification phase atm either.
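Echoing the challenge is only a few lines of logic on the subscriber side; the ballerina/websub listener performs this step itself. A framework-agnostic Python sketch for illustration (the function name and the dict-of-query-params input are assumptions): the subscriber confirms with a 2xx response whose body is exactly `hub.challenge`, and answers 404 for a request it does not recognise.

```python
def handle_intent_verification(query: dict, expected_topic: str):
    """Return (HTTP status, response body) for the hub's verification GET."""
    challenge = query.get("hub.challenge")
    if (query.get("hub.mode") in ("subscribe", "unsubscribe")
            and query.get("hub.topic") == expected_topic
            and challenge):
        # Echoing the challenge verbatim confirms the subscription intent.
        return 200, challenge
    # Unknown topic or malformed request: refuse the verification.
    return 404, ""
```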

  • Also, if the subscribers are using a secret for authenticated content distribution - this would also have to be specified when issuing credentials, since the system would have to know it when setting up the subscription while the subscriber would have to know it to validate the content received.

Authenticated content distribution: if the subscription request specified a secret, the hub sends an "X-Hub-Signature" header containing an HMAC of the content, computed with the secret as the key, and the subscriber has to validate this header for each content delivery.
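The hub signs and the subscriber verifies using the same shared secret, with the header value in the form `method=hexdigest`. A Python sketch of both sides (illustrative only; `sign`/`verify` are hypothetical names, `qweKLS` below is just the sample secret from the conf examples in this thread, and `hmac.compare_digest` is used to avoid leaking timing information):

```python
import hashlib
import hmac

# Hash methods WebSub allows in the X-Hub-Signature header ("method=signature").
_METHODS = {"sha1": hashlib.sha1, "sha256": hashlib.sha256,
            "sha384": hashlib.sha384, "sha512": hashlib.sha512}


def sign(body: bytes, secret: str, method: str = "sha256") -> str:
    """Hub side: header value for a content delivery."""
    digest = hmac.new(secret.encode(), body, _METHODS[method]).hexdigest()
    return f"{method}={digest}"


def verify(body: bytes, secret: str, header: str) -> bool:
    """Subscriber side: recompute the HMAC and compare in constant time."""
    method, _, received = header.partition("=")
    if method not in _METHODS or not received:
        return False
    expected = hmac.new(secret.encode(), body, _METHODS[method]).hexdigest()
    return hmac.compare_digest(expected, received.lower())
```

A delivery whose body was altered in transit, or that was signed with a different secret, fails `verify` and should be rejected.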

@sanjiva
Contributor

sanjiva commented Oct 20, 2019

Yes we can ask them to respond to the intent verification.

However, we have to dumb this down totally for them ... maybe provide them a server they can run. Basically we can write a (compiled) Ballerina client they run that just produces the files on the disk. They just run "java -jar mediaclient.jar" and then type in the secret and the files simply show up in that directory.

That way they don't really need to understand anything - we say we will deliver the files to them directly.

What do you think?

Can we um get this working by Wednesday ;-). Straightforward really ..

@MaryamZi
Collaborator Author

Yeah, we should be able to.

So basically, we write the WebSub subscriber service for them, build it and give them the JAR?

As input we would require from them

  • the port
  • the path (base path) for the service
  • the topic - assuming we can go ahead with one subscriber service subscribing for one topic per JAR
  • the secret
  • callback URL - publicly accessible

I was thinking of a service like

import ballerina/config;
import ballerina/websub;

@websub:SubscriberServiceConfig {
    path: getAsStringOrPanic("subscriber.path"),
    subscribeOnStartUp: true,
    target: [getAsStringOrPanic("subscriber.hub"), getTopic()],
    leaseSeconds: 172800,
    secret: getAsStringOrPanic("subscriber.secret"),
    callback: getAsStringOrPanic("subscriber.url")
}
service subscriberService on new websub:Listener(getAsIntOrPanic("subscriber.port")) {
   resource function onNotification (websub:Notification notification) {
       // TODO: logic to write the received content to files.
   }
}

function getAsIntOrPanic(string key) returns int {
    int value = config:getAsInt(key);
    
    if (value == 0) {
        panic error("Error", message = key + " not specified or 0");
    }
    return value;
}

function getAsStringOrPanic(string key) returns string {
    string value = config:getAsString(key);
    
    if (value.trim() == "") {
        panic error("Error", message = key + " not specified or empty");
    }
    return value;
}

function getTopic() returns string {
    string topic = getAsStringOrPanic("subscriber.topic");

    match topic {
        "https://github.com/ECLK/Results-Dist-json"|
        "https://github.com/ECLK/Results-Dist-xml"|
        "https://github.com/ECLK/Results-Dist-text" => {
            return topic;
        }
        _ => {
            panic error("Error", message = "invalid topic specified: " + topic);
        }
    }
}

They would have to specify a ballerina.conf file similar to

[subscriber]
hub="http://localhost:9090/websub/hub" # what we advertise as the hub
port=8181
path="/subscriber"
topic="https://github.com/ECLK/Results-Dist-json"
secret="qweKLS"
url="http://localhost:8181/subscriber"

Also, regarding writing to the files,

> They just run "java -jar mediaclient.jar" and then type in the secret and the files simply show up in that directory.

So a new file in the current directory (or we could make this configurable) per update (result)? Probably with a name that is a combination of the timestamp and something random?
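A Python sketch of the timestamp + random naming idea (illustrative only; the function name, the UTC timestamp format and the 8-hex-character suffix are assumptions). The random part keeps two updates that arrive within the same second from overwriting each other:

```python
import uuid
from datetime import datetime, timezone
from pathlib import Path


def result_file_path(directory=".", extension="json", now=None):
    """Path for a new result file: <UTC timestamp>_<random suffix>.<extension>."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return Path(directory) / f"{stamp}_{uuid.uuid4().hex[:8]}.{extension}"
```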

@chamil321
Collaborator

> Yeah, we should be able to.
>
> So basically, we write the WebSub subscriber service for them, build it and give them the JAR?
>
> As input we would require from them
>
> • the port
> • the path (base path) for the service
> • the topic - assuming we can go ahead with one subscriber service subscribing for one topic per JAR
> • the secret
> • callback URL - publicly accessible

Perhaps we can include 4 subscribers for the 4 predefined topics (JSON, XML, Text, Image), build them, and give them a single JAR. That way media companies only have to work with one JAR and a ballerina.conf file, rather than with multiple JARs, one per subscriber. Depending on their requirements, media teams can list the subscribers along with the topics in the conf file.

When they list the topics they need to subscribe to in the conf file, our program can start only the required subscriber services. For example, to subscribe for JSON and XML payloads, they would specify a ballerina.conf file similar to the following (we can provide a sample too):

[subscriber.json]
hub="http://localhost:9090/websub/hub" # what we advertise as the hub
port=8181
path="/subscriberJson"
topic="https://github.com/ECLK/Results-Dist-json"
secret="qweKLS"
url="http://localhost:8181/subscriberJson"

[subscriber.xml]
hub="http://localhost:9090/websub/hub" # what we advertise as the hub
port=8181
path="/subscriberXml"
topic="https://github.com/ECLK/Results-Dist-xml"
secret="qweKLW"
url="http://localhost:8181/subscriberXml"

@sanjiva
Contributor

sanjiva commented Oct 21, 2019

No I'm going even MORE simple: we give them a program that gets them the results in ALL the formats as well as the signed document image. If they want to go fancy they can take this source code and do whatever they want.

Let's keep it ultra simple.

In that case the only info they need to give is their secret, the callback URL domain name/port (to get back to this service), and an optional path to store files into (otherwise we'll use CWD).

@MaryamZi
Collaborator Author

We could also write the code to late-bind the services depending on what the user wants.

We could enable all by default, but give the option to the user to opt out of receiving content in certain formats.

The information required would still be pretty much the same.

> Let's keep it ultra simple.
>
> In that case the only info they need to give is their secret, the callback URL domain name/port (to get back to this service), and an optional path to store files into (otherwise we'll use CWD).

The only addition would be to disable receiving content of a particular type.
e.g.,

[subscriber]
text=false

Suggesting this because sending updates to all subscribers in all formats would mean at least 50 x 4 = 200 subscriptions.

This would also result in 4 files being created per result update for each subscriber?

I'm OK with either approach though.

import ballerina/config;
import ballerina/websub;

websub:Listener websubListener = new(8181);

service jsonSubscriber = 
@websub:SubscriberServiceConfig {
    path: "/json",
    subscribeOnStartUp: true,
    target: ["http://localhost:9090/websub/hub", "https://github.com/ECLK/Results-Dist-json"],
    leaseSeconds: 172800,
    callback: "<JSON_CALLBACK>"
}
service {
    resource function onNotification(websub:Notification notification) {
    }
};

service xmlSubscriber = 
@websub:SubscriberServiceConfig {
    path: "/xml",
    subscribeOnStartUp: true,
    target: ["http://localhost:9090/websub/hub", "https://github.com/ECLK/Results-Dist-xml"],
    leaseSeconds: 172800,
    callback: "<XML_CALLBACK>"
}
service {
    resource function onNotification(websub:Notification notification) {
    }
};

service textSubscriber = 
@websub:SubscriberServiceConfig {
    path: "/text",
    subscribeOnStartUp: true,
    target: ["http://localhost:9090/websub/hub", "https://github.com/ECLK/Results-Dist-text"],
    leaseSeconds: 172800,
    callback: "<TEXT_CALLBACK>"
}
service {
    resource function onNotification(websub:Notification notification) {
    }
};

service imageSubscriber = 
@websub:SubscriberServiceConfig {
    path: "/image",
    subscribeOnStartUp: true,
    target: ["http://localhost:9090/websub/hub", "https://github.com/ECLK/Results-Dist-image"],
    leaseSeconds: 172800,
    callback: "<IMAGE_CALLBACK>"
}
service {
    resource function onNotification(websub:Notification notification) {
    }
};

public function main() {
    if (config:getAsBoolean("subscriber.json", true)) {
        checkpanic websubListener.__attach(jsonSubscriber);
    }

    if (config:getAsBoolean("subscriber.xml", true)) {
        checkpanic websubListener.__attach(xmlSubscriber);
    }

    if (config:getAsBoolean("subscriber.text", true)) {
        checkpanic websubListener.__attach(textSubscriber);
    }

    if (config:getAsBoolean("subscriber.image", true)) {
        checkpanic websubListener.__attach(imageSubscriber);
    }

    checkpanic websubListener.__start();
}

@sanjiva
Contributor

sanjiva commented Oct 21, 2019

How about the following compromise:

  • we add some command line flags to select the formats: --all means all formats, --json, --text, --xml
  • require one of these options
  • remove config file

I don't like the conf file because someone will mess it up.
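A Python sketch of the proposed flag handling, just to pin down the rule: `--all` selects every format, otherwise at least one of `--json`, `--text`, `--xml` is required (illustrative only; the real client is a Ballerina program, and the function name is hypothetical). argparse exits with status 2 when no format is given:

```python
import argparse

FORMATS = ("json", "text", "xml")


def selected_formats(argv):
    """Formats to subscribe to, based on the --all/--json/--text/--xml flags."""
    parser = argparse.ArgumentParser(prog="mediaclient")
    parser.add_argument("--all", action="store_true", help="receive all formats")
    for fmt in FORMATS:
        parser.add_argument(f"--{fmt}", action="store_true", help=f"receive {fmt}")
    args = parser.parse_args(argv)
    if args.all:
        return list(FORMATS)
    chosen = [fmt for fmt in FORMATS if getattr(args, fmt)]
    if not chosen:
        # Require an explicit choice instead of silently picking a default.
        parser.error("specify --all or at least one of --json, --text, --xml")
    return chosen
```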

@MaryamZi
Collaborator Author

+1

Does this mean we wouldn't be using a conf file even for the other configs?

If so, do we expect the user to pass the values as arguments (to the main function)?

java -jar mediaclient.jar --all -secret=xxxx -port=8080

Or should we prompt the user to enter the values when they run the JAR?

e.g.,

$ java -jar mediaclient.jar
Enter secret (random string with no spaces): 
xxxx

Enter port:
8080

Enter required format (1-4):
1 - json
2 - text
3 - xml
4 - all
1

@sanjiva
Contributor

sanjiva commented Oct 21, 2019

Prompting is hard to automate, so let's use options, but with good defaults so that the required ones are minimal.

How about:

  • default to -json
  • default to port 8080

For the secret, it's of course safer to take it from stdin (command line args are not safe), but let's ignore that for now and use a required argument:

java -jar mediaclient.jar MY-SECRET

is the command to run with all defaults.
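In Python terms, the "all defaults" interface described above could look like the sketch below (illustrative only: the Ballerina main function maps its parameters to command line arguments itself, and the `--format`/`--port` option names here are assumptions, not the project's final flags):

```python
import argparse


def parse_args(argv):
    """Required secret; JSON and port 8080 unless overridden."""
    parser = argparse.ArgumentParser(prog="mediaclient")
    # Positional and required; taking it from stdin would be safer, as noted above.
    parser.add_argument("secret")
    parser.add_argument("--format", default="json", choices=["json", "xml", "text"])
    parser.add_argument("--port", type=int, default=8080)
    return parser.parse_args(argv)
```

So `parse_args(["MY-SECRET"])` yields the secret with format `json` and port `8080`, mirroring `java -jar mediaclient.jar MY-SECRET`.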

@MaryamZi
Collaborator Author

The updated subscriber source is now available at https://github.com/ECLK/Results-Dist/blob/master/src/subscriber/subscriber.bal.

The main function has the following signature:

public function main(string secret, string content = "json", string domain = "localhost", int port = 8080,
                     string? keystorePath = (), string keystorePassword = "") {
}

Few more follow up questions :)

  • Are the images/PDFs sent to all subscribers irrespective of what content type they specify? So specifying json implies a subscription for json and a subscription for image/PDF? Similarly for all we have four subscriptions (3 for json, xml, text and one for image)?

  • Where does the results-distribution-service get the image/PDF from?

  • The files are saved with a timestamp + random identifier atm (https://github.com/ECLK/Results-Dist/blob/master/src/subscriber/subscriber.bal#L172). Is this OK, or should the file name reflect something from the result (e.g., division)?

@sanjiva
Contributor

sanjiva commented Oct 23, 2019

Shouldn't the signature be:

public function main(string secret, boolean jsonData = false, boolean xmlData = false, boolean textData = false, int port = 8080) {
}

That is, by default we assume you want all 3 (json/xml/text), and you indicate that by saying nothing. If you want only one (or a specific subset), you have to say -jsonData=true (for example).

So the test in main() is to see whether all 3 are false and if so take that as user wants all. If any of them is true, then subscribe only to those.
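The rule above (all three flags false means "subscribe to everything", otherwise subscribe only to the flags set to true) is small enough to sketch directly. Python is used for illustration, and the parameter names simply mirror the proposed Ballerina signature:

```python
def formats_to_subscribe(json_data=False, xml_data=False, text_data=False):
    """All flags false means "everything"; otherwise only the formats set to true."""
    flags = {"json": json_data, "xml": xml_data, "text": text_data}
    chosen = [fmt for fmt, on in flags.items() if on]
    return chosen or list(flags)
```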

Please correct if that logic is flawed!

(Note that I removed the keystore parts per the comment I put in the gdoc.)

Ref follow up questions:

  • image/PDF is always sent as that's the official released result; the data is not normative. Yes, one image (basically it's the letter).
  • image/PDF is given to the service by the results tabulation system along with the data (which comes in json format)
  • ref file names - there's a standard code for every division .. we'll just name the file by the division code. However, there is a possibility that a result is re-released (I think; will check) - so we need to support that too.
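One way to name files by division code while still supporting a re-released result is to add a version suffix instead of overwriting the earlier file. A Python sketch of that idea (the `_2`, `_3` suffix scheme and the function name are assumptions, not decisions from this thread; `"01"` is a placeholder division code):

```python
def division_file_name(division_code, existing_names, extension="json"):
    """Name a result file by division code; a re-release gets _2, _3, ... instead of overwriting."""
    name = f"{division_code}.{extension}"
    version = 1
    while name in existing_names:
        version += 1
        name = f"{division_code}_{version}.{extension}"
    return name
```

In the client, `existing_names` could be `{p.name for p in Path(target_dir).iterdir()}`, where `target_dir` is the output directory.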

@sanjiva
Copy link
Contributor

sanjiva commented Oct 23, 2019

Um I had forgotten that I suggested defaulting to json as the data format! So how about this:

public function main(string secret, boolean jsonData = true, boolean xmlData = false, boolean textData = false, int port = 8080) {
}

And we just subscribe to all the topics they have selected?

So if I want just json, I have to do nothing. If I want XML only, I need to turn off json and turn on xml.

If that's complicated, let's just have all of them be false and force them to pick the formats. Maybe that's even easier!

@sanjiva
Contributor

sanjiva commented Oct 25, 2019

FYI I'm changing the design as follows:

  • image will not be pushed to subscribers, but will be available on the subscriber-only results website
  • there will be only one topic, which sends the data in JSON
  • the subscriber app will locally convert to text, xml and allow users to store in any format - so the ux remains the same if people use the subscriber app
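The local conversion could be as simple as walking the JSON object. A Python sketch, assuming (hypothetically) that the result payload is a JSON object whose keys are valid XML element names; the actual result schema is not defined in this thread, and the text rendering here is deliberately naive (one `key: value` line per top-level field):

```python
import json
import xml.etree.ElementTree as ET


def _append(parent, value):
    """Recursively turn dicts into child elements and lists into <item> elements."""
    if isinstance(value, dict):
        for key, child in value.items():
            _append(ET.SubElement(parent, key), child)
    elif isinstance(value, list):
        for item in value:
            _append(ET.SubElement(parent, "item"), item)
    else:
        parent.text = "" if value is None else str(value)


def convert(payload: str) -> dict:
    """Return the result in all three formats, starting from the JSON payload."""
    result = json.loads(payload)
    root = ET.Element("result")
    _append(root, result)
    text = "\n".join(f"{key}: {value}" for key, value in result.items())
    return {"json": payload, "xml": ET.tostring(root, encoding="unicode"), "text": text}
```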

@sanjiva
Contributor

sanjiva commented Oct 28, 2019

Closing this issue as we've got new issues for particular items.

@sanjiva sanjiva closed this as completed Oct 28, 2019
sanjiva added a commit that referenced this issue Nov 7, 2019
chamil321 pushed a commit that referenced this issue Jul 24, 2020

3 participants