Better documentation of module relations, data requirements, output formats #19

Open
beepsoft opened this issue Jan 21, 2021 · 8 comments

Comments

@beepsoft

emtsv is a really great tool, thanks for your work!

I'm all new to NLP, so maybe that's the reason for all my problems, but from the documentation alone it is rather difficult to work effectively with emtsv.

One main thing I miss from the documentation is what each module's input and output are:

https://github.com/dlt-rilmta/emtsv#modules

For example, if I want to use the chunk module I don't know what data it needs so that it can run.

Starting naively like this:

echo "Ez jó lenne, ha működne!" | python3 main.py chunk

... I get this error:

xtsv.pipeline.ModuleError: ERROR: 'Tagger' module requires {'form', 'xpostag'} fields but the previous module 'Input Text' has only {'Ez jó lenne, ha működne.'} fields!

That's fine, but which module will generate 'form' and 'xpostag'? After some trial and error I figured out that I need tok,morph,pos,chunk, but this is a tedious way to find out.
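The lookup described above could be automated if each module declared the fields it needs and the fields it adds. The following is only a minimal sketch of that idea: the `MODULES` table is a hand-written, hypothetical example (apart from chunk needing 'form' and 'xpostag', which comes from the error message, the field names are assumptions and are not read from emtsv), and `producer_of` and `resolve` are illustrative helper names.

```python
# Hypothetical field table for illustration only; it is NOT loaded from emtsv.
# Only chunk's requirement ('form', 'xpostag') is taken from the error above.
MODULES = {
    "tok":   {"needs": set(),               "adds": {"form"}},
    "morph": {"needs": {"form"},            "adds": {"anas"}},
    "pos":   {"needs": {"form", "anas"},    "adds": {"lemma", "xpostag"}},
    "chunk": {"needs": {"form", "xpostag"}, "adds": {"NP-BIO"}},
}


def producer_of(field):
    """Return the name of the first module that adds `field`."""
    for name, spec in MODULES.items():
        if field in spec["adds"]:
            return name
    raise KeyError(f"no module produces {field!r}")


def resolve(target, done=None):
    """List the modules needed to run `target`, dependencies first."""
    if done is None:
        done = []
    # Visit required fields in a fixed order so the result is deterministic.
    for field in sorted(MODULES[target]["needs"]):
        dep = producer_of(field)
        if dep not in done:
            resolve(dep, done)
    if target not in done:
        done.append(target)
    return done


# Build the comma-separated module list for the chunk module.
print(",".join(resolve("chunk")))
```

With this assumed table, the resolver arrives at the same chain that was found by trial and error: tok,morph,pos,chunk.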

The topology description is somewhat helpful (https://github.com/dlt-rilmta/emtsv/blob/master/docs/emtsv_modules.pdf), but it uses the "package names" instead of the module names expected by emtsv. E.g., it contains emToken, while in emtsv it needs to be referenced as tok.

It would also be great to know what each column in the result actually means and how these columns should be interpreted. This is also really difficult to find out, even after reading a lot of publications related to emtsv and e-magyar.
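To make the column question concrete, here is a minimal sketch of reading such a result by column name. It assumes the output is tab-separated, starts with one header line naming the columns, and separates sentences with blank lines. The `SAMPLE` text is made up for illustration (the tag values are placeholders, not real emtsv output), and `read_tsv_sentences` is a hypothetical helper, not part of emtsv.

```python
import io

# Made-up sample: a header line, then one token per line, and a blank line
# between sentences. The tag values are placeholders, not real emtsv tags.
SAMPLE = "form\txpostag\nEz\tTAG_A\njó\tTAG_B\n\nMűködik\tTAG_C\n"


def read_tsv_sentences(stream):
    """Yield each sentence as a list of {column name: value} dicts."""
    header = stream.readline().rstrip("\n").split("\t")
    sentence = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(dict(zip(header, line.split("\t"))))
    if sentence:
        # The last sentence may not be followed by a blank line.
        yield sentence


for sentence in read_tsv_sentences(io.StringIO(SAMPLE)):
    print([token["form"] for token in sentence])
```

Accessing values by header name rather than by position keeps such a script working when additional modules append further columns.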

So, a nice documentation structure for someone just getting started with emtsv would be something like this:

  1. Description of packages (emToken, emMorph, etc)
  2. What modules each package provides for emtsv (tok, morph, etc)
  3. Required input and output of each module + modules providing those inputs
  4. Description of the output formats (form, anas, xpostag, etc.)

Items 1 and 2 are already available; 3 and 4 are what I am missing.

@sassbalint
Member

@dlazesz Balázs, I guess the .fig format may be thrown out, as only a few :) people are eager to work with it.
Do you have any idea about a convenient format that is suitable for shared work?

@beepsoft
Author

@sassbalint thanks for picking up this issue!

> @dlazesz Balázs, I guess the .fig format may be thrown out, as only a few :) people are eager to work with it.
> Do you have any idea about a convenient format that is suitable for shared work?

You mean for replacing emtsv_modules.pdf, or what would this .fig be used for? Unfortunately, I have no idea about this.

@dlazesz
Collaborator

dlazesz commented Jan 25, 2021

For the record: the FIG is meant to be edited and then converted to PDF. Bálint (@sassbalint) used to maintain the FIG.

As both Bálint and I have left the project, I proposed that Noémi (@vadno) could do a one-time rewrite in Tikz to enable it for others to edit it more conveniently in the future as new modules emerge. I do not want to speak on her behalf.

I have no other ideas on how to make the figure easier for everybody to maintain, or on who would actually do it in the first place. All ideas, suggestions and applications for maintaining it are welcome!

@beepsoft You could send PRs on the documentation (or any part of the project) if you have any ideas on how to improve it.

@sassbalint
Member

@dlazesz Balázs, could you draw (by hand!) a figure of the current state of the system?
If yes, we could talk about it on Zoom and then I will create a new version (in .fig...).

@beepsoft .fig is edited with xfig, which is an old but very good piece of software, I think.

@vadno
Collaborator

vadno commented Jan 26, 2021

As @dlazesz mentioned, I'll draw a tikz version of the figure.
@sassbalint, xfig is great, but for me tikzpicture is a bit easier to use.
I'll try to do it ASAP... OK?

@sassbalint
Member

> As @dlazesz mentioned, I'll draw a tikz version of the figure.
> xfig is great, but for me tikzpicture is a bit easier to use.
> I'll try to do it ASAP... OK?

Thank you, @vadno Noémi. :)

While, as Balázs put it, "one-time rewrite in Tikz to enable it for others to edit it more conveniently in the future" sounds good, I guess that there is a chance that by creating the Tikz version you just take over this task for a long time, in practice. Are you OK with this? :)

@vadno
Collaborator

vadno commented Jan 26, 2021

@sassbalint No, I'm not OK with this :)
I'll try to write it as clearly as possible, hoping that others can extend it later without my help. But of course I'll help if needed ;)

@dlazesz
Collaborator

dlazesz commented Jun 16, 2021

UPDATE:
Thanks to @vadno, the new module figure design has been committed: https://github.com/nytud/emtsv/blob/master/docs/emtsv_modules.pdf

Hopefully it can better handle the growing number of modules.
We plan to restructure and maybe split the figure, as more input-output modules are planned in the near future.

I'm keeping this issue open, as the current update does not solve the original problem; it only tries to ease the situation. More documentation is on its way.
