Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the behavior for multiple modules in a file? #13977

Open
BryantLam opened this issue Sep 4, 2019 · 39 comments
Open

Change the behavior for multiple modules in a file? #13977

BryantLam opened this issue Sep 4, 2019 · 39 comments

Comments

@BryantLam
Copy link

BryantLam commented Sep 4, 2019

Impacts #13948

Forked from #13524 (comment) — description in first list, 1i; or the text below copied from end of that post:


There is one assumption in some of my counter arguments: files are modules. The overall rationale is sound because it is true in most situations, but there are some behaviors today that make it not true.

These current behaviors would have to be re-evaluated or changed. (This is for this issue.)


I copied the examples from the linked posts for further discussion.

Example 1

M.chpl:

module P {  // Whoops, M.chpl doesn't define a module named M!
 writeln("In P");
}

Example 2

From @bradcray:

I hate to say it, but it sounds like you still don't have this rule right. :) The compiler only inserts an implicit module in the case that a file contains module-level statements (e.g., any statements other than module declarations and comments) at file scope. Such statements have to belong to some module, so the compiler treats the file as their module (which also supports conveniences like being able to write an entire program as file-scope code as in scripting languages). If the file only contains module declarations and comments at file scope, no implicit module is inserted. So code like your example above:

M.chpl

module L { }
module N { }

results in a module namespace hierarchy like this:

./  # global scope
  L/
  N/

whereas if you were to write:

M.chpl

writeln("Hi");
module L { }
module N { }

then you'd get the module hierarchy:

./  # global scope
  M/
    L/
    N/

Example 3a

M.chpl:

module M { }
module Q { }

(Omitting 3b because it's the same as 1, and 3c because it's the same as 3a.)


I believe the current behaviors for these examples should change. How these examples should change is an open question. With the addition of #13524, it will get more confusing when the compiler starts enforcing a module hierarchy; however, independent of that issue, the current behavior is already confusing to me.

Notably, Example 2 stands out to me because the module hierarchy is allowed to subtly and significantly change with inclusion of module-level statements anywhere in that file. I personally don't like behaviors like that because my hierarchy changes if I add/remove what feels like code that is independent of module declarations. Also, the explanation provided in Example 2 is just non-intuitive if the scope increases beyond this toy example into an application that has multiple dependencies.

Because such code in Example 2 already creates implicit modules, a satisfactory resolution would be to always create implicit modules (i.e., files-as-modules), which immediately answers what Example 1 and Example 3a should do.

It comes back around to whether that behavior could be seen as intuitive or not for Examples 1 and 3a. It would be reasonable for these to be errors if we take my suggestion (from #13524 (comment)) of simply enforcing the one, true module declaration in a file.

The one, true module declaration isn't ideal because:

  1. it's requiring redundancy in specifying the module declaration and the filename which are the same
    • it's also minorly annoying that renamed files also have to be sed'd to fix their module identifier or the compiler won't allow it (if Example 1 is turned into a compile-time error)
  2. What do we do about simple script-like codes? Do we allow the implicit module for these? Why make it a special case at all if the rationale isn't strong enough; the implicit modules could be the default (and the special case would be the one, true module declaration). There are a ton of codes like this in the Chapel test suite.
  3. minor: it increase the level of indentation
    • can be resolved with a style enforcer or style guideline that says the top-level module is special and to not indent inside it similar to C++ top-level namespaces.

But there is one benefit: it makes it obvious to the reader that there is a module declaration that is this file without knowing the language. It does not benefit users that know this rule, but it would be a helpful guide for users unfamiliar with Chapel.

Of course, the con is that it's another special case, and I dislike special cases. I would prefer files-are-implicit-modules as the solution, possibly with some helpful style warnings for Example 3a because that's the case that is most likely to be unintended among the three examples. (3a would be an implicit module M that defines submodules named M and Q.)

@BryantLam
Copy link
Author

There's also a side concern with error handling and how implicit-modules-are-prototype-modules today.

@BryantLam
Copy link
Author

Also, what to do about privacy specifiers related to #13524 for the files-are-implicit-modules approach..

@mppf
Copy link
Member

mppf commented Sep 4, 2019

Note that one common situation with Example 1 (at least in the test system) is with e.g. some-test.chpl. Here some-test is not a legal identifier name in Chapel, so even if the user wanted to explicitly write the module name, they'd have to either rename the file or create a module with a different name. I find myself creating an inner module with a different name in some cases for reasons to do with the difference in behavior between implicit/prototype modules and other modules.

One way to handle it would be to allow module { }, where the omitted module name would be inferred from the file name.

I'm not sure what the right direction here is.

@BryantLam
Copy link
Author

BryantLam commented Sep 4, 2019

Another way would be to replace hyphens with underscores: some-test.chpl -> module some_test. It would create a conflict with another some_test.chpl, but the compiler can check for that case. The former is more common than the conflicting case.

It also fits more in-line with common GitHub project names since people tend to prefer hyphens.

@lydia-duncan
Copy link
Member

Notably, Example 2 stands out to me because the module hierarchy is allowed to subtly and significantly change with inclusion of module-level statements anywhere in that file. I personally don't like behaviors like that because my hierarchy changes if I add/remove what feels like code that is independent of module declarations. Also, the explanation provided in Example 2 is just non-intuitive if the scope increases beyond this toy example into an application that has multiple dependencies.

I think it's fair to find the behavior change confusing when a single module scope statement is added. I don't know that it means the current behavior is wrong, though. There's always going to be a jump - would it be better to have:

M.chpl

module M {
  var x: int;
  ...
}

mean you have to use M.M; in order to get at x when there is literally nothing else in the file or to have:

M.chpl

module M { ... }
module N { ... }

suddenly cause an outer module M to appear? Or for

M.chpl

module P {
  var x: int;
}

to mean there is no module M at all, just a module P?

And sure, having the declared module at the top level of the file differ from the file name means you have to look in the file to determine what to use, but that seems better than having the behavior of this situation differ depending on if you're on a filesystem that is case insensitive:

M.chpl

// lower case module name, upper case filename
// Requires `use M.m;`?  Or `use m.m;`?  Or sometimes `use M.m;` and sometimes `use m;`?
// Or just always `use m;`?
module m {
  var x: int;
}

@lydia-duncan
Copy link
Member

Michael rightly pointed out that the first single module case I added was a bit confusing. To add some additional context - I'm following a bit of a slippery slope argument to show that there isn't a single answer that will make things consistent. If the presence of two sibling modules at the top level necessitates the inclusion of a file-level wrapper module, why shouldn't the same be true of a single module? I don't think this is something we want for the reasons I've stated, but it's a case where the behavior changes and the motivation for it to do so seems reasonable.

Is this a case of "1 module -> 2 modules, and therefore N modules -> N + 1 modules"? Not necessarily, but the rule is more strongly connected than "1 module -> 1 module and a different statement entirely"

@mppf
Copy link
Member

mppf commented Sep 19, 2019

I think we should just make the tricky cases errors:

M.chpl storing module m { } ? Error
M.chpl storing module P { } ? Error
M.chpl storing module M { } writeln("Hi"); Error
M.chpl storing module M { } module N { } Error

The only way for it not to be an error is if:

  • M.chpl contains no top-level module declarations, or
  • M.chpl contains one outer top-level module declaration, the name matches, and there is nothing outside of it (other than comments).

@bradcray
Copy link
Member

I'm not at all on-board with the direction the proposals in this issue are taking. I don't think that the status quo is as bad or broken as suggested here, and I think there are things we could do (some simple, some only slightly less so) to make it better. Jumping directly to Michael's proposal just above, I find it problematic for a few reasons (many of which apply to the more general thread of conversation taking place here):

  • For tools like TIO in which the user has no control over the filename, it would only support top-level code (or a single explicit module whose name matches the tool's filename). If you ran into a bug that involved two modules (or simply wanted to experiment with modules), you would not be able to illustrate that bug to the team in a TIO session.
  • It complicates the ability to copy a source file, edit it, and compile it. This is a very common pattern in my workflow where I take a file someone's written, rename it foo-blc.chpl or snapshot it as foo-works.chpl, make simple edits, and recompile. It feels onerous to say that I'd have to also rename the module in that file to match the filename.
  • It breaks the ability to use filenames like foo-1.0.chpl, foo-1.1.chpl, foo-2.0.chpl to reference multiple versions of a file.
  • Philosophically, I don't believe that languages should be so rigid with respect to how they consume code or expect it to be organized on the file system. In my opinion, it should be OK for you to enter your code on a telegraph that's connected to a compiler without any assumptions that you have stored the code in a file with a specific name on a file system that meets certain requirements.

I would be open to adding an opt-in flag to the compiler that would warn or error when it encountered code that didn't follow a convention like this to ease the ability of a user or community who liked this rule to enforce it in their code bases. But I don't think it should be the default behavior of the compiler, nor how the language is defined.

@mppf
Copy link
Member

mppf commented Oct 4, 2019

@bradcray - I'm somewhat convinced that just making all of the challenging cases errors here isn't the right move. I copied a Chapel program to file-1.20.chpl recently... (but I think such things occur more with implicit file-level modules than with libraries proper).

But I think there are other proposals in line with this issue. To take one example, @BryantLam was proposing above that

// M.chpl
module L { }
module N { }

creates M.L and M.N. This would match better what some of us expected (e.g. me, vass ).

Right now, the compiler adds an implicit module if there are any top-level statements other than module declarations and comments.

The change would be to make it add an implicit module if there are any top-level statements other than comments and a single module declaration.

How do you feel about that change?

(I am intentionally leaving out the issue of what happens if the top level module name does not match; let's discuss the more first adjustment first).

@BryantLam
Copy link
Author

BryantLam commented Oct 4, 2019

@bradcray -- I don't claim to have the right answers for any of the cases that you're concerned about. I only have an answer that would work. I'm concerned with how to interpret Example 1 for design #13524 -- How does the compiler know to look in M.chpl for module P without parsing all files?

@bradcray
Copy link
Member

bradcray commented Oct 4, 2019

The change would be to make it add an implicit module if there are any top-level statements other than comments and a single module declaration.

How do you feel about that change?

Negatively. I think having the compiler add an implicit module declaration should be the exceptional case that is supported only for convenience when the user doesn't bother adding one, not the rule.

Top-level modules vs. sub-modules can result in distinct behaviors in certain cases. For example, if the two modules in your example were in different files and each declared a config ... debug I would use L.debug vs. N.debug to disambiguate them on the command-line. However, if they were in the same file, under your proposal, I would use M.L.debug vs. M.N.debug to disambiguate them. Similarly if the main module were ambiguous, the user would have to use --main-module L vs. --main-module M.L to disambiguate them based on the code organization. Or, if we adopted the proposal under consideration to require use statements to always give a full path from a top-level module, the uses for these two cases would need to change from use L, N; to use M.L, M.N;. I don't think source code should generally be this sensitive to how it's organized in files.

I don't particularly like this proposal (mostly because of the work it would involve), but just to throw it out there as part of the exploration: I'd much rather say that the implicit modules introduced by the compiler have no name that the user can refer to if it seems too confusing that sometimes M.chpl introduces a module named M and sometimes it doesn't. That would prevent anyone from ever relying on it having a certain name (and would also handle cases that we currently can't name such as filenames that don't have legal identifiers).

without parsing all files?

I'm unclear what the concern about parsing files is:

(a) it doesn't have to be a full parse, it could just run the scanner looking for module statements outside of comments and strings. This would be quite fast (not that parsing is currently particularly slow, especially when you squash the building of the AST);

(b) if that amount of time still gave people concerns, the result of the parse could be dropped into an index file in the directory that would only be incrementally updated when its timestamp lagged that of the source. For reasonably static library directories this would only require a scan when it was first installed (or maybe the installation could even carry the index files with it).

@BryantLam
Copy link
Author

BryantLam commented Oct 5, 2019

I'd much rather say that the implicit modules introduced by the compiler have no name that the user can refer to if it seems too confusing

I'm intrigued by this idea. How would you / could you call a function that was defined at module-level? Are the implicit modules still considered modules, or are their submodules inlined so that parent module can still access that submodule? I.e.,

// M.chpl
module M {
  import L;
  L.fn(); // [[1]]
}

// Submodule.chpl
module L { // Is L "inlined" into the anonymous implicit module?
  config var someOtherConfigVar;
  proc fn() { ... } // ... so that L.<symbol> is called at [[1]]?
}
config var someConfigVar; // Is this settable?
proc someFunc() { ... } // Is this callable?
writeln(...);

I'm unclear what the concern about parsing files is:

"Parse" was an incorrect choice of word. Python has bad filesystem behavior at scale because it searches everywhere for code on an import statement. I'd like good filesystem scaling for chpl to find code ideally in one or a few (MASON_REGISTRY) a priori locations if it needs to look for a (sub)module.

(b) Regarding the index file,

  • how does the compiler find module P in M.chpl? Scan through all files in a submodule directory?
    • how does the user find module P in M.chpl? Scan through all files in a submodule directory?
  • how would this work if a submodule directory had three versions of module M represented as M.chpl, M-1.0.chpl and M-fixes.chpl?

(I'm not saying the index file is a bad idea since a variant may be needed for incremental compilation.)

@mppf
Copy link
Member

mppf commented Oct 5, 2019

@bradcray - Another issue here is that I find the change in whether or not the implicit module is added, between

// M.chpl
module L { }
module N { }

and

// M.chpl
module L { }
module N { }
writeln("Hi");

confusing. What if we made the implicit module only appear if the file did not contain any (top-level) module statements at all? Then the first case still behaves as you are hoping, and the 2nd case would just become an error (which it probably is anyway).

I'd much rather say that the implicit modules introduced by the compiler have no name that the user can refer to if it seems too confusing that sometimes M.chpl introduces a module named M and sometimes it doesn't.

I'm not saying I'm for or against that but one of the sources of confusion is the difference in behavior between top-level modules and sub-modules (e.g. the example here that might still be confusing even if you couldn't refer to the implicit top-level module.

@bradcray
Copy link
Member

I'd much rather say that the implicit modules introduced by the compiler have no name that the user can refer to if it seems too confusing

I'm intrigued by this idea. How would you / could you call a function that was defined at module-level?

In my mental model, I don't think you would be able to call such a function from outside of that module. Essentially, the module's contents wouldn't be accessible outside of the module because the module could not be named.

Are the implicit modules still considered modules, or are their submodules inlined so that parent module can still access that submodule?

I'd still consider them modules, simply ones whose names can't be spoken.

Python has bad filesystem behavior at scale because it searches everywhere for code on an import statement. I'd like good filesystem scaling for chpl to find code ideally in one or a few (MASON_REGISTRY) a priori locations if it needs to look for a (sub)module.

I wonder if the situation might be worse for Python than for Chapel since, as an interpreted language, this searching goes on during execution time rather than at compile-time?

how does the compiler find module P in M.chpl? Scan through all files in a submodule directory?

I'm currently considering this issue independently of any submodule directory feature, so would say that it would consider all of the .chpl files along the module path.

how does the user find module P in M.chpl? Scan through all files in a submodule directory?

If a user didn't know what module use P referred to, I'd expect them to be able to ask the compiler to tell them where it found it (I'm imagining a variation on the --print-module-files flag that might describe both module name and file location).

how would this work if a submodule directory had three versions of module M represented as M.chpl, M-1.0.chpl and M-fixes.chpl?

I'd expect the compiler to complain about the ambiguity and to require the user to disambiguate either by specifying one of the files directly on the command-line, or by removing the others from the search path.

Again, though, I'm not a big fan of this proposal, just throwing it out as food for thought.

@bradcray
Copy link
Member

bradcray commented Oct 12, 2019

What if we made the implicit module only appear if the file did not contain any (top-level) module statements at all? Then the first case still behaves as you are hoping, and the 2nd case would just become an error (which it probably is anyway).

My preference would be to extend the current warning we have so that it fires whenever there is a mix of module declaration statements and any other statements at the top-level, to make the user aware of the introduction of the implicit module (and possible mistake on their part). To me that feels like the behavior that is the most consistent, the easiest to implement, and the smallest change from the status quo.

(e.g., could we generalize and reword the existing warning in such a way that someone who typed your example would not be confused? For example, "All Chapel code must be contained within a module, so an implicit module named M is being introduced to store the contents of file M.chpl due to the file-scope code found on line 3).

@BryantLam
Copy link
Author

BryantLam commented Oct 12, 2019

However, if they were in the same file, under your proposal, I would use M.L.debug vs. M.N.debug to disambiguate them. Similarly if the main module were ambiguous, the user would have to use --main-module L vs. --main-module M.L to disambiguate them based on the code organization. Or, if we adopted the proposal under consideration to require use statements to always give a full path from a top-level module, the uses for these two cases would need to change from use L, N; to use M.L, M.N;. I don't think source code should generally be this sensitive to how it's organized in files.

I don't see why that's a bad thing. If modules are the same as files, then the module name is apparent in the filename. It makes it easier for the user to find the source of a line of code if they know the filename it comes from.

I wonder if the situation might be worse for Python than for Chapel since, as an interpreted language, this searching goes on during execution time rather than at compile-time?

This is true, though not to be understated even if chpl had to look for modules by walking e.g., Lustre or GPFS at compile-time. Good filesystem behavior is still important to minimize how often a distributed filesystem is queried during the walk.

how does the compiler find module P in M.chpl? Scan through all files in a submodule directory?

I'm currently considering this issue independently of any submodule directory feature, so would say that it would consider all of the .chpl files along the module path.

I don't want chpl to parse every .chpl file in the module path to find a module that might not exist. That's bad filesystem behavior and is worse than how Python behaves.

Fortunately, chpl doesn't behave like this today. The compiler's behavior is closer to what I would expect when it looks for a module that doesn't exist. I ran strace on the compiler for an input with use FileNotFound; and it only looks for FileNotFound.chpl for each component of the module path (Ctrl+F for it). The same notion can be easily extended to #13524 for submodules, but this issue attempts to tackle the disconnect of submodules and modules being treated differently when they're in files.

how would this work if a submodule directory had three versions of module M represented as M.chpl, M-1.0.chpl and M-fixes.chpl?

I'd expect the compiler to complain about the ambiguity and to require the user to disambiguate either by specifying one of the files directly on the command-line, or by removing the others from the search path.

Isn't forcing the user to remove the conflicting files from the search path considered bad behavior? I certainly wouldn't care for it if I had to tinker on a module variant. Having a user create a symlink seems much more tenable for this corner case.

My preference would be to extend the current warning we have so that it fires whenever there is a mix of module declaration statements and any other statements at the top-level, to make the user aware of the introduction of the implicit module (and possible mistake on their part).

I don't personally like this approach, but it'd be fine with me. Wouldn't this also affect TIO code though? (TIO would always emit a warning.)

I assert that the most reasonable approach is to create an implicit file-level module with an identifier equal to the filename, unless the file has one and only one module declaration of the same name. This covers the TIO case and is unambiguous how the compiler can find it.

@bradcray
Copy link
Member

However,

If modules are the same as files, then the module name is apparent in the filename.

Note that the proposal I was replying to in that comment was not your original proposal (files are always modules), but Michael's (files are only modules when there are top-level statements other than comments and a single module declaration).

I don't want chpl to parse every .chpl file in the module path to find a module that might not exist. That's bad filesystem behavior and is worse than how Python behaves.

I hear you, but just to make sure it's not lost: it wouldn't parse every file every time, just when the file had been modified since the last run. And it seems like it could also remove many files from needing to be parsed by grepping for the module name in the file. I'm also assuming in this proposal that module paths are typically short (specifically, I almost never use them, and am assuming that mason packages won't need to either).

Wouldn't this also affect TIO code though? (TIO would always emit a warning.)

It would if the TIO code contained both one or more explicit module declarations and also top-level code outside of those module declarations. But I don't see that as problematic, just consistent with the compiler's behavior.

I assert that the most reasonable approach is to create an implicit file-level module with an identifier equal to the filename, unless the file has one and only one module declaration of the same name.

Sorry, I'm just not a fan.

@BryantLam
Copy link
Author

I hear you, but just to make sure it's not lost: it wouldn't parse every file every time, just when the file had been modified since the last run. And it seems like it could also remove many files from needing to be parsed by grepping for the module name in the file. I'm also assuming in this proposal that module paths are typically short (specifically, I almost never use them, and am assuming that mason packages won't need to either).

Sorry; again, "parse" was the wrong word. I'm looking at filesystem behavior. From the truncated strace output of chpl, there are 86 unique modules that are openat'd by name, presumably because some internal modules are useing them.

Running a find on $CHPL_HOME/modules/, shows that there are 147 modules. Are you proposing the compiler's behavior will change to some variant of openat -> stat -> grep for all 147 modules? stat doesn't happen today. grep would presumably only happen if the stat timestamp changed, so I think you're suggesting that we stat and timestamp-compare every file, every time and then grep afterwards. The openat and stat calls will dominate. What about a Mason registry with 10,000 packages in it?

The behavior you're suggesting is O(n) with the number of modules instead of the compiler's current searching method of O(1).

@bradcray
Copy link
Member

bradcray commented Oct 14, 2019

Are you proposing the compiler's behavior will change to some variant of openat -> stat -> grep for all 147 modules?

For the modules directories that are packaged with Chapel, I'm imagining that we would either (a) pre-populate the directories with index information saying what is contained in each file or (more likely) (b) require that filenames and module names match.

What about a Mason registry with 10,000 packages in it?

For mason packages, I'm imagining a world similar to the above: that either the package would itself contain any directory information required to find where things live or that mason packages would require filenames and module names to match. E.g., that the creation and establishment of the mason package would require one of these things to happen in order to be accepted as a legal package. In the former model, I could imagine that the directory information could be rolled up into a top-level cross-package mason index such that only one file would need to be checked for all mason packages in a hierarchy rather than one per package. Again, though, I'm not expert in existing package managers, so if those ideas don't make sense for some reason, let me know.

(I almost posted something to this effect last night: that it's hard for me to link this issue back to the mason world very strongly because it seems to me that we can adopt much more strict or controlled requirements for a mason package without pushing that requirement onto every piece of code a user might write or sketch out quickly. It's also why I said that I tend to assume that search paths are fairly short—where I'm thinking of the user's manually search path, not directories that might be used for mason packages).

@BryantLam
Copy link
Author

(more likely) (b) require that filenames and module names match.

I obviously prefer (b) approach over option (a). From a user-created project perspective, in 99% of the cases where people split a module into a separate file, they will name it the same. This issue discusses how to resolve the 1% of cases where it isn't true. I don't think Chapel should take option (a) because it's designing around a corner case.

For (a),

  • The directory index has to be generated at some time. When?
  • What if the original developer forgets to generate it? Does the user's invocation of chpl now have to scan through that directory to create the index? Where is it stored? In /tmp? Or is it persistent?
  • The directory index can now go stale. How do you make sure it's still recent? Don't you now have to stat all the files to make sure the directory index doesn't need to be regenerated, especially if say, some sysadmin made a fix to one of the system modules without updating that directory information?

Suffice to say, I'm skeptical of any approach that isn't a one-to-one correspondence with files as a module. Is there a language that was invented after 2000 not do option (b)? — Python, Rust, and Swift all use option (b) to find a module. It isn't a coincidence that all three of those languages also rely heavily on their package managers for success.

@mppf
Copy link
Member

mppf commented Oct 14, 2019

@BryantLam - I feel that your last post didn't really answer @bradcray's main thought above, which I would rephrase as this question: "Is it sufficient for mason packages to enforce that top-level modules have file names that match even if the language itself does not mandate it?"

@BryantLam
Copy link
Author

BryantLam commented Oct 14, 2019

"Is it sufficient for mason packages to enforce that top-level modules have file names that match even if the language itself does not mandate it?"

My response is "no" strictly on the notion of not having the compiler become more complicated with how it searches and finds modules. There's also more complexity to a related concern with how Chapel today doesn't have a notion of packages (modules between packages can have conflicting identifiers). #12923

@BryantLam
Copy link
Author

BryantLam commented Oct 14, 2019

After re-reading @bradcray's #13977 (comment), here's another interpretation:

  • For tools like TIO in which the user has no control over the filename, it would only support top-level code (or a single explicit module whose name matches the tool's filename). If you ran into a bug that involved two modules (or simply wanted to experiment with modules), you would not be able to illustrate that bug to the team in a TIO session.

The merits of a mythical bug are impossible to disprove. With that said, if the bug is reproducible with two submodules, that would be sufficient. If the bug only manifests with two top-level modules, it is possible that it would not be reproducible in TIO anyway because of externals like CHPL_MODULE_PATH.

With #13524, many bugs that would appear with two top-level module files would also appear with two submodule files.

Using TIO to experiment with submodules is still possible. Ideally, there isn't a difference with experimenting between top-level modules and submodules in TIO.

  • It complicates the ability to copy a source file, edit it, and compile it. This is a very common pattern in my workflow where I take a file someone's written, rename it foo-blc.chpl or snapshot it as foo-works.chpl, make simple edits, and recompile. It feels onerous to say that I'd have to also rename the module in that file to match the filename.

I assume when you say "recompile", you mean something like chpl App.chpl foo-works.chpl, which requires specifying the other module file at compile time. This behavior could conceivably not change and you can still specify foo-works.chpl to override the foo.chpl in the directory (probably with a flag like --module-alias=foo=foo-works.chpl,... or simply foo=foo-works.chpl).

An alternative is to use a symlink to foo.chpl, which has a Bonus advantage:

  • It breaks the ability to use filenames like foo-1.0.chpl, foo-1.1.chpl, foo-2.0.chpl to reference multiple versions of a file.

See previous about using a symlink to pick the canonical version that you are building your source with.

Bonus: if you ever send that source to other users or if someone is working with you in the same working directory (pair programming), your colleague doesn't have to guess which foo-*.chpl you used to build.

  • Philosophically, I don't believe that languages should be so rigid with respect to how they consume code or expect it to be organized on the file system. In my opinion, it should be OK for you to enter your code on a telegraph that's connected to a compiler without any assumptions that you have stored the code in a file with a specific name on a file system that meets certain requirements.

This is already not true today with module-level code creating an implicit module.

@bradcray
Copy link
Member

Sorry for the hiccup in the conversation.

For (a),

The directory index has to be generated at some time. When?

For mason packages, I was imagining that this index would be generated when the package was installed; or alternatively (but less ideally, I think), when it was published.

Outside of mason packages, I'd imagine it to be generated by the compiler only when necessary (it can't find a given module; the index file for a given directory is out of date w.r.t. the source files).

Ideally, there isn't a difference with experimenting between top-level modules and submodules in TIO.

At least some of the module / namespace proposals under consideration distinguish between top-level modules and sub-modules. For example, one proposal is that users have to always specify a full module path from top-level module to sub-module. This would make a program with top-level modules A and B behave differently than one in which they were sub-modules Filename.A and Filename.B (specifically, use A and use B would have to be changed to become use Filename.A and use Filename.B in the source code). If we ended up in a world where the behavior of a module was guaranteed to be unchanged regardless of how far down into a module hierarchy it was pushed, then I agree that this may become a non-issue.

This is already not true today with module-level code creating an implicit module.

It's true that this is an exception to my philosophy, although it's generally been a popular one due to its convenience and has been true since Chapel's origins. This exception is partially what led me to propose not having the file-scope module have a well-defined name (though I still find it more useful for it to do so to preserve the current behavior, and it also has the advantage of not requiring all existing tests that rely on it to be rewritten).

Another possibility that I keep waiting to see if someone will propose is that when looking for a module named M, Chapel looks for files named M.chpl, and upon not finding any, will consider files named M[suffix].chpl where "suffix" would start with a character that is not legal in an identifier.

@bradcray
Copy link
Member

bradcray commented Oct 25, 2019

Catching a breath, here are the major approaches that I think have been proposed or discussed on this issue. If you think I've missed one, or want to propose another, or have another favorite example in mind, let me know and I'll add it:

  • "Every filename a module": Inserts a module based on the filename regardless of the file's contents
  • "Every filename a module when mismatched": Inserts a module based on the filename if and only if the file doesn't contain a single module definition whose name matches the filename
  • "Current behavior": If the file contains file-scope code, creates an implicit module whose name is based on the filename (proposed improvement: and issues a warning if the file also contains module declarations)
  • "Implicit modules unnamed": If the file contains file-scope code, an implicit module is created whose name is anonymous / can't be referenced in the source code (indicated by _ here).
  • "Non-ident filename characters ignored": A variant on the previous two in which any non-identifier characters in the filename are ignored when coming up with the implicit module name.

By example:

F.chpl

proc foo() { ... }

Every filename a module: Creates F
Every filename a module when mismatched: Creates F
Current behavior: Creates F
Implicit modules unnamed: Creates _
Non-ident filename characters ignored: Creates F

F.chpl

module F {
  proc foo() { ... }
}

Every filename a module: Creates F.F
Every filename a module when mismatched: Creates F
Current behavior: Creates F
Implicit modules unnamed: Creates F
Non-ident filename characters ignored: Creates F

F.chpl

module M {
  proc foo() { ... }
}

Every filename a module: Creates F.M
Every filename a module when mismatched: Creates F.M
Current behavior: Creates M
Implicit modules unnamed: Creates M
Non-ident filename characters ignored: Creates M

F-1-1.chpl

proc foo() { ... }

Every filename a module: Creates F-1-1
Every filename a module when mismatched: Creates F-1-1
Current behavior: Creates F-1-1
Implicit modules unnamed: Creates _
Non-ident filename characters ignored: Creates F

F.chpl

module M { ... }
module N { ... }

Every filename a module: Creates F.M and F.N
Every filename a module when mismatched: Creates F.M and F.N
Current behavior: Creates M and N
Implicit modules unnamed: Creates M and N
Non-ident filename characters ignored: Creates M and N

F.chpl

module F { ... }
module H { ... }

Every filename a module: Creates F.F and F.H
Every filename a module when mismatched: Creates F.F and F.H
Current behavior: Creates F and H
Implicit modules unnamed: Creates F and H
Non-ident filename characters ignored: Creates F and H

F.chpl

proc foo() { ... }
module M { ... }

Every filename a module: Creates F and F.M
Every filename a module when mismatched: Creates F and F.M
Current behavior: Creates F and F.M (and would generate a warning in the proposed extension)
Implicit modules unnamed: Creates _ and _.M
Non-ident filename characters ignored: Creates F and F.M

F-1-1.chpl

proc foo() { ... }
module M { ... }

Every filename a module: Creates F-1-1 and F-1-1.M
Every filename a module when mismatched: Creates F-1-1 and F-1-1.M
Current behavior: Creates F-1-1 and F-1-1.M (and would generate a warning in the proposed extension)
Implicit modules unnamed: Creates _ and _.M
Non-ident filename characters ignored: Creates F and F.M

@bradcray
Copy link
Member

Since my taxonomy hasn't resulted in any comments yet, I thought I'd give my opinion about the approaches:

  • Every filename a module: This is the purest of all of the proposals, but a frustrating one to me because I think having module F within file F.chpl create F.F feels counter-productive

  • Every filename a module when mismatched: I don't like this approach because it feels more special-casey than the alternatives and relies too much on the filename to determine its behavior. As mentioned in the discussion above, I also don't like that the behavior of putting two explicit modules within a file differs from the behavior of having those explicit modules in two separate files as that ties the language semantics too closely to how it's organized into files for me.

  • Current behavior: I find this interpretation attractive because if we take as a given that all Chapel code must be contained within some module, it only introduces implicit modules in the cases where there is code outside of any explicitly defined module (i.e., those where it's necessary). It's also attractive in that it's already implemented. And I think adding the proposed warning would completely reduce any lingering confusion around some of the mixed file-scope and explicit module cases without generating too many warnings for typical code (I think most files only either use file-scope code or explicit modules and rarely mix the two intentionally unless they're tests written to exercise that behavior).

  • Implicit modules unnamed: As mentioned previously, I offered this as a bargaining position, but don't actually like it. I don't see a big value in having M.chpl not define a module named M in the event that it contains file-scope code. And I don't look forward to the prospect of updating existing tests to reflect this rule change. So it feels like added work without significant benefit, just for the sake of being principled by some definition ("You couldn't be bothered to name this, so we won't either, nyah.")

  • Non-ident filename characters ignored: This seems like a potentially attractive corollary to the current behavior, but not strictly necessary to me. Notably, I've had it in mind for years, but never actually found myself wishing it was already implemented.

It's probably obvious by now, but w.r.t. the OP, I don't think the current behavior is particularly broken nor as complicated to understand as the issue suggests, where the guiding philosophy is "all Chapel code must be in some module, so the compiler will introduce a module for you in cases that you needed one but didn't do so." If the case of mixing file-scope code with explicit modules is considered to be too confusing (or a sign of a likely mistake on the user's part?), I think the proposed warning will help keep people on track while also serving the purpose of reinforcing the guiding philosophy. Michael has taken a stab at implementing such a warning in #14292.

The other topic that came up in this issue which I think warrants more attention was the question of "How does the compiler find a module that it doesn't already know about?" and I think that's an interesting and important question, particularly given Bryant's assertion that we should never rely on O(size of all files in path) approaches (which I was originally proposing, but am now being more mindful of). I think we've kicked around some useful seedling ideas for that question here, and think we should fork off a separate thread to summarize and continue that part of the discussion. But in the meantime, I'd like to resolve the OP here if possible.

@BryantLam
Copy link
Author

BryantLam commented Oct 31, 2019

F-1-1.chpl

What to do about this filename is arguably for another issue. One interpretation is to transform the minus (and only the minus) character into an underscore. This is the approach that Rust takes because (1) GitHub projects like using minuses a lot and (2) if you need to get to a submodule inside this file, there's no way to do it with this commonly used illegal identifier, so transform it into a legal identifier.

Other illegal identifiers are still illegal and are not transformed.

assertion that we should never rely on O(size of all files in path) approaches

The current behavior with multiple modules in a file is arguably okay despite some confusion it causes even to core Chapel developers, but it isn't okay when talking about the premise that this issue is derived from: submodules in different files #13524.

Given that the two (or more) pieces are linked, I'm personally not comfortable with any resolution until there's a cohesive full-picture story for how the current behavior can work with that assertion and the assertion that an index file is unacceptabie. I really dislike cache and index files when I know they aren't simply necessary. Edit: An index file is equivalent to a Chapel-specific implementation of a filesystem. The index is the metadata that holds the "inodes" (files) to where code lives.

Fundamentally, trying to separate a language from the real-world environment of filesystems is simply not attractive to me. It's an unnecessary level of mental indirection for users and compilers to find where code lives and it divorces a programming language from its effective use (a packaging+distribution system), further hampering a user's on-ramp.

@mppf
Copy link
Member

mppf commented Nov 8, 2019

Fundamentally, trying to separate a language from the real-world environment of filesystems is simply not attractive to me.

I think there are some philosophical discussions here that might be worth discussing on their own. I'm not sure what the most effective way to do that is, but I feel that we keep running into Philosophy A vs Philosophy B in these discussions.

assertion that we should never rely on O(size of all files in path) approaches

The current behavior with multiple modules in a file is arguably okay despite some confusion it causes even to core Chapel developers, but it isn't okay when talking about the premise that this issue is derived from: submodules in different files #13524.

Given that the two (or more) pieces are linked, I'm personally not comfortable with any resolution until there's a cohesive full-picture story for how the current behavior can work with that assertion and the assertion that an index file is unacceptabie.

I disagree. I think that we can make submodules in different files #13524 work reasonably well with something along the lines of the current behavior. I would simply expect that submodules-in-different-files would have some restrictions about naming. (E.g. for the F.chpl containing module M, it would simply not function as a submodule-in-a-different-file - just as one cannot use F today and get that file (without also naming F.chpl on the command line) ).

assertion that we should never rely on O(size of all files in path) approaches

Just to be clear, the compiler doesn't use such an approach today, right? (It might search all path components for Something.chpl, but it doesn't open all files in all path directories).

@BryantLam
Copy link
Author

BryantLam commented Nov 12, 2019

I disagree. I think that we can make submodules in different files #13524 work reasonably well with something along the lines of the current behavior. I would simply expect that submodules-in-different-files would have some restrictions about naming. (E.g. for the F.chpl containing module M, it would simply not function as a submodule-in-a-different-file - just as one cannot use F today and get that file (without also naming F.chpl on the command line) ).

I'm willing to be flexible if there's a design that can prove my assertion wrong. That said, the current behavior isn't acceptable to me. What do you think these two programs do:

// In file: F.chpl
module F {}
module M {
  compilerError("M in F.chpl");
}

// In file: M.chpl
module M {
  compilerError("M in M.chpl");
}

// In file: main1.chpl
// compiled with: chpl main1.chpl
use F;
use M;

// In file: main2.chpl
// compiled with: chpl main2.chpl
use M;
use F;
Answers

Answer for main1.chpl:

./F.chpl:3: error: M in F.chpl
# Note: Hijacked. M.chpl is never used.

Answer for main2.chpl

./M.chpl:1: error: 'M' has multiple definitions, redefined at:
  ./F.chpl:2

I'd like to think I know enough about the Chapel compiler looks for modules, but I couldn't intuit what was the behavior of these programs without trying it.

Another example. Same as above, except M.chpl now only defines `P`.
// In file: F.chpl
module F {}
module M {
  compilerError("M in F.chpl");
}

// In file: M.chpl
module P {
  compilerError("P in M.chpl");
}

// In file: main1.chpl
// compiled with: chpl main1.chpl
use F;
use M;

// In file: main2.chpl
// compiled with: chpl main2.chpl
use M;
use F;

Answer for main1.chpl and main2.chpl are the same:

./F.chpl:3: error: M in F.chpl

This is not a surprising result, but it is surprising to a developer. Who would have thought that M.chpl wouldn't have module M and they now need to grep for it in the entire source tree. Maybe this flexibility is warranted for some use case or could be considered bad practice. If Chapel isn't going to make this bad practice illegal, then there better be a really good design to motivate allowing it.

@mppf
Copy link
Member

mppf commented Nov 12, 2019

@BryantLam - Couldn't the above examples be made to function in an obvious manner with relatively modest adjustments?

In particular, if you have a use M, the compiler looks in the module search path for M.chpl and then loads up the M.chpl it finds. It could:

  • error if M.chpl does not contain a module M (an implicit file-level module would be M, so implicit file modules are still OK)
  • error if M.chpl contains a module M and some other top-level module also

I think that these would be consistent with restrictions that exist now and also with the proposed restriction for submodules-in-different-files that I mentioned earlier:

I would simply expect that submodules-in-different-files would have some restrictions about naming. (E.g. for the F.chpl containing module M, it would simply not function as a submodule-in-a-different-file - just as one cannot use F today and get that file (without also naming F.chpl on the command line) ).

In particular one could still create M-1.chpl containing module M and use it as long as M-1.chpl is named on the command line. The current framework doesn't really support finding module M inside of a file with a different name anyway (like M-1.chpl) through a use statement alone - such files need to be named on the command line. As a result, I don't think we're really giving anything up with this proposal.

@BryantLam
Copy link
Author

BryantLam commented Nov 12, 2019

@BryantLam - Couldn't the above examples be made to function in an obvious manner with relatively modest adjustments?

In particular, if you have a use M, the compiler looks in the module search path for M.chpl and then loads up the M.chpl it finds. It could:

  • error if M.chpl does not contain a module M (an implicit file-level module would be M, so implicit file modules are still OK)

  • error if M.chpl contains a module M and some other top-level module also

From #13977 (comment), are you proposing the following behavior?

M.chpl storing module m { } ? Error
M.chpl storing module P { } ? Error
M.chpl storing module M { } writeln("Hi"); Error -> M.M
M.chpl storing module M { } module N { } Error

I think you're proposing that top-level modules and submodules-in-files will have different behaviors. I'm okay with this, though @bradcray may have a differing opinion. In general, I worry about the confusion, but if more allowed cases turn into errors, the compiler can provide fix-it hints.

(Also, there's a slight concern with how searching works with use M and whether it finds a top-level module or a submodule to trigger the correct behavior #13915.)

(And maybe another concern regarding "more binaries". E.g., if proc main-style unit tests are provided inlined with some of the (sub)modules. This is workable though.)

@mppf
Copy link
Member

mppf commented Nov 12, 2019

From #13977 (comment), are you proposing the following behavior?

No, not right now. Right now I'm just proposing that use M would have restrictions on how it could find module M (namely; module M is in a file named on the command line; or M.chpl exists somewhere in the module search path and that file contains module M (or an implicit module M) and not top-level modules with other names).

It's subtle but the key difference between this current proposal and previous ones is that it doesn't rule out any of the various scenarios for files and modules (e.g. M.chpl storing module P) but instead simply says that use M won't generally work in such cases where the module name and file name don't match.

@BryantLam
Copy link
Author

BryantLam commented Nov 12, 2019

I'm not sure I follow the distinction. Is your modest proposal of specifying the use M behavior (which makes it similar to how it already works today) independent of the suggestion for additional changes on said proposal:

In particular, if you have a use M, the compiler looks in the module search path for M.chpl and then loads up the M.chpl it finds. It could:

  • error if M.chpl does not contain a module M (an implicit file-level module would be M, so implicit file modules are still OK)

  • error if M.chpl contains a module M and some other top-level module also

If so, that's fine to me. I don't know how useful it actually is since the other proposed designs are still moving.

@mppf
Copy link
Member

mppf commented Nov 12, 2019

Is your modest proposal of specifying the use M behavior (which makes it similar to how it already works today) independent of the suggestion for additional changes on said proposal:

I don't think so. But at this moment I'm not sure what else I could explain to clarify.

Maybe an example would help?

// In file: F.chpl
module F {}
module M {
  compilerError("M in F.chpl");
}

// In file: main1.chpl
// compiled with: chpl main1.chpl
use F; // Error: `use F` needed to open `F.chpl` but `F.chpl` contains more than just `module M`

But chpl F.chpl main1.chpl would still work as it does now.

@BryantLam
Copy link
Author

BryantLam commented Nov 12, 2019

It's subtle but the key difference between this current proposal and previous ones is that it doesn't rule out any of the various scenarios for files and modules (e.g. M.chpl storing module P) but instead simply says that use M won't generally work in such cases where the module name and file name don't match.

Wow that is subtle. So you're proposing that ((placeholder: most of the behaviors I want)) will be true / the default, but the user can override by specifying the filename directly on the chpl command invocation? This is definitely clever because users have to compile by explicitly naming the file anyway. 👍

Regarding #13524, naming an explicit submodule file via chpl main.chpl A/B/C-new.chpl would have to be smart to realize that the entities in C-new.chpl need to be inserted as a submodule to A.B. It's a side concern, but is relevant for specializing/bug-fixing a standard module (what compiler flag or manipulation of module search paths, if any, that would override e.g., std.Module with the user's Module.chpl or MyModule.chpl).

@mppf
Copy link
Member

mppf commented Nov 13, 2019

Regarding #13524, naming an explicit submodule file via chpl main.chpl A/B/C-new.chpl would have to be smart to realize that the entities in C-new.chpl need to be inserted as a submodule to A.B. It's a side concern, but is relevant for specializing/bug-fixing a standard module (what compiler flag or manipulation of module search paths, if any, that would override e.g., std.Module with the user's Module.chpl or MyModule.chpl).

I don't think we have to allow command-line overriding of submodules at all. Sure, we could consider such a feature, but I don't think it's in any way essential for the proposal. For one thing, AFAIK, there is no expectation/use case for overriding submodules within a single file. I'm saying that we should consider that as a separate add-on feature and that we don't need to constrain our design thinking around submodules-in-different-files with it.

(I am aware there is some desire to be able to replace standard modules with other versions - but even with submodules-in-different-files I think that could work with special additions to the module search path).

@BryantLam
Copy link
Author

I'm saying that we should consider that as a separate add-on feature and that we don't need to constrain our design thinking around submodules-in-different-files with it.

I agree. I bring it up as a consideration because I don't see the desire to replace some M.chpl with M-1.0.chpl to be different whether it is a top-level module or a submodule-as-a-file.

mppf added a commit that referenced this issue Dec 12, 2019
Add warning for more implicit module cases

This PR adds the warning suggested in
#13977 (comment)

(Note that we might choose to go further in adjusting implicit modules as
#13977 discusses, but it seems that this addition helps the current
situation and can be more easily agreed upon).

For example
``` chapel
// my-test.chpl
writeln("X");
module M {
  writeln("hello");
}
```


my-test.chpl:2: warning: This file-scope code is outside of any explicit
module declarations (e.g., module M), so an implicit module named
'code-outside-of-module-error' is being introduced to contain the file's
contents.


This PR updates ~150 tests for the new behavior.
   * primers and chpldoc tests include the warning in the .good
   * most tests gained a module OuterModule { }
   * some tests used a particular module name

Reviewed by @lydia-duncan and discussed with @bradcray - thanks!

- [x] full local testing
@bradcray
Copy link
Member

bradcray commented Jan 8, 2021

I'd like to propose that we continue to stick with the status quo here and retire this issue. @BryantLam, any chance I can convince you of that?

@mppf
Copy link
Member

mppf commented Jan 8, 2021

Just to be clear - IIRC we have added some compiler warnings beyond what we had when this issue was filed, so the current state is probably better than it was.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants