Properly wrap a package’s modules with dune

@Maelan wrote:

dune is great for fast, no-brainer compilation. However, it is hard to control precisely what it produces.

Situation

Say we are writing a library pkg with modules A and B, and we want a module PKG as the single entry point of the library.

Files:

$ ls
A.ml   B.ml   dune
A.mli  B.mli  pkg.opam

File dune:

(library
	(public_name  pkg)
	(name         PKG)
	(wrapped      true)    ; this is the default
)

As of dune 1.11.3 (with OCaml 4.08.1), when building this package, dune will first alpha-rename modules A and B to PKG__{A,B}, then compile them, then compile a generated wrapper module PKG, with submodules {A,B} which are aliases to PKG__{A,B}.

This does provide an entry point named PKG with the contents we want, but it also exposes internal modules PKG__{A,B} in the environment (for example, they pollute the suggestions in utop). We cannot remove them because PKG.{A,B} are mere aliases to them (we need their .cmi for compiling, and their .cmti for documentation).
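To make the layout concrete, here is a minimal single-file sketch of what the build amounts to. The contents of A and B (a type, a value `zero`, a function `succ`) are made up for illustration, and the modules are written as nested structures only so that the snippet compiles on its own; in the real build each one is a separate compilation unit.

```ocaml
(* Stand-ins for the renamed compilation units; their contents are hypothetical. *)
module PKG__A = struct
  type t = int
  let zero : t = 0
end

module PKG__B = struct
  type u = PKG__A.t
  let succ (x : u) : u = x + 1
end

(* Roughly what the generated wrapper contains: plain module aliases. *)
module PKG = struct
  module A = PKG__A
  module B = PKG__B
end

(* PKG.A is an alias of PKG__A, so both paths name the same types... *)
let one : PKG.A.t = PKG.B.succ PKG.A.zero

(* ...and the underscored module stays reachable next to the entry point. *)
let also_zero : PKG__A.t = PKG.A.zero
```

Because the wrapper only holds aliases, removing PKG__A or PKG__B would leave PKG.A and PKG.B pointing at nothing.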

Things become worse when we try to write the wrapper module ourselves (for documentation purposes, or to select what to expose, or to customize module paths, or to have toplevel values). In that case, dune still generates the same wrapper module as before, only this time it calls it PKG__. Once that generated module is compiled, our custom PKG.ml is compiled in turn, and its references to A or B resolve to PKG__.{A,B} (which are themselves aliases to PKG__{A,B}). So now we have one more internal module polluting the environment.
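The three layers can be sketched in a single file as follows. Again, the module contents and the `version` value are hypothetical, and the `open` is only my way of mimicking how A and B come into scope inside a hand-written PKG.ml; it is an assumption about the mechanism, not something taken from dune's documentation.

```ocaml
(* Stand-ins for the renamed compilation units (hypothetical contents). *)
module PKG__A = struct let name = "A" end
module PKG__B = struct let name = "B" end

(* Stand-in for the generated alias module, now named PKG__. *)
module PKG__ = struct
  module A = PKG__A
  module B = PKG__B
end

(* Stand-in for a hand-written PKG.ml. *)
module PKG = struct
  open PKG__            (* assumption: mimics how A and B become visible *)
  module A = A          (* resolves to PKG__.A, itself an alias of PKG__A *)
  module B = B
  let version = "0.1"   (* a toplevel value, one reason to write PKG.ml by hand *)
end
```

With this layout, PKG__, PKG__A and PKG__B all remain visible alongside PKG.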

(I was going to elaborate on how this makes documentation generation fragile as you customize your build more and more, with mentions of “underscored” modules sprinkled across the final user documentation, and/or missing pages for the “non-underscored” modules, but I realize that would be going off-topic.)

0th try: private modules

dune has an option (private_modules A B), but its only apparent effect is to move PKG__{A,B}.cmi into a directory .private/. This does make those modules inaccessible to users, but it also breaks the module aliases PKG.{A,B} that point to them.

1st try: module inclusion

So the issue is providing modules which are aliases. Then, what about this:

File PKG.ml:

module A
: module type of A
= struct include A end

module B
: module type of B
= struct include B end

In this case, building the library with dune still produces the underscored modules PKG__ and PKG__{A,B} but, as we used module inclusion instead of module aliases, the sub-modules of PKG are not linked to these underscored modules. We can get rid of them once PKG is built. That is, we can get rid of their .cmi files, and they won’t be accessible anymore to the final user. We still need their .cmt/.cmti files for documentation.

That’s closer to what we want, but I found a number of pitfalls.

  1. I suspect that, if B depends on A, then A would be duplicated in the archive (both the original module PKG__A and its copy PKG.A would be packed in the archive; PKG.B would use PKG__A internally).
  2. If module B has a type u = A.t, then the type equality between PKG.A.t and PKG.B.u is lost.
  3. odoc produces cyclic links (e.g. if A has a functor Make, the documentation for PKG.A.Make is missing, and the link leads back to PKG.A instead).
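The second pitfall can be reproduced in a single file. This is only a sketch: the signatures below are hypothetical, and A is assumed to keep its type t abstract in A.mli, which is the case where the equality gets lost.

```ocaml
(* Stand-in for PKG__A, with an abstract type as its .mli would declare it. *)
module PKG__A : sig
  type t
  val zero : t
  val to_int : t -> int
end = struct
  type t = int
  let zero = 0
  let to_int x = x
end

(* Stand-in for PKG__B, whose interface mentions A's type. *)
module PKG__B : sig
  type u = PKG__A.t
  val describe : u -> string
end = struct
  type u = PKG__A.t
  let describe x = string_of_int (PKG__A.to_int x)
end

(* The wrapper from this attempt, written against the stand-ins. *)
module PKG = struct
  module A : module type of PKG__A = struct include PKG__A end
  module B : module type of PKG__B = struct include PKG__B end
end

(* PKG.B.u is still PKG__A.t, so the underscored value is accepted: *)
let ok = PKG.B.describe PKG__A.zero

(* But PKG.A.t is a fresh abstract type, so this line would be rejected:
   let ko = PKG.B.describe PKG.A.zero *)
```

In other words, the signature of PKG.B still refers to the underscored module, while PKG.A.t no longer has any relation to it.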

2nd try: source-level preprocessing

In fact, the only way I found to get exactly what I expect is by using a preprocessor to include modules at the source level:

File PKG.ml.cpp:

module A
#if __has_include("PKG/A.mli")
  : sig
    #include "PKG/A.mli"
  end
#endif
  = struct
    #include "PKG/A.ml"
  end

module B
#if __has_include("PKG/B.mli")
  : sig
    #include "PKG/B.mli"
  end
#endif
  = struct
    #include "PKG/B.ml"
  end

File dune:

(library
	(public_name  pkg)
	(name         PKG)
	(modules      PKG)
)

(rule
	(target  PKG.ml)
	(deps    (:dep PKG.ml.cpp) (glob_files PKG/*.ml{,i}))
	(action  (run cpp %{dep} -o %{target}))
	; source files are moved to a directory PKG/, because otherwise
	; the glob would also match PKG.ml, creating a cyclic dependency
)

This is easier to understand, and produces the right thing with respect to both exposed modules and documentation (you’ll still have to customize pkg.install a bit, because you’ll likely want to ship the true source files PKG.ml.cpp and PKG/* instead of the generated PKG.ml). But, of course, we lose separate, parallel compilation…
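For illustration, the generated PKG.ml would look roughly like the following. The contents of PKG/A.{ml,mli} and PKG/B.{ml,mli} are hypothetical (none of these declarations come from a real package), and the line markers that cpp normally emits are left out.

```ocaml
(* From PKG/A.mli and PKG/A.ml, pasted in by the preprocessor. *)
module A
  : sig
    type t
    val zero : t
    val to_int : t -> int
  end
  = struct
    type t = int
    let zero = 0
    let to_int x = x
  end

(* From PKG/B.mli and PKG/B.ml; B refers to A directly. *)
module B
  : sig
    type u = A.t
    val describe : u -> string
  end
  = struct
    type u = A.t
    let describe x = string_of_int (A.to_int x)
  end

(* The equality between A.t and B.u is kept, with no underscored module involved. *)
let zero_as_string = B.describe A.zero
```

Everything lives in one compilation unit here, which is exactly why separate compilation is lost.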

At this point, it does not seem reasonable to keep trying. Has someone else solved this issue? Or is it regarded by the community as a non-issue? I can see that real-world packages such as re are happy with bundling modules Re__*.
