
Using conda to manage multiple native library configurations without Anaconda or Python in the mix #911

Closed
mwiebe opened this issue May 3, 2016 · 33 comments
Labels
locked — [bot] locked due to inactivity
stale::closed — [bot] closed after being marked as stale
stale — [bot] marked as stale due to inactivity

Comments

mwiebe (Contributor) commented May 3, 2016

We're most of the way through setting this up so we can use conda for this internally at Thinkbox. It's far better than managing 50 dependent libraries and SDKs by hand, but there is still room for improvement. I'm posting this now because I saw https://twitter.com/wesmckinn/status/725448295774969857 from @wesm, and thought I would describe some parts of what we're doing. Our solution is passable as an internal tool, but support for what we're doing directly in conda and conda-build would be much nicer.

Initially we evaluated conda against rez (https://github.com/nerdvegas/rez). Since rez comes from the VFX world, I thought it might make integrating with some graphics-related packages easier out of the box, but conda came out as the clear winner in the comparison. The biggest challenge in adopting conda was finding a way to manage multiple compiler configurations. Conda-build has hard-coded internal logic that selects a compiler based on platform and Python version, so out of the box it doesn't support this idea at all.

Example compiler configurations one might want:

  • MSVC configured in debug mode. This is a different ABI than release, so all dependent packages must be built this way.
  • MSVC 2013 or 2015 with the /Gv flag, so numeric code uses the faster vectorcall convention.
  • Clang or gcc with various sanitizer instrumentations enabled. This works best if all dependent packages are built with the same flags.
  • Builds with everything configured to link statically or dynamically, so that applications can be built either as one big executable or as a collection of libraries, depending on the chosen configuration.
  • Builds with the compiler flags set to target a specific CPU microarchitecture, e.g. the one used for a machine type in cloud infrastructure, to optimize usage of computing within that cloud.

Our solution to defining multiple compiler configurations uses a conda feature paired with a conda package of the same name. The feature gets applied to all packages built with that compiler configuration.

For example, the package msvc2012rel provides the feature msvc2012rel, and installs files to configure the compiler for builds with MSVC 2012 in release mode. Its meta.yaml file looks like:

package:
  name: msvc2012rel
  version: 1.0

build:
  number: 4
  track_features:
    - msvc2012rel

and its bld.bat installs the file compiler_config.bat, which looks like the following. It's a bit messy, admittedly, but it provides environment variables that make it easy to tell cmake and other build systems about the compiler configuration.

@echo on
set MSVC_VER=11.0
set MSCFULL_VER=17.00.61030
set MSVC_VER_YEAR=2012
set ARCH=@ARCH@

set CMAKE_BUILD_TYPE=Release
set BUILD_TYPE_LOWER=release
set BUILD_DYNAMIC=OFF

set CFLAGS=-DUNICODE -D_UNICODE
set CXXFLAGS=%CFLAGS%

REM ------------------------------------------------------
REM The rest below is generic

if "%ARCH%" == "64" set VCVARS_ARCH=amd64
if "%ARCH%" == "32" set VCVARS_ARCH=x86
if "%ARCH%" == "64" set CMAKEGEN_ARCH= Win64
if "%ARCH%" == "32" set CMAKEGEN_ARCH=
if "%ARCH%" == "64" set MSVC_ARCH=x64
if "%ARCH%" == "32" set MSVC_ARCH=Win32

REM Activate the compiler command line
call "C:\Program Files (x86)\Microsoft Visual Studio %MSVC_VER%\VC\vcvarsall.bat" %VCVARS_ARCH% || exit /b 1
@echo on

REM Validate version of MSVC
cl 2>&1 | C:\Gow\bin\grep %MSCFULL_VER%
if "%ERRORLEVEL%" == "0" goto good_msc_ver
echo Bad MSVC Version: expected %MSCFULL_VER%
cl 2>&1
exit /b 1
:good_msc_ver

set CMAKE_GENERATOR=Visual Studio %MSVC_VER:~0,2% %MSVC_VER_YEAR%%CMAKEGEN_ARCH%

With this system, normal conda-build recipes do not work as-is: they must be told which compiler configuration to use, and they need to depend on the feature tag for that compiler configuration. This is done through the COMPILER_CONFIG environment variable, which is interpolated into the recipe using conda's jinja2 templating. Here is the meta.yaml file for a zlib recipe following this convention:

package:
  name: zlib
  version: 1.2.8

source:
  fn: zlib-1.2.8.tar.gz
  url: http://zlib.net/zlib-1.2.8.tar.gz
  md5: 44d667c142d7cda120332623eab69f40

build:
  number: 3
  features:
    - {{ environ['COMPILER_CONFIG'] }}

requirements:
  build:
    - {{ environ['COMPILER_CONFIG'] }}

about:
  home: http://zlib.net/
  license: zlib (http://zlib.net/zlib_license.html)
  summary: "A Massively Spiffy Yet Delicately Unobtrusive Compression Library"
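
As a small illustration (a hypothetical sketch, not conda-build's actual renderer), the substitution that the jinja2 pass performs on these recipes can be emulated with a simple regex-based replacement:

```python
import re

def render_compiler_config(meta_text, env):
    """Substitute {{ environ['KEY'] }} expressions from an environment
    mapping. A simplified stand-in for conda-build's real jinja2 pass."""
    pattern = re.compile(r"\{\{\s*environ\['([A-Za-z_]+)'\]\s*\}\}")
    return pattern.sub(lambda m: env[m.group(1)], meta_text)

meta = """build:
  features:
    - {{ environ['COMPILER_CONFIG'] }}
requirements:
  build:
    - {{ environ['COMPILER_CONFIG'] }}
"""
rendered = render_compiler_config(meta, {"COMPILER_CONFIG": "msvc2012rel"})
```

Setting COMPILER_CONFIG=msvc2012rel before running conda-build thus stamps both the feature and the build dependency into the rendered recipe.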

Every recipe's bld.bat and build.sh first source the compiler_config.bat or compiler_config.sh file provided by the compiler config package, then use the defined variables and configured compiler to build. The bld.bat for zlib is:

call compiler_config.bat || exit /b 1
@echo on

mkdir build
cd build

REM Configure step
cmake -G "%CMAKE_GENERATOR%" -DBUILD_SHARED_LIBS=%BUILD_DYNAMIC% -DCMAKE_BUILD_TYPE=%CMAKE_BUILD_TYPE% -DCMAKE_PREFIX_PATH=%LIBRARY_PREFIX% -DCMAKE_INSTALL_PREFIX:PATH=%LIBRARY_PREFIX% %SRC_DIR% || exit /b 1

REM Build step
cmake --build . --config %CMAKE_BUILD_TYPE% || exit /b 1

REM Install step
cmake --build . --config %CMAKE_BUILD_TYPE% --target install || exit /b 1

copy ..\contrib\iostream3\zfstream.h %LIBRARY_PREFIX%\include || exit /b 1
copy ..\contrib\iostream3\zfstream.cc %LIBRARY_PREFIX%\include || exit /b 1

and the build.sh is basically identical to a typical recipe's, except for sourcing compiler_config.sh:

#!/bin/bash
set -xe

source compiler_config.sh

./configure --prefix=$PREFIX
make
make install

cp $SRC_DIR/contrib/iostream3/zfstream.h $PREFIX/include
cp $SRC_DIR/contrib/iostream3/zfstream.cc $PREFIX/include

To conclude: this system works, but it would be much better with first-class support from conda. What we're using isn't suitable for dropping directly into conda, but perhaps the way we've done things can inspire something that is.

msarahan (Contributor) commented May 3, 2016

Wow! This is well-timed! Things are coming to a head with several different proposals along these lines. If you'd like to join us for a call on Monday (5/9) morning at 9 AM Central time, your thoughts and opinions are welcome. The other related issues (I think) are:

#728 #747 #848 #857 and probably more that I'm missing (sorry).

mcg1969 (Contributor) commented May 3, 2016

This seems relevant, if we want to try to do this without features (and I think we might): conda-forge/staged-recipes#525 (comment)

mwiebe (Contributor, Author) commented May 3, 2016

I'd love to join that call - can't guarantee I'll say coherent things though, as 7 AM Pacific time is not the best time for that.

@mcg1969 with conda-forge/staged-recipes#525 (comment), our system would have a metapackage compiler_config to replace the feature/package pair, and then each package built using it depends on the particular build string, e.g. compiler_config * msvc2012rel in the example I wrote. Then the conda dependency solver will always prefer this more specific package to one that has no compiler_config dependency? This sounds very promising to me.
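
A recipe for such a metapackage might look like the following sketch (hypothetical; the package name and build string follow the compiler_config * msvc2012rel convention described above):

```yaml
package:
  name: compiler_config
  version: 1.0

build:
  number: 0
  string: msvc2012rel
```

Each compiler configuration would then be published as a separate build of compiler_config, distinguished only by its build string.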

mcg1969 (Contributor) commented May 3, 2016

That's right. The dependency solver will automatically make sure that only one of the build strings will be selected, of course---and packages that don't specify one will defer to the ones that do.

mcg1969 (Contributor) commented May 3, 2016

Basically, what we really want is a sort of "key/value" feature capability, something that might ultimately look like compiler_config:msvc2012rel. With these metapackages, we're hacking the solver to get the same behavior without a pretty syntax. Features are OK, but they're just not quite enough for the kinds of things many of us want to do.

mcg1969 (Contributor) commented May 3, 2016

Roughly speaking, the existing feature capability is somewhat like having two packages:

feature-1-off.tar.bz2
feature-0-on.tar.bz2

By default, version 1 will be preferred, which is "off". But if you either 1) specify feature * on on the command line, or 2) install a package that has a hard dependency on feature * on, the solver will select "on" instead.
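
As a toy model (emphatically not conda's real solver), that preference behavior can be sketched as:

```python
# Toy model of the feature-as-two-packages trick described above.
# The solver prefers the highest version, so the "off" build (version 1)
# wins by default, while a hard dependency on the "on" build string
# overrides that preference. Names are illustrative.
candidates = [
    {"name": "feature", "version": 1, "build": "off"},
    {"name": "feature", "version": 0, "build": "on"},
]

def pick(candidates, required_build=None):
    """Select one candidate, honoring an optional build-string constraint."""
    pool = [c for c in candidates
            if required_build is None or c["build"] == required_build]
    return max(pool, key=lambda c: c["version"])

default_choice = pick(candidates)        # no constraint: "off" wins
forced_choice = pick(candidates, "on")   # like depending on feature * on
```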

mwiebe (Contributor, Author) commented May 3, 2016

That sounds great! We've run into some problems isolating environments from the Anaconda packages with this feature approach (e.g. it used Anaconda's zlib instead of ours; we fixed it by increasing our build number), and it would be nice for the solver to do the right thing in the face of that ambiguity. Being able to conveniently isolate environments from the Anaconda default packages is another thing to eventually get working better.

mcg1969 (Contributor) commented May 3, 2016

I think you'll find the improvements that we're making to channel handling will help with this second problem. We may not be all the way there yet in master, but for instance, we're no longer "interleaving" identically-named packages from two different channels. The packages from higher-priority channels will always be preferred over lower-priority ones.

mcg1969 (Contributor) commented May 3, 2016

Actually, I'm wrong: the current feature capability can be implemented with just one metapackage representing "on".

wesm commented May 4, 2016

This is very helpful. I'm still learning more about conda (and metapackages) but I'd eventually like to conda-ify the C++ library toolchain I'm currently using:

https://github.com/cloudera/native-toolchain

The goal would be to be able to more easily build both debug and release builds like Mark is describing. It also needs to be able to build multiple versions of certain libraries (e.g. multiple versions of LLVM and Thrift).

As another example of a 3rd-party library toolchain:

https://github.com/apache/parquet-cpp/tree/master/thirdparty

in particular:

https://github.com/apache/parquet-cpp/blob/master/thirdparty/set_thirdparty_env.sh

Outside of the compiler configuration itself, we need to be able to tell cmake where to look first for build and runtime dependencies.

I'm also trying to figure out how to manage cmake modules in a way that would be compatible with a conda-supplied toolchain. For example, the Thrift cmake module here depends on the $THRIFT_HOME environment variable:

https://github.com/apache/parquet-cpp/blob/master/cmake_modules/FindThrift.cmake

I guess one solution is that you can just set export THRIFT_HOME=$PREFIX in your build.sh.

mcg1969 (Contributor) commented May 4, 2016

I'm sure between the lot of us we can figure something out. I won't be much help on the compiler suite, but if we need to get conda's dependency solver to do something it wasn't designed for, well, I'm your man.

wesm commented May 4, 2016

Any guidance on automating a toolchain build from a directory of recipes? Or is that a DIY affair right now? What about "build everything that hasn't been built"?

koverholt commented:

I often use directories of conda recipes like conda.recipe/primary, conda.recipe/dependency1, conda.recipe/dependency2, as shown in one example here. conda build will walk the dependency tree like this (using an unrelated example):

$ conda build adam --python 3.5
Removing old build environment
Removing old work directory
BUILD START: adam-1.0.0-py35_1
Using Anaconda Cloud api site https://api.anaconda.org
Fetching package metadata: ......
Solving package specifications: .
 Packages missing in current linux-64 channels:
  - click-plugins
  - terminaltables
  - schematics
Missing dependency click-plugins, but found recipe directory, so building click-plugins first
 Packages missing in current linux-64 channels:
  - click-plugins
  - terminaltables
  - schematics
Missing dependency terminaltables, but found recipe directory, so building terminaltables first
 Packages missing in current linux-64 channels:
  - click-plugins
  - terminaltables
  - schematics
Missing dependency schematics, but found recipe directory, so building schematics first
Removing old build environment
Removing old work directory
BUILD START: click-plugins-1.0.3-py35_0
Fetching package metadata: ......
Solving package specifications: .........

Is this along the lines of what you are looking for?

wesm commented May 4, 2016

IIUC, conda build looks for binary artifacts to install from channels before building the build dependencies (is there some way to change this?). I'm looking for a way to build an isolated toolchain from the bottom up without depending on any other anaconda artifacts. This is important for creating debug or release builds of things.

koverholt commented:

Good point, my example only involves packages that are not in defaults (e.g., repo.continuum.io). I'll let others respond to the feasibility of excluding packages in defaults when using conda build.

mcg1969 (Contributor) commented May 4, 2016

I don't know if there's an option to clear out the conda-bld directory, but it can't hurt to just remove it outright before such a build. I also don't know if it relies on anything in the package cache (the pkgs subdirectory), but if so that would need to be wiped, too. This is just off the top of my head though.

mcg1969 (Contributor) commented May 4, 2016

Well, there is --override-defaults but I'm not sure it responds well to having no channels. Perhaps the solution (at this point) would require an empty, dummy channel. Though conda-bld itself is constructed as a channel, so perhaps that's sufficient.

mwiebe (Contributor, Author) commented May 4, 2016

To process builds of many packages, we've created a script which loads all the meta.yaml files from the package directories (with a little monkey-patching of conda-build so we can submit jobs for platforms other than the one the submitter is running on), then submits all the requested jobs to Deadline, mirroring the package dependency structure into a job dependency structure there. This works well for submitting rebuilds of everything, but we haven't tackled partial rebuilds of a package and all its downstream dependents when there's a change in a library's git repository.

Maybe loading the metadata of many packages and constructing/processing the dependency graph is functionality worth adding to conda-build?
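
A minimal sketch of that graph processing (hypothetical recipe names; Kahn's algorithm over build-time dependencies gathered from the recipes' meta.yaml files) might look like:

```python
from collections import deque

# Hypothetical map of recipe -> build-time dependencies, as would be
# gathered by loading each recipe's meta.yaml. Names are illustrative.
deps = {
    "zlib": {"msvc2012rel"},
    "libpng": {"zlib", "msvc2012rel"},
    "msvc2012rel": set(),
}

def build_order(deps):
    """Kahn's algorithm: emit recipes so that dependencies build first."""
    remaining = {pkg: set(d) for pkg, d in deps.items()}
    ready = deque(sorted(p for p, d in remaining.items() if not d))
    order = []
    while ready:
        pkg = ready.popleft()
        order.append(pkg)
        for other, d in remaining.items():
            if pkg in d:
                d.remove(pkg)
                if not d:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected among recipes")
    return order
```

The same ordering maps directly onto job dependencies in a build farm: each recipe's job simply depends on the jobs of its entries in deps.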

mwiebe (Contributor, Author) commented May 4, 2016

For serving the conda packages, we've got Apache serving them, plus a trivial Python server on port 8080 that accepts PUT requests and runs conda index in the appropriate directory after each PUT. It's a really basic setup, but it's working alright.

In particular, this gives conda a place to look for dependencies; the combination of uploading to the server and having the Deadline job dependencies match the package dependencies lets a batch of packages rebuild in a distributed fashion while respecting those dependencies.
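
A sketch of such an upload endpoint (hypothetical paths and port; assumes the conda CLI is on PATH for the reindex step) could be as simple as:

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

CHANNEL_DIR = Path("/srv/conda-channel")  # hypothetical channel location

class UploadHandler(BaseHTTPRequestHandler):
    """Accept PUT uploads of conda packages, then reindex the channel."""

    def do_PUT(self):
        length = int(self.headers.get("Content-Length", 0))
        dest = CHANNEL_DIR / self.path.lstrip("/")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(self.rfile.read(length))
        # Regenerate repodata.json for the platform subdir that changed;
        # assumes "conda index" is available on PATH.
        subprocess.run(["conda", "index", str(dest.parent)], check=True)
        self.send_response(201)
        self.end_headers()

# To serve: HTTPServer(("", 8080), UploadHandler).serve_forever()
```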

wesm commented May 4, 2016

So what I'm hearing is that if I am unable to depend on repo.continuum.io, then I need to write my own dependency analysis script to build the package tree manually from the bottom up? Feels like this should be a part of conda.

mcg1969 (Contributor) commented May 4, 2016

I'm pretty sure that conda-build can process a directory of recipes in dependency order; but I agree that if it currently does not, it should.

mcg1969 (Contributor) commented May 4, 2016

@msarahan and @kalefranz definitely have more knowledge than I do about how conda-build works. I tend to focus on conda core, and jump into conda-build only when my work breaks it :-)

msarahan (Contributor) commented May 4, 2016

If you have a flat folder of recipes, conda-build will try first to download packages, but if they are not available, it will build them. It is primitive in knowing where recipes might be, so that flat folder hierarchy is currently essential.

Note that this also doesn't have very good ways to specify when things should be rebuilt. If it can be downloaded, conda-build will always prefer that. So, you either have to hide potential download sources and clear existing package builds, or manually specify recipes to be built. No ordering analysis is done for manually specified lists of recipes.

mcg1969 (Contributor) commented May 4, 2016

OK, so there is work that needs to be done in order to build a flat directory of recipes from scratch, in dependency order.

wesm commented May 4, 2016

@msarahan so, is the easiest fix to add an option to conda build that disables the use of channels / downloading packages in favor of an all-local build? This is the only viable option for my use case, and I would really like to use conda.

wesm commented May 4, 2016

Another option is to install conda into a purpose-built non-Anaconda Python and then emulate a local package server (so you have a single channel available on localhost), but this seems really hacky.

msarahan (Contributor) commented May 4, 2016

Unless you have all of your system's dependencies as recipes, that probably won't work. I think we should add a --prefer-local flag anyway that prefers local recipe builds. In that case, it would first look for a local recipe and check for an up-to-date local package: if one is found, use it; if not, build it; and only then fall back to the remote channels. Does that sound workable?
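
That proposed lookup order (a proposal in this thread, not existing conda-build behavior) can be sketched as a small decision function:

```python
# Sketch of the proposed --prefer-local lookup order:
#   1. local recipe with an up-to-date local package -> use the package
#   2. local recipe without an up-to-date package    -> build it
#   3. otherwise                                     -> remote channels
# All names and arguments here are illustrative.
def resolve(name, local_recipes, local_packages, remote_channels):
    if name in local_recipes:
        if local_packages.get(name) == "up-to-date":
            return "use-local-package"
        return "build-from-recipe"
    if name in remote_channels:
        return "download"
    raise LookupError(f"no source for {name}")
```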

wesm commented May 4, 2016

Let's assume that all of the system's dependencies (including gcc) are available as recipes. This is literally what we are doing currently with a manually-maintained set of shell scripts building things from the ground up.

mcg1969 (Contributor) commented May 4, 2016

Wouldn't conda build --override-channels disable any references to anaconda defaults? The question then is whether conda build is OK with having the local build directory being the only channel.

mwiebe (Contributor, Author) commented Jun 28, 2016

I've added conda/conda#2901, a feature idea for conda to better support toolchain systems as described here.

wizofe commented Apr 3, 2019

I know this is 3 years old, but @mwiebe, I stumbled across it while looking for ideas involving Conda over Rez for VFX.

Is this still applicable and did you have any success?

mwiebe (Contributor, Author) commented Apr 3, 2019

Hi @wizofe, it is still applicable, and the input we and others provided was synthesized by the conda developers into a feature called Build Variants, documented at https://docs.conda.io/projects/conda-build/en/latest/resources/variants.html. Something like VFX Platform could be treated as a self-consistent package ecosystem, as described near the bottom of that page: https://docs.conda.io/projects/conda-build/en/latest/resources/variants.html#self-consistent-package-ecosystems.

We haven't migrated our system from using environment variables as parameters to using conda build variants, but the way we have evolved it has been successful: we use it to build hundreds of conda recipes across many supported operating systems, compilers, and software versions (e.g. Python 2.7 and Python 3.6), producing thousands of packages. We index these packages into an S3 bucket, which conda supports referencing natively as a channel.
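
For example, a compiler-configuration axis like the one described in this thread could be expressed as a variant in a conda_build_config.yaml (a hypothetical sketch; the key name and values are illustrative):

```yaml
compiler_config:
  - msvc2012rel
  - msvc2012debug
```

Recipes can then reference {{ compiler_config }} directly in meta.yaml, and conda-build renders one package per variant, instead of the recipe reading an environment variable.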

github-actions (bot) commented:

Hi there, thank you for your contribution!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this issue to remain open please:

  1. Verify that you can still reproduce the issue at hand
  2. Comment that the issue is still reproducible and include:
    - What OS and version you reproduced the issue on
    - What steps you followed to reproduce the issue

NOTE: If this issue was closed prematurely, please leave a comment.

Thanks!

github-actions bot added the "stale" label Jun 16, 2022
github-actions bot added the "stale::closed" label Jul 17, 2022
github-actions bot added the "locked" label Jul 17, 2023
github-actions bot locked as resolved and limited conversation to collaborators Jul 17, 2023