Setup Python workbench
This is a short record of how I set up my home directory on the NYGC cluster. I started by cloning my dotfiles from GitHub.
Directory structure
- opt: for installing software
- software: for downloading and/or developing software
- work: for experiments / simulations / analysis
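The layout above can be created in one step. This is a minimal sketch, assuming the three directories sit directly under the home directory (the notes do not say where they live); make_layout is a hypothetical helper name, not something from the dotfiles.

```python
from pathlib import Path

# Directory names and purposes, taken from the list above.
LAYOUT = {
    "opt": "installed software",
    "software": "downloaded and/or in-development software",
    "work": "experiments / simulations / analysis",
}


def make_layout(base):
    """Create the directories under `base` (assumed: the home directory)."""
    created = []
    for name in LAYOUT:
        path = Path(base) / name
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling make_layout(Path.home()) would create the three directories in the home directory and leave any that already exist untouched.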
Setup modules
- Add
export MODULEPATH="${MODULEPATH}:${HOME}/modulefiles"
to include personal modulefiles.
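To confirm that the export took effect in a new shell, the variable can be inspected from Python. This is only a sketch: personal_modulefiles_visible is a hypothetical helper, and it assumes the personal directory is exactly modulefiles under the home directory, as in the export line above.

```python
import os
from pathlib import Path


def personal_modulefiles_visible(modulepath=None, home=None):
    """Return True if <home>/modulefiles is one of the MODULEPATH entries."""
    if modulepath is None:
        modulepath = os.environ.get("MODULEPATH", "")
    home = Path.home() if home is None else Path(home)
    target = str(home / "modulefiles")
    # MODULEPATH is colon-separated; ignore empty entries and trailing slashes.
    entries = [entry.rstrip("/") for entry in modulepath.split(":") if entry]
    return target in entries
```

With no arguments the function reads MODULEPATH and the home directory from the current environment.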
Setup Miniconda
- Download and install the latest version of Miniconda (23.3.1).
- Copy the conda init script from .bashrc to opt/miniconda3/conda_bash_init.sh.
- Create a modulefile to initialize conda on demand. Similar issue for LMOD modulefiles.
#%Module2.1.10#####################################################################
##
## modulefiles conda/23.3.1
##
set version 23.3.1
set modroot /gpfs/commons/home/sbanerjee/opt/miniconda3
proc ModulesHelp { } {
global version modroot
puts stderr "conda/$version - adds Miniconda to PATH"
}
module-whatis "Sets the environment for using Miniconda"
# Only one version at a time
conflict conda
if [module-info mode load] {
puts stdout "source $modroot/conda_bash_init.sh ;"
}
if [module-info mode remove] {
puts stdout "conda deactivate ;"
remove-path PATH $modroot/bin
remove-path PATH $modroot/condabin
## unset variable names.
## env | grep "CONDA"
unsetenv CONDA_SHLVL
unsetenv CONDA_EXE
unsetenv _CE_CONDA
unsetenv CONDA_PYTHON_EXE
## unset function names
## declare -F
unsetenv __conda_activate
unsetenv __conda_exe
unsetenv __conda_hashr
unsetenv __conda_reactivate
unsetenv conda
}
- Set up .condarc (see dotfiles)
Install Python
module load gcc/11.2.0 lapack/3.10 openmpi/4.1.4
conda create --name py311 python=3.11 -c conda-forge
conda activate py311
conda config --env --add channels conda-forge
conda install numpy scipy pandas scikit-learn matplotlib
conda install jupyterlab
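A quick check that the packages installed above are visible to the interpreter of the active environment might look like the following. This is a sketch, not part of the original setup: missing_packages is a hypothetical helper, and it only checks that each package can be located, without importing it.

```python
import importlib.util

# Import names for the packages installed above
# (scikit-learn is imported as `sklearn`).
PACKAGES = ["numpy", "scipy", "pandas", "sklearn", "matplotlib", "jupyterlab"]


def missing_packages(names=PACKAGES):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run inside the py311 environment, missing_packages() returns an empty list if every package can be found.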
Connect to JupyterLab
mkdir ~/.jupyter
cp /nfs/sw/jupyter3/jupyter3-4.1/.jupyter/jupyter_notebook_config.py ~/.jupyter/jupyter_notebook_config.py
jupyter lab
There is no need for local port forwarding. (I would like to know how this is set up. To do: ask NYGC Cluster Linux help.)
Load R libraries with Python
- Set CXX17 path for compiling R packages
mkdir -p ~/.R
echo 'CXX17 = g++ -std=gnu++17 -fPIC' > ~/.R/Makevars
- Create modulefile for loading environment variables (required for loading R from Python) which are not set by NYGC.
#%Module2.1.10#####################################################################
##
## modulefiles Renv/4.2.3
##
prereq R/4.2.3
set version 4.2.3
set rhome $::env(rhome)
proc ModulesHelp { } {
global version
puts stderr "Renv/$version - adds extra environment variables not provided by NYGC"
}
module-whatis "Sets the environment for using R/$version"
# Only one version at a time
conflict Renv
prepend-path LIBRARY_PATH $rhome/lib64/R/lib
prepend-path CPATH $rhome/lib64/R/include
prepend-path CMAKE_PREFIX_PATH $rhome/
setenv R_ROOT $rhome
setenv R_HOME $rhome/lib64/R
- Load the R module and the environment variables
module load R/4.2.3
module load Renv/4.2.3
- Install some R packages (also checks that gcc is working)
parallel::detectCores()
options(Ncpus = 48)
install.packages(c("devtools", "ggplot2"))
- Install and verify rpy2
conda install tzlocal # other dependencies are already installed while installing the previous packages
pip install rpy2
python -m rpy2.situation
python
>>> import rpy2.robjects as robj
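The checks in this section can be gathered into a small script. The sketch below assumes the variable names exported by the Renv modulefile above; unset_r_variables and r_version_string are hypothetical helper names. The only rpy2 call used is robjects.r, which evaluates a string of R code.

```python
import importlib.util
import os

# Variables the Renv modulefile above is expected to set or extend.
R_VARIABLES = ["R_HOME", "R_ROOT", "LIBRARY_PATH", "CPATH", "CMAKE_PREFIX_PATH"]


def unset_r_variables(env=None):
    """Return the expected variables that are missing or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in R_VARIABLES if not env.get(name)]


def r_version_string():
    """Return R's version string via rpy2, or None if rpy2 is not installed."""
    if importlib.util.find_spec("rpy2") is None:
        return None
    import rpy2.robjects as robj  # importing this starts an embedded R session

    # R.version.string is a length-one character vector in R.
    return str(robj.r("R.version.string")[0])
```

If unset_r_variables() returns a non-empty list, the Renv module is probably not loaded in the current shell; once it is empty, r_version_string() should report the R version that rpy2 is linked against.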