Setting up a proper environment with a specific Python version and the required packages is one of the challenges that people working in Big Data face daily. Pip, Conda and Anaconda are widely used tools that, together with isolated environments, make life much easier.
In this document, I will first go through pip, conda and Anaconda and compare them in more detail. Afterwards I will focus on virtual environments, which give us full control over our system (environment). Among other advantages, this is especially useful when we do not have root access on the system.
Introduction
Let’s understand two main concepts before focusing on pip, conda and anaconda:
Package Manager: It is a tool that automates the process of installing, updating and removing packages on the corresponding system. Pip and conda are two examples of package managers.
Software Distribution: It is a pre-built and pre-configured collection of packages that can be easily used on a system. Anaconda and Miniconda are two examples of software distributions that are especially useful for beginners.
Pip, Conda & Anaconda
I will go through each of these tools in more detail and try to give a reasonable comparison of them and of how they relate to each other.
Pip (https://github.com/pypa/pip):
It is a Python package manager that handles only Python packages, mainly from a third-party software repository called PyPI (Python Package Index). At the time of writing, around 113000 Python packages could be accessed through PyPI.
When we use pip to install a package, it is installed in the site-packages folder of a specific Python version (Debian and Ubuntu name this folder “dist-packages” for their system Python). To target a specific Python version rather than the default one, we can use pip{$version-num}, which is usually installed together with that Python version, or alternatively the following command:
- python{$version} -m pip install numpy
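To see why `python -m pip` always targets the right interpreter, here is a small Python sketch. The helper names `pip_command` and `pip_install` are my own (hypothetical), and `/usr/bin/python3.6` is only an illustrative path: the sketch builds a pip command bound to a given interpreter and prints where the running interpreter installs pure-Python packages.

```python
import subprocess
import sys
import sysconfig


def pip_command(*args, python=sys.executable):
    """Build a `python -m pip ...` command bound to one specific interpreter."""
    return [python, "-m", "pip", *args]


def pip_install(package, python=sys.executable):
    """Install `package` with the pip that belongs to `python`; raises on failure."""
    subprocess.run(pip_command("install", package, python=python), check=True)


# Where the running interpreter's pip puts pure-Python packages
# (site-packages, or dist-packages for Debian/Ubuntu's system Python).
print("packages for", sys.executable, "go to", sysconfig.get_paths()["purelib"])

# Hypothetical interpreter path, only to show the command shape.
print(pip_command("install", "numpy", python="/usr/bin/python3.6"))
# prints ['/usr/bin/python3.6', '-m', 'pip', 'install', 'numpy']
```

Because the interpreter path is the first element of the command, the package can only end up in that interpreter's own package folder, whichever `pip` happens to come first on PATH.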
Conda (https://conda.io/docs/):
conda is a package management tool for installing scientific and analytical computing packages, which may be written in Python or in other programming languages. Conda is a general-purpose package management system, designed to build and manage software of any type from any language, and it is the default package manager of Anaconda. In that sense it is closer to “apt” on Ubuntu or “yum” on CentOS.
Pip vs Conda
pip has been designed only for Python packages and ignores any non-Python library dependencies, whereas conda is a general-purpose package manager that can install packages written in any language.
Because Conda introduces a new packaging format, you cannot use pip and Conda interchangeably. We can use the two tools side by side but they do not inter-operate.
Each of them comes with an environment tool that was created more or less for it: as I will discuss later, “virtualenv” was created to be used with pip, and conda environments mainly with conda. The two do mix to some extent, since pip can also be used inside a conda environment.
Let’s compare pip and conda from a command perspective:
search available packages:
- conda search $pack_name
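- pip search $pack_name (PyPI has since disabled the API behind this command, so current pip versions report an error; search on https://pypi.org instead)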
install a package:
- conda install $pack_name
- pip install $pack_name
list installed packages:
- conda list (or conda list --name $environment_name)
- pip list
update a package:
- conda update $pack_name (or conda update --name $environment_name $pack_name)
- pip install --upgrade $pack_name
update the package manager itself:
- conda update conda
- pip install --upgrade pip
Anaconda:
Anaconda is an open-source Python distribution provided by Continuum Analytics (who also created conda). It includes over 720 of the most popular Python packages for science, math, engineering and data analysis, such as numpy, scipy and the IPython notebook, which makes it very useful, especially for beginners.
A very interesting aspect of Anaconda is that it comes with its own Python distribution, conda itself, a way to manage environments and many pre-installed packages. So installing Anaconda also installs conda.
Anaconda Installation:
1. Download Anaconda Linux installer from https://www.anaconda.com/download/#linux
Copy the link of the installer you need; I used the Python 3.6 version for 64-bit (x86).
2. Install
First, make the script executable:
- chmod +x Anaconda3-5.0.1-Linux-x86_64.sh
and then install it:
- ./Anaconda3-5.0.1-Linux-x86_64.sh
I installed it in the default directory. During the installation, the installer asks whether Anaconda should be added to the PATH; answering “yes” adds the following line to .bashrc:
- export PATH="/lhome/hrouhan/anaconda3/bin:$PATH"
So from now on, the default python is the one in the above directory, in my case python3.6:
- hrouhan@hrouhani.org:~$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
- hrouhan@hrouhani.org:~$ which python3.6
/lhome/hrouhan/anaconda3/bin/python3.6
Installing Anaconda also installs conda, the matching Python version and over 720 open-source packages. That is good news, since it means we now have “conda” available as well.
Keep in mind that the Python that was already installed on the system is still located at the following path; it is only shadowed because the Anaconda directory now comes first in PATH:
- /usr/bin/python
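The shell runs the first matching executable it finds on PATH, which is why the Anaconda python now wins over /usr/bin/python. A minimal Python sketch of that lookup follows; the helper `find_all_on_path` is hypothetical, and it assumes a POSIX-style PATH. It lists every match in search order, so you can see which interpreter is active and which ones are shadowed.

```python
import os
import sys


def find_all_on_path(name, path=None):
    """Return every executable called `name` on PATH, in search order.

    The first hit is what the shell (and `which`) runs; later hits are shadowed.
    Empty PATH entries, which POSIX shells treat as the current directory,
    are skipped here for simplicity.
    """
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits


print("running interpreter:", sys.executable)
for rank, exe in enumerate(find_all_on_path("python")):
    print("active  " if rank == 0 else "shadowed", exe)
```

Removing the export line from .bashrc (or reordering PATH) changes only this search order; neither Python installation is touched.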
3. Quick test:
After the installation, we have a default environment that can be used with the packages Anaconda installed by default. Let’s test it:
- hrouhan@hrouhani.org:~$ python
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import matplotlib
>>> exit()
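The same quick test can be run without actually importing anything, which is handy for heavy packages. This sketch (the helper name `check_packages` is hypothetical) uses the standard library's `importlib.util.find_spec` to report whether each top-level module can be found by the current interpreter.

```python
import importlib.util


def check_packages(names):
    """Map each top-level module name to whether the current interpreter can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


print(check_packages(["numpy", "matplotlib", "scipy"]))
```

Finding a module does not guarantee that it imports cleanly (a broken compiled extension only fails on import), so the interactive import test remains the stronger check.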
We can see the default conda environment (root, marked with *) with:
- hrouhan@hrouhani.org:~$ conda env list
# conda environments:
root * /lhome/hrouhan/anaconda3
The downloaded packages are cached in /lhome/hrouhan/anaconda3/pkgs/, and the installed packages can also be listed with the “conda list” command.
How to use Anaconda
As I mentioned earlier, after installing Anaconda we have a default environment (root * /lhome/hrouhan/anaconda3) that already contains a set of installed packages. We can list the packages in this default environment with:
- hrouhan@hrouhani.org:~$ conda list
# packages in environment at /lhome/hrouhan/anaconda3:
…
…
If we need a package that is not included in the default Anaconda installation, we can install it from the Continuum Analytics repository with the following command (although I recommend installing it in a virtual environment):
- conda install <pkg name>
If the package we need does not exist in the Anaconda repository, we can install it with “pip” or simply build it from source using its “setup.py”.
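The “conda first, pip as a fallback” order can be scripted. The helper below is hypothetical, not a feature of conda or pip: it assumes that `conda install` exits with a non-zero status when it cannot install a package, and it runs pip through the current interpreter so that both tools target the same environment when the script runs with that environment's python. The `run` parameter exists only so the logic can be tried without conda installed.

```python
import subprocess
import sys


def install_with_fallback(package, run=subprocess.run):
    """Hypothetical helper: try conda first, then fall back to pip.

    Assumes `conda install` exits non-zero when it cannot install the package.
    pip is run through the current interpreter, so run this with the
    environment's own python to keep both tools targeting the same environment.
    Returns the name of the tool that succeeded, or None if both failed.
    """
    attempts = [
        ("conda", ["conda", "install", "--yes", package]),
        ("pip", [sys.executable, "-m", "pip", "install", package]),
    ]
    for tool, cmd in attempts:
        try:
            result = run(cmd)
        except FileNotFoundError:  # e.g. conda is not on PATH
            continue
        if result.returncode == 0:
            return tool
    return None
```

Preferring conda keeps non-Python dependencies under conda's control; pip is only used for packages conda cannot provide.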
To avoid messing up the default environment, I recommend creating a separate environment. This is especially useful when we work on different projects that each need different packages, or even different versions of Python.
Managing environments
There are two ways of creating a virtual (isolated) environment:
- Virtualenv
- Conda built-in environment manager
I am more of a fan of Conda's built-in environment manager, for the following reasons:
- Conda combines the functionality of pip & virtualenv in a single package. In practice, this means that when you create an environment with conda, you can still use pip inside it.
- With conda environments we can manage different versions of Python, including installing Python itself. Virtualenvs, in contrast, must be created on top of an existing Python, and we cannot install a new Python inside the environment.
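Since pip works in both kinds of environment, it helps to know which kind a given interpreter belongs to. Below is a best-effort Python sketch with a hypothetical function name. It assumes that every conda environment has a `conda-meta` directory in its prefix, and that venv/virtualenv environments make `sys.prefix` differ from `sys.base_prefix` (older virtualenv releases set `sys.real_prefix` instead).

```python
import os
import sys


def environment_kind():
    """Best-effort guess at the kind of environment this interpreter belongs to."""
    # Assumption: every conda environment (root included) has a conda-meta/ directory.
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # venv and virtualenv >= 20 make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    if hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return "virtualenv"
    return "system"


print(environment_kind(), sys.prefix)
```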
I will go through both, in case some people still prefer Virtualenv (not me 🙂 )
a. Virtualenv
I will create a testProject1 environment by using python3.4:
- hrouhan@hrouhani.org:~/pythonENV$ virtualenv --python=/usr/bin/python3.4 testProject1
and then we activate it:
- hrouhan@hrouhani.org:~/pythonENV/testProject1$ . bin/activate
- (testProject1)hrouhan@hrouhani.org:~/pythonENV/testProject1$ python --version
Python 3.4.3
As an example, we can install numpy in this environment:
- (testProject1)hrouhan@hrouhani.org:~/pythonENV/testProject1$ pip install numpy
- (testProject1)hrouhan@hrouhani.org:~/pythonENV/testProject1$ pip list
numpy (1.14.0)
pip (1.5.4)
setuptools (2.2)
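Since Python 3.3, the standard library also ships `venv`, a built-in counterpart of virtualenv. This is a swapped-in tool, not the virtualenv package used above. A minimal sketch with a hypothetical helper `create_env`: unlike `virtualenv --python=...`, venv always builds the environment on the interpreter that runs it, so for a Python 3.4 environment you would run it with /usr/bin/python3.4 (for example `/usr/bin/python3.4 -m venv testProject1`). This is the same limitation discussed earlier: the environment sits on top of an existing Python.

```python
import os
import sys
import tempfile
import venv


def create_env(target_dir, with_pip=True):
    """Create an isolated environment in target_dir with the stdlib venv module.

    The environment is built on the interpreter running this code (sys.executable).
    Returns the path of the environment's own python executable.
    """
    venv.create(target_dir, with_pip=with_pip)
    if os.name == "nt":
        return os.path.join(target_dir, "Scripts", "python.exe")
    return os.path.join(target_dir, "bin", "python")


if __name__ == "__main__":
    # Throwaway demo environment; pip is skipped only to keep the demo fast.
    with tempfile.TemporaryDirectory() as tmp:
        env_python = create_env(os.path.join(tmp, "testProject1"), with_pip=False)
        print("created", env_python, "on top of", sys.executable)
```

For real projects keep `with_pip=True`, so that `pip install numpy` works inside the new environment after activating it with `. bin/activate`.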
b. Conda built-in environment manager
Let’s create a new environment called testProject1 with Python 3.6 using conda. Keep in mind that we need to include at least one package in the newly created environment, which in our case is python=3.6.
- hrouhan@hrouhani.org:~/CondaEnv$ conda create --name testProject1 python=3.6
Fetching package metadata ………..
Solving package specifications: .
Package plan for installation in environment /lhome/hrouhan/anaconda3/envs/testProject1:
The following NEW packages will be INSTALLED:
ca-certificates: 2017.08.26-h1d4fec5_0
certifi: 2017.11.5-py36hf29ccca_0
libedit: 3.1-heed3624_0
libffi: 3.2.1-hd88cf55_4
libgcc-ng: 7.2.0-h7cc24e2_2
libstdcxx-ng: 7.2.0-h7a57d05_2
ncurses: 6.0-h9df7e31_2
openssl: 1.0.2n-hb7f436b_0
pip: 9.0.1-py36h6c6f9ce_4
python: 3.6.4-hc3d631a_1
readline: 7.0-ha6073c6_4
setuptools: 38.4.0-py36_0
sqlite: 3.20.1-hb898158_2
tk: 8.6.7-hc745277_3
wheel: 0.30.0-py36hfd4bba0_1
xz: 5.2.3-h55aa19d_2
zlib: 1.2.11-ha838bed_2
Proceed ([y]/n)?
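Then we need to activate it (on conda 4.4 and later, the recommended command is “conda activate testProject1”):
- hrouhan@hrouhani.org:~/CondaEnv$ source activate testProject1
Now, if we check which python is used, we see that the environment's own Python is picked up: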
- (testProject1) hrouhan@hrouhani.org:~/CondaEnv$ which python
/lhome/hrouhan/anaconda3/envs/testProject1/bin/python
We can now install whatever we need in this environment with both conda and pip. Let’s install numpy with conda:
- (testProject1) hrouhan@hrouhani.org:~$ conda install numpy
Fetching package metadata ………..
Solving package specifications: ….
Then I install “matplotlib” with pip:
- (testProject1) hrouhan@hrouhani.org:~$ pip install matplotlib
Afterwards, you can check the list of all packages in this environment again with “conda list”; the packages installed with pip are listed there as well.
Now we can leave the environment by deactivating it (source deactivate). We can list all the environments we currently have with:
- hrouhan@hrouhani.org:~$ conda env list
# conda environments:
#
testProject /lhome/hrouhan/anaconda3/envs/testProject
testProject1 /lhome/hrouhan/anaconda3/envs/testProject1
root * /lhome/hrouhan/anaconda3
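The environment list can also be read programmatically. This sketch assumes that `conda info --json` prints a JSON object with `root_prefix` and `envs` keys; that holds for the conda versions I know of, but treat it as an assumption. The helper name is hypothetical, and it labels the root environment “root” as in the listing above, although newer conda releases call it “base”.

```python
import json
import os
import shutil
import subprocess


def environments_from_conda_info(json_text):
    """Map environment names to prefixes, given `conda info --json` output.

    Environments are named after their directory; two environments with the
    same directory name in different locations would collide here.
    """
    info = json.loads(json_text)
    root = os.path.normpath(info["root_prefix"])
    envs = {}
    for prefix in info.get("envs", []):
        # The root prefix gets the label "root" (newer conda calls it "base").
        name = "root" if os.path.normpath(prefix) == root else os.path.basename(os.path.normpath(prefix))
        envs[name] = prefix
    return envs


if __name__ == "__main__" and shutil.which("conda"):
    out = subprocess.run(["conda", "info", "--json"], capture_output=True, text=True, check=True).stdout
    for name, prefix in environments_from_conda_info(out).items():
        print(f"{name:15} {prefix}")
```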
In the end, I recommend using isolated environments, preferably with conda. They give you full control over the environment, and you can easily replicate the same environment, and run the same program, on another system.
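For replication, the usual commands are `conda env export > environment.yml` (recreated elsewhere with `conda env create -f environment.yml`) and `pip freeze > requirements.txt` (recreated with `pip install -r requirements.txt`). To illustrate what such a pin list contains, here is a small Python sketch of a freeze-style listing built on `importlib.metadata` (Python 3.8+; the helper name is hypothetical). It only sees Python distributions that carry metadata, so non-Python conda packages such as openssl are missing, which is why `conda env export` remains the complete record for a conda environment.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted `name==version` lines for the distributions this interpreter sees."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs whose metadata lacks a name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(pinned_requirements()))
```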