Pip, Conda & Anaconda

Setting up a proper environment with a specific Python version and the required packages is one of the main challenges that people working in Big Data face daily. Pip, Conda and Anaconda are widely used tools that, together with isolated environments, make life much easier.

In this document, I will first go through pip, conda and Anaconda and compare them in more detail. Afterwards I will focus on virtual environments, which give us full control over our environment. Among many other advantages, this is especially useful when we do not have root access on the system.

Introduction

Let’s understand two main concepts before focusing on pip, conda and anaconda:

Package Manager: A tool that automates the process of installing, updating and removing packages on a system. Pip and conda are two examples of package managers.

Software Distribution: A pre-built and pre-configured collection of packages that can be easily used on a system. Anaconda and Miniconda are two examples of software distributions that are especially useful for beginners.

 

Pip, Conda & Anaconda

I will go through each of the tools in more detail and try to present a reasonable comparison of these tools and their relation to each other.

Pip (https://github.com/pypa/pip):

Pip is a Python package manager that handles only Python packages, mainly from a third-party software repository called PyPI (Python Package Index). At the time of writing, around 113000 Python packages can be accessed through PyPI.

When we use pip to install a package, it is installed in the “site-packages” folder of a specific Python version (“dist-packages” for the system Python on Debian and Ubuntu). To target a specific Python version rather than the default one, we can use pip{$version-num} (for example pip3.6), which is usually installed together with the Python distribution of that version. Alternatively, we can use the following command:

  • python{$version} -m pip install numpy
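To check which interpreter a given "python" command really is, and where its packages end up, a minimal sketch using only the standard library could look like this. The function name describe_interpreter is my own and only for illustration; the reported package directory is the interpreter's default location and can differ between distributions.

```python
import sys
import sysconfig


def describe_interpreter():
    """Return the running interpreter's path, version and default package directory."""
    return {
        # Absolute path of the Python binary executing this script.
        "executable": sys.executable,
        # Major.minor version, the number used in names such as pip3.6.
        "version": "{}.{}".format(*sys.version_info[:2]),
        # Default directory for pure-Python packages of this interpreter.
        "packages_dir": sysconfig.get_paths()["purelib"],
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

Running this script with different interpreters (for example python3.4 and python3.6) should show a different package directory for each of them.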

 

Conda (https://conda.io/docs/):

Conda is a general-purpose package management system, designed to build and manage software of any type from any language. It is mostly used for installing scientific and analytical computing packages, which may be written in Python or other programming languages. It is also the default package manager of Anaconda. In that sense it is more like the “apt” package manager in Ubuntu or “yum” in CentOS.

Pip vs Conda

Pip has been designed only for Python packages and neglects any non-Python library dependencies. In contrast, conda is a general-purpose package manager that can install packages written in any language.

Because Conda introduces a new packaging format, you cannot use pip and Conda interchangeably. We can use the two tools side by side but they do not inter-operate.

Each of them has its own kind of virtual environment. As I will discuss later, “virtualenv” was created to be used with pip, and the conda environment to be used mainly with conda. However, pip can also be used inside a conda environment.

Let’s compare pip and conda from a command perspective:

search available packages:

  • conda search $pack_name
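  • pip search $pack_name (note that PyPI has since disabled the search API this command relies on, so it may fail against pypi.org)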

install a package:

  • conda install $pack_name
  • pip install $pack_name

list installed packages:

  • conda list (or conda list --name $environment_name)
  • pip list

update a package:

  • conda update $pack_name (or conda update --name $environment_name $pack_name)
  • pip install --upgrade $pack_name

updating package manager itself:

  • conda update conda
  • pip install --upgrade pip
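The command pairs above can also be written down as a small lookup table, which is handy when a script has to support both tools. This is only a sketch: the names COMMANDS and build_command are hypothetical, and the table covers just the install, list and update actions listed above.

```python
# Hypothetical lookup table: action -> tool -> command template.
COMMANDS = {
    "install": {
        "conda": ["conda", "install", "{pkg}"],
        "pip": ["pip", "install", "{pkg}"],
    },
    "list": {
        "conda": ["conda", "list"],
        "pip": ["pip", "list"],
    },
    "update": {
        "conda": ["conda", "update", "{pkg}"],
        "pip": ["pip", "install", "--upgrade", "{pkg}"],
    },
}


def build_command(tool, action, pkg=None):
    """Return the argument list for the given tool and action."""
    template = COMMANDS[action][tool]
    if "{pkg}" in template and pkg is None:
        raise ValueError(f"action '{action}' needs a package name")
    # Replace the placeholder with the real package name.
    return [pkg if part == "{pkg}" else part for part in template]


print(" ".join(build_command("conda", "update", "numpy")))
print(" ".join(build_command("pip", "update", "numpy")))
```

The returned list can be passed directly to subprocess.run if the command should actually be executed.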

 

Anaconda:

Anaconda is an open-source Python distribution provided by Continuum Analytics (who also created conda). It includes over 720 of the most popular Python packages for science, math, engineering and data analysis, such as numpy, scipy and the IPython notebook, which makes it very useful, especially for beginners.
An interesting aspect of Anaconda is that it comes with its own Python distribution, conda itself, a way to manage environments and lots of pre-installed packages. So installing Anaconda also installs conda.

Anaconda Installation:

1. Download the Anaconda Linux installer from https://www.anaconda.com/download/#linux

Copy the link, as I did here for the Python 3.6 version for 64-bit (x86):

2. Install

first make the script executable:

  • chmod +x Anaconda3-5.0.1-Linux-x86_64.sh

and then install it:

  • ./Anaconda3-5.0.1-Linux-x86_64.sh

I installed it in the default directory. During the installation it will ask whether you want to add Anaconda to the path; answering “yes” adds the following line to .bashrc:

  • export PATH="/lhome/hrouhan/anaconda3/bin:$PATH"

So from now on the default Python will be the one in the above directory, which in my case is Python 3.6:

  • hrouhan@hrouhani.org:~$ python
    Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
    [GCC 7.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
  • hrouhan@hrouhani.org:~$ which python3.6
    /lhome/hrouhan/anaconda3/bin/python3.6
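The reason the Anaconda Python becomes the default is simply the order of the directories in PATH: the first directory that contains a "python" executable wins. The following sketch illustrates this with two temporary directories that stand in for the Anaconda and system bin directories. The helper names are my own, and it assumes a POSIX system such as Linux.

```python
import os
import shutil
import tempfile


def make_fake_python(directory):
    """Create a small executable file named 'python' in the given directory."""
    path = os.path.join(directory, "python")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path


def first_python_on_path(directories):
    """Return the 'python' that a PATH made of these directories resolves to."""
    search_path = os.pathsep.join(directories)
    return shutil.which("python", path=search_path)


with tempfile.TemporaryDirectory() as anaconda_bin, \
        tempfile.TemporaryDirectory() as system_bin:
    make_fake_python(anaconda_bin)
    make_fake_python(system_bin)
    # The Anaconda directory comes first, as in the export line added to .bashrc.
    resolved = first_python_on_path([anaconda_bin, system_bin])
    print("resolved from the first directory:", resolved.startswith(anaconda_bin))
```

Swapping the two directories in the list makes the lookup return the other file, which is what happens if the export line is removed from .bashrc.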

As a result of installing Anaconda, conda, the related Python version and over 720 open-source packages are installed as well. This is good news, since we get “conda” at the same time.

Note that the system Python that was installed before Anaconda is still located at the following path:

  • /usr/bin/python

3. Quick test:

After the installation, we have a default environment that can be used with the packages that Anaconda installs by default. Let’s test it:

  • hrouhan@hrouhani.org:~$ python
    Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
    [GCC 7.2.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import numpy
    >>> import matplotlib
    >>> exit()
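The same quick test can be done without actually importing the packages, which is useful for checking a longer list. A minimal sketch with the standard library follows; the function name missing_packages is only illustrative.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# In a fresh Anaconda installation this list is expected to be empty.
print("missing:", missing_packages(["numpy", "matplotlib"]))
```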

We can see the default conda environment here (root *):

  • hrouhan@hrouhani.org:~$ conda env list
    # conda environments:
    root * /lhome/hrouhan/anaconda3

All packages installed by default are in /lhome/hrouhan/anaconda3/pkgs/, and they can also be seen with the “conda list” command.

How to use Anaconda

As I mentioned earlier, after installing Anaconda we have a default environment (root * /lhome/hrouhan/anaconda3) that includes a number of pre-installed packages. We can check the list of packages installed in the default Anaconda environment with:

  • hrouhan@hrouhani.org:~$ conda list
    # packages in environment at /lhome/hrouhan/anaconda3:

If we need a package that is not included in the default Anaconda installation, we can install it from the Continuum Analytics repository with the following command (however, I recommend installing it in a virtual environment):

  • conda install <pkg name>

If the package that we need does not exist in the Anaconda repository, we can install it using “pip” or simply install it from source using “setup.py”.

In order not to mess with the default environment, I recommend creating a separate environment. This is especially useful when we are working on different projects that each need different packages or even a different version of Python.

 

Managing environment

There are two ways of creating a virtual environment (isolated environment) within the current path:

  • Virtualenv
  • Conda built-in environment manager

I am more of a fan of the Conda built-in environment manager, for the following reasons:

  • Conda combines the functionality of pip & virtualenv in a single package. This basically means that when you create a virtual environment with Conda, you can still use pip.
  • With a conda environment we can manage different versions of Python, including installing Python itself. A virtualenv, in contrast, must be created on top of an existing Python, and we cannot install a new Python inside the environment.

I will go through both, in case some people still prefer Virtualenv (not me 🙂 )

a. Virtualenv

I will create a testProject1 environment by using python3.4:

  • hrouhan@hrouhani.org:~/pythonENV$ virtualenv --python=/usr/bin/python3.4 testProject1

and then we activate it:

  • hrouhan@hrouhani.org:~/pythonENV/testProject1$ . bin/activate
  • (testProject1)hrouhan@hrouhani.org:~/pythonENV/testProject1$ python --version
    Python 3.4.3

As an example, we can install numpy in this environment:

  • (testProject1)hrouhan@hrouhani.org:~/pythonENV/testProject1$ pip install numpy
  • (testProject1)hrouhan@hrouhani.org:~/pythonENV/testProject1$ pip list
    numpy (1.14.0)
    pip (1.5.4)
    setuptools (2.2)
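For completeness, an isolated environment can also be created from Python itself. Note that the sketch below uses the standard library "venv" module, which is a different tool from the virtualenv command shown above, although it produces a similar directory layout. The function name create_environment and the use of a temporary directory are assumptions for the example.

```python
import os
import tempfile
import venv


def create_environment(target_dir):
    """Create a virtual environment at target_dir and return its config file path."""
    # with_pip=False keeps the sketch minimal; set it to True to bootstrap pip.
    builder = venv.EnvBuilder(with_pip=False)
    builder.create(target_dir)
    return os.path.join(target_dir, "pyvenv.cfg")


with tempfile.TemporaryDirectory() as base_dir:
    config_file = create_environment(os.path.join(base_dir, "testProject1"))
    print("environment created:", os.path.isfile(config_file))
```

Like virtualenv, this builds the environment on top of the Python that runs the script, so it cannot install a new Python version either.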

 

b. Conda built-in environment manager

Let’s create a new environment called testProject1 with Python 3.6 by using conda. Keep in mind that we need to include at least one package in the newly created environment, which in our case is python=3.6.

  • hrouhan@hrouhani.org:~/CondaEnv$ conda create --name testProject1 python=3.6
    Fetching package metadata ………..
    Solving package specifications: .
    Package plan for installation in environment /lhome/hrouhan/anaconda3/envs/testProject1:
    The following NEW packages will be INSTALLED:
    ca-certificates: 2017.08.26-h1d4fec5_0
    certifi: 2017.11.5-py36hf29ccca_0
    libedit: 3.1-heed3624_0
    libffi: 3.2.1-hd88cf55_4
    libgcc-ng: 7.2.0-h7cc24e2_2
    libstdcxx-ng: 7.2.0-h7a57d05_2
    ncurses: 6.0-h9df7e31_2
    openssl: 1.0.2n-hb7f436b_0
    pip: 9.0.1-py36h6c6f9ce_4
    python: 3.6.4-hc3d631a_1
    readline: 7.0-ha6073c6_4
    setuptools: 38.4.0-py36_0
    sqlite: 3.20.1-hb898158_2
    tk: 8.6.7-hc745277_3
    wheel: 0.30.0-py36hfd4bba0_1
    xz: 5.2.3-h55aa19d_2
    zlib: 1.2.11-ha838bed_2
    Proceed ([y]/n)?

and then we need to activate it:

  • hrouhan@hrouhani.org:~/CondaEnv$ source activate testProject1

and now if we look at python, we see that the new Python is used:

  • (testProject1) hrouhan@hrouhani.org:~/CondaEnv$ which python
    /lhome/hrouhan/anaconda3/envs/testProject1/bin/python
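From inside Python we can also check which kind of environment is active. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, and on the difference between sys.prefix and sys.base_prefix that the venv module and recent virtualenv versions produce. Very old virtualenv versions behave differently, so treat this as an approximation; the function name active_environment is hypothetical.

```python
import os
import sys


def active_environment(environ=None, prefix=None, base_prefix=None):
    """Return a (kind, identifier) tuple describing the active environment."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix

    # conda exports the name of the activated environment in this variable.
    conda_env = environ.get("CONDA_DEFAULT_ENV")
    if conda_env:
        return ("conda", conda_env)
    # In a venv-style environment the prefix differs from the base installation.
    if prefix != base_prefix:
        return ("virtualenv", prefix)
    return ("none", prefix)


print(active_environment())
```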

we can now install whatever we need in this environment with both conda and pip. Let’s install numpy here:

  • (testProject1) hrouhan@hrouhani.org:~$ conda install numpy
    Fetching package metadata ………..
    Solving package specifications: ….

and then I will install “matplotlib” with pip:

  • (testProject1) hrouhan@hrouhani.org:~$ pip install matplotlib

Afterwards you can check the list of all packages that exist in this environment again with “conda list”.

Now we can exit by deactivating the environment (source deactivate). We can list all the environments that we currently have with:

  • hrouhan@hrouhani.org:~$ conda env list
    # conda environments:
    #
    testProject /lhome/hrouhan/anaconda3/envs/testProject
    testProject1 /lhome/hrouhan/anaconda3/envs/testProject1
    root * /lhome/hrouhan/anaconda3
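If a script needs this information, the listing above is easy to parse. Here is a small sketch that turns the text into a dictionary and reports the active environment (the one marked with *). It assumes the simple "name [*] path" layout shown above, with no spaces in the paths; parse_env_list is a name I made up.

```python
def parse_env_list(text):
    """Parse 'conda env list' output into ({name: path}, active_name)."""
    environments = {}
    active = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the comment header.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        name, path = parts[0], parts[-1]
        environments[name] = path
        # The active environment is marked with an asterisk.
        if "*" in parts:
            active = name
    return environments, active


SAMPLE_OUTPUT = """# conda environments:
#
testProject /lhome/hrouhan/anaconda3/envs/testProject
testProject1 /lhome/hrouhan/anaconda3/envs/testProject1
root * /lhome/hrouhan/anaconda3
"""

environments, active = parse_env_list(SAMPLE_OUTPUT)
print(active, environments[active])
```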

 

In the end, I recommend using an isolated environment, preferably with Conda. In this case you have full control over the environment, and you can easily replicate the same environment and program on another system.
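One simple way to see what would have to be replicated is to list the exact versions of the packages installed in the current environment. The sketch below does this with the standard library and assumes Python 3.8 or later, where importlib.metadata is available. The function name pinned_requirements is hypothetical, and the output only covers Python packages, not the non-Python libraries that conda also manages.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' strings for the installed Python packages."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip entries whose metadata does not declare a name.
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for requirement in pinned_requirements():
    print(requirement)
```

The resulting lines use the same "name==version" form that pip accepts in a requirements file.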

 

 

 

 

 
