Main Proxy Concepts + HTTP Protocol

Proxy Server

There is some confusion regarding proxy concepts. As I mentioned in the Checkpoint firewall post (https://hrouhani.org/checkpoint-firewall-crach-course/), there are in total 3 categories of firewall:

  • Packet Filters
  • Application-layer Gateways (Proxy-based firewalls or simply Proxy Server)
  • Stateful Inspection

The idea behind proxy-based firewalls, or simply proxy servers, is that the client and the server do not communicate directly with each other. Instead, everything goes through a firewall in between, which is equipped with a daemon that emulates the server towards the client and the client towards the server for the specified services. To put it in very simple terms, a proxy server acts as an intermediary machine between the local area network and an external network, which can be the Internet but not necessarily.

A proxy server is usually assigned to specific kinds of network traffic (application-layer services). As can be seen in the following figure, for each service the proxy server has been configured for, there should be an agent for incoming and outgoing traffic.

 

proxy1

In production environments, the service a proxy server is most often configured for is web traffic. Such a proxy server is therefore called a web proxy server. There are devices on the market that are purely web proxy devices (like IronPort), and others that do web proxying in parallel with other firewall tasks. There are also open-source solutions for web proxying, such as the Squid proxy server.

There are many advantages to using a web proxy server in an organization, such as the following:

  • Increasing security: we can properly control who can access web content by implementing a proper authentication mechanism. We can also block websites that are known to be malicious or to contain phishing links. In principle, since the web proxy mediates all traffic, we can inspect all of it if needed.
  • Caching: a web proxy can also cache data and serve clients from its cache. This is useful for reducing latency, reducing traffic and prioritizing traffic.

It is important to keep in mind that big companies can have several layers of web proxy servers, such as location-based proxies and central proxy servers. We can define routing rules that determine whether to contact the destination web address directly or through another proxy server, such as a central proxy server. I won’t go through the details of how authentication happens in a web proxy, but will only mention the available methods:

  1. Basic user authentication: for example, authenticating against a local user database or an LDAP directory
  2. NTLM authentication
  3. Kerberos authentication
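
To give an idea of what the first method looks like on the wire, here is a minimal sketch of the header a client sends to a proxy for Basic authentication. The user name and password are made up for illustration. A proxy that requires authentication normally answers with status 407 first, and the client then repeats the request with this header.

```python
import base64

def basic_proxy_auth_header(username: str, password: str) -> tuple[str, str]:
    """Return the (name, value) pair a client sends to a proxy for Basic auth."""
    # Basic credentials are "user:password", base64-encoded (encoded, not encrypted).
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Proxy-Authorization", f"Basic {token}"

# Hypothetical credentials, for illustration only.
name, value = basic_proxy_auth_header("alice", "s3cret")
print(f"{name}: {value}")
```

Since the credentials are only encoded, Basic authentication is normally combined with an encrypted connection to the proxy.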

There are basically two main types of Web proxy server which are:

  1. Forward proxy:

A proxy server is known as a forward proxy server when it acts as a proxy to the devices that are connecting to it. When people talk about a web proxy server, more often they are referring to a forward web proxy server.

proxy2

If we configure the web browsers on our computers to use a proxy server that can access the Internet, this is known as a forward proxy. As can be seen in the above figure, the proxy can serve as a single point of access and control, making it easier to enforce security policies. A forward proxy is typically used in connection with a firewall to enhance an internal network’s security by controlling traffic that originates from clients in the internal network and is directed at hosts on the Internet. Thus, from a security standpoint, a forward proxy is primarily aimed at enforcing security on client computers in your internal network.

So here we need client-side configuration to specify the web proxy server’s IP address and the port to be used.

The following figure shows the usual design of web proxy server usage in big companies. It is possible to combine the roles of firewall and proxy server in one device, but it is not recommended. Usually a dedicated web proxy server such as IronPort is used in parallel with a firewall such as Checkpoint or Fortinet.

proxy3
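
The client-side configuration mentioned above is not limited to browsers. As a rough sketch, this is how a Python program could be pointed at a forward proxy with the standard library. The proxy address is a made-up placeholder, and the request itself is left commented out because it needs network access.

```python
import urllib.request

# Hypothetical proxy address; replace it with the real host and port.
PROXY_URL = "http://proxy.example.internal:3128"

def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)

opener = build_proxy_opener(PROXY_URL)
# opener.open("http://example.org/") would now go through the proxy
# (not executed here because it needs network access).
```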

We also have to keep in mind that we need a failover mechanism for the web proxy. There are several solutions for this purpose, like using a built-in vendor solution, as in IronPort, where we can configure multiple proxy servers in active and backup mode. However, big enterprises usually use a load balancer. The following figure shows the design of a load balancer in front of several proxy servers, which are usually all active:

Capture

A forward web proxy server can itself be of different types, based on the way it serves the clients, and it can combine all or some of them. The more common ones are:

  • HTTP proxy
  • HTTPS proxy (SSL proxy)
  • FTP proxy
  • SOCKS proxy

A SOCKS server is a general purpose proxy server that establishes a TCP connection to another server on behalf of a client, then routes all the traffic back and forth between the client and the server. It works for any kind of network protocol on any port. SOCKS Version 5 adds additional support for security and UDP. The SOCKS server does not interpret the network traffic between client and server in any way, and is often used because clients are behind a firewall and are not permitted to establish TCP connections to servers outside the firewall unless they do it through the SOCKS server. Most web browsers for example can be configured to talk to a web server via a SOCKS server. Because the client must first make a connection to the SOCKS server and tell it the host it wants to connect to, the client must be “SOCKS enabled.”

An HTTP proxy is similar, and may be used for the same purpose when clients are behind a firewall and are prevented from making outgoing TCP connections to servers outside the firewall. However, unlike the SOCKS server, an HTTP proxy does understand and interpret the network traffic that passes between the client and downstream server, namely the HTTP protocol. Because of this the HTTP proxy can ONLY be used to handle HTTP traffic, but it can be very smart about how it does it. In particular, it can recognize often repeated requests and cache the replies to improve performance. Many ISPs use HTTP proxies regardless of how the browser is configured because they simply route all traffic on port 80 through the proxy server. (http://www.jguru.com/faq/view.jsp?EID=227532)
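
To illustrate the point that a SOCKS client has to tell the SOCKS server which host it wants to reach, here is a small sketch of the first bytes a SOCKS Version 5 client sends, following the message layout of RFC 1928. It only builds the bytes and does not open a connection. The host name and port are examples.

```python
import struct

def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, 'no authentication' (0x00)."""
    return bytes([0x05, 0x01, 0x00])

def socks5_connect_request(host: str, port: int) -> bytes:
    """CONNECT request that names the destination by domain name (ATYP 0x03)."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("host name too long for a SOCKS5 request")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3, length-prefixed name, 2-byte port
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    return header + name + struct.pack("!H", port)

print(socks5_greeting().hex())
print(socks5_connect_request("example.org", 80).hex())
```

After the server accepts the request, it simply relays bytes in both directions, which is why any protocol can run on top of it.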

 

2. Reverse proxy:

As its name implies, it does the exact opposite of what a forward proxy does:

  • A reverse proxy proxies on behalf of servers, in contrast to a forward proxy, which proxies on behalf of clients.
  • A reverse proxy hides the identities of servers, in contrast to a forward proxy, which hides the identities of clients.
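
As a rough illustration of how a reverse proxy stands in for several servers, the sketch below maps public URL paths to internal backend servers. The routing table, the addresses and the function name are all hypothetical. A real reverse proxy would also forward the request and relay the response.

```python
# Hypothetical routing table: public path prefix -> internal backend server.
BACKENDS = {
    "/api/": "http://10.0.0.11:8080",
    "/static/": "http://10.0.0.12:8080",
}
DEFAULT_BACKEND = "http://10.0.0.10:8080"

def pick_backend(path: str) -> str:
    """Return the internal server that should answer a request for this path."""
    for prefix, backend in BACKENDS.items():
        if path.startswith(prefix):
            return backend
    return DEFAULT_BACKEND

print(pick_backend("/api/users"))
print(pick_backend("/index.html"))
```

The client only ever sees the address of the reverse proxy, never the backend addresses in the table.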

 

proxy4

The main difference from a forward proxy server is that a reverse proxy requests resources from the servers behind it on behalf of the client. As a result, the client is not aware of the proxy server, as it thinks that all the resources it needs reside on one and the same server. Therefore, the client does not need any client-side configuration, and all traffic is handled transparently. From now on, when I say web proxy server I mean a forward web proxy server, which is the normal proxy server, unless I specifically mention that I mean a reverse web proxy server.

As I mentioned earlier, in order for clients to be able to use the web proxy server and access resources on the Internet or on the intranet, they need to know the IP address and the port of the web proxy server. Keep in mind that any application that needs to use the Internet needs the proxy settings. It is often possible to configure system-wide proxy settings, as in Windows. Here, let’s consider only how to configure proxy settings in web browsers.

Methods of configuring how the browser finds (uses) proxy servers:

  • a. Manual: this can easily be configured from the browser. As an example, the following figure shows how it can be done in Firefox:

proxy5

As can be seen, there is an option to enter the websites that do not need a proxy to be accessed. There is also a field, “Automatic proxy configuration URL”, for configuring a specific URL, which usually refers to a PAC file, as I will explain later.

  • b. Automatic proxy configuration: there are several concepts and solutions for automatic proxy configuration. One of the best known is the PAC (proxy auto-config) file.

PAC (proxy auto-config) file

It is simply a proxy auto-configuration file containing JavaScript that is used mainly by browsers to determine how requests are handled. Basically, it helps web browsers or other user agents find the appropriate proxy settings for each URL that the user wants to visit. This gives us more flexibility than the manual solution, where we could configure only one proxy server and port for all requests. With a PAC file, we can have many different proxy settings for different situations. The following is a good example of a PAC file, which is usually called proxy.pac:

proxy.pac
function FindProxyForURL(url, host)
{
    // If the IP address of the local machine is within a defined subnet, send to a specific proxy.
    if (isInNet(myIpAddress(), "10.10.10.0", "255.255.255.0"))
        return "PROXY hrouhani:3128";

    // If the hostname matches
    if (dnsDomainIs(host, "intranet.domain.com") ||
        shExpMatch(host, "(*.abcdomain.com|abcdomain.com)"))
        return "PROXY hrouhani:3128";

    // If the protocol or URL matches
    if (url.substring(0, 4) == "ftp:" ||
        shExpMatch(url, "http://abcdomain.com/folder/*"))
        return "PROXY y.hrouhani:3128";

    // If the requested website is hosted within the internal network, send to a specific proxy.
    if (isPlainHostName(host) ||
        shExpMatch(host, "*.local") ||
        isInNet(dnsResolve(host), "10.0.0.0", "255.0.0.0") ||
        isInNet(dnsResolve(host), "172.16.0.0", "255.240.0.0") ||
        isInNet(dnsResolve(host), "192.168.0.0", "255.255.0.0") ||
        isInNet(dnsResolve(host), "127.0.0.0", "255.255.255.0"))
        return "PROXY x.hrouhani:3128";

    // If the IP address of the local machine is within a defined subnet, send to a specific proxy.
    if (isInNet(myIpAddress(), "10.10.5.0", "255.255.255.0"))
        return "PROXY y.hrouhani:3128";

    // If none of the conditions above matches, fall through to the default proxy.
    return "PROXY Main.hrouhani:3128";
}
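
To make the helper functions used in the PAC file less mysterious, here is a rough Python imitation of three of them together with a simplified decision function. It is only a sketch for reasoning about the rules outside a browser. The Python function names are my own, and the real PAC helpers may differ in edge cases.

```python
import fnmatch
import ipaddress

def is_in_net(ip: str, network: str, mask: str) -> bool:
    """Rough equivalent of the PAC helper isInNet(ip, network, mask)."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(f"{network}/{mask}")

def dns_domain_is(host: str, domain: str) -> bool:
    """Rough equivalent of dnsDomainIs(host, domain): a suffix comparison."""
    return host.endswith(domain)

def sh_exp_match(text: str, pattern: str) -> bool:
    """Rough equivalent of shExpMatch(text, pattern): shell-style wildcards."""
    return fnmatch.fnmatchcase(text, pattern)

def find_proxy(host: str, client_ip: str) -> str:
    """Simplified decision function in the spirit of FindProxyForURL."""
    if is_in_net(client_ip, "10.10.10.0", "255.255.255.0"):
        return "PROXY hrouhani:3128"
    if dns_domain_is(host, "intranet.domain.com") or sh_exp_match(host, "*.abcdomain.com"):
        return "PROXY hrouhani:3128"
    # None of the rules matched: use the default proxy.
    return "PROXY Main.hrouhani:3128"

print(find_proxy("www.example.org", "10.10.10.7"))
print(find_proxy("www.example.org", "192.168.1.5"))
```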

Now the main question is how to configure browsers so that they can find the PAC file. Each browser has an option for entering the URL that refers to the PAC file. We usually write the PAC rules in a file, host it on a web server and use the corresponding URL for automatic configuration. We have already seen this in Firefox, under “Automatic proxy configuration URL” in the connection settings. Internet Explorer has a similar option, as can be seen in the following figure:

proxy6

As you might know, it is possible to configure the proxy settings system-wide in Windows, which as a result affects all other web browsers. You can configure it in:

Control Panel -> Internet Options -> Connections -> LAN settings -> Use automatic config script

In Windows, we can also apply the above configuration system-wide via Group Policy.

In Linux we can also configure the proxy settings in different ways. As an example, in Firefox, besides entering the PAC file address manually in “Automatic proxy configuration URL” as we have seen previously, we can do it via configuration files and make it automatic (meaning that after the following configuration, the “Automatic proxy configuration URL” field in the Firefox settings is filled in automatically for us):

a. /usr/lib/firefox/defaults/pref/autoconfig.js (create new)
// The first line must be a comment!
pref("general.config.filename", "hrouhani.cfg");
pref("general.config.obscure_value", 0);

b. /usr/lib/firefox/hrouhani.cfg
// The first line must be a comment!
pref("network.proxy.type", 2);
pref("network.proxy.autoconfig_url", "http://wpad.hrouhani.org/wpad.dat");

One of the most important proxy settings in Linux is the one for the terminal. We can do it easily by exporting the proxy settings as environment variables, typically http_proxy, https_proxy and no_proxy.

We can add these export lines to .bashrc to make it automatic. There are many other options here, like creating a proxy.sh file in /etc/profile.d/ and adding the proxy configuration to it, which loads the proxy settings for all users, or even adding it to /etc/profile. Of course, it can also be done for each user separately in ~/.profile.
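
Many command-line tools and libraries pick up these environment variables on their own. As a small sketch, this is how Python’s standard library sees them. The proxy address is a made-up placeholder, and the variables are set inside the script only for the demonstration, since normally the shell exports them before the program starts.

```python
import os
import urllib.request

def proxies_from_environment() -> dict:
    """Return the scheme -> proxy URL mapping that urllib would use."""
    return urllib.request.getproxies()

# Hypothetical proxy address, set inside the process only for this demonstration.
os.environ["http_proxy"] = "http://proxy.example.internal:3128"
os.environ["https_proxy"] = "http://proxy.example.internal:3128"

proxies = proxies_from_environment()
print(proxies.get("http"))
print(proxies.get("https"))
```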

PAC has two advantages over manual configuration:

  • PAC files are centrally administered and easy to update, which reduces management effort.
  • PAC supports load balancing and failover, which is not possible through manual configuration in the browser’s proxy settings. Example:

function FindProxyForURL(url, host) { return "PROXY 192.168.0.222:9090; PROXY 192.168.0.223:9090"; }
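
The failover behaviour comes from the fact that the returned string is an ordered list: the client tries the first entry and moves on to the next one if it is not reachable. The sketch below imitates that logic. The function names and the `is_alive` callback are my own assumptions, and a real browser decides reachability by actually trying to connect.

```python
def parse_pac_result(result: str) -> list[tuple[str, str]]:
    """Split a FindProxyForURL return value into ordered (kind, address) pairs."""
    entries = []
    for part in result.split(";"):
        fields = part.split()
        if not fields:
            continue  # skip empty pieces, e.g. after a trailing ';'
        kind = fields[0].upper()
        address = fields[1] if len(fields) > 1 else ""  # DIRECT has no address
        entries.append((kind, address))
    return entries

def first_usable(result: str, is_alive):
    """Return the first entry that is DIRECT or reported alive, else None."""
    for kind, address in parse_pac_result(result):
        if kind == "DIRECT" or is_alive(address):
            return kind, address
    return None

pac_result = "PROXY 192.168.0.222:9090; PROXY 192.168.0.223:9090"
# Pretend that the first proxy is down and only the second one answers.
print(first_usable(pac_result, lambda address: address.endswith(".223:9090")))
```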

As part of automatic configuration, the way we help browsers find the PAC file plays an important role. I have already mentioned some ways of doing that, like using Group Policy in Windows or changing the browser configuration as we did for Firefox. However, there is another well-known solution, called WPAD (Web Proxy Auto-Discovery Protocol).

 

Web Proxy Auto-Discovery Protocol (WPAD)

It is a method used by clients to automatically locate the URL of the PAC file using DHCP or DNS discovery. Minimal configuration is needed on the web browser side, usually just enabling an option that is typically called ‘Auto-Detect’. If the browser supports both the DHCP and DNS methods, it checks the DHCP assignment first, before attempting the DNS method. I will explain both methods briefly here:

  1. DHCP: here the DHCP server needs to be configured to serve an additional setting with the IP address assignment, the ‘site-local’ option 252 (“auto-proxy-config”). The file does not need any special name. However, if the DNS method is also used in parallel for PAC file discovery, the filename should be wpad.dat.

A Web browser implementing this method sends the DHCP server a DHCPINFORM query, the DHCP server will return the expected IP settings along with the 252 option which defines the location of the PAC file. The browser will then download this PAC file from the URL provided. (cisco.com)

Let’s see an example of configuring a DHCP server on Linux for option 252:
/etc/dhcp/dhcpd.conf
# do windows-style proxy autoconfig:
option local-proxy-config code 252 = text;

subnet 192.168.0.0 netmask 255.255.255.0 {
    range 192.168.0.20 192.168.0.30;

    option local-proxy-config "http://www.hrouhani.org/proxy.pac";
}
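
For the curious, this is roughly what the option looks like inside the DHCP reply: DHCP options are encoded as one byte for the option code, one byte for the length and then the data. The sketch below only builds those bytes for option 252 and is not a DHCP server. The function name is my own.

```python
def encode_dhcp_option(code: int, value: bytes) -> bytes:
    """Encode one DHCP option as code (1 byte), length (1 byte), then the data."""
    if not 0 < code < 255:
        raise ValueError("option code must be between 1 and 254")
    if len(value) > 255:
        raise ValueError("option data longer than 255 bytes")
    return bytes([code, len(value)]) + value

# URL taken from the dhcpd.conf example above.
PAC_URL = "http://www.hrouhani.org/proxy.pac"
option_252 = encode_dhcp_option(252, PAC_URL.encode("ascii"))
print(option_252[0], option_252[1], option_252[2:].decode("ascii"))
```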

2. DNS: here the DNS server also needs to be configured with a record for wpad.<domainName>. The interesting part of the DNS way of locating the PAC file is that the client web browser has to keep guessing the location of the PAC file. On Windows, this is based on the domain the machine is joined to, while on Linux and Mac OS X it is based on the search domain(s) configured in the network settings.

When attempting the WPAD DNS method, the browser will prefix the domain with wpad and attempt to download the file wpad.dat. If, for example, the network name of the user’s computer is pc.department.branch.hrouhani.org, the browser will try the following URLs in turn until it finds a proxy configuration file within the domain of the client:
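
The search order can be sketched as follows. The function walks up the DNS hierarchy one label at a time and builds a wpad.dat URL at each level. The `min_labels` parameter is my own assumption to stop before the top-level domain. Real clients differ in where exactly they stop.

```python
def wpad_candidates(fqdn: str, min_labels: int = 2) -> list[str]:
    """List the wpad.dat URLs a client might try for a given machine name."""
    labels = fqdn.strip(".").split(".")
    domain = labels[1:]  # drop the host label, keep the domain it lives in
    urls = []
    while len(domain) >= min_labels:
        urls.append("http://wpad." + ".".join(domain) + "/wpad.dat")
        domain = domain[1:]  # walk one level up the DNS hierarchy
    return urls

for url in wpad_candidates("pc.department.branch.hrouhani.org"):
    print(url)
```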

 

Finally, I would like to mention another way of categorizing proxy servers, which relates to how the web proxy is deployed in our environment to filter web traffic:

  • Transparent deployment
  • Explicit deployment

In a transparent proxy deployment, the user’s browser (or any other client software) is unaware that it is communicating with a proxy. The client requests content on the Internet as usual, without any clue that a proxy server configured in transparent mode intercepts all the traffic.

In an explicit proxy deployment, the client (e.g. browser, desktop application etc.) is explicitly configured to use a proxy server, meaning the client knows that all requests will go through a proxy.

As we have seen, a proper understanding of the HTTP protocol is necessary to grasp the concept of a proxy. Here I will briefly go through the HTTP protocol.

 

HTTP

HTTP is an application-layer protocol that typically uses TCP as its transport-layer protocol. In very simple terms, it is an asymmetric request-response client-server protocol in which the client sends a request message (pulls information) and the server returns a response message. TCP port 80 is the default port for HTTP, but we can run an HTTP server on another, user-assigned port number (1024-65535), such as 8080. If the user does not specify the port number in the URL, the browser connects to port 80 by default.

 

Capture1
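
The default-port rule is easy to express in code. The following sketch returns the port a client would connect to for a given URL. The function name is my own, and the table also lists 443 for https, which is the well-known default for that scheme.

```python
from urllib.parse import urlsplit

# Well-known default ports per URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> int:
    """Return the explicit port of the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

print(effective_port("http://hrouhani.org/"))
print(effective_port("http://hrouhani.org:8080/"))
```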

What is very important is that the HTTP protocol itself is stateless, which means that the server forgets everything related to the client/browser state between requests. At the same time it uses TCP, which is connection-oriented. These two properties have nothing to do with each other, since they belong to different layers: 7 (application) and 4 (transport). The statelessness of HTTP is very useful, since web servers do not need to maintain the state of potentially millions of clients that come and go randomly.

However, applications usually need to keep state. In other words, they are forced to behave statefully. This can be accomplished if the server sends the state to the client and the client sends it back to the server with every request. As an example, the contents of a shopping cart need to be kept while the user is shopping. For such situations, solutions such as cookies have been added at the application level, which I go through here.

There are two ways this can be accomplished in HTTP:

  • Cookies, the main solution designed for websites to remember stateful information.
  • URL extension, in which case the server embeds the state in the URLs it returns in the response, so that the state comes back as part of the URL of the next request.
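
The cookie round trip can be sketched with Python’s standard library: the server puts the state into a Set-Cookie response header, and the client sends it back in a Cookie request header. The cookie name and value are made up for illustration.

```python
from http.cookies import SimpleCookie

def server_set_cookie(name: str, value: str) -> str:
    """Build the Set-Cookie header line a server adds to its response."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()  # e.g. "Set-Cookie: name=value"

def client_cookie_header(set_cookie_line: str) -> str:
    """Turn a Set-Cookie line into the Cookie header the client sends back."""
    header_value = set_cookie_line.split(":", 1)[1].strip()
    jar = SimpleCookie()
    jar.load(header_value)
    pairs = "; ".join(f"{key}={morsel.value}" for key, morsel in jar.items())
    return "Cookie: " + pairs

# Hypothetical shopping-cart state.
response_header = server_set_cookie("cart", "book42")
print(response_header)
print(client_cookie_header(response_header))
```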

Now let’s have a look at what HTTP messages look like. In general there are two types of HTTP message: request and response messages. There are small differences between the two, which I will go through here:

Request message: it consists of the following 3 sections, as can be seen in the figure below:

  • Request line: it consists of 3 fields:
  1. Method: GET, POST, HEAD, PUT, DELETE, etc.
  2. Request-URI
  3. HTTP version
  • Headers
  • Message body (optional)
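
The three sections can be taken apart with a few lines of code. The parser below is a minimal sketch that assumes a well-formed message and is not a complete HTTP parser. The function name and the returned dictionary layout are my own.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP request into request line fields, headers and body."""
    # An empty line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    method, request_uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "request_uri": request_uri,
        "version": version,
        "headers": headers,
        "body": body,
    }

raw = b"GET /main-proxy-concepts/ HTTP/1.1\r\nHost: hrouhani.org\r\n\r\n"
print(parse_request(raw))
```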

 

request1

To add some extra information about the Request-URI (Uniform Resource Identifier), it is worth mentioning that it can take 4 forms, depending on the nature of the request:

  • Asterisk “*”: means that the request does not apply to a particular resource but to the server itself.
  • absoluteURI: used when the request is being made to a proxy server. In this case, the proxy server must be able to extract all needed info from this absoluteURI and not from headers. Let’s have an example:

GET https://hrouhani.org/main-proxy-concepts/  HTTP/1.1

  • absolutePath (abs_path): used when the request is being made to an origin server. In this case, the network location of the web server must be transmitted in a Host header field and not in the Request-URI. Let’s take the same example:

GET /main-proxy-concepts/ HTTP/1.1
Host: hrouhani.org

  • Authority: used only by the CONNECT method, a method name reserved for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunneling).
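
A proxy has to tell these four forms apart in order to know where a request should go. The following sketch classifies the Request-URI and extracts the destination. It is a simplification with function names of my own, and it assumes header names have already been lower-cased.

```python
from urllib.parse import urlsplit

def classify_request_uri(method: str, request_uri: str) -> str:
    """Name the form of the Request-URI: asterisk, authority, abs_path or absoluteURI."""
    if request_uri == "*":
        return "asterisk"
    if method == "CONNECT":
        return "authority"
    if request_uri.startswith("/"):
        return "abs_path"
    if "://" in request_uri:
        return "absoluteURI"
    raise ValueError(f"unrecognized Request-URI: {request_uri!r}")

def destination_host(method: str, request_uri: str, headers: dict) -> str:
    """Work out which server the request is meant for."""
    form = classify_request_uri(method, request_uri)
    if form == "absoluteURI":
        return urlsplit(request_uri).netloc  # taken from the URI itself
    if form == "authority":
        return request_uri  # host:port given directly, e.g. for a tunnel
    return headers.get("host", "")  # abs_path and asterisk rely on the Host header

print(destination_host("GET", "https://hrouhani.org/main-proxy-concepts/", {}))
print(destination_host("GET", "/main-proxy-concepts/", {"host": "hrouhani.org"}))
print(destination_host("CONNECT", "hrouhani.org:443", {}))
```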

 

After the origin server, where the resource resides, receives the request message, it looks at the URI, the headers and the HTTP method to decide whether it can handle the message. If it can, it processes the request and generates a response message.

Response message: it consists of the following 3 sections:

  • Status line: it consists of 3 fields:
    • version
    • status code
    • phrase
  • Headers
  • Message body (optional)
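
The status line can be split into its 3 fields in the same spirit. This is a minimal sketch with a function name of my own that assumes a well-formed line.

```python
def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a status line into (version, status code, phrase)."""
    version, code, *rest = line.split(" ", 2)
    phrase = rest[0] if rest else ""  # the phrase may contain spaces or be empty
    return version, int(code), phrase

print(parse_status_line("HTTP/1.1 200 OK"))
print(parse_status_line("HTTP/1.1 404 Not Found"))
```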

 

 

 

 

 

 
