what happens when you type https://www.holbertonschool.com in your browser and press Enter?

Andrés Felipe García Rendón
21 min read · Apr 8, 2020

Before explaining what happens, we must be clear about the basic concepts of web infrastructure:

URL is:

Uniform Resource Locator. It defines the web address of a specific resource. A URL is made up of the following parts (see the parsing sketch after this list):
* Protocol: http://, https://, ftp://. The protocol also defines the default port on which the server is requested.
* Host: www.example.com, which DNS resolves from a hostname to an IP address.
* Path: /index.html or /states/1/cities, the path to the resource from the root of the website or web service on the host.
* Query string: always starts with the symbol ? and consists of key=value parameters separated by &.
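
To make these pieces concrete, here is a minimal sketch (not from the original article) that uses Python's standard urllib.parse module to break an example URL into its components:

```python
# Split a URL into protocol, host, port, path, query string and fragment.
from urllib.parse import urlparse, parse_qs

url = "https://www.example.com:443/states/1/cities?sort=name&limit=10#top"
parts = urlparse(url)

print(parts.scheme)            # 'https'  -> protocol
print(parts.hostname)          # 'www.example.com' -> host, resolved later by DNS
print(parts.port)              # 443
print(parts.path)              # '/states/1/cities'
print(parse_qs(parts.query))   # {'sort': ['name'], 'limit': ['10']}
print(parts.fragment)          # 'top'
```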

HTTP is:

The hypertext transfer protocol, the most widely used protocol on the internet. It is an asymmetric client-server request-response protocol: an HTTP client sends a request message to an HTTP server, and the server returns a response message.
The scheme for an HTTP URL:

protocol://host:port/path?query-string#fragment-id
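
As a small illustration of the request-response exchange, here is a minimal sketch using Python's standard http.client module (the exact status code returned by the site may vary):

```python
# Send an HTTP GET request and read the server's response message.
import http.client

conn = http.client.HTTPSConnection("www.holbertonschool.com")
conn.request("GET", "/")                    # request message: request line + headers
response = conn.getresponse()               # response message from the server

print(response.status, response.reason)     # e.g. 200 OK, or a redirect such as 301
print(response.getheader("Content-Type"))   # one of the response headers
body = response.read()                      # response body (usually an HTML document)
conn.close()
```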

DNS request:

DNS is, in simple words, the technology that translates human-friendly, text-based domain names into machine-friendly, numerical IP addresses.
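
A minimal sketch of that translation, using Python's standard socket module (the addresses actually returned will vary over time):

```python
# Resolve a hostname to IP addresses, the same lookup a browser triggers.
import socket

print(socket.gethostbyname("www.holbertonschool.com"))   # one IPv4 address

# A host may resolve to several addresses (for example, multiple A records):
for family, _, _, _, sockaddr in socket.getaddrinfo("www.holbertonschool.com", 443,
                                                    proto=socket.IPPROTO_TCP):
    print(family.name, sockaddr[0])
```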

Be sure to know the main DNS record types:

1. A Record:

An A record maps a domain name to the IP address (Version 4) of the computer hosting the domain. An A record is used to find the IP address of a computer connected to the internet from a name.

The A in A record stands for Address. Whenever you visit a web site, send an email, connect to Twitter or Facebook, or do almost anything on the Internet, the address you enter is a series of words connected with dots.

For example, to access the DNSimple website you enter www.dnsimple.com. At DNSimple’s name server there’s an A record that points to the IP address 208.93.64.253. This means that a request from your browser to www.dnsimple.com is directed to the server with IP address 208.93.64.253.

A Records are the simplest type of DNS records, and one of the primary records used in DNS servers.

You can do a lot with A records, including using multiple A records for the same domain in order to provide redundancy. Additionally, multiple names could point to the same address, in which case each would have its own A record pointing to that same IP address.

A record format

The structure of an A record follows the standard top-level format defined in RFC 1035. The RDATA section is composed of one single element:

address: a 32-bit Internet address representing an IPv4 address

Hosts that have multiple Internet addresses have multiple A records.

The canonical representation is:

A <address>

where <address> is an IPv4 address and looks like 162.159.24.4.

In DNSimple, the A record is represented by the following customizable elements:

* Name: the host name for the record, without the domain name. This is generally referred to as the “subdomain”. DNSimple automatically appends the domain name.
* TTL: the time-to-live in seconds. This is the amount of time the record is allowed to be cached by a resolver.
* Address: the IPv4 address the A record points to.

2. CNAME Record:

A Canonical Name record (abbreviated as CNAME record) is a type of resource record in the Domain Name System (DNS) which maps one domain name (an alias) to another (the canonical name).

This can prove convenient when running multiple services (like an FTP server and a web server, each running on different ports) from a single IP address. One can, for example, point ftp.example.com and www.example.com to the DNS entry for example.com, which in turn has an A record which points to the IP address. Then, if the IP address ever changes, one only has to record the change in one place within the network: in the DNS A record for example.com.

CNAME records must always point to another domain name, never directly to an IP address.

3. MX Record:

A mail exchanger record (MX record) specifies the mail server responsible for accepting email messages on behalf of a domain name. It is a resource record in the Domain Name System (DNS). It is possible to configure several MX records, typically pointing to an array of mail servers for load balancing and redundancy.

4. TXT Record:

A TXT record (short for text record) is a type of resource record in the Domain Name System (DNS) used to provide the ability to associate arbitrary text with a host or other name, such as human readable information about a server, network, data center, or other accounting information.[1]

It is also often used in a more structured fashion to record small amounts of machine-readable data into the DNS.
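
To see these record types in practice, here is a minimal sketch that assumes the third-party dnspython package is installed (pip install dnspython); which records exist depends entirely on how the domain is configured:

```python
# Query a few DNS record types for a domain (requires dnspython).
import dns.resolver

domain = "holbertonschool.com"

for rdata in dns.resolver.resolve(domain, "A"):
    print("A  ", rdata.address)                       # IPv4 address(es)

for rdata in dns.resolver.resolve(domain, "MX"):
    print("MX ", rdata.preference, rdata.exchange)    # mail servers and priorities

for rdata in dns.resolver.resolve(domain, "TXT"):
    print("TXT", rdata.strings)                       # arbitrary text data
```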

The root domain and sub domain — differences

A root domain is the parent domain of a sub domain, and its name cannot contain, or be divided by, a dot.

When creating a domain at a domain provider’s website, the provider’s system will always ask you to choose a domain name without a dot in the name. In other words, the address of the root domain may be mydomain.com, but it can never be my.domain.com. Domain providers block the ability to create such a root domain until you type a name without the dot. Why?

The dot in the domain name delimits the sub domain name (the part of the name before the dot, e.g. www or my) from the root domain name (the part after the dot, e.g. domain.com). This means that the address my.domain.com is a sub domain of the root domain whose name is domain.com.

In the administrator panel of a domain provider account, you can create any number of sub domains. This means that for the root domain called domain.com it is possible to create different sub domains, e.g. my.domain.com, your.domain.com, holberton.domain.com… Creating multiple sub domains is always free and does not require you to set up new accounts on the domain provider’s website.

As you can see, none of the domain addresses used as examples above start with the www prefix. www is also a sub domain, and the www prefix always leads to the main domain.

TCP/IP:

The TCP/IP model is a description of network protocols developed by Vinton Cerf and Robert E. Kahn in the 1970s. It was implemented in the ARPANET network, the first wide area network (WAN), developed on behalf of DARPA, an agency of the United States Department of Defense and predecessor of the Internet; for this reason, it is sometimes also called the DoD model or the DARPA model.

The TCP/IP model is used for network communications and, like any protocol, describes a set of general operating guidelines that allow a computer to communicate on a network. TCP/IP provides end-to-end connectivity, specifying how data should be formatted, addressed, transmitted, routed, and received by the recipient.

The TCP/IP model and related protocols are maintained by the Internet Engineering Task Force.

To achieve reliable data exchange between two computers, many separate procedures must be carried out. The result is that communications software is complex. With a layered or tiered model it is easier to group related functions and implement modular communications software.

The layers are nested; each layer is built on its predecessor. The number of layers and, within each of them, their services and functions vary with each type of network. However, in any network, the mission of each layer is to provide services to the upper layers while hiding from them how those services are carried out. In this way, each layer should deal exclusively with its immediately lower layer, from which it requests services, and its immediately higher layer, to which it returns results.
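
As a rough illustration of this layering (a sketch, not part of the original article): an application-layer protocol such as HTTP is just bytes written to a transport-layer TCP connection, and TCP/IP takes care of delivering them end to end:

```python
# Speak HTTP (application layer) directly over a TCP socket (transport layer).
import socket

HOST = "example.com"   # any publicly reachable web host works here
with socket.create_connection((HOST, 80), timeout=5) as sock:   # TCP/IP connection
    request = f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n"
    sock.sendall(request.encode("ascii"))       # application-layer bytes handed to TCP
    response = b""
    while chunk := sock.recv(4096):             # TCP reassembles the byte stream for us
        response += chunk

print(response.split(b"\r\n", 1)[0].decode())   # status line, e.g. "HTTP/1.1 200 OK"
```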

Firewall

In computing, a firewall is a network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules.[1] A firewall typically establishes a barrier between a trusted internal network and an untrusted external network, such as the Internet.[2]

Firewalls are often categorized as either network firewalls or host-based firewalls. Network firewalls filter traffic between two or more networks and run on network hardware. Host-based firewalls run on host computers and control network traffic in and out of those machines.
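
A toy sketch of the idea of “predetermined security rules” (the rules and the matching logic here are hypothetical and much simpler than a real firewall):

```python
# Evaluate packets against an ordered rule list; the first matching rule wins.
RULES = [
    {"action": "allow", "port": 443,   "source": "any"},       # HTTPS from anywhere
    {"action": "allow", "port": 22,    "source": "internal"},  # SSH only from the trusted network
    {"action": "deny",  "port": "any", "source": "any"},       # default: deny everything else
]

def filter_packet(port, source):
    for rule in RULES:
        if rule["port"] in ("any", port) and rule["source"] in ("any", source):
            return rule["action"]
    return "deny"

print(filter_packet(443, "external"))  # allow
print(filter_packet(22, "external"))   # deny
print(filter_packet(22, "internal"))   # allow
```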

HTTPS/SSL

Hyper Text Transfer Protocol Secure (HTTPS) is the secure version of HTTP, the protocol over which data is sent between your browser and the website that you are connected to. The ‘S’ at the end of HTTPS stands for ‘Secure’. It means all communications between your browser and the website are encrypted. HTTPS is often used to protect highly confidential online transactions like online banking and online shopping order forms.

Web browsers such as Internet Explorer, Firefox and Chrome also display a padlock icon in the address bar to visually indicate that an HTTPS connection is in effect.

How Does HTTPS Work?

HTTPS pages typically use one of two secure protocols to encrypt communications — SSL (Secure Sockets Layer) or TLS (Transport Layer Security). Both the TLS and SSL protocols use what is known as an ‘asymmetric’ Public Key Infrastructure (PKI) system. An asymmetric system uses two ‘keys’ to encrypt communications, a ‘public’ key and a ‘private’ key. Anything encrypted with the public key can only be decrypted by the private key and vice-versa.

As the names suggest, the ‘private’ key should be kept strictly protected and should only be accessible to the owner of the private key. In the case of a website, the private key remains securely ensconced on the web server. Conversely, the public key is intended to be distributed to anybody and everybody that needs to be able to decrypt information that was encrypted with the private key.

What is an HTTPS certificate?

When you request an HTTPS connection to a webpage, the website will initially send its SSL certificate to your browser. This certificate contains the public key needed to begin the secure session. Based on this initial exchange, your browser and the website then initiate the ‘SSL handshake’. The SSL handshake involves the generation of shared secrets to establish a uniquely secure connection between yourself and the website.

When a trusted SSL digital certificate is used during an HTTPS connection, users will see a padlock icon in the browser address bar. When an Extended Validation certificate is installed on a web site, the address bar will turn green.
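
You can observe this certificate exchange yourself. Here is a minimal sketch using Python's standard ssl module to open a TLS connection and inspect the certificate the server presents (the field contents will differ per site):

```python
# Connect over TLS and look at the server certificate sent during the handshake.
import socket
import ssl

HOST = "www.holbertonschool.com"
context = ssl.create_default_context()        # trusted CA bundle + hostname checking

with socket.create_connection((HOST, 443)) as tcp_sock:
    with context.wrap_socket(tcp_sock, server_hostname=HOST) as tls_sock:
        print("Protocol:", tls_sock.version())            # e.g. TLSv1.3
        cert = tls_sock.getpeercert()                      # the server's certificate
        print("Subject:", dict(item[0] for item in cert["subject"]))
        print("Valid until:", cert["notAfter"])
```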

Why Is an SSL Certificate Required?

All communications sent over regular HTTP connections are in ‘plain text’ and can be read by any hacker that manages to break into the connection between your browser and the website. This presents a clear danger if the ‘communication’ is on an order form and includes your credit card details or social security number. With an HTTPS connection, all communications are securely encrypted. This means that even if somebody managed to break into the connection, they would not be able to decrypt any of the data which passes between you and the website.

Benefits of Hypertext Transfer Protocol Secure

The major benefits of an HTTPS certificate are:

  • Customer information, like credit card numbers, is encrypted and cannot be intercepted
  • Visitors can verify you are a registered business and that you own the domain
  • Customers are more likely to trust and complete purchases from sites that use HTTPS

HTTP and HTTPS: What do they do, and how are they different?

You click to check out at an online merchant. Suddenly your browser address bar says HTTPS instead of HTTP. What’s going on? Is your credit card information safe?

Good news. Your information is safe. The website you are working with has made sure that no one can steal your information.

Instead of HyperText Transfer Protocol (HTTP), this website uses HyperText Transfer Protocol Secure (HTTPS).

Using HTTPS, the computers agree on a “code” between them, and then they scramble the messages using that “code” so that no one in between can read them. This keeps your information safe from hackers.

They exchange that “code” over the Secure Sockets Layer (SSL), or its successor Transport Layer Security (TLS), to send the information back and forth.

How does HTTP work? How is HTTPS different from HTTP? This tutorial will teach you about SSL, HTTP and HTTPS.

How Does HTTP Work?

In the beginning, network administrators had to figure out how to share the information they put out on the Internet.

They agreed on a procedure for exchanging information and called it HyperText Transfer Protocol (HTTP).

Once everyone knew how to exchange information, intercepting it on the Internet was not difficult. So knowledgeable administrators agreed upon a procedure to protect the information they exchanged. The protection relies on an SSL certificate to encrypt the online data. Encryption means that the sender and recipient agree upon a “code” and translate their documents into random-looking character strings.

The procedure for encrypting information and then exchanging it is called HyperText Transfer Protocol Secure (HTTPS).

With HTTPS, even if anyone in between the sender and the recipient opens the message, they still cannot understand it. Only the sender and the recipient, who know the “code,” can decipher the message.

Humans could encode their own documents, but computers do it faster and more efficiently. To do this, the computer at each end uses a document called an “SSL Certificate” containing character strings that are the keys to their secret “codes.”

SSL certificates contain the computer owner’s “public key.”

The owner shares the public key with anyone who needs it. Other users need the public key to encrypt messages to the owner. The owner sends those users the SSL certificate, which contains the public key. The owner does not share the private key with anyone.

The security during the transfer is called the Secure Sockets Layer (SSL) and Transport Layer Security (TLS).

The procedure for exchanging public keys using SSL certificates to enable HTTPS, SSL and TLS is called Public Key Infrastructure (PKI).

Load-balancer

When you have an enterprise application or website that gets a lot of hits, your server might be under heavy load. In that case, you may want to consider distributing the load across multiple servers.

A load balancer distributes the workload of your system across multiple individual systems, or groups of systems, to reduce the amount of load on any individual system, which in turn increases the reliability, efficiency and availability of your enterprise application or website.

In this article, we’ll cover the basics of software and hardware load balancers, and explain the various algorithms they use.

The following are the advantages of load balancing your application:

  • Reduced workload on each individual server.
  • A larger amount of work done in the same time, thanks to concurrency.
  • Increased application performance because of faster responses.
  • No single point of failure. In a load balanced environment, if a server crashes, the application is still up and served by the other servers in the cluster.
  • When an appropriate load balancing algorithm is used, it brings optimal and efficient utilization of resources, as it eliminates the scenario where some servers’ resources are used more heavily than others.
  • Scalability: we can increase or decrease the number of servers on the fly without bringing down the application.
  • Load balancing increases the reliability of your enterprise application.
  • Increased security, as the physical servers and IPs are abstracted away in certain cases.

At a high level, there are two types of load balancers, which implement different types of scheduling algorithms and routing mechanisms.

  1. Software load balancer
  2. Hardware load balancer

Software Load Balancers

Software load balancers generally implement a combination of one or more scheduling algorithms.

The following are the three basic algorithms used by load balancers. Most modern load balancers use a combination of these algorithms to reach high performance and to strike a trade-off between various parameters.

Weighted Scheduling Algorithm

Work is assigned to each server according to the weight assigned to it. Different weights are assigned to the different types of servers in the group, so the load is distributed in proportion to those weights.

The diagram below depicts a generic scenario where a load balancer is used, and how weighted scheduling works if there is a total of 10 requests (R1, R2, …, R10) coming to the server farm/cluster.

As you can see, each request is assigned to a server according to that server’s weight. The weights are chosen by the administrators based on the hardware capabilities of each server. Assume that we have different weights assigned to each server, as shown in the figure below.

The load balancer will compute the percentage of traffic to be sent to a particular server according to the weight assigned to it.

When is this algorithm mostly used? When there is a considerable difference between the capabilities and specifications of the servers present in the farm or cluster.

This algorithm is efficient at managing the load without swamping the low-capability servers, and it makes efficient use of the available server resources at any instant in time. A small sketch of the idea follows the list of points below.

The following points can be noted in the above diagram:

  • A load balancer can be of two types: a hardware load balancer or a software load balancer.
  • Software load balancers are often installed on the servers themselves and consume the servers’ processor and memory. So, in the diagram above, the software load balancer overlaps the server farm.
  • Hardware load balancers are specialized hardware deployed between the servers and the client. They can be switching/routing hardware or even a dedicated system running load balancing software with specialized capabilities.
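
Here is a minimal sketch of the weighted scheduling idea (server names and weights are made up; a real load balancer would also track health checks and connections):

```python
# Route each request to a server with probability proportional to its weight.
import random

servers = {"S1": 5, "S2": 3, "S3": 2}   # S1 gets ~50% of traffic, S2 ~30%, S3 ~20%

def pick_server():
    names, weights = zip(*servers.items())
    return random.choices(names, weights=weights, k=1)[0]

for i in range(1, 11):                   # simulate requests R1..R10
    print(f"R{i} -> {pick_server()}")
```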

Round Robin Scheduling

Requests are served by the server sequentially one after another. After sending the request to the last server, it starts from the first server again.

The diagram below depicts this approach: each request is assigned to the servers sequentially, one by one, and the cycle then repeats.

This algorithm is used when servers are of equal specification and there are not many persistent connections.

Fig: Load Balancer — Round Robin Algorithm
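
A minimal sketch of round robin scheduling (hypothetical servers of equal capacity):

```python
# Hand requests to the servers in order, wrapping around after the last one.
from itertools import cycle

servers = ["S1", "S2", "S3"]
next_server = cycle(servers)

for i in range(1, 8):                    # simulate requests R1..R7
    print(f"R{i} -> {next(next_server)}")
# R1 -> S1, R2 -> S2, R3 -> S3, R4 -> S1, ...
```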

Least Connection First Scheduling

Requests are served first by the server which is currently handling the least number of persistent connections.

In the diagram above, if we are using the least connection first scheduling algorithm, request R5 can be assigned to any of the servers at random, because when R5 arrives every server is handling the same number of connections.

Let’s say that when request R5 arrives, request R3 has completed and server S3 is now free. In that case, request R5 will be assigned to server S3 rather than any other server, because S3 has the least number of connections at that instant.

When is this algorithm used? When the traffic contains a large number of persistent connections unevenly distributed between the servers. It is often coupled with sticky sessions or session-aware load balancing, where all requests belonging to a session are sent to the same server to maintain session state and synchronization.

This approach is used when session-aware write operations must stay in sync between the client and the server to avoid inconsistency.
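
A minimal sketch of least-connection-first scheduling (the connection counts here are made up for illustration):

```python
# Send each new request to the server holding the fewest open connections.
active = {"S1": 0, "S2": 0, "S3": 0}      # current persistent connections per server

def assign(request):
    server = min(active, key=active.get)  # server with the fewest connections
    active[server] += 1                   # a connection opens
    print(f"{request} -> {server}  {active}")

def release(server):
    active[server] -= 1                   # a connection closes

assign("R1"); assign("R2"); assign("R3")
release("S3")                             # say the request on S3 finishes first
assign("R4")                              # R4 goes to S3, now the least loaded
```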

Now, load balancing software can implement smart combinations of the above three basic scheduling algorithms. Examples of such implementations are weighted round robin scheduling and weighted least connection scheduling.

Many hybrid scheduling algorithms for load balancing have evolved using variations or combinations of the above algorithms.

Software Load Balancer Examples

The following are a few examples of software load balancers:

  1. HAProxy — A TCP/HTTP load balancer.
  2. NGINX — An HTTP load balancer with SSL termination support.
  3. mod_athena — An Apache-based HTTP load balancer.
  4. Varnish — A reverse-proxy-based load balancer.
  5. Balance — An open source TCP load balancer.
  6. LVS — Linux Virtual Server, offering Layer 4 load balancing.

Hardware Load Balancers

Hardware load balancers are specialized routers or switches deployed between the servers and the client. They can also be dedicated systems placed between the client and the server to balance the load.

Hardware load balancers are implemented at Layer 4 (transport layer) and Layer 7 (application layer) of the OSI model, so the most prominent of these devices are L4-L7 routers.

Layer4 Hardware Load Balancing

These load balancers work at the transport layer of the OSI model and make use of TCP, UDP and SCTP transport layer protocol details to decide which server the data is to be sent to.

Layer 4 load balancers are mostly network address translators (NATs) which share the load among the different servers being translated to by the load balancer. These routers hide multiple servers behind them and translate every response data packet coming from a server so that it appears to come from the same IP address.

Similarly, when there is a request, they reverse-translate the request using the mapping table and distribute it among the multiple servers.

DNS load balancing: in DNS-based load balancing, the Domain Name Servers are configured to return different IP addresses for different lookups. This approach creates a load balancing effect whenever there is a DNS lookup.

The diagram below gives a high-level overview of how Layer 4 and Layer 7 load balancers work and which techniques they apply at each OSI layer.

Direct routing: this is yet another configuration of hardware load balancing, where the routers are aware of the server MAC addresses and the servers may have ARP (Address Resolution Protocol) disabled.

Direct routing is “direct” in the sense that all the incoming traffic is routed by the load balancer, while all the outgoing traffic reaches the client directly, which makes it a very fast load balancing configuration.

Tunneling, or IP tunneling, often looks like direct routing in that the response is sent directly to the client; however, the traffic between the router and the server can itself be routed.

In this configuration, the client sends the request to the virtual IP of the load balancer, which encapsulates the IP packets, keeps a hash table, and distributes them to the different servers according to the configured load balancing technique.

When the response comes back from the server, it is decapsulated and sent to the client directly, according to the stored hash table. The record is eventually removed from the hash table when the connection is closed or times out.

Layer7 Hardware Load Balancing

This type of load balancer makes its decision according to the actual content of the message (URLs, cookies, scripts), since HTTP lives at Layer 7.

These Layer 7 devices effectively form an ADN (Application Delivery Network), and they pass requests on to the servers according to the type of content.

For example, requests for images will go to an image server, requests for PHP scripts may go to another server, static content such as HTML, JS and CSS may go to another one, and requests for media content may go to yet another server.

So, here a load balancing effect is achieved by distributing the load according to the type of content requested.
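
A minimal sketch of that content-based (Layer 7) decision, with hypothetical backend pools:

```python
# Choose a backend pool based on the type of content the request asks for.
BACKENDS = {
    "image":   ["img1.internal", "img2.internal"],
    "php":     ["app1.internal"],
    "static":  ["static1.internal"],
    "default": ["web1.internal"],
}

def route(path: str) -> str:
    if path.endswith((".png", ".jpg", ".gif")):
        pool = "image"
    elif path.endswith(".php"):
        pool = "php"
    elif path.endswith((".html", ".js", ".css")):
        pool = "static"
    else:
        pool = "default"
    # A real Layer 7 balancer would also balance load within the chosen pool.
    return BACKENDS[pool][0]

print(route("/logo.png"))    # img1.internal
print(route("/index.php"))   # app1.internal
```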

For this, it is also very helpful to understand the fundamentals of the TCP/IP protocol and its different layers.

The diagram below depicts a Layer7 load balancer.

Fig: Load Balancer — Layer 7

Web server

A web server is server software, or hardware dedicated to running this software, that can satisfy client requests on the World Wide Web. A web server can, in general, contain one or more websites. A web server processes incoming network requests over HTTP and several other related protocols.

The primary function of a web server is to store, process and deliver web pages to clients.[1] The communication between client and server takes place using the Hypertext Transfer Protocol (HTTP). Pages delivered are most frequently HTML documents, which may include images, style sheets and scripts in addition to the text content.

A user agent, commonly a web browser or web crawler, initiates communication by making a request for a specific resource using HTTP and the server responds with the content of that resource or an error message if unable to do so. The resource is typically a real file on the server’s secondary storage, but this is not necessarily the case and depends on how the web server is implemented.

While the major function is to serve content, a full implementation of HTTP also includes ways of receiving content from clients. This feature is used for submitting web forms, including uploading of files.

Many generic web servers also support server-side scripting using Active Server Pages (ASP), PHP (Hypertext Preprocessor), or other scripting languages. This means that the behaviour of the web server can be scripted in separate files, while the actual server software remains unchanged. Usually, this function is used to generate HTML documents dynamically (“on-the-fly”) as opposed to returning static documents. The former is primarily used for retrieving or modifying information from databases. The latter is typically much faster and more easily cached but cannot deliver dynamic content.

Web servers can frequently be found embedded in devices such as printers, routers, webcams and serving only a local network. The web server may then be used as a part of a system for monitoring or administering the device in question. This usually means that no additional software has to be installed on the client computer since only a web browser is required (which now is included with most operating systems).
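
For a feel of what a web server does at its simplest, here is a minimal sketch using Python's standard http.server module, which serves static files from the current directory (a toy, not a production server):

```python
# Listen for HTTP requests and answer them with files from the current directory.
from http.server import HTTPServer, SimpleHTTPRequestHandler

server = HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler)
print("Serving on http://127.0.0.1:8000 ...")
server.serve_forever()       # each GET /path is answered with the matching file
```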

Application server

Also called an appserver, an application server is a program that handles all application operations between users and an organization’s backend business applications or databases.

An application server is typically used for complex transaction-based applications. To support high-end needs, an application server has to have built-in redundancy, monitoring for high availability, high-performance distributed application services, and support for complex database access.

See the Server Types page in the quick reference section of Webopedia for a comparison of server types.
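
To contrast with the static web server above, here is a minimal sketch of the application-server idea using Python's standard wsgiref module: the response is generated dynamically per request (in a real deployment this would sit behind the web server and talk to a database):

```python
# A tiny WSGI application served by a toy application server.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    # Business logic would normally query a database here.
    body = f"Hello from path {environ['PATH_INFO']}".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

with make_server("", 8001, app) as httpd:
    print("Application server listening on port 8001 ...")
    httpd.serve_forever()
```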

Database

First, what are databases for?

Storing data in your application (in memory) has the obvious shortcoming that, whatever the technology you’re using, your data dies when your server stops. Some programming languages and/or frameworks take it even further by being stateless, which, in the case of an HTTP server, means your data dies at the end of an HTTP request. Whether the technology you’re using is stateless or stateful, you will need to persist your data somewhere. That’s what databases are for.

Then, why not store your data in flat files, as you did in the “Relational databases, done wrong” project? A solid database is expected to be ACID, which means it guarantees:

Atomicity: transactions are atomic, which means if a transaction fails, the result will be like it never happened.

Consistency: you can define rules for your data, and expect that the data abides by the rules, or else the transaction fails.

Isolation: run two operations at the same time, and you can expect that the result is as though they were run one after the other. That’s not the case with the JSON file storage you built: if two insert operations are done at the same time, the later one will fetch an outdated collection of users because the earlier one is not finished yet, and will therefore overwrite the file without the change that the earlier operation made, totally ignoring that it ever happened.

Durability: unplug your server at any time, boot it back up, and it didn’t lose any data.

Also, a solid database will provide strong performance (because I/O is your bottleneck and databases are I/O, so their performance makes a whole lot more of a difference than the performance of your application’s code) and scalability (inserting one user in a collection of 5 users should take about the same time as inserting one user in a collection of 5 billion users).
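
As a small demonstration of the atomicity property, here is a minimal sketch using Python's built-in sqlite3 module: when one statement in the transaction fails, the whole transaction is rolled back as if it never happened:

```python
# Atomicity: a failed transaction leaves the database exactly as it was before.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

try:
    with conn:                                                        # one transaction
        conn.execute("INSERT INTO users (name) VALUES (?)", ("Alice",))
        conn.execute("INSERT INTO users (name) VALUES (?)", (None,))  # violates NOT NULL
except sqlite3.IntegrityError:
    pass                                                              # transaction failed

print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])       # 0 -> Alice was rolled back too
```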

Now, what happens when I enter the domain of a web page in my internet browser?

Suppose you are working on your local machine and creating a web page: you already have your web server configured and a domain associated with your server. Locally you can type localhost in the browser, or simply 127.0.0.1, and it will output what you have in your HTML file. Although it works locally, when you enter that same IP address from another network, it is most likely that an error message (for example, “This site can’t be reached”) will appear:

This occurs because that IP address is local and is not associated with any server other than your local host. On the other hand, if you enter an existing domain, for example https://holbertonschool.com, it will bring up the Holberton School home page regardless of which network the computer performing these tests is connected to. This works thanks to the fact that DNS “translates” the domain name holbertonschool.com to the IP address of the web server where the HTML file of the website is stored.

All images are references taken from Google Images, except for the second image, which is my own; attached to that image is a link to my repository, where there is a project description of web infrastructure.

