What happens when you type https://www.google.com in your browser and press Enter

Searching the internet is second nature to most of us. Every day we consult countless websites to find information, but do we ever stop to wonder how it works? Whenever we open a browser such as Google Chrome, Safari, Mozilla Firefox or Opera Mini and type in a URL, whether for a social media website like www.facebook.com or a search engine such as www.google.com, a lot happens behind the scenes between the moment we press Enter and the moment the web page appears. Surprisingly, all of it takes place within a few seconds, or even milliseconds, depending on how fast the network is.

In this post, we will be looking at all the end-to-end processes that occur behind the scenes, exploring concepts such as DNS requests, TCP/IP, firewalls, HTTPS/SSL, load balancers, web servers, application servers, and databases.

First, let’s look at what websites, servers, domain names and IP addresses are and how they relate to one another.

WEBSITE

A website is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Websites are typically dedicated to a particular topic or purpose, such as news, education, commerce, entertainment or social networking. They consist of files, usually written in HTML, CSS and JavaScript, which tell your browser how to display the site’s text, images, and data.

In order for a website to be accessible to everyone, its files have to be hosted on a powerful external computer connected to the Internet, called a web server.

DOMAIN NAME

A domain name is the unique, human-readable name in a web address, for example google.com in www.google.com. It is easy for users to remember and is associated with an IP address on the internet. For example, the URL https://domainname.com might translate to https://43.205.76.45. So instead of memorizing these numbers, we use a name that is linked to them and is easy to remember.

There are three parts to a domain name:
1. Domain name: google is the domain name in google.com.

2. Top-level domain: This is the suffix at the end of the URL. Examples include .com, .org, or .blog.

3. Subdomain: This is the prefix that further classifies a domain, such as subdomain.google.com. Some popular subdomain names include www, ftp, en, etc. The subdomains are created to organize and navigate to different sections of a website. Multiple subdomains or child domains can be linked to the main domain.
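To make the three parts concrete, here is a minimal Python sketch that splits a hostname into subdomain, domain name and top-level domain. The function name split_hostname is made up for this post, and the sketch assumes the top-level domain is a single label such as .com; suffixes like .co.uk would need a proper public suffix list.

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a hostname into subdomain, domain name and TLD.

    Assumption: the TLD is one label (".com", ".org"). Multi-label
    suffixes such as ".co.uk" are not handled by this sketch.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"expected at least a domain and a TLD: {hostname!r}")
    return {
        "subdomain": ".".join(labels[:-2]),  # empty string when there is none
        "domain": labels[-2],
        "tld": "." + labels[-1],
    }


print(split_hostname("www.google.com"))
```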

IP ADDRESSES

An Internet Protocol (IP) address is a series of numbers separated by periods, e.g. 203.0.113.22. In this format (IPv4) there are four numbers, each between 0 and 255. The IP address identifies a particular device on the internet and distinguishes one computer from another. It works like a phone number: no two people share the same number, so you have to dial the exact number to reach someone. Some domain names are connected to more than one IP address.
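If you want to check whether a string is a well-formed IP address, Python's standard ipaddress module can do it. The sketch below is only an illustration; the helper name is_valid_ip and the sample addresses are assumptions chosen for this post.

```python
import ipaddress


def is_valid_ip(text: str) -> bool:
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


# Sample values: a valid IPv4 address, an out-of-range one, and an IPv6 address.
for candidate in ("192.0.2.34", "300.1.2.3", "2001:db8::1"):
    print(candidate, is_valid_ip(candidate))
```

An address such as 300.1.2.3 is rejected because each IPv4 number must fit in the range 0 to 255.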

SERVERS

A server is a piece of computer hardware or software that provides functionality to another computer, device, or program, called a “client”. Servers are used to store, send and receive data. Common types of servers include proxy servers, print servers, network servers, application servers and web servers.

A WEB SERVER is a combination of software and hardware that uses HTTP (Hypertext Transfer Protocol) and other protocols to respond to client requests made over the World Wide Web. It stores the files that make up a website, so when a person opens a web browser and types in a website link, the browser connects to a web server and requests the page files that belong to the site. The server then delivers those stored files to the user’s computer, where the browser displays them as a complete website. Examples of popular web servers include Nginx, Apache HTTP Server, Lighttpd and Microsoft Internet Information Services (IIS).

So, when an internet user types in a website URL, e.g. https://www.thewebsite.com/web-server.htm, the URL is divided into three parts by the browser:

  • The protocol: https

  • The domain name: www.thewebsite.com

  • The file name: web-server.htm i.e. the HTML file

Each of these parts has a different responsibility when it comes to interacting with a web server.
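You can see this split for yourself with Python's standard urllib.parse module. This is just a small sketch; the URL is a made-up example and not a real site.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "https://www.thewebsite.com/web-server.htm"
parts = urlsplit(url)

print("protocol:   ", parts.scheme)
print("domain name:", parts.hostname)
print("file name:  ", parts.path.lstrip("/"))
```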

1. Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) is the language browsers and web servers use to communicate. It is a request-response protocol that gives users a way to interact with web resources such as HTML files by transmitting hypertext messages between clients and servers. HTTP clients generally use Transmission Control Protocol (TCP) connections to communicate with servers. So when a URL is typed, the browser sends an HTTP request to a web server, and the web server sends the hypertext back to the user’s browser.
When a server receives a request, it checks whether the requested URL matches an existing file. If it does, the server returns the requested file. If the file does not exist, the server returns an error page. This is the 404 page you commonly see in the browser when a page is not found.
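The request-and-response logic can be sketched in a few lines of Python. This is a simplified illustration, not how a production web server is written: SITE_FILES and handle_request are names made up for this post, and only the request line of the HTTP message is looked at.

```python
# Hypothetical document root: maps request paths to file contents.
SITE_FILES = {"/index.html": "<h1>Welcome</h1>"}


def handle_request(request_line: str, files: dict = SITE_FILES) -> str:
    """Build a minimal HTTP/1.1 response for a line like 'GET /index.html HTTP/1.1'."""
    method, path, _version = request_line.split()
    if method != "GET":
        status, body = "405 Method Not Allowed", "Only GET is handled in this sketch"
    elif path in files:
        status, body = "200 OK", files[path]            # the file exists
    else:
        status, body = "404 Not Found", "<h1>404 Not Found</h1>"
    return (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )


print(handle_request("GET /index.html HTTP/1.1"))
print(handle_request("GET /missing.html HTTP/1.1"))
```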

2. Domain Name System

The next part of the equation is the Domain Name System (DNS), which translates easy-to-remember domain names to their numerical IP addresses. When you type a domain name into a browser, a DNS resolver, usually run by your internet service provider, looks up the DNS records tied to the domain name and translates the name into a computer-friendly IP address. Your browser then connects to the server at that IP address, which delivers a set of stored files. These stored files show up as a website.

3. File Name

The web server stores all of the data files related to each unique domain name. This includes all the content, HTML documents, images, CSS stylesheets, videos, fonts, JavaScript files, and more — basically, everything that converts into organized text, design, images, or videos when you see a website.
And that’s how websites, servers, IP addresses and domain names relate to one another.

Here’s a more detailed explanation of how everything works.

What is a DNS request and response?

A DNS query is a message sent by a client to a DNS server. It contains one or more “questions” that the DNS server answers. The server might also include additional information in its reply. A typical DNS query asks for the IP address of the domain name in a URL.

The DNS process works as follows:

  1. A browser, application or device, called the DNS client, issues a DNS request (a DNS address lookup), providing a hostname, e.g. “google.com”.

  2. The request is received by a DNS resolver, which is responsible for finding the correct IP address for that hostname. The DNS resolver looks for a DNS name server that holds the IP address for the hostname in the DNS request.

  3. The resolver starts from the Internet’s root DNS server, moving down the hierarchy to Top Level Domain (TLD) DNS servers (“.com” in this case), down to the name server responsible for the specific domain “google.com”.

  4. When the resolver reaches the authoritative DNS name server for “google.com”, it receives the IP address and other relevant details and returns them to the DNS client. The DNS request is now resolved.

  5. The DNS client device can connect to the server directly using the correct IP address.
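The walk down the hierarchy described in the steps above can be imitated with a few dictionaries. This is a toy model rather than real DNS: the server names and the IP address are invented for illustration, and a real resolver talks to these servers over the network.

```python
# Hypothetical in-memory stand-ins for the DNS hierarchy (all values are made up).
ROOT_SERVER = {"com": "tld-server-com"}
TLD_SERVERS = {"tld-server-com": {"google.com": "ns-google"}}
AUTHORITATIVE_SERVERS = {
    "ns-google": {"google.com": "192.0.2.10", "www.google.com": "192.0.2.10"},
}


def resolve(hostname: str) -> str:
    """Follow root -> TLD -> authoritative name server and return the IP address."""
    labels = hostname.split(".")
    tld = labels[-1]                      # "com"
    domain = ".".join(labels[-2:])        # "google.com"
    tld_server = ROOT_SERVER[tld]                        # root points to the TLD server
    name_server = TLD_SERVERS[tld_server][domain]        # TLD server points to the domain's name server
    return AUTHORITATIVE_SERVERS[name_server][hostname]  # authoritative answer


print(resolve("www.google.com"))
```

In everyday code you would not walk the hierarchy yourself. A call such as socket.getaddrinfo("www.google.com", 443) hands the lookup to the operating system's resolver, which does all of this for you.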

TCP/IP

TCP/IP stands for Transmission Control Protocol/Internet Protocol, a suite of communication protocols used to interconnect network devices on the Internet.

The two protocols serve specific functions. The Internet Protocol handles addressing and routing: it uses the IP address to identify the device the data must be sent to and to move each packet across the network toward it. TCP breaks the data into packets, makes sure that every packet arrives at the destination that IP has identified, and puts the packets back in the right order. So, once the browser obtains the IP address, it establishes a TCP connection with the web server hosting the requested website.
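Here is a small sketch of a TCP connection using Python's socket module. To keep it runnable without internet access, both ends live on the local machine: a tiny echo server stands in for the web server, and the client plays the role of the browser. The function name run_echo_server and the message are assumptions made for this example.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one connection, read one message and send it back with a prefix."""
    conn, _addr = server_sock.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(b"echo: " + data)


# Port 0 asks the operating system for any free port on the local machine.
server_sock = socket.create_server(("127.0.0.1", 0))
port = server_sock.getsockname()[1]
thread = threading.Thread(target=run_echo_server, args=(server_sock,), daemon=True)
thread.start()

# The client side: open a TCP connection, send data, read until the server closes.
with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
    client.sendall(b"hello")
    chunks = []
    while chunk := client.recv(1024):
        chunks.append(chunk)
    reply = b"".join(chunks)

thread.join(timeout=5)
server_sock.close()
print(reply)
```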

Common TCP/IP protocols include the following:

  • Hypertext Transfer Protocol (HTTP), which handles the communication between a web server and a web browser, as explained earlier.

  • HTTP Secure (HTTPS), which handles secure communication between a web server and a web browser.

  • File Transfer Protocol (FTP), which handles the transmission of files between computers.

FIREWALL

While the TCP connection is being established, the traffic often passes through firewalls. Firewalls are installed to protect a network from hackers, unauthorized users, viruses, malware and other malicious software that could access sensitive information and either disrupt operations or hold a company’s data for ransom.

A firewall is a network security device that monitors incoming and outgoing network traffic and decides whether to allow or block specific traffic based on a defined set of security rules. A rule typically specifies the source address, the source port, the destination address, the destination port, and whether matching traffic should be permitted. Firewalls block traffic coming from suspicious sources to prevent cyberattacks and only let in traffic they have been configured to accept. A firewall acts like a gatekeeper at your network’s entry point, allowing only trusted sources, or IP addresses, to enter.

[Figures: a firewall accepting good traffic; a firewall preventing bad traffic]

A firewall can be hardware, software, software-as-a-service (SaaS), public cloud, or private cloud (virtual).
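A rule-based firewall check can be sketched like this. It is a simplified illustration: the Rule class, the rule list and the addresses are all made up, only the source address and destination port are checked, and the "first matching rule wins, otherwise deny" policy is an assumption of this sketch.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    source: str               # network in CIDR notation the rule applies to
    dest_port: Optional[int]  # None means "any port"
    allow: bool               # True = permit, False = block


# Hypothetical rule set, evaluated from top to bottom.
RULES = [
    Rule(source="203.0.113.0/24", dest_port=None, allow=False),  # block a suspicious range
    Rule(source="0.0.0.0/0", dest_port=443, allow=True),         # allow HTTPS from anywhere
]


def is_allowed(source_ip: str, dest_port: int, rules=RULES) -> bool:
    """Return the decision of the first matching rule; deny if nothing matches."""
    for rule in rules:
        in_network = ip_address(source_ip) in ip_network(rule.source)
        port_matches = rule.dest_port is None or rule.dest_port == dest_port
        if in_network and port_matches:
            return rule.allow
    return False


print(is_allowed("198.51.100.7", 443))  # ordinary visitor on the HTTPS port
print(is_allowed("203.0.113.9", 443))   # address from the blocked range
```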

What is HTTPS/SSL?

You may have noticed that some website URLs use http://example.com while others use https://example.com.

HTTPS (Hypertext Transfer Protocol Secure), as the name implies, is the secure version of HTTP.

Secure Sockets Layer (SSL) is a security protocol that enables an encrypted connection between a website and a browser. SSL has since been succeeded by Transport Layer Security (TLS), although the name SSL is still widely used. SSL aims to provide a safe and secure way to transmit sensitive data, including personal information, credit card details, and login credentials.

The SSL protocol can only be used by websites with an SSL certificate, a digital document that validates a site’s identity.

Websites that install and configure an SSL certificate can run on HTTPS and establish a secure connection with the browser. Websites without an SSL certificate run on HTTP and transfer data in plain text, which means that anyone who intercepts the traffic can read the messages, whether sensitive or not.

Here’s an overview of how SSL secures a website:

  • First, a website owner obtains an SSL certificate from a Certificate Authority (CA) and installs it on their site.

  • When a visitor opens the website, the browser and the web server establish an SSL connection using a method called the SSL handshake.

  • During the SSL handshake, the browser asks the server for its SSL certificate and public key so that the server can prove its identity.

  • Once the certificate is verified, the browser and web server use the server’s public key, or a key-exchange algorithm, to agree on a symmetric session key. Private keys are never sent over the network.

  • Both parties then use this symmetric key to encrypt all communication, so data is transferred in encrypted form, which makes it difficult for anyone who intercepts a message to read it. The key remains valid for a limited time and only for that particular session.
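One way two parties can arrive at the same symmetric key without ever sending a private key is a Diffie-Hellman style exchange. The sketch below is a toy version for illustration only: the small numbers P and G and the helper names are assumptions made for this post, and real connections use much larger parameters chosen by the protocol.

```python
import hashlib
import secrets

# Toy public parameters, known to everyone. Assumption: a small prime modulus
# keeps the example readable; it is far too small to be secure.
P = 2**32 - 5
G = 5


def make_keypair():
    """Return (private, public). Only the public value is ever sent."""
    private = secrets.randbelow(P - 3) + 2   # random number in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def session_key(own_private: int, peer_public: int) -> bytes:
    """Combine our private value with the peer's public value into a 32-byte key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(str(shared).encode()).digest()


browser_private, browser_public = make_keypair()
server_private, server_public = make_keypair()

# Each side uses its own private value and the other side's public value.
browser_key = session_key(browser_private, server_public)
server_key = session_key(server_private, browser_public)
print(browser_key == server_key)
```

Both sides end up with the same key even though only the public values crossed the network.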

Once the SSL protocol has been enabled, communication with the website is encrypted. Unauthorized third parties who intercept the traffic will no longer be able to read it.

To identify whether a website uses the SSL protocol, you can check the browser’s address bar for a padlock icon. In some browsers the address bar also turns green.

To view detailed information about the digital certificate, such as the issuer and validity dates, you can click on the padlock icon.
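The same certificate details can be read from code with Python's standard ssl module. Treat this as a sketch: make_client_context and fetch_certificate are helper names invented for this post, and fetch_certificate needs a working internet connection, so it is defined here but not called.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies certificates and hostnames."""
    return ssl.create_default_context()


def fetch_certificate(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return the server certificate as a dictionary.

    Requires network access; the handshake fails if the certificate is invalid.
    """
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls_sock:
            return tls_sock.getpeercert()


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
```

With a connection available, fetch_certificate("www.google.com") returns a dictionary whose "issuer", "notBefore" and "notAfter" entries correspond to what the padlock icon shows.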

What is Load balancing?

Load balancing is the process of distributing a set of tasks over a set of resources, to make their overall processing more efficient.

Modern high‑traffic websites like Google, Facebook and Twitter serve hundreds of thousands, if not millions, of concurrent requests from users or clients and must return the correct text, images, video, or application data, all in a fast and reliable manner. Meeting these volumes cost‑effectively requires adding more servers. Load balancers are installed to distribute the traffic across those servers.

A load balancer acts as a “traffic cop” that sits in front of the servers and routes client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization. It also ensures that no server is overworked, which could degrade its performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.

Load balancers can be hardware or software. Examples of software load balancers include:
1. HAProxy: a TCP and HTTP load balancer.
2. NGINX: an HTTP load balancer with SSL termination support.
3. mod_athena: an Apache-based HTTP load balancer.
4. Varnish: a reverse proxy-based load balancer.
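The simplest way to spread requests is round robin: send each new request to the next server in the list and skip any server that is down. The class below is a minimal sketch of that idea. The class name and server names are made up, and real load balancers such as the ones listed above offer many more strategies and proper health checks.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def add_server(self, server):
        self.servers.append(server)
        self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._next % len(self.servers)]
            self._next += 1
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])  # hypothetical servers
print([balancer.pick() for _ in range(4)])

balancer.mark_down("app-2")   # simulate a server going offline
print([balancer.pick() for _ in range(4)])
```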

Application Servers

We previously discussed web servers. Let’s now look at another type of server, the application server.
An application server, also called an app server, is a program that handles all application operations between users and an organization’s backend business applications or databases.

Application servers vs. web servers

Application servers and web servers have a complex and connected relationship. Generally speaking, web servers provide only static content e.g. HTML pages, files, images, and video and serve only HTTP requests. Application servers can deliver web content too, but their primary job is to enable interaction between end-user clients and server-side application code—the code representing what is often called business logic—to generate and deliver dynamic content, such as transaction results, decision support, or real-time analytics.

In some server architectures, application servers sit behind web servers and provide all the dynamic content when it’s requested, as the earlier graphic shows. Web and application servers overlap, and in many web applications, both a web server and an application server must work together to deliver dynamic content on a web page.

Large web applications with high traffic typically use multiple servers, at least one web server and one application server.

Examples of popular application servers include Apache Tomcat, Oracle WebLogic and IBM WebSphere.
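The division of labour between the two kinds of server can be sketched in Python. This is only a toy model: the file contents, prices and function names are invented, and in practice the web server and application server are separate programs that talk over the network.

```python
# Hypothetical static files the web server can return as-is.
STATIC_FILES = {"/about.html": "<h1>About us</h1>"}

# Hypothetical price list (in cents) used by the business logic.
PRICES_CENTS = {"book": 1250, "pen": 120}


def application_server(order: dict) -> str:
    """Run business logic and generate dynamic content for an order."""
    total_cents = sum(PRICES_CENTS[item] * quantity for item, quantity in order.items())
    return f"<p>Order total: {total_cents / 100:.2f}</p>"


def web_server(path: str, order=None) -> str:
    """Serve static files directly and pass dynamic requests to the app server."""
    if path in STATIC_FILES:
        return STATIC_FILES[path]
    if path == "/order":
        return application_server(order or {})
    return "<h1>404 Not Found</h1>"


print(web_server("/about.html"))
print(web_server("/order", {"book": 2, "pen": 5}))
```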

Database

If the requested website relies on data stored in a database, the application server communicates with the database server. The database server retrieves and processes the requested data and returns it to the application server. The application server then combines this data with the HTML template to generate a fully customized webpage. In short, databases are used to store and manage the data a website produces efficiently.
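Here is a small sketch of that round trip using SQLite, which ships with Python, and a simple string template. The table layout, the user "Ada" and the function names are assumptions made up for this example, and a real application would also escape the data before placing it in HTML.

```python
import sqlite3
from string import Template

# Hypothetical HTML template with placeholders for the data.
PAGE = Template("<h1>Hello, $name!</h1><p>You have $count orders.</p>")


def build_database() -> sqlite3.Connection:
    """Create an in-memory database with a little sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id))"
    )
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")
    conn.executemany("INSERT INTO orders (user_id) VALUES (?)", [(1,), (1,)])
    return conn


def render_profile(conn: sqlite3.Connection, user_id: int) -> str:
    """Query the database, then combine the result with the template."""
    row = conn.execute(
        "SELECT u.name, COUNT(o.id) FROM users u "
        "LEFT JOIN orders o ON o.user_id = u.id "
        "WHERE u.id = ? GROUP BY u.id",
        (user_id,),
    ).fetchone()
    if row is None:
        return "<h1>User not found</h1>"
    name, count = row
    return PAGE.substitute(name=name, count=count)


print(render_profile(build_database(), 1))
```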

A database is a collection of data that is organized so that the information within can be easily accessed later. Databases power everything from banking software to scientific research to government records, as well as the websites you use every day, like Amazon, YouTube, Netflix, and Wikipedia. If you found this page through an Internet search engine, your search was powered by a database.
The data within a database needs to be arranged according to a consistent, logical set of underlying principles. The term data model describes the logical structure of a database, which determines the rules for how the information within can be organized and manipulated.

Databases are usually classified according to their data model. Some popular models include the relational, hierarchical and network data models. To interact with a database, database management system (DBMS) software is used to define, store, manipulate, and retrieve the data inside it. Examples of database management systems include network DBMSs and relational DBMSs.

A relational database management system (RDBMS) is software used to store, manage, query, and retrieve data stored in a relational database. A relational database represents data in the form of tables. Examples include MySQL, PostgreSQL and Oracle Database.

Network database management systems (Network DBMSs) are based on a network data model that allows each record to have multiple parents and multiple child records. A network database allows a flexible relationship model between entities.

The following diagram represents a network data model in which the Stores entity has relationships with multiple child entities and the Transactions entity has relationships with multiple parent entities. In other words, a network database model allows one parent to have multiple child record sets, and each record set can be linked to multiple parents and children.

The model can be represented by an upside-down tree, where every member’s information (branch) attaches to the bottom of the tree (owner). Examples include TurboIMAGE, Integrated Data Store (IDS) and Raima Database Manager.
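The "multiple parents, multiple children" idea can be sketched with two dictionaries. This is only an illustration of the data model, not how a network DBMS is implemented. Stores and Transactions come from the description above, while Employees and Customers are hypothetical entities added to complete the picture.

```python
from collections import defaultdict


class NetworkModel:
    """Keep track of parent/child links where a record may have many of each."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def link(self, parent: str, child: str) -> None:
        self.children[parent].add(child)
        self.parents[child].add(parent)


model = NetworkModel()
model.link("Stores", "Transactions")
model.link("Stores", "Employees")        # hypothetical second child of Stores
model.link("Customers", "Transactions")  # hypothetical second parent of Transactions

print(sorted(model.children["Stores"]))
print(sorted(model.parents["Transactions"]))
```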

Conclusion

The process of typing “https://www.google.com” and hitting Enter involves a remarkable sequence of events behind the scenes. From the DNS request to the TCP/IP connection, passing through firewalls and load balancers, reaching web servers and application servers, and accessing databases, numerous components work together to deliver the website you requested. Understanding this journey sheds light on the complexity and sophistication of the internet infrastructure and helps us appreciate the seamless and fast browsing experience we often take for granted.