Selenium WebDriver Architecture

Selenium is a robust automation testing tool that is widely used for testing web applications. The Selenium suite provides several tools and libraries for different testing needs, including Selenium WebDriver, which plays the central role in automating web browsers.

If you are an automation tester, a good understanding of the architecture of Selenium WebDriver, particularly in Selenium 4, will help you test your application efficiently.

Selenium consists of:

Selenium IDE
Selenium Grid
Selenium WebDriver

Selenium Components

Selenium IDE: Selenium IDE (Integrated Development Environment) was a Firefox plugin in Selenium 3, but with Selenium 4 it is also available for other browsers such as Chrome. With Selenium IDE we can record and play back scripts, and we can also create scripts by hand. It has a few restrictions, however, which is why we need Selenium WebDriver (or, historically, Selenium RC) to write more advanced test cases.

Selenium RC: Selenium RC stands for Selenium Remote Control. When Selenium was first introduced as Selenium 1, Selenium RC was the main project. It relied on JavaScript injected into the browser to drive automation, and it supported several languages such as Java, C#, Ruby, PHP, Python, and Perl, as well as most browsers. After Selenium 2 was released, the use of Selenium RC went down, and it is now officially deprecated.

Selenium WebDriver: Selenium WebDriver is a browser automation framework that receives commands from the test script and sends them to the browser through a browser-specific driver (FirefoxDriver, IEDriver, ChromeDriver, etc.). Each driver controls its browser by communicating with it directly. Selenium WebDriver has official bindings for Java, C#, Python, Ruby, and JavaScript, and community-maintained bindings exist for other languages such as PHP and Perl.

Selenium Grid: It is a tool used with Selenium WebDriver (and, earlier, Selenium RC) to run tests in parallel on different machines and browsers. With it, we can run multiple tests on different machines, browsers, and configurations at the same time. In Selenium Grid, one machine acts as the hub and the others act as nodes.
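The hub and node idea can be pictured with a small toy model in Python. This is only a simplified illustration, not how Selenium Grid is actually implemented: the `Node` and `Hub` classes, the node names, and the "first free node wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A machine that can run a limited number of browser sessions."""
    name: str
    browsers: set
    max_sessions: int = 1
    active: int = 0

    def can_run(self, browser):
        return browser in self.browsers and self.active < self.max_sessions


@dataclass
class Hub:
    """Receives session requests and hands each one to a suitable node."""
    nodes: list = field(default_factory=list)

    def register(self, node):
        self.nodes.append(node)

    def assign(self, browser):
        """Return the name of the first free node offering the browser, or None."""
        for node in self.nodes:
            if node.can_run(browser):
                node.active += 1
                return node.name
        return None


hub = Hub()
hub.register(Node("node-linux", {"chrome", "firefox"}, max_sessions=2))
hub.register(Node("node-mac", {"safari", "chrome"}))

# Five tests ask for a browser at the same time.
assignments = [hub.assign(b) for b in ["chrome", "safari", "chrome", "chrome", "edge"]]
print(assignments)
# ['node-linux', 'node-mac', 'node-linux', None, None]
```

The last two requests get no node: every machine that offers Chrome is busy, and no registered node offers Edge. A real Grid would typically queue such requests rather than drop them.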

Selenium WebDriver API: First we have to know what an API is and what it is used for. An API is a set of rules and specifications that software programs follow to communicate with each other. It behaves like an interface between different software programs, in the same way that a UI is an interface between humans and computers. Similarly, the Selenium WebDriver API handles the communication between programming languages and the browser.

We write the test script in a programming language like Java, C#, Python, etc. to automate web applications, but the browser doesn’t understand those languages. The WebDriver API and the browser driver translate the script’s calls into commands the browser can execute.
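As a rough illustration of the script side, the sketch below uses the Python bindings from Selenium 4. It assumes the `selenium` package and Chrome are installed and that a matching ChromeDriver can be found (recent Selenium 4 releases can usually locate or download one themselves). The function name and the URL are placeholders chosen for this example.

```python
def read_page_title(url):
    """Open url in Chrome through WebDriver and return the page title."""
    # Imported here so the file still loads on machines without Selenium.
    from selenium import webdriver

    driver = webdriver.Chrome()   # starts ChromeDriver and a Chrome window
    try:
        driver.get(url)           # sent to the driver as a "navigate" command
        return driver.title       # the driver answers with the title
    finally:
        driver.quit()             # ends the session and closes the browser


# Example call (needs Chrome installed):
# print(read_page_title("https://example.com"))
```

The script never talks to Chrome in Python terms. Each call on `driver` is handed to the client library, which passes it on to the driver in a form the browser side understands.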

Components of Selenium Architecture

Understanding the architecture of Selenium WebDriver is essential for writing effective and maintainable automated tests. With a better understanding of the underlying components and their interactions, developers can leverage Selenium WebDriver to create robust test suites that ensure the quality and reliability of web applications.

Selenium WebDriver Architecture of 3.0

The Selenium Architecture of WebDriver in Selenium 3.0 contains the following components:

Selenium Client Library
Browser Drivers
JSON Wire Protocol over HTTP
Browsers

Selenium Client Library: Selenium is available in different flavors, which means we can use it from languages like Java, Ruby, Python, etc. This is possible because the Selenium developers created language bindings that allow Selenium to support multiple languages. The bindings for each supported language can be downloaded from the official Selenium website.

JSON Wire Protocol over HTTP: JSON (JavaScript Object Notation) is used to transfer data between server and client on the web. JSON Wire Protocol is a REST API that transfers information between the client library and an HTTP server. Each browser driver (FirefoxDriver, ChromeDriver, IEDriver, etc.) has its own HTTP server.

Browser Drivers: To automate a browser, each browser vendor provides a specific driver. The driver communicates with the browser without revealing the internal logic of the browser's functionality. When we execute a script, each command is delivered to the driver and executed in the respective browser, and the result comes back in the form of an HTTP response.

Browsers: Selenium supports almost all popular browsers like Firefox, Chrome, Safari, Edge, etc.
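To make the wire format concrete, here is a minimal Python sketch of a legacy "new session" exchange under the JSON Wire Protocol. The helper names and the sample reply (including the session id) are invented for illustration. Only the `desiredCapabilities` request key and the `sessionId`, `status`, and `value` reply fields come from the protocol itself. Nothing is sent over the network.

```python
import json


def legacy_new_session_request(browser_name):
    """Build the (method, path, body) of a JSON Wire Protocol 'new session' call."""
    body = {"desiredCapabilities": {"browserName": browser_name}}
    return "POST", "/session", json.dumps(body)


def parse_legacy_response(raw):
    """Return (session_id, value); a non-zero status signalled a failure."""
    reply = json.loads(raw)
    if reply["status"] != 0:
        raise RuntimeError(f"driver reported status {reply['status']}")
    return reply["sessionId"], reply["value"]


method, path, body = legacy_new_session_request("firefox")
print(method, path, body)

# A made-up reply in the legacy shape; the session id is a placeholder.
sample_reply = '{"sessionId": "abc123", "status": 0, "value": {"browserName": "firefox"}}'
session_id, value = parse_legacy_response(sample_reply)
print(session_id, value)
```

The numeric `status` field is one of the legacy details that the later W3C protocol dropped in favor of named error strings.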

Selenium WebDriver Architecture of 4.0

Selenium 3.0 uses the JSON Wire Protocol for communicating requests and responses. Because browser drivers were increasingly built around the W3C standard, requests and responses had to be encoded and decoded between the two protocols, which could cause slow test execution and flakiness.

Selenium 4 adopted the W3C WebDriver protocol instead. Client libraries and browser drivers now speak the same standardized protocol, which eliminates the need for encoding/decoding between protocols. Commands still travel as HTTP requests and responses with JSON bodies.


In Selenium 4.0, the Selenium WebDriver architecture consists of the following four major components:

Selenium Client Libraries
Browser Drivers
WebDriver W3C Protocol
Real Browsers

In Selenium WebDriver 4.0, most components remain the same as in Selenium 3.0, but the JSON Wire protocol is replaced by the new W3C WebDriver protocol.
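One place where the protocol change is visible is element lookup. The W3C specification defines five locator strategies (css selector, link text, partial link text, tag name, xpath), while the legacy protocol also accepted id, name, and class name. The Python sketch below shows one way a client could rewrite the legacy strategies as CSS selectors. The function name `to_w3c_locator` is hypothetical, and escaping of special characters in the value is left out to keep the example short.

```python
W3C_STRATEGIES = {"css selector", "link text", "partial link text", "tag name", "xpath"}


def to_w3c_locator(strategy, value):
    """Rewrite a legacy (strategy, value) pair into one the W3C protocol accepts."""
    if strategy == "id":
        return "css selector", f'[id="{value}"]'
    if strategy == "name":
        return "css selector", f'[name="{value}"]'
    if strategy == "class name":
        return "css selector", f".{value}"
    if strategy in W3C_STRATEGIES:
        return strategy, value          # already valid, pass through unchanged
    raise ValueError(f"unknown locator strategy: {strategy}")


print(to_w3c_locator("id", "login"))        # ('css selector', '[id="login"]')
print(to_w3c_locator("class name", "btn"))  # ('css selector', '.btn')
print(to_w3c_locator("xpath", "//a"))       # ('xpath', '//a')
```

From the test author's point of view nothing changes: locating by id still works, but what goes over the wire is a strategy the standard defines.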

WebDriver W3C Protocol

The W3C WebDriver protocol is the protocol used by Selenium 4.0. It is standardized by the W3C, the group behind web standards. Client libraries and browser drivers exchange data directly in this one format, without the JSON Wire Protocol in between. Because browser vendors implement the same specification, tests behave more consistently across browsers without script changes, which improves the stability of Selenium 4.0 tests.
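The following Python sketch shows roughly what W3C-style messages look like. The helper names and all sample replies (session id, element id, error text) are made up for the example. The `capabilities`/`alwaysMatch` request shape, the single `value` field in replies, and the element reference key are taken from the W3C WebDriver specification.

```python
import json

# Key under which the W3C protocol returns a reference to a web element.
W3C_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def w3c_new_session_body(browser_name):
    """Build the JSON body of a W3C 'new session' request."""
    return json.dumps({"capabilities": {"alwaysMatch": {"browserName": browser_name}}})


def parse_w3c_reply(raw):
    """Return the 'value' of a W3C reply, raising if it describes an error."""
    value = json.loads(raw)["value"]
    if isinstance(value, dict) and "error" in value:
        raise RuntimeError(f"{value['error']}: {value.get('message', '')}")
    return value


print(w3c_new_session_body("chrome"))

# Placeholder replies in the W3C shape.
session = parse_w3c_reply(
    '{"value": {"sessionId": "abc123", "capabilities": {"browserName": "chrome"}}}'
)
element = parse_w3c_reply(json.dumps({"value": {W3C_ELEMENT_KEY: "e-1"}}))
print(session["sessionId"], element[W3C_ELEMENT_KEY])
```

Errors arrive in the same envelope, as a `value` object containing an `error` string and a `message`, which is why `parse_w3c_reply` checks for that key instead of a numeric status code.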

How does Selenium WebDriver work internally?

When you run a Selenium script written with any of the supported Selenium client libraries, for example Java, the browser is launched and performs the actions specified in the script.

Here’s a simplified explanation of how the Selenium architecture works:

  • You write your automation script/code in a supported programming language like Java, Python, C#, Ruby, etc. This script is the client.
  • The browser driver acts as an intermediary between your client script and the actual browser. It runs as a small HTTP server. For remote or Grid runs, a Selenium server additionally sits between the script and the driver.
  • For each browser (Chrome, Firefox, Safari, etc.), there is a separate browser driver executable (chromedriver.exe, geckodriver.exe, etc.) that controls and communicates with that specific browser.
  • When you run your client script, the client library turns each command into an HTTP request and sends it to the driver (or to the Selenium server, which forwards it).
  • The driver translates those requests into commands that the browser can understand and passes them on to the browser.
  • The browser executes those commands (e.g. navigate to a URL, click a button) and reports the result back to the driver.
  • The driver sends this result back to your client script as an HTTP response.
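The steps above can be sketched in Python as the sequence of HTTP requests a short script would produce. This is an illustration under stated assumptions, not the client library's real code: the function names, session id, and element id are placeholders, `http://localhost:9515` assumes ChromeDriver running on its default port, and the requests are only built, never sent. The endpoint paths follow the W3C WebDriver specification.

```python
import json
from urllib.request import Request

DRIVER_URL = "http://localhost:9515"  # assumed: a local ChromeDriver on its default port


def build_request(method, path, payload=None):
    """Create (but do not send) one HTTP request addressed to the driver."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json; charset=utf-8"} if data is not None else {}
    return Request(DRIVER_URL + path, data=data, method=method, headers=headers)


def script_as_requests(session_id, element_id):
    """Requests for: start browser, open page, find a link, click it, quit."""
    base = f"/session/{session_id}"
    return [
        build_request("POST", "/session",
                      {"capabilities": {"alwaysMatch": {"browserName": "chrome"}}}),
        build_request("POST", f"{base}/url", {"url": "https://example.com"}),
        build_request("POST", f"{base}/element", {"using": "css selector", "value": "a"}),
        build_request("POST", f"{base}/element/{element_id}/click", {}),
        build_request("DELETE", base),
    ]


# In a real run the ids come from the driver's replies; these are placeholders.
planned = script_as_requests("abc123", "e-1")
for req in planned:
    print(req.get_method(), req.full_url, req.data)
```

Each line of output corresponds to one command in the test script, which is why a slow or unreliable connection between the script and the driver shows up directly as slow or flaky tests.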

I love open-source technologies and am very passionate about software development. I like to share my knowledge with others, especially on technology, which is why I have kept all the examples as simple as possible for beginners to understand. All the code posted on my blog is developed, compiled, and tested in my development environment. If you find any mistakes or bugs, please drop an email to softwaretestingo.com@gmail.com, or you can join me on LinkedIn.
