What is Selenium? Selenium is a browser automation framework. It was built for testing web applications, and it is also widely used for web scraping. A basic Python Selenium configuration is key to getting Selenium to play nice with the web pages you are scraping. Selenium comes in many flavors, with bindings for JavaScript, Python, PHP, and even Pharo Smalltalk (I have no idea; I had never heard of it). I primarily use the Python bindings. The great thing is that Selenium on Python is easy to configure. Below are the basic steps of a Python Selenium configuration.
Step 1: Install a Browser Driver
Selenium supports all major browsers, including Safari. However, I prefer Chrome or Firefox, and I have had much better success getting my code to run smoothly in Firefox. One of the issues you will run into is JavaScript, especially the JavaScript used for ads, which can delay page loading. The great news: this basic Python Selenium configuration contains a few lines of code that add an ad blocker to Firefox. The ad blocker is not needed to start Selenium; however, it will greatly speed up the scraping of ad-heavy websites.
#For macOS use the Homebrew installer. Install Homebrew using the command below, following the instructions in the terminal.
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
#For a Chrome webdriver. Note the install directory; you will need it later.
brew install --cask chromedriver
#For a Firefox webdriver. Note the install directory; you will need it later.
brew install geckodriver
#For Ubuntu follow the steps below to install the Chrome driver. If you want step-by-step instructions for Firefox, let me know.
#Navigate to your home Download directory
cd ~/Downloads
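#Download the chromedriver archive (pick the release that matches your installed Chrome version)
curl -O "https://chromedriver.storage.googleapis.com/100.0.4896.20/chromedriver_linux64.zip"
#Unzip the archive into its own directory
unzip chromedriver_linux64.zip -d chromedriver_linux64
#Navigate to the unzipped directory
cd chromedriver_linux64
#Copy the chrome driver to /usr/local/bin (this needs root permissions)
sudo cp chromedriver /usr/local/bin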
#Validate the binary copied correctly
which chromedriver
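If you would rather confirm the driver location from Python than from the shell, a minimal sketch like the one below does the same job as `which`. The helper name `find_driver` is my own and is not part of Selenium; it only relies on the standard library.

```python
import shutil


def find_driver(name):
    """Return the full path of a driver binary on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    # Check both drivers discussed in Step 1.
    for driver in ("chromedriver", "geckodriver"):
        location = find_driver(driver)
        print(driver, "->", location or "not found on PATH")
```

Whatever path this prints for your driver is the install directory you will want to keep handy for Step 3.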
Step 2: Install Python Selenium
Install the latest version of Selenium using pip. Execute the command below.
python -m pip install selenium
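To double-check that the install worked, and to see which version pip picked, something along these lines should do. It is only a sketch: `installed_version` is a hypothetical helper name, and it assumes Python 3.8 or newer, where `importlib.metadata` is part of the standard library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("selenium")
    print("selenium", found or "is not installed")
```

The version matters because Selenium 3 and Selenium 4 accept different arguments when you create the browser object.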
Step 3: Selenium Python Basic Configuration
Remember the installation directory captured in Step 1? Well, now we are going to use it. Open your favorite Python IDE and import Selenium.
from selenium import webdriver
Now comes the basic Python Selenium configuration. Set the following configuration in code. The Options() class is great for configuring browser options such as running headless or adding profile preferences. You can use profile preferences to set the homepage or disable strict browsing modes.
#Import the Firefox helper classes used below
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
#Configure browser options like running headless
browser_options = Options()
#This is where you configure the binary location for your webdriver
#and add in your browser options. Selenium 4 takes the driver path through
#a Service object; recent releases no longer accept executable_path.
browser = webdriver.Firefox(service=Service(executable_path='/usr/bin/geckodriver'),
                            options=browser_options)
#Configure an ad blocker. The next 2 lines of code are used for ad control.
#They are not needed, but help Selenium to run smoothly.
adblock_plugin = './resources/adblock_for_firefox-5.1.1.xpi'
browser.install_addon(adblock_plugin)
#This line of code loads the website in the browser that Selenium started.
browser.get('https://stocktwits.com')
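The Options class mentioned earlier can be put to work roughly like this. Treat it as a sketch rather than the one true way: the helper names `homepage_preferences`, `build_options`, and `fetch_title` are my own, and the `-headless` argument plus the two `browser.startup.*` preferences are standard Firefox settings that I am assuming you want, not something taken from the configuration above. `fetch_title` also closes the browser when it is done, so stray geckodriver processes do not pile up.

```python
def homepage_preferences(homepage):
    """Firefox preferences that open `homepage` when the browser starts."""
    return {
        "browser.startup.page": 1,  # 1 = show the home page at startup
        "browser.startup.homepage": homepage,
    }


def build_options(headless=True, preferences=None):
    """Build Firefox Options; Selenium is imported here, only when needed."""
    from selenium.webdriver.firefox.options import Options

    options = Options()
    if headless:
        options.add_argument("-headless")  # run Firefox without a window
    for name, value in (preferences or {}).items():
        options.set_preference(name, value)
    return options


def fetch_title(browser, url):
    """Load `url`, return the page title, and always close the browser."""
    try:
        browser.get(url)
        return browser.title
    finally:
        browser.quit()
```

Assuming geckodriver is on your PATH, you could create the browser with `webdriver.Firefox(options=build_options(preferences=homepage_preferences('https://stocktwits.com')))` and then pass it to `fetch_title` along with the URL you want to load.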
The lines of code above will give you a Python Selenium basic configuration. Check back later for another post on how to control Selenium loading speed and input speed.