Using Selenium with Python for Web Scraping and Form Automation Completion
I am currently working intensively with Cypress, a well-known UAT (User Acceptance Testing) framework, to write some tests! In parallel, I built a lot of POCs with Selenium and Beautiful Soup. From what I have read and understood, Selenium’s logic is very similar to that of a UAT framework. However, making Selenium work is more tedious, as you must crank out a lot of code with sometimes arcane syntax! There is no handy abstraction layer like the one in Cypress or CodeceptJS. Anyway, these tools share a common purpose, executing automation use cases in a browser, and Selenium can do a lot.
The two main actions I wanted to implement with Selenium were:
- Scraping web pages: You can definitely combine Selenium with Beautiful Soup to scrape elements from the pages of a web application.
- Task automation: The more I discovered Selenium, the more I wondered what the practical and learning value would be of automating some of a P.O’s boring daily tasks.
You can grab the source for this post and some other resources on my GitHub account: Code for using_selenium_web_scraping_automation
You can find another post on this blog about Web scraping, Beautiful Soup, Selenium: Web scraping, Beautiful Soup, Selenium – Various explorations in web scraping with Python and jumping timidly in surveillance capitalism
1. Requirement: Install XAMPP to have a local WP Frontend and Backend
If you want the exact same development environment, you need a local WP installation and an entry in your hosts file for the domain cypress.mydomain.priv, so that a URL such as https://cypress.mydomain.priv/wordpress/ leads to your local WP.
- Download XAMPP https://www.apachefriends.org
- Download WordPress https://wordpress.org/download/
# edit your hosts on a mac
sudo -s
vi /etc/hosts
# type I for insert
# cut and paste the domain cypress.mydomain.priv
127.0.0.1 cypress.mydomain.priv
# save ctrl+C then :wq
# you are good
# quit the root session
exit
# just ping in the console, to ensure that it is OK
ping localhost
ping 127.0.0.1
ping cypress.mydomain.priv
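If you would rather check the mapping from Python than with ping, a minimal sketch could look like this. It only assumes the cypress.mydomain.priv entry shown above; `parse_hosts` and `resolves_to_loopback` are hypothetical helper names, not part of any library.

```python
import socket


def parse_hosts(text):
    """Map each hostname found in hosts-file text to its IP address."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping


def resolves_to_loopback(hostname):
    """Return True if the OS resolver sends hostname to 127.0.0.1."""
    try:
        return socket.gethostbyname(hostname) == "127.0.0.1"
    except socket.gaierror:  # hostname unknown to the resolver
        return False


# Sample content mirroring the entry added above.
sample = "127.0.0.1 localhost\n127.0.0.1 cypress.mydomain.priv  # local WP\n"
print(parse_hosts(sample))
```

On the machine where the hosts file was edited, `resolves_to_loopback("cypress.mydomain.priv")` should return `True`.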
2. Scraping web pages for UAT testing
Scraping is fun! Indeed, it is like poaching or reaping what others have sown! It can even be illegal… More prosaically, let’s say it is one of the very first steps in data science. To build your dataset, you often have to extract data from various sources, including websites.
For this post, as my focus is on UAT testing, my purpose was different. For instance, one of my Cypress tests needed an up-to-date array of the main navigation labels. So, instead of looking up the labels every time and in every language, I wrote a script with Selenium and Beautiful Soup to do so.
The first point is straightforward; you can find some code on my GitHub account in the directory: Code for selenium_web_scraping
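As an illustration of that kind of script (not the code from the repository), here is a minimal sketch. It assumes Firefox with geckodriver, that the main navigation links match the CSS selector `nav a`, and hypothetical helper names; adapt the selector to your theme. Selenium renders the page, Beautiful Soup extracts the labels, and the result is serialized as JSON that a test fixture could load.

```python
import json


def fetch_page_source(url):
    """Render url in headless Firefox and return the resulting HTML."""
    # Imported here so the helpers below stay usable without a browser stack.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser


def extract_nav_labels(html, selector="nav a"):
    """Return the visible text of every link matched by selector."""
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    labels = (link.get_text(strip=True) for link in soup.select(selector))
    return [label for label in labels if label]  # skip icon-only links


def labels_to_fixture(labels_by_language):
    """Serialize {language: [labels]} as JSON for a test fixture."""
    return json.dumps(labels_by_language, ensure_ascii=False, indent=2)


# Usage (assumed URL: the local WordPress from section 1):
# html = fetch_page_source("https://cypress.mydomain.priv/wordpress/")
# print(labels_to_fixture({"en": extract_nav_labels(html)}))
```

Keeping the fetch and the parsing in separate functions means the parsing can be checked against a saved HTML file without launching a browser.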
3. Task Automation for the P.O Job
This time, I was more ambitious! For me, any repeatable online task I do as a P.O should be automated so I can focus only on what matters. For instance, I manage backlogs through Jira tickets. The project’s big picture is then shared with the teams on dashboards. So far, nothing unusual.
But, on average, I create a lot of tickets, and as a team we take on between 30 and 40 tickets per Sprint. I also manage the backlog itself, which contains between 40 and 50 spare tickets. Nothing is more boring than typing in Jira, so why not automate ticket creation in Jira from a .csv or .json file?
On reflection, there is an import tool in Jira, so this is a bit useless… Do not reinvent the wheel, and meditate on the Peter Drucker quote below! But what about automating the numerous Google Forms or MS Office forms I work with, or automating WordPress post publication for this blog? If I take a moment to think, there are numerous tasks that can be automated.
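Whatever the target form, the input side looks the same: a list of records read from a .csv or .json file. A minimal, standard-library sketch follows; the helper name `load_records` is an assumption for illustration, and Pandas’ `read_csv` would handle the CSV part equally well.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a list of dict records from a .csv or .json file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))  # header row gives the keys
    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of objects")
        return data
    raise ValueError(f"unsupported file type: {suffix or path.name}")
```

Each record can then drive one form submission, for example by looping over `load_records("tickets.csv")`, where `tickets.csv` is a hypothetical file name.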
The decision-making process is easy: just ask yourself a few questions to decide whether a task should be automated. Sampled from Lucas Soares’s post, here are the two questions:
- Do I do this periodically?
- Does it involve repeatable processes with little or no smart decision making involved?
- I will add a third one, especially important to me: “Is there an existing solution apart from programming?”
If the answer to the first two questions is yes and the answer to the third is no, then you should automate that task.
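The rule fits in a few lines of Python. This is only a restatement of the three questions above; the function and parameter names are illustrative.

```python
def should_automate(is_periodic, is_repeatable, has_existing_solution):
    """Automate only periodic, repeatable tasks that no existing tool covers."""
    return is_periodic and is_repeatable and not has_existing_solution


# Jira ticket creation: periodic and repeatable, but Jira has an import tool.
print(should_automate(True, True, True))   # False
# Publishing a blog post from a file, assuming no ready-made tool fits.
print(should_automate(True, True, False))  # True
```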
4. Some feedback on this experience with Selenium
Two pieces of practical advice from my experience.
- For Selenium, my advice is the following: decompose your use cases to be sure that Selenium actually sees the elements… For instance, I lost a few hours working out how to click on a specific submit button in WordPress, e.g., save draft or publish. I finally discovered that it was probably because the button was too high up and the test was too far down the page at that moment! I should have applied the rule: always go from the general to the specific and not the other way round.
- To parse the CSV, I deliberately used Pandas. Parsing a CSV with Pandas eases data frame manipulation. You can even apply some preliminary, data-science-style filtering before creating anything.
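Both pieces of advice can be sketched in code. The example below is an illustration under assumptions, not the code from the repository: `load_rows`, `run_steps` and `click_when_ready` are hypothetical helpers, and the URL and locator in the usage comment are placeholders. `load_rows` reads a CSV with Pandas and filters it before anything is created; `run_steps` executes small named steps in order and reports the first one that fails, which is the general-to-specific decomposition; `click_when_ready` scrolls a button into view and waits until it is clickable, which targets the off-screen button problem.

```python
def load_rows(csv_path, **filters):
    """Read a CSV with Pandas; keep rows whose columns equal the given values."""
    import pandas as pd  # imported here so run_steps works without Pandas

    frame = pd.read_csv(csv_path)
    for column, value in filters.items():
        frame = frame[frame[column] == value]
    return frame.to_dict("records")


def run_steps(steps):
    """Run (name, action) pairs in order; return the first failing name, or None."""
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # report any failure, then stop
            print(f"FAILED {name}: {exc!r}")
            return name
        print(f"ok     {name}")
    return None


def click_when_ready(driver, locator, timeout=10):
    """Scroll the element at locator into view, wait until clickable, click."""
    # Imported here so run_steps stays usable without Selenium installed.
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    wait = WebDriverWait(driver, timeout)
    element = wait.until(EC.presence_of_element_located(locator))
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    wait.until(EC.element_to_be_clickable(locator)).click()


# Usage sketch (placeholder URL and locator, to adapt to your page):
# from selenium import webdriver
# from selenium.webdriver.common.by import By
# driver = webdriver.Firefox()
# failed = run_steps([
#     ("open page", lambda: driver.get("https://cypress.mydomain.priv/wordpress/")),
#     ("click submit", lambda: click_when_ready(
#         driver, (By.CSS_SELECTOR, "button[type='submit']"))),
# ])
```

When a run stops, the name returned by `run_steps` tells you which small step Selenium could not complete, which is usually faster than debugging one long script.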
… and food for thought
- More reflection on the scope of today’s jobs: in many full or half “Bullshit jobs*”, agile organizations included, you spend valuable time doing useless things like filling out different kinds of submission forms! You spend a lot of time creating tickets, publishing on Confluence, creating shareable Google Docs, preparing release communications, sending support mails… I hope for you that your added value lies somewhere else. With the method shown below, using Selenium with Python, you can decide to automate the typing of any form, from Google Forms to a Jira ticket form, a WordPress post or a CMS of any kind. * Bullshit Jobs by David Graeber
- For some, the sky is the limit! For me, to keep the task automation reflection in check, I always have in mind this Peter Drucker quote: “There is surely nothing quite so useless as doing with great efficiency what should not be done at all.” It must be your compass for task automation.
5. A few extras: takeaways on Popcorn Flow and the Law of Two Feet
As always, I tied practice to theories I had heard of! So, incidentally, this time I modestly dug deeper into Popcorn Flow by Claudio Perrone and skimmed the Law of Two Feet. You can find quick notes on my GitHub account.
Conclusion: Well, what conclusion can be drawn from these experiences? What works for me may not work for others… Automation may be overkill and, as a P.O, I should perhaps push the team to do this work instead of doing it myself. Anyway, it freed up a lot of my time for things other than work while building trust within the whole team! No complaints.
Videos
#1 Requirements Using Selenium with Python for Web Scraping and Form Automation Completion
#2 Parsing .csv or .json source Using Selenium with Python for Web Scraping and Form Automation Completion
#3 Create a Selenium WebScraper Using Selenium with Python for Web Scraping and Form Automation Completion
More info
- Automated filling forms from a csv using Selenium https://stackoverflow.com/questions/49273708/automated-filling-forms-from-a-csv-using-selenium
- Automating Forms Submissions with Python https://towardsdatascience.com/automating-submission-forms-with-python-94459353b03e
- Read, Write and Parse JSON using Python https://www.geeksforgeeks.org/read-write-and-parse-json-using-python/
- USING PYTHON TO LOOP THROUGH JSON-ENCODED DATA https://www.tech-otaku.com/mac/using-python-to-loop-through-json-encoded-data/
- How To Resolve WebdriverException Geckodriver Executable Needs To Be In Path https://www.dev2qa.com/how-to-resolve-webdriverexception-geckodriver-executable-needs-to-be-in-path/
- selenium-python-examples on github https://github.com/search?q=selenium-python-examples
- Selenium Python Tutorial https://www.geeksforgeeks.org/selenium-python-tutorial/
- selenium python https://pythonspot.com/selenium-webdriver/
- Examples on how to use Selenium https://github.com/philipperemy/selenium-python-examples
- Selenium Template https://github.com/joyzoursky/selenium-template
- Selenium in Action https://towardsdatascience.com/selenium-in-action-2fd56ad91be6
- 3. Navigating from selenium-python https://selenium-python.readthedocs.io/navigating.html
- The Impossible Web Scraping https://medium.com/@supernyv/the-impossible-web-scraping-cfe9444d1d5e
- Data_Science_Projects/Factors Influencing Salaries by hsupernyv https://github.com/supernyv/Data_Science_Projects/tree/main/Factors%20Influencing%20Salaries
- hsupernyv on github.com https://github.com/supernyv
- Configuring Python for Web Scraping from tilburgsciencehub.com https://tilburgsciencehub.com/building-blocks/configure-your-computer/task-specific-configurations/configuring-python-for-webscraping/
- Install browser drivers https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/
- 12 Ways to hide your Bot Automation from Detection | How to make Selenium undetectable and stealth https://piprogramming.org/articles/How-to-make-Selenium-undetectable-and-stealth–7-Ways-to-hide-your-Bot-Automation-from-Detection-0000000017.html
- 39 Annoying Things Good Employees Really Hate at Work https://www.inc.com/bill-murphy-jr/39-annoying-things-good-employees-really-hate-at-work-ranked-in-descending-order.html
- 6 features of tasks that can be automated https://medium.com/n8n-io/6-features-of-tasks-that-can-be-automated-b8fcfd79f09c
- Tilburg Science Hub https://github.com/tilburgsciencehub
- Web scraper for rankingthebrands.com https://github.com/tilburgsciencehub/data-ranking-the-brands
- PDF Keywords/sentences finder application based on R https://github.com/tilburgsciencehub/keywords-finder