Quick POC for a All-in-one that provides an SEO dashboard made with Streamlit, managing Screaming Frog automation, storing results in a Database (SQLite) and create data-analysis graphics for SEO reports
So, my objective was to build a dashboard with Streamlit that automate Screaming Frog SEO Spider for large time saves and fast audits then manage the output (csv reports), save it into a SQLite database and analyse it with creating graphics. You can find all the code on my github account at https://bit.ly/3oAPuBP
I. SEO POC
1. Methodology
Like always I am heading for a POC (Proof of Concept). So, pragmatism over theory. POC does not have to be perfect or compliant to some coding best practices or even logical at some point! The real value occurs often during the POC in some low noise signals, so be open minded and listen… As it is said “Knowledge speaks, Wisdom listens”. In this exploration phase, the purpose remains to achieve a fixed goal in a limited amount of time, avoiding spending 2 years for instance on a POC! 🙂 That’s it for the methodology.
2. Value
As a PO, sometime for website, I need to have an eye on SEO KPIs. Even so, I am far from being a specialist, I need to grab and quickly overview key indicators (kinda a PO data-science oriented) and henceforth gather SEO best practices to fine-tune the website eventually.
But, as far as possible, it has to be made by a “tool” that ease these processes so it should be industrialized and automatized. The background idea is always to chain actions between them in order to improve productivity and facilitate decision-making. That’s it for the value proposal. Just remember that learning has a cost and experimental work is expensive! So, Timeboxing exploration is a duty!
3. Individual
What about my objectives me as an individual, they are several:
- Delegate tedious work to machines in order to give me more free time
- Improving my Python’s practice
- Give a more data-science flavor to my resume to become a mixed flavored data-science PO for digital! By the way, for me, Streamlit has solved this paradox between investing on web or data-science. With Streamlit I can do both: (1) IA or Machine learning web tools or (2) handy Web Tools for other dedicated job’s purposes.
This POC is definitely in the second category. That’s it for my personal investment. As a first conclusion, I will say that any POC is a always a good tradeoffs between personal and professional objectives.
4. Sources
This post has been also inspired by these 2 posts but I believe I have taken in a more personal direction the challenge exposed in these 2 posts.
- An SEO guide for automating Screaming Frog with Python
https://www.rocketclicks.com/client-education/an-seo-guide-for-automating-screaming-frog-with-python/ - Badass SEO: Automate Screaming Frog For Large Time Saves and Fast Audits
https://cometfuel.com/learn/automate-screaming-frog/#using-terminal-mac
I quote this post cause I need to remember it, about the inadequacies sometimes between tech and reality.
The first job of an application architect is to “right size” the business or enterprise with the most efficient technology for their use case; not just saddle them with your pet preferences because it’s the latest “fad-tech” from Google or it’s what they taught you in school.
Source: https://beau-beauchamp.medium.com/php-is-killing-python-2be459364284
II. Build up the app (the POC)
Ok let’s build our dashboard! Like many softwares, Screaming Frog SEO Spider offers the ability to run without User Interface via Command line. With help of Streamlit, you can build your own webapp (GUI) to industrialize, automate repetitive tasks then transform raw data into useful information. That is my objective.
1. Guidelines
So, let’s build this GUI with these few simple guidelines defined earlier:
- Streamlit library will be the “wrapper” for our application. (https://streamlit.io/)
- SQLite will be used as database. I gave up completely MySQL as SQLite is a serverless database. It gives more flexibility to implement a database behind an app, that is the main reason why SQLite is often used in mobile development. (https://sqlite.org)
- SQLAlchemy will be use to query the database to offer again flexibility and minimize the written code in Python. Thanks to this abstraction (ORM), it is possible to connect to a large number of databases (Firebird, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SQLite, Sybase) and above all to standardize and considerably minimize your code in order to make it readable by other people and finally reusable. That’s savvy! (https://www.sqlalchemy.org/)
- Screaming Frog SEO Spider is a website crawler that helps you improve onsite SEO. (https://www.screamingfrog.co.uk/seo-spider/)
Behind all these technical recommendations, there are few key ideas that again should be exposed because these ideas have also prevailed in this POC.
2. Few more ideas
2.1 praise for simplicity
As a PO, you might suffer from the information and tools dissemination that prevent you from having a big picture of your products. So handling everything in the same screen helps to minimize information’s confusion. It should be the target of any monitoring application. I will call it praise for simplicity.
For an application, think to centralize and to minify the info, never unnecessarily expand it. That’s a waste of time for user.
2.2 use cases decision matrix
Always think of the most extreme use cases and the non-user use cases for your web application. It will force you to
look beyond the objective and look for the most attractive use cases. That is also a practical strategy to shorten the exploration phase to look for attractive use cases in order to save time on the primary exploration phase.
What the purpose of building an application that won’t be use ?
2.3 code prospecting for usages
Capture the most possible information about similar existing web application like the one you want to create. For
instance, browsing through Git to grab some code you will prevent you from reinventing the wheel. That’s a usages benchmark that can be withdrawn from existing codes.
2.4 collect user feedback first then analyse
Like for any data science project, try to collect the data yourself by mingling with other users than you. One advice if looking user feedback, be sure to record everything of their responses. While you are listening, your bandwidth is very low and it is almost impossible to detect the relevant vs the irrelevant in their feedback. You will make analysis it but later!
3. The DB: using SQLITE
Instead of MySQL, I am using SQLITE and like I said before: it is easier to implement but the syntax slightly differs from MySQL.
3.1 Advices
You can name your database with both extension .sqlite3
or .db
e.g. streamlit_sqlalchemy_example.sqlite3 or streamlit_sqlalchemy_example.db
# name of your database engine = create_engine( 'sqlite:///data/streamlit_sqlalchemy_example.sqlite3') |
A good practice is to put the db file in a directory then you find it easily e.g. data is my directory name for db
files.
# Valid SQLite URL forms are: sqlite:///:memory: (or, sqlite://) sqlite:///relative/path/to/file.db sqlite:////absolute/path/to/file.db |
2 important lines to stress how to connect, for the rest, you can check the files available on my github account.
# this line create the empty tables Base.metadata.create_all(engine) # it imports the table that you need e.g Products from streamlit_sqlalchemy_database import Products |
3.2 SQLite command reminder
Some useful commands for SQLite
# TO GET INTO SQLITE3, just TYPE the command sqlite3 IN the console sqlite3 #.open /your-path/SOURCE.db # OPEN a connection extension can be .db .open /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/DATA/screamingfrog_websites_crawls_all.db # OPEN a connection extension can be .sqlite3 .open /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/streamlit_sqlalchemy_example/DATA/screamingfrog_websites_crawls_all_new_1.sqlite3 # SHOW TABLES .tables # make a dump .output /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/sqlite_source_dump_2.sql .dump # READ dump .read /Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/sqlite_source_dump_2.sql - command SELECT COUNT(*) FROM websites; SELECT * FROM websites; SELECT COUNT(*) FROM crawls; SELECT * FROM crawls; - DELETE TABLES DROP TABLE websites; DROP TABLE crawls; - DELETE # DELETE FROM TABLE_NAME; DELETE FROM websites; VACUUM; |
4. Describing Wireframe Screens
My app project is divided in 4 main screens and here is the navigation chosen below:
- screen_1 :: General
- screen_2 :: Crawl
- screen_3 :: Parse and Insert
- screen_4 :: Analyze
Making a Wireframe, in addition to being time consuming, also presents the disadvantage that you must define different screen status every time the user is
hitting a button! Honestly, it is obvious that the screens resemblance between the Wireframe and the real application mitigate this step interest. In a way, design screens for your web-application is useless as Streamlit is so quick to build up the screens itself.
To illustrate this non-interest, I put side by side Wireframe (made quickly with Balsamiq) and the real app (made quickly also with Streamlit). The only thing that may have a real interest is to summarize for each screen, with simple words, the User Experience expected like a User Story. I know Graphic Designers or UX Designers may hate me for this. Below my app User Stories.
- navigation :: As a user, I can select
destination in a main navigation dropdown menu on the left. - screen_1 :: General :: As a user, I can add into the database a website (title, url) to be crawled later.
- screen_2 :: Crawl :: As a user, I can launch the crawl with Screaming Frog by selecting a website in a drop-down menu. By clicking on a button to launch the crawl, it will be print out the Screaming Frog command-line and executed it.
- screen_3 :: Parse and Insert :: As a user, I can parse Screaming Frog’s CVS “reports” directory, select a file and launch computation then insert data into the SQLite database.
- screen_4 :: Analyze :: As a user, I can load graphics that analyzed Screaming Frog’s SEO KPIs taken from the SQLite database.
(Screen_1) General
Wireframe (Balsamiq)
Real app (Streamlit)
(Screen_2) Crawl
Wireframe (Balsamiq)
Real app (Streamlit)
(Screen_3) Parse and Insert
Wireframe (Balsamiq)
Real app (Streamlit)
(Screen_4) Analyze
Wireframe (Balsamiq)
Real app (Streamlit)
5. Create alias for your favorite application in the MAC console
I love todo-list because I always forget how to do things. It is nice to have somewhere a little helping note that shed some light on actions that you seldom do. That is the purpose of the lines below even.
So, in order to simplify the command-line, I made a alias to be able to call screaming frog application with the alias
froggy
. Unfortunately it is working in the console but it does not work when you use it into your python scripts. Anyway I bypass this obstacle by using the direct path to the screaming frog application.
5.1 Make a shortcut on a MAC for Screaming Frog SEO
Source : https://cometfuel.com/learn/automate-screaming-frog/#using-terminal-mac
As a shortcut, you can use whatever you want! Make it simple and meaningful e.g. screamingfrogseospider
, screaminjayhawkins
or ranagritando
or a more basic sf (careful to not confuse with symfony for instance…). I will gone with froggy
as a shortcut.
Command only working on the console
# full command working in the console froggy --crawl http://flaven.fr --headless --save-crawl --output-folder "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/reports/" --export-tabs "Internal:HTML" --overwrite --config "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/seo_spider_config_3.seospiderconfig" |
Command working in the python scripts
/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher --crawl http://flaven.fr --headless --save-crawl --output-folder "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/reports/" --export-tabs "Internal:HTML" --overwrite --config "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/seo_spider_config_3.seospiderconfig" |
There are probably tons of explanations to find or to give about this issue but I’d rather move on as there is better to do!
Add a shortcut for Screaming Frog for the mac console
# go to /Users/[username] cd ~ # check if .bash_profile is in this directory ls -la # open with vim or nano vim .bash_profile nano .bash_profile # using textedit native on mac open -e .bash_profile # using sublime text subl .bash_profile # adding an alias for Screaming Frog, I named it froggy alias froggy="/Applications/Screaming\ Frog\ SEO\ Spider.app/Contents/MacOS/ScreamingFrogSEOSpiderLauncher" source ~/.bash_profile # test the froggy shortcut froggy --help |
# commands inside the console working froggy --crawl <url> froggy --crawl https://flaven.fr/ froggy --headless --crawl https://flaven.fr/ froggy --headless --save-crawl --output-folder /Users/brunoflaven/Documents/02_copy/_streamlit_ideas_for_app/ --timestamped-output --crawl https://flaven.fr/ froggy --headless --save-crawl --timestamped-output --crawl https://flaven.fr/ # full command working in the console froggy --crawl http://flaven.fr --headless --save-crawl --output-folder "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/reports/" --export-tabs "Internal:HTML" --overwrite --config "/Users/brunoflaven/Documents/01_work/blog_articles/python-automate-screaming-frog/seo_spider_config_3.seospiderconfig" # open the app from the console open "/Applications/Screaming Frog SEO Spider.app" # testing the shortcut for Screaming Frog SEO Spider.app froggy --help |
5.2 Make a shortcut on a MAC for Sublime Text
To use subl, the Sublime Text bin folder needs to be added to the path. For a typical installation of Sublime Text, this will be located at /Applications/Sublime Text.app/Contents/SharedSupport/bin
. If using Bash, the default before macOS 10.15, the following command will add the bin folder to the PATH environment variable:
# add sublime directly in the console... echo 'export PATH="/Applications/Sublime Text.app/Contents/SharedSupport/bin:$PATH"' >> ~/.bash_profile source ~/.bash_profile |
# check subl --help |
Source: https://www.sublimetext.com/docs/command_line.html#mac
6. Videos
3 additional videos to tackle this post
- Python, Screaming Frog, SEO, Automate, POC Part 1 Manipulating Data with Streamlit & SQLite with the help of SQLAlchemy
- Python, Screaming Frog, SEO, Automate, POC Part 2 Creating Database in SQLite with Streamlit and SQLAlchemy
- Python, Screaming Frog, SEO, Automate, POC Part 3 Creating Database in SQLite with Streamlit and SQLAlchemy
Conclusion
Few conclusions can be drawn from this POC:
- Python, Pandas, Numpy, Plotly and Streamlit… are strong enablers to build personal usage dedicated Software or SaaS. It reduces drastically the entrance fee to the Software or SaaS market! I must admit that learning recent web technologies bring me closer than ever to the “serious” software industry than I thought. I should say it the other way round that is more the Software Industry that shifts to SaaS, Cloud and web technologies since 20 years from now…. after all any website is nothing more than a “software” running in a browser.
- It reinforces me as a user centric PO and shapes me as an “improver”, “tinker” or “builder” for added value tools to improve daily processes and focus on delivering quickly real value! It is F… Faith Declaration!
More infos
- Screaming Frog SEO Spider
- Command line interface for screamingfrog
https://www.screamingfrog.co.uk/seo-spider/user-guide/general/#command-line - Making the Most of Screaming Frog’s Features in 2020
https://www.growth-rocket.com/blog/making-the-most-of-screaming-frogs-features-in-2020/ - SEO Spider Configuration
https://www.screamingfrog.co.uk/seo-spider/user-guide/configuration/ - How To Automate Crawl Reports In Data Studio
https://www.screamingfrog.co.uk/how-to-automate-crawl-reports-in-data-studio/ - Search Solved GitHub Account
https://github.com/searchsolved/search-solved-public-seo - Keyword_Clustering_Steamlit by Search Solved
https://github.com/searchsolved/search-solved-public-seo/tree/main/Keyword_Clustering_Steamlit - Screaming Frog SEO Spider Demo
https://www.youtube.com/watch?v=tkil3pKSFDA - streamlit-cheat-sheet by daniellewisDL
https://github.com/daniellewisDL/streamlit-cheat-sheet - Project-for-User-Auth by digipodium
https://github.com/digipodium/Project-for-User-Auth - discuss.streamlit.io – Format_func (function) – Examples Please
https://discuss.streamlit.io/t/format-func-function-examples-please/11295/5 - discuss.streamlit.io – How to use st.cache with sqlalchemy.orm objects
https://discuss.streamlit.io/t/how-to-use-st-cache-with-sqlalchemy-orm-objects/3329/5 - How to build interactive dashboards in Python using Streamlit
https://towardsdatascience.com/how-to-build-interactive-dashboards-in-python-using-streamlit-1198d4f7061b - Charly Wargnier github account
https://github.com/CharlyWargnier?tab=repositories - Streamlit with SQL and plotly express to make a dashboard
https://github.com/mdsohelmahmood/mdsohelmahmood.github.io/blob/8b6e4d48d23c9ce2d009f1ee36fb5855593ade0f/_notebooks/2021-06-29-Streamlit-01.ipynb - Customer Segment Profiling App with Streamlit
https://antonsruberts.github.io/streamlit-audience/ - Dropdownmenu in Streamlit without brackets and quotes
https://python.tutorialink.com/dropdownmenu-in-streamlit-without-brackets-and-quotes/ - How to build a multi-language dashboard with Streamlit
https://blog.devgenius.io/how-to-build-a-multi-language-dashboard-with-streamlit-9bc087dd4243 - streamlit-gettext by fischerbach
https://github.com/fischerbach/streamlit-gettext - Streamlit by mdsohelmahmood
https://github.com/mdsohelmahmood/Streamlit - streamlit-dashboard-template by amrrs
https://github.com/amrrs/streamlit-dashboard-template - fast.ai
https://github.com/fastai - Dashboard using Streamlit with data from SQL database
https://towardsdatascience.com/dashboard-using-streamlit-with-data-from-sql-database-f5c1ee36b51 - Build a Web App to Group & Plot Excel Files in Python with Streamlit
https://www.youtube.com/watch?v=ZDffoP6gjxc - SQL Exercises: Find salesman commission details where customer grade is 200 or more
https://www.w3resource.com/sql-exercises/sql-quering-on-multiple-table-exercise-7.php - MySQL Sample Databases
https://www3.ntu.edu.sg/home/ehchua/programming/sql/SampleDatabases.html - Data modeling
https://gquercini.github.io/courses/databases/tutorials/data-modeling/ - MySQL Tutorial
https://www.mysqltutorial.org/ - SQLAlchemy — Python Tutorial
https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91 - SQLAlchemy. Guía de inicio (Spanish)
https://j2logo.com/python/sqlalchemy-tutorial-de-python-sqlalchemy-guia-de-inicio/ - SQLAlchemy Tutorial
https://www.tutorialspoint.com/sqlalchemy/index.htm - SQLAlchemy — Python Tutorial
https://towardsdatascience.com/sqlalchemy-python-tutorial-79a577141a91 - SQLalchemy + Python Tutorial (using Streamlit)
https://www.youtube.com/watch?v=lIHGGcnCmFA - streamlit_sqlalchemy_example
https://github.com/digipodium/streamlit_sqlalchemy_example - sqlalchemy.org – The Python SQL Toolkit and Object Relational Mapper
https://www.sqlalchemy.org/ - analyse slqalchelmy
https://github.com/mdsohelmahmood/mdsohelmahmood.github.io/blob/8b6e4d48d23c9ce2d009f1ee36fb5855593ade0f/_notebooks/2021-06-29-Streamlit-01.ipynb - SQLAlchemy ORM Tutorial for Python Developers
https://auth0.com/blog/sqlalchemy-orm-tutorial-for-python-developers/ - Python SQLite – Travailler avec la date et la date et l’heure (French)
https://fr.acervolima.com/python-sqlite-travailler-avec-la-date-et-la-date-et-l-heure/ - Creating a Data Science Portfolio
https://www.maartengrootendorst.com/blog/portfolio/ - Feature Value Analysis: the Supreme Growth Hack
https://www.priceintelligently.com/blog/bid/191076/Feature-Value-Analysis-The-Supreme-Growth-Hack - Nudge theory
https://en.wikipedia.org/wiki/Nudge_theory
Streamlit
SQLite
SQLAlchemy
Other
Nudge theory