Using Ludwig, Introduction to Deep Learning

As a PO, I am always looking for ways to increase productivity and, above all, to avoid doing boring stuff over and over! So I often end up writing code for my own use to avoid tedious work, or even the work itself!

I am currently working with Python and trying to sneak out practical results quickly with the help of ML, NLP or AI. As I read a lot about AI, I stumbled upon Ludwig! So this post is my personal collection of tchotchkes on Ludwig.

The more I read about Ludwig, the more it felt like the right tool for me: curious about AI, but lazy and reluctant when it comes to theoretical explanations and coding. Anyway, enough pointless personal considerations, let’s introduce Ludwig and then use it to train, predict and visualize!

All the files shown in the 3 videos can be found here: https://github.com/bflaven/BlogArticlesExamples/tree/master/using_ludwig_introduction_to_deep_learning

1. What is Ludwig?

Ludwig is a code-free deep learning toolbox. It is supported by an Uber research scientist named Piero Molino. Ludwig lets people without a machine learning background train prediction models without writing code.

Ludwig makes deep learning easier to understand for non-experts and enables faster model improvement iteration cycles for experienced machine learning developers and researchers alike.

Ludwig is a good starting point for covering many ML use cases. It has:

flexibility: both expert users and novices can take advantage of Ludwig.
extensibility: it is really easy to add new models to Ludwig.

Ludwig makes the model understandable, at least in terms of what it is predicting and how good the predictions are.

It streamlines the ML process and assists you throughout the workflow: it splits the data into training, validation and test sets, trains the model on the training set, validates on the validation set (early stopping), predicts on the test set… and so on.

You really only have to set the input and output features and declare their data types. This model definition requires no coding because it all lives in a .yml file.

The real issue you may encounter, if you are not familiar with ML, is understanding what you are actually doing. That is my case: in the past, I have mostly used pretrained models, so I am not aware of all the subtleties of a model or of the way to fine-tune one! What a schmock!

There are plenty of examples in the official documentation at https://ludwig-ai.github.io/ludwig-docs/examples/

2. Install LUDWIG

Ludwig requires you to use Python 3.6+. If you don’t have Python 3 installed, install it by running:

sudo apt install python3  # on ubuntu
brew install python3      # on mac

You may want to use a virtual environment to maintain an isolated Python environment.

virtualenv -p python3 venv
source venv/bin/activate  # on Linux/macOS: activate it before installing anything

In order to install Ludwig just run:

pip install ludwig

Check LUDWIG Installation

ludwig -h

# OUTPUT
 
-h, --help  show this help message and exit
ludwig cli runner
ludwig --help or ludwig -h
 
Available sub-commands:
   train                 Trains a model
   predict               Predicts using a pretrained model
   evaluate              Evaluate a pretrained model's performance
   experiment            Runs a full experiment training a model and evaluating it
   hyperopt              Perform hyperparameter optimization
   serve                 Serves a pretrained model
   visualize             Visualizes experimental results
   collect_summary       Prints names of weights and layers activations to use with other collect commands
   collect_weights       Collects tensors containing a pretrained model weights
   collect_activations   Collects tensors for each datapoint using a pretrained model
   export_savedmodel     Exports Ludwig models to SavedModel
   export_neuropod       Exports Ludwig models to Neuropod
   preprocess            Preprocess data and saves it into HDF5 and JSON format
   synthesize_dataset    Creates synthetic data for tesing purposes

If that all worked, you are good to go: ready to be part of the AI revolution and, at the very least, to add some wow to your professional life.

3. Using LUDWIG

Here is a quick memo on important notions that you will often encounter once you get to grips with AI or deep learning.

3.1 A few definitions: epoch, batch, dataset…
Here is an extract of the simplest answer to “What is an epoch?”, taken from https://www.quora.com/What-is-an-epoch-in-deep-learning

Deep learning often deals with copious amounts of data.
This data is broken down into smaller chunks (called batches) and fed to the neural networks one-by-one.
One epoch is when the entire dataset is passed forward and backward through the neural network once.
In order to generalize the model, there is more than one epoch in the majority of the deep learning models.
The more the number of epochs, the more the parameters are adjusted thus resulting in a better performing model. However, too many epochs might lead to overfitting. If a model is overfitted, it does well in the train data and performs poorly on the test data.
Iteration is the number of batches needed to complete one epoch.

A simple example to understand the link between all these notions.
Suppose we have a dataset of 42,000 training examples and we divide it into batches of 600. Completing 1 epoch then takes 70 iterations (42,000 divided by 600).
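The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name iterations_per_epoch and the figure of 5 epochs are mine, and I assume that a dataset that does not divide evenly still needs one last, smaller batch, hence the rounding up.

```python
def iterations_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches (iterations) needed to pass over every example once."""
    if num_examples < 0 or batch_size <= 0:
        raise ValueError("num_examples must be >= 0 and batch_size must be > 0")
    # Integer ceiling division: a leftover partial batch still counts as one iteration.
    return (num_examples + batch_size - 1) // batch_size


if __name__ == "__main__":
    per_epoch = iterations_per_epoch(42_000, 600)
    print(per_epoch)           # 70, as in the example above

    epochs = 5                 # hypothetical number of epochs, for illustration only
    print(per_epoch * epochs)  # 350 iterations in total
```

With 42,001 examples instead of 42,000, the same function gives 71, because the extra example needs a batch of its own.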

3.2 Must-knows on Ludwig

3.2.1 Requirements for Ludwig
You do not need to code to train and test models. So what does Ludwig need as input, and what does it give as output? Input: you need a dataset in .csv format plus a declarative model definition (YAML). Output: the files described in the next section.
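To make these two inputs concrete, here is a small sketch that writes a toy .csv dataset and a matching YAML model definition. Everything specific in it is an assumption of mine: the column names review and sentiment, the three sample rows and the file names. The only thing taken from the Ludwig documentation is the general shape of the definition, with a list of input_features and output_features that each have a name and a type.

```python
import csv
from pathlib import Path

# Hypothetical toy dataset: one text column (input) and one label column (output).
ROWS = [
    {"review": "A wonderful, moving film", "sentiment": "positive"},
    {"review": "Two hours I will never get back", "sentiment": "negative"},
    {"review": "Great cast and a clever script", "sentiment": "positive"},
]

# Declarative model definition: the names must match the CSV column headers.
MODEL_DEFINITION = """\
input_features:
  - name: review
    type: text
output_features:
  - name: sentiment
    type: category
"""


def write_ludwig_inputs(folder="."):
    """Write the toy CSV and the YAML model definition, return both paths."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    csv_path = folder / "reviews.csv"
    yaml_path = folder / "model_definition.yaml"

    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["review", "sentiment"])
        writer.writeheader()
        writer.writerows(ROWS)

    yaml_path.write_text(MODEL_DEFINITION, encoding="utf-8")
    return csv_path, yaml_path


if __name__ == "__main__":
    csv_path, yaml_path = write_ludwig_inputs("ludwig_demo")
    print(f"Dataset: {csv_path}")
    print(f"Model definition: {yaml_path}")
```

These two files are what you then hand to the train sub-command. The flag names have changed between Ludwig releases, so rather than trusting my memory, check ludwig train -h for the ones your version expects. Three rows are of course far too few to learn anything; the point is only to show the shape of the inputs.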

3.2.2 Under the hood: where the files that matter live 🙂

After the first training, in the main folder
The first run of Ludwig creates 2 files in the main folder:

The file with the .meta.json extension contains the mapping.
The file with the .hdf5 extension contains the data.

After each training, in the results folder
– In /results/experiment_run/description.json: as the name indicates, this file holds all the meta information about the run: how the command was run, what the input and output features were…

– In /results/experiment_run/training_statistics.json: this file contains basic stats about your model, such as accuracy, loss, etc.

– If you want to use a model for a prediction, you just have to point to its model directory, e.g. /results/experiment_run_3/model
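Since these result files are plain JSON, you can peek at them with the standard library alone. The sketch below is my own helper, not part of Ludwig. Because the exact layout of training_statistics.json differs between Ludwig versions, it assumes no particular key names: it simply walks the nested structure and reports the last value of every list of numbers it finds, which for per-epoch metrics should be the value at the final epoch. The relative path is the example one from above.

```python
import json
from pathlib import Path


def final_values(node, prefix=""):
    """Yield (dotted_key, last_value) for every non-empty list of numbers in nested JSON."""
    if isinstance(node, dict):
        for key, child in node.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            yield from final_values(child, child_prefix)
    elif isinstance(node, list) and node and all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in node
    ):
        yield prefix, node[-1]


def summarize_run(stats_path):
    """Return a flat {metric_path: final_value} dict for one statistics file."""
    stats = json.loads(Path(stats_path).read_text(encoding="utf-8"))
    return dict(final_values(stats))


if __name__ == "__main__":
    path = Path("results/experiment_run/training_statistics.json")
    if path.exists():
        for name, value in summarize_run(path).items():
            print(f"{name}: {value}")
    else:
        print(f"No statistics file found at {path}, train a model first.")
```

The same summarize_run function can be pointed at several experiment_run folders in turn if you want a quick side-by-side look before reaching for the visualize command.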

4. EXTRA STUFF

– You can also compare models with each other using the visualize command.

– There is also a programmatic API that can be used directly in a Python script, like any Python library. Check https://ludwig-ai.github.io/ludwig-docs/api/LudwigModel/
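Here is roughly what using the programmatic API looks like. Take it as a sketch under assumptions: the review and sentiment columns and the CSV path are hypothetical, and the calls follow the LudwigModel interface as I understand it from the documentation linked above (train returning three values, predict returning two). Older releases used different argument names, so check the API page for your version.

```python
# Hypothetical configuration: the names must match the columns of your own CSV.
CONFIG = {
    "input_features": [{"name": "review", "type": "text"}],
    "output_features": [{"name": "sentiment", "type": "category"}],
}


def train_and_predict(csv_path, config=CONFIG):
    """Train a model on csv_path, then predict on the same file (illustration only)."""
    # Imported here so this file can be read and imported without Ludwig installed.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    # Assumed return shape: (training statistics, preprocessed data, output directory).
    train_stats, _, output_dir = model.train(dataset=csv_path)
    # Assumed return shape: (predictions, output directory).
    predictions, _ = model.predict(dataset=csv_path)
    return predictions, output_dir


def predict_with_saved_model(model_dir, csv_path):
    """Reuse an already trained model by pointing at its model directory."""
    from ludwig.api import LudwigModel

    model = LudwigModel.load(model_dir)
    predictions, _ = model.predict(dataset=csv_path)
    return predictions
```

Predicting on the training file, as train_and_predict does, only shows the mechanics; for a real evaluation you would predict on data the model has never seen.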

5. Video tutorials on Ludwig

3 videos with a few small mistakes, but they give a good sense of what Ludwig is and how to use it if you are not an expert!

Conclusion

Well, the very first benefit of Ludwig is psychological! Ludwig lowers the supposedly high entry barriers to concepts such as Deep Learning, Machine Learning, Artificial Intelligence or Visualization. So if, like me, you are a P.O., a project manager or a marketing guy, discovering Ludwig may give you a little sense of what the Artificial Intelligence revolution is about…
I must admit that it does not magically turn you into a data scientist overnight! The proof is that you may find the explanations in my videos a bit shaky or even ludicrous.
Nevertheless, it is a great opportunity to move beyond pre-trained models and experiment with your own. “Don’t think you’re so small, you’re not so tall!”, as the Talmud says.

More infos

Ludwig Examples
https://ludwig-ai.github.io/ludwig-docs/examples/
ludwig-ai on Github
https://github.com/ludwig-ai/ludwig/
Introducing Ludwig, a Code-Free Deep Learning Toolbox
https://eng.uber.com/introducing-ludwig/
[Uber Open Source] Ludwig: A Code-free Deep Learning Toolbox
https://www.youtube.com/watch?v=Ns_6Ep7GAIM
Ludwig: Declarative Deep Learning feat. Piero Molino | Stanford MLSys Seminar Episode 13
https://www.youtube.com/watch?v=BTkl_qc0Plc
Heart Disease UCI on Kaggle
https://www.kaggle.com/ronitf/heart-disease-uci
Intro to Machine Learning Applications
https://github.com/RPI-DATA/course-intro-ml-app
An NLP project to classify reviews as ‘positive’ or ‘negative’ from an IMDB movie reviews dataset.
https://github.com/srbh24/NLP/
Introduction to Ludwig and how to deploy a Deep Learning model via Flask
https://www.adaltas.com/en/2020/03/02/ludwig-deep-learning-flask/
A colab on “Sentiment Analysis with Ludwig” (Sentiment Analysis with Ludwig.ipynb)
https://colab.research.google.com/drive/1J-WmhxCdwvlRDJyvcYd1TDsXhqGOMCRa?usp=sharing
The Complete Guide to Sentiment Analysis with Ludwig — Part I
https://medium.com/ludwig-ai/the-complete-guide-to-sentiment-analysis-with-ludwig-part-i-65a9e6bc054e
The Complete Guide to Sentiment Analysis with Ludwig — Part II
https://medium.com/ludwig-ai/the-complete-guide-to-sentiment-analysis-with-ludwig-part-ii-d9f3952a06c6
The Complete Guide to Sentiment Analysis with Ludwig — Part III: Hyperparameter Optimization
https://medium.com/ludwig-ai/hyperparameter-optimization-with-ludwig-6e31272e43fb
Python for SEO: Complete Guide (in 7 Chapters)
https://www.jcchouinard.com/python-for-seo/
Python SEO by holisticseo
https://www.holisticseo.digital/python-seo/
Binary Categorization of Websites with Tensorflow and Python
https://www.holisticseo.digital/python-seo/website-categorization/
Automated Title Tag Optimization Using Deep Learning
https://www.searchenginejournal.com/automated-title-tag-optimization-using-deep-learning/390207/
ranksense / Twittorials
https://github.com/ranksense/Twittorials
First impressions about Uber’s Ludwig. A simple machine learning tool. Or not?
https://medium.com/gowombat/first-impressions-about-ubers-ludwig-a-simple-machine-learning-tool-or-not-714962bbbedc
Text Sentiments Classification with CNN and LSTM
https://medium.com/@mrunal68/text-sentiments-classification-with-cnn-and-lstm-f92652bc29fd
Gunicorn ‘Green Unicorn’ is a Python WSGI HTTP Server for UNIX.
https://gunicorn.org/
Gilbert Tanner youtube channel
https://www.youtube.com/channel/UCBOKpYBjPe2kD8FSvGRhJwA
How to Produce Quality Titles & Meta Descriptions Automatically
https://www.searchenginejournal.com/titles-meta-descriptions-automatically-python-javascript/360108/
Automated Intent Classification Using Deep Learning in Google Sheets
https://www.searchenginejournal.com/automated-intent-classification-using-deep-learning-google-sheets/353910/
Automated Intent Classification Using Deep Learning
https://www.searchenginejournal.com/automated-intent-classification-using-deep-learning/311309/
ludwig text-classification training
https://dev.to/kojikanao/ludwig-text-classification-training-7ii
Python for Data Science
https://paiml.github.io/python_for_datascience/intro
Human Interface Laboratory, Kyushu University on github
https://github.com/uchidalab
Noah Gift on github
https://github.com/noahgift
ludwig code free deep learning tool box || part 1
https://www.youtube.com/watch?v=6dqG2B0XkFw
ludwig sentiment analysis parallel cnn || part 2
https://www.youtube.com/watch?v=ukErNtrBn_s
ludwig hyper-parameter tuning parallel cnn || part 3
https://www.youtube.com/watch?v=2ztr7k7hTZE
Uber Ludwig Tutorial #1 – What is Ludwig and how does it work
https://www.youtube.com/watch?v=uSOsos2eKHI
Introduction to Uber’s Ludwig
https://gilberttanner.com/blog/introduction-to-ubers-ludwig
Uber Ludwig Tutorial #2 – Working on different data-sets
https://www.youtube.com/watch?v=NB2aiRKZIok
How Lil Nas X Flipped Conservatives’ Culture-War Playbook
https://www.politico.com/news/magazine/2021/04/10/lil-nas-x-montero-satan-devil-politics-controversy-religion-480655
An Introduction to Ludwig Wittgenstein
https://medium.com/curious/an-introduction-to-ludwig-wittgenstein-e866ec78ed06