POC with FastAPI for an NLP API with spaCy, SQLAlchemy, SQLite and… Streamlit

I am still exploring FastAPI’s capabilities as a wrapper for data science APIs, exposing NLP features as a RESTful microservice. This time, however, I have extended the scope and decided to connect an API POC, which leverages spaCy’s NLP features, to a SQLite or MySQL database through an ORM such as SQLAlchemy or Peewee.

For this post, you can find all files for each project on my GitHub account. See https://github.com/bflaven/ia_usages/tree/main/fastapi_database

As a reminder, I already expressed this need in a previous post on how to use a database:

Again, I am mostly building a local environment on my Mac, with the help of Anaconda, as I am still in the product discovery phase and do not yet have to share it with anyone. Once I do need to share it, I will migrate everything to Docker, as it makes maintenance and production deployment easier.

Here is an overview of each GitHub directory attached to this exploration:

  • 001_extra_files: Contains 2 files, 001_fastapi_database.py and create_files_for_fastapi_sql_app.sh. The file “create_files_for_fastapi_sql_app.sh” creates a directory named “sql_app” and, inside it, all the empty files required to start a FastAPI API with a database. The file “001_fastapi_database.py” checks that SQLAlchemy is installed in your environment by creating a database named “example.db” and adding 2 records to it.
  • 002_bugbytes_io_crud_api_fastapi: A notable example documented by both a video and a post from bugbytes.io. Everything you need to know about creating a CRUD API with GET, POST, PUT and DELETE endpoints.
  • 003_bugbytes_io_crud_api_fastapi_sqlmodel: A slightly more advanced example than the previous one, “002_bugbytes_io_crud_api_fastapi”. It introduces SQLModel.
  • 004_bugbytes_io_fastapi_htmx_example: The same code as the two previous examples from bugbytes.io, again taking advantage of the SQLAlchemy ORM.
  • 005_fastapi_tiangolo_tutorial_sql_databases: The example given by the official documentation of FastAPI with SQLAlchemy.
  • 007_sql_databases_peewee: The example given by the official documentation of FastAPI, this time with Peewee.
  • 008_fastapi_mysql_restapi: Uses Docker to create a MySQL database and a phpMyAdmin instance connected to the FastAPI POC.
  • 011_openai_sqlite_nlp_fastapi_streamlit: A simple combination of Streamlit, FastAPI and spaCy to expose “features” such as NER, summarization and tag extraction for text in French, Spanish, English and Russian. The skeleton was written by ChatGPT and then extended manually.
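
To make the NER feature of “011_openai_sqlite_nlp_fastapi_streamlit” more concrete, here is a minimal sketch of per-language entity extraction with spaCy v3. It is not the code from the repository: the names DEMO_PATTERNS, build_pipelines and extract_entities are hypothetical, and a blank pipeline with a rule-based entity_ruler stands in for a trained model so that the example runs without downloading anything. In a real project you would more likely load one trained pipeline per language with spacy.load(), for instance en_core_web_sm, fr_core_news_sm or es_core_news_sm.

```python
import spacy

# Hypothetical demo patterns: they stand in for a trained NER model,
# which would recognise entities without any hand-written rules.
DEMO_PATTERNS = {
    "en": [{"label": "GPE", "pattern": "London"}],
    "fr": [{"label": "LOC", "pattern": "Paris"}],
    "es": [{"label": "LOC", "pattern": "Madrid"}],
}


def build_pipelines(patterns_by_lang):
    """Create one spaCy pipeline per language code."""
    pipelines = {}
    for lang, patterns in patterns_by_lang.items():
        nlp = spacy.blank(lang)               # tokenizer only, no trained model
        ruler = nlp.add_pipe("entity_ruler")  # rule-based entity matcher
        ruler.add_patterns(patterns)
        pipelines[lang] = nlp
    return pipelines


# Build the pipelines once at import time instead of on every call.
PIPELINES = build_pipelines(DEMO_PATTERNS)


def extract_entities(text, lang):
    """Return the entities found in `text` as a list of dicts."""
    if lang not in PIPELINES:
        raise ValueError(f"Unsupported language: {lang!r}")
    doc = PIPELINES[lang](text)
    return [{"text": ent.text, "label": ent.label_} for ent in doc.ents]


print(extract_entities("Je vis à Paris depuis 2010.", "fr"))
```

Keeping the pipelines in a dictionary keyed by language code maps naturally onto a route such as "/entities/{lang}": the path parameter selects the pipeline, and an unknown code can be turned into an HTTP error.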

The prompt behind the FastAPI POC in “011_openai_sqlite_nlp_fastapi_streamlit”

# prompt for ChatGPT to create nlp api with database

In Python, with the help of Spacy in 3 different languages (ES, FR, EN) and FastAPI, write an API named
("title="TrattorIA") that manage an element named "Post", create 2 endpoints: function "healthcheck" available at
"/healthcheck", function "entities" available at "/entities/{lang}" where "lang" is a variable for the 3 different languages. For the NER function "entities" used "body: RecordsRequest = Body(..., example=example_request)" where,
example_request is provided by an external json file named example_request.json.

Please define the appropriate json structure according to the code for the API.

The API rely on a sqlite database named "nlp_post_db.db". It has to be structure like a traditional FastAPI project, the
ORM used is sqlalchemy. Below the wanted structure:

nlp_db_app
├── __init__.py
├── crud.py
├── database.py
├── main.py
├── models.py
└── schemas.py

In the file "models.py", the class Post(Base) contains fields like: post_id, post_title, post_body, post_origin_id,
keywords_ner. The API is saving the result of the NER function "entities" available at "/entities/{lang}".

post_id is the primary_key for each post.
post_title is the title of the post.
post_body is the body of the post.
post_origin_id is the id coming from another database.
keywords_ner is the result of the NER function "entities" available at "/entities/{lang}"

So, when the post_body is passed to the function "entities", the result is saved the result into a table and especially.
entities extracted from the post_body
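
Outside of the prompt itself, here is a rough idea of what it asks for on the database side. This is a sketch under assumptions, not the code ChatGPT produced: the table name “posts”, the column types, the choice to store keywords_ner as a JSON string and the helper create_post are all hypothetical. An in-memory SQLite database is used so the example leaves no file behind, whereas the prompt asks for a file named “nlp_post_db.db”.

```python
import json

from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

# Assumption: in-memory database for the sketch. The prompt's file-based
# database would use the URL "sqlite:///./nlp_post_db.db" instead.
engine = create_engine("sqlite:///:memory:")
SessionLocal = sessionmaker(bind=engine, autoflush=False)
Base = declarative_base()


class Post(Base):
    """ORM model with the fields listed in the prompt."""

    __tablename__ = "posts"  # assumption: the prompt does not name the table

    post_id = Column(Integer, primary_key=True, index=True)
    post_title = Column(String(255))
    post_body = Column(Text)
    post_origin_id = Column(Integer, index=True)  # id from another database
    keywords_ner = Column(Text)  # assumption: entities stored as JSON text


# Create the table if it does not exist yet.
Base.metadata.create_all(bind=engine)


def create_post(db, post_title, post_body, post_origin_id, entities):
    """Persist a post together with the entities extracted from its body."""
    post = Post(
        post_title=post_title,
        post_body=post_body,
        post_origin_id=post_origin_id,
        keywords_ner=json.dumps(entities, ensure_ascii=False),
    )
    db.add(post)
    db.commit()
    db.refresh(post)  # reload the row to get the generated post_id
    return post


db = SessionLocal()
saved = create_post(
    db,
    post_title="Hello",
    post_body="Je vis à Paris.",
    post_origin_id=42,
    entities=[{"text": "Paris", "label": "LOC"}],
)
print(saved.post_id, saved.keywords_ner)
db.close()
```

In the structure requested by the prompt, the Post class would live in models.py, create_post in crud.py and the engine and session factory in database.py.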

Why save data from the API?


Very basically, the results of a machine learning model can be consumed in two ways: immediately, as the response to an API call, or later, when the result (e.g. extracted keywords, a text summary…) is saved into a database. Another process can then merge this data with other data through a database migration.

As a reminder, an ORM makes querying a database with FastAPI easier whatever the database type is (e.g. MySQL, SQLite, etc.). It will make your code more readable, easier to maintain, more scalable and often more performant 🙂 even if this last point is debatable.

As the official FastAPI website says:

An “ORM”: an “object-relational mapping” library. An ORM has tools to convert (“map”) between objects in code and database tables (“relations”).

Source : https://fastapi.tiangolo.com/tutorial/sql-databases/

API Structure with a database


FastAPI also gives some good advice, especially on application structure, for building an API with an ORM like SQLAlchemy ORM (part of SQLAlchemy, independent of framework) or Peewee (independent of framework).

Example of API structure

    sql_app
    ├── __init__.py
    ├── crud.py
    ├── database.py
    ├── main.py
    ├── models.py
    └── schemas.py
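
To give an idea of how these files divide the work, here is a minimal sketch of what database.py typically holds in such a layout. The names SQLALCHEMY_DATABASE_URL, SessionLocal, Base and get_db are a common convention rather than something this post prescribes, and the in-memory URL is an assumption made to keep the example self-contained.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker

# Assumption: in-memory SQLite. A file-based database would use a URL such
# as "sqlite:///./sql_app.db", usually together with
# connect_args={"check_same_thread": False} when served by FastAPI.
SQLALCHEMY_DATABASE_URL = "sqlite:///:memory:"

# The engine owns the connections, the session factory hands out units of
# work, and Base is the parent class of every model defined in models.py.
engine = create_engine(SQLALCHEMY_DATABASE_URL)
SessionLocal = sessionmaker(bind=engine, autoflush=False)
Base = declarative_base()


def get_db():
    """Yield one session per request and always close it afterwards."""
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()


# Outside FastAPI, the generator can be driven by hand to check the setup.
db_gen = get_db()
db = next(db_gen)
print(db.execute(text("SELECT 1")).scalar())
db_gen.close()  # runs the `finally` block, which closes the session
```

With this in place, main.py can declare a route parameter as db = Depends(get_db), crud.py receives the session as an argument, models.py subclasses Base, and schemas.py holds the Pydantic models used for request and response bodies. Switching from SQLite to MySQL then mostly comes down to changing the database URL.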

Conclusion: It is frankly good to know that it is easy to integrate any kind of database into your API project with FastAPI. Nevertheless, the most valuable lesson from this POC is to rely on the practice ranked first among the FastAPI best practices, “1. Project Structure. Consistent & predictable”, where “the best structure is a structure that is consistent, straightforward, and has no surprises”!

Source : https://github.com/zhanymkanov/fastapi-best-practices

Videos accompanying this post

#1 POC with #fastapi for an #nlp #api with #spacy #sqlalchemy #sqlite and… #streamlit

#2 POC with #fastapi using an ORM like #sqlalchemy or #peewee for a #sqlite #database

#3 POC with #fastapi and #docker using an ORM like #sqlalchemy for a #mysql #database

More info