Picture by Pixabay from Pexels

Coding can be hard, especially when working on a project with several developers. Each team member has their own way of coding, which leads to very heterogeneous scripts.
This is why it is important to share the same code formatter and code linter, which keeps your git commits cleaner. Formatting and linting can run either between the staging and committing phases or during the CI/CD chain.

In this article, we will see how to do so as a pre-commit step using git hooks.
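
To make the idea of a pre-commit step concrete, here is a minimal sketch of a plain Git hook written in Python. It is not necessarily the setup the article builds: the file location `.git/hooks/pre-commit` is standard Git behaviour, but the `CHECKS` list, the `run_checks` helper, and the `src` directory are hypothetical names chosen for illustration, and the sketch assumes Black and Pylint are installed in the active environment.

```python
#!/usr/bin/env python3
"""Hypothetical pre-commit hook: save as .git/hooks/pre-commit and make it executable."""
import subprocess
import sys

# Assumed check commands; adjust the targets and options to your project.
CHECKS = [
    ["black", "--check", "."],  # exits non-zero if any file would be reformatted
    ["pylint", "src"],          # "src" is a placeholder package directory
]


def run_checks(checks, runner=subprocess.call):
    """Run each command in order; return the first non-zero exit code, or 0."""
    for command in checks:
        exit_code = runner(command)
        if exit_code != 0:
            print("pre-commit check failed: " + " ".join(command), file=sys.stderr)
            return exit_code
    return 0
```

In the hook file itself, the last line would be `sys.exit(run_checks(CHECKS))`: Git aborts the commit whenever the hook exits with a non-zero status, so a formatting or linting failure blocks the commit until it is fixed.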

Table of contents

The summary is as follows:

  1. Black
  2. Pylint
  3. Pre-commits as Git Hooks


Picture by Brett Sayles from Pexels

As a data scientist working on a complex project with other developers, you very often need to package your AI algorithm into an API that the backend can call to coordinate your app. Using an API has several advantages, making your predictions more efficient and less time-consuming.

In this article, we will go through the definition of an API in general, with a deep dive into RESTful ones, and then create an API in Python using the modules Flask and FastAPI. …

Photo by Jelmer Assink on Unsplash

When the Covid pandemic hit, my gym closed because the whole country was in lockdown. As a result, I went from four workout sessions a week to zero. Having exercised for the past five years, I had to adapt, regardless of the fact that I live in a medium-sized apartment in Paris. After a while, I was able to get back to my previous rhythm using the equipment I bought.

I am well aware of the fact that many people have the same issue and since a lot of my friends and…

Photo by Joshua Aragon on Unsplash

A data science project can require many skills, from theoretical to technical ones.
In this article, I will mainly focus on some of the most important tools to have and work with: tools that allow better, cleaner coding and faster collaboration.

Table of contents

The summary is as follows:

  1. Visual Studio Code
  2. Bash commands
  3. Virtual environment
  4. Unit Testing

1. Visual Studio Code

Photo by Markus Spiske from Pexels

As a consultant data scientist, I’m very aware of the importance of summarizing my work in dashboards and apps.
This allows me to make my algorithms and work accessible and to turn them into intuitive graphics for better and faster understanding. In this article, we will go through some of the most popular tools and Python libraries used to design dashboards and applications.

Table of contents

The summary is as follows:

  1. Dash by Plotly
  2. Streamlit
  3. Bokeh
  4. Kibana
  5. Heroku deployment

Dash by Plotly

Dash is an open-source tool developed by Plotly. It lets you insert multiple types of widgets, with the possibility of choosing the layout and the style…

Photo by João Silas on Unsplash

PDF, or Portable Document Format, is a file format developed by Adobe to enable the creation of various forms of content. In particular, it consistently protects its content against modification. A PDF file can host different types of data: text, images, media, etc. It is a tag-structured file, which makes it easy to parse, just like an HTML page.

With that said, and for the sake of structure, we can separate PDF files into two classes:

  • Text-based files: containing text that can be copied and pasted
  • Image-based files: containing images such…

Photo by Sonja Langford on Unsplash

Recurrent neural networks are well-known deep learning networks applied to sequence data: time series forecasting, speech recognition, sentiment classification, machine translation, Named Entity Recognition, etc.
Using feedforward neural networks on sequence data raises two major problems:

  • Inputs and outputs can have different lengths in different examples
  • MLPs do not share features learned across different positions of the data sample
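
As a rough illustration of how a recurrent cell sidesteps both problems, here is a minimal sketch of a scalar vanilla RNN in pure Python. The function name `rnn_forward` and the weight values are hypothetical, chosen only for the example; the update rule is the standard one, h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

```python
import math


def rnn_forward(sequence, w_x, w_h, b):
    """Scalar vanilla RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).

    Returns the list of hidden states, one per input element.
    """
    h = 0.0  # initial hidden state
    states = []
    for x in sequence:
        # The same three parameters are reused at every position.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Illustrative weights only; a trained network would learn these values.
short_states = rnn_forward([1.0, 2.0], w_x=0.5, w_h=0.1, b=0.0)
long_states = rnn_forward([1.0, 2.0, 3.0, 4.0, 5.0], w_x=0.5, w_h=0.1, b=0.0)

print(len(short_states), len(long_states))
```

The same function processes a sequence of length 2 and one of length 5 without any change to its parameters, whereas an MLP would need a fixed input size and a separate weight for each position.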

In this article, we will discover the mathematics behind the success of RNNs as well as some special types of cells such as LSTMs and GRUs. …

Picture by Miguel Á. Padriñán from Pexels

Computer vision is a subfield of deep learning which deals with images on all scales. It allows the computer to process and understand the content of a large number of pictures automatically.
The main architecture behind computer vision is the convolutional neural network, which is derived from feedforward neural networks. Its applications are varied: image classification, object detection, neural style transfer, face identification, and more. If you have no background in deep learning in general, I recommend first reading my post about feedforward neural networks.

NB: Since Medium does not support LaTeX, the…

Photo by Roman Mager on Unsplash

Deep learning is a subfield of machine learning based on artificial neural networks. It has several variants, such as Multi-Layer Perceptrons (MLP), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), which can be applied to many fields including Computer Vision, Natural Language Processing, Machine Translation…

Deep learning is taking off for three main reasons:

  • Intuitive feature engineering: while most machine learning algorithms require human expertise for feature engineering and extraction, deep learning automatically handles the choice of variables and their weights
  • Huge datasets: the continuous collection of data has led to large databases which allow…

Photo by Yancy Min on Unsplash

In the previous article, I talked about some of the most important tools you will need when working on Data Science projects, including the Git widgets in VS Code. In this article, we will demystify Git, the tool which allows you to version your code and handle collaborative repositories.

Table of contents

The summary is as follows:

  1. Git & GitHub/GitLab
  2. Your 1st repository
  3. Pushing and Pulling code
  4. Git project Philosophy
  5. CI-CD with GitLab

1. Git & GitHub

Git is a coding tool used mainly for three reasons:

  • Time versioning your code
  • Keeping track of the changes made
  • Allowing parallel collaborations of multiple…

Ismail Mebsout

Data Scientist and an active blogger from Paris. Visit my website www.ismailmebsout.com
