If you are completely new to machine learning, you should probably read my earlier post, "What is Machine Learning?", first.
If you are new to Python programming, I suggest going through our free Python coding tutorial first; it will help you learn Python syntax and how to work with databases.
In this post, we learn how to get started with machine learning in Python, and you will learn how to perform basic price prediction using Python machine learning libraries.
Before you start learning machine learning in Python, I suggest you get familiar with the following Python libraries, because we will use them extensively. If you already know their syntax, you will be able to focus on the machine learning workflow rather than wondering what the library code means.
from matplotlib import pyplot
import pandas as pd
url = "\\data\\taxi-fare-test.xlsx"
df = pd.read_excel(url)
print(df)
sklearn (scikit-learn) is a Python module that integrates classical machine learning algorithms with the scientific Python packages (NumPy, SciPy, matplotlib).
Here is a small example of how to use the sklearn library:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

# generate a two-class dataset
X, y = make_classification(n_samples=1000, n_classes=2, random_state=1)
# split into train/test sets
trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.5, random_state=2)
# fit a model
model = LogisticRegression()
model.fit(trainX, trainy)
# predict probabilities
probs = model.predict_proba(testX)
# keep the probabilities of the positive class and compute the ROC curve
fpr, tpr, thresholds = roc_curve(testy, probs[:, 1])
NumPy is the core Python library for working with arrays and is widely used in machine learning projects. If you are not familiar with NumPy, this NumPy tutorial may help to some extent.
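Here is a minimal sketch of why NumPy arrays matter for machine learning: arithmetic and aggregation apply to the whole array at once, and feature data is usually a 2-D array. The prices are made-up numbers for illustration only.

```python
import numpy as np

# Hypothetical prices for four items (made-up numbers)
prices = np.array([10.0, 12.5, 9.0, 14.5])

# Vectorized arithmetic applies to every element at once, no Python loop needed
discounted = prices * 0.9

# Aggregate functions operate over the whole array
print(prices.mean())        # 11.5
print(discounted.round(2))

# 2-D arrays are the usual shape for ML features: rows = samples, columns = features
features = np.array([[1, 2], [3, 4], [5, 6]])
print(features.shape)       # (3, 2)
```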
SciPy is a scientific computing package for Python; here you can learn more about the SciPy library.
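As a small taste of SciPy that fits our price-prediction theme, here is a sketch that uses scipy.stats.linregress to fit a straight line by least squares. The month/price values are hypothetical and chosen to lie exactly on a line, so the result is easy to check.

```python
from scipy import stats

# Hypothetical data: x = month number, y = price in that month (made-up values)
months = [1, 2, 3, 4, 5]
prices = [2.0, 2.2, 2.4, 2.6, 2.8]

# linregress fits y = slope * x + intercept by ordinary least squares
result = stats.linregress(months, prices)
print(result.slope, result.intercept, result.rvalue)

# Use the fitted line to estimate the price for month 6
estimate = result.slope * 6 + result.intercept
print(round(estimate, 2))  # 3.0
```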
import tensorflow as tf
A tensor is a multi-dimensional array (similar to a NumPy array). TensorFlow is very popular for image classification and processing.
Learn more about TensorFlow
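To make "tensor = multi-dimensional array" concrete, here is a sketch that uses NumPy instead of TensorFlow, so it runs even if TensorFlow is not installed. The rank (number of dimensions) and shape are the same ideas TensorFlow uses; with TensorFlow installed, tf.constant(image) would build an equivalent tensor from the NumPy array.

```python
import numpy as np

# A tensor is a multi-dimensional array; NumPy's ndarray illustrates the same idea
scalar = np.array(5)                  # rank 0
vector = np.array([1, 2, 3])          # rank 1
matrix = np.array([[1, 2], [3, 4]])   # rank 2
# rank 3: a tiny 2x2 "image" with 3 colour channels (all zeros here)
image = np.zeros((2, 2, 3))

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("image", image)]:
    print(name, "rank:", t.ndim, "shape:", t.shape)
```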
In the example below, we will use some of the above libraries, so you need to install them in your local project.
First, we need to set up our Python development environment by installing all the required libraries listed above.
Here I am using Visual Studio Code for Python development, but you can use any IDE.
Next, you need to create your dataset. You can create it in any data source, such as an RDBMS, an Excel file, or a CSV file.
You can then create a pandas DataFrame holding the data you want to work with.
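import pandas as pd
artworks = pd.read_csv('../testdata/Artworks.csv')
# keep only the columns we need
artworkdt = pd.DataFrame(artworks, columns=['Artist', 'Nationality'])
# boolean mask that is True for rows by Thomas Bewick
# (named artist_filter to avoid shadowing Python's built-in filter)
artist_filter = artworkdt["Artist"] == "Thomas Bewick"
# where() keeps every row and sets non-matching rows to NaN;
# use artworkdt[artist_filter] instead to keep only the matching rows
atr = artworkdt.where(artist_filter)
print(atr)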
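If you don't have a file handy, you can build a DataFrame directly from a Python dictionary. Here is a minimal sketch with a small hypothetical table using the same two columns. It also shows the practical difference between boolean indexing and where().

```python
import pandas as pd

# Hypothetical dataset built in memory instead of read from a CSV file
data = {
    "Artist": ["Thomas Bewick", "Claude Monet", "Thomas Bewick"],
    "Nationality": ["British", "French", "British"],
}
df = pd.DataFrame(data)

# Boolean mask: True where the Artist column matches
mask = df["Artist"] == "Thomas Bewick"

# Boolean indexing keeps only the matching rows
bewick = df[mask]
print(bewick)

# where() keeps the original shape and fills non-matching rows with NaN
print(df.where(mask))
```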
At this stage, you need to load the dataset into a Python object so you can work with the data. How you load it depends on the data source you are working with; in my example, I load the data from an Excel file.
import os
import pandas as pd

# build the path relative to the current working directory
path = os.getcwd()
url = os.path.join(path, "data", "taxi-fare-test.xlsx")
df = pd.read_excel(url)
print(df)
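Because the loading code depends on the data source, here is a minimal sketch of loading the same kind of data from CSV instead of Excel. The column names and values are hypothetical, and io.StringIO stands in for a real file so the snippet runs on its own. Note that pd.read_excel works the same way, but reading .xlsx files typically needs the openpyxl package installed.

```python
import io
import pandas as pd

# Stand-in for a CSV file: a hypothetical taxi-fare sample held in a string
csv_text = """vendor_id,passenger_count,trip_distance,fare_amount
CMT,1,2.5,9.5
VTS,2,1.0,5.0
CMT,1,4.2,14.0
"""

# read_csv accepts a file path, a URL, or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.dtypes)
print(df_csv)
```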
To understand the data, you may need to change the row order, remove columns, add new columns, group rows, and so on, to get it ready for training and testing algorithms.
# check that the data loaded correctly by printing the first two rows
# (df is the DataFrame loaded in the previous step)
_head = df.head(2)
print(_head)
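Here is a minimal sketch of the preparation steps mentioned above (reordering, removing a column, adding a derived column, grouping) using pandas. The taxi-fare rows and column names are hypothetical.

```python
import pandas as pd

# Hypothetical taxi-fare rows (made-up values for illustration)
df = pd.DataFrame({
    "vendor_id": ["CMT", "VTS", "CMT", "VTS"],
    "trip_distance": [2.5, 1.0, 4.2, 3.0],
    "fare_amount": [9.5, 5.0, 14.0, 11.0],
    "store_and_fwd_flag": ["N", "N", "Y", "N"],
})

# Change order: sort by distance, longest trip first
df = df.sort_values("trip_distance", ascending=False)

# Remove a column we do not need for training
df = df.drop(columns=["store_and_fwd_flag"])

# Add a derived column: fare per unit of distance
df["fare_per_distance"] = df["fare_amount"] / df["trip_distance"]

# Group rows by vendor and compute the average fare
avg_fare = df.groupby("vendor_id")["fare_amount"].mean()
print(df)
print(avg_fare)
```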
Ideally, we should analyze and organize the dataset in SQL, which is more convenient. Once the dataset is ready, we bring the data into our Python code for further processing.
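Here is a minimal sketch of that "prepare in SQL, then load into Python" flow. An in-memory SQLite database (from Python's standard library) stands in for a real RDBMS, and the trips table and its rows are hypothetical. pandas.read_sql_query runs the query and returns the result as a DataFrame.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database standing in for a real RDBMS (hypothetical table)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vendor_id TEXT, trip_distance REAL, fare_amount REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [("CMT", 2.5, 9.5), ("VTS", 1.0, 5.0), ("CMT", 4.2, 14.0), ("VTS", 3.0, 11.0)],
)

# Do the filtering and aggregation in SQL...
query = """
SELECT vendor_id, COUNT(*) AS trips, AVG(fare_amount) AS avg_fare
FROM trips
WHERE trip_distance > 0
GROUP BY vendor_id
ORDER BY vendor_id
"""
# ...then bring the prepared result into pandas for further processing
summary = pd.read_sql_query(query, conn)
conn.close()
print(summary)
```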
Now you may want to see what the data looks like visually, through plots, charts, and so on. You can also save the visual representation in PDF format for future reference or reporting.
from matplotlib import pyplot

# show a histogram for each column
df.hist()
pyplot.show()

# then plot one column against another
pyplot.scatter(df['unemploymentrate'], df['indexprice'], color='green')
pyplot.title('Index Price Vs Unemployment Rate', fontsize=14)
pyplot.xlabel('Unemployment Rate', fontsize=14)
pyplot.ylabel('Index Price', fontsize=14)
pyplot.grid(True)
pyplot.show()
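To save a chart as PDF instead of only showing it on screen, matplotlib's savefig picks the output format from the file extension. Here is a minimal sketch that reuses the unemploymentrate and indexprice column names from the scatter plot. The values and the output file name are hypothetical, and the non-interactive Agg backend is selected so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window needed
from matplotlib import pyplot
import pandas as pd

# Hypothetical data using the same column names as the scatter plot
df = pd.DataFrame({
    "unemploymentrate": [5.3, 5.5, 6.1, 6.4],
    "indexprice": [1464, 1394, 1256, 1159],
})

fig, ax = pyplot.subplots()
ax.scatter(df["unemploymentrate"], df["indexprice"], color="green")
ax.set_title("Index Price Vs Unemployment Rate", fontsize=14)
ax.set_xlabel("Unemployment Rate", fontsize=14)
ax.set_ylabel("Index Price", fontsize=14)
ax.grid(True)

# savefig infers the format from the extension, so ".pdf" writes a PDF file
fig.savefig("index_price_vs_unemployment.pdf")
pyplot.close(fig)
```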
Try different algorithms to see which one produces the closest results.
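One common way to compare algorithms is k-fold cross-validation with the same error metric for each model. Here is a minimal sketch, assuming synthetic regression data from make_regression in place of a real price dataset. The three models are my picks for illustration, not a recommended shortlist.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for a real price dataset
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=1)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTree": DecisionTreeRegressor(random_state=1),
    "KNeighbors": KNeighborsRegressor(n_neighbors=5),
}

# 5-fold cross-validation; sklearn reports negative MAE, so we flip the sign
scores = {}
for name, model in models.items():
    cv = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    scores[name] = -cv.mean()
    print(f"{name}: mean absolute error = {scores[name]:.2f}")

# The algorithm with the lowest error gives the closest results
best = min(scores, key=scores.get)
print("Closest results:", best)
```

Since make_regression produces linearly generated targets, it is expected that the linear model wins here. On real data, the ranking can be quite different.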
Finally, make predictions with real data.
In our example, we will predict fruit prices based on the previous year's data.
Note: if you don't have data, you can download the standard taxi fare dataset for practice.
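Here is a minimal sketch of the prediction step for the fruit-price idea, assuming twelve months of made-up prices that rise steadily. LinearRegression is my choice for illustration. Real fruit prices are usually seasonal, so a practical model would need more features than the month number.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical previous-year data: month number (1-12) and fruit price per kg
months = np.arange(1, 13).reshape(-1, 1)   # sklearn expects a 2-D feature array
prices = np.array([2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0, 3.1])

model = LinearRegression()
model.fit(months, prices)

# Predict the first three months of the next year (months 13, 14, 15)
next_months = np.array([[13], [14], [15]])
predicted = model.predict(next_months)
print(predicted.round(2))
```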
Start your IDE. We are using Visual Studio 2019 to write a Python console application for this machine learning example. (You can use any Python IDE; the code remains the same.)
First, we need to make sure that all the required libraries are installed correctly, so let's run the following code in the console.
print("We are learning Machine Learning at WebTrainingRoom")
import sys
print('Python: {}'.format(sys.version))
# check matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# check pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# check scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))
# check scipy
import scipy
print('scipy: {}'.format(scipy.__version__))
# check numpy
import numpy
print('numpy: {}'.format(numpy.__version__))
If you are setting up the project for the first time, you may need to install the packages one by one.
To do that, expand Solution Explorer, go to Python Environments, and right-click your environment to manage its packages.
Then run a pip command for each package, such as pip install scikit-learn (note that sklearn is published on PyPI as scikit-learn).
Once you run the version-check script, you should see output like the following on your console. Don't worry if your versions differ; the core concepts remain the same.
We are learning Machine Learning at WebTrainingRoom
Python: 3.7.5 (tags/v3.7.5:5c02a39a0b, Oct 15 2019, 00:11:34) [MSC v.1916 64 bit (AMD64)]
matplotlib: 3.1.3
pandas: 1.0.1
sklearn: 0.22.1
scipy: 1.4.1
numpy: 1.18.1
Press any key to continue . . .
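If the version check stops with an ImportError, it can be handy to list every missing package in one pass instead of fixing them one at a time. Here is a minimal sketch using importlib.util.find_spec from the standard library. The helper name missing_packages and the REQUIRED mapping are hypothetical, written for this example. The mapping exists because the import name and the pip name differ for scikit-learn.

```python
import importlib.util

# Hypothetical mapping: import name -> pip package name
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "sklearn": "scikit-learn",
}

def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module_name, pip_name in required.items()
            if importlib.util.find_spec(module_name) is None]

missing = missing_packages()
if missing:
    print("Run: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```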
This tutorial is still in progress.