This problem illustrates that, even when the data does not appear to follow any simple model, one can often be found after a small transformation.
We import the libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
Then we store the URL where the data is located,
url='https://raw.githubusercontent.com/jmsevillam/Herramientas-Computacionales-UniAndes/master/Data/Regression/spiral.dat'
We load the data to our program
x, y, r, theta = np.genfromtxt(url).T
As an exploratory step, we plot the data,
plt.plot(x,y,'.')
Finding a model doesn't look like an easy task, but maybe after a small transformation, it does.
The data already provides the values of $r$ and $\theta$, but in principle they can be obtained from $x$ and $y$ by a transformation.
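The transformation is the usual Cartesian-to-polar one. A minimal sketch with made-up points (not the spiral data), using `np.hypot` for the radius and `np.arctan2` to get the angle in the correct quadrant:

```python
import numpy as np

# Hypothetical points to illustrate recovering r and theta from x and y.
x = np.array([1.0, 0.0, -1.0])
y = np.array([0.0, 1.0, 0.0])

r = np.hypot(x, y)        # sqrt(x**2 + y**2)
theta = np.arctan2(y, x)  # quadrant-aware angle
```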
plt.plot(r,theta,'.')
We will do the fit using
def f(x, a, b):
    return a * x + b
popt,pcov=curve_fit(f,r,theta)
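Besides the optimal parameters, `curve_fit` also returns the covariance matrix `pcov`; the square roots of its diagonal give the one-sigma uncertainties of the parameters. A minimal sketch with synthetic noisy data (the slope 2 and intercept 1 are assumed values, not the spiral ones):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    return a * x + b

# Synthetic stand-in for (r, theta): a line with a little Gaussian noise.
rng = np.random.default_rng(0)
r = np.linspace(0.0, 10.0, 50)
theta = f(r, 2.0, 1.0) + rng.normal(0.0, 0.05, r.size)

popt, pcov = curve_fit(f, r, theta)
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of a and b
```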
So that, after the fit, we get
plt.plot(r,theta,'.')
plt.plot(r,f(r,*popt))
So, as we want to recover the $x,y$ plane, we resample so that the final result is smooth.
r_plot=np.linspace(0,10,1000)
x_plot,y_plot=r_plot*np.cos(f(r_plot,*popt)),r_plot*np.sin(f(r_plot,*popt))
plt.plot(x,y,'.')
plt.plot(x_plot,y_plot)
One of the most important lessons when learning to work with data is that description is not prediction. We will fit a certain group of points and see that, when a new point is added, the more complex model no longer works as well as a simpler one.
url='https://raw.githubusercontent.com/jmsevillam/Herramientas-Computacionales-UniAndes/master/Data/Regression/data.dat'
x,y=np.genfromtxt(url).T
x1=x[:-1]
y1=y[:-1]
Our two models
def poly9(x, a0, a1, a2, a3, a4, a5, a6, a7, a8, a9):
    return np.poly1d([a0, a1, a2, a3, a4, a5, a6, a7, a8, a9])(x)
def line(x, a, b):
    return a * x + b
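Note that `np.poly1d` expects the coefficients from the highest degree down, so `a0` above multiplies $x^9$. A quick check with a small quadratic:

```python
import numpy as np

# np.poly1d([a0, a1, a2]) evaluates a0*x**2 + a1*x + a2
p = np.poly1d([1.0, 2.0, 3.0])
val = p(2.0)  # 1*4 + 2*2 + 3
```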
print(len(y1))
As we have 10 points, we use a 9th-degree polynomial (which has 10 coefficients), so that it can pass through every point.
The warning shown below appears because the fit leaves zero residual (there is no distance between the curve and the points), so curve_fit cannot estimate the covariance of the parameters.
popt1, pcov1 = curve_fit(line, x1, y1)
popt2, pcov2 = curve_fit(poly9, x1, y1)
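The same situation can be reproduced in miniature: fitting a two-parameter line to exactly two made-up points leaves zero residual, and `curve_fit` then fills `pcov` with `inf` and raises an `OptimizeWarning` instead of estimating a covariance:

```python
import warnings
import numpy as np
from scipy.optimize import curve_fit, OptimizeWarning

def line(x, a, b):
    return a * x + b

# Two points and two parameters: the line passes exactly through both.
x1 = np.array([0.0, 1.0])
y1 = np.array([1.0, 3.0])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    popt, pcov = curve_fit(line, x1, y1)
# pcov is filled with inf: the covariance could not be estimated
```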
x_plot = np.linspace(0, 9, 1000)
plt.plot(x_plot, line(x_plot, *popt1))
plt.plot(x_plot, poly9(x_plot, *popt2))
plt.plot(x1, y1, '.')
But when we add a new point, the orange curve (the polynomial) no longer describes the dataset, while the blue one (the line) still does.
x_plot2 = np.linspace(0, 10, 1000)
plt.plot(x_plot2, line(x_plot2, *popt1))
plt.plot(x_plot2, poly9(x_plot2, *popt2))
plt.plot(x, y, '.')
popt3, pcov3 = curve_fit(line, x, y)
popt4, pcov4 = curve_fit(poly9, x, y)
If we add the new point, and repeat the fit we get new values for the optimal parameters.
Let us see how different they are,
for i in range(len(popt1)):
    print(popt1[i], popt3[i])
for i in range(len(popt2)):
    print(popt2[i], popt4[i])
It is easier if we evaluate the relative difference (0 means both are equal, and 1 means a difference of 100%),
for i in range(len(popt1)):
    print((popt1[i] - popt3[i]) / popt1[i])
for i in range(len(popt2)):
    print((popt2[i] - popt4[i]) / popt2[i])
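The same comparison can be written in vectorized form. Here the numbers are made up to play the role of the fitted parameters, not the actual fit results:

```python
import numpy as np

# Hypothetical stand-ins: the line's parameters barely move when the
# point is added, while the polynomial's change drastically.
popt_line_before = np.array([2.00, 1.00])
popt_line_after = np.array([2.03, 0.95])
popt_poly_before = np.array([0.5, -3.0])
popt_poly_after = np.array([4.8, 25.0])

rel_line = np.abs((popt_line_after - popt_line_before) / popt_line_before)
rel_poly = np.abs((popt_poly_after - popt_poly_before) / popt_poly_before)
```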
The linear model remains close after the addition of the new point, but the polynomial changes completely (by almost a factor of 10!).
Again, we load the data and plot it
url='https://raw.githubusercontent.com/jmsevillam/Herramientas-Computacionales-UniAndes/master/Data/Spectra/137Cs_10min.dat'
data=np.genfromtxt(url)
plt.plot(data[:,0],data[:,1])
We select a fraction of the data.
New_x=data[280:400,0]
New_data=data[280:400,1]
plt.plot(New_x,New_data)
We define the model: a Gaussian plus a linear background.
def Gaussian(x, mu, sigma, amp, a, b):
    return amp * np.exp(-0.5 * (x - mu)**2 / sigma**2) + a * x + b
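At $x=\mu$ the exponential equals 1, so the model reduces to $\mathrm{amp}+a\mu+b$. A quick sanity check using the same numbers that serve below as the initial guess:

```python
import numpy as np

def Gaussian(x, mu, sigma, amp, a, b):
    return amp * np.exp(-0.5 * (x - mu)**2 / sigma**2) + a * x + b

# At the peak position x = mu: amp + a*mu + b = 40000 + 350 + 1
val = Gaussian(350.0, 350.0, 10.0, 40000.0, 1.0, 1.0)
```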
popt, pcov = curve_fit(Gaussian, New_x, New_data)
With the default initial guess (every parameter set to 1), the model did not work:
plt.plot(New_x,New_data,'.')
plt.plot(New_x,Gaussian(New_x,*popt))
But when we provide an initial guess through p0, the fit works:
popt, pcov = curve_fit(Gaussian, New_x, New_data, p0=[350, 10, 40000, 1, 1])
plt.plot(New_x,New_data,'.')
plt.plot(New_x,Gaussian(New_x,*popt))
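To see how much the initial guess matters, here is a sketch on synthetic data with a known peak. The values mu = 350, sigma = 10, amp = 40000 are made up to mimic the real spectrum; starting p0 reasonably close to them, the fit recovers the parameters:

```python
import numpy as np
from scipy.optimize import curve_fit

def Gaussian(x, mu, sigma, amp, a, b):
    return amp * np.exp(-0.5 * (x - mu)**2 / sigma**2) + a * x + b

# Synthetic, noise-free spectrum with a known peak on a linear background.
x = np.linspace(280.0, 400.0, 121)
y = Gaussian(x, 350.0, 10.0, 40000.0, 1.0, 1.0)

# An initial guess in the right neighborhood lets the optimizer converge.
popt, pcov = curve_fit(Gaussian, x, y, p0=[340.0, 8.0, 30000.0, 0.0, 0.0])
```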