<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<link rel="stylesheet" href="pluginAssets/highlight.js/atom-one-light.css">
<title>Programming reference</title>
<link rel="stylesheet" href="pluginAssets/katex/katex.css" /><link rel="stylesheet" href="./style.css" /></head>
<body>

<div id="rendered-md"><h1 id="numpy-matplotlib">Numpy &amp; matplotlib</h1>
<p>Load an external file (here a comma-separated CSV):</p>
<pre class="hljs"><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
data = np.loadtxt(<span class="hljs-string">'./filepath.csv'</span>, delimiter=<span class="hljs-string">','</span>)
</code></pre>
<p>Print the dimensions of the data (rows, columns):</p>
<pre class="hljs"><code>data.shape
</code></pre>
<p>Graph two columns of data:</p>
<pre class="hljs"><code><span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-comment"># Jupyter-only magic: display plots inside the notebook</span>
%matplotlib inline
x = data[:, <span class="hljs-number">0</span>]
y = data[:, <span class="hljs-number">1</span>]
<span class="hljs-comment"># s sets the marker size, alpha the transparency, and c colors each point by the third column</span>
plt.scatter(x, y, s=<span class="hljs-number">3</span>, alpha=<span class="hljs-number">0.2</span>, c=data[:, <span class="hljs-number">2</span>], cmap=<span class="hljs-string">'RdYlBu_r'</span>)
plt.xlabel(<span class="hljs-string">'x axis'</span>)
plt.ylabel(<span class="hljs-string">'y axis'</span>);
</code></pre>
<p>Histogram plotting:</p>
<pre class="hljs"><code><span class="hljs-comment"># bins sets the number of bars; each bar spans (end - start) / bins</span>
<span class="hljs-comment"># start and end are placeholders for the range of values to include;</span>
<span class="hljs-comment"># a 2D data array gives one histogram per column</span>
plt.hist(data, bins=<span class="hljs-number">100</span>, range=[start, end])
</code></pre>
<p>The identity matrix:</p>
<pre class="hljs"><code>np.eye(<span class="hljs-number">2</span>)  <span class="hljs-comment"># 2x2 identity matrix</span>
</code></pre>
<p>Matrix multiplication:</p>
<pre class="hljs"><code>a * b     <span class="hljs-comment"># element-wise product</span>
a.dot(b)  <span class="hljs-comment"># matrix product for 2D arrays (dot product for 1D arrays)</span>
a @ b     <span class="hljs-comment"># same as a.dot(b) for 2D arrays (Python 3.5+)</span>
</code></pre>
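<p>To make the difference between the element-wise and matrix products concrete, here is a minimal sketch using two hypothetical 2x2 arrays <code class="inline-code">a</code> and <code class="inline-code">b</code>. It assumes Python 3.5+ with a NumPy version that supports the <code class="inline-code">@</code> operator, and it also checks that multiplying by <code class="inline-code">np.eye(2)</code> leaves a matrix unchanged.</p>

```python
import numpy as np

# Two small example matrices (hypothetical values, easy to check by hand)
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b   # multiplies matching entries: [[5, 12], [21, 32]]
matrix = a.dot(b)     # row-by-column sums: [[19, 22], [43, 50]]
same = a @ b          # the @ operator computes the same matrix product

# Multiplying by the identity matrix leaves a unchanged (np.eye returns floats)
identity = np.eye(2)
unchanged = a.dot(identity)

print(elementwise)
print(matrix)
print(np.array_equal(matrix, same))
print(np.allclose(unchanged, a))
```

<p>The integer values make each entry easy to verify: for example, the top-left entry of the matrix product is 1*5 + 2*7 = 19, while the element-wise product simply gives 1*5 = 5.</p>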
<p>Useful references:</p>
<ul>
<li><a data-from-md title='https://docs.scipy.org/doc/numpy-dev/user/quickstart.html' href='https://docs.scipy.org/doc/numpy-dev/user/quickstart.html' type=''>The official numpy quickstart guide</a></li>
<li><a data-from-md title='https://www.datacamp.com/community/tutorials/python-numpy-tutorial' href='https://www.datacamp.com/community/tutorials/python-numpy-tutorial' type=''>A more in-depth tutorial, with in-browser samples</a></li>
<li><a data-from-md title='http://cs231n.github.io/python-numpy-tutorial/' href='http://cs231n.github.io/python-numpy-tutorial/' type=''>A very good walk-through of the most important functions and features</a>, from Stanford's well-known <a data-from-md title='http://cs231n.github.io/' href='http://cs231n.github.io/' type=''>CS231n course</a>.</li>
<li><a data-from-md title='https://matplotlib.org/users/pyplot_tutorial.html' href='https://matplotlib.org/users/pyplot_tutorial.html' type=''>The official pyplot tutorial</a>. Note that pyplot accepts plain Python lists as well as numpy arrays.</li>
<li><a data-from-md title='https://matplotlib.org/gallery.html' href='https://matplotlib.org/gallery.html' type=''>A gallery of example MPL plots</a>. Most of these use lower-level objects such as <a data-from-md title='https://matplotlib.org/api/axes_api.html' href='https://matplotlib.org/api/axes_api.html' type=''>Axes</a> rather than the pyplot state-machine interface.</li>
<li><a data-from-md title='http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html' href='http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html' type=''>In-depth walk-through of the main features and plot types</a></li>
</ul>
<h1 id="sklearn">Sklearn</h1>
<p>Split data into train and test sets, on features <code class="inline-code">x</code> and target <code class="inline-code">y</code>:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-comment"># test_size=0.5 holds out half of the rows for testing</span>
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span class="hljs-number">0.5</span>)
</code></pre>
<p>An estimator implements the method <code class="inline-code">fit(x, y)</code>, which learns from data, and <code class="inline-code">predict(T)</code>, which takes new instances and predicts their target values.</p>
<p>Linear classifier, using an SVC model with a linear kernel:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.svm <span class="hljs-keyword">import</span> SVC
linear = SVC(kernel=<span class="hljs-string">'linear'</span>)
linear.fit(x_train, y_train)
</code></pre>
<p>Decision tree classifier:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeClassifier
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)
</code></pre>
<p>k-Nearest Neighbors:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.neighbors <span class="hljs-keyword">import</span> KNeighborsClassifier
knn = KNeighborsClassifier(<span class="hljs-number">15</span>)  <span class="hljs-comment"># set the number of neighbors to 15</span>
knn.fit(x_train, y_train)
</code></pre>
<p>Classify new data:</p>
<pre class="hljs"><code><span class="hljs-comment"># some_data must have the same feature columns as x_train</span>
linear.predict(some_data)
</code></pre>
<p>Compute accuracy on the test data:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> accuracy_score
y_predicted = linear.predict(x_test)
accuracy_score(y_test, y_predicted)
</code></pre>
<p>Plot the classification, with colors showing the classifier's decision regions (requires the <code class="inline-code">mlxtend</code> package; this call assumes <code class="inline-code">x_test</code> has two feature columns):</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> mlxtend.plotting <span class="hljs-keyword">import</span> plot_decision_regions
<span class="hljs-comment"># plot the first 500 test points; class labels are converted to integers</span>
plot_decision_regions(x_test[:<span class="hljs-number">500</span>], y_test.astype(int)[:<span class="hljs-number">500</span>], clf=linear, res=<span class="hljs-number">0.1</span>);
</code></pre>
<p>Compare classifiers via ROC curves:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_curve, auc

<span class="hljs-comment"># The linear SVC doesn't produce class probabilities by default, so we retrain it with probability=True.</span>
linear = SVC(kernel=<span class="hljs-string">'linear'</span>, probability=<span class="hljs-literal">True</span>)
linear.fit(x_train, y_train)

<span class="hljs-comment"># We need class probabilities from each of the classifiers</span>
y_linear = linear.predict_proba(x_test)
y_tree = tree.predict_proba(x_test)
y_knn = knn.predict_proba(x_test)

<span class="hljs-comment"># Compute the points on each curve; roc_curve returns (fpr, tpr, thresholds)</span>
<span class="hljs-comment"># We pass the probability of the second class (KIA, the positive class) as the y_score</span>
curve_linear = roc_curve(y_test, y_linear[:, <span class="hljs-number">1</span>])
curve_tree = roc_curve(y_test, y_tree[:, <span class="hljs-number">1</span>])
curve_knn = roc_curve(y_test, y_knn[:, <span class="hljs-number">1</span>])

<span class="hljs-comment"># Compute the Area Under the Curve from (fpr, tpr)</span>
auc_linear = auc(curve_linear[<span class="hljs-number">0</span>], curve_linear[<span class="hljs-number">1</span>])
auc_tree = auc(curve_tree[<span class="hljs-number">0</span>], curve_tree[<span class="hljs-number">1</span>])
auc_knn = auc(curve_knn[<span class="hljs-number">0</span>], curve_knn[<span class="hljs-number">1</span>])

plt.plot(curve_linear[<span class="hljs-number">0</span>], curve_linear[<span class="hljs-number">1</span>], label=<span class="hljs-string">'linear (area = %0.2f)'</span> % auc_linear)
plt.plot(curve_tree[<span class="hljs-number">0</span>], curve_tree[<span class="hljs-number">1</span>], label=<span class="hljs-string">'tree (area = %0.2f)'</span> % auc_tree)
plt.plot(curve_knn[<span class="hljs-number">0</span>], curve_knn[<span class="hljs-number">1</span>], label=<span class="hljs-string">'knn (area = %0.2f)'</span> % auc_knn)

plt.xlim([<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>])
plt.ylim([<span class="hljs-number">0.0</span>, <span class="hljs-number">1.0</span>])
plt.xlabel(<span class="hljs-string">'False Positive Rate'</span>)
plt.ylabel(<span class="hljs-string">'True Positive Rate'</span>)
plt.title(<span class="hljs-string">'ROC curve'</span>);

plt.legend();
</code></pre>
<p>Cross-validation:</p>
<pre class="hljs"><code><span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> cross_val_score
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> roc_auc_score, make_scorer

<span class="hljs-comment"># The cross_val_score function does all the training for us. We simply pass</span>
<span class="hljs-comment"># it the complete data, the model, and the metric.</span>

linear = SVC(kernel=<span class="hljs-string">'linear'</span>, probability=<span class="hljs-literal">True</span>)

<span class="hljs-comment"># Train and evaluate on 3 folds, returning the ROC AUC of each fold. You can also try 'accuracy' as a scorer</span>
scores = cross_val_score(linear, x, y, cv=<span class="hljs-number">3</span>, scoring=<span class="hljs-string">'roc_auc'</span>)

print(<span class="hljs-string">'scores per fold '</span>, scores)
</code></pre>
<p>Regression:</p>
<pre class="hljs"><code><span class="hljs-keyword">import</span> numpy <span class="hljs-keyword">as</span> np
<span class="hljs-keyword">import</span> matplotlib.pyplot <span class="hljs-keyword">as</span> plt
<span class="hljs-keyword">from</span> sklearn <span class="hljs-keyword">import</span> datasets
<span class="hljs-keyword">from</span> sklearn.model_selection <span class="hljs-keyword">import</span> train_test_split
<span class="hljs-keyword">from</span> sklearn.metrics <span class="hljs-keyword">import</span> mean_squared_error, r2_score

<span class="hljs-comment"># Load the diabetes dataset, and select one feature (Body Mass Index)</span>
x, y = datasets.load_diabetes(return_X_y=<span class="hljs-literal">True</span>)
x = x[:, <span class="hljs-number">2</span>].reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)

<span class="hljs-comment"># -- the reshape operation ensures that x still has two dimensions</span>
<span class="hljs-comment"># (that is, we need it to be an n by 1 matrix, not a vector)</span>

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=<span class="hljs-number">0.5</span>)

<span class="hljs-comment"># feature space on horizontal axis, output space on vertical axis</span>
plt.scatter(x_train[:, <span class="hljs-number">0</span>], y_train)
plt.xlabel(<span class="hljs-string">'BMI'</span>)
plt.ylabel(<span class="hljs-string">'disease progression'</span>);

<span class="hljs-comment"># Train three models: linear regression, tree regression, knn regression</span>
<span class="hljs-keyword">from</span> sklearn.linear_model <span class="hljs-keyword">import</span> LinearRegression
linear = LinearRegression()
linear.fit(x_train, y_train)

<span class="hljs-keyword">from</span> sklearn.tree <span class="hljs-keyword">import</span> DecisionTreeRegressor
tree = DecisionTreeRegressor()
tree.fit(x_train, y_train)

<span class="hljs-keyword">from</span> sklearn.neighbors <span class="hljs-keyword">import</span> KNeighborsRegressor
knn = KNeighborsRegressor(<span class="hljs-number">10</span>)
knn.fit(x_train, y_train);

<span class="hljs-comment"># Plot the models on a new figure, over the training data</span>
plt.figure()
plt.scatter(x_train, y_train, alpha=<span class="hljs-number">0.1</span>)

<span class="hljs-comment"># 500 evenly spaced inputs covering roughly the range of the scaled BMI feature, as an n by 1 matrix</span>
xlin = np.linspace(<span class="hljs-number">-0.10</span>, <span class="hljs-number">0.2</span>, <span class="hljs-number">500</span>).reshape(<span class="hljs-number">-1</span>, <span class="hljs-number">1</span>)
plt.plot(xlin, linear.predict(xlin), label=<span class="hljs-string">'linear'</span>)
plt.plot(xlin, tree.predict(xlin), label=<span class="hljs-string">'tree'</span>)
plt.plot(xlin, knn.predict(xlin), label=<span class="hljs-string">'knn'</span>)

<span class="hljs-comment"># Mean squared error on the held-out test data (lower is better)</span>
print(<span class="hljs-string">'MSE linear'</span>, mean_squared_error(y_test, linear.predict(x_test)))
print(<span class="hljs-string">'MSE tree'</span>, mean_squared_error(y_test, tree.predict(x_test)))
print(<span class="hljs-string">'MSE knn'</span>, mean_squared_error(y_test, knn.predict(x_test)))

plt.legend();
</code></pre>
<p>Useful references:</p>
<ul>
<li><a data-from-md title='http://scikit-learn.org/stable/tutorial/basic/tutorial.html' href='http://scikit-learn.org/stable/tutorial/basic/tutorial.html' type=''>The official quickstart guide</a></li>
<li><a data-from-md title='https://www.datacamp.com/community/tutorials/machine-learning-python' href='https://www.datacamp.com/community/tutorials/machine-learning-python' type=''>A DataCamp tutorial with interactive exercises</a></li>
<li><a data-from-md title='http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' href='http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html' type=''>Analyzing text data with SKLearn</a></li>
</ul>
</div>
</body>
</html>