+++
title = 'Web security'
+++

# Web Security
Request:
- header and optional body, separated by empty line (CRLF)
- header specifies:
    - method:
        - GET: transfer of entity referred to by URL
        - HEAD: transfer of header meta-information only
        - POST: send data
        - etc.
    - resource: absolute URI or absolute path (starting with slash)
    - host:
        - in HTTP 1.0, can't tell from the request which server was intended to process it
        - in HTTP 1.1, the `Host` header is mandatory, which allows multiple servers on the same IP address
- HTTP/2 is similar to HTTP/1.1, but has more features, e.g. it allows the server to "push" additional data

Reply:
- header and body separated by empty line (CRLF)
- header contains: protocol version, status code, diagnostic text, other info
- body is a byte stream

URI syntax: `<scheme>://<authority><path>?<query>`

HTTP authentication:
- simple challenge-response
- challenge returned as part of 401 reply, specifies the scheme to be used
- auth request refers to a 'realm' (set of resources on server)
- client must include `Authorization` header field with required valid credentials

Basic HTTP authentication:
- server replies with 401 message containing header field: `WWW-Authenticate: Basic realm="whatever"`
- client retries access, including an `Authorization: Basic <credentials>` header, where `<credentials>` is the base64-encoded `username:password` (easily readable: base64 is an encoding, not encryption)

HTTP 1.1 (Digest) authentication:
- defines authentication scheme based on cryptographic digests
- server sends nonce as challenge
- client sends request with digest of username, password, given nonce, HTTP method, and requested URL
- web server has to have access to cleartext passwords (or password-equivalent hashes)
- but man-in-the-middle (MITM) is still possible: the attacker offers basic auth to the client, obtains the cleartext password, and then uses digest auth with the server

![HTTP MITM attack diagram](http-mitm-attack-diagram.png)

Web Authentication API
- allows strong auth with public key crypto, passwordless, second-factor, etc.
- relies on entities outside the browser, like authenticators

Maintaining state
- HTTP is stateless, but web apps often require state
- can be achieved by:
    - embedding info in URLs: `GET /login.php?user=foo&pwd=bar HTTP/1.1`
    - hidden form fields: `<input type="hidden" name="user" value="foo" />`
    - cookies: via header `Set-Cookie: USER=foo; SHIPPING=fedex; path=/`
        - cookies are passed in every further transaction with the same site, in the `Cookie` header
        - only accessible by the site that set them
        - can have a number of fields, in the form `<name>=<value>`
        - can have `expires` key for expiration date
        - `domain` to make the cookie valid for a more generic domain
        - `secure` to only send via SSL/TLS connections
        - `httponly` to make it inaccessible to client-side scripts
        - a server can only set a limited number of cookies

Sessions:
- represent time-limited interaction of user with web server
- no concept at HTTP level, so use the state mechanisms above
- generate unique ID at start of session, then use it to access info on server side

## Server-side
### Common Gateway Interface (CGI)
Way to invoke programs on server side, with their output returned to the client.
Input passed via URL (query string) or body in POST.

CGI programs can be written in any language; the POST body is piped to the process's stdin.
Parameters and request metadata are passed via environment variables (e.g. `QUERY_STRING`, `REQUEST_METHOD`).

### Active Server Pages (ASP, ASP.NET)
Pages that contain a mix of text, HTML tags, scripting directives, and server-side includes.

Directives are executed on server side before serving the page.

### Servlets and JavaServer Pages (JSP)
Servlets: Java programs executed on server (similar to CGI). Can run in an existing JVM, without creating a new process.

JSPs are static HTML mixed with Java code, and are compiled into servlets.

### PHP
Scripting language that can be embedded in HTML.
PHP code is executed on server side when the page containing the code is requested.
A common setup is a LAMP stack (Linux, Apache, MySQL, PHP).

### Web App Frameworks
Support rapid development; might be based on existing web servers or might have their own.
Often based on the model-view-controller (MVC) pattern, and provide automated translation of objects to/from the database.
An example is Ruby on Rails.

## Client-side
### Java Applets
Compiled Java programs that are downloaded and executed in the context of a web page.

### ActiveX
Binary, OS-specific programs downloaded and executed in the context of a web page.
Code is signed via the Authenticode mechanism.
Once executed, they have complete access to the client's environment.

### JavaScript/JScript/ECMAScript, VBScript
Scripting languages for dynamic behavior in web pages.

### asm.js
Subset of JS that engines can optimize heavily, allowing for very fast code.
Compilers can translate e.g. C code to asm.js.

### WebAssembly
Low-level bytecode for client-side scripting, supports compilation from C/C++.

### Global structure
`window`: top of the object hierarchy

DOM: document object model
- interface for manipulating client-side content

BOM: browser object model
- interface to browser properties

### JS security
JS code is downloaded as part of the HTML page and executed on-the-fly.
Security is guaranteed by sandboxing:
- no access to files
- no unrestricted access to network resources
- etc.

Security policies:
- same origin: JS can only access resources (e.g. cookies) associated with the same origin (e.g. vu.nl)
    - every frame in the browser is associated with a domain ("origin")
    - the web browser permits a script contained in a first web page to access data in a second web page _only if both have the same origin_
        - same origin = same URI scheme + exact hostname + port number
    - if a frame explicitly includes external code (e.g. `<script src="...">`), that code executes with the origin of the including frame!
- signed scripts: the signature on the JS code is verified and the principal's identity extracted; the identity is compared to a policy file to determine the level of access
- configurable: user can manually modify the policy file to allow/deny access

Site isolation (Google Chrome): pages from different websites run in different processes, each in a sandbox.

### AJAX (Asynchronous JavaScript and XML)
Lets JS modify the web page based on the result of a request, without the need for explicit user interaction.

`XMLHttpRequest`:
- allows JS to retrieve data (originally XML) from the server
- use the `onreadystatechange` property of the `XMLHttpRequest` object to register a callback
- the `onreadystatechange` callback is called on every state change, so check the current state (`readyState`) inside it

## Web attacks
### Against authentication
What's the best way to authenticate?
- IP address: can be spoofed, and the same user could use different IPs
- HTTP-based: not scalable, hard to manage at application level (lots of options for digest)
- Cert-based: works on server side for SSL, but few users have "real" certs or know how to use them
- Form-based: data might be sent in the clear

'Basic' authentication:
- form used to send username and password over SSL
- app verifies credentials, generates session authenticator (typically a cookie)
    - authenticators should not have predictable values
    - authenticators shouldn't be reusable across sessions
    - better to store a random value with other session info in a file or backend database

If the app includes the authenticator in the URL, browsers may leak it as part of the `Referer` header.

Expiration info should be stored on the server side, or included in the cookie in a cryptographically secure way.
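
As a sketch of the last points (unpredictable authenticators, expiration protected cryptographically), the Python below generates a random session ID and binds it to an expiry time with an HMAC tag. This is a minimal illustration under stated assumptions, not a hardened implementation: the `id|expiry|tag` cookie layout, the in-memory `SECRET_KEY`, and the function names are all hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server-side secret; a real deployment would load it from config.
SECRET_KEY = secrets.token_bytes(32)


def new_session_id() -> str:
    """Unpredictable authenticator: 32 random bytes, URL-safe base64 encoded."""
    return secrets.token_urlsafe(32)


def _tag(payload: str) -> str:
    """HMAC-SHA256 over the payload, keyed with the server secret."""
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def make_cookie(session_id: str, expires_at: int) -> str:
    """Cookie value 'id|expiry|tag': the tag binds the ID to its expiry time."""
    payload = f"{session_id}|{expires_at}"
    return f"{payload}|{_tag(payload)}"


def verify_cookie(cookie: str, now: Optional[float] = None) -> Optional[str]:
    """Return the session ID if the tag is valid and not expired, else None."""
    parts = cookie.split("|")
    if len(parts) != 3:
        return None
    session_id, expires_at, tag = parts
    expected = _tag(f"{session_id}|{expires_at}")
    # compare_digest runs in constant time, so timing doesn't leak the tag.
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None
    current = time.time() if now is None else now
    if current > int(expires_at):
        return None
    return session_id
```

For example, `make_cookie(new_session_id(), int(time.time()) + 3600)` produces a cookie valid for one hour, and editing the expiry invalidates the tag. The HMAC only protects integrity, not confidentiality, and the cookie still needs the `secure` and `httponly` attributes.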

Attacking it:
- eavesdropping
    - if the HTTP connection is not protected by SSL, you can eavesdrop
    - name and password are sent as part of the HTTP basic auth exchange
- bruteforcing/guessing
    - if authenticators have a limited range of values, they can be bruteforced
    - if they are not random, they can be guessed
- bypassing
    - weak recovery procedures can be used to change a password to whatever you want
    - session fixation forces the user's session ID to a value known to the attacker

### Against authorization
Authorization: what can a user do?

Path/directory traversal: break out of the document space by using relative paths (e.g. `../../etc/passwd`)

Forceful browsing: manually jump to any publicly available resource, even if it isn't linked

Automatic directory listing: if there is no index.html in a directory, the server returns a listing of the files

Parameter manipulation: changing the parameters of a valid request

Parameter creation: manually adding new parameters, such as `&admin=1`

Server misconfiguration: e.g. if data can be uploaded via FTP and then executed via a web request

Command injection: incorrect validation of user input that leads to executing commands on the server

### Server-side includes (SSI)
Simple interpreted server-side scripting language.

You can introduce directives into web pages.
Syntax: `<!--#element attribute=value ... -->`

Directives include things like `#exec`, which runs a command on the server, so injected SSI directives are a security problem.

### Command injection in PHP
If `allow_url_fopen` (and, since PHP 5.2, `allow_url_include`) is enabled, you can use URLs in `include()` and `require()`.
If user input is used to create the filename, an attacker can make the server include and execute arbitrary remote code.

### HTML injection
You can inject HTML tags to modify the behavior of a web page, e.g. an `iframe`, or forms to collect the user's credentials.

### Preventing command injection
Command injection is a sanitization problem, so don't trust outside input. Always sanitize.
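
As one concrete form of sanitization, relevant to path traversal and to filenames built from user input (as in the PHP `include()` case), a user-supplied name can be canonicalized and checked against the document root. A minimal Python sketch, assuming a hypothetical root `/var/www/pages` and an illustrative function name `safe_path`:

```python
from pathlib import Path

# Hypothetical document root used for illustration.
DOC_ROOT = Path("/var/www/pages")


def safe_path(user_name: str, root: Path = DOC_ROOT) -> Path:
    """Map a user-supplied file name to a path inside root, or raise ValueError."""
    base = root.resolve()
    # resolve() collapses '..' and follows symlinks; an absolute user_name
    # replaces base entirely when joined, which the check below also catches.
    candidate = (base / user_name).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"rejected path: {user_name!r}")
    return candidate
```

Here `../../etc/passwd` and `/etc/passwd` are rejected while `sub/../index.html` is accepted, since it resolves inside the root. Where possible, an allowlist of known page names is stricter than any path check.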

### SQL injection
SQL queries are built using parameters provided by users.
If a user provides special characters, they can modify queries, find out about stored procedures in the database, and even run commands.

If you build a query like this:

```asp
var sql = "select * from user_accounts where username = '" + username + "' and password = '" + password + "'";
```

You can provide the input `' or 1=1 --` for username to get a string like this:

```sql
select * from user_accounts where username = '' or 1=1 --' and password = ''
```

Since 1=1 is always true, you get all of the records in the table; `--` turns the rest of the query (the password check) into a comment.

You can use this to run subqueries, and if the result is reflected back, you can extract info from other tables.

Identifying SQL injections:
- negative approach: special-meaning characters in the query will cause an error
- positive approach: provide an expression that would not cause an error (e.g. `''Foo` instead of `Foo`)

The number of columns in a query can be determined using progressively longer lists of NULL columns until the query succeeds (i.e. `UNION SELECT NULL`, `UNION SELECT NULL, NULL`, etc.)

To figure out which column holds a string: `UNION SELECT 'foo', NULL, NULL`, `UNION SELECT NULL, 'foo', NULL`, etc.

### Second order SQL injection
SQL code is injected into the application, but the statement is invoked at a later point in time.
Even if the application escapes single quotes, second order SQL injection might be possible.
E.g. if you save a 'favorite search' that contains an SQL injection and select it later, the injection runs.

### Blind SQL injection
If you have no feedback, you can compare the responses to `AND 1=1` (always true) and `AND 1=2` (always false); if they differ, the input reaches the query unsanitized.

### XSS
XSS (cross-site scripting): injecting script into a site's pages, so that it runs with that site's origin and thus bypasses JS's same origin policy
- reflected attacks: injected code reflected off the web server, e.g. in an error message
    - e.g. including JS code inside of a link, which is reflected on the 404 page (and thus executed)
- stored attacks: injected code permanently stored on the server, e.g. in a database

Preventing XSS:
- every piece of data returned to the user that can be influenced by input must first be sanitized
- languages often provide routines to help with this
- sanitization has to be done differently depending on where the data is used
- rules:
    0. never insert untrusted data except in allowed locations
    1. HTML escape before inserting untrusted data into HTML element content
    2. attribute escape before inserting untrusted data into HTML common attributes
    3. JS escape before inserting untrusted data into HTML JS data values
    4. CSS escape before inserting untrusted data into HTML style property values
    5. URI escape before inserting untrusted data into HTML URL attributes
- use `httponly` on cookies to prevent access by scripts

### Cross-site request forgery (CSRF)
Allows an attacker to execute requests on behalf of the victim.

"Confused deputy attack": the browser uses the victim's authority to do what the attacker wants

![Diagram showing CSRF](csrf-diagram.png)

Preventing:
- HTML-only: web server embeds a token (secret & unique value) for each request in all HTML forms, verified on the server side
- header-based (for JS sites):
    - on login, web app sets a cookie containing a random token that stays the same for the whole session
    - JS on the client side copies it into a custom HTTP header; only JS from the same origin can read the cookie, so a cross-site attacker can't set the header
    - server validates the header

### Server-side request forgery (SSRF)
Suppose the server is asked to make a request to some back-end API like this:

```http
POST /product/stock HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 118

stockApi=http://stock...
```

If the attacker can change the URL, they can provide something like:

```http
POST /product/stock HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 118

stockApi=http://localhost/admin
```

The server then requests its own admin URL, which is inaccessible from the outside but not access-checked for requests coming from localhost.

Another attack is clickjacking:
- user visits the attacker's website
- website has a transparent iframe with the target site on top
- the user's click goes to the hidden target site instead, e.g. opening a popup

### HTTP response splitting
Exploits the fact that user-provided data ends up in a header of the reply.

For example, if setting the language to English gives you a redirect like this:

```http
HTTP/1.1 302 Moved Temporarily
Date: ...
Location: http://10.1.1.1/by_lang.jsp?lang=English
...
<html>Error</html>
```

You can put URL-encoded CRLF sequences (`%0d%0a`) followed by your own headers inside `lang`; the server copies them into the reply, where they are interpreted as separate headers (or even as a second response).

### HTTP request smuggling
You can add a space after a header, without CRLF, and then an 'inner' HTTP request:

![Request smuggling example](http-request-smuggling.png)

### PHP type juggling
PHP has loose (`==`) and strict (`===`) comparisons.

When comparing a string to a number, PHP tries to convert the string to the appropriate number.
If both operands look like numbers, PHP converts both to numbers and does a numeric comparison, e.g. `"0e1234" == "0e5678"` is `true`, because both are read as 0 in scientific notation.

### Python Pickle
Serialization of Python datatypes.

Pickle allows arbitrary objects to be pickled by providing a `__reduce__` method, which should return:
- a string
- or a tuple describing how to reconstruct the object (a callable plus its arguments; the callable is invoked on unpickling, so unpickling untrusted data can run arbitrary code)
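
To see why this matters, the sketch below lets `__reduce__` return a harmless `len` call as a stand-in for an attacker's payload. It then shows an allowlisting unpickler modeled on the "Restricting Globals" example in the Python `pickle` documentation, which overrides `Unpickler.find_class`. `Payload` is a hypothetical class, and the allowlist is an assumption to adapt.

```python
import builtins
import io
import pickle


class Payload:
    """Stand-in for a malicious object (hypothetical)."""

    def __reduce__(self):
        # (callable, args): pickle records len("attacker-chosen") instead of
        # the object's state; a real exploit would put e.g. os.system here.
        return (len, ("attacker-chosen",))


data = pickle.dumps(Payload())
print(pickle.loads(data))  # prints 15: len() ran inside loads()


class RestrictedUnpickler(pickle.Unpickler):
    """Allow only a few harmless builtins to be looked up during unpickling."""

    ALLOWED = {"range", "complex", "set", "frozenset", "slice"}

    def find_class(self, module, name):
        if module == "builtins" and name in self.ALLOWED:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(s: bytes):
    """Like pickle.loads(), but through RestrictedUnpickler."""
    return RestrictedUnpickler(io.BytesIO(s)).load()
```

`restricted_loads` still accepts plain data such as lists, dicts, and strings, but it rejects the payload because `builtins.len` is not allowlisted. For untrusted input, a data-only format such as JSON avoids the problem entirely.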