index.md - lectures.alex.balgavy.eu - Lecture notes from university.

      1 +++
      2 title = 'Web security'
      3 +++
      4 
      5 # Web Security
      6 Request:
      7 - header and optional body, separated by empty line (CRLF)
      8 - header specifies:
      9     - method:
     10         - GET: transfer of entity referred to by URL
     11         - HEAD: transfer of header meta-information
     12         - POST: send data
     13         - etc.
     14     - resource: absolute URI or absolute path (starting with slash)
     15     - host:
     16         - in HTTP 1.0, can't tell from request which server was intended to process request
     17         - in HTTP 1.1, mandatory, and allows multiple servers on the same IP address
     18 - HTTP/2 is similar to HTTP/1.1 but adds features, e.g. it allows the server to "push" additional data
     19 
     20 Reply:
     21 - header and body separated by CRLF
     22 - header contains: protocol version, status code, diagnostic text, other info
     23 - body is a byte stream
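As a rough illustration of the reply layout described above, the sketch below splits a raw reply into status line, header fields, and body. The reply bytes are made up for the example, and real parsers handle many more edge cases.

```python
# Hypothetical raw reply, used only to show where the pieces are.
raw_reply = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

# Header and body are separated by the first empty line (CRLF CRLF).
head, _, body = raw_reply.partition(b"\r\n\r\n")

# First header line: protocol version, status code, diagnostic text.
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
version, status_code, reason = status_line.split(" ", 2)

# Remaining lines are "Name: value" fields.
headers = dict(line.split(": ", 1) for line in header_lines)
```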
     24 
     25 URI syntax: `<scheme>://<authority><path>?<query>`
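A small sketch of how the URI components map onto Python's standard `urllib.parse.urlsplit`. The URL itself is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL, chosen only to show each component.
parts = urlsplit("https://example.com:8443/shop/item.php?id=42&lang=en")

scheme = parts.scheme      # <scheme>
authority = parts.netloc   # <authority> (host and optional port)
path = parts.path          # <path>
query = parts.query        # <query>
```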
     26 
     27 HTTP authentication:
     28 - simple challenge-response
     29 - challenge returned as part of 401 reply, specifies scheme to be used
     30 - auth request refers to 'realm' (set of resources on server)
     31 - client must include Authorization header field with required valid credentials
     32 
     33 Basic HTTP authentication
     34 - server replies with 401 message containing header field: `WWW-Authenticate: Basic realm="whatever"`
     35 - client retries access, including an `Authorization: Basic` header with the base64-encoded `username:password` (easily decoded)
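To show why basic credentials are easily readable, here is a minimal sketch that builds the `Authorization` header value and decodes it again. The helper names and the `foo`/`bar` credentials are assumptions for illustration.

```python
import base64

def basic_auth_header(username, password):
    """Build the header a client sends for Basic authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Authorization: Basic " + base64.b64encode(raw).decode("ascii")

def decode_basic_token(token):
    """Recover username and password; base64 is an encoding, not encryption."""
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_auth_header("foo", "bar")
```

Anyone who can observe the header can call something like `decode_basic_token`, which is why basic auth needs a protected transport.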
     36 
     37 HTTP/1.1 digest authentication:
     38 - defines authentication scheme based on cryptographic digests
     39     - server sends nonce as challenge
     40     - client sends request with digest of username, password, given nonce, HTTP method, and requested URL
     41 - web server needs the cleartext password (or an equivalent password-derived hash) to recompute the digest
     42 - man-in-the-middle (MITM) is still possible: the attacker negotiates basic auth with the client, then uses digest auth towards the server
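The digest computation can be sketched as follows. This assumes the original MD5-based form without the later `qop` extension; the function names and sample values are illustrative only.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, nonce, method, uri):
    """Digest the client sends back (simplified: no qop, no cnonce)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    return md5_hex(f"{ha1}:{nonce}:{ha2}")           # bound to the server's nonce

# The server recomputes the same value and compares it with the client's.
response = digest_response("foo", "whatever", "bar", "nonce-1", "GET", "/index.html")
```

Because the nonce is part of the hash, a captured response is not valid for a different challenge.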
     43 
     44 ![HTTP MITM attack diagram](http-mitm-attack-diagram.png)
     45 
     46 Web Authentication API
     47 - allows strong auth with public key crypto, passwordless, second-factor, etc.
     48 - relies on entities outside the browser, like authenticators
     49 
     50 Maintaining state
     51 - HTTP is stateless, but web apps often require state
     52 - can be achieved by:
     53     - embedding info in URLs: `GET /login.php?user=foo&pwd=bar HTTP/1.1`
     54     - hidden form fields: `<input type="hidden" name="user" value="foo" />`
     55     - cookies: via header `Set-Cookie: USER=foo; SHIPPING=fedex; path=/`
     56         - cookies are passed in every further transaction with the same site, in the `Cookie` header
     57         - only accessible by the site that set them
     58         - can have number of fields, in the form `<name>=<value>`
     59             - can have `expires` key for expiration date
     60             - `domain` for more generic domain
     61             - `secure` to only send via SSL connections
     62             - `httponly` to make it inaccessible to client-side scripts
     63         - a server can only set a limited number of cookies
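A minimal sketch of setting and reading cookies with Python's standard `http.cookies` module. The cookie names mirror the example above; everything else is illustrative.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header with some of the attributes above.
outgoing = SimpleCookie()
outgoing["USER"] = "foo"
outgoing["USER"]["path"] = "/"
outgoing["USER"]["secure"] = True     # only send over encrypted connections
outgoing["USER"]["httponly"] = True   # hide from client-side scripts
header_line = outgoing.output()

# Server side, later request: parse the Cookie header the browser sends back.
incoming = SimpleCookie("USER=foo; SHIPPING=fedex")
shipping = incoming["SHIPPING"].value
```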
     64 
     65 Sessions:
     66 - represent time-limited interaction of user with web server
     67 - no concept at HTTP level, so use the state mechanisms above
     68 - generate unique ID at start of session, then use it to access info on server side
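One possible way to sketch this: an unpredictable session ID acts as a key into server-side storage. The in-memory dictionary, the 30-minute lifetime, and the function names are assumptions; a real application would use a file or backend database.

```python
import secrets
import time

SESSION_TTL_SECONDS = 30 * 60   # assumed lifetime
sessions = {}                   # session ID -> server-side data

def create_session(user, now=None):
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)   # random, hard to guess
    sessions[session_id] = {"user": user, "expires": now + SESSION_TTL_SECONDS}
    return session_id

def lookup_session(session_id, now=None):
    now = time.time() if now is None else now
    data = sessions.get(session_id)
    if data is None or data["expires"] <= now:
        sessions.pop(session_id, None)       # drop expired sessions
        return None
    return data
```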
     69 
     70 ## Server-side
     71 ### Common Gateway Interface (CGI)
     72 Way to invoke programs on the server side, with the output returned to the client.
     73 Input passed via URL or body in POST.
     74 
     75 CGI programs can be written in any language, and input piped to process's stdin.
     76 Parameters are passed via environment variables.
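A hedged sketch of what a CGI program might look like in Python: it reads the query string from the `QUERY_STRING` environment variable and writes header plus body to stdout. The `name` parameter and the greeting are made up for the example.

```python
import os
from html import escape
from urllib.parse import parse_qs

def handle_request(environ):
    """Build the full CGI output (header, empty line, body) from the environment."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    body = f"<p>Hello, {escape(name)}!</p>"   # escape user input before echoing it
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # The web server sets the environment and relays stdout to the client.
    print(handle_request(os.environ), end="")
```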
     77 
     78 ### Active Server Pages (ASP, ASP.NET)
     79 Pages that contain mix of text, HTML tags, scripting directives, and server-side includes.
     80 
     81 Directives are executed on server side before serving the page.
     82 
     83 ### Servlets and JavaServer pages (JSP)
     84 Servlets: Java programs executed on server (similar to CGI). Can run in existing JVM, without making a new process.
     85 
     86 JSP are static HTML mixed with Java code, and are compiled into servlets.
     87 
     88 ### PHP
     89 Scripting language that can be embedded in HTML.
     90 PHP code executed on server side when the page containing the code is requested.
     91 Common way is to have a LAMP stack.
     92 
     93 ### Web App Frameworks
     94 Support rapid development, might be based on existing web servers or might have their own.
     95 Often based on model-view-controller pattern, and provide automated translation of objects to/from database.
     96 Example is Ruby on Rails.
     97 
     98 ## Client-side
     99 ### Java Applets
    100 Compiled Java programs that are downloaded and executed in context of a web page.
    101 
    102 ### ActiveX
    103 Binary, OS-specific programs downloaded and executed in context of a web page.
    104 Code signed via Authenticode mechanism.
    105 Once executed, have complete access to client's environment.
    106 
    107 ### JavaScript/JScript, EcmaScript/VBScript
    108 Scripting languages for dynamic behavior in web pages.
    109 
    110 ### asm.js
    111 Subset of JS that allows for very fast code.
    112 Can use compiler passes to translate e.g. C code to asm.js
    113 
    114 ### WebAssembly
    115 Low-level bytecode for client-side scripting, supports compilation from C/C++.
    116 
    117 ### Global structure:
    118 "Window": top hierarchy of objects
    119 
    120 DOM: document object model
    121 - interface to manipulation of client-side content
    122 
    123 BOM: browser object model
    124 - interface to browser properties
    125 
    126 ### JS security
    127 JS code downloaded as part of HTML page, executed on-the-fly.
    128 Security guaranteed by sandboxing:
    129 - no access to files
    130 - no access to network resources
    131 - etc.
    132 
    133 Security policies:
    134 - same origin: JS can only access resources (e.g. cookies) associated with same origin (e.g. vu.nl)
    135     - every frame in browser is associated with domain ("origin")
    136     - web browser permits script contained in first web page to access data in second web page _only if same origin_
    137         - same URI scheme + exact hostname + port number
    138     - if frame explicitly includes external code, it executes within the same frame domain!
    139 - signed script: signature on JS code is verified and principal identity extracted, identity compared to policy file to determine level of access
    140 - configurable: user can manually modify policy file to allow/deny access
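The same-origin comparison (scheme + exact hostname + port) can be sketched like this. The default-port table and function names are assumptions for the example; browsers implement more special cases.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (scheme, hostname, port) triple that defines an origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(url_a, url_b):
    return origin(url_a) == origin(url_b)
```

For instance, `https://vu.nl/a` and `https://vu.nl:443/b` share an origin, while `https://www.vu.nl/` does not match `https://vu.nl/` because the hostname must be exact.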
    141 
    142 Site isolation (Google Chrome): pages from different websites are different processes, each in a sandbox.
    143 
    144 ### AJAX (Asynchronous JavaScript and XML)
    145 Lets JS modify web page based on result of request, without need for explicit user interaction.
    146 
    147 XML HTTP request:
    148 - allows JS to retrieve XML data from server by querying from JS
    149 - use `onreadystatechange` property of XML-HTTP object to run a callback
    150 - `onreadystatechange` callback is called on any state change, so you can check the current state
    151 
    152 ## Web attacks
    153 ### Against authentication
    154 What's the best way to authenticate?
    155 - IP address: can be spoofed, and same user could use different IPs
    156 - HTTP-based: not scalable, hard to manage at application level (lots of options for digest)
    157 - Cert-based: works on server-side for SSL, few users have "real" certs or know how to use them
    158 - Form-based: data might be sent in the clear
    159 
    160 'Basic' authentication:
    161 - form used to send username and password over SSL
    162 - app verifies credentials, generates session authenticator (typically a cookie)
    163 - authenticators should not have predictable values
    164 - authenticators shouldn't be reusable across sessions
    165 - better to store random value with other session info in file or backend database
    166 
    167 If app includes authenticator in URL, browsers may leak it as part of the `Referer` header.
    168 
    169 Expiration info should be stored on server side, or included in cookie in cryptographically secure way.
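If the expiration is kept in the cookie, one way to protect it is a keyed MAC over the cookie contents, as in this sketch. The key, the `|` separator, and the function names are assumptions, not a prescribed format.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-random-server-side-key"   # assumed, never sent to clients

def _mac(payload):
    return hmac.new(SECRET_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()

def make_cookie(session_id, expires_at):
    payload = f"{session_id}|{expires_at}"
    return f"{payload}|{_mac(payload)}"

def check_cookie(cookie, now):
    """Return the session ID if the cookie is authentic and not expired."""
    parts = cookie.split("|")
    if len(parts) != 3:
        return None
    session_id, expires_at, mac = parts
    expected = _mac(f"{session_id}|{expires_at}")
    if not hmac.compare_digest(expected.encode(), mac.encode()):
        return None               # expiry or ID was tampered with
    if int(expires_at) <= now:
        return None               # expired
    return session_id
```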
    170 
    171 Attacking it:
    172 - eavesdropping
    173     - if HTTP connection not protected by SSL, you can eavesdrop
    174     - name and password are sent as part of HTTP basic auth exchange
    175 - bruteforcing/guessing
    176     - if authenticators have limited values, can be bruteforced
    177     - if not random, can be guessed
    178 - bypassing
    179     - weak recovery procedures can be used to change a password to whatever you want
    180     - session fixation forces user's session ID to a known value
    181 
    182 ### Against authorization
    183 Authorization: what can a user do?
    184 
    185 Path/directory traversal: break out of document space by using relative paths
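A possible server-side check against traversal: resolve the requested path and verify it still lies under the document root. The root directory and function name are assumptions for the example.

```python
import os.path

DOCUMENT_ROOT = "/var/www/html"   # assumed document root

def is_inside_root(requested):
    """True if the requested relative path stays within DOCUMENT_ROOT."""
    root = os.path.realpath(DOCUMENT_ROOT)
    # realpath collapses ".." components (and symlinks) before the check.
    target = os.path.realpath(os.path.join(root, requested.lstrip("/")))
    return os.path.commonpath([root, target]) == root
```

A request for `images/logo.png` passes, while `../../../etc/passwd` resolves outside the root and is rejected.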
    186 
    187 Forceful browsing: manually jump to any publicly available resource
    188 
    189 Automatic directory listing: if no index.html in directory, server returns listing of the files
    190 
    191 Parameter manipulation: changing parameters of valid request
    192 
    193 Parameter creation: add new parameters manually, such as `&admin=1`
    194 
    195 Server misconfiguration: e.g. if data can be uploaded via FTP and executed via a web request
    196 
    197 Command injection: incorrect validation of user input that leads to executing commands on the server
    198 
    199 ### Server-side includes (SSI)
    200 Simple interpreted server-side scripting language.
    201 
    202 You can introduce directives into web pages.
    203 Syntax: `<!-- #element attribute=value ... -->`
    204 
    205 These can also have things like `#exec`, which is a security problem.
    206 
    207 ### Command injection in PHP
    208 If `allow_url_fopen` is set, you can use URLs in `include()` and `require()`.
    209 If user input is used to create the filename, then you can execute arbitrary code.
    210 
    211 ### HTML injection
    212 You can inject HTML tags to modify behavior of a web page, e.g. an `iframe`, or forms to collect user's credentials.
    213 
    214 ### Preventing command injection
    215 Command injection is a sanitization problem, so don't trust outside input. Always sanitize.
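Besides sanitizing, one common mitigation is to avoid the shell entirely and pass user input as a separate argument. The sketch below uses the Python interpreter itself as a stand-in for the external program so that it runs anywhere; the function name is hypothetical.

```python
import subprocess
import sys

def echo_via_argv(user_input):
    """Run a child process with user input as one argument, without a shell."""
    result = subprocess.run(
        # Argument list: no shell parses the input, so ';' or '|' are just text.
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")

# With shell=True and string concatenation, the part after ';' would run as a command.
output = echo_via_argv("foo; echo injected")
```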
    216 
    217 ### SQL injection
    218 SQL queries are built using parameters provided by users.
    219 If a user provides special characters, they can modify queries, find out about stored procedures in database, and even run commands.
    220 
    221 If you build a query like this:
    222 
    223 ```asp
    224 var sql = "select * from user_accounts where username = '" + username + "' and password = '" + password + "'";
    225 ```
    226 
    227 You can provide the input `' or 1=1 --` for username to get a string like this:
    228 
    229 ```sql
    230 select * from user_accounts where username = '' or 1=1 --' and password = ''
    231 ```
    232 
    233 Since 1=1 is always true, you get all of the records in the table.
    234 
    235 You can use this to run subqueries, and if the result is reflected back, you can extract info from other tables.
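The login bypass can be reproduced safely against an in-memory SQLite database. The table contents and function names below are made up; the parameterized variant shows the usual fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_accounts (username TEXT, password TEXT)")
conn.executemany(
    "INSERT INTO user_accounts VALUES (?, ?)",
    [("alice", "s3cret"), ("bob", "hunter2")],
)

def login_vulnerable(username, password):
    # Query built by string concatenation, as in the example above.
    sql = ("select * from user_accounts where username = '" + username +
           "' and password = '" + password + "'")
    return conn.execute(sql).fetchall()

def login_parameterized(username, password):
    # Placeholders keep user input as data, never as SQL syntax.
    sql = "select * from user_accounts where username = ? and password = ?"
    return conn.execute(sql, (username, password)).fetchall()

leaked = login_vulnerable("' or 1=1 --", "")
blocked = login_parameterized("' or 1=1 --", "")
```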
    236 
    237 Identifying SQL injections:
    238 - negative approach: special-meaning characters in query will cause an error
    239 - positive approach: provide expression that would not cause an error (e.g. `''Foo` instead of `Foo`)
    240 
    241 Number of columns in a query can be determined using progressively longer NULL columns until a correct query is returned (i.e. `UNION SELECT NULL`, `UNION SELECT NULL, NULL`, etc.)
    242 
    243 If you want to figure out which column has a string: `UNION SELECT 'foo', NULL, NULL`, `UNION SELECT NULL, 'foo', NULL`, etc.
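The column-counting probe can be sketched against a deliberately vulnerable SQLite query. Table layout and function names are assumptions; other databases report the mismatch with different errors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_accounts (id INTEGER, username TEXT, password TEXT)")
conn.execute("INSERT INTO user_accounts VALUES (1, 'alice', 's3cret')")

def vulnerable_lookup(username):
    sql = "select * from user_accounts where username = '" + username + "'"
    return conn.execute(sql).fetchall()

def count_columns(max_columns=10):
    """Try UNION SELECT with 1, 2, 3... NULLs until the query stops failing."""
    for n in range(1, max_columns + 1):
        payload = "' UNION SELECT " + ", ".join(["NULL"] * n) + " --"
        try:
            vulnerable_lookup(payload)
        except sqlite3.OperationalError:
            continue              # column count mismatch, try one more
        return n
    return None

# Probe for a column that reflects a string value.
probe = vulnerable_lookup("' UNION SELECT NULL, 'foo', NULL --")
```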
    244 
    245 ### Second order SQL injection
    246 SQL code injected into application, but statement invoked at later point in time.
    247 Even if application escapes single quotes, second order SQL injection might be possible.
    248 E.g. if you save a 'favorite search' which contains an SQL injection, and then select it later, running the injection.
    249 
    250 ### Blind SQL injection
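    251 If you have no direct feedback, you can append `AND 1=1` (always true) and compare the behavior with `AND 1=2` (always false): if the responses differ, the input is not sanitized.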
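A small sketch of the blind check: the application only reveals whether something was found, and two inputs that differ only in a true or false condition are compared. The `products` table and function name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'widget')")

def product_exists(product_id):
    # Vulnerable: the ID is concatenated into the query, only a yes/no leaks out.
    sql = "select 1 from products where id = " + product_id
    return conn.execute(sql).fetchone() is not None

true_case = product_exists("1 AND 1=1")    # condition holds, row still found
false_case = product_exists("1 AND 1=2")   # condition fails, row disappears

# Differing answers mean the injected condition was evaluated by the database.
looks_injectable = true_case and not false_case
```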
    252 
    253 ### XSS
    254 XSS (Cross-site scripting): used to bypass JS's same origin policy
    255 - reflected attacks: injected code reflected off web server, e.g. in an error message
    256     - e.g. including JS code inside of a link, which is reflected on the 404 page (and thus executed)
    257 - stored attacks: injected code permanently stored on the server e.g. in a database
    258 
    259 Preventing XSS:
    260 - every piece of data returned to user that can be influenced by input must first be sanitized
    261 - languages often provide routines to help with this
    262 - sanitization has to be done differently depending on where the data is used
    263 - rules:
    264     0. never insert untrusted data except in allowed locations
    265     1. HTML escape before inserting untrusted data into HTML element content
    266     2. attribute escape before inserting untrusted data into HTML common attributes
    267     3. JS escape before inserting into HTML JS data values
    268     4. CSS escape before inserting into HTML style property values
    269     5. URI escape before inserting into HTML URL attributes
    270 - use `httponly` on cookies to prevent access by scripts
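A minimal sketch of context-dependent escaping using only the Python standard library, covering HTML element content, attribute values, and URL parameters. The payload and surrounding markup are illustrative; JS and CSS contexts need their own escaping routines.

```python
from html import escape
from urllib.parse import quote

untrusted = '<script>alert("xss")</script>'

# Rule 1: HTML-escape before placing data in element content.
in_element = "<p>" + escape(untrusted, quote=False) + "</p>"

# Rule 2: also escape quotes before placing data in an attribute value.
in_attribute = '<input value="' + escape(untrusted, quote=True) + '">'

# Rule 5: percent-encode before placing data in a URL.
in_url = '<a href="/search?q=' + quote(untrusted, safe="") + '">results</a>'
```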
    271 
    272 ### Cross-site request forgery (CSRF)
    273 Allows attacker to execute requests on behalf of victim.
    274 
    275 "Confused deputy attack": browser uses victim's authority to do what the attacker wants
    276 
    277 ![Diagram showing CSRF](csrf-diagram.png)
    278 
    279 Preventing:
    280 - HTML-only: web server embeds token (secret & unique value) for each request, in all HTML forms, verified on server side
    281 - header-based (for JS sites)
    282     - on login, web app sets cookie containing random token that stays same for whole session
    283     - JS on client side copies it into a custom HTTP header; only JS running within the same origin can read the cookie
    284     - server validates this
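Token issuing and validation might be sketched as follows. The in-memory store and function names are assumptions; the same check applies whether the token arrives in a form field or a custom header.

```python
import hmac
import secrets

csrf_tokens = {}   # session ID -> token, kept on the server

def issue_csrf_token(session_id):
    token = secrets.token_urlsafe(32)   # secret and unpredictable
    csrf_tokens[session_id] = token
    return token

def is_valid_csrf(session_id, submitted):
    expected = csrf_tokens.get(session_id)
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A forged cross-site request carries the victim's cookies automatically but cannot include the token, so validation fails.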
    285 
    286 ### Server-side request forgery (SSRF)
    287 Suppose the server is asked to make a request to some back-end API like this:
    288 
    289 ```
    290 POST /product/stock HTTP/1.0
    291 Content-Type: application/x-www-form-urlencoded
    292 Content-Length: 118
    293 
    294 stockApi=http://stock...
    295 ```
    296 
    297 If the attacker can change the URL, they can provide something like:
    298 
    299 ```
    300 POST /product/stock HTTP/1.0
    301 Content-Type: application/x-www-form-urlencoded
    302 Content-Length: 118
    303 
    304 stockApi=http://localhost/admin
    305 ```
    306 
    307 This means that server accesses its own admin URL, which is inaccessible from the outside but not checked from localhost.
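One possible defence is to accept only URLs whose host is on an allowlist before the server makes the request. The host name below is hypothetical, and the parameter name follows the example above.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"stock.example.internal"}   # hypothetical back-end host

def is_allowed_stock_api(url):
    """Accept only http(s) URLs pointing at an explicitly allowed host."""
    parts = urlsplit(url)
    return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_HOSTS
```

Checking the parsed hostname rather than searching the raw string also rejects tricks like `http://stock.example.internal@localhost/admin`, where the allowed name is only the userinfo part.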
    308 
    309 Another attack is clickjacking:
    310 - user visits attacker's website
    311 - website has transparent iframe with target site on top
    312 - click leads to opening a popup
    313 
    314 ### HTTP response splitting
    315 Exploits the fact that user provided data is in header of reply.
    316 
    317 For example, if setting language to english gives you a redirect like this:
    318 
    319 ```
    320 HTTP/1.1 302 Moved Temporarily
    321 Date: ...
    322 Location: http://10.1.1.1/by_lang.jsp?lang=English
    323 ...
    324 <html>Error</html>
    325 ```
    326 
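    327 You can provide URL-encoded CRLF sequences followed by extra headers inside of `lang`; once decoded into the `Location` header, the client interprets them as additional headers or even a second response.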
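A sketch of a server-side guard: reject CR and LF in anything that goes into a header, and percent-encode the value. The function name is hypothetical, and the sketch assumes the framework has already URL-decoded the `lang` parameter.

```python
from urllib.parse import quote, unquote

def location_header(lang):
    """Build the redirect header, refusing values that contain line breaks."""
    if "\r" in lang or "\n" in lang:
        raise ValueError("CR/LF not allowed in header values")
    return "Location: http://10.1.1.1/by_lang.jsp?lang=" + quote(lang, safe="")

# What an attacker-supplied parameter looks like after URL decoding.
attack = unquote("English%0d%0aSet-Cookie:%20evil=1")
```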
    328 
    329 ### HTTP request smuggling
    330 You can add a space after a header, without CRLF, and then an 'inner' HTTP request:
    331 
    332 ![Request smuggling example](http-request-smuggling.png)
    333 
    334 ### PHP type juggling
    335 PHP has loose (`==`) and strict (`===`) comparisons.
    336 
    337 When comparing string to number, PHP tries to convert the string to the appropriate number.
    338 If both operands look like numbers, PHP converts both to numbers and does numeric comparison, so e.g. `"0e1234" == "0e5678"` is true because both are read as 0.
    339 
    340 ### Python Pickle
    341 Serialization of python datatypes.
    342 
    343 Pickle allows arbitrary objects to be pickled by providing a `__reduce__` method, which should return:
    344 - a string
    345 - or tuple describing how to reconstruct object
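The tuple form is what makes unpickling untrusted data dangerous: the first element is a callable that `pickle.loads` invokes with the given arguments. The sketch below uses a harmless expression; the class name is made up.

```python
import pickle

class Demo:
    def __reduce__(self):
        # (callable, args): unpickling calls callable(*args) to "rebuild" the object.
        return (eval, ("6 * 7",))

payload = pickle.dumps(Demo())

# Loading runs eval("6 * 7"); no Demo instance comes back.
result = pickle.loads(payload)
```

An attacker who controls the pickled bytes can choose any importable callable here, so pickles from untrusted sources should not be loaded.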