+++
title = "Lecture 9: Web security"
+++

# Lecture 9: Web security
## Overview
Requests:
- a request is composed of a header and an optional body, separated by an empty line (CRLF)
- the header contains the method, resource, protocol version, and other info
- the body is treated as a byte stream

The resource can be specified by an absolute URI or an absolute path.
In HTTP/1.1, the Host field is required to specify the server that should receive the request.
HTTP/2 lets the server push content (respond with data for more requests than the client actually made).

Replies:
- composed of a header and a body, separated by an empty line (CRLF)
- the header contains the protocol version, status code, diagnostic text, and other info
- the body is a byte stream

Header fields:
- general: refer to the message itself (date, pragma, cache-control, transfer-encoding, etc.)
- request: accept (MIME types), host, authorization, from, if-modified-since, user-agent, referer, etc.
- response: location, server, www-authenticate
- entity: allow (methods that can be invoked), content-encoding, content-length (required if the body is not empty, unless chunked transfer-encoding is used), content-type (MIME type of the body), expires, last-modified

URIs
- syntax: `{scheme}://{authority}{path}?{query}`
- can be absolute or relative

Authentication
- simple challenge-response scheme
- the challenge is returned by the server as part of a 401 reply and specifies the auth scheme to be used
- the auth request refers to a realm (a set of resources on the server)
- the client must include an Authorization header field with the required valid credentials
- examples:
    - Basic HTTP auth:
        - the server replies to an unauthorized request with a 401 message containing the header field: `WWW-Authenticate: Basic realm="ReservedDocs"`
        - the client retries the access, including a header field that carries the base64-encoded `username:password` pair: `Authorization: Basic <token>`
    - HTTP/1.1 Digest auth:
        - defines an additional auth scheme based on cryptographic digests
        - the server sends a nonce as the challenge
        - the client sends the request with a digest of the username, password, given nonce value, HTTP method, and request URL
        - to authenticate users, the web server must have access to the plaintext user passwords (or to an equivalent hash of username, realm, and password)
    - WebAuthn

Maintaining State
- HTTP is stateless, but many apps require maintaining state across requests
- ways:
    - embedding information in URLs: the app embeds user-specific info in every link contained in the page returned to the user
    - putting information in forms: hidden `<input>` fields containing name/value pairs
    - cookies: set by the server with the `Set-Cookie` header; the browser sends them back in every further request to the site, and they are accessible only by the site that set them
    - sessions: implemented at the app level; at the beginning of a session a unique ID is generated for the user and used to index info stored on the server side

## Server side
Common Gateway Interface:
- mechanism to invoke programs on the server side
- the program's output is returned to the client
- input parameters are passed in the URL for GET, or in the body for POST
- programs can be written in any language
- the request body (e.g. POST data) is passed to the process's stdin
- request metadata and GET parameters are passed by setting environment variables (REQUEST_METHOD, PATH_INFO, QUERY_STRING, CONTENT_TYPE, etc.)
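The CGI conventions above can be sketched as a small Python CGI program: GET parameters come from the `QUERY_STRING` environment variable, a POST body is read from stdin, and the reply is written to stdout. This is a minimal sketch; the function name `read_cgi_params` and the `name` parameter are hypothetical, and it assumes a form-encoded (`application/x-www-form-urlencoded`) POST body.

```python
import os
import sys
from urllib.parse import parse_qs


def read_cgi_params(environ, stdin):
    """Collect request parameters the way a CGI program receives them.

    GET parameters arrive in the QUERY_STRING environment variable;
    a POST body arrives on stdin, with its size in CONTENT_LENGTH.
    Assumes the body is form-encoded (application/x-www-form-urlencoded).
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Read exactly CONTENT_LENGTH bytes: the server is not required
        # to signal end-of-file after the body.
        raw = stdin.read(length).decode("utf-8")
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs maps each name to a list of values, since a name may repeat.
    return parse_qs(raw)


def main():
    params = read_cgi_params(os.environ, sys.stdin.buffer)
    name = params.get("name", ["world"])[0]
    # CGI output on stdout: header lines, a blank line, then the body.
    # text/plain, so the echoed value is not interpreted as HTML.
    sys.stdout.write("Content-Type: text/plain\n\n")
    sys.stdout.write(f"Hello, {name}\n")


if __name__ == "__main__":
    main()
```

Passing `environ` and `stdin` in as arguments, rather than reading `os.environ` directly inside the parser, keeps the parsing testable outside a web server.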
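To make the two authentication examples from the Overview concrete, the sketch below computes the `Authorization` values a client would send. The helper names are hypothetical. The Digest part follows the original RFC 2617 computation without the optional `qop` extension; note that it also mixes the realm into the digest, a detail not listed in the notes above.

```python
import base64
import hashlib


def basic_authorization(username, password):
    """Authorization header value for HTTP Basic auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    # base64 is an encoding, not encryption: anyone who sees this
    # header can decode the password, so Basic auth needs HTTPS.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def md5_hex(text):
    """Lowercase hex MD5 of a UTF-8 string, as used by Digest auth."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, uri):
    """'response' value for Digest auth (RFC 2617 form without qop)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # binds the credentials
    ha2 = md5_hex(f"{method}:{uri}")                  # binds the request
    # Mixing in the server's nonce ties the response to this challenge.
    return md5_hex(f"{ha1}:{nonce}:{ha2}")
```

For example, `basic_authorization("Aladdin", "open sesame")` returns `"Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="`. On the server side, only `ha1` is needed to verify a Digest response, so the server can store that hash instead of the plaintext password; `ha1` is still password-equivalent for that realm, though.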
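The sessions approach from Maintaining State can be sketched on the server side as follows: a hypothetical `SessionStore` class keeps per-user state in memory, keyed by a random ID that is handed to the browser in a cookie. The cookie name `SESSIONID` is an assumption, and a real application would typically persist sessions beyond process memory.

```python
import secrets


class SessionStore:
    """Hypothetical in-memory session table, indexed by session ID.

    The client only ever holds the ID (here in a cookie); the session
    data itself stays on the server.
    """

    def __init__(self):
        self._sessions = {}

    def create(self):
        # 32 random bytes from a CSPRNG: the ID must not be guessable,
        # since anyone presenting it is treated as the session owner.
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {}
        return session_id

    def get(self, session_id):
        # Unknown or forged IDs map to no session at all.
        return self._sessions.get(session_id)


def set_cookie_header(session_id):
    # Hypothetical cookie name; the browser sends it back on later requests.
    return f"Set-Cookie: SESSIONID={session_id}; Path=/"
```

A request handler would call `create()` on first contact, send `set_cookie_header(...)` in the reply, and on later requests look the cookie value up with `get()`.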

Active Server Pages (ASP, ASP.NET)
- Microsoft's version of CGI scripts
- pages containing a mix of text, HTML tags, scripting directives (VBScript, JScript), and server-side includes
- directives are executed on the server side before the page is served

Servlets, JavaServer Pages
- Servlets: Java programs executed on the server, similar to CGI programs, but they can run within an existing JVM without creating a new process
- JavaServer Pages (JSP): static HTML intermixed with Java code, similar to ASP, compiled into servlets

PHP:
- scripting language embedded in HTML pages
- PHP code is executed on the server side when the page containing it is requested
- a common setup is LAMP (Linux + Apache + MySQL + PHP)

Web Application Frameworks
- provide support for fast development of web apps
- might be based on existing web servers, or provide a new environment
- often based on the Model-View-Controller architecture
- provide automatic translation of objects to/from the database
- provide templates for generating dynamic pages

## Client side
Java applets:
- compiled Java programs downloaded into the browser and executed in the context of the web page
- access to resources regulated by the implementation of the Java Security Manager
- dead

ActiveX controls
- binary, native code downloaded and executed in the context of the page
- only supported by Windows-based browsers
- code signed using the Authenticode mechanism
- once executed, has complete access to the client's environment
- dead

JavaScript, JScript, ECMAScript, VBScript
- scripting languages for dynamic behavior in web pages
- JS was initially introduced by Netscape; JScript is Microsoft's version
- ECMAScript is the standardised version of JS
- VBScript is based on Microsoft Visual Basic

asm.js:
- subset of JS that allows for fast code
- special compiler passes can be used to e.g. translate C to asm.js

WebAssembly
- low-level bytecode for in-browser client-side scripting
- initial aim: support compilation from C and C++
- initial implementation support in browsers was based on the feature set of asm.js

Code is embedded into HTML pages using the `<script>` tag.
The `window` object is at the top of the object hierarchy.
The DOM (Document Object Model) lets scripts manipulate the page content.
The BOM (Browser Object Model) is the interface to the browser's properties.

JS security policies:
- same origin: JS code can only access resources (e.g. cookies) associated with the same origin/host
    - every frame in the browser's window is associated with a domain (origin = URI scheme, hostname, port number)
    - the web browser only lets scripts contained in web page A access data in web page B if both pages have the same origin
    - even iframes and included scripts are bound to a frame domain: an iframe has its own origin, while an included script runs with the origin of the frame that includes it
- signed scripts: the signature on the JS code is verified and the principal identity is extracted; the principal identity is compared against a policy file to determine the level of access
- configurable: the user can manually modify the policy file to allow/deny access to specific properties/methods for code from specific sites

Site isolation
- site-dedicated renderer processes, with the browser process acting as the interface
- cross-origin read blocking (CORB): stops specific types of cross-origin data (e.g. HTML, XML, JSON responses) from reaching a renderer process

AJAX (Asynchronous JavaScript and XML)
- a way to modify a page based on the result of a request, without the need for an explicit user action
- relies on JS-based DOM manipulation and the `XMLHttpRequest` object
- the `onreadystatechange` property of `XMLHttpRequest` sets a callback for the result of the query

Possible attacks:
- bugs in the renderer process
- universal cross-site scripting (UXSS) bugs let an attacker bypass the same-origin policy
- side-channel attacks like Spectre/RIDL let an attacker read arbitrary renderer process memory
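The origin comparison behind the same-origin policy can be sketched as follows: two URLs share an origin only if scheme, hostname, and port all match, with the path ignored. This is a minimal sketch under stated assumptions: the names `origin`, `same_origin`, and `DEFAULT_PORTS` are hypothetical, only `http`/`https` default ports are known, and real browsers add special cases (opaque origins for `file:` or `data:` URLs, `document.domain`) that are not modelled here.

```python
from urllib.parse import urlsplit

# Assumed default ports for the two web schemes; a real browser knows more.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, hostname, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    host = parts.hostname  # lowercased by urlsplit, or None if absent
    # An explicit port wins; otherwise fall back to the scheme's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a, url_b):
    """True if a script from url_a may access data belonging to url_b."""
    return origin(url_a) == origin(url_b)
```

With this sketch, `http://example.com/a.html` and `http://example.com:80/b.html` share an origin, while changing the scheme to `https`, the host to `www.example.com`, or the port to `8080` each produces a different origin.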