---
in_progress: yes
default_highlighter: oils-sh
body_css_class: width50
---

Tables, Objects, and Documents - Notation, Query, Creation, Schema
=============================

<style>
  thead {
    background-color: #eee;
    font-weight: bold;
    text-align: left;
  }
  table {
    font-family: sans-serif;
    border-collapse: collapse;
  }

  tr {
    border-bottom: solid 1px;
    border-color: #ddd;
  }

  td {
    padding: 8px;  /* override default of 5px */
  }
</style>

This is part of **maximal** YSH!

<div id="toc">
</div>

## Philosophy

- Oils is Exterior-First
- Tables, Objects, Documents - CSV, JSON, HTML
  - Oils cleanup: TSV8, JSON8, HTM8

## Tables

<table>

- thead
  - Data Type
  - Notation
  - Query
  - Creation
  - Schema
- tr
  - Table
  - TSV, CSV
  - csvkit, xsv, awk-ish, etc. <br/>
    SQL, Data Frames
  - ?
  - ?
- tr
  - Object
  - JSON
  - jq <br/>
    JSONPath: MySQL/Postgres/sqlite support it?
  - jq
  - JSON Schema
- tr
  - Document
  - HTML5
  - DOM API like getElementById() <br/>
    CSS selectors <br/>
  - JSX Templates
  - ?
- tr
  - Document
  - XML
  - XPath? XQuery?
  - XSLT?
  - three:
    - DTD (document type definition, 1986)
    - RelaxNG (2001)
    - XML Schema aka XSD (2001)

<!-- TODO: ul-table should allow caption at the top -->
<caption>Existing</caption>

</table>

<table>

- thead
  - Data Type
  - Notation
  - Query
  - Creation
  - Schema
  - In-Memory
- tr
  - Table
  - TSV8 (is valid TSV)
  - dplyr-like Data Frames <br/>
    Maybe some SQL-pipe subset thing?
  - `table { }`
  - ?
  - By column: dict of "arrays" <br/>
    By row: list of dicts <br/>
- tr
  - Object
  - JSON8 (superset)
  - JSONPath? <br/>
    jq as a reshaping language
  - Hay? `Package { }`
  - JSON Schema?
  - List and Dict
- tr
  - Document
  - HTM8 (subset)
  - CSS selectors
  - Markaby Style `div { }` <br/>
    "sed" style
  - ?
  - DocFrag - a span within a doc<br/>
    DocTree - an Obj representation<br/>
    ?

<caption>Oils</caption>

</table>

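
The two in-memory table layouts mentioned in the "In-Memory" column (by column: dict of "arrays"; by row: list of dicts) carry the same information. Here is a minimal Python sketch of converting between them. The function names `rows_to_columns` and `columns_to_rows` are made up for illustration and are not part of Oils, and the sketch assumes every row has the same keys.

```python
def rows_to_columns(rows):
    """By row (list of dicts) -> by column (dict of lists).

    Assumes every row has the same keys as the first row.
    """
    if not rows:
        return {}
    names = list(rows[0])
    return {name: [row[name] for row in rows] for name in names}


def columns_to_rows(columns):
    """By column (dict of lists) -> by row (list of dicts).

    Assumes all columns have the same length.
    """
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = [
    {"name": "alice", "age": 30},
    {"name": "bob", "age": 40},
]
columns = rows_to_columns(rows)
```

The column layout tends to suit data-frame style operations on a whole column, while the row layout is what falls out of parsing a line-oriented format one record at a time.
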
## Note: SQL Databases Support All Three Models!

- sqlite, MySQL, and Postgres obviously have tables
- They all have JSON and JSONPath support!
  - JSONPath syntax might differ a bit?
- XML support
  - Postgres: XML type, XPath, more
  - MySQL: XML extraction functions only
  - sqlite: none

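
As a rough illustration of tables and JSON living side by side in one SQL database, here is a small sketch using Python's standard `sqlite3` module. It assumes the underlying SQLite build includes the JSON functions (they are built in by default since SQLite 3.38, and compiled into most earlier distributions). The table name `packages` and column name `meta` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A plain table, with one column holding JSON text
conn.execute("CREATE TABLE packages (name TEXT, meta TEXT)")
conn.execute(
    "INSERT INTO packages VALUES (?, ?)",
    ("oils", '{"lang": "ysh", "deps": ["readline", "libc"]}'),
)

# Query the table and reach into the JSON with a path expression
row = conn.execute(
    "SELECT name, "
    "       json_extract(meta, '$.lang'), "
    "       json_extract(meta, '$.deps[1]') "
    "FROM packages"
).fetchone()
conn.close()
```

The `$.lang` and `$.deps[1]` paths are SQLite's own path syntax. MySQL and Postgres have their own path dialects, which is the "syntax might differ" point above.
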
## Design Issues

### Streaming

- jq has a line-based streaming model, by taking advantage of the fact that
  any JSON value can be encoded without literal newlines
  - HTML/XML don't have this property
- Solution: Netstring-based streaming?
  - can do it for both JSON8 and HTM8?

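
To make the netstring idea concrete, here is a minimal Python sketch of length-prefixed framing in the netstring format (`<length>:<bytes>,`). Because each payload is delimited by its byte length, it may contain newlines, so the same framing works for pretty-printed JSON and for HTML. The function names are invented for this sketch and say nothing about how Oils would implement it.

```python
import io


def write_netstring(out, payload: bytes) -> None:
    """Write one netstring: decimal byte length, ':', payload, ','."""
    out.write(b"%d:%s," % (len(payload), payload))


def read_netstrings(inp):
    """Yield each payload from a binary stream of concatenated netstrings."""
    while True:
        digits = b""
        while True:
            ch = inp.read(1)
            if ch == b"":
                if digits:
                    raise ValueError("truncated netstring length")
                return  # clean end of stream
            if ch == b":":
                break
            if not ch.isdigit():
                raise ValueError("expected digit or ':' in netstring length")
            digits += ch
        if not digits:
            raise ValueError("empty netstring length")
        n = int(digits)
        payload = inp.read(n)
        if len(payload) != n or inp.read(1) != b",":
            raise ValueError("truncated or malformed netstring")
        yield payload


# Both documents contain literal newlines, which line-based streaming can't frame
docs = [b'{\n  "a": 1\n}', b"<p>\nhello\n</p>"]
stream = io.BytesIO()
for doc in docs:
    write_netstring(stream, doc)
stream.seek(0)
decoded = list(read_netstrings(stream))
```
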
### Mutual Nesting

- JSON is UTF-8 text, and JSON strings can escape any character, so JSON
  strings can contain JSON
  - ditto for JSON8, and J8 strings
- TSV cells can't contain tabs or newlines
  - so they can't contain TSV
  - if you remove all the literal newlines and tabs from a JSON document
    (always possible, since they must be escaped inside strings), a cell can
    contain JSON
  - TSV8 cells use J8 strings, so they can contain JSON, TSV
- HTM8
  - you can escape everything, so you can put another HTM8 doc inside
  - and you can put JSON/JSON8 or TSV/TSV8
  - although are there whitespace rules?
    - all nodes can be like `<pre>` nodes, preserving whitespace, until you
      apply another function to it

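
The nesting rules above can be checked with ordinary JSON, TSV, and HTML escaping. This is a hedged Python sketch using only the standard library. It does not model JSON8, TSV8, or HTM8, and the helper names, the `payload` key, and the `k1` row key are all hypothetical.

```python
import html
import json


def nest_in_json(inner_text: str) -> str:
    """Put a JSON document inside a JSON string (escaping is automatic)."""
    return json.dumps({"payload": inner_text})


def nest_in_tsv_row(key: str, inner_text: str) -> str:
    """Put text in a plain TSV cell; literal tabs and newlines are not allowed."""
    if "\t" in inner_text or "\n" in inner_text:
        raise ValueError("TSV cells can't contain literal tabs or newlines")
    return key + "\t" + inner_text


def nest_in_html(inner_text: str) -> str:
    """Put text inside an HTML element by escaping markup characters."""
    return "<pre>" + html.escape(inner_text) + "</pre>"


inner = {"msg": "a\tb\nc", "tags": ["<x>", "&"]}

# Compact encoding: tabs and newlines only appear escaped, inside strings
inner_text = json.dumps(inner)

as_json = nest_in_json(inner_text)
as_tsv = nest_in_tsv_row("k1", inner_text)
as_html = nest_in_html(inner_text)
```

Pretty-printed JSON (for example `json.dumps(inner, indent=2)`) contains literal newlines, so `nest_in_tsv_row` rejects it. That is the "remove all the newlines" condition from the list.
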
### HTML5 whitespace rules

- inside text context:
  - multiple whitespace chars collapsed into a single one
  - newlines converted to spaces
  - leading and trailing space is preserved
- `<pre>` and `<textarea>` (and `<code>` when it is inside `<pre>`; a bare
  `<code>` collapses whitespace like other inline text)
  - whitespace is preserved exactly as written
  - I guess HTM8 could use another function for this?
- quoted attributes
  - whitespace is untouched

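
As a rough model of the text-context rules listed above (not the full browser algorithm, which also depends on CSS and line boxes), whitespace collapsing can be sketched as a single regex substitution. The function name `collapse_whitespace` is made up for this example.

```python
import re

# HTML's ASCII whitespace: space, tab, line feed, carriage return, form feed
_HTML_WHITESPACE_RUN = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace (including newlines) with one space.

    Leading and trailing runs are kept as a single space, not stripped.
    """
    return _HTML_WHITESPACE_RUN.sub(" ", text)


collapsed = collapse_whitespace("  one \n\n\ttwo  ")
```

A `<pre>`-like node would simply skip this function, which is one way to read the "another function" idea above.
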
## Related

- [stream-table-process.html](stream-table-process.html)
- [ysh-doc-processing.html](ysh-doc-processing.html)

## Notes

### RelaxNG, XSD, DTD

I didn't know there were these 3 schema types!

- DTD is older: it comes from SGML, which was standardized in 1986
- XML Schema and RelaxNG were both created in 2001
- XML Schema uses XML syntax, which is bad!

### Algorithms?

- I looked at `jq`
- how do you do CSS selectors?
- how do you do JSONPath?

- XML Path
  - holistic twig joins - bounded memory
  - Hollandar Marx XPath Streaming

### Naming

- HTM8 doesn't use J8 strings
  - but TSV8 does

- Technically we could add J8 strings with
  - `j''`
  - and even templated strings with `$""` ?
    - hm
  - well then we would need `$[ j'' ]` and so forth

Is

- `<span x=j'foo'>` identical to `<span x="j'foo'">` in HTML5 ?
  - it seems so
- ditto for `$""`
- then we could disallow those patterns in double quotes?
  - they would have to be escaped, like `&apos;` or something
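
One quick way to probe the `<span x=j'foo'>` question is to feed both spellings to a lenient parser and compare the attribute values. The sketch below uses Python's standard `html.parser`, which is not a conforming HTML5 parser, so treat the result as one data point rather than a proof. The class name `AttrCollector` and the helper `parse_attrs` are invented for this example.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attrs) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def parse_attrs(markup: str):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.seen


unquoted = parse_attrs("<span x=j'foo'>")
quoted = parse_attrs("""<span x="j'foo'">""")
```

With this parser, both spellings yield the same attribute value, single quotes included. The HTML tokenizer specification flags a `'` inside an unquoted attribute value as a parse error but still appends it to the value, which is consistent with "it seems so".
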