---
default_highlighter: oils-sh
---

Guide to Procs and Funcs
========================

YSH has two major units of code: shell-like `proc`, and Python-like `func`.

- Roughly speaking, procs are for commands and I/O, while funcs are for
  pure computation.
- Procs are often big, and may call small funcs. On the other hand,
  it's possible, but rarer, for funcs to call procs.
- You can write shell scripts mostly with procs, and perhaps a few funcs.

This doc compares the two mechanisms, and gives rough guidelines.

<!--
See the blog for more conceptual background: [Oils is
Exterior-First](https://www.oilshell.org/blog/2023/06/ysh-design.html).
-->

<div id="toc">
</div>

## Tip: Start Simple

Before going into detail, here's a quick reminder that you don't have to use
either procs or funcs. YSH is a language that scales both down and up.

You can start with just a list of plain commands:

    mkdir -p /tmp/dest
    cp --verbose *.txt /tmp/dest

Then copy those into procs as the script gets bigger:

    proc build-app {
      ninja --verbose
    }

    proc deploy {
      mkdir -p /tmp/dest
      cp --verbose *.txt /tmp/dest
    }

    build-app
    deploy

Then add funcs if you need pure computation:

    func isTestFile(name) {
      return (name => endsWith('_test.py'))
    }

    if (isTestFile('my_test.py')) {
      echo 'yes'
    }
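
As the script grows, a proc can do the command-style work and call small funcs for the computation. This is a minimal sketch, not part of the original guide: `list-tests` is a hypothetical proc that prints which of its word args look like test files, reusing the `isTestFile()` func from above.

```ysh
# isTestFile is the pure func from above; list-tests is a hypothetical proc.
func isTestFile(name) {
  return (name => endsWith('_test.py'))
}

# The proc handles the "command" side: it takes word args and prints.
proc list-tests (...files) {
  for f in (files) {
    if (isTestFile(f)) {
      echo $f
    }
  }
}

list-tests app.py my_test.py util_test.py
```

The func stays pure, and only the proc writes to stdout.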

## At a Glance

### Procs vs. Funcs

This table summarizes the difference between procs and funcs. The rest of the
doc will elaborate on these issues.

<style>
  thead {
    background-color: #eee;
    font-weight: bold;
  }
  table {
    font-family: sans-serif;
    border-collapse: collapse;
  }

  tr {
    border-bottom: solid 1px;
    border-color: #ddd;
  }

  td {
    padding: 8px;  /* override default of 5px */
  }
</style>

<table>
<thead>
<tr>
<td></td>
<td>Proc</td>
<td>Func</td>
</tr>
</thead>

<tr>
<td>Design Influence</td>
<td>

Shell-like.

</td>
<td>

Python- and JavaScript-like, but pure.

</td>
</tr>

<tr>
<td>Shape</td>

<td>

Procs are shaped like Unix processes: with `argv`, an integer return code, and
`stdin` / `stdout` streams.

They're a generalization of Bourne shell "functions".

</td>
<td>

Funcs are shaped like mathematical functions.

</td>
</tr>

<tr>
<td>

Architectural Role ([Oils is Exterior First](https://www.oilshell.org/blog/2023/06/ysh-design.html))

</td>
<td>

Exterior: processes and files.

</td>

<td>

Interior: functions and garbage-collected data structures.

</td>
</tr>

<tr>
<td>I/O</td>
<td>

Procs may start external processes and pipelines, and can perform I/O anywhere.

</td>
<td>

Funcs need an explicit `io` param to perform I/O.

</td>
</tr>

<tr>
<td>Example Definition</td>
<td>

    proc print-max (; x, y) {
      echo $[x if x > y else y]
    }

</td>
<td>

    func computeMax(x, y) {
      return (x if x > y else y)
    }

</td>
</tr>

<tr>
<td>Example Call</td>
<td>

    print-max (3, 4)

Procs can be put in pipelines:

    print-max (3, 4) | tee out.txt

</td>
<td>

    var m = computeMax(3, 4)

Or throw away the return value, which is useful for functions that mutate:

    call computeMax(3, 4)

</td>
</tr>

<tr>
<td>Naming Convention</td>
<td>

`kebab-case`

</td>
<td>

`camelCase`

</td>
</tr>

<tr>
<td>

[Syntax Mode](command-vs-expression-mode.html) of call site

</td>
<td>Command Mode</td>
<td>Expression Mode</td>
</tr>

<tr>
<td>Kinds of Parameters / Arguments</td>
<td>

1. Word aka string
1. Typed and Positional
1. Typed and Named
1. Block

Examples shown below.

</td>
<td>

1. Positional
1. Named

(both typed)

</td>
</tr>

<tr>
<td>Return Value</td>
<td>Integer status 0-255</td>
<td>

Any type of value, e.g.

    return ([42, {name: 'bob'}])

</td>
</tr>
<tr>
<td>Relation to Objects</td>
<td>none</td>
<td>

May be bound to objects:

    var x = obj.myMethod()
    call obj->myMutatingMethod()

</td>
</tr>

<tr>
<td>Interface Evolution</td>
<td>

Slower: Procs exposed to the outside world may need to evolve in a compatible or "versionless" way.

</td>
<td>

Faster: Funcs may be refactored internally.

</td>
</tr>

<tr>
<td>Parallelism?</td>
<td>

Procs can be parallel with:

- shell constructs: pipelines, `&` aka `fork`
- external tools and the [$0 Dispatch
  Pattern](https://www.oilshell.org/blog/2021/08/xargs.html): xargs, make,
  Ninja, etc.

</td>
<td>

Funcs are inherently serial, unless wrapped in a proc.

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">More <code>proc</code> features ...</td>
</tr>

<tr>
<td>Kinds of Signature</td>
<td>

Open `proc p {` or <br/>
Closed `proc p () {`

</td>
<td>-</td>
</tr>

<tr>
<td>Lazy Args</td>
<td>

    assert [42 === x]

</td>
<td>-</td>
</tr>

</table>

### Func Calls and Defs

Now that we've compared procs and funcs, let's look more closely at funcs.
They're inherently simpler: they have 2 types of args and params, rather
than 4.

YSH argument binding is based on Julia, which has all the power of Python, but
without the "evolved warts" (e.g. `/` and `*`).

In general, with all the bells and whistles, func definitions look like:

    # pos args and named args separated with ;
    func f(p1, p2, ...rest_pos; n1=42, n2='foo', ...rest_named) {
      return (len(rest_pos) + len(rest_named))
    }

Func calls look like:

    # spread operator ... at call site
    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}
    var x = f(1, 2, ...pos_args; n1=43, ...named_args)
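
To see where each argument lands, here is a sketch with a hypothetical `describe()` func that returns its params in a Dict. The bindings noted in the comments follow from the rules above: extra positional args go to the positional rest param, an omitted named arg keeps its default, and named args without a matching param go to the named rest param.

```ysh
# Hypothetical func: returns a Dict showing how each param was bound.
func describe(p1, ...rest_pos; n1=42, ...rest_named) {
  return ({p1: p1, rest_pos: rest_pos, n1: n1, rest_named: rest_named})
}

# p1 binds 1, rest_pos collects [2, 3],
# n1 keeps its default 42, rest_named collects {foo: 'bar'}
var d = describe(1, 2, 3; foo='bar')
json write (d)
```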

Note that positional args/params and named args/params can be thought of as two
"separate worlds".

This table shows simpler, more common cases.

<table>
<thead>
<tr>
<td>Args / Params</td>
<td>Call Site</td>
<td>Definition</td>
</tr>
</thead>

<tr>
<td>Positional Args</td>
<td>

    var x = myMax(3, 4)

</td>
<td>

    func myMax(x, y) {
      return (x if x > y else y)
    }

</td>
</tr>

<tr>
<td>Spread Pos Args</td>
<td>

    var args = [3, 4]
    var x = myMax(...args)

</td>
<td>

(as above)

</td>
</tr>

<tr>
<td>Rest Pos Params</td>
<td>

    var x = myPrintf("%s is %d", 'bob', 30)

</td>
<td>

    func myPrintf(fmt, ...args) {
      # ...
    }

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">...</td>
</tr>


<tr>
<td>Named Args</td>
<td>

    var x = mySum(3, 4, start=5)

</td>
<td>

    func mySum(x, y; start=0) {
      return (x + y + start)
    }

</td>
</tr>

<tr>
<td>Spread Named Args</td>
<td>

    var opts = {start: 5}
    var x = mySum(3, 4; ...opts)

</td>
<td>

(as above)

</td>
</tr>

<tr>
<td>Rest Named Params</td>
<td>

    var x = f(start=5, end=7)

</td>
<td>

    func f(; ...opts) {
      if ('start' not in opts) {
        setvar opts.start = 0
      }
      # ...
    }

</td>
</tr>

</table>

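
The named-arg rows above can be combined into one small script. This sketch reuses `mySum()` from the table and passes `start` three ways. It writes named args after an explicit `;`, which keeps them visually separate from the positional args and is required when spreading a Dict.

```ysh
func mySum(x, y; start=0) {
  return (x + y + start)
}

var opts = {start: 5}

echo $[mySum(3, 4)]           # start keeps its default of 0, so 7
echo $[mySum(3, 4; start=5)]  # named arg after ;, so 12
echo $[mySum(3, 4; ...opts)]  # Dict spread into named args, so 12
```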
### Proc Calls and Defs

Like funcs, procs have 2 kinds of typed args/params: positional and named.

But they may also have string aka word args/params, and a block
arg/param.

In general, a proc signature has 4 sections, like this:

    proc p (
        w1, w2, ...rest_word;     # word params
        p1, p2, ...rest_pos;      # pos params
        n1, n2, ...rest_named;    # named params
        block                     # block param
    ) {
      echo 'body'
    }
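
A runnable variant of this signature may make the sections easier to see. This is a sketch with a hypothetical `show-args` proc; it omits the block param, and gives `n1` a default so that callers may leave it out.

```ysh
# Hypothetical proc: reports how word, positional, and named args were bound.
proc show-args (w1, ...rest_word; p1, ...rest_pos; n1=0, ...rest_named) {
  echo "w1=$w1 rest_word=$[len(rest_word)]"
  echo "p1=$p1 rest_pos=$[len(rest_pos)]"
  echo "n1=$n1 rest_named=$[len(rest_named)]"
}

show-args /bin /tmp /usr (1, 2, 3; n1=43, foo='bar')
```

The words before the parentheses bind to the word params as strings, while `p1` and `n1` receive typed values.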

In general, a proc call looks like this:

    var pos_args = [3, 4]
    var named_args = {foo: 'bar'}

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args) {
      echo 'block'
    }

The block can also be passed as an expression after a second semicolon:

    p /bin /tmp (1, 2, ...pos_args; n1=43, ...named_args; block)

<!--
- Block is really last positional arg: `cd /tmp { echo $PWD }`
-->

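
Passing the block as an expression means it can be built ahead of time and stored in a variable. A minimal sketch, assuming that a `^(...)` command literal produces the `Command` value that a block param expects; `my-cd` has the same definition as the "Block Argument" row of the table below.

```ysh
# Same definition as the "Block Argument" row of the table
proc my-cd (dest; ; ; block) {
  cd $dest (; ; block)
}

# A Command value, built before the call (assumes the ^(...) literal syntax)
var b = ^(echo "in $PWD")

my-cd /tmp (; ; b)             # block passed as an expression
my-cd /tmp { echo "in $PWD" }  # block passed as a literal
```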
Some simpler examples:

<table>
<thead>
<tr>
<td>Args / Params</td>
<td>Call Site</td>
<td>Definition</td>
</tr>
</thead>

<tr>
<td>Word args</td>
<td>

    my-cd /tmp

</td>
<td>

    proc my-cd (dest) {
      cd $dest
    }

</td>
</tr>

<tr>
<td>Rest Word Params</td>
<td>

    my-cd -L /tmp

</td>
<td>

    proc my-cd (...flags) {
      cd @flags
    }

</td>
</tr>

<tr>
<td>Spread Word Args</td>
<td>

    var flags = :| -L /tmp |
    my-cd @flags

</td>
<td>

(as above)

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">...</td>
</tr>

<tr>
<td>Typed Pos Arg</td>
<td>

    print-max (3, 4)

</td>
<td>

    proc print-max ( ; x, y) {
      echo $[x if x > y else y]
    }

</td>
</tr>

<tr>
<td>Typed Named Arg</td>
<td>

    print-max (3, 4, start=5)

</td>
<td>

    proc print-max ( ; x, y; start=0) {
      # ...
    }

</td>
</tr>

<tr>
<td colspan=3 style="text-align: center; padding: 3em">...</td>
</tr>

<tr>
<td>Block Argument</td>
<td>

    my-cd /tmp {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc my-cd (dest; ; ; block) {
      cd $dest (; ; block)
    }

</td>
</tr>

<tr>
<td>All Four Kinds</td>
<td>

    p 'word' (42, verbose=true) {
      echo $PWD
      echo hi
    }

</td>
<td>

    proc p (w; myint; verbose=false; block) {
      = w
      = myint
      = verbose
      = block
    }

</td>
</tr>

</table>

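
The word-arg rows above use `cd`, but the same pattern forwards flags to any external command. A sketch with a hypothetical `my-ls` proc:

```ysh
# Hypothetical proc: the rest word param collects every word arg into a List
proc my-ls (...flags) {
  ls @flags   # splice the List back into separate words
}

var flags = :| -d /tmp |
my-ls @flags  # same as: ls -d /tmp
```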
## Common Features

Let's recap the common features of procs and funcs.

### Spread Args, Rest Params

- Spread arg list `...` at call site
- Rest params `...` at definition

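
The `...` syntax means the same thing in both mechanisms. In this sketch with a hypothetical func and proc, the func spreads a List into typed args, while the proc receives word args spliced with `@`:

```ysh
# Hypothetical func: the rest param collects typed positional args
func countArgs(...args) {
  return (len(args))
}

# Hypothetical proc: the rest param collects word args
proc count-words (...words) {
  echo $[len(words)]
}

var nums = [1, 2, 3]
echo $[countArgs(...nums)]  # spread a List into typed args: 3

var names = :| alice bob |
count-words @names          # splice a List of strings into words: 2
```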
### The `error` builtin raises exceptions

The `error` builtin is idiomatic in both funcs and procs:

    func f(x) {
      if (x <= 0) {
        error 'Should be positive' (status=99)
      }
    }

Tip: reserve such errors for exceptional situations. For example, an invalid
input string may be fairly common, while a "disk full" I/O error is more
exceptional.

(The `error` builtin is implemented with C++ exceptions, which are slow in the
error case.)

### Out Params: `&myvar` is of type `value.Place`

Out params are more common in procs, because they don't have a typed return
value.

    proc p ( ; out) {
      call out->setValue(42)
    }
    var x
    p (&x)
    echo "x set to $x"  # => x set to 42

But they can also be used in funcs:

    func f(out) {
      call out->setValue(42)
    }
    var x
    call f(&x)
    echo "x set to $x"  # => x set to 42
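
Because a proc has no typed return value, out params are also how it can hand back more than one value. A sketch with a hypothetical `div-mod` proc:

```ysh
# Hypothetical proc: "returns" two values through value.Place out params
proc div-mod (; a, b, quotient, remainder) {
  call quotient->setValue(a // b)  # integer division
  call remainder->setValue(a % b)
}

var q
var r
div-mod (17, 5, &q, &r)
echo "17 = 5 * $q + $r"
```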

Observation: procs can do everything funcs can. But you may want the purity
and familiar syntax of a `func`.

---

Design note: out params are a nicer way of doing what bash does with `declare
-n` aka `nameref` variables. They don't rely on [dynamic
scope]($xref:dynamic-scope).

## Proc-Only Features

Procs have some features that funcs don't have.

### Lazy Arg Lists `where [x > 10]`

A lazy arg list is enabled by `shopt --set parse_bracket`, and is syntactic
sugar for an unevaluated `value.Expr`.

Longhand:

    var my_expr = ^[42 === x]  # value of type Expr
    assert (my_expr)

Shorthand:

    assert [42 === x]  # equivalent to the above
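
The point of an unevaluated `value.Expr` is that evaluation is deferred until the receiver decides to run it. In this sketch, the expression is captured while `x` is 0, and only evaluated by `assert` after `x` changes:

```ysh
var x = 0
var my_expr = ^[42 === x]  # captured here, but not evaluated

setvar x = 42
assert (my_expr)           # evaluated now, when x is 42, so it passes
```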

### Open Proc Signatures bind `argv`

TODO: Implement new `ARGV` semantics.

When a proc signature omits `()`, it's called "open" because the caller can
pass "extra" arguments:

    proc my-open {
      write 'args are' @ARGV
    }
    # All valid:
    my-open
    my-open 1
    my-open 1 2

Stricter closed procs:

    proc my-closed (x) {
      write 'arg is' $x
    }
    my-closed      # runtime error: missing argument
    my-closed 1    # valid
    my-closed 1 2  # runtime error: too many arguments

An "open" proc is nearly identical to a shell function:

    shfunc() {
      write 'args are' @ARGV
    }
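
Inside an open proc, the extra words are available as the List `ARGV`, so ordinary list functions like `len()` apply. A sketch with a hypothetical `count-args` proc; given the TODO above, the exact `ARGV` semantics may change.

```ysh
# Hypothetical open proc: no () in the signature, so any number of words is OK
proc count-args {
  echo "got $[len(ARGV)] args"
}

count-args
count-args a b c
```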

## Methods are Funcs Bound to Objects

Values of type `Obj` have an ordered set of name-value bindings, as well as a
prototype chain of more `Obj` instances ("parents"). They support these
operators:

- dot (`.`) looks for attributes or methods with a given name.
  - Reference: [ysh-attr](ref/chap-expr-lang.html#ysh-attr)
  - Attributes may be in the object, or up the chain. They are returned
    literally.
  - Methods live up the chain. They are returned as `BoundFunc`, so that the
    first `self` argument of a method call is the object itself.
- Thin arrow (`->`) looks for mutating methods, which have an `M/` prefix.
  - Reference: [thin-arrow](ref/chap-expr-lang.html#thin-arrow)

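
A small sketch of these lookup rules follows. It assumes the `Object(proto, attrs)` builtin func for creating an `Obj` with a given prototype, and assumes that `setvar` can assign an attribute of an `Obj`. The names `Rect`, `area`, and `setWidth` are hypothetical.

```ysh
# Methods are ordinary funcs whose first param is the object itself
func area(self) {
  return (self.w * self.h)
}

func setWidth(self, w) {
  setvar self.w = w  # assumes Obj attributes can be set with setvar
}

# The prototype holds the methods; the M/ prefix marks a mutating method
var Rect = Object(null, {area: area, 'M/setWidth': setWidth})

# An instance holds the attributes, and has Rect as its prototype
var r = Object(Rect, {w: 3, h: 4})

echo $[r.area()]     # . finds area up the chain, bound to r
call r->setWidth(5)  # -> finds M/setWidth up the chain
echo $[r.area()]
```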
## Usage Notes

### 3 Ways to Return a Value

Let's review the recommended ways to "return" a value:

1. `return (x)` in a `func`.
   - The parentheses are required because expressions like `(x + 1)` should
     look different from words.
1. Pass a `value.Place` instance to a proc or func.
   - That is, out param `&out`.
1. Print to stdout in a `proc`.
   - Capture it with command sub: `$(myproc)`
   - Or with `read`: `myproc | read --all; echo $_reply`

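
Here are the three recommended styles side by side, using hypothetical names. Each computes the same value, but the caller receives it differently: as a typed value, through a `value.Place`, or as a string captured from stdout.

```ysh
# 1. A func returns a typed value
func double(x) {
  return (x * 2)
}

# 2. A proc (or func) sets an out param
proc double-into (; x, out) {
  call out->setValue(x * 2)
}

# 3. A proc prints to stdout
proc print-double (; x) {
  echo $[x * 2]
}

var a = double(21)            # Int
var b
double-into (21, &b)          # Int, via the Place
var c = $(print-double (21))  # Str, via command sub
```

Only the third style loses the type: the caller gets the string `'42'`.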
Obsolete ways of "returning":

1. Using `declare -n` aka `nameref` variables in bash.
1. Relying on [dynamic scope]($xref:dynamic-scope) in POSIX shell.

### Procs Compose in Pipelines / "Bernstein Chaining"

Some YSH users may tend toward funcs because they're more familiar. But shell
composition with procs is very powerful!

Procs have at least two kinds of composition that funcs don't have.

See #[shell-the-good-parts]($blog-tag):

1. [Shell Has a Forth-Like
   Quality](https://www.oilshell.org/blog/2017/01/13.html) - Bernstein
   chaining.
1. [Pipelines Support Vectorized, Point-Free, and Imperative
   Style](https://www.oilshell.org/blog/2017/01/15.html) - the shell can
   transparently run procs as elements of pipelines.

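
Both kinds of composition fit in a few lines. In this sketch, `to-upper` is a hypothetical filter proc that can sit in a pipeline, and `run-twice` is a hypothetical proc that uses Bernstein chaining: it treats the rest of its argv as a command to run.

```ysh
# A filter: reads stdin and writes stdout, so it composes in pipelines
proc to-upper {
  tr a-z A-Z
}

# Bernstein chaining: run the remaining words as a command, twice
proc run-twice (...cmd) {
  @cmd
  @cmd
}

# Procs compose with each other and with external tools
run-twice echo hi | to-upper
```

Neither proc knows about the other; the shell connects them through argv and a pipe.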
<!--

In summary:

* func signatures look like JavaScript, Julia, and Go.
  * named and positional are separated with `;` in the signature.
  * The prefix `...` "spread" operator takes the place of Python's `*args` and `**kwargs`.
  * There are optional type annotations
* procs are like shell functions
  * but they also allow you to name parameters, and throw errors if the arity
    is wrong.
  * and they take blocks.

-->

## Summary

YSH is influenced by both shell and Python, so it has both procs and funcs.

Many programmers will gravitate towards funcs because they're familiar, but
procs are more powerful and shell-like.

Write better YSH programs by learning to use procs!

## Appendix

### Implementation Details

Procs and funcs both have these concerns:

1. Evaluation of default args at definition time.
1. Evaluation of actual args at the call site.
1. Arg-Param binding for builtin functions, e.g. with `typed_args.Reader`.
1. Arg-Param binding for user-defined functions.

So the implementation can be thought of as a 2 × 4 matrix, with some
code shared. This code is mostly in [ysh/func_proc.py]($oils-src).

### Related

- [Variable Declaration, Mutation, and Scope](variables.html) - in particular,
  procs don't have [dynamic scope]($xref:dynamic-scope).
- [Block Literals](block-literals.html) (in progress)

<!--
TODO: any reference topics?
-->

<!--
OK we're getting close here -- #language-design>Unifying Proc and Func Params

I think we need to write a quick guide first, not a reference


It might have some tables

It might mention concrete use cases like the flag parser -- #oil-dev>Progress on argparse


### Diff-based explanation

- why not Python -- because of `/` and `*` special cases
- Julia influence
- lazy args for procs `where` filters and `awk`
- out Ref parameters are for "returning" without printing to stdout

#language-design>N ways to "return" a value


- What does shell have?
  - it has blocks, e.g. with redirects
  - it has functions without params -- only named params


- Ruby influence -- rich DSLs


So I think you can say we're a mix of

- shell
- Python
- Julia (mostly subsumes Python?)
- Ruby


### Implementation-based explanation

- ASDL schemas -- #oil-dev>Good Proc/Func refactoring


### Big Idea: procs are for I/O, funcs are for computation

We may want to go full in on this idea with #language-design>func evaluator without redirects and $?


### Very Basic Advice, Up Front


Done with #language-design>value.Place, & operator, read builtin

Place works with both func and proc


### Bump

I think this might go in the backlog - #blog-ideas


#language-design>Simplify proc param passing?

-->

<!-- vim sw=2 -->