<p>Erlang Answers - A Companion to Erlang Questions. Color commentary for the Erlang Questions mailing list. http://erlanganswers.com/</p> <p>Copyright 2010, Paul Mineiro (paul-erlanganswers@mineiro.com). Thu, 17 Sep 2009 17:34:59 GMT. http://erlanganswers.com/web/mcedemo/SharingDataStructures.html</p> <h2>Sharing Data Structures</h2> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#convertinganexistingterm">Converting an Existing Term</a></li> <li><a href="#sendingaterminamessage">Sending a Term in a Message</a></li> </ul> <h3>Introduction<a name="introduction"></a></h3> <p>On the list recently, the question of <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:46431:200909:ddmklefkopfbljjiddop">ensuring maximal sharing</a> of an Erlang term came up.</p> <blockquote> <pre>I've run into several cases where enforcing the sharing of data<br />resulted in a significant memory savings. I'm talking about a<br />reduction in heap size from 60MB to under half that. By "enforcing the<br />sharing of data" I mean making sure that identical elements in a data<br />structure are actually referencing the same locations in memory.<br /><br />This is easy to do in Erlang, because the compiler is very literal:<br /><br /> fix_tuple({H, H}) -&gt; {H, H};<br /> ...<br /><br />That ensures that identical looking elements in the tuple are sharing<br />memory locations. But there is absolutely no reason the compiler has<br />to do this. It would be perfectly valid to optimize away the entire<br />function, just returning the original value.<br /><br />Would any existing standard library functions make this nicer? What I<br />really want is to have a gb_trees:insert function that returns<br />{NewTree, InsertedValue} where InsertedValue references existing data<br />(unless it wasn't already in the tree; in that case, InsertedValue is<br />exactly what I passed in). 
Then I can happily use InsertedValue,<br />knowing data is being shared.<br /><br />James<br /></pre> </blockquote> <p>There were a lot of responses: it's a very interesting thread.</p> <h3>Converting an Existing Term<a name="convertinganexistingterm"></a></h3> <p>The task here is to take an existing Erlang term and convert it into something which shares components, but is semantically equivalent (i.e., would compare equal via =:=).&nbsp; Bjorn offered a <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:46481:200909:ddmklefkopfbljjiddop">key insight</a>:</p> <blockquote> <pre>In Wings3D (which is an application that depends heavily on sharing),<br />I use a gb tree to keep track of values that I want to share.<br />I enter each term as both the key and the value and then I just do<br />gb_trees:get/2 to retrieve the shared value.<br /></pre> </blockquote> <p>Using this we can write a function to convert a term to a shared form.</p> <pre>share (Term) -&gt;
  element (1, share (Term, dict:new ())).

share (Term, Dict) -&gt;
  case dict:find (Term, Dict) of
    { ok, OrigTerm } -&gt;
      { OrigTerm, Dict };
    error -&gt;
      { NewTerm, NewDict } = share_copy (Term, Dict),
      { NewTerm, dict:store (NewTerm, NewTerm, NewDict) }
  end.

share_copy ([ H | T ], Dict) -&gt;
  { NewH, NewDict } = share (H, Dict),
  { NewT, NewNewDict } = share (T, NewDict),
  { [ NewH | NewT ], NewNewDict };
share_copy (Term, Dict) when is_tuple (Term) -&gt;
  share_copy (Term, Dict, size (Term));
share_copy (Term, Dict) -&gt;
  { Term, Dict }.

share_copy (Tuple, Dict, N) when N &gt; 0 -&gt;
  { NewElement, NewDict } = share (element (N, Tuple), Dict),
  share_copy (setelement (N, Tuple, NewElement), NewDict, N - 1);
share_copy (Tuple, Dict, _) -&gt;
  { Tuple, Dict }.
</pre> <p>Basically we traverse the collection types (list, tuple) and share the components; and the sharing is done via a dict in the manner Bjorn outlines.</p> <p>Note that for all Term, Term = share (Term) should be true, which makes this routine easy to test with <a href="http://en.wikipedia.org/wiki/QuickCheck">QuickCheck</a>.</p> <h3>Sending a Term in a Message<a name="sendingaterminamessage"></a></h3> <p>Even if a Term has a high degree of sharing within itself, this sharing is not currently preserved when the Term is sent in an Erlang message.&nbsp; Although one could send the Term and then postprocess the message with share/1 from above, this could lead to a large intermediate value.&nbsp; Therefore the strategy for sharing a term when sending in a message involves encoding the Term into a structure which has a high degree of sharing and which can be decoded into something equivalent to Term but with a high degree of sharing.</p> <p>Tony Rogvall also <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:46448:200909:ddmklefkopfbljjiddop">posted a solution</a> to this problem.</p> <p>The encode is similar to the share routine, but replaces every Term with an index into a term table.</p> <pre>encode (Term) -&gt;
  { _, _, _, Codec } = encode (Term, dict:new (), dict:new ()),
  list_to_tuple ([ X || { _, X } &lt;- lists:sort (dict:to_list (Codec)) ]).

encode (Term, Dict, Codec) -&gt;
  case dict:find (Term, Dict) of
    { ok, { Ref, OrigTerm } } -&gt;
      { Ref, OrigTerm, Dict, Codec };
    error -&gt;
      { NewTerm, NewDict, NewCodec } = encode_copy (Term, Dict, Codec),
      Ref = 1 + dict:size (NewDict),
      { Ref,
        NewTerm,
        dict:store (Term, { Ref, NewTerm }, NewDict),
        dict:store (Ref, NewTerm, NewCodec) }
  end.
encode_copy ([ H | T ], Dict, Codec) -&gt;
  { HRef, _, NewDict, NewCodec } = encode (H, Dict, Codec),
  { TRef, _, NewNewDict, NewNewCodec } = encode (T, NewDict, NewCodec),
  { [ HRef | TRef ], NewNewDict, NewNewCodec };
encode_copy (Term, Dict, Codec) when is_tuple (Term) -&gt;
  { ListTerm, NewDict, NewCodec } = encode_copy (tuple_to_list (Term), Dict, Codec),
  { { t, ListTerm }, NewDict, NewCodec };
encode_copy (Term, Dict, Codec) when is_integer (Term) -&gt;
  { { i, Term }, Dict, Codec };
encode_copy (Term, Dict, Codec) -&gt;
  { Term, Dict, Codec }.
</pre> <p>The decode function starts with the last element of the term table (which is the term to be decoded), and expands each occurrence of a reference to the term table.</p> <pre>decode (Codec) -&gt;
  decode (element (size (Codec), Codec), Codec).

decode ([ HRef | TRef ], Codec) when is_integer (HRef), is_integer (TRef) -&gt;
  [ decode (HRef, Codec) | decode (TRef, Codec) ];
decode ({ t, Encoded }, Codec) -&gt;
  list_to_tuple (decode (Encoded, Codec));
decode ({ i, Literal }, _Codec) when is_integer (Literal) -&gt;
  Literal;
decode (Ref, Codec) when is_integer (Ref) -&gt;
  decode (element (Ref, Codec), Codec);
decode (Literal, _Codec) -&gt;
  Literal.
</pre> <p>Here is what some terms look like encoded:</p> <pre>11&gt; hashcons:encode (lists:duplicate (5, wazzup)).
{wazzup,[],[1|2],[1|3],[1|4],[1|5],[1|6]}
12&gt; hashcons:encode ([ a, [ a, b ], [ a, b, c ] ]).
{a,b,[],[2|3],[1|4],c,[6|3],[2|7],[1|8],[9|3],[5|10],[1|11]} </pre> <p>Again for all Term, Term = decode (encode (Term)) should be true, which makes this routine easy to test with <a href="http://en.wikipedia.org/wiki/QuickCheck">QuickCheck</a>.</p> <p>http://erlanganswers.com/web/mcedemo/mnesia/OrderedBy.html</p> <h2>Mnesia Ordered By Queries</h2> <p>Table of Contents</p> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#inefficientgeneralsolution">Inefficient General Solution</a></li> <li><a href="#efficientsolutionprimarykey">Efficient Solution: Primary Key</a></li> <li><a href="#efficientgeneralsolutionsecondarysortedindex">Efficient General Solution: Secondary Sorted Index</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Mnesia is not a relational database engine; it's better described as a distributed transactional key-value store. 
Relational databases do not have their feature set by accident, however, so periodically somebody <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44669:200906:jnmdbifiiccfmmaojkco">asks the list about range queries</a>:</p> <blockquote>Hi,<br /><br />What's the simplest, most efficient way of expressing the following<br />sql query in mnesia:<br /><br />"SELECT * FROM foo ORDER BY creation_date LIMIT 500,10"<br /><br />?<br /><br />Btw, I don't care about transactions.<br /><br />Thanks,<br />Yariv<br /></blockquote> <p>Asking for simultaneous simplicity and efficiency is certainly optimistic.</p> <p>As with many complicated Mnesia problems, the solution involves doing manually what relational databases do automatically behind the scenes.&nbsp; Thus, if your application mostly consists of complicated relational access patterns, you should consider another database engine rather than Mnesia.</p> <h3><a name="inefficientgeneralsolution"></a>Inefficient General Solution</h3> <p>An inefficient general solution consists of doing a full table scan, accumulating the records into a tree, and reading out the result at the end of the scan.&nbsp; This is what a relational database engine will do for you behind the scenes assuming you have not indexed the column being ordered.&nbsp; If the number of records in the table is not that large and the query infrequent this can be an acceptable solution.</p> <p>The <span class="codesnippet">mnesia:select/4</span> function can be used to extract records from a table according to a <a href="http://erlang.org/doc/apps/erts/match_spec.html">match specification</a>.&nbsp; Match specifications are quite general; in this case, because all records are eligible (i.e., no WHERE clauses in the SQL) we will just select all records, but we will select 10 at a time to prevent exhausting available memory.</p> <p class="codesnippet">-module (testsort).<br />-export ([ limit/4 ]).<br /><br />limit (Tab, Field, Offset, Number) -&gt;<br />&nbsp; 
limit (get_field_number (Tab, Field),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Offset,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Number,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mnesia:select (Tab,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ { mnesia:table_info (Tab, wild_pattern), [], [ '$_' ] } ],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 10,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gb_sets:empty ()).<br /><br />get_offset ([ Field | _ ], Field, N) -&gt; N;<br />get_offset ([ _ | T ], Field, N) -&gt; get_offset (T, Field, N + 1).<br /><br />get_field_number (Tab, Field) -&gt;<br />&nbsp; get_offset (mnesia:table_info (Tab, attributes), Field, 2).<br /><br />limit (_FieldNumber, Offset, Number, '$end_of_table', Tree) -&gt;<br />&nbsp; Candidates = [ Record || { _, Record } &lt;- gb_sets:to_list (Tree) ],<br />&nbsp; { _, Rest } = safe_split (Offset, Candidates),<br />&nbsp; { Result, _ } = safe_split (Number, Rest),<br />&nbsp; Result;<br />limit (FieldNumber, Offset, Number, { Results, Cont }, Tree) -&gt;<br />&nbsp; NewTree =<br />&nbsp;&nbsp;&nbsp; lists:foldl (fun (Record, AccTree) -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Key = { element (FieldNumber, Record), Record },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; gb_sets:add (Key, AccTree)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end,<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Tree,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Results),<br /><br />&nbsp; PrunedTree = prune_tree (NewTree, Offset + Number),<br /><br />&nbsp; limit (FieldNumber,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Offset,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Number,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mnesia:select (Cont),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; PrunedTree).<br /><br />prune_tree (Tree, Max) -&gt;<br />&nbsp; case gb_sets:size (Tree) &gt; Max of<br />&nbsp;&nbsp;&nbsp; true -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { _, NewTree } = gb_sets:take_largest (Tree),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; prune_tree (NewTree, Max);<br />&nbsp;&nbsp;&nbsp; false -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Tree<br />&nbsp; end.<br /><br />safe_split (N, L) when length (L) &gt;= N -&gt;<br />&nbsp; lists:split (N, L);<br />safe_split (_N, L) -&gt;<br />&nbsp; { L, [] }.</p> <p>How it works:</p> <ul> <li><span class="codesnippet">limit/4</span> is the entry point.&nbsp; The SQL query in the original post would correspond to <span class="codesnippet">testsort:limit (foo, creation_date, 500, 10)</span>.</li> <li><span class="codesnippet">get_field_number/2</span> computes the column of the specified field in the specified table.</li> <li>the initial <span class="codesnippet">mnesia:select/4</span> call says to select all records all columns 10 records at a time.&nbsp; depending upon your record size you can get a favorable space-time tradeoff by increasing the number of records fetched per select.</li> <li>the select results are inserted into a tree.&nbsp; the key for the tree is a tuple whose first element is the field being sorted, so that the keys are in the desired sort order (the comparison function here is Erlang term order).</li> <li>via <span 
class="codesnippet">prune_tree/2</span>, the temporary tree is sized so that it contains at most <span class="codesnippet">(Offset + Number)</span> entries.</li> <li>at the end of the full table scan, the first <span class="codesnippet">Offset</span> results are discarded and then the next <span class="codesnippet">Number</span> results are returned.</li> </ul> <h3><a name="efficientsolutionprimarykey"></a>Efficient Solution: Primary Key</h3> <p>Efficiency can be significantly improved if the column in question is either the primary key or a prefix of the primary key, and the table type is ordered_set.&nbsp; Relational database engines recognize this condition for you automatically during query planning and choose an efficient method for computing the result similar to what we shall do.</p> <p>We will again use <span class="codesnippet">mnesia:select/4</span>, but this time to seek to the first desired result, and then we will read the answer directly.&nbsp; If the record is defined as</p> <p class="codesnippet">-record (foo, { creation_date, something, something_else }).</p> <p>then creation_date is the primary key, so we will select in table order, as in this code listing.</p> <p class="codesnippet">-module (testsort2).<br />-export ([ limit/3 ]).<br /><br />limit (Tab, Offset, Number) -&gt;<br />&nbsp; seek (Offset,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Number,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mnesia:select (Tab,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ { mnesia:table_info (Tab, wild_pattern), [], [ '$_' ] } ],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 10,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read)).<br /><br />seek (_Offset, _Number, 
'$end_of_table') -&gt;<br />&nbsp; [];<br />seek (Offset, Number, X) when Offset =&lt; 0 -&gt;<br />&nbsp; read (Number, X, []);<br />seek (Offset, Number, { Results, Cont }) -&gt;<br />&nbsp; NumResults = length (Results),<br />&nbsp; case Offset &gt; NumResults of<br />&nbsp;&nbsp;&nbsp; true -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; seek (Offset - NumResults, Number, mnesia:select (Cont));<br />&nbsp;&nbsp;&nbsp; false -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { _, DontDrop } = lists:split (Offset, Results),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Keep = lists:sublist (DontDrop, Number),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read (Number - length (Keep), mnesia:select (Cont), [ Keep ])<br />&nbsp; end.<br /><br />read (Number, _, Acc) when Number =&lt; 0 -&gt;<br />&nbsp; lists:foldl (fun erlang:'++'/2, [], Acc);<br />read (_Number, '$end_of_table', Acc) -&gt;<br />&nbsp; lists:foldl (fun erlang:'++'/2, [], Acc);<br />read (Number, { Results, Cont }, Acc) -&gt;<br />&nbsp; NumResults = length (Results),<br />&nbsp; case Number &gt; NumResults of<br />&nbsp;&nbsp;&nbsp; true -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read (Number - NumResults, mnesia:select (Cont), [ Results | Acc ]);<br />&nbsp;&nbsp;&nbsp; false -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { Keep, _ } = lists:split (Number, Results),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; lists:foldl (fun erlang:'++'/2, Keep, Acc)<br />&nbsp; end.</p> <p>Here's how it works:</p> <ul> <li><span class="codesnippet">limit/3</span> is the entry point.&nbsp; It assumes the ordering desired is by the primary key.&nbsp; The SQL in the original post would become <span class="codesnippet">limit (foo, 500, 10)</span>.</li> <li>the initial <span class="codesnippet">mnesia:select/4</span> call says to select all records all columns 10 records at a time.&nbsp; depending upon your record size you can get a favorable space-time tradeoff by increasing the number of records fetched per select.</li> <li>the <span 
class="codesnippet">seek/3</span> consumes the output of the select until the desired offset is reached.</li> <li>the <span class="codesnippet">read/3</span> consumes the output of the select until the desired limit is reached.</li> </ul> <p>We can use this routine even if creation_date is only a prefix of the primary key, i.e., if it were the leading element of a tuple or list that is used as the primary key.&nbsp; That is because Erlang term order compares lists, and tuples of equal size, element by element starting with the first, and so will sort in the desired fashion (tuples of different sizes are ordered by size first, so all keys should have the same arity).</p> <p>This solution is linear in <span class="codesnippet">(Offset + Number)</span>.&nbsp; If the equivalent SQL query is slightly modified to</p> <p class="codesnippet">SELECT * FROM foo WHERE creation_date &gt; X ORDER BY creation_date LIMIT 10</p> <p>then the initial <span class="codesnippet">mnesia:select/4</span> can be made more efficient with a guard that constrains the primary key, e.g.,</p> <p class="codesnippet">mnesia:select (foo, [ { #foo { creation_date = '$1', _ = '_' }, [ { '&gt;', '$1', X } ], [ '$_' ] } ], 10, read).</p> <p>With this <span class="codesnippet">mnesia:select/4</span> initial call, the <span class="codesnippet">seek/3</span> portion of the processing can be skipped, and only the <span class="codesnippet">read/3</span> portion need be executed.&nbsp; <strong>Update</strong>: actually it looks like ets <a href="http://dukesoferl.blogspot.com/2009/07/ets-ordered-set-select-efficiency.html">does not optimize this condition</a>, although tcerl does.&nbsp; However for an ordered_set table <span class="codesnippet">ets:next/2</span> will return the next key in the table even if the key supplied as argument is not in the table, so this could be exploited to walk an ets table starting at a particular lower bound.&nbsp; Also, if the <span class="codesnippet">Offset</span> in the limit query results from continuing a previous query (e.g., paging through results in a search engine), then an 
alternate strategy is to serialize the select continuation, repair it (via <span class="codesnippet">ets:repair_continuation/2</span>), and reuse via <span class="codesnippet">mnesia:select/1</span>, which will seek to the next key in the table similar to <span class="codesnippet">ets:next/2</span>.&nbsp; <span class="codesnippet">mnesia:select/1</span> will outperform repeated calls to <span class="codesnippet">ets:next/2</span> as <span class="codesnippet">Number</span> increases, so generally the former is preferred.</p> <h3><a name="efficientgeneralsolutionsecondarysortedindex"></a>Efficient General Solution: Secondary Sorted Index</h3> <p>If the column being sorted is not (a prefix of) the primary key then efficient operation will require creating a secondary sorted index.&nbsp; This is done for you automatically by a relational database engine when you index the column appropriately.&nbsp; Mnesia's built-in indices are not sorted so you will have to do this by creating another table.&nbsp; For example if the record is</p> <p class="codesnippet">-record (foo, { id, stuff, creation_date }).</p> <p>then assuming that foo is of type set or ordered_set, you could make an ordered_set table with records</p> <p class="codesnippet">-record (foo_cd_index, { creation_date_id_tuple, void = [] }).</p> <p>The primary keys of foo_cd_index will be tuples of the form { creation_date, id }, and we do not care about the void value but mnesia requires at least 2 columns, so we populate with nil (which has a small external format representation).&nbsp; When we insert an element F into foo, we insert a corresponding element</p> <p class="codesnippet">#foo_cd_index{ creation_date_id_tuple = { F#foo.creation_date, F#foo.id } }</p> <p>into the index table.&nbsp; Once the secondary index has been constructed, we can use the primary key techniques from the previous section on the secondary table, extract the identifiers, and then query the primary table to extract the records (if only 
the list of identifiers is to be returned, the join can be skipped, yet another condition that a typical relational database engine recognizes for you under the hood).</p> <p>If foo is of type bag, we cannot make an "ordered_bag" table for the index because mnesia does not support such a type; however we can achieve something similar with records</p> <p class="codesnippet">-record (foo_cd_index, { creation_date_id_stuff, void = [] }).</p> <p>Essentially we have duplicated the record but reordered the fields.&nbsp; When we insert an element F into foo, we insert a corresponding element</p> <p class="codesnippet">#foo_cd_index{ creation_date_id_stuff = { F#foo.creation_date, F#foo.id, F#foo.stuff } }</p> <p>This solution is space intensive because the bag table type does not allow us to refer to an element by a subset of its attributes.&nbsp; However, once the secondary index table has been created we can use the primary key solution on the secondary table to extract the desired records directly, without having to join against the primary table.</p> <p>Paul Mineiro, Mon, 27 Jul 2009 03:56:56 GMT.</p> <p>http://erlanganswers.com/web/mcedemo/Counters.html</p> <h2>Implementing a Counter in Erlang</h2> <h3>Table of Contents</h3> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#slowerbutideologicallypure">Slower but Ideologically Pure</a></li> <li><a href="#faster">Faster</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Erlang, as a functional language, strives for a <a href="http://en.wikipedia.org/wiki/Side_effect_%28computer_science%29">side-effect</a> free sequential syntax.&nbsp; Sometimes side-effects are desirable, the classic example being the <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44518:200906:ahjmnmijameappfabjpe">collection of serving statistics</a>.</p> <blockquote>I want to add counters to my gen_tcp implementation 
to track total<br />number of connections made, total number of requests serviced, total<br />number of bytes transfered, etc.<br />I need to have numbers like this for production monitoring. In python<br />or java I would just have a singleton class that had these counters<br />with thread safe increment/decrement methods. How would I get this<br />same behavior using Erlang?<br /></blockquote> <h3><a name="slowerbutideologicallypure"></a>Slower but Ideologically Pure</h3> <p>Although Erlang strives for a side-effect free <em>sequential</em> syntax, it can generally handle what are normally considered side-effects via the <em>concurrent</em> syntax.&nbsp; In particular, by sending a message to a process which is in a tail-recursive loop, you can change the arguments to the tail-recursive call.&nbsp; This is analogous to changing the state of an object via a method call, but in a manner acceptable to functional programmers.&nbsp; For example, given a process executing the tail-recursive loop</p> <p class="codesnippet">loop (on) -&gt; receive _ -&gt; loop (off) end;<br />loop (off) -&gt; receive _ -&gt; loop (on) end.</p> <p>sending a message to this process would cause the argument to toggle between <span class="codesnippet">on</span> and <span class="codesnippet">off</span>.&nbsp; This technique will be the basis for our counter.</p> <p>Generally, except for pedagogical fun, it is not a good idea to create your own loops to implement persistent processes.&nbsp; Rather, you should use one of the standard OTP implementations, which at their core consist of a loop like the one above, but with many failure cases and operational capabilities built-in.&nbsp; Here, we will use gen_server.</p> <p class="codesnippet">-module (counter).<br />-export ([ new/0,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; increment/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
free/1 ]).<br />-behaviour (gen_server).<br />-export ([ init/1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; handle_call/3,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; handle_cast/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; handle_info/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; terminate/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; code_change/3 ]).<br /><br />new () -&gt;<br />&nbsp; gen_server:start_link (?MODULE, [], []).<br /><br />increment (Counter, Keys) when is_list (Keys) -&gt;<br />&nbsp; gen_server:cast (Counter, { increment, Keys }).<br /><br />read (Counter, Keys) when is_list (Keys) -&gt;<br />&nbsp; gen_server:call (Counter, { read, Keys }).<br /><br />free (Counter) -&gt;<br />&nbsp; gen_server:cast (Counter, stop).<br /><br />init (_) -&gt;<br />&nbsp; { ok, dict:new () }.<br /><br />handle_call ({ read, Keys }, _From, State) -&gt;<br />&nbsp; { reply, [ read_key (K, State) || K &lt;- Keys ], State };<br />handle_call (_Msg, _From, State) -&gt;<br />&nbsp; { noreply, State }.<br /><br />handle_cast ({ increment, Keys }, State) -&gt;<br />&nbsp; { noreply, lists:foldl (fun increment_key/2, State, Keys) };<br />handle_cast (stop, State) -&gt;<br />&nbsp; { stop, normal, State };<br />handle_cast (_Msg, State) -&gt;<br />&nbsp; { noreply, State }.<br /><br />handle_info (_Info, State) -&gt;<br />&nbsp; { noreply, State }.<br /><br />terminate (_Reason, _State) -&gt;<br />&nbsp; ok.<br /><br />code_change (_OldVsn, State, _Extra) -&gt;<br />&nbsp; { ok, State }.<br /><br />read_key (Key, Dict) -&gt;<br />&nbsp; case dict:find (Key, Dict) of<br />&nbsp;&nbsp;&nbsp; { ok, Value } -&gt; Value;<br />&nbsp;&nbsp;&nbsp; error -&gt; 0<br />&nbsp; end.<br /><br />increment_key (Key, Dict) -&gt;<br />&nbsp; dict:store (Key, read_key (Key, Dict) + 1, Dict).</p> <p>And an example usage in the shell</p> <p class="codesnippet">% erl<br />Erlang 
(BEAM) emulator version 5.6.5 [source] [async-threads:0] [kernel-poll:false]<br /><br />Eshell V5.6.5&nbsp; (abort with ^G)<br />1&gt; f (), { ok, Counter } = counter:new (), A = now (), [ counter:increment (Counter, [ test ]) || _ &lt;- lists:seq (1, 100000) ], B = now (), Final = counter:read (Counter, [test]), counter:free (Counter), { timer:now_diff (B, A), Final }.<br />{1166675,[100000]}</p> <p>Here's how it works:</p> <ol> <li>A new counter is constructed via <span class="codesnippet">new/0</span>.&nbsp; This results in a spawned gen_server process which is linked to the caller. <ul> <li>Analysis of <span class="codesnippet">init/1</span> indicates a <a href="http://erlang.org/doc/man/dict.html">dictionary</a> is being used as the process state.</li> </ul> </li> <li>Counters are incremented via the <span class="codesnippet">increment/2</span> call, which takes a counter id and a list of keys whose counters are to be incremented.&nbsp; <span class="codesnippet">increment/2</span> sends a message to the counter process and then returns immediately without waiting for a reply. <ul> <li>This message is processed in <span class="codesnippet">handle_cast/2</span> which calls <span class="codesnippet">increment_key/2</span> for each key; here we can see new keys are initialized with the default value of 0.</li> </ul> </li> <li>Counters are read via the <span class="codesnippet">read/2</span> call, which takes a counter id and a list of keys whose counters are to be read.&nbsp; read/2 sends a message to the counter process and then waits until the counter process replies. 
<br /> <ul> <li>This message is processed in <span class="codesnippet">handle_call/3</span>.</li> </ul> </li> <li>The counter process is reclaimed via <span class="codesnippet">free/1</span>.&nbsp; It will also automatically exit if the creating process exits, due to the use of start_link.</li> </ol> <h3><a name="faster"></a>Faster</h3> <p>The above counter implementation, although simple, is analogous to how more complicated stateful computations are encapsulated in Erlang.&nbsp; Because of practical concerns such as messaging overhead, however, Erlang design strives for "small messages, large computations".&nbsp; The counter implementation above does not achieve this ideal (although, allowing for multiple keys to be incremented with a single call is a step in the right direction).</p> <p>Erlang is a practical language, and there are some elements of the sequential syntax which are not side-effect free.&nbsp; In particular ets tables provide destructive operations of a datastore which can be used to implement an efficient counter.&nbsp; In order to avoid resource leaks, ets tables have to be owned by a process, so we will continue to use gen_server to represent our ets table, but the increment and read calls will not involve messaging the process.&nbsp; The resulting API is the same as with the previous counter.</p> <p class="codesnippet">-module (counter2).<br />-export ([ new/0,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; increment/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; read/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; free/1 ]).<br />-behaviour (gen_server).<br />-export ([ init/1, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; handle_call/3,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; handle_cast/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; handle_info/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
terminate/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; code_change/3 ]).<br /><br />new () -&gt; <br />&nbsp; { ok, Pid } = gen_server:start_link (?MODULE, [], []),<br />&nbsp; Tab = gen_server:call (Pid, get_tab),<br />&nbsp; { ok, { Pid, Tab } }.<br /><br />increment ({ _, Tab }, Keys) when is_list (Keys) -&gt; <br />&nbsp; lists:foldl (fun increment_key/2, Tab, Keys),<br />&nbsp; ok.<br /><br />read ({ _, Tab }, Keys) when is_list (Keys) -&gt; <br />&nbsp; [ read_key (K, Tab) || K &lt;- Keys ].<br /><br />free ({ Pid, _ }) -&gt;<br />&nbsp; gen_server:cast (Pid, stop).<br /><br />init (_) -&gt; <br />&nbsp; Tab = ets:new (?MODULE, [ public, set ]),<br />&nbsp; % if you are using R13 or above, use this line instead<br />&nbsp; % Tab = ets:new (?MODULE, [ public, set, { write_concurrency, true } ]),<br />&nbsp; { ok, Tab }.<br /><br />handle_call (get_tab, _From, State) -&gt;<br />&nbsp; { reply, State, State };<br />handle_call (_Msg, _From, State) -&gt;<br />&nbsp; { noreply, State }.<br /><br />handle_cast (stop, State) -&gt; <br />&nbsp; { stop, normal, State };<br />handle_cast (_Msg, State) -&gt;<br />&nbsp; { noreply, State }.<br /><br />handle_info (_Info, State) -&gt;<br />&nbsp; { noreply, State }.<br /><br />terminate (_Reason, _State) -&gt;<br />&nbsp; ok.<br /><br />code_change (_OldVsn, State, _Extra) -&gt;<br />&nbsp; { ok, State }.<br /><br />read_key (Key, Tab) -&gt;<br />&nbsp; case ets:lookup (Tab, Key) of<br />&nbsp;&nbsp;&nbsp; [ { _, Value } ] -&gt; Value;<br />&nbsp;&nbsp;&nbsp; [] -&gt; 0<br />&nbsp; end.<br /><br />increment_key (Key, Tab) -&gt;<br />&nbsp; try ets:update_counter (Tab, Key, 1)<br />&nbsp; catch<br />&nbsp;&nbsp;&nbsp; _ : _ -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case ets:insert_new (Tab, { Key, 1 }) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; true -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ok;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; false -&gt;<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ets:update_counter (Tab, Key, 1)<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end<br />&nbsp; end,<br />&nbsp; Tab.</p> <p>And an example use in the shell:</p> <p class="codesnippet">% erl<br />Erlang (BEAM) emulator version 5.6.5 [source] [async-threads:0] [kernel-poll:false]<br /><br />Eshell V5.6.5&nbsp; (abort with ^G)<br />1&gt; f (), { ok, Counter } = counter2:new (), A = now (), [ counter2:increment (Counter, [ test ]) || _ &lt;- lists:seq (1, 100000) ], B = now (), Final = counter2:read (Counter, [test]), counter2:free (Counter), { timer:now_diff (B, A), Final }.<br />{906200,[100000]}</p> <p>That may not seem much faster (20% wall clock), but essentially no messages are generated in the ets-based version.&nbsp; I worked on a large system with a sophisticated statistics-gathering subsystem which heavily leveraged Erlang messaging, analogous to the first counter implementation above.&nbsp; After significant work optimizing the main application operations, profiling indicated the statistics process (consuming the message queue) had the highest CPU utilization.&nbsp; Presumably, using ets for node-local collection and then messaging to assemble statistics across the cluster would have further improved overall system performance.</p> <p>Here's how it works:</p> <ol> <li>A new counter is constructed via <span class="codesnippet">new/0</span>.&nbsp; This actually creates a public ets table which will contain the counters, plus a gen_server process which will own the ets table. 
<ul> <li>Except for a single message to get the table identifier from the process, no messages will be sent to the owning process during the lifetime of the counter (until <span class="codesnippet">free/1</span> is called).</li> </ul> </li> <li>Counters are incremented via the <span class="codesnippet">increment/2</span> call, which takes a counter id and a list of keys whose counters are to be incremented.&nbsp; <span class="codesnippet">increment/2</span> loops over the supplied keys and increments each one. <ul> <li>The <span class="codesnippet">increment_key/2</span> function is now significantly more complicated than in the previous implementation.&nbsp; Because the ets table will be accessed concurrently, we have to be careful.&nbsp; The complexities are:<ol> <li>You cannot update a counter record that does not exist.</li> <li>Two simultaneous inserts of the initial counter record, if not guarded against, will cause one of the increments to be lost.</li> <li>Most of the time we will be updating a counter record that does exist, so we should optimistically try that first for efficiency.</li> </ol></li> <li>Is it good to be forced to think about all these complications regarding concurrent access?&nbsp; <strong>No, so take no pride in it</strong>.&nbsp; Notice the first implementation didn't have any of this complexity to it; we handled one message at a time, and we used an immutable data structure to hold state.&nbsp; This is a very simple example (a concurrently incremented counter), and it is still easy to get it wrong.&nbsp; Erlang was designed to help you avoid the need to reason about concurrent access to mutable data structures.&nbsp; If you find yourself doing that a lot, you are not thinking in the Erlang way.</li> </ul> </li> <li>Counters are read via the <span class="codesnippet">read/2</span> call, which takes a counter id and a list of keys whose counters are to be read.&nbsp; This reads from the ets table directly, without messaging the 
owning process.</li> <li>The counter process is reclaimed via <span class="codesnippet">free/1</span>.&nbsp; It will also automatically exit if the creating process exits, due to the use of start_link.&nbsp; Note that the Erlang VM will delete the ets table when the owning (counter) process exits.</li> </ol>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/Counters.htmlSat, 18 Jul 2009 00:36:41 GMTindex.htmlhttp://erlanganswers.com/web/mcedemo/index.html<h2>Welcome</h2> <p>This is a site devoted to providing color commentary for <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi/4">erlang questions</a>.&nbsp; You can peruse the <a href="/web/mcerecent/">recently updated content</a> indicated in the left bar, or you can check out the <a href="/web/mceindex/">index of all content.</a></p> <p>I am also working on a <a href="/web/mailsearch">searchable socialized Erlang Questions archive</a>.&nbsp; I'm slowly populating it with messages so you may or may not find it useful right now.</p> <p>Naturally the site is written entirely in Erlang.&nbsp; If you are curious how it is put together, I provide <a href="/web/mcedemo/about/infrastructure.html">some information</a>.</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/index.htmlFri, 17 Jul 2009 20:19:18 GMTmnesia/DistributedHelloWorld.htmlhttp://erlanganswers.com/web/mcedemo/mnesia/DistributedHelloWorld.html<h2>Mnesia Distributed Hello World</h2> <h3>Table of Contents</h3> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#helloworld">Hello World</a></li> <li><a href="#problemsolved">Problem Solved</a></li> <li><a href="#dynamicmaintenance">Dynamic Maintenance</a> <ul> <li><a href="#schema">Schema</a></li> <li><a href="#othertables">Other Tables</a></li> </ul> </li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>One of the simultaneously coolest and most challenging features of Mnesia is distributed operation.&nbsp; Before even 
tackling the complexities of managing failure cases, some developers have trouble starting it up, leading to <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44799:200906:caijbnjiclmkdogppafl">solicitations for help</a> like the following:</p> <blockquote>Hi<br />I've recently started learning erlang/mnesia and need help setting up mnesia<br />on two nodes. Here's the series of steps (on same host):<br />$cd ~/test/foo<br />$erl -sname foo -setcookie chip_cookie<br /><br /> 1. mnesia:create_schema([node()]).<br /> 2. mnesia:start().<br /> 3. rd(person, {name,age}).<br /> 4.<br /> mnesia:create_table(person,[{attributes,record_info(fields,person)},{disc_copies,[node()]}]).<br /> 5. mnesia:dirty_write(#person{name="george", age=30}).<br /><br /># in another window, on same host:<br />$cd ~/test/bar<br />$erl -sname bar -setcookie chip_cookie<br />1&gt; mnesia:start().<br />#back on foo node:<br />net_adm:ping('foo@*hostname*').<br />*pong*<br />mnesia:add_table_copy(schema, 'bar@*hostname*', disc_copies).<br />*{aborted,{badarg,schema,disc_copies}} &lt;--- WHY DOES THIS FAIL?*<br />mnesia:add_table_copy(schema, 'bar@*hostname*', ram_copies).<br />*{atomic,ok}*<br />mnesia:add_table_copy(person, 'bar@*hostname*', disc_copies).<br />*{atomic,ok}*<br />*# back on node bar*<br />mnesia:info().<br />*It doesn't show the 'person' table that was copied from foo. Why is that?*<br />How can I add more nodes to an existing mnesia database which was created<br />with only one node ie. 
mnesia:create_schema([node()]).<br />Thanks<br />-- <br />Yellowfish Technologies Inc<br /><a href="http://www.yellowfish.biz/">http://www.yellowfish.biz</a><br />praveen.ray@yellowfish.biz<br /></blockquote> <p>It happens a lot in Erlang that the error messages only make sense if you already know a lot about Erlang, as is the case above.</p> <h3><a name="helloworld"></a>Hello World</h3> <p>First, let's do a distributed hello world, and then we'll be in a better position to understand what went wrong for the poster.&nbsp; You can follow along by typing these commands into two OS shells, which I identify as shell1 and shell2.</p> <p class="codesnippet">shell1% erl -name one -setcookie yum<br /> shell2% erl -name two -setcookie yum</p> <p>At this point you have two Erlang shells started.&nbsp; For this next bit, everything we will type will be inside the Erlang shells, which I identify as <span class="codesnippet">one@ted-teds-computer.local</span> and <span class="codesnippet">two@ted-teds-computer.local</span>.&nbsp; In what follows you should substitute <span class="codesnippet">ted-teds-computer.local</span> with your hostname as identified by the Erlang shell.</p> <p class="codesnippet">(one@ted-teds-computer.local)1&gt; mnesia:start ().<br /> ok<br /> (two@ted-teds-computer.local)1&gt; mnesia:start ().<br /> ok<br /> (one@ted-teds-computer.local)2&gt; mnesia:change_config (extra_db_nodes, [ 'two@ted-teds-computer.local' ]).<br /> {ok,['two@ted-teds-computer.local']}<br /> (one@ted-teds-computer.local)3&gt; mnesia:system_info (running_db_nodes).<br /> ['two@ted-teds-computer.local', 'one@ted-teds-computer.local']</p> <p>At this point you have a distributed Mnesia system running on the two nodes.&nbsp; Let's dissect the steps.</p> <ol> <li>Mnesia is started on both nodes via <span class="codesnippet">mnesia:start/0</span>. 
<ul> <li>Since this is a fresh start of Mnesia on two brand new nodes, each node creates a ram-resident schema.&nbsp; To make these nodes have persistent databases the schema must be converted to disc-resident.&nbsp; Once converted to disc-resident, this will be remembered across node restarts.</li> </ul> </li> <li>On (either) one of the nodes Mnesia is told there is an additional node, via <span class="codesnippet">mnesia:change_config/2</span>.&nbsp; Behind the scenes, it tries to contact the Mnesia on the other node and merge the schemas.&nbsp; Assuming the schemas are compatible, a unified schema containing all the nodes is created and installed on all the nodes. <ul> <li>Newly created ram-resident schemas are always considered consistent with any other schema, in order to facilitate reintroduction of stateless nodes.&nbsp; Disc-resident schemas that were part of a distributed schema but were out of contact for a while, e.g. due to node failure, will also be made consistent if possible.&nbsp; However, two persistent schemas created in isolation cannot be merged in this fashion; it's really intended for building up a distributed schema starting with a single node.</li> </ul> </li> </ol> <p>So far so good.&nbsp; To persist the situation so that it survives restart we need to convert the schemas to disc-resident.</p> <p class="codesnippet">(one@ted-teds-computer.local)4&gt; mnesia:change_table_copy_type(schema, node(), disc_copies).<br /> {atomic,ok}<br /> (two@ted-teds-computer.local)2&gt; mnesia:change_table_copy_type(schema, node(), disc_copies).<br /> {atomic,ok}</p> <p>At this point (but not before) you should see directories on your disk where you started the Erlang nodes.</p> <p class="codesnippet">shell1% ls -l Mnesia.*<br />Mnesia.one@ted-teds-computer.local:<br />total 40<br />-rw-r--r--&nbsp;&nbsp; 1 pmineiro&nbsp; admin&nbsp;&nbsp; 170 Jun 26 14:43 DECISION_TAB.LOG<br />-rw-r--r--&nbsp;&nbsp; 1 pmineiro&nbsp; admin&nbsp;&nbsp; 195 Jun 26 14:43 
LATEST.LOG<br />-rw-r--r--&nbsp;&nbsp; 1 pmineiro&nbsp; admin&nbsp; 8824 Jun 26 14:43 schema.DAT<br /><br />Mnesia.two@ted-teds-computer.local:<br />total 40<br />-rw-r--r--&nbsp;&nbsp; 1 pmineiro&nbsp; admin&nbsp;&nbsp; 170 Jun 26 14:43 DECISION_TAB.LOG<br />-rw-r--r--&nbsp;&nbsp; 1 pmineiro&nbsp; admin&nbsp;&nbsp; 195 Jun 26 14:43 LATEST.LOG<br />-rw-r--r--&nbsp;&nbsp; 1 pmineiro&nbsp; admin&nbsp; 8824 Jun 26 14:43 schema.DAT</p> <p>Only at this point, with a persistent on-disc database, can you successfully create disc-based table copies.</p> <p class="codesnippet">(one@ted-teds-computer.local)5&gt; mnesia:create_table (foo, [ { disc_copies, mnesia:system_info (running_db_nodes) } ]).<br /> {atomic,ok}<br /> (one@ted-teds-computer.local)6&gt; mnesia:dirty_write ({ foo, hello, world }).<br /> ok<br /> (two@ted-teds-computer.local)3&gt; mnesia:dirty_read (foo, hello).<br /> [{foo,hello,world}]</p> <p>The data was written on the first node and read on the second.&nbsp; Cool!</p> <h3><a name="problemsolved"></a>Problem Solved</h3> <p>Now you have enough intuition that I can explain what was wrong with the original poster's sequence.</p> <ol> <li>He creates a disc-resident schema on the node foo via <span class="codesnippet">mnesia:create_schema/1</span>.</li> <li>He starts mnesia on the node foo via <span class="codesnippet">mnesia:start/0</span>.&nbsp; Node foo is now running a disc-resident schema.</li> <li>He creates a disc-resident table called person on node foo via <span class="codesnippet">mnesia:create_table/2</span>, and writes a record via <span class="codesnippet">mnesia:dirty_write/1</span>.</li> <li>He creates a ram-resident schema on the node bar via <span class="codesnippet">mnesia:start/0</span>. 
<ul> <li>So far so good, but now the trouble starts.</li> </ul> </li> <li>He tries to add a disc-resident schema on the node bar via <span class="codesnippet">mnesia:add_table_copy/3</span>.&nbsp; This fails because he already has a table copy of schema on the node bar which is ram-resident.&nbsp; He would need to call <span class="codesnippet">mnesia:change_table_copy_type/3</span> instead to change the schema to disc-resident.</li> <li>He tries to add a disc-resident copy of the person table on node bar via <span class="codesnippet">mnesia:add_table_copy/3</span>, which would work, except that adding a disc-resident table copy to a node with a ram-resident schema is not allowed; first the schema must be made disc-resident.</li> </ol> <p>For pedagogical fun, see if you can modify the sequence in the original message to cause the person table to be replicated disc-based on both nodes.</p> <h3><a name="dynamicmaintenance"></a>Dynamic Maintenance</h3> <h4><a name="schema"></a>Schema</h4> <p>In general if one wants to create or update a distributed Mnesia configuration, the sequence of operations is:</p> <ol> <li><span class="codesnippet">mnesia:start/0</span> <ul> <li>If this is the first time starting mnesia for this node, a fresh ram-based schema will be created, which can be merged with any existing schema.&nbsp; If this is not the first time starting mnesia for this node, any previously created disc-based schema will be loaded, otherwise a fresh ram-based schema will be created.</li> </ul> </li> <li><span class="codesnippet">mnesia:change_config (extra_db_nodes, NodeList)</span> <ul> <li>In theory this can be done at any time in order to add additional nodes to the mnesia distributed schema, assuming the additional nodes have compatible schemas.&nbsp; In practice, the best way to ensure a compatible schema is to join the distributed configuration as a fresh ram-based schema which is guaranteed to merge with any existing schema.</li> </ul> </li> <li><span 
class="codesnippet">mnesia:change_table_copy_type (schema, node (), disc_copies) </span> <ul> <li>This will change the local mnesia schema type from ram-based to disc-based.</li> </ul> </li> </ol> <p><a href="http://code.google.com/p/schemafinder">Schemafinder</a> is a project on Google Code which automates these steps of maintaining a distributed mnesia configuration, working in conjunction with a <a href="http://code.google.com/p/nodefinder">nodefinder</a> strategy to discover other Erlang nodes.&nbsp; Schemafinder tries to avoid extra work by not calling <span class="codesnippet">mnesia:change_config/2</span> if the node list has not changed, and not calling <span class="codesnippet">mnesia:change_table_copy_type/3</span> if the schema is already disc-based.&nbsp; Although these functions are idempotent, they acquire a schema transaction in order to run, which can introduce latency when a cluster is under extreme load.&nbsp; You can inspect the <a href="http://code.google.com/p/schemafinder/source/browse/trunk/schemafinder/src/schemafindersrv.erl#44">relevant bit of code</a> for inspiration.</p> <p>In the event you want to remove a node from the schema, you can use <span class="codesnippet">mnesia:del_table_copy/2</span> on the schema table.</p> <h4><a name="othertables"></a>Other Tables</h4> <p>Other tables can be initialized with a node list using the <span class="codesnippet">ram_copies</span>, <span class="codesnippet">disc_copies</span>, and/or <span class="codesnippet">disc_only_copies</span> options to <span class="codesnippet">mnesia:create_table/2</span>.&nbsp; In addition, an existing table can have additional copies placed via <span class="codesnippet">mnesia:add_table_copy/3</span>, and removed via <span class="codesnippet">mnesia:del_table_copy/2</span>.&nbsp; Note that a node can have only one copy of a table, so if you want to change the type of a table at a node, use <span class="codesnippet">mnesia:change_table_copy_type/3</span>.&nbsp; 
Continuing with the above example:</p> <p class="codesnippet">(one@ted-teds-computer.local)7&gt; mnesia:create_table (bar, [ { ram_copies, [ node () ] } ]).<br />{atomic,ok}<br />(two@ted-teds-computer.local)4&gt; mnesia:table_info (bar, active_replicas).<br />['one@ted-teds-computer.local']<br />(two@ted-teds-computer.local)5&gt; mnesia:add_table_copy (bar, node (), disc_copies).<br />{atomic,ok}<br />(two@ted-teds-computer.local)6&gt; mnesia:table_info (bar, active_replicas).<br />['two@ted-teds-computer.local',<br />&nbsp;'one@ted-teds-computer.local']<br />(two@ted-teds-computer.local)7&gt; mnesia:add_table_copy (bar, node (), ram_copies).<br />{aborted,{already_exists,bar,'two@ted-teds-computer.local'}}<br />(two@ted-teds-computer.local)8&gt; mnesia:change_table_copy_type (bar, node (), ram_copies).<br />{atomic,ok}</p> <p>The sequence is as follows:</p> <ol> <li>Initially table bar is created with a single ram copy on node one.</li> <li>A disc copy is added to node two. <ul> <li>Note there is no requirement that the copy type be the same across the nodes.</li> </ul> </li> <li>An attempt to add another copy of the table to node two fails.</li> <li>The copy on node two is changed from disc copies to ram copies.</li> </ol>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/mnesia/DistributedHelloWorld.htmlTue, 07 Jul 2009 17:15:08 GMTmnesia/SchemaFree.htmlhttp://erlanganswers.com/web/mcedemo/mnesia/SchemaFree.html<h2>Mnesia Schema Free Operation</h2> <h3>Introduction</h3> <p>Most programmers are aware of the distinction between <a href="http://en.wikipedia.org/wiki/Statically_typed#Static_typing">statically typed</a> and <a href="http://en.wikipedia.org/wiki/Dynamically_typed#Dynamic_typing">dynamically typed</a> programming languages, with respective advantages and disadvantages.&nbsp; Analogously, databases can be schema driven or schema-free.&nbsp; All relational database engines that I am familiar with are schema driven, while 
recent champions of the schema-free approach include <a href="http://couchdb.apache.org/">CouchDB</a> and <a href="http://aws.amazon.com/simpledb/">SimpleDB</a>.&nbsp; A schema-free database can simplify issues around the evolution of the database-driven application, in exchange for more work from the application developer (and more opportunities to introduce defects).</p> <p>Since the Erlang ethos embraces continuous application upgrades and operational simplicity, it is natural to ask whether Mnesia can be used in an essentially schema-free way.&nbsp; The answer is yes.&nbsp; Furthermore, relational database engines utilize the schema for validation, consistency, indexing, and query planning, but Mnesia's built-in capabilities in this regard are so primitive that the cost of forgoing them is minimal (typically, the application programmer does these things manually).&nbsp; This, plus the ability to store an arbitrary Erlang term in an Mnesia table, leads to a natural way to use Mnesia in a disciplined but essentially schema-free manner.</p> <h3>The Strategy</h3> <p>An approach I've taken successfully on large projects in the past is as follows:</p> <ul> <li>Mnesia tables require a schema, but the table is created with a trivial initial specification and never changed over the lifetime of the application.</li> <li>Records in the table consist of a key (Mnesia requirement) and an additional record which is coupled with an API for manipulation.</li> <li>As the application evolves, the implementation term changes for newly created database records, and the application interprets old implementation terms to the extent possible.</li> </ul> <p>An example is best, so let's do one.&nbsp; Our database driven application will utilize a table called note to keep web-sticky notes.</p> <p class="codesnippet">-record (note, { id, impl }).</p> <p class="codesnippet">mnesia:create_table (note, [ { attributes, record_info (fields, note) } ]).</p> <p>So far so good.&nbsp; Initially our 
note will have an id and an owner.</p> <p class="codesnippet">-module (notes).<br />-export ([ make_note/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; get_id/1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; get_owner/1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set_id/2,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set_owner/2 ]).<br /><br />-include ("notes.hrl").<br /><br />-record (notes_impl, { owner }).<br /><br />make_note (Id, Owner) -&gt;<br />&nbsp; #note { id = Id,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; impl = #notes_impl { owner = Owner }<br />&nbsp; }.<br /><br />get_id (#note { id = Id }) -&gt;<br />&nbsp; Id.<br /><br />get_owner (#note { impl = #notes_impl { owner = Owner } }) -&gt;<br />&nbsp; Owner.<br /><br />set_id (Note = #note{}, Id) -&gt;<br />&nbsp; Note#note { id = Id }.<br /><br />set_owner (Note = #note { impl = #notes_impl {} }, Owner) -&gt;<br />&nbsp; Note#note { impl = #notes_impl { owner = Owner } }.</p> <p>The declaration of <span class="codesnippet">#notes_impl{}</span> is localized to the <span class="codesnippet">notes</span> module, and by convention any access or mutation of a note is done exclusively via the functions in the <span class="codesnippet">notes</span> module.</p> <p>Well our application happily exists for a while persisting notes to the data store.&nbsp; Finally we get a feature request that our notes contain some content in addition to an owner.&nbsp; Now we are going to make the change without changing the schema, by changing the notes module.</p> <p class="codesnippet">-module (notes).<br />-export ([ make_note/3,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; get_id/1, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; get_owner/1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; get_contents/1,<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set_id/2, &nbsp;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set_owner/2, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; set_contents/2 ]).<br /><br />-include ("notes.hrl").<br /><br />-record (notes_impl, { owner }). % deprecated<br />-record (notes_implv2, { owner, contents="" }).<br /><br />make_note (Id, Owner, Contents) -&gt;<br />&nbsp; #note { id = Id,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; impl = #notes_implv2 { owner = Owner, contents = Contents } <br />&nbsp; }.<br />&nbsp; &nbsp;<br />get_id (#note { id = Id }) -&gt;<br />&nbsp; Id.<br /><br />get_owner (#note { impl = #notes_implv2 { owner = Owner } }) -&gt;<br />&nbsp; Owner;<br />get_owner (Note) -&gt;<br />&nbsp; get_owner (convert_old (Note)).<br />&nbsp;<br />get_contents (#note { impl = #notes_implv2 { contents = Contents } }) -&gt;<br />&nbsp; Contents;<br />get_contents (Note) -&gt;<br />&nbsp; get_contents (convert_old (Note)).<br />&nbsp;<br />set_id (Note = #note{}, Id) -&gt;<br />&nbsp; Note#note { id = Id }.<br /><br />set_owner (Note = #note { impl = Impl = #notes_implv2 {} }, Owner) -&gt;<br />&nbsp; Note#note { impl = Impl#notes_implv2 { owner = Owner } };<br />set_owner (Note, Owner) -&gt;<br />&nbsp; set_owner (convert_old (Note), Owner).<br /><br />set_contents (Note = #note { impl = Impl = #notes_implv2 {} }, Contents) -&gt;<br />&nbsp; Note#note { impl = Impl#notes_implv2 { contents = Contents } };<br />set_contents (Note, Contents) -&gt;<br />&nbsp; set_contents (convert_old (Note), Contents).<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;<br />%%% Private&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;<br />convert_old (Note = #note { impl = 
#notes_impl { owner = Owner } }) -&gt;<br />&nbsp; Note#note { impl = #notes_implv2 { owner = Owner } }.</p> <p>The changes can be summarized thusly:</p> <ol> <li>A new implementation record (<span class="codesnippet">#notes_implv2{}</span>) containing the new field (<span class="codesnippet">contents</span>) is introduced.</li> <li>The constructor is modified to return the latest implementation type.</li> <li>The accessors and mutators are modified to pattern match on the latest implementation type; and to otherwise attempt to convert an old implementation to the latest implementation and try again. <ul> <li>I like to centralize all the "backwards compatibility" logic into a single function like <span class="codesnippet">convert_old/1</span> for maintainability.</li> </ul> </li> </ol> <p>In practice the database will consist of a mixture of the latest and older implementation records, with individual records being migrated when mutated and persisted.&nbsp; In this case there is a perfectly reasonable interpretation of old records so this is not a problem.&nbsp; Sometimes this is not possible.&nbsp; In such cases I would suggest doing an "unreasonable" interpretation of old records, which is designed to operate only transiently, while the database is incrementally transformed to the latest implementation record (e.g., using an expensive one-time operation which joins against or precomputes from another data source).</p> <h3>Pros and Cons</h3> <p>There's no free lunch.&nbsp; Let's enumerate some of the disadvantages:</p> <ul> <li>We've exchanged record syntax (e.g., <span class="codesnippet">Note#note.contents</span>) for function calls (e.g., <span class="codesnippet">notes:get_contents (Note)</span>).&nbsp; It's (probably irrelevantly) slower and more code maintenance, although fancy use of parse-transforms might be able to mitigate both of these problems. 
<ul> <li>The backwards compatibility mapping is an interesting source of defects and good testing strategies are required to cover all the cases.</li> </ul> </li> <li>We can't use Mnesia's built-in indexing capabilities on the "hidden columns".&nbsp; In my experience this is totally moot, since I always end up making my own indices (e.g., functional indices, ordered indices, composite column indices, etc).</li> <li>QLC usage is also frustrated.&nbsp; Again in my experience this is totally moot since I tend to manually query plan my joins in Mnesia.</li> <li>Nice-to-haves like <span class="codesnippet">mnesia:dirty_update_counter/3</span> are unavailable.&nbsp; In my opinion this is a real pain point, but a <a href="#finalnote">hybrid approach</a> can admit this.</li> </ul> <p>So why do it?&nbsp; There is one big advantage:</p> <ul> <li>The application and the datastore are now loosely coupled, rather than strongly coupled.&nbsp; Therefore, synchronization requirements between the two are greatly mitigated.</li> </ul> <p>What this means in practice is, you can:</p> <ul> <li> Launch a software change to an entire cluster a few boxes at a time, without worry.</li> <li>Continue to operate your datastore with no delays or performance degradations during launch, and (if necessary) migrate the data asynchronously using a background process (e.g., consuming only spare capacity).</li> </ul> <p>In my experience the advantages outweigh the disadvantages.&nbsp; YMMV.</p> <h3><a name="finalnote"></a>Final Note</h3> <p>It is possible to take a hybrid approach, with records that have some columns represented explicitly and part of the schema, and other columns represented implicitly via an implementation member as indicated here.&nbsp; This can be a way to instrument an application for flexibility to future changes.</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/mnesia/SchemaFree.htmlFri, 03 Jul 2009 00:25:25 
GMTmnesia/SchemaMaintenance.htmlhttp://erlanganswers.com/web/mcedemo/mnesia/SchemaMaintenance.html<h2>Mnesia Schema Maintenance</h2> <h3>Table of Contents</h3> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#basicsmakingsuretablesarethere">Basics: Making Sure Tables are There</a></li> <li><a href="#advancedschemamigrations">Advanced: Schema Migrations</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Any database driven application faces the problem of initialization and maintenance of the data store in conjunction with the rest of the application, and Mnesia driven applications are no exception.&nbsp; Consequently the <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44846:200906:iecdocabpddjinbfojpm">following type of inquiry</a> is periodically directed to the list:</p> <blockquote>Hi all,<br /><br />I have a question about how folks bootstrap an OTP application that<br />depends on mnesia. Consider an OTP application foo that uses mnesia.<br />Since mnesia is specified in the needed applications list, mnesia must<br />be started before you start foo.<br /><br />For the first run of foo, one needs to call mnesia:create_schema/1<br />_before_ mnesia is started.<br /><br />I'd like to be able to start foo on a clean system and have the schema<br />created if needed, but don't see a way to handle this. I suspect that<br />I need to adjust my expectations :-)<br /><br />Is there a recommended way to handle mnesia initialization? 
Is the<br />standard practice to create the schema outside of the application<br />start up flow or am I missing a way to handle this as part of foo's<br />application initialization?<br /><br />Thanks,<br /><br />+ seth<br /><br /></blockquote> <p>Digging into the thread there are actually two questions here:</p> <ol> <li>How to initialize and maintain the (distributed) schema.</li> <li>How to initialize and maintain application specific tables.</li> </ol> <p>The first question was answered in another article <a href="/web/mcedemo/mnesia/DistributedHelloWorld.html#dynamicmaintenance">Mnesia Distributed Hello World</a>, so I'll focus on the second question.</p> <h3><a name="basicsmakingsuretablesarethere"></a>Basics: Making Sure Tables are There</h3> <p>My favorite solution is to have a gen_server in the application supervision tree which owns the schema, and I use this strategy for erlanganswers.com.&nbsp; The init callback for the gen_server owning the schema for this site is</p> <p class="codesnippet">init ([]) -&gt;<br />&nbsp; ensure_schema (),<br /><br />&nbsp; { ok, #statev4 { backup = none } }.</p> <p>All the heavy lifting is in the <span class="codesnippet">ensure_schema/0</span> function.&nbsp; <em>Note: if you adapt this code for your own purposes, you will find that unless you are using <a href="http://code.google.com/p/mnesiaex">mnesiaex</a> the calls mentioning external_copies will fail, since vanilla mnesia lacks the concept.</em></p> <p class="codesnippet">ensure_schema () -&gt;<br />&nbsp; case fast_change_table_copy_type (schema, node (), disc_copies) of<br />&nbsp;&nbsp;&nbsp; { atomic, ok } -&gt; ok;<br />&nbsp;&nbsp;&nbsp; { aborted, { already_exists, schema, _, disc_copies } } -&gt; ok<br />&nbsp; end,<br /><br />&nbsp; ensure_table (mcedoc,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ { attributes, record_info (fields, mcedoc) },<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { type, { external, ordered_set, tcbdbtab } },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { external_copies, [ node () ] },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { user_properties, [<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { deflate, true },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { async_write, true },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { bucket_array_size, 101 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { leaf_node_cache, 1 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { nonleaf_node_cache, 1 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { leaf_members, 256 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { non_leaf_members, 512 }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ] }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ]),<br /><br />&nbsp; maybe_transform_mcedoc (),<br /><br />&nbsp; ensure_table (mcedoctime,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ { attributes, record_info (fields, mcedoctime) },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { type, { 
external, ordered_set, tcbdbtab } },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { external_copies, [ node () ] },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { user_properties, [<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { deflate, true },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { async_write, true },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { bucket_array_size, 101 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { leaf_node_cache, 1 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { nonleaf_node_cache, 1 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { leaf_members, 256 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { non_leaf_members, 512 }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ] }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ]),<br /><br />&nbsp; ensure_table (mcecomment,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ { attributes, record_info (fields, mcecomment) },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { type, { external, ordered_set, tcbdbtab } },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { 
external_copies, [ node () ] },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { user_properties, [<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { deflate, true },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { async_write, true },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { bucket_array_size, 101 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { leaf_node_cache, 1 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { nonleaf_node_cache, 1 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { leaf_members, 256 },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { non_leaf_members, 512 }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ] }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ]),<br /><br />&nbsp; ok.<br /><br />ensure_table (TableName, TabDef) -&gt;<br />&nbsp; Result = case fast_create_table (TableName, TabDef) of<br />&nbsp;&nbsp;&nbsp; { atomic, ok } -&gt; ok;<br />&nbsp;&nbsp;&nbsp; { aborted, { already_exists, _ } } -&gt; ok;<br />&nbsp;&nbsp;&nbsp; { aborted, { combine_error, _, _ } } -&gt; ok&nbsp; % wtf: why not already_exists?<br />&nbsp; end,<br />&nbsp; ok = mnesia:wait_for_tables ([ TableName ], infinity),<br />&nbsp; ok = mnesia:wait_for_tables ([ frag_table_name (TableName, N)<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; || N &lt;- lists:seq (1, num_frags (TableName)),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; N &gt; 1 ],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; infinity),<br />&nbsp; Result.<br /><br />fast_change_table_copy_type (TableName, Node, CopyType) -&gt;<br />&nbsp; try { lists:member (Node, used_nodes (TableName)),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; lists:member (Node, used_nodes (TableName, CopyType)) } of<br />&nbsp;&nbsp;&nbsp; { true, true } -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { aborted, { already_exists, TableName, Node, CopyType } };<br />&nbsp;&nbsp;&nbsp; { true, false } -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mnesia:change_table_copy_type (TableName, Node, CopyType);<br />&nbsp;&nbsp;&nbsp; { false, _ } -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { aborted, { no_exists, TableName, Node } }<br />&nbsp; catch<br />&nbsp;&nbsp;&nbsp; _ : _ -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { aborted, { no_exists, TableName } }<br />&nbsp; end.<br /><br />fast_create_table (TableName, TabDef) -&gt;<br />&nbsp; try mnesia:table_info (TableName, type),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { aborted, { already_exists, TableName } }<br />&nbsp; catch<br />&nbsp;&nbsp;&nbsp; _ : _ -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mnesia:create_table (TableName, TabDef)<br />&nbsp; end.<br /><br />frag_table_name (TableName, 1) -&gt; TableName;<br />frag_table_name (TableName, FragNum) when FragNum &gt; 1 -&gt;<br />&nbsp; list_to_atom (atom_to_list (TableName) ++<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "_frag" ++<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; integer_to_list (FragNum)).<br /><br />maybe_transform_mcedoc () -&gt;<br />&nbsp; case mnesia:table_info (mcedoc, attributes) =:=<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; record_info (fields, mcedoc) of<br />&nbsp;&nbsp;&nbsp; true -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ok;<br />&nbsp;&nbsp;&nbsp; false -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; transform_mcedoc ()<br />&nbsp; end.<br /><br />num_frags (TableName) -&gt;<br />&nbsp; FragProps = mnesia:table_info (TableName, frag_properties),<br />&nbsp; case lists:keysearch (n_fragments, 1, FragProps) of<br />&nbsp;&nbsp;&nbsp; { value, { n_fragments, NFrags } } -&gt; NFrags;<br />&nbsp;&nbsp;&nbsp; _ -&gt; 1<br />&nbsp; end.<br /><br />transform_mcedoc () -&gt;<br />&nbsp; { atomic, ok } =<br />&nbsp;&nbsp;&nbsp; mnesia:transform_table (mcedoc,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fun ({ mcedoc, PathTime, Contents }) -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #mcedoc { path_time = PathTime,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; contents = Contents,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; draft = false };<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ({ mcedoc, PathTime, Contents, Draft }) -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #mcedoc { path_time = PathTime,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; contents = Contents,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; draft = Draft }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; record_info (fields, mcedoc),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mcedoc).<br /><br />used_nodes (TableName) -&gt;<br />&nbsp; lists:usort (used_nodes (TableName, ram_copies) ++<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; used_nodes (TableName, disc_copies) ++<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; used_nodes (TableName, external_copies) ++<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; used_nodes (TableName, 
disc_only_copies)).<br /><br />used_nodes (TableName, CopyType) -&gt;<br />&nbsp; mnesia:table_info (TableName, CopyType).</p> <p>Let's go through it step-by-step.</p> <ol> <li>First I ensure that the local schema is disc-based by essentially calling <span class="codesnippet">mnesia:change_table_copy_type/3</span>.&nbsp; <span class="codesnippet">fast_change_table_copy_type/3</span> is a local function that I wrote for <a href="http://code.google.com/p/schemafinder">schemafinder</a> to prevent unnecessary calls to <span class="codesnippet">mnesia:change_table_copy_type/3</span>.&nbsp; Here it is overkill, but in a large distributed configuration under load the latency associated with acquiring a schema transaction can be sizeable, so using "dirty schema reads" (<span class="codesnippet">mnesia:table_info/2</span>, <span class="codesnippet">mnesia:system_info/1</span>) can save time.</li> <li>Next, via <span class="codesnippet">ensure_table/2</span>, I ensure that the mcedoc table exists, and otherwise create it with the indicated parameters.&nbsp; (These are <a href="http://code.google.com/p/tcerl">tcerl</a> tables which leverage <a href="http://code.google.com/p/mnesiaex">mnesiaex</a>; you should modify them to be ets or dets tables if you are using standard mnesia.)&nbsp; The <span class="codesnippet">ensure_table/2</span> function creates the table if it does not already exist, and then calls <span class="codesnippet">mnesia:wait_for_tables/2</span> to guarantee that the table (and any associated fragments) is ready to be used.&nbsp; Note that <span class="codesnippet">ensure_table/2</span> is idempotent.</li> <li>Ignore the <span class="codesnippet">maybe_transform_mcedoc/0</span> call for now; we'll cover it in the next section.</li> <li>The mcedoctime and mcecomment tables are similarly treated via <span class="codesnippet">ensure_table/2</span>.</li> </ol> <p>Since this is called in the init callback of a gen_server which is in the application hierarchy of the 
application providing the API to the document store for the site, we are assured that the application is not started until the database is ready.&nbsp; In addition, we can list this gen_server as the first child in the childspec for the application, to ensure that no other processes associated with the document store attempt to access the schema prematurely.</p> <p>If you do time-consuming initialization of the database during the <span class="codesnippet">init/1</span> callback (e.g., loading in a large starting dataset), you should increase the timeout passed to <span class="codesnippet">gen_server:start_link/3,4</span> appropriately (or, alternatively, do not specify a timeout).</p> <h3><a name="advancedschemamigrations"></a>Advanced: Schema Migrations</h3> <p>Rarely is the initial schema associated with the first version of the application sufficient as the application evolves (unless an essentially <a href="/web/mcedemo/mnesia/SchemaFree.html">schema-free approach</a> is taken).&nbsp; One of the nice features of Mnesia is the ability to migrate the schema while the database is live, via <span class="codesnippet">mnesia:transform_table/4</span>.&nbsp; Since we have delegated ownership of our schema to a gen_server, we can migrate the schema as part of the code upgrade process.</p> <p>In particular the relevant section of code for erlanganswers.com is</p> <p class="codesnippet">-record (state, {}).<br />-record (statev2, { backup }).<br />-record (statev3, { backup }).<br />-record (statev4, { backup }).<br /><br />code_change (_OldVsn, #state{}, _Extra) -&gt;<br />&nbsp; register (?MODULE, self ()),<br />&nbsp; ensure_schema (),<br />&nbsp; { ok, #statev4 { backup = none } };<br />code_change (_OldVsn, #statev2{ backup = Backup }, _Extra) -&gt;<br />&nbsp; ensure_schema (),<br />&nbsp; { ok, #statev4 { backup = Backup } };<br />code_change (_OldVsn, #statev3{ backup = Backup }, _Extra) -&gt;<br />&nbsp; ensure_schema (),<br />&nbsp; { ok, #statev4 { backup = 
Backup } };<br />code_change (_OldVsn, State, _Extra) -&gt;<br />&nbsp; { ok, State }.</p> <p>This is the code change handler for the gen_server.&nbsp; There have been 3 hot upgrades of this server since it was launched, and by convention I represent these as different records.</p> <ol> <li><span class="codesnippet">#state{}</span>: This was the original version of the server.</li> <li><span class="codesnippet">#statev2{}</span>: For this upgrade, the mcedoc table was modified.&nbsp; In addition, I forgot to register the server (I used gen_server:start_link/3 instead of gen_server:start_link/4), which was frustrating the implementation of the backup system.&nbsp; Therefore I register the server in the code change handler.&nbsp; Newly started instances of the server will be registered, so this is not required for subsequent code changes.&nbsp; </li> <li><span class="codesnippet">#statev3{}</span>: Another change to the mcedoc table was introduced.</li> <li><span class="codesnippet">#statev4{}</span>: The mcecomment table was introduced as part of the launch of the comments system.</li> </ol> <p>In all cases I write <span class="codesnippet">ensure_schema/0</span> to take any version of the database and bring it all the way up to date.&nbsp; In most cases this is the easiest way to reason about the system, but sometimes migration must take a certain path through database versions (this is more likely on a commercial project where only certain database state transitions have been tested and quality assurance is paramount).&nbsp;</p> <p>We have already met much of the <span class="codesnippet">ensure_schema/0</span> function, with the exception of the <span class="codesnippet">maybe_transform_mcedoc/0</span> function.&nbsp; Let's analyze that now.</p> <p class="codesnippet">maybe_transform_mcedoc () -&gt;<br />&nbsp; case mnesia:table_info (mcedoc, attributes) =:=<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; record_info (fields, mcedoc) of<br />&nbsp;&nbsp;&nbsp; true -&gt;<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ok;<br />&nbsp;&nbsp;&nbsp; false -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; transform_mcedoc ()<br />&nbsp; end.<br /><br />transform_mcedoc () -&gt;<br />&nbsp; { atomic, ok } =<br />&nbsp;&nbsp;&nbsp; mnesia:transform_table (mcedoc,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fun ({ mcedoc, PathTime, Contents }) -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #mcedoc { path_time = PathTime,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; contents = Contents,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; draft = false };<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ({ mcedoc, PathTime, Contents, Draft }) -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; #mcedoc { path_time = PathTime,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; contents = Contents,<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; draft = Draft }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; record_info (fields, mcedoc),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mcedoc).</p> <p>The decision to transform is driven by whether the record fields associated with the table match the current definition of the record in the software.&nbsp; If there is a mismatch, <span class="codesnippet">mnesia:transform_table/4</span> is invoked.&nbsp; Importantly, we cannot use the record syntax to access an old version of the record, as this will lead to a run-time error.&nbsp; Instead we use the correspondence between records and tuples and access the old versions of the record as tuples.&nbsp; In this case, there was originally a definition of the mcedoc record, <span class="codesnippet">#mcedoc{path_time, contents=""}</span>, which was then augmented with a draft column, <span class="codesnippet">#mcedoc{path_time, contents="", draft=true}</span>, when I introduced the ability to have work-in-progress articles.&nbsp; This was further augmented with a comments_ok column, <span class="codesnippet">#mcedoc{path_time, contents="", draft=true, comments_ok=true}</span>, when I added the comments system.&nbsp; The transform function can map any version of the record that has ever been used into the latest version.</p> <p>Since the default timeout on a code change handler 
is 5 seconds, you should use one of the <a href="http://linux.die.net/man/4/appup">high-level release instructions</a> in your appup file to increase the timeout when doing nontrivial amounts of work in a code change handler.</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/mnesia/SchemaMaintenance.htmlThu, 02 Jul 2009 23:20:14 GMTabout/infrastructure.htmlhttp://erlanganswers.com/web/mcedemo/about/infrastructure.html<h2>About the Site</h2> <p>The site consists of a document store built on <a href="http://erlang.org/doc/apps/mnesia/index.html">Mnesia</a>, a serving layer provided by <a href="http://erlang.org/doc/apps/inets/index.html">inets httpd</a>, a presentation layer which leverages <a href="http://nitrogenproject.com/">Nitrogen</a>, and an editing interface which leverages <a href="http://tinymce.moxiecode.com/">TinyMCE</a>.&nbsp; (<a href="http://couchdb.apache.org/">CouchDB</a> looked like a good choice for the document store, but I am super familiar with Mnesia, so that was very easy for me to get going.)</p> <p>The application-style portions of the site are implemented via Nitrogen. &nbsp;The comment system utilizes <a href="http://www.nitrogenproject.com/web/samples/postback">postbacks</a> which update the Mnesia store. 
&nbsp;The CMS capabilities of the site are only available if you authenticate as me.&nbsp; Essentially I made a Nitrogen element out of the TinyMCE editor and I use this to modify the content in a WYSIWYG style.&nbsp; Here's a picture of me editing this page:</p> <p><img src="/editpage.png" alt="Me editing this page" width="598" height="387" /></p> <p>TinyMCE is very cool.</p> <p>The <a href="/web/mcerss">RSS feed</a> and <a href="/web/mcerecent/">recently updated</a> features are implemented with a separate ordered_set Mnesia table with a timestamp as a key.&nbsp; I use an on-disk ordered_set via <a href="http://code.google.com/p/tcerl">tcerl</a>.&nbsp; When I make changes, a flag is set which causes Mnesia to back up to <a href="http://aws.amazon.com/s3/">S3</a> at the next interval.&nbsp; I use <a href="http://code.google.com/p/s3fs/wiki/FuseOverAmazon">s3fs</a> to make that easy.</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/about/infrastructure.htmlFri, 26 Jun 2009 21:05:05 GMTPrivilegedPort.htmlhttp://erlanganswers.com/web/mcedemo/PrivilegedPort.html<h2>Binding to a Privileged Port</h2> <p>Table of Contents</p> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#donotdothissetuidbeamasroot">Do not do this: setuid beam as root</a></li> <li><a href="#portforwarding">Port forwarding</a></li> <li><a href="#capabilities">Capabilities</a></li> <li><a href="#reverseproxy">Reverse proxy</a></li> <li><a href="#disableprivilegedports">Disable privileged ports</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Erlang is a natural fit for web applications and there are several webservers available for Erlang.&nbsp; Since ports 80 and 443 are privileged ports, <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44735:200906:ifnpgolflnhmndieaceh">this question</a> arises periodically.</p> <blockquote>Hello!<br /><br />Often we have to run some kinds of our erlang programs with root<br />privileges, 
e.g. to allow them to bind a system port (&lt; 1024). It is<br />quite dangerous, because once your web-server has been hacked by a bad<br />guy, the whole system can be affected. I was very impressed to know<br />that some servers drop root priveleges right after port binding.<br /><br />Is there a way which will help me to change the privileges of a<br />running program to the minimum after a system port has been bound?<br /><br />Thanks.<br /><br />-- <br />Sergey Samokhin<br /></blockquote> <p>Here are some ideas.</p> <h3><a name="donotdothissetuidbeamasroot"></a>Do not do this: setuid beam as root</h3> <p>Setting beam setuid root will technically work, but this is a very bad idea.&nbsp; Erlang is a full-service environment with functions like <a href="http://erlang.org/doc/man/os.html">os:cmd/1</a> that will rip a huge security hole in your system.</p> <h3><a name="portforwarding"></a>Port forwarding</h3> <p>This technique involves using your OS's port forwarding capabilities to forward ports 80 and 443 to non-privileged ports, and then binding those ports in a beam that runs as an ordinary user.&nbsp; With Linux iptables, forwarding port 80 to port 8080 is accomplished via</p> <p><code>iptables -t nat -A PREROUTING -p tcp -m tcp --dport 80 -j REDIRECT --to-ports 8080</code></p> <p>and then the beam can bind port 8080 as an ordinary user.</p> <h3><a name="capabilities"></a>Capabilities</h3> <p>Some OS's can selectively grant only the ability to bind privileged ports.&nbsp; Under Linux this is done by setting the capabilities on the beam executable (or beam.smp if the SMP emulator is used), e.g.,</p> <p><code>setcap 'cap_net_bind_service=+ep' /usr/lib/erlang/erts-5.6.5/bin/beam<br />setcap 'cap_net_bind_service=+ep' /usr/lib/erlang/erts-5.6.5/bin/beam.smp</code></p> <h3><a name="reverseproxy"></a>Reverse proxy</h3> <p>For HTTP, there are programs specifically designed to drop privileges after binding the privileged port, and then act as a reverse proxy.&nbsp; <a 
href="http://nginx.net/">Nginx</a> is a popular example, and has many other interesting capabilities, including load balancing across a set of backends with failure detection, query rewriting capabilities, keepalive support, and SSL support.</p> <h3><a name="disableprivilegedports"></a>Disable privileged ports</h3> <p>The concept of privileged ports generally creates more problems than it solves, since becoming root is a very small barrier these days (unlike the early days of the internet), yet the possibility of remote compromise is increased.&nbsp; Some OS's provide the ability to disable the concept of privileged ports entirely.&nbsp; On FreeBSD this can be done via</p> <p><code>sysctl net.inet.ip.portrange.reservedhigh=0</code></p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/PrivilegedPort.htmlWed, 24 Jun 2009 23:07:16 GMTfacebook/VerifySignature.htmlhttp://erlanganswers.com/web/mcedemo/facebook/VerifySignature.html<h2>Verifying a Facebook Connect Signature</h2> <p>This site uses Facebook Connect to authenticate.&nbsp; Basically, Facebook javascript sets cookies in my domain (erlanganswers.com) which the server can read, and one of these cookies contains a user id.&nbsp; In order to prevent malicious persons from faking authentication by setting the cookie values themselves, the cookies are signed via a secret key associated with the application.&nbsp; This signature can be verified by the server in order to reject cookies not produced by Facebook.</p> <p>The algorithm for <a href="http://wiki.developers.facebook.com/index.php/Verifying_The_Signature#Signatures_and_Facebook_Connect_Sites">verifying the signature</a> is described extensively on the Facebook developer wiki.&nbsp; Essentially, all cookies prefixed by your application key are concatenated in sorted order, the secret key is appended, and the signature is the md5sum of the result.&nbsp; The signature is contained in another cookie for verification.&nbsp;</p> <p 
class="codesnippet">-define (FB_APP_KEY, "YOURAPPKEY").<br />-define (FB_SECRET_KEY, "YOURSECRETKEY").<br /><br />check_fb_auth () -&gt;<br />&nbsp; case wf_platform:get_cookie (?FB_APP_KEY) of<br />&nbsp;&nbsp;&nbsp; undefined -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; false;<br />&nbsp;&nbsp;&nbsp; Sig -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FbMac = <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; erlang:md5 (<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; lists:sort (<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ [ Key, "=", Value ]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; || Cookie &lt;- <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; string:tokens (wf_platform:get_header (cookie), "; "),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { ?FB_APP_KEY "_" ++ Key, Value } &lt;- <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ list_to_tuple (string:tokens (Cookie, "=")) ] <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ?FB_SECRET_KEY<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FbHexMac = [ C || &lt;&lt;N:4&gt;&gt; &lt;= FbMac, <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; H &lt;- io_lib:format ("~1.16.0b", [ N ]),<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C &lt;- H ],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; if FbHexMac =:= Sig -&gt; <br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { ok, facebook, wf_platform:get_cookie (?FB_APP_KEY "_user") };<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; true -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; false<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end<br />&nbsp; end.</p> <p>How it works:</p> <ul> <li>I use the <a href="http://nitrogenproject.com/">Nitrogen web framework</a> for Erlang, which provides an abstraction for extracting cookies and headers (wf_platform).</li> <li>The signature, if present, is contained in a cookie whose key is your application key: <span class="codesnippet">wf_platform:get_cookie (?FB_APP_KEY)</span>.</li> <li>All cookies prefixed by the application key and an underscore are concatenated in sorted order, the secret key is appended, and the md5sum is computed. 
<ul> <li>Erlang, like C, concatenates adjacent string literals at compile time, so <span class="codesnippet">{ ?FB_APP_KEY "_" ++ Key, Value } &lt;-</span> selects only the Facebook Connect cookies.</li> <li>erlang:md5/1, like many Erlang functions, takes an iolist argument.&nbsp; We do not waste time flattening the list after constructing it.</li> </ul> </li> <li>The md5sum is converted to hex and compared with the signature.</li> </ul>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/facebook/VerifySignature.htmlSun, 21 Jun 2009 18:33:45 GMTmnesia/EqualsIgnoresCase.htmlhttp://erlanganswers.com/web/mcedemo/mnesia/EqualsIgnoresCase.html<h2>Mnesia Equals Ignores Case Query</h2> <h3>Table of Contents</h3> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#maintainthedatacanonicalized">Maintain the Data Canonicalized</a></li> <li><a href="#matchspecificationsolution">Match Specification Solution</a></li> <li><a href="#secondaryindexsolution">Secondary Index Solution</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Many more people know SQL than know Mnesia, so questions about how to translate a particular SQL construct into Mnesia arise frequently on the list, and in particular <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44451:200906:dfjdpljnepgpjmofhilb">this question arose</a>:</p> <blockquote>Hi,<br /><br />I can't see any advice previously offered for the best way to query a<br />(large) mnesia table with a query comparable to "equals ignore case".<br /><br />My first thought is to storing an indexed lower or upper "case"<br />version of the data in an additional field, but with nearly 20 million<br />records that doesn't seem very attractive.<br /><br />TIA for any thoughts,<br /><br />Regards,<br />Steve<br /></blockquote> <p>There are several approaches, each of which mirrors a technique used with relational databases.</p> <h3><a 
name="maintainthedatacanonicalized"></a>Maintain the Data Canonicalized</h3> <p>One very simple solution is to store the column in canonical form (e.g., always lower case), so that no transformation needs to be applied when evaluating constraints.&nbsp; Where this is practical, it is a very efficient solution, <a href="http://oracle.ittoolbox.com/groups/technical-functional/oracle-dev-l/ignore-case-in-sql-2538037">not just for Mnesia</a>.</p> <h3><a name="matchspecificationsolution"></a>Match Specification Solution</h3> <p>This solution is relatively inefficient but general.&nbsp; This technique is analogous to an unoptimized SQL statement of the form</p> <p>SELECT * from foo WHERE upper(key) = 'VALUE'</p> <p>The pivotal step is to create a <a href="http://erlang.org/doc/apps/erts/match_spec.html">match specification</a> corresponding to the WHERE clause.&nbsp; Match specifications can only contain guard expressions, not arbitrary function calls, so the case folding itself has to be expressed as guards. Ulf Wiger posted a <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44453:200906:dfjdpljnepgpjmofhilb">reply</a> containing a pointer to software he wrote to convert (some) regular expressions into match specifications.&nbsp; This can be used to emulate SQL LIKE conditions using techniques similar to those listed below.</p> <p>I will hand-roll an implementation of (ASCII) case-folding into match specifications here.&nbsp; As an additional optimization, I will bind the first character of the constrained field in the pattern and issue multiple (up to 2) selects, which will be faster if the field compared is the primary key of an ordered_set table, but slower otherwise.&nbsp; Thus you might want to adapt the example given below to a single select.&nbsp; (It would of course be nice to detect this condition automatically and adapt. 
Query planning is one of the many things that come with most relational database engines).&nbsp; Because we issue multiple selects, the return value will be a list of list of results, rather than a list of results.</p> <p class="codesnippet">-module (testequalsignorecase).<br />-export ([ equals_ignores_case/3 ]).<br /><br />equals_ignores_case (Tab, Field, Value) -&gt;<br />&nbsp; Guard = case_fold_guard (Value),<br />&nbsp; FieldOffset = get_field_number (Tab, Field),<br />&nbsp; WildPattern = mnesia:table_info (Tab, wild_pattern),<br /><br />&nbsp;&nbsp;[mnesia:select (Tab,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ { setelement (FieldOffset, WildPattern, Vars),<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ Guard ],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ '$_' ] } ])<br />&nbsp;&nbsp;&nbsp; || Vars &lt;- construct_vars (Value) ].<br /><br />case_fold_char (H) when H &gt;= $A, H =&lt; $Z -&gt; [ H, H + ($a - $A) ];<br />case_fold_char (H) when H &gt;= $a, H =&lt; $z -&gt; [ H + ($A - $a), H ];<br />case_fold_char (H) -&gt; [ H ].<br /><br />case_fold_guard ([]) -&gt; [];<br />case_fold_guard ([ _ | T ]) -&gt; case_fold_guard (T, 1, true).<br /><br />case_fold_guard ([], _N, Acc) -&gt; Acc;<br />case_fold_guard ([ H | T ], N, Acc) -&gt;<br />&nbsp; case case_fold_char (H) of<br />&nbsp;&nbsp;&nbsp; [ Singleton ] -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case_fold_guard (T,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; N + 1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { 'andalso',<br 
/>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { '=:=', nth_variable (N), Singleton },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Acc });<br />&nbsp;&nbsp;&nbsp; [ Upper, Lower ] -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; case_fold_guard (T,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; N + 1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { 'andalso',<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { 'or',<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { '=:=', nth_variable (N), Upper },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; { '=:=', nth_variable (N), Lower }<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; },<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Acc })<br />&nbsp; end.<br /><br />construct_vars ([]) -&gt;<br />&nbsp; [];<br />construct_vars ([ H ]) -&gt;<br />&nbsp; [ [ First ] || First &lt;- case_fold_char (H) ];<br />construct_vars ([ H | T ]) -&gt;<br />&nbsp; Vars = [ nth_variable (N) || N &lt;- lists:seq (1, length (T)) ],<br />&nbsp; [ [ First | Vars ] || First &lt;- case_fold_char (H) ].<br /><br />nth_variable (N) -&gt;<br 
/>&nbsp; list_to_atom (lists:flatten (io_lib:format ("$~b", [ N ]))).<br /><br />get_offset ([ Field | _ ], Field, N) -&gt; N;<br />get_offset ([ _ | T ], Field, N) -&gt; get_offset (T, Field, N + 1).<br /><br />get_field_number (Tab, Field) -&gt;<br />&nbsp; get_offset (mnesia:table_info (Tab, attributes), Field, 2).</p> <p>Here's how it works:</p> <ul> <li><span class="codesnippet">equals_ignores_case/3</span> is the entry point, e.g., <span class="codesnippet">equals_ignores_case (foo, key, "hello")</span>.</li> <li><span class="codesnippet">case_fold_guard/1</span> computes the guard from the value.&nbsp; It ignores the first character of the value, and creates a conjunction of disjunctions for the rest of the characters.&nbsp; It uses <span class="codesnippet">'$1'</span> to refer to the second character of the field being matched, <span class="codesnippet">'$2'</span> to refer to the third, etc.</li> <li><span class="codesnippet">construct_vars/1</span> computes the pattern(s) from the value.&nbsp; It binds the first element to each case of the desired value, and binds subsequent elements to variables.&nbsp; This ensures variable <span class="codesnippet">'$1'</span> corresponds to the second character of the field being matched, <span class="codesnippet">'$2'</span> the third, etc.&nbsp; Because <span class="codesnippet">construct_vars/1</span> can return 0, 1, or 2 patterns, we end up doing 0, 1, or 2 selects.</li> <li>the patterns returned by <span class="codesnippet">construct_vars/1</span> are substituted into the wildcard pattern for the table at the field position to bind that field as desired, and the selects are issued.</li> </ul> <p>Here are some example invocations:</p> <p class="codesnippet">% erl <br />Erlang (BEAM) emulator version 5.6.5 [source] [async-threads:0] [kernel-poll:false]<br /><br />Eshell V5.6.5&nbsp; (abort with ^G)<br />1&gt; mnesia:start (), mnesia:create_table (foo, []), mnesia:dirty_write ({ foo, "hello", "world" }), 
mnesia:dirty_write ({ foo, "HeLlO", "WoRlD" }), mnesia:dirty_write ({ foo, "goodbye", "world" }), mnesia:dirty_write ({ foo, "123", "456" }).<br />ok<br />2&gt; mnesia:async_dirty (fun testequalsignorecase:equals_ignores_case/3, [ foo, key, "hello" ]).<br />[[{foo,"HeLlO","WoRlD"}],[{foo,"hello","world"}]]<br />3&gt; mnesia:async_dirty (fun testequalsignorecase:equals_ignores_case/3, [ foo, key, "hi" ]).<br />[[],[]]<br />4&gt; mnesia:async_dirty (fun testequalsignorecase:equals_ignores_case/3, [ foo, key, "goodbye" ]).<br />[[],[{foo,"goodbye","world"}]]<br />5&gt; mnesia:async_dirty (fun testequalsignorecase:equals_ignores_case/3, [ foo, key, "goodbYe" ]).<br />[[],[{foo,"goodbye","world"}]]<br />6&gt; mnesia:async_dirty (fun testequalsignorecase:equals_ignores_case/3, [ foo, key, "" ]).<br />[]<br />7&gt; mnesia:async_dirty (fun testequalsignorecase:equals_ignores_case/3, [ foo, key, "123" ]).<br />[[{foo,"123","456"}]]</p> <h3><a name="secondaryindexsolution"></a>Secondary Index Solution</h3> <p>Relational database practitioners will tell you that a select statement of the form</p> <p>SELECT * from foo WHERE upper(field) = 'VALUE'</p> <p>can be <a href="http://oracle.ittoolbox.com/groups/technical-functional/oracle-dev-l/ignore-case-in-sql-2538244">very inefficient</a>.&nbsp; One solution is to index the case folded version of the column.&nbsp; There are several ways to do this.&nbsp;</p> <p>With an equality condition such as above, Mnesia's built-in (bag) indices are sufficient; so one could add another column to the table which contains a canonically cased version of the original column, and an index lookup can be done on this auxiliary column.&nbsp; If space is an issue, the canonical casing function can return a hash which has a very high probability of not colliding (e.g., md5), which reduces the size of the auxiliary column.&nbsp;</p> <p>With an inequality condition such as</p> <p>SELECT * from foo WHERE upper (field) &gt;= 'VALUE'</p> <p>Mnesia's 
built-in indices are not powerful enough, and a secondary ordered_set table needs to be maintained manually.&nbsp; It would contain records of the form</p> <p class="codesnippet">-record (foo_upper_index, { canon_plus_key, void = [] }).</p> <p>When a record <span class="codesnippet">F</span> is written into the primary table, a record of the form <span class="codesnippet">#foo_upper_index{ canon_plus_key = { canonicalize (F#foo.field), F#foo.key } }</span> would be inserted into the secondary table.&nbsp; The secondary table can be queried efficiently for ranges of the canonicalized version of the field, and the resulting list of keys can be used to query the original table for records.</p> <p>&nbsp;</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/mnesia/EqualsIgnoresCase.htmlSat, 20 Jun 2009 15:10:56 GMTComplexCasePatterns.htmlhttp://erlanganswers.com/web/mcedemo/ComplexCasePatterns.html<h2>Complex Case Patterns</h2> <h3>Table of Contents</h3> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#disjunctiveguards">Disjunctive Guards</a></li> <li><a href="#universalclause">Universal Clause</a></li> <li><a href="#transformexpression">Transform Expression</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Erlang case expressions consist of clauses, each a pattern with an optional guard, and since the clauses are tried in order the case expression resembles conditional constructs from other languages.&nbsp; However the correspondence breaks down when complicated conditionals are required, especially when trying to share one clause body among structurally different patterns.&nbsp; Thus periodically someone emails the list with something like <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44599:200906:mfagkdkncfhfhhlilhbb">the following</a>:</p> <blockquote>Hi All,<br /><br />How to formulate a "complex pattern" in a case clause ?<br /><br />case Expr of<br />&nbsp;&nbsp; a,b -&gt; block1; % if Expr = a or b then block1<br />&nbsp;&nbsp; 
c -&gt; block2 % if Expr = c then block2<br />end<br /><br />(a,b) gives an error<br />What is the correct syntax ?<br /><br />John<br /></blockquote> <p>The answer is complicated.&nbsp; Here are some suggestions offered.</p> <h3><a name="disjunctiveguards"></a>Disjunctive Guards</h3> <p>When the patterns being matched have identical structure but there are different possibilities for each component, guard disjunction can be useful.&nbsp; In this case it would become</p> <p><span class="codesnippet" style="font-family: times new roman,times;">case Expr of<br />&nbsp;&nbsp; X when X =:= a; X =:= b -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block1;<br />&nbsp;&nbsp; c -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block2<br />end</span></p> <p>However in a more complicated situation this approach breaks down.&nbsp; Suppose for instance we want to share a block between <span class="codesnippet">ok</span> and <span class="codesnippet">{ error, eexist }</span>, e.g., some of the possible return values of <span class="codesnippet">file:make_link/2</span>.&nbsp; In this case it becomes</p> <p><span style="font-family: times new roman,times;">case Expr of<br />&nbsp;&nbsp; X when X =:= ok; is_tuple (X), size (X) =:= 2, element (1, X) =:= error, element (2, X) =:= eexist -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block1;<br />&nbsp;&nbsp; _ -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block2 % error we don't like has occurred<br />end</span></p> <p>which isn't going to win any beauty contests.</p> <h3><a name="universalclause"></a>Universal Clause</h3> <p>In the original example, we could match on <span class="codesnippet">c</span> and then assume everything else is either <span class="codesnippet">a</span> or <span class="codesnippet">b</span>.</p> <p><span style="font-family: times new roman,times;">case Expr of<br />&nbsp;&nbsp; c -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block2;<br />&nbsp;&nbsp; _ -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block1 % assumes (not c) implies (a or 
b): different than original!<br />end</span></p> <p>This is a <strong>bad idea</strong> for the original poster, because it is not semantically equivalent.&nbsp; The original code would raise a case_clause error if <span class="codesnippet">Expr</span> were not one of (<span class="codesnippet">a</span>, <span class="codesnippet">b</span>, or <span class="codesnippet">c</span>), whereas this case expression accepts any value.&nbsp; However in some situations this can be a good idea, such as when there are only a small number of "acceptable" return values from a function and anything else is to be considered an error.</p> <h3><a name="transformexpression"></a>Transform Expression</h3> <p>The idea here is to use an auxiliary function to map the original expression into a set of equivalence classes which are easily managed by the case expression.&nbsp;</p> <p><span style="font-family: times new roman,times;">case my_type (Expr) of<br />&nbsp;&nbsp; class1 -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block1;<br />&nbsp;&nbsp; class2 -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block2<br />end</span></p> <p>with auxiliary function</p> <p><span class="codesnippet">my_type (a) -&gt; class1;<br />my_type (b) -&gt; class1;<br />my_type (c) -&gt; class2.</span></p> <p>For simple examples such as this one this seems overkill, but for complicated examples where the patterns to be grouped have different shapes, this can be a win.&nbsp; For instance with the <span class="codesnippet">ok</span> and <span class="codesnippet">{ error, eexist }</span> example from above</p> <p><span style="font-family: times new roman,times;">case exists_ok (file:make_link (Existing, New)) of<br />&nbsp;&nbsp; ok -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block1;<br />&nbsp;&nbsp; { error, BadError } -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; block2<br />end</span></p> <p>with auxiliary function</p> <p><span class="codesnippet">exists_ok ({ error, eexist }) -&gt; ok;<br />exists_ok (X) -&gt; X.<br /></span></p> 
<p>seems nicer than attempting the same with guard disjunction.</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/ComplexCasePatterns.htmlWed, 17 Jun 2009 22:10:45 GMTVersionedVariables.htmlhttp://erlanganswers.com/web/mcedemo/VersionedVariables.html<h2>Versioned Variables</h2> <h3>Table of Contents</h3> <ul> <li><a href="#introduction">Introduction</a></li> <li><a href="#numberedvariables">Numbered Variables</a></li> <li><a href="#functionarguments">Function Arguments</a></li> <li><a href="#introducenewscope">Introduce New Scope</a> <ul> <li><a href="#namedfunctions">Named Functions</a></li> <li><a href="#listcomprehensions">List Comprehensions</a></li> <li><a href="#anonymousfunctions">Anonymous Functions</a></li> </ul> </li> <li><a href="#letconstruct">Let Construct</a></li> <li><a href="#folding">Folding</a></li> </ul> <h3><a name="introduction"></a>Introduction</h3> <p>Erlang doesn't have an assignment operator, but it has a pattern matching operator that looks like one.&nbsp; The first time a variable is used in a pattern match it is bound, and subsequent uses of the variable become a matching assertion.&nbsp; This is different from the typical imperative language (which has destructive assignments) and the typical functional language (which has a let construct).</p> <p>Thus periodically a question arises on the list regarding what to do about a sequence of statements for which temporary intermediates are uninteresting, like <a href="http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:44462:200906:jollkhpiabpdbdpbhjoc">the following</a>:</p> <blockquote>Hello!<br /><br />I think there wasn't any grumbling this month about the immutable<br />local variables in Erlang, so here's real world code I've found just<br />today:<br /><br /> <span style="font-family: terminal,monaco;">% Take away underscore and replace with hyphen<br /> MO1 = re:replace(MO, "_", "-", [{return, list}, global]),<br /> MO2 = toupper(MO1),<br /> % Replace 
zeros<br /> MO3 = re:replace(MO2, "RX0", "RXO", [{return, list}, global]),<br /> % Insert hyphen if missing<br /> MO5 = case re:run(MO3, "-", [{capture, none}]) of<br />&nbsp;&nbsp; nomatch -&gt;<br /> &nbsp;&nbsp;&nbsp;&nbsp; insert_hyphen(MO3);<br />&nbsp;&nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; MO3<br /> end,</span><br /><br />I think it's fairly clumsy to use MOx (MO for managed object) in the<br />code. MO4 was removed during the regexp-&gt;re refactoring step. How to<br />eliminate the versioned variable names? The<br />MOAfterUnderscoresWereReplaced, UpperCaseMO, MOAfterRX0WasReplaced,<br />etc. variablenames are really ugly. It used to use regexp, so at that<br />point it wasn't possible to easily nest the whole into one call, but<br />that would be still ugly. So any other ideas?<br /></blockquote> <p>We can see the poster is intentionally trying to stir up the hornet's nest.&nbsp; Kudos!&nbsp; Without further ado, the proposed solutions:</p> <h3><a name="numberedvariables"></a>Numbered Variables</h3> <p>This is the "original" solution: use different variable names for the different intermediate values.&nbsp; This solution has some proponents, but mostly everybody agrees that it sucks.&nbsp; For instance, it's very easy to unintentionally use a previously computed temporary value.&nbsp; In the above example a maintenance programmer might have put</p> <p><span class="codesnippet">nomatch -&gt;<br /> &nbsp;&nbsp;&nbsp;&nbsp; insert_hyphen(MO2);</span></p> <p>Which would compile just fine and cause a bug.</p> <h3><a name="functionarguments"></a>Function Arguments</h3> <p>Arguments to functions can be expressions, in which case anonymous temporaries are introduced.&nbsp; This can be good for short sequences such as <span style="font-family: times new roman,times;">h (g (f (x)))</span> but would be unwieldy for the poster's original example</p> <p><span class="codesnippet">maybe_insert_hyphen (<br />&nbsp;&nbsp; re:replace (<br />&nbsp;&nbsp; &nbsp; 
toupper (<br />&nbsp;&nbsp;&nbsp; &nbsp;&nbsp; re:replace (MO, "_", "-", [{return, list}, global])<br />&nbsp;&nbsp;&nbsp;&nbsp; ),<br />&nbsp;&nbsp;&nbsp;&nbsp; "RX0",<br />&nbsp;&nbsp;&nbsp;&nbsp; "RXO",<br />&nbsp;&nbsp;&nbsp;&nbsp; [{return, list}, global]<br />&nbsp;&nbsp; )<br />)</span></p> <p>and would also require defining a helper function <span class="codesnippet">maybe_insert_hyphen/1</span></p> <p><span class="codesnippet">maybe_insert_hyphen (X) -&gt;<br />&nbsp;&nbsp; case re:run (X, "-", [{capture, none}]) of<br />&nbsp; &nbsp; &nbsp; nomatch -&gt;<br /> &nbsp;&nbsp; &nbsp; &nbsp; &nbsp; insert_hyphen (X);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp; X<br />&nbsp;&nbsp; end.</span></p> <p>in order to avoid recomputing the temporary unnecessarily.</p> <h3><a name="introducenewscope"></a>Introduce New Scope</h3> <h4><a name="namedfunctions"></a>Named Functions</h4> <p>Functions introduce a new scope, so judicious use of functions can eliminate temporaries.</p> <p class="codesnippet">% Take away underscore and replace with hyphen<br />doit_stage1 (X) -&gt; doit_stage2 (re:replace (X, "_", "-", [{return, list}, global])).</p> <p class="codesnippet">doit_stage2 (X) -&gt; doit_stage3 (toupper (X)).</p> <p class="codesnippet">% Replace zeros<br />doit_stage3 (X) -&gt; doit_stage4 (re:replace (X, "RX0", "RXO", [{return, list}, global])).</p> <p class="codesnippet">% Insert hyphen if missing<br />doit_stage4 (X) -&gt;<br />&nbsp; case re:run (X, "-", [{capture, none}]) of<br />&nbsp;&nbsp;&nbsp; nomatch -&gt;<br /> &nbsp;&nbsp; &nbsp;&nbsp; insert_hyphen (X);<br />&nbsp; &nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X<br />&nbsp; end.</p> <p>In this case the result is not very compelling.</p> <h4><a name="listcomprehensions"></a>List Comprehensions</h4> <p>List comprehensions also introduce new scope so this could be done as</p> <p class="codesnippet">hd ([X || X &lt;- 
[re:replace (MO, "_", "-", [{return, list}, global])],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X &lt;- [toupper (X)],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; % Replace zeros<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X &lt;- [re:replace (X, "RX0", "RXO", [{return, list}, global])],<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; % Insert hyphen if missing<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X &lt;- [case re:run (X, "-", [{capture, none}]) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp; nomatch -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; insert_hyphen (X);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end]<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ])</p> <p>which is not in danger of winning any beauty contests, but is arguably a superior amalgamation of the <a href="#folding">folding technique</a> and the <a href="#anonymousfunctions">anonymous functions technique</a>.&nbsp; In particular unlike several solutions here this one is not "backwards": computations are listed in the order performed.</p> <h4><a name="anonymousfunctions"></a>Anonymous 
Functions</h4> <p>Anonymous functions also introduce new scope so this could be done as</p> <p><span class="codesnippet" style="font-family: times new roman,times;">(fun (X) -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp; case re:run (X, "-", [{capture, none}]) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; nomatch -&gt;<br /> &nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; insert_hyphen (X);<br />&nbsp; &nbsp; &nbsp; &nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X<br />&nbsp;&nbsp;&nbsp;&nbsp; end<br /> end) (<br />&nbsp; (fun (X) -&gt;<br />&nbsp; &nbsp;&nbsp; &nbsp; re:replace (X, "RX0", "RXO", [{return, list}, global])<br />&nbsp;&nbsp; end) (<br />&nbsp;&nbsp;&nbsp;&nbsp; toupper (re:replace (MO, "_", "-", [{return, list}, global]))<br />&nbsp;&nbsp; )<br />)</span></p> <p>which is also not very compelling, but this form is interesting because it is essentially a let construct expanded into lambda constructs.</p> <h3><a name="letconstruct"></a>Let Construct</h3> <p>Erlang doesn't have a native let construct but a <a href="http://dukesoferl.blogspot.com/2009/06/let-parse-transform.html">parse transform</a> can be used to expand something that looks like a let construct into an equivalent sequence of lambda constructs.&nbsp; The result looks like</p> <p><span class="codesnippet" style="font-family: times new roman,times;">let_ (<br />&nbsp; % Take away underscore and replace with hyphen<br />&nbsp; MO = re:replace (MO, "_", "-", [{return, list}, global]),<br />&nbsp; MO = toupper (MO),<br />&nbsp; % Replace zeros<br />&nbsp; MO = re:replace (MO,"RX0","RXO",[{return, list}, global]),<br />&nbsp; % Insert hyphen if missing<br />&nbsp; case re:run(MO, "-", [{capture, none}]) of<br />&nbsp;&nbsp;&nbsp;&nbsp; nomatch -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; insert_hyphen(MO);<br />&nbsp;&nbsp;&nbsp;&nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MO<br />&nbsp; end<br />) </span></p> <p>which looks nice 
enough, but parse transforms cannot change the syntax of the language, just the semantics.&nbsp; Thus the let_ usage must look like a function call, and for other examples this would mean using <span style="font-family: times new roman,times;">begin ... end</span> blocks to make sequences of expressions act like a single expression.</p> <h3><a name="folding"></a>Folding</h3> <p>A fold can be thought of as threading state through a sequence of computations, and given a list of lambda expressions this is very flexible.&nbsp; The poster's example becomes</p> <p><span class="codesnippet" style="font-family: times new roman,times;">lists:foldl (fun (Fun, Acc) -&gt; Fun (Acc) end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; MO,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [ fun (X) -&gt; re:replace (X, "_", "-", [ { return, list }, global ]) end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fun toupper/1,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fun (X) -&gt; re:replace (X, "RX0", "RXO", [ { return, list }, global ]) end,<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; fun (X) -&gt; case re:run (X, "-", [ { capture, none } ]) of<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; nomatch -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; insert_hyphen 
(X);<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; match -&gt;<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; X<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end ])</span></p> <p>This is neither pretty nor concise, but it does make maintaining a sequence of operations less error prone (i.e., inserting or removing a computation step).&nbsp; With several temporaries it's necessary to fold with a tuple accumulator and the ugliness goes up.</p>paul-erlanganswers@mineiro.com (Paul Mineiro)http://erlanganswers.com/web/mcedemo/VersionedVariables.htmlWed, 17 Jun 2009 18:21:22 GMT