Ruby, iOS, and Other Development

A place to share useful code snippets, ideas, and techniques

All code in posted articles shall be considered public domain unless otherwise noted.
Comments remain the property of their authors.

2008-01-07

Randomizing an Array Revisited

It was pointed out in a comment on my post about randomizing arrays in Ruby that sort_by{rand} is O(n log n), and that the shuffle can be done in linear time, i.e. O(n). This is, of course, correct. Efficiency wasn't my primary concern in the original post, so much as a quick and easy to remember solution. That said, it's worth presenting the more complicated but more efficient algorithm.

I could just give the link to the blog post linked in the comment, but for the convenience of the reader I'll repeat the solution here (with minor changes that make me happier without changing the algorithm):

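class Array
  # Returns a shuffled copy of the array in O(n) time (a Fisher-Yates shuffle).
  def shuffle
    array = dup
    # Walk j from size down to 2, swapping the last element of the
    # not-yet-fixed prefix (index j-1) with a random element of that prefix.
    size.downto 2 do |j|
      r = rand j  # random index with 0 <= r < j
      array[j-1], array[r] = array[r], array[j-1]
    end
    array
  end
end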
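
As a quick sanity check, here is a minimal sketch that restates the same algorithm as a standalone method and confirms that the result is a permutation of the input and that the input is left alone. The name fisher_yates_shuffle and the optional rng argument are my own choices for illustration (they are not from the linked post), picked so the sketch does not collide with any Array#shuffle your Ruby version may already provide.

```ruby
# Hypothetical standalone version of the shuffle above. The method name and
# the optional rng argument are assumptions made for this sketch.
def fisher_yates_shuffle(array, rng = Random.new)
  result = array.dup
  result.size.downto(2) do |j|
    r = rng.rand(j)  # random index with 0 <= r < j
    result[j - 1], result[r] = result[r], result[j - 1]
  end
  result
end

original = (1..10).to_a
shuffled = fisher_yates_shuffle(original)

puts shuffled.inspect
puts shuffled.sort == original   # same elements, so it is a permutation
puts original == (1..10).to_a    # the input array was not modified
```

Passing a seeded generator, e.g. Random.new(42), should make a run repeatable, which is handy in tests.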

Enjoy!


2008-01-01

Do You Understand What Your Web Framework is Doing?

It's a new year, and I'm going to start off 2008 wrong with a code-free post. Sorry about that. This stems from realizing how little many developers (judging from postings to a variety of mailing lists) seem to understand about what their web frameworks do for them when it comes to generating code in other languages (particularly JavaScript and SQL). It's Rails-flavored, but not Rails-specific.

Here's a quick quiz. I'm assuming that you, the reader, are a web developer familiar with JavaScript, SQL, and some reasonably modern web app framework:

  1. Does JavaScript validation on a web form guarantee that when the form is submitted the server will receive valid data?
  2. Should foreign key columns each get their own index?
  3. How is JSON parsed into data structures in memory in a browser?
  4. Are multi-table joins inefficient?
  5. Can a web page make requests to a host other than the host from which the page itself was requested?

We'll come back to that. I am going to start by talking about a web browser (client) interacting via HTTP(S) with a web server. There are three pieces here, not two: the client, the server, and the protocol between them. HTTP matters because it is easy to work with and understand, and there are lots of tools for working with it. There are some important differences between the web client/server environment and a more traditional client/server system:

  • Connections are not persistent, and consist of only a single request and response. (Note that HTTP keep-alive does not change this; what persists is the TCP connection, which does not affect the application layer.)
  • Interaction can only be initiated by the client, not the server. This is a result of the previous difference.
  • The server cannot assume anything about the data received from the client.
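
To make the last point concrete, here is a minimal sketch of a server-side check that treats every incoming value as untrusted text. The method name validate_signup, the field names (:email, :age), and the rules themselves are all hypothetical, chosen only for illustration; a real application would use whatever validation its framework provides.

```ruby
# Hypothetical server-side validation. The field names and rules are
# assumptions for this sketch, not part of any framework.
def validate_signup(params)
  errors = []
  # Never assume a field is present or well formed: coerce to a string first.
  email = params[:email].to_s
  age   = params[:age].to_s

  errors << 'email is malformed' unless email.match?(/\A[^@\s]+@[^@\s]+\z/)
  unless age.match?(/\A\d{1,3}\z/) && age.to_i <= 150
    errors << 'age must be a whole number from 0 to 150'
  end
  errors
end

p validate_signup(email: 'a@example.com', age: '30')  # well-formed input
p validate_signup(email: 'not-an-email', age: '-5')   # both fields rejected
p validate_signup({})                                 # missing fields rejected
```

The point is not the particular rules but where they run: on the server, on every request, whether or not the page also validated the form in JavaScript.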

Most people developing web sites/applications think in terms of the server software they are developing. The inversion of control involved in modern web programming confuses many developers, much as it often baffles first-time GUI developers. The server has full control when responding to a request but, once it has generated that response, control reverts to the client. For one thing, that means that data on the client does not get to the server unless the client decides to send it. It also means that data from the server does not get to the client unless the client decides to request it. One needs to work from the point of view of the user in front of the browser.

A common question on the Rails mailing list is how to use RJS to retrieve some value from the client. While the desire isn't ridiculous, and it can be done in a roundabout way with a certain amount of jumping through hoops, phrasing the question that way shows a lack of understanding of where the RJS-generated code will be executing. (The hoop jumping involves having the RJS generate an AJAX request that submits the value to some URL on the server.)

There was a recent thread on the Rails list complaining about the functionality in Rails (largely RJS and various helpers) that attempts to hide the complexity of interactions between client-side and server-side code. The complaint was that this largely results in maintainability problems in the code and misunderstandings for the developer. I don't agree with everything in either the original post or the various responses, but it highlights a problem Assaf identified months ago.

Assaf is concerned with bad (inefficient and/or incorrect) SQL being generated because the developer doesn't understand what the framework is doing underneath. I'm concerned about bad (incorrect, unmaintainable, and/or hard to debug) JavaScript being generated. Rails makes it easy to get results without understanding what it is doing for you, which is great for prototyping and dangerous for production.

It is important to understand what code is being executed where, when, and how. When developing a rich user experience in a web browser, one must understand the DOM, the event model, the browser security model, the JavaScript language, the single-threaded nature of JavaScript execution in the browser, XMLHttpRequest, etc. just as one must understand database indexing, column types, SQL, table/row locking, etc. to develop a production-quality database-backed web application.

Let's go back to that quiz. You shouldn't have had to think too hard about any of these, and you should feel certain about your answers. And those answers should be:

  1. Does JavaScript validation on a web form guarantee that when the form is submitted the server will receive valid data? Nope. The server receives data over HTTP, and that HTTP connection could come from any program, not just a web browser. Furthermore, JavaScript can be turned off in most browsers. On top of that, most browsers make it possible to mess with the web page live and/or the data being submitted. Client-side validation is a user interface nicety, but provides no guarantees about the data the server sees.
  2. Should foreign key columns each get their own index? Sometimes. It depends very much on what queries will involve them. A join table (i.e. one with more than one foreign key that represents a many-to-many relationship between tables) usually benefits from an index on all foreign keys, sometimes even multiple indices on the same columns in different orders. Tables only queried by columns other than their foreign keys generally don't benefit from indexing those foreign keys, even if the table is usually joined against the tables to which those foreign keys refer.
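  3. How is JSON parsed into data structures in memory in a browser? Since JSON is based on JavaScript literal syntax, it has traditionally been parsed by executing it with eval(). Browsers with native JSON support provide JSON.parse(), which parses the text without executing it as code.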
  4. Are multi-table joins inefficient? This depends on the number of tables, available indices, and the database engine. Joining 18 tables in MySQL can make the query optimizer hang for hours (that's the query optimizer, not executing the query), regardless of available indices on the tables involved. A query on tables lacking indices on appropriate columns will require full table scans in any database, which is slow unless the unindexed tables have very few rows. It is always worth asking your database engine to explain and profile the queries you'll be running. Incidentally, database logs from running your unit/functional/integration/whatever tests are a great place to start.
  5. Can a web page make requests to a host other than the host from which the page itself was requested? Yes, but not with XMLHttpRequest. At the simplest level, an img tag makes a request to any arbitrary URL, though the response is not available to JavaScript. To interact with a different host with almost the same flexibility as an XMLHttpRequest, one uses a script tag. See this blog post for a discussion.
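
The first answer is easy to see for yourself. The sketch below uses Ruby's standard net/http library to build the kind of form POST a browser would send, with values no JavaScript validation would let through. The URL and the field names are hypothetical, and the request is only constructed and printed, not sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical endpoint and field names, for illustration only.
# Nothing is sent over the network: the request is just built and inspected.
uri = URI('http://example.com/signup')
request = Net::HTTP::Post.new(uri.path)
request.set_form_data('email' => 'not-an-email', 'age' => 'abc')

puts request.method           # the HTTP verb the server would see
puts request['Content-Type']  # same content type a browser form uses
puts request.body             # the form-encoded data, never checked by any JavaScript
```

From the server's side, a request like this looks just like a form submission from a browser, which is why the validation has to happen there.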

How did you do? If you didn't get them all, you need to keep learning. If you got them all right, don't get too cocky; you may still not know everything you need to know to avoid the pitfalls of a code-generating framework. I keep learning about things I thought I knew thoroughly, and I wouldn't have it any other way. Enjoy!
