
Hackity Hackity Hackathon!

In just two weeks, a flock of PHP developers will descend on Rosemont, IL for php|tek. In between the presentations watched, the Twitter handles followed, and the pints of Guinness consumed, there will be lots of ideas traded. Ideas are great. They can give you solutions, new perspectives, and new directions in your own projects, but they’re not enough.

We know from thousands of startups and projects: execution is everything.

On Thursday evening, we’re running the 2nd annual php|tek Hackathon. This year it’s sponsored by our friends at Tropo. The goal is to give you a chance to try out some of the ideas you’ve heard about and potentially help a project that you already use. This time around, we have leaders & community members from a number of projects available to help, direct, and generally talk about and show what they’re doing.

We have quite a few groups represented so far:

Tropo – a Cloud API for adding voice & SMS communications to your applications – represented by Product Manager Adam Kalsey.

CouchDB – a document-oriented NoSQL database solution – represented by CouchBase User Advocate Benjamin Young.

Frapi – an API framework for building RESTful applications – represented by project creators David Coallier and Helgi Pormar.. Tormar.. we’ll just go with Helgi.

Gowalla PHP Library – a way to interact with the Austin-based location-based service – represented by Michelangelo van Dam.

JoindIn – a community-driven site focused on bringing together the people sharing the knowledge with the ones giving feedback – represented by Lorna Jane Mitchell.

Node.js – an event-driven I/O framework for the V8 JavaScript engine – represented by Node.js author Travis Swicegood.

Phergie – an open source IRC bot written in PHP 5 – represented by Project Lead Matthew Turland.

PEAR – PHP Extension and Application Repository – represented by PEAR President David Coallier.

PHP Test Fest – a small group of developers whose primary goal is to support the PHP core – represented by Michelangelo van Dam and Rafael Dohms.

Spaz – the microblogging client for Twitter, Identica, & Laconica – represented by Project Lead Ed “Funkatron” Finkler.

web2project – the stupendous web-based project management system – represented by the Testing King Trevor Morse and Project Lead Keith Casey.*

Windows Azure – Windows-based cloud hosting platform – represented by Developer Evangelist Peter Laudati.

Zend Framework – one of the leading PHP frameworks out there – represented by Michelangelo van Dam.

No, you don’t have to work on one of the projects represented.

We encourage project members and leaders to attend and direct and focus your efforts so you can accomplish something. Sometimes starting is the hardest step, so we’re all there to help. If you want to try out your own ideas, please drop me a note – keith @ blue parabola.com – and we’ll make sure everyone knows about it.

*  I may be biased about web2project.. I’m the project lead. ;)

Xapian More Like This In PHP

Originally posted at: PHP/ir

For my own benefit, if nothing else, since I keep seeming to need this snippet of code, I thought I’d encapsulate a Xapian More Like This/Find Similar example in a very brief blog post.

The code is stolen out of my own Habari MultiSearch plugin:

The code is very simple once you get beyond the clunkiness of the Xapian API. We create a relevance set for the document we want to find similar ones to (that’s the one with id = $search_id), and from that create an eset of the most important terms (Xapian does the heavy lifting here). We then build a new search query out of those and run a regular query, remembering to discard our original document from the search results. Not the slickest solution, but it works!

The basic idea here is actually pretty much the same in most MLT implementations – what we’re losing due to the way we’re adding the terms is any degree of weight – that term N is more important than term N+1 to the document. Some implementations let you control whether or not those weights have any effect – in Solr mlt.boost will either include or discard the weighting depending on whether it’s on or off.
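To make the steps concrete without depending on the Xapian bindings, here is a minimal plain-PHP sketch of the same idea: pick the top terms of one document, OR them together, and drop the source document from the results. Everything in it is an assumption for illustration: the toy $docs corpus, the function names top_terms() and more_like_this(), and the simple tf-idf scoring, which stands in for whatever weighting Xapian uses when it builds its eset. The $boost flag mimics the weighted versus unweighted choice described above.

```php
<?php
// Toy corpus: document id => list of terms. Stands in for a real index.
$docs = array(
    1 => array('php', 'search', 'xapian', 'index', 'php'),
    2 => array('php', 'xapian', 'bindings', 'search'),
    3 => array('cooking', 'pasta', 'recipe'),
    4 => array('search', 'index', 'lucene'),
);

// Pick the $limit highest-scoring terms of one document (simple tf-idf here;
// a real engine would use its own weighting scheme).
function top_terms(array $docs, $id, $limit) {
    $scores = array();
    foreach (array_count_values($docs[$id]) as $term => $tf) {
        $df = 0;
        foreach ($docs as $terms) {
            if (in_array($term, $terms, true)) {
                $df++;
            }
        }
        $scores[$term] = $tf * log(1 + count($docs) / $df);
    }
    arsort($scores);
    return array_slice($scores, 0, $limit, true);
}

// OR the selected terms together and rank the other documents. With $boost
// off, every matching term counts as 1, so the term weights are lost.
function more_like_this(array $docs, $id, $limit = 3, $boost = false) {
    $results = array();
    foreach (top_terms($docs, $id, $limit) as $term => $weight) {
        foreach ($docs as $other => $terms) {
            if ($other === $id || !in_array($term, $terms, true)) {
                continue; // discard the source document and non-matches
            }
            if (!isset($results[$other])) {
                $results[$other] = 0;
            }
            $results[$other] += $boost ? $weight : 1;
        }
    }
    arsort($results);
    return $results;
}

print_r(more_like_this($docs, 1));
```

Calling more_like_this($docs, 1, 3, true) keeps the term weights in the score instead of counting each matching term once.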

Original Post

Benford’s Law

Originally posted at: PHP/ir

Benford’s Law is not an exciting new John Nettles-based detective show, but an interesting observation about the distribution of the first digit in sets of numbers originating from various processes. It says, roughly, that in a big collection of data you should expect to see a number starting with 1 about 30% of the time, but starting with 9 only about 5% of the time. Precisely, the proportion for a given digit can be worked out as:

<?php
function benford($num) {
        return log10(1+1/$num);
}
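As a quick sanity check on the formula, this sketch tabulates the expected proportion for each leading digit and confirms that the nine values add up to 1. The helper name benford_table() is made up for this illustration.

```php
<?php
// Expected proportion of numbers whose first digit is $num (1-9).
function benford($num) {
    return log10(1 + 1 / $num);
}

// Hypothetical helper: expected proportions for all nine leading digits.
function benford_table() {
    $table = array();
    foreach (range(1, 9) as $digit) {
        $table[$digit] = benford($digit);
    }
    return $table;
}

$table = benford_table();
foreach ($table as $digit => $expected) {
    echo $digit, " - ", number_format($expected, 3), PHP_EOL;
}
// The product of (d+1)/d for d = 1..9 telescopes to 10, so the logs sum to 1.
echo "sum - ", number_format(array_sum($table), 3), PHP_EOL;
```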

Real data does tend to fit this pretty well. For example, just leaping onto data.gov.uk at random and grabbing a dataset – in this case a list of spending in the Science and Technology Facilities Council – I can compare the first digits to Benford’s expected ones (I grabbed the Amount column out of the April 2010 data and put it into a text file, one amount per line):

<?php
$fh = fopen("data.txt", 'r');
$score = array();
$total = 0;
$nums = range(1, 9);
// Count up appearances of digits
while($data = fgets($fh)) {
        $total++;
        $digit = substr(trim($data), 0, 1);
        if(!in_array($digit, $nums)) {
                continue;
        }
        if(!isset($score[$digit])) {
                $score[$digit] = 0;
        }
        $score[$digit]++;
}
arsort($score);
echo "# - Data  - Benford", PHP_EOL;
foreach($score as $digit => $count) {
        echo    "$digit - ",
                number_format($count/$total, 3),
                " - ",
                number_format(benford($digit), 3),
                PHP_EOL;
}

We get a pretty clear match:

# - Data  - Benford
1 - 0.273 - 0.301
2 - 0.181 - 0.176
3 - 0.114 - 0.125
4 - 0.107 - 0.097
5 - 0.088 - 0.079
6 - 0.070 - 0.067
7 - 0.055 - 0.058
8 - 0.050 - 0.051
9 - 0.047 - 0.046

Graph of the STFC data versus Benford’s Law
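One rough way to put a number on “pretty clear match” is the mean absolute deviation between the observed and expected columns. This is only a sketch using the figures from the table above; the function name benford_mad() is an assumption, and no particular pass/fail threshold is implied.

```php
<?php
// Observed first-digit proportions, copied from the table above.
$observed = array(
    1 => 0.273, 2 => 0.181, 3 => 0.114,
    4 => 0.107, 5 => 0.088, 6 => 0.070,
    7 => 0.055, 8 => 0.050, 9 => 0.047,
);

// Hypothetical helper: mean absolute deviation from Benford's expected values.
function benford_mad(array $observed) {
    $sum = 0.0;
    foreach ($observed as $digit => $proportion) {
        $sum += abs($proportion - log10(1 + 1 / $digit));
    }
    return $sum / count($observed);
}

echo "MAD: ", number_format(benford_mad($observed), 4), PHP_EOL;
```

The smaller the value, the closer the data sits to the expected distribution.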

This is fun, because if someone makes up a data set, it probably won’t follow this distribution. This is used in accountancy to detect fraudulent entries. If there is a reporting limit at £3000 within a certain company where fraud is going on, there will probably be more dodgy transactions at £2999, for example, which will throw off the stats. More advanced checking actually goes further into the digits rather than just considering the initial one. As always, there’s plenty more on the law on Wikipedia.
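Going one digit further uses the same formula, applied to the first two digits (10 to 99) instead of the first one. The sketch below shows the idea; the function names and the handful of amounts are made up purely for illustration and are not from the STFC data.

```php
<?php
// Expected proportion of numbers whose first two digits are $pair (10-99).
// Same formula as the single-digit case, applied to the two-digit prefix.
function benford_first_two($pair) {
    return log10(1 + 1 / $pair);
}

// Hypothetical helper: first two significant digits of an amount, or null
// if the amount has fewer than two significant digits.
function first_two_digits($amount) {
    $digits = ltrim(preg_replace('/\D/', '', (string) $amount), '0');
    return strlen($digits) >= 2 ? (int) substr($digits, 0, 2) : null;
}

// Made-up amounts: a cluster just under a 3000 limit inflates the "29" bucket.
$amounts = array("2999", "2999.00", "2950", "1200", "455.10", "2999");
$counts = array();
foreach ($amounts as $amount) {
    $pair = first_two_digits($amount);
    if ($pair === null) {
        continue;
    }
    $counts[$pair] = isset($counts[$pair]) ? $counts[$pair] + 1 : 1;
}
// Columns: first two digits - observed proportion - expected proportion.
foreach ($counts as $pair => $count) {
    echo $pair, " - ", number_format($count / count($amounts), 3),
         " - ", number_format(benford_first_two($pair), 3), PHP_EOL;
}
```

With 90 buckets instead of 9, a pile-up just under a threshold stands out far more sharply than it does in the first-digit counts.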

Original Post

Conferences (with Discount!)

Once again, a complete lack of new content on this blog is marginally explained by conference activity. I recently spoke about different deployment options at the ThinkVitamin Code Management & Deployment online conference; if you’re interested, check the Deployment Tactics slides on SlideShare.

In a few weeks’ time I’ll be talking about ZeroMQ at the excellent PHP UK Conference in London on the 25th of Feb, and if you haven’t already got a ticket you can get one for £20 off thanks to a PHPUK followers discount. Search fans may be interested in talks on Elastic Search by @a, Hadoop by @dzuelke, and a NoSQL comparison by @lorenzoalberton.

After that, it’s off to Canadia in March to the monstrous ConFoo, where I’ll be talking about doing Solr properly (and Andrei is doing his Elastic Search talk again, for double the Lucene pleasure), then in May to Chicago for PHP Tek, where I’ll be talking about ZeroMQ, Debugging, and Finding Fraudsters (with the help of some machine learning techniques).

I am speaking at ConFoo Web Techno Conference. March 9th to 11th 2011. Montreal

Using your own View object with Zend_Application

Originally posted at: Rob Allen’s DevNotes

Let’s say that you want to use your own view object within your Zend Framework application.

Creating the view object is easy enough in library/App/View.php:

class App_View extends Zend_View
{
    // custom methods here
}

along with adding the App_ namespace to the autoloader in application.ini:

autoloadernamespaces[] = "App_"

All we need to do now is get Zend_Application to bootstrap with our new view class. There are two ways of doing this: within Bootstrap.php or using a custom resource.

_initView() in Bootstrap.php

At first blush, the code looks quite easy. In application/Bootstrap.php, we add our own method that creates the view object and assigns it to the viewRenderer:


class Bootstrap extends Zend_Application_Bootstrap_Bootstrap
{
    protected function _initView()
    {
        $view = new App_View();

        $viewRenderer = new Zend_Controller_Action_Helper_ViewRenderer();
        $viewRenderer->setView($view);
        Zend_Controller_Action_HelperBroker::addHelper($viewRenderer);
        return $view;
    }
}

As we have named the method _initView(), our method will take precedence over the built-in View resource and be used instead. However, this implementation will ignore any view options that are configured in application.ini using the resources.view key, so a better method is this:

class Bootstrap extends Zend_Application_Bootstrap_Bootstrap
{
    protected function _initView()
    {
        $resources = $this->getOption('resources');
        $options = array();
        if (isset($resources['view'])) {
            $options = $resources['view'];
        }
        $view = new App_View($options);

        if (isset($options['doctype'])) {
            $view->doctype()->setDoctype(strtoupper($options['doctype']));
            if (isset($options['charset']) && $view->doctype()->isHtml5()) {
                $view->headMeta()->setCharset($options['charset']);
            }
        }
        if (isset($options['contentType'])) {
            $view->headMeta()->appendHttpEquiv('Content-Type', $options['contentType']);
        }
        
        $viewRenderer = new Zend_Controller_Action_Helper_ViewRenderer();
        $viewRenderer->setView($view);
        Zend_Controller_Action_HelperBroker::addHelper($viewRenderer);
        return $view;
    }

}

This version takes into account your configuration settings and behaves the same as the View resource provided by Zend Framework. The only difference is that we’re now using App_View.
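For reference, the resources.view options that the code above reads might look like this in application.ini. The values are only examples. Note that, per the isHtml5() check, the charset option is only applied when the doctype is an HTML5 one.

```ini
; Example values only: adjust to suit your application.
resources.view.doctype = "HTML5"
resources.view.charset = "UTF-8"
resources.view.contentType = "text/html; charset=UTF-8"
```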

Custom resource

Another option is to override Zend_Application_Resource_View with our own view resource. In this case, we create a class called App_Resource_View stored in library/App/Resource/View.php. We only need to override one method, getView():

class App_Resource_View extends Zend_Application_Resource_View
{
    public function getView()
    {
        if (null === $this->_view) {
            $options = $this->getOptions();
            $this->_view = new App_View($options);

            if (isset($options['doctype'])) {
                $this->_view->doctype()->setDoctype(strtoupper($options['doctype']));
                if (isset($options['charset']) && $this->_view->doctype()->isHtml5()) {
                    $this->_view->headMeta()->setCharset($options['charset']);
                }
            }
            if (isset($options['contentType'])) {
                $this->_view->headMeta()->appendHttpEquiv('Content-Type', $options['contentType']);
            }
        }
        return $this->_view;
    }
}

Essentially, all I have done is replace the class of the view object to be App_View and left everything else alone so that it behaves the same as the default View resource.

To get Zend_Application to load our custom resource, we just add one line to application.ini:

pluginPaths.App_Resource = "App/Resource"
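One thing worth checking: as far as I know, a plugin resource only runs if it is configured under the resources key. If application.ini has no other resources.view.* settings, a fragment along these lines should make sure the view resource is loaded at all. Treat the empty resources.view[] entry as an assumption about your setup rather than a required line.

```ini
; Register the custom resource path, then make sure the view resource is enabled.
pluginPaths.App_Resource = "App/Resource"
resources.view[] = ""
```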

We now have a reusable resource that will load our own View class and can easily take it from project to project.

Original Post

January TriPUG – Object Oriented PHP

I had the pleasure of giving a talk at the January meetup of the Triangle PHP User Group on Object Oriented PHP.  Here are my slides for those that attended: