Sunday, April 21, 2013

inside-outside algorithm on PCFG

The role of a grammar or automata is to reduce the uncertainty inherent in the identification of the input data.

We use trivial example to demonstrate the  inside-outside algorithm.
Suppose we have data \(\mathbf{O}=\{a,a,b,a,a\}\) generated from generative probabilistic context-free grammar \(\mathbf{G}\) model. Assume we know the grammar \(\mathbf{G}\) , then how does the inside-outside algorithm works?
The grammar  \(\mathbf{G}\)  can be intuitively described as
\[S \to AbA \\
A \to aA | a\]

Wednesday, January 23, 2013

Home Network

I recently figure out that the FIOS's IP address is almost for sure not change as long as the router is working properly. Consider my personal requirements and the FIOS quality, use my own computer as a server seems attractive for me. So here I will introduce my home internet topology and the services that set up by myself.
Generally, the following steps are necessary, 
  1. Set your domain name to your home FIOS IP address. 
  2. The crappy FIOS router is not stable, I use secondary router Linksys E3000 with Tomato Firmware. And set E3000's IP address from FIOS to 192.168.1.2.
  3. Setting E3000's IP to different IP segment, e.g 192.168.2.1. And Port forward the port 20(ssh) and 80(http) from FIOS to E3000
  4. Install a Linux Server on your old and unused laptop (for me, Thinkpad X61s, 2*1.6GHz, 2G RAM, around 20W) as the home server. 
  5. Allocate a static IP 192.168.2.* for X61s from E3000, and set DNS in Tomato if necessary.
  6. Port forward the E3000's port 80 and 22 to the X61s.
  7. Keep doing step 4-6 to set up the other devices in your home network. 
By using above methods, your home server's 80 and 22 ports will be public on the outside internet. Plus you pay around $5/year (except FIOS) and use your never used Laptop to provide professional internet service.  Now you just set proper service on those ports to perform regular service. For me, the wordpress, gitolite and one node.js application has be deployed in the home server.

Sunday, November 11, 2012

PUT Request in web2py

To my surprise, the parameters in PUT request are not well parsed at web2py. As you probably already knows that request.get_vars and request.post_vars are the most convenient way to access the request parameter. Then what about PUT request? request.put_vars? The answer is no. However, good news is that they are still available in request object.
So here is my dirty method, to make it work,
# Xingzhong for yesgoody
# Run @ web2py
import urlparse
put_vars = dict(urlparse.parse_qsl(request.body.read()))

Saturday, September 22, 2012

Post JSON object to Google App Engine

Using GAE as the server to accept JSON data is a interesting AJAX example. Here is a sample code that I write for yesgoody.

Requirements:
My Google App Engine (Python) want accept JSON post input, then process, finally given it back to the client. Here the client means a jquery page.
Here is my code for server side,
# Xingzhong for yesgoody
# Run @ GAE Server
import webapp2
import json
import datetime
class MainHandler(webapp2.RequestHandler):
    def get(self):
        self.error(403) #access denied 
        
    def post(self):
        req = json.loads(self.request.body)
        res = {"time":datetime.datetime.now().isoformat()}
        res['data'] = req
        self.response.headers['Content-Type'] = 'application/json'
        self.response.out.write(json.dumps(data))

Here is a simple jquery to demonstrate the functionality,
/*yesgoody debug for post json to server*/
function submit(){
 postdata = new Object();
 postdata.email = "xu@yesgoody.com";
 postdata.name = "Xingzhong";
 $.ajax({url: '/comm',
         type: 'POST',
         contentType: 'application/json',
  data: JSON.stringify(postdata),
  dataType: 'json',
  success: function(response) {
          var output = JSON.stringify(response); 
   console.log(output);
   }
  });
}

Wednesday, August 8, 2012

Grammar Learning (1)

I will consider the task that unsupervised learning probabilistic context-free grammar (PCFG) from data. The following toy example will demonstrate where I should start up.
Suppose given the following 1-D discrete sequence
\[S = abcabcab\]
What is the proper PCFG $G$ can generalized above $S$?
Intuitively, we could infer such grammar $G$ is,
\[ T_1 \implies ab\]
\[ T_2 \implies T_1cT_2 | T_1 \]
Then we can generalize $S$ as,
\[ S \implies T_2 \implies T_1cT_2  \implies T_1cT_1cT_2 \implies T_1cT_1cT_1 \implies abcabcab\].
We can use parsing tree to visualize it as, [S [T_2 [T_1 [a] [b]] [c] [T_2 [T_1 [a] [b]] [c] [T_2 [T_1 [a] [b]]]]]]


Obviously, parsing as well as grammar is not unique for $S$. We could easily generalize grammar $G'$ that,
\[ T_1 \implies ab \]
\[ T_2 \implies T_1T_2T_1|cT_1c \]
Then we can generalize $S$ as,
\[ S \implies T_2 \implies T_1T_2T_1 \implies T_1cT_1cT_1 \implies abcabcab\]
Graphically, it will be, [S [T_2 [T_1 [a] [b]] [T_2 [c] [T_1 [a] [b]] [c]] [T_1 [a] [b]]]]

Though there will be multiple grammar that may fit this data, let's only focus on above two. The natural question is which is better for data $S$?
By answering this question, let's firstly take a look at the grammar itself.  What kinds of sequence $S$ can be generalized based on $G$ and $G'$.
By applying $G$, we can write
\[ ab c ab c ab c ab c ... ab c ab \]
On contrast, the $G'$ can be generalized to
\[ ab ab ... ab c ab ... ab ab\]
Therefore, we can easily distinguish the better model from generative perspective.

In the following, we use python code to demonstrate the above parsing problem.
We combines the above grammars to a single one in cst_grammar.
Therefore, if everything OK, the code should generates the above parsing tree.


import nltk

cst_grammar = """
    S -> T2 [0.5] | T3 [0.5]
    T1 -> A B [1.0]
    T2 -> T1 T2 T1 [0.5] | C T1 C [0.5]
    T3 -> T1 C T3 [0.5] | T1 [0.5]
    A -> 'a' [1.0]
    B -> 'b' [1.0]
    C -> 'c' [1.0]
"""

grammar = nltk.parse_pcfg (cst_grammar)
case = list ('abcabcab')
parser = nltk.InsideChartParser (grammar)
trees = parser.nbest_parse (case)

for tree in trees:
    print tree.pprint (parens = '[]')
    tree.draw()


The two output trees
[S
  [T2
    [T1 [A a] [B b]]
    [T2 [C c] [T1 [A a] [B b]] [C c]]
    [T1 [A a] [B b]]]]
[S
  [T3
    [T1 [A a] [B b]]
    [C c]
    [T3 [T1 [A a] [B b]] [C c] [T3 [T1 [A a] [B b]]]]]]