/sekimura

Text (heredoc) strip margin in Python

import re

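def strip_margin(text):
    # Drop leading whitespace up to and including the "|" margin marker.
    return re.sub(r'\n[ \t]*\|', '\n', text)

def strip_heredoc(text):
    # Find the smallest indentation among non-blank lines and remove it.
    indents = re.findall(r'\n([ \t]*)(?=\S)', text)
    indent = min(len(i) for i in indents) if indents else 0
    return re.sub(r'\n[ \t]{%d}' % indent, '\n', text)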

print(strip_margin(
"""I was reading a book "Programming Scala" and
|noticed that there is stripMargin method in
|RichString class in Scala.
|
|It's damn useful."""))

print(strip_heredoc(
    """
    And...
    There is strip_heredoc in Ruby's String object.
    Now you don't need a leader "|".
    It does do a great job even if the string has
        multi
            level
                indention.

    It's damn useful too.
    especially for usage docs like:

       -h       This message

    yay or nay?"""))

Here's the result of running python text_strip_margin.py:

I was reading a book "Programming Scala" and
noticed that there is stripMargin method in
RichString class in Scala.

It's damn useful.

And... 
There is strip_heredoc in Ruby's String object.
Now you don't need a leader "|".
It does do a great job even if the string has
    multi
        level
            indention.

It's damn useful too.
especially for usage docs like:

   -h       This message

yay or nay?
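
As a side note, Python's standard library has textwrap.dedent, which removes the common leading whitespace from every line, much like strip_heredoc above. A minimal sketch; the usage text is made up for illustration:

```python
import textwrap

# Made-up usage text, indented to line up with the surrounding code.
usage = """
    Usage: tool [options]

        -h    This message
    """

# dedent() strips the 4 spaces shared by all non-blank lines and
# keeps the extra indentation of the option line.
result = textwrap.dedent(usage)
print(result)
```

Unlike strip_heredoc, it also handles a first line that does not start with a newline.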

Color them all

My two favorite tools, git and vim, look good in color, whether it's syntax highlighting or colored diffs. A terminal can show 256 different colors, which is "almost unlimited" and a huge amount of information. Well, using all 256 colors at once isn't so useful, but I'd like to share two of my coloring configurations.

1. Display git branch name in color

Using git means you will have a lot of branches in your working repo. git-completion.bash is the magic that lets you use the Tab key to complete branch names, git subcommands, and so on. I've added a little spice to the PS1 variable to display the branch name more clearly. Add these lines to your ~/.bashrc:

if [ -f "$HOME/bin/git-completion.bash" ] && ! shopt -oq posix; then
    . "$HOME/bin/git-completion.bash"
    PS1='\[\033[1;35m\]\u: \[\033[0m\]\W\[\033[1;32m\]$(__git_ps1 " %s")\[\033[0m\] \$ '
fi

And you'll get this.

[Screenshot: colored git branch name on PS1]

Now you won't get lost among tens of thousands of branches. Btw, I'm using the Inconsolata dz font in iTerm2.

2. Less Colors for man Pages

While looking for a way to add color with git-completion.bash, I somehow came across this page explaining how to display colored text in man pages.

    # Less Colors for Man Pages
    export LESS_TERMCAP_mb=$'\E[01;31m'       # begin blinking
    export LESS_TERMCAP_md=$'\E[01;38;5;74m'  # begin bold
    export LESS_TERMCAP_me=$'\E[0m'           # end mode
    export LESS_TERMCAP_se=$'\E[0m'           # end standout-mode
    export LESS_TERMCAP_so=$'\E[38;5;246m'    # begin standout-mode - info box
    export LESS_TERMCAP_ue=$'\E[0m'           # end underline
    export LESS_TERMCAP_us=$'\E[04;38;5;146m' # begin underline

After putting these into .bashrc, I got these from man and Python online help:

[Screenshots: colored-less, color-less-python]

Neat.
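
The numbers in these escape sequences are SGR codes: 1 is bold, 4 is underline, 0 resets, and 38;5;N picks color N from the 256-color palette (01 and 04 above are the same as 1 and 4). Here is a small Python sketch for previewing color numbers before putting them into .bashrc. The helper names sgr and colorize are my own, not part of any library, and it assumes a terminal with 256-color support:

```python
ESC = "\033["  # same as \E[ in the bash lines above

def sgr(*codes):
    """Build an escape sequence such as ESC[1;38;5;74m from numeric codes."""
    return ESC + ";".join(str(c) for c in codes) + "m"

def colorize(text, *codes):
    """Wrap text in the given codes and reset attributes afterwards."""
    return sgr(*codes) + text + sgr(0)

# Bold + palette color 74 (the "begin bold" setting above).
print(colorize("bold heading", 1, 38, 5, 74))
# Underline + palette color 146 (the "begin underline" setting above).
print(colorize("underlined term", 4, 38, 5, 146))
```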

Expressions for heavy rain in Japanese and more

it's raining cats and dogs

It was a rainy week in San Francisco. We got enough rain to talk about the origin of the phrase "raining cats and dogs" and about expressions for heavy rain in different languages.

The Japanese equivalent is 土砂降り (doe-sha-boo-lee). 土砂 literally means "earth and sand", but the characters are only a phonetic stand-in for どさ, which comes from the word どさくさ (doe-sa-koo-sa), meaning confusion or mess. So it's more like "What a mess! It's raining so hard", which just expresses one's feelings; unfortunately, it's not a metaphor.

My French co-worker told me that they would say "Il pleut des cordes" (it's raining ropes) in French. I love that one. So vivid.

Now I assume you're curious about other languages. As you'd expect, there is a page for that: idiomatic expressions for heavy rain in many different languages. Enjoy.

The origin of the phrase "raining cats and dogs"? Who knows!

Linus Torvalds And Github | blueparen

Linus Torvalds recently put the Linux kernel on Github as well as his dive log software. Here I asked him how he likes Github and what he thinks about it.

From Linus:

It seems to work well as a hosting place, I'm less impressed with the infrastructure to help "develop" things. It's clearly way too easy to create pointless issue requests, and the "pull requests" seem to be actively designed for somebody who pulls without ever even thinking about what he does - which is against everything I believe in as a project manager.

So from the pull request it's actually hard to see *what* somebody asks you to pull. Together with making it trivial to create commits and pull requests entirely in the browser, I think it's much too easy to do bad-quality requests.

That said, it's working fairly well for the small dive log software I originally put there. For the kernel, I just wish I had a way to disable pull requests entirely, because they are so worthless - even if you disregard any code issues, we just have much higher standards even just for the process of a real kernel pull-request (much more explanation about what the pull contains etc).

That said, I did give them some feedback about the things that really don't work well. So who knows..

via blueparen.com

mnot’s blog: HTTP Pipelining Today

I’ve been active in trying to get pipelining more widely deployed, but to date I haven’t tested mobile browsers much. So, one VM and two test pages (20 images on each) later, I asked my twitter peeps to hit it with their phones while I was watching with htracr.

The results confirm what they saw; mobile browsers do pipeline, sometimes aggressively.

how are the mobile browsers doing it today? First, Opera Mobile:

[Screenshot: Opera Mobile-2]

via www.mnot.net

Heroku suffered a DDOS attack

Root domains are aesthetically pleasing, but the nature of DNS prevents them from being a robust solution for web apps. Root domains don't allow CNAMEs, which requires hardcoding IP addresses, which in turn prevents flexibility on updates to IPs which may need to change over time to handle new load or divert denial-of-service attacks.

We strongly recommend against using root domains. Use a subdomain that can be CNAME aliased to proxy.heroku.com, and avoid ever manually entering IPs into your DNS configuration. We also recommend a low TTL value, which will allow Heroku network engineers to quickly make changes to DNS mapping when necessary.

via status.heroku.com

There's a cost to using a root domain (e.g. example.com rather than www.example.com).

Making TypePad Infinite Scroll Page

It looks like pagination is the new marquee tag. Now all the hip websites like Twitter and Facebook have fancy infinite scrolling: scroll down and you get more tweets, status updates, or other content. For just browsing stuff, scrolling with the space key (or petting your shiny trackpad) is good enough for lazy people.

Currently TypePad only offers pagination through monthly archive pages like http://sekimura.typepad.com/blog/2011/02/index.html. But you can make your own infinite scroll page by following these steps:

* Step 1: Create a new "Page" *

Go to the "Compose" page and select "New Page" from the Compose pull-down menu.

[Screenshot: New-Page]

* Step 2: Put in the base HTML and save it as "all-archives.html" *

Click the "HTML" tab and paste the HTML code into the input area.

[Screenshot: Infinite-scroll-html]

Here is the HTML code I used. Replace "blogId" with your own. To find your blogId, go to the dashboard for your blog and look at the URL. You will find something like:

http://www.typepad.com/site/blogs/6a00d8341d3fee53ef010535c8916d970c/dashboard

In this case, 6a00d8341d3fee53ef010535c8916d970c is your blogId.

Then save this page as "all-archives.html" (or another fitting name) to publish it.

* Step 3: Enjoy infinite scrolling *

Go to the page you just created and scroll down. It will automatically load more entries when you are about to reach the bottom of the page. Mine is here: http://sekimura.typepad.com/blog/all-archives.html
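
The real page does this in the browser with JavaScript, and the exact code is not reproduced here. As a rough, language-neutral sketch of the idea, here is the logic in Python: check whether the reader is close to the bottom, and if so ask for the next batch of entries. The names should_load_more and page_ranges, the 200-pixel threshold, and the batch size are all my own assumptions for illustration, not part of the TypePad API:

```python
def should_load_more(scroll_top, viewport_height, document_height, threshold=200):
    """Return True when the visible area is within `threshold` pixels of the bottom."""
    return scroll_top + viewport_height >= document_height - threshold


def page_ranges(total_entries, page_size):
    """Yield (start_index, count) pairs, 1-based, one per batch to request."""
    start = 1
    while start <= total_entries:
        count = min(page_size, total_entries - start + 1)
        yield start, count
        start += count


# Pretend the blog has 25 entries and we fetch 10 at a time.
batches = list(page_ranges(25, 10))
print(batches)
```

Each time should_load_more turns true, the page would request the next (start_index, count) batch and append the entries to the bottom.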

The TypePad JSON API provides a powerful way to render your content. Go to the docs and find more fun! http://www.typepad.com/services/apidocs

Sumo match-fix scandal and Freakonomics

This weekend I was asked about the sumo match-fixing scandal and realized that the news has spread all over the world and that people outside Japan are interested in it too. I told my friend, "They have been doing that all along and the media kept hiding it, so it's no surprise to me." After checking some blogs, I found out that a controversial book, Freakonomics by Levitt, mentions that sumo wrestlers rig matches. I haven't read the book yet, but there's a PDF of his paper about corruption in sumo wrestling. Some quotes:

The key institutional feature of sumo wrestling that makes it ripe for corruption is the existence of a sharp nonlinearity in the payoff function for competitors. A sumo tournament (basho) involves 66 wrestlers (rikishi) participating in 15 bouts each. A wrestler who achieves a winning record (eight wins or more, known as kachi-koshi) is guaranteed to rise up the official ranking (banzuke); a wrestler with a losing record (make-koshi) falls in the rankings. A wrestler’s rank is a source of prestige, the basis for salary determination, and also influences the perks that he enjoys

"Figure 1" in the paper demonstrates the importance of an eighth win to a wrestler.

The critical eighth win—which results in a substantial promotion in rank rather than a demotion garners a wrestler approximately 11 spots in the ranking, or roughly four times the value of the typical victory. Consequently, a wrestler entering the final match of a tournament with a 7-7 record has far more to gain from a victory than an opponent with a record of, say, 8-6 has to lose.

He found a peak at 8-7 records. See below.

Figure 2 provides clear visual evidence in support of the model’s prediction. Approximately 26.0 percent of all wrestlers finish with exactly eight wins, compared to only 12.2 percent with seven wins.

There is more interesting analysis, such as the give-and-take case in "What Happens When Wrestlers Meet Again in the Future", so download the PDF to read the whole paper.

You might be disappointed by this, but don't get me wrong. Sumo is "professional" wrestling, and you won't notice how they cheat. Just enjoy watching two naked fat guys hugging each other. ;p

The New Readability | Readability Blog

We’re turning Readability into a monthly subscription service with a unique twist: the great majority of your fees (70%) will go directly to the writers and publishers you enjoy. We’re tethering a small, passive transaction to the reading decisions you make through the platform. You can even publicly share the top domains you’re enjoying through Readability. It’s a new type of badge: “I support these writers & publishers.”

via blog.readability.com

This is a great win-win situation. A reader clicks the "Readability" link to get more readable content, and the writer gets a percentage of the reader's subscription fee.

How Twitter spent money on marketing

1) We created a Twitter visualizer and negotiated with the festival to put flat panel screens in the hallways. This is something they'd never done before, but we didn't want a booth on the trade show floor, because we knew hallways is where the action was. We paid $11K for this and set up the TVs ourselves. (This was about the only money Twitter's *ever* spent on marketing.)

2) We created an event-specific feature, where, you could text 'join sxsw' to 40404. Then you would show up on the screens. And, if you weren't already a Twitter user, you'd automatically be following a half-dozen or so "ambassadors," who were Twitter users also at SXSW. We advertised this on the screens in the hallways. (I don't know how many people signed up this way -- my recollection is not a lot.)

I don't know what was the most important factor, but networks are all about critical mass, so doubling down on the momentum seemed like a good idea. And something clicked.

via www.quora.com

Evan Williams answered the Quora question "What is the process involved in launching a start-up at SXSW?"

Loaded Gun Slips Past TSA Screeners

According to one report, undercover TSA agents testing security at a Newark airport terminal on one day in 2006 found that TSA screeners failed to detect concealed bombs and guns 20 out of 22 times. A 2007 government audit leaked to USA Today revealed that undercover agents were successful slipping simulated explosives and bomb parts through Los Angeles's LAX airport in 50 out of 70 attempts, and at Chicago's O'Hare airport agents made 75 attempts and succeeded in getting through undetected 45 times.

via abcnews.go.com

In these tests, TSA screeners missed roughly 60 to 90% of attempts to smuggle simulated weapons and explosives past them. Ooops.
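
For the curious, here is a quick Python check of the miss rates, using only the numbers from the quote above (the dictionary labels are mine):

```python
# (missed by screeners, total attempts) as quoted above
tests = {
    "Newark (2006)": (20, 22),
    "LAX": (50, 70),
    "O'Hare": (45, 75),
}

# Percentage of attempts that got past screeners, rounded to whole numbers.
miss_rates = {
    name: round(100 * missed / total)
    for name, (missed, total) in tests.items()
}

for name, rate in miss_rates.items():
    print("%s: %d%% missed" % (name, rate))
```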