Wednesday, October 31, 2012

Obtaining Census Data from e-Stat and Converting Them to UTF-8

Objective: To download census-tract data for Japan from e-Stat (Ministry of Internal Affairs and Communications, Japan) and make it usable in ArcGIS.

1. Downloading Census Data
Go to the website of the Statistics Bureau, Ministry of Internal Affairs and Communications, Japan:
http://www.stat.go.jp/english/index.htm
http://www.e-stat.go.jp/SG1/estat/eStatTopPortal.do (in Japanese)
Downloading works only in Firefox or Safari; it does not work in Chrome.

2. Converting the text encoding into UTF-8
All of the table files are encoded in Shift_JIS. You need to convert them to Unicode (UTF-8) so that the Japanese text displays correctly. I wrote a shell script, CharConv.command, that runs on a Mac.

Before running the script on a Mac, make it executable:
% chmod +x CharConv.command
The script converts every Shift_JIS text file into UTF-8 and saves each result under a new file name beginning with "utf8-".
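The script itself is not reproduced here. As a minimal sketch of what such a script could do (not the original), assuming the tables are `.txt` files sitting in the same folder as the script, the standard `iconv` tool can perform the conversion. The function name `conv_dir` is hypothetical.

```shell
#!/bin/sh
# Sketch of a CharConv.command-style script (not the original).
# Assumption: the Shift_JIS tables are *.txt files in the script's own folder.

# Convert every Shift_JIS .txt file in directory $1 to UTF-8,
# writing the result next to it as utf8-<original name>.
conv_dir() {
  for f in "$1"/*.txt; do
    [ -f "$f" ] || continue                     # glob matched nothing
    name=$(basename "$f")
    case "$name" in utf8-*) continue ;; esac    # skip files already converted
    iconv -f SHIFT_JIS -t UTF-8 "$f" > "$1/utf8-$name" \
      || echo "conversion failed: $name" >&2
  done
}

# A double-clicked .command file typically starts in the home directory,
# so work in the folder that contains the script instead.
conv_dir "$(dirname "$0")"
```

Run it with `% ./CharConv.command` or by double-clicking it in Finder. Files produced by Japanese Windows software sometimes use CP932, a superset of Shift_JIS; if `iconv` reports a conversion error, `-f CP932` may work instead.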

3. Combining the tables into a single CSV file
% cat utf8-*.txt > [filename]
The combined file can then be joined to the census-tract boundaries in ArcGIS using the key codes.
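`cat` copies every file verbatim, so if each table begins with header rows, those rows are repeated throughout the combined file and get in the way of the join. The following is a minimal sketch that keeps the header only from the first file. `combine_tables` and `HEADER_LINES` are hypothetical names, and the number of header rows (1 by default here) is an assumption to check against your own files.

```shell
#!/bin/sh
# Number of header rows at the top of each table (assumption: check your files).
HEADER_LINES=${HEADER_LINES:-1}

# combine_tables OUT FILE...
# Write FILE... into OUT, keeping the header rows of the first file only.
combine_tables() {
  out=$1; shift
  : > "$out"                  # start with an empty output file
  first=1
  for f in "$@"; do
    if [ "$first" -eq 1 ]; then
      cat "$f" >> "$out"      # first file: keep its header
      first=0
    else
      # later files: skip their header rows
      tail -n +"$((HEADER_LINES + 1))" "$f" >> "$out"
    fi
  done
}
```

After loading the function into your shell (for example with `.`), run `% combine_tables combined.csv utf8-*.txt`. Giving the output a `.csv` name keeps it from being picked up by `*.txt` if the command is run again.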