GalateaTalk
in Japanese : http://ja.nishimotz.com/galateatalk
README
The original README file is in Japanese:
https://github.com/nishimotz/jagtalk/blob/master/README.gtalk
To write the selected internal data (see the slots listed below) to a file:
set Log = filename
If the file exists, append mode is used.
To output using stderr:
set Log = CONSOLE
To disable output:
set Log = NO
Slots are as follows:
Log.conf : configuration from ssm.conf
Log.text : input text
Log.arrangedText : arranged input text
Log.chasen : analysis result of chasen
Log.tag : tag list (CONTEXT and SPELL are not included)
Log.phoneme : phoneme information
Log.mora : mora information
Log.morph : morphological analysis information
Log.aphrase : accent phrase information
Log.breath : breath group information
Log.sentence : sentence information
The default value is NO (output is disabled).
To enable log output for the 'chasen' slot:
set Log.chasen = YES
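The settings above can be generated for several slots at once. The following is a minimal sketch, not part of gtalk: the helper name gtalk_log_commands is hypothetical, and how the printed commands reach gtalk (for example on its standard input) is an assumption.

```shell
#!/bin/sh
# Print gtalk commands that select a log destination and enable the
# given slots. Slot names come from the list above; the helper itself
# is hypothetical and not part of gtalk.
gtalk_log_commands() {
    dest=$1
    shift
    printf 'set Log = %s\n' "$dest"
    for slot in "$@"; do
        printf 'set Log.%s = YES\n' "$slot"
    done
}

# Example: send the chasen and phoneme slots to stderr.
gtalk_log_commands CONSOLE chasen phoneme
```

The first argument is a file name, CONSOLE, or NO; the remaining arguments are slot names without the "Log." prefix.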
text2wav
Mac OS X support
since 2010-11-14
using Mac OS X 10.6.5 (64bit).
macports
- download and install: MacPorts-1.9.2-10.6-SnowLeopard.dmg
chasen
http://sourceforge.jp/projects/chasen-legacy/
Binary version of Unidic is compatible with 32bit binary of chasen.
MacPorts version of chasen is 64bit binary.
Using terminal:
$ sudo mkdir -p /opt/local/bin/portslocation/dports/chasen
$ cd /opt/local/bin/portslocation/dports/chasen
$ sudo port install chasen
If they are not already installed, darts and nkf are also fetched and installed.
Due to historical reasons, the default encoding of ChaSen is set to EUC-JP. If you'd like to handle text files written in UTF-8 or Shift_JIS, you may use -r and -i options.
UTF-8) chasen -r /opt/local/etc/chasenrc-UTF-8 -i w <input>
Shift_JIS) chasen -r /opt/local/etc/chasenrc-Shift_JIS -i s <input>
$ file /opt/local/bin/chasen
/opt/local/bin/chasen: Mach-O 64-bit executable x86_64
$ echo "123" | /opt/local/bin/chasen | nkf -w
1 イチ 1 名詞-数
2 ニ 2 名詞-数
3 サン 3 名詞-数
EOS
nkf -w converts the EUC-JP output to UTF-8, the Terminal default.
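The encoding options from the MacPorts note can be collected in one place. This is only a sketch: the function name chasen_opts_for is hypothetical, and the chasenrc paths are the MacPorts locations quoted in that note.

```shell
#!/bin/sh
# Print the chasen options for a given input encoding, following the
# MacPorts note above. The function name is hypothetical.
chasen_opts_for() {
    case "$1" in
        UTF-8)     echo "-r /opt/local/etc/chasenrc-UTF-8 -i w" ;;
        Shift_JIS) echo "-r /opt/local/etc/chasenrc-Shift_JIS -i s" ;;
        EUC-JP)    echo "" ;;   # default encoding, no extra options
        *)         echo "unsupported encoding: $1" >&2; return 1 ;;
    esac
}

# Example: show the options for UTF-8 input.
chasen_opts_for UTF-8
```

A possible use is `chasen $(chasen_opts_for UTF-8) input.txt`, with the substitution left unquoted so that the options are split into separate words.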
- at this time, ipadic-2.7.0 is used with chasen.
- if you want to remove chasen: sudo port -f uninstall chasen
chaone + unidic
- http://www.tokuteicorpus.jp/dist/ (Japanese pages, user registration required)
- download 1: chaone-1.3.3.tar.gz
- download 2: unidic-chasen1312src.tar.gz (use the source package; the binary version is for 32-bit chasen)
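Because the choice between the source and binary unidic depends on whether chasen is a 32-bit or 64-bit binary, it can help to classify the output of file before downloading. A minimal sketch follows; the helper name classify_arch is hypothetical, and the i386 pattern is an assumption about how file reports 32-bit Mach-O binaries.

```shell
#!/bin/sh
# Classify the text printed by `file` for a Mach-O binary.
# Hypothetical helper; only the x86_64 form is taken from this page.
classify_arch() {
    case "$1" in
        *x86_64*i386*|*i386*x86_64*) echo universal ;;
        *x86_64*) echo 64bit ;;
        *i386*)   echo 32bit ;;   # assumed wording for 32-bit binaries
        *)        echo unknown ;;
    esac
}

# Example with the output shown earlier on this page.
classify_arch "/opt/local/bin/chasen: Mach-O 64-bit executable x86_64"
```

On a real system the argument would come from the tool itself, as in `classify_arch "$(file /opt/local/bin/chasen)"`.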
gtalk + speakers
- download 1: gtalk-090225.tar.gz (or clone jagtalk from github.com)
- download 2: speakers-060820.tar.gz
uncompress and compile
$ cd
$ cd code
$ pwd
/Users/nishimotz/code
$ tar xvfz ~/Downloads/unidic-chasen1312src.tar.gz
$ tar xvfz ~/Downloads/chaone-1.3.3.tar.gz
$ tar xvfz ~/Downloads/gtalk-090225.tar.gz
$ tar xvfz ~/Downloads/speakers-060820.tar.gz
Xcode (gcc) must be installed.
$ gcc -v
Using built-in specs.
Target: i686-apple-darwin10
Configured with: /var/tmp/gcc/gcc-5659~1/src/configure --disable-checking --enable-werror --prefix=/usr --mandir=/share/man --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^[cg][^.-]*$/s/$/-4.2/ --with-slibdir=/usr/lib --build=i686-apple-darwin10 --program-prefix=i686-apple-darwin10- --host=x86_64-apple-darwin10 --target=i686-apple-darwin10 --with-gxx-include-dir=/include/c++/4.2.1
Thread model: posix
gcc version 4.2.1 (Apple Inc. build 5659)
building unidic for x64
It seems easier to use the default (UTF-8) version of unidic than to build an EUC-JP version.
$ cd unidic-chasen1312src
$ ./configure
$ make
/opt/local/lib/chasen/makemat -i w
parsing grammar.cha
parsing cforms.cha
parsing ctypes.cha
parsing connect.cha
table size: 9767 lines: ......................
modify chasenrc (comment out the original GRAMMAR line and point it at the current directory):
;(GRAMMAR ./dic)
(GRAMMAR .)
or make symbolic link:
$ ln -s . dic
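The chasenrc edit described above can also be scripted. The sketch below uses awk as a filter; the helper name patch_grammar is hypothetical, and the exact spacing of the GRAMMAR line in the shipped chasenrc is an assumption.

```shell
#!/bin/sh
# Comment out "(GRAMMAR ./dic)" and add "(GRAMMAR .)" right after it.
# Reads a chasenrc on stdin and writes the edited text to stdout.
# Hypothetical helper; the spacing of the real line is an assumption.
patch_grammar() {
    awk '
        /^\(GRAMMAR[ \t]+\.\/dic\)[ \t]*$/ {
            print ";" $0
            print "(GRAMMAR .)"
            next
        }
        { print }
    '
}

# Example on a single line of input.
printf '%s\n' '(GRAMMAR ./dic)' | patch_grammar
```

To apply it to a file, write to a temporary copy first, for example `patch_grammar < chasenrc > chasenrc.new && mv chasenrc.new chasenrc`. Lines that do not match are passed through unchanged.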
test chasen using unidic:
$ echo "123" | chasen -r chasenrc
1 イッ 名詞-数詞 lForm="イチ" lemma="一" orthBase="1" pronBase="イッ" kanaBase="イッ" formBase="イチ" goshu="漢" iConType="N1" fType="チ促" fForm="促音形" aType="2" aConType="C3"
2 ニ 名詞-数詞 lForm="ニ" lemma="二" orthBase="2" pronBase="ニ" kanaBase="ニ" formBase="ニ" goshu="漢" fType="イ長添" fForm="基本形" aType="1" aConType="C3"
3 サン 名詞-数詞 lForm="サン" lemma="三" orthBase="3" pronBase="サン" kanaBase="サン" formBase="サン" goshu="漢" iConType="N3" aType="0" aConType="C3"
EOS
rename the directory:
$ cd ..
$ mv unidic-chasen1312src unidic-chasen1312_utf8-x64
building chaone
$ cd chaone-1.3.3
$ sh configure
$ make
$ sudo port install libxml
$ sudo port install libxml2
$ sudo port install libxslt
errors still occur:
In file included from chaone.c:12:
/usr/include/libxslt/transform.h:15:27: error: libxml/parser.h: No such file or directory
/usr/include/libxslt/transform.h:16:26: error: libxml/xmlIO.h: No such file or directory
$ sh configure
(omitted)
configure: WARNING: "xml2-config is not found"
$ make
to avoid the errors:
$ cd /usr/include/
$ sudo ln -s libxml2/libxml .
$ sh configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ./install-sh -c -d
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking for
xmlCleanupParser,
xlFreeDoc,
xmlLoadExtDtdDefaultValue,
xmlFree,
xmlParseMemory,
xmlStrcat,
xmlStrdup,
xmlSubstituteEntitiesDefault in -lxml2... yes
checking for
xsltApplyStylesheet,
xsltCleanupGlobals,
xsltFreeStylesheet,
xsltParseStylesheetFile,
xsltSaveResultToFile in -lxslt... yes
checking for
exsltRegisterAll in -lexslt... yes
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... rm: conftest.dSYM: is a directory
rm: conftest.dSYM: is a directory
yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdlib.h... (cached) yes
checking for string.h... (cached) yes
checking libxslt/transform.h usability... yes
checking libxslt/transform.h presence... yes
checking for libxslt/transform.h... yes
checking libxslt/xsltutils.h usability... yes
checking libxslt/xsltutils.h presence... yes
checking for libxslt/xsltutils.h... yes
checking libexslt/exslt.h usability... yes
checking libexslt/exslt.h presence... yes
checking for libexslt/exslt.h... yes
checking for an ANSI C-conforming const... yes
checking for stdlib.h... (cached) yes
checking for GNU libc compatible malloc... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: executing depfiles commands
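As an alternative to creating a symbolic link under /usr/include, the libxml2 header directory could be passed to the compiler as an include path. The sketch below only locates the directory and prints a -I flag; the helper name libxml2_cppflags is hypothetical, and whether chaone's configure script honors CPPFLAGS has not been checked here.

```shell
#!/bin/sh
# Print a -I flag for the first directory that holds libxml2 headers
# in the layout <dir>/libxml2/libxml/parser.h. Hypothetical helper.
libxml2_cppflags() {
    for dir in "$@"; do
        if [ -f "$dir/libxml2/libxml/parser.h" ]; then
            echo "-I$dir/libxml2"
            return 0
        fi
    done
    echo "libxml2 headers not found" >&2
    return 1
}

# Prints a flag only if the headers exist in one of these locations.
libxml2_cppflags /usr/include /opt/local/include || true
```

Under that assumption, a build could be started with `CPPFLAGS="$(libxml2_cppflags /usr/include /opt/local/include)" sh configure`, which leaves the system include directory untouched.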
The program runs, but fails to read its data:
$ ./chaone
I/O warning : failed to load external entity "/usr/local/chaone/chaone.xsl"
error xsltParseStylesheetFile : cannot parse /usr/local/chaone/chaone.xsl
Segmentation fault
Copy the files to /usr/local manually ("sudo make install" does not seem to work):
$ sudo mkdir /usr/local/chaone
$ sudo cp *.xml *.xsl /usr/local/chaone/
$ sudo cp chaone /usr/local/bin/
Now /usr/local/bin/chaone works:
$ chaone -h
Usage: chaone [options] [file]
[file] input file name. if none is specified, stdin is used
output to stdout
[options]
--encoding {ISO-2022-JP|EUC-JP|Shift_JIS|UTF-8}: set I/O encoding
--mode {prep|chunker|phonetic|accent|postp|pc|pcp|pcpa|gtalk}: set standalone mode
--debug : debug output to stderr in UTF-8
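Since chaone crashes with a segmentation fault when its stylesheet is missing, a wrapper script may want to check for the data first. This is a minimal sketch: the helper name chaone_data_ok is hypothetical, and the default directory is the one that appears in the error message above.

```shell
#!/bin/sh
# Return success if the chaone stylesheet is readable in the given
# directory (default: /usr/local/chaone, taken from the error message
# above). Hypothetical helper, not part of chaone.
chaone_data_ok() {
    dir=${1:-/usr/local/chaone}
    if [ -r "$dir/chaone.xsl" ]; then
        return 0
    fi
    echo "missing $dir/chaone.xsl" >&2
    return 1
}

# Example: report whether the default data directory is usable.
if chaone_data_ok; then
    echo "chaone data found"
else
    echo "copy *.xml and *.xsl to /usr/local/chaone first"
fi
```

The check covers only chaone.xsl; the other *.xml and *.xsl files copied in the manual install step are not verified.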
building gtalk
see jagtalk
Mac build (32bit, without ports)
since 2011-10-08
http://chasen.org/~taku/software/darts/
$ tar xvfz darts-0.32.tar.gz
$ cd darts-0.32
$ CFLAGS='-arch i386' ./configure
$ make
$ make check
$ sudo make install
http://sourceforge.jp/projects/chasen-legacy/
$ tar xvfz chasen-2.4.4.tar.gz
