Wednesday, November 30, 2016

THE SIGNS OF ALLAH IN THE HUMAN BODY

http://pustakaimamsyafii.com/ayat-ayat-allah-pada-tubuh-manusia.html

Sunday, November 20, 2016

Beware: THE DANGERS OF THE SIDE EFFECTS OF THE FERTILITY DRUG PROFERTIL

http://forum.liputan6.com/t/awas-bahaya-efek-samping-obat-penyubur-profertil/28094

Having a child is the dream of every married couple, and perhaps you are one of them. Many couples have tried various ways to conceive quickly, such as consulting an obstetrician or taking fertility drugs.
But did you know that taking chemical fertility drugs such as Profertil too often can cause side effects and harm your health, especially without a doctor's recommendation?
What is the fertility drug Profertil?
Profertil is a drug that contains clomiphene citrate. It is used to treat infertility in women. Clomiphene citrate works by stimulating an increase in the hormones that support the growth and release of a mature egg (ovulation).
The clomiphene citrate in Profertil is one of the fertility drugs for women that obstetricians recommend most often, because it is believed to improve egg maturation by up to 70%. However, it has the side effects of thickening the cervical mucus and thinning the endometrium (the lining of the uterus), which reduces the chance of pregnancy.
So what are the side effects and harmful consequences of taking Profertil / clomiphene citrate?
Stop taking clomiphene and seek emergency medical attention if you experience an allergic reaction (difficulty breathing, closing of the throat, swelling of the lips, tongue, or face, or hives).
Other side effects can also occur, including:
  1. Blurred vision
  2. Sensitivity of the eyes to light
  3. Headache
  4. Formation of ovarian cysts
  5. Depression
  6. Nausea or vomiting
  7. Nervousness
  8. Difficulty sleeping
  9. Abnormal uterine bleeding
  10. Ovarian hyperstimulation syndrome (OHSS)
  11. Breast discomfort and pain
Note!
Side effects other than those listed here can also occur. Talk to your doctor about any side effect that seems unusual or is especially bothersome. A very serious allergic reaction to this drug is unlikely, but seek medical help immediately if one occurs. Symptoms of a serious allergic reaction may include rash, itching or swelling (especially of the face, tongue, or throat), severe dizziness, and difficulty breathing.

Thursday, November 17, 2016

StarDict as an open-source toolchain for working with dictionaries

http://filosofie.unibuc.ro/~solcan/wt/gnu/s/stardict.html

The first snippet after a “StarDict” search on Google sounds like a commercial message: “The best dictionary program in Linux and Windows”. Under it, in fine print, there is a more qualified message: “The best free dictionary in Linux and Windows”. The article about StarDict in Wikipedia is more sober and informative, but not very comprehensive. 
In fact, StarDict is a toolchain for working with dictionaries: you can compile and decompile dictionaries, use them with a GUI shell for searches and much more and there is even a commandline version of the program. I find the open source toolchain far more important than the “power” of the dictionary shell alone. 
I will try in this note to share my experience with StarDict. This experience - in the first half of January 2010 - is limited to the Fedora distribution of Linux and the 2.4.5 and 3.0.1 versions of the StarDict program. “Share” does not mean that I wrote a friendly tutorial. I just described the experience with StarDict. 
Please note two things: these are working notes (no claim of clarity!); you must be familiar with GNU/Linux, regular expressions, the Vim programming editor and other things connected with programming. Reading a text like this is not enough. You must experiment with the programs on a real computer. If you are just a computer user, these lines will be of little help and you might just end up in total confusion. 
[18 February 2010: for further observations concerning StarDict, see also the notes on the use of DEX online with StarDict under GNU/Linux and Windows.]

Installation

I did some years ago execute a “configure, make and install” procedure for StarDict, but since Fedora 6 I have just used the RPMs provided by the distribution. Thus I have nothing special to note about the initial part of the installation process. 
Can you start the program automatically when you log in? Yes, you can. Under Gnome I have added a file called StarDict.desktop to the directory
~/.config/autostart
This is a usual desktop file, i.e. a text file that you can create with an editor like Vim. You can see in the screenshot its content, in my case. 
[Figure: The content of the desktop file]
If you do not want to use the more intricate machinery of the rpm command, in order to know where the files installed by StarDict are placed you can just use File Roller (or some other archive manager which is capable of showing these files). 
With root privileges (usually when you install from RPMs) the dictionaries go to the folder:
/usr/share/stardict/dic
You may also use/create in your home a folder
~/.stardict/dic
I put, for example, in this folder the dictionaries that I create myself. I find this location more convenient on the computer that is just for my personal use. 

Using StarDict

For the moment, I will just suppose that you have installed at least one dictionary. For example, under Fedora 10, one can install
stardict-dic-en-2.4.2-5.fc10.noarch.rpm

The Preferences

You can start the program (for example, clicking the icon in Accessories) and you will see the main window of StarDict. On the top-right corner you will see the familiar “home” icon: a little house. There you can access the Main menu. Then you can read the StarDict Manual. In my 3.0.1 version of the program, the manual is still for version 2.4.2, but is quite useful for a start. I will not repeat what is written there. I will insist on preferences and the management of dictionaries and plugins, as they are available in the 3.0.1 version. 
The button for the preferences dialogue is the last one on the bottom right corner. In the Dictionary menu you find a series of options. 
I left a tick on Dictionary/Cache/Create cache files. The Sort word list by collation function is useful for languages such as my native Romanian. Without it the program does not know how to order the words according to the Romanian alphabet. It does not make sense however to activate the collation function for Romanian when you use other dictionaries. 
Dictionary/Export means that you can copy the content of the dictionary article to a (text) file. Check the location of the text file! 
Dictionary/Sound has an option for Use TTS program. This means that I can put there a line like this:
espeak -v ro %s &
This means that the eSpeak speech synthesizer will pronounce the word in Romanian (ro). Check the available voices in a terminal with:
espeak --voices
For current use I disable the Dictionary/Sound option. 
The Network/Net Dic option has been the subject of discussions concerning security. I do not enable network dictionaries. You can read more about the risks on the Internet. First, there is the risk of sending sensitive content from the clipboard on the net. Second, you certainly have to trust the server that you use when you enable this option. 
The Main window/Search website option has an obvious meaning. For example, I use this option to go to Wikipedia articles. The key is the website search link:
http://ro.wikipedia.org/wiki/%s

The Management of the Plugins

On the top-right corner, in the Main Menu, one can find the Manage Plugins dialogue. You can enable/disable and configure the plugins. In the figure you can see an example for the configuration of the Spell Check plugin. 
[Figure: The dialogue for spell check configuration]
What is the meaning of that ro for the language? How does StarDict understand it? 
StarDict uses Enchant. You can control the behavior of Enchant with the .enchant configuration file, placed in your home. From the figure, the syntax of this configuration file should be obvious. 
[Figure: The Enchant configuration file]
How can I know that the plugins have been loaded without errors? Open StarDict from a terminal and read the messages. Again, the figure should give an idea about what kind of messages you can get. 
[Figure: StarDict messages]
When you try to use a non-existent language you get an error message. 
Now, when you install StarDict, pay attention to the dependencies. For example, the installer would probably require Enchant, but you may also need the enchant-aspell-1.4.2-4.fc10.i386.rpm or some similar file, because you want to use Aspell. 
A very interesting plugin is the one for WordNet dict rendering. It can be configured in graphic mode. 
[Figure: StarDict as WordNet browser]
Most important are the plugins for the Data Parsing Engine. I left all of them enabled. Of course, you may disable some of them, but you really must know what you are doing. They affect the look of the Definition area.

The management of the dictionaries

Where can I get dictionaries? Go to the StarDict page and download dictionaries. 
I find the *.tar.bz2 archives very convenient. As shown in the figure, I use Midnight Commander (mc) for the installation of these dictionaries. 
[Figure: Use mc for the installation of dictionaries]
On the right panel of the mc the archive is opened. On the left panel an appropriate folder has been created. Then use mc for copying from the right to the left panel. Pay attention to the attributes of the files! 
The new dictionary appears in the list of the dictionaries when you restart StarDict. 
What happens if I want to use two versions of the WordNet dictionary, for example? Hack the .ifo file! Change the name of the dictionary. For example, put:
bookname=WordNet2
Is this important? Not that much, but it helps you when you manage the dictionaries. 
How do I manage the dictionaries? First you must look for the Manage Dictionaries button on the bottom-right corner of the main window. Then, of course, you have to open the dialogue box. The essential panel is Manage Dict.
You can group the dictionaries. Click (in order to select it) the line on which it is written Default Group. Press the button with + on it (the add button). In the dialogue box which will show up write the name of the new group. Press OK. Now, select the line with Query Dict on it. Press the add button. Select a dictionary from the list. Then repeat the operation on the Scan Dict line. 
What is the difference between real dictionaries and virtual dictionaries? Virtual dictionaries are created by plugins using commands from the GNU/Linux system. 
In the figure, you can see the result of the above operations for a group called Romanian.
[Figure: The Romanian group of dictionaries]
Of course, in order to get spelling suggestions for Romanian you have to configure the spell check plugin as shown above. 
Is pressing the Delete button a tragic event? Not really. You do not erase the dictionary from the list. You just erase it from the group. You can put it back. 
In the main window, on the left side, under the icon with a broom you find four buttons (five, if you have installed a tree dictionary). You can use the last button to choose a dictionary group. [Pay attention to security problems if you press Enable Net Dic; especially, do not log in to an untrusted site.]

The format of the files

Now, let's move a bit towards the workshop where you can forge StarDict dictionaries. For this one must understand a bit about the format of the files used by StarDict. 
The format of the files is described in the documentation available on the site of the StarDict project. There are three essential files for the dictionaries. The ifo file contains information about the dictionary, such as the number of words (of articles) or the name of the dictionary. The dict.dz file contains the articles of the dictionary. The idx file contains a sorted list of entries. From version 2.4.8 on, there might be several entries containing the same word(s), but corresponding to different definitions. This is useful for dictionaries with several definitions for the same word (like DEX online, the online dictionary of the Romanian language, for example; see also the notes on the use of DEX online with StarDict under GNU/Linux and Windows). 

The dict files

In fact, the dict.dz file is a compressed dict file. 
The dict files are described on the site of the DICT project. 
For the work on a StarDict dictionary you are going to need a tool from the DICT project: dictzip. This is a program for compressing and uncompressing dictionaries. The sources have been created by the DICT group, but they are compiled in various GNU/Linux distributions. 
Under Fedora, I have used an RPM for the installation of dictzip. When the tool is called dictzip you need the -d option on the commandline for decompression. 

The dictionaries and the StarDict editor

On the Internet, when it comes to software like StarDict, people seem to look for programs with a lot of dictionaries. This is the case of StarDict, but dictionaries are not like manna. They do not fall from heaven. You need tools to build them. 
I think that the most precious thing is to have an open way of building the dictionaries. StarDict has a set of stardict-tools. I have examined the 2.4.8 and the 3.0.1 versions of the tools. I will refer mainly to the latter version. 
A set of 35 tools might look very frightening. Many might think that this is not for them to try. In fact, the stardict-editor, which comes with the whole set, is very friendly. It has a graphical interface, it is easy to use and fast. 
Under Fedora 10, I had a problem with the compilation. There is no official rpm of the stardict-tools and one must compile from the sources. However, one has to patch the sources, because the gcc compiler does not accept them. 
Some Linux distributions do include the stardict-tools and they have patches for the sources. For example, the Arch Linux repository has a patch for stardict-tools. 
For Windows there are binaries of the stardict-editor. I have not tested them however. 
The stardict-editor includes “a simple UTF-8 text file editor”. In fact, I did not use the edit function of the tool. Instead, I have used Vim. You can use of course any text editor. Avoid however WYSIWYG monsters! 
Now, what you really need to use is the compile function from the stardict-editor. I will show in a simple example how easy it is to use the compiler for StarDict dictionaries. 

A very simple example

HanDeDict is a Chinese-German Dictionary. Its license is a German version of Creative Commons. You can download HanDeDict in EDICT format.
Now, I will describe my recipe. It is not difficult to adapt it. The “cooking-time” is very short. 
First, one has to identify - in the downloaded archive - the file which is encoded in UTF-8 (because this is the encoding used by StarDict). Then you have to open it in an editor (I use Vim) and study it a bit. 
[Figure: HanDeDict opened in Vim]
The structure of each line is the following: (1) first is the entry in traditional Chinese script; (2) then the entry in simplified Chinese; (3) the pronunciation (using tone numbers) - enclosed in square brackets; (4) the meanings explained in German - enclosed in slashes. Spaces separate each structural element from the next one. 
The stardict-editor can use as a source a “Tab file”. This is a text file in which each dictionary article is written on a single line and the lexicon entry is separated by a tab from the rest. There are no empty lines, of course. 
Now, the structure of the HanDeDict file is very convenient. One has to replace the first space on each line with a tab. In the figure, one can see how this is done for the line with “Apollon”. The regular expression used for the substitution is at the bottom. The tab is clearly indicated by Vim. 
After a successful test, one can put a percent in front of the s, on the bottom line, and make the substitution in the whole file. This is the source that one needs for the stardict-editor. 
I have also prepared a source with the lexicon entry in the simplified Chinese script. For this one has also to invert the structural elements (1) and (2). This is very easy (using Vim!). 
[Figure: HanDeDict source for the stardict-editor with simplified Chinese first]
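The two Vim substitutions described above can also be sketched in Python. This is only an illustration under assumptions: the names `to_tab_line` and `convert` are hypothetical, and I assume that header lines (such as the copyright notice) start with `#`, which may not hold for every release of the file.

```python
def to_tab_line(line, simplified_first=False):
    """Turn one HanDeDict line into a Tab-file line, or return None to skip it."""
    line = line.rstrip("\r\n")
    # Assumption: header lines such as the copyright notice start with '#'.
    if not line or line.startswith("#"):
        return None
    # Split into: traditional entry, simplified entry, "[pinyin] /meanings/".
    parts = line.split(" ", 2)
    if len(parts) < 3:
        return None
    traditional, simplified, rest = parts
    if simplified_first:
        first, second = simplified, traditional
    else:
        first, second = traditional, simplified
    # The first element becomes the lexicon entry, followed by a tab.
    return f"{first}\t{second} {rest}"


def convert(lines, simplified_first=False):
    """Convert an iterable of source lines, dropping the ones that are skipped."""
    converted = (to_tab_line(line, simplified_first) for line in lines)
    return [line for line in converted if line is not None]
```

Skipping the header line in this way would also avoid the compiler warning about the first line, if the assumption about `#` holds.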
In the next step one calls the stardict-editor and compiles the sources. This is a very simple operation, as shown in the figure. 
[Figure: Compilation with the stardict-editor]
The warning that we see in the figure is caused by the copyright notice on the first line of the HanDeDict file. All the other lines are OK and one can install the dictionary. 
I prefer an installation in the home (in the .stardict/dic folder). It is easy to create there a folder and put the dict.dz, idx and ifo files of the dictionary in it. Then one has to open StarDict and use the dictionary. 
[Figure: The HanDeDict dictionary in StarDict]
In the figure one can see how, using the scan facility of StarDict, one can easily find the meaning of the Chinese words. In the figure we use as an example the Wikipedia article about StarDict. 
Adding Gucharmap as a virtual dictionary is very useful. One can get the codes for the Chinese characters. I also enjoy the better visibility of the characters with Gucharmap. With bitmap fonts it's better than the magnifier. 
You can also add sound to the dictionary. I will only sketch a solution. 
In the Preferences dialogue enable Use TTS program, as shown in the figure. 
[Figure: Enable sound]
The external program cnplay is a small piece of software written according to the following scheme: it takes as input a string (in the case of StarDict the selected text); the string is cleaned and then split into a list (of syllables written in pinyin with tone numbers); finally, another external program is used for playing audio files (each containing the pronunciation of a syllable). 
[Figure: Play a string of Chinese syllables]
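The scheme behind cnplay can be sketched in a few lines of Python. This is not the cnplay program itself, only a guess at its general shape: the folder `AUDIO_DIR`, the player command `PLAYER`, the `.ogg` file naming and both function names are hypothetical.

```python
import re

# Assumption: one audio file per syllable, named like "xie4.ogg", in this folder.
AUDIO_DIR = "/path/to/syllables"  # hypothetical location
PLAYER = "play"                   # hypothetical command-line audio player

# A pinyin syllable written with a tone number, e.g. "xie4" or "lu:4".
SYLLABLE = re.compile(r"[a-zü:]+[1-5]")


def split_syllables(text):
    """Clean the selected text and return its pinyin syllables with tone numbers."""
    return SYLLABLE.findall(text.lower())


def player_commands(text):
    """Build one player command (as an argument list) per syllable."""
    return [[PLAYER, f"{AUDIO_DIR}/{s}.ogg"] for s in split_syllables(text)]
```

Each argument list could then be handed to `subprocess.run`, one syllable after another.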
I will add also a few remarks about the searches. 
One can search with StarDict in the whole HanDeDict. For example, let's say that I have heard the Chinese saying xie4 xie5. It is possible to look in the whole HanDeDict for this sequence in pinyin (and find that it means thank you). 
Now that I know the character for thanks, I can use a regular expression and find out all the entries which begin with this character. Even more interesting, I can find all the entries which contain in some position this character. 
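The same two searches can be tried outside StarDict, directly on the Tab source file. The sketch below is only an illustration of the idea with Python regular expressions; the function name `search_headwords` is hypothetical, and the sketch says nothing about how StarDict itself implements its searches.

```python
import re


def search_headwords(tab_lines, character, anywhere=False):
    """Return headwords that begin with (or, optionally, contain) a character."""
    escaped = re.escape(character)
    # "^" anchors the match at the beginning of the headword.
    pattern = re.compile(escaped if anywhere else "^" + escaped)
    hits = []
    for line in tab_lines:
        # The headword is everything before the first tab.
        headword = line.split("\t", 1)[0]
        if pattern.search(headword):
            hits.append(headword)
    return hits
```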
Summing up, with a few simple procedures I can satisfy my curiosity and find out the meaning of some of the mysterious (for me, of course!) Chinese characters in a Wikipedia article or elsewhere. 

The commandline tool

There is a commandline version of StarDict. The name of this tool is “sdcv”. It has a home page.
The version that I have examined is 0.4.2. As in the case of the stardict-tools, there are problems compiling it with newer versions of gcc. One can read a discussion on AUR about patching the sdcv source. 
The sdcv tool works with dictionaries of the older 2.4.2 type. 
The tool is very useful when you want to test the compiled dictionaries without installing them. It has an option for the path of the database. 
[Figure: A search on the commandline]

The versions of StarDict and their authors

StarDict has a rather long story. 
At this time, the project leader is Hu Zheng. Evgeniy A. Dushistov and Hu Zheng developed sdcv. Alex Murygin has contributed to the project. There is also a long list of people who have translated the menus of StarDict into various languages. 
According to Wikipedia, StarDict evolved from the program StarDic by Ma Su-an. 
Version 2.4.2 of StarDict was a milestone. It was released in 2003. It had a new dictionary format. The version 2.4.5 (included in Fedora 6) was released in 2005. 
The plugin system was introduced in StarDict-3.0.0 (RedHat). 
Version 3.0.1 (included in Fedora 10) was released in 2007. See the ChangeLog file in 
/usr/share/doc/stardict-3.0.1/
for a history of the changes of the StarDict program. 
In the same folder you can find the COPYING file (the GPL license). 
While 2.4.5 was a shell for dictionaries in text-mode, 3.0.1 has the capability to use mark-up languages. 

A collection of all the files of the program (including older versions) can be found on sourceforge.net.

How to make your own Stardict/Goldendict compatible dictionary

http://languagehopper.blogspot.co.id/2013/06/how-to-make-your-own-stardictoldendict.html

I recently found a PDF online of a fairly decent Ojibwe<>English dictionary that I wanted to incorporate into my list of dictionaries that I use on my system. I currently use Goldendict, which is compatible with Stardict, because it easily incorporates itself in my system and is usable with any application. Both Stardict and Goldendict are currently available for both Windows and Linux. Since I primarily use Ubuntu, both packages are available in the standard repository to install, but there are also installation packages located at the StarDict Project Google Code page. In any case, if you've already installed either Stardict or Goldendict, you'll want to grab stardict-tools (for Linux users) or stardict-editor (for Windows users) and install it.

I'll go through the steps to convert the PDF file to something that can be used within Stardict and Goldendict.

First, you want to create a simple text file with the dictionary. I just copied and pasted all the text I wanted to include into a new text file:


You'll notice that the delimiter between the two languages is a dash. I needed to change that to something that the convert program could understand. I chose a [TAB] as the delimiter. I also made sure to put a space before and after my dash, because Ojibwe uses dashes with some affixes.
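I made this replacement by hand in a text editor, but for a long word list it could be scripted. Here is a minimal Python sketch, assuming every entry has the form `headword - definition` with a space on each side of the dash; the function name `dash_to_tab` is my own.

```python
def dash_to_tab(line, delimiter=" - "):
    """Replace the first spaced dash on a line with a tab; leave other dashes alone."""
    headword, found, definition = line.rstrip("\r\n").partition(delimiter)
    if not found:
        return None  # no delimiter on this line: it needs manual attention
    return f"{headword.strip()}\t{definition.strip()}"
```

Because only the dash with spaces around it is treated as the delimiter, a dash attached to an affix stays part of the headword.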


I then saved that file as a text file. Once the file was saved, I then called up stardict-editor. This is a simple, single-window application that will do the conversion to a compatible format for use in the Stardict and Goldendict applications.



Click on the "Browse" button to load your saved newly edited text file, then click "Compile". If all goes well, you'll get the following dialog:




I had hundreds of duplicate entries, because the particular dictionary I'm using includes other dialects, and some of the entries were the same for the various dialects. If there are duplicates, an error is shown with a line number. Simply go back and fix/delete the entry, then try again until you get the above dialog.
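With hundreds of duplicates, fixing them one error message at a time gets tedious. A small Python sketch like the following could list them all up front. It assumes that a duplicate means two lines with the same headword (the text before the tab), which may not be exactly how stardict-editor decides; the function name `find_duplicates` is hypothetical.

```python
from collections import defaultdict


def find_duplicates(tab_lines):
    """Map each repeated headword to the 1-based line numbers where it occurs."""
    seen = defaultdict(list)
    for number, line in enumerate(tab_lines, start=1):
        headword = line.split("\t", 1)[0].strip()
        if headword:
            seen[headword].append(number)
    return {word: numbers for word, numbers in seen.items() if len(numbers) > 1}
```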

Once compiled, three files will be created: a .dict.dz, an .idx, and an .ifo file:


Next we want to open the .ifo file in a text editor and change the name of the dictionary to what we want it to be:


This name is what will be visible in the dictionary application.
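If you build many dictionaries, the rename can be scripted instead of done by hand. This is a rough Python sketch, assuming the name is stored on a line that starts with `bookname=`; the function name `set_bookname` is my own.

```python
def set_bookname(ifo_text, new_name):
    """Return the .ifo text with its bookname line replaced (or appended)."""
    lines = ifo_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("bookname="):
            lines[i] = f"bookname={new_name}"
            break
    else:
        # No bookname line was found, so add one at the end.
        lines.append(f"bookname={new_name}")
    return "\n".join(lines) + "\n"
```

You would read the .ifo file as UTF-8 text, pass the text through this function, and write the result back.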

Save the file and then start Stardict or Goldendict. Make sure that all three of these newly created files are easily accessible to the dictionary program. On my system there is a global user location, and I've also created my own dictionary directory and place all my own user-created dictionaries there.

Now we want to let the application know where the dictionaries are located. Start Goldendict (what I use), go to "Edit... Dictionaries". The following dialog box will appear:


Click on "Rescan". Now click on the "Dictionaries" tab in the same dialog box, and you should see your new dictionary recognized.


That's it. You're done! You can now use your new dictionary.


The above screenshot is a simple dictionary lookup, but what makes Stardict and Goldendict so useful is that they can be used with any text application. While you're reading along in an epub, PDF, or text program, you can just click on any word and you'll get the definition for it, provided it's in the dictionary:


Keep in mind that this process needs to be done for each language direction. The screenshots I've included here only show the process for an Ojibwe > English dictionary. The same thing must be done if you want a dictionary for the other direction (English > Ojibwe in my case).

I don't know of any direct way to do this for a Mac, but I know that there is something that will convert an already created Stardict/Goldendict dictionary to Mac Dictionary format. It's called the Mac Dictionary Kit and includes DictUnifier. It can be found here.

4 comments:

  1. Thanks for your writing, but I'm looking for a way to customize my dictionary: I want to add images, and as I see in your example, a word has one meaning. If I want to add more levels of meaning, is that possible?

    Thank you

    1. In the .tab file, use Pango markup (search for it on Wikipedia).
  2. Thanks, this is very useful. Stardict can apparently make use of more complex dictionary files with added features such as including an audio file to play, but it looks like you have to create the necessary files manually in order to do this. Is there by any chance a way to use stardict-editor to create dictionaries with audio and other such features?
  3. When I try to compile in stardict-editor I get this:

    Building...
    Error, no new line at the end.
    Failed!

    Someone told me here http://askubuntu.com/questions/810407/stardict-goldend-abiword that I need to add an Enter in the HTML file, but I don't know what is meant...

    I tried everything but I always get an error. Can someone solve my problem?

    Thanks in advance.

Converting StarDict to a tab-delimited txt file

Make a backup copy first before doing any of this.






Note on Stardict Tools in Ubuntu

http://thanhsiang.org/faqing/node/181

1. Install:
sudo apt-get install stardict-tools
2. All programs are located in:
/usr/lib/stardict-tools/
Help files:
/usr/share/doc/stardict-tools/README
3. Run in Terminal:
3.1 To compose/edit dictionaries:
stardict-editor
Help File
/usr/share/doc/stardict-common/HowToCreateDictionary
3.2 To unzip a file with dictzip:
dictzip -d *dict.dz
4. Stardict Help files
/usr/share/doc/stardict
===========

How to create your own dictionaries

Original file located at: /usr/share/doc/stardict-common/HowToCreateDictionary
First, read doc/DICTFILE_FORMAT.
Second, install the stardict-editor package. For more tools, download the stardict-tools source code tarball and compile it; you will find many tools in src/, such as tabfile, babylon, stardict2txt, stardict_verify.
In most cases, I recommend the "tabfile" converter to create your own dictionaries. You need to create a text file first. It should be encoded in UTF-8, and the new line characters should be "\n"; if it is in DOS file format ("\r\n"), you can use "dos2unix" to convert it.
Here is an example dict.tab file:
============
a	1\n2\n3
b	4\\5\n6
c	789
============
It means: write the search word first, then a Tab character, then the definition. If the definition contains a new line, write \n; if it contains a \ character, write \\.
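As an illustration of these rules, here is a minimal Python sketch that writes such a file from (word, definition) pairs. The function names are hypothetical and this is not part of stardict-tools; it only applies the escaping described above and keeps Unix line endings, including a new line after the last entry.

```python
def escape_definition(definition):
    """Escape a definition for a tab file: backslashes first, then new lines."""
    return definition.replace("\\", "\\\\").replace("\n", "\\n")


def make_tab_text(entries):
    """Build tab-file text from (word, definition) pairs, one entry per line."""
    lines = [f"{word}\t{escape_definition(definition)}" for word, definition in entries]
    # Every line, including the last one, ends with "\n".
    return "".join(line + "\n" for line in lines)


def write_tab_file(path, entries):
    """Write the entries as UTF-8 text with "\n" line endings on every platform."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(make_tab_text(entries))
```

Called with the three entries of the example above, `make_tab_text` reproduces the example dict.tab file.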
Then use "tabfile" to compile it:
./tabfile dict.tab
You will find three files that are generated by tabfile: "dict.ifo", "dict.dict" and "dict.idx", then you can compress the "dict.dict" file by dictzip:
dictzip dict.dict
You will get the "dict.dict.dz" file. You can find dictzip at DICT project, http://www.dict.org, just download its source code tarball and compile it, then you can find "dictzip" in it.
StarDict can load the dict.dict directly too, so dictzip is optional.
You can use gedit to edit the "dict.ifo" file, change the bookname, description, etc. and you can look at the "src/example.ifo" file for an example. Use "ls -l dict.idx" to get the idxfilesize.
Now you can create a directory at /usr/share/stardict/dic/, for example:
mkdir /usr/share/stardict/dic/example-dict/
And move dict.dict.dz, dict.idx, dict.ifo into this directory:
mv dict.dict.dz dict.idx dict.ifo /usr/share/stardict/dic/example-dict/
It is suggested that you verify the dictionary at the end with:
./stardict_verify /usr/share/stardict/dic/example-dict/dict.ifo
Run StarDict and you will find the dictionary that you created.
Another format that StarDict recommends is the babylon source file format. It looks like this:
======
apple|apples
the meaning of apple
2dimensional|2dimensionale|2dimensionaler|2dimensionales|2dimensionalem|2dimensionalen
two dimensional's meaning
the second line.
======
Use babylon to compile it. stardict-editor can compile it too.
If you want to distribute your dictionary on the StarDict website, just contact me.
Hu Zheng 
http://forlinux.yeah.net
2006.6.29
=================

Format for StarDict dictionary files

Original file: /usr/share/doc/stardict-common/StarDictFileFormat.gz
------------------------------------
StarDict homepage: http://stardict.sourceforge.net
StarDict on-line dictionary: http://www.stardict.org
{0}. Number and Byte-order Conventions
When you record the numbers that identify sizes, offsets, etc., you
should use 32-bit numbers, such as you might represent with a glong.
In order to make StarDict work on different platforms, these numbers
must be in network byte order. You can ensure the correct byte order
by using the g_htonl() function when creating dictionary files.
Conversely, you should use g_ntohl() when reading dictionary files.
Strings should be encoded in UTF-8.
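For readers who do not work in C with GLib, the same convention can be sketched in Python with the standard struct module, where the "!" prefix selects network (big-endian) byte order. The helper names below are illustrative only and are not part of StarDict.

```python
import struct


def pack_u32(value):
    """Encode an unsigned 32-bit number in network (big-endian) byte order."""
    return struct.pack("!I", value)


def unpack_u32(data):
    """Decode 4 bytes in network byte order back into a Python int."""
    return struct.unpack("!I", data)[0]
```

Strings would be written with `text.encode("utf-8")` and read back with `data.decode("utf-8")`.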
{1}. Files
Every dictionary consists of these files:
(1). somedict.ifo
(2). somedict.idx or somedict.idx.gz
(3). somedict.dict or somedict.dict.dz
(4). somedict.syn (optional)
You can use gzip -9 to compress the .idx file. If the .idx file is not
compressed, it loads quickly and saves memory. If it is compressed, the
whole .idx file is loaded into memory, which makes querying faster.
You can use dictzip to compress the .dict file.
"dictzip" uses the same compression algorithm and file format as does gzip,
but provides a table that can be used to randomly access compressed blocks
in the file. The use of 50-64kB blocks for compression typically degrades
compression by less than 10%, while maintaining acceptable random access
capabilities for all data in the file. As an added benefit, files
compressed with dictzip can be decompressed with gunzip.
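Because a dictzip file is also a valid gzip file, a generic gzip reader can be used to inspect it. The Python sketch below is an illustration under that assumption: the function name `read_article` is hypothetical, and the standard gzip module decompresses sequentially from the start, so it does not benefit from dictzip's random-access table.

```python
import gzip


def read_article(dict_path, offset, size):
    """Read `size` bytes at `offset` from a .dict or .dict.dz file."""
    # Choose the gzip reader for compressed files, the plain one otherwise.
    opener = gzip.open if dict_path.endswith(".dz") else open
    with opener(dict_path, "rb") as handle:
        handle.seek(offset)  # offsets refer to the uncompressed data
        return handle.read(size)
```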
For more information about dictzip, refer to the DICT project:
http://www.dict.org
When you create a dictionary, you should normally use .idx and .dict.dz.
Stardict will search for the .ifo file, then open the .idx or
.idx.gz file and the .dict.dz or .dict file which is in the same directory and
has the same base name.
{2}. The ".ifo" file's format.
The .ifo file has the following format:
StarDict's dict ifo file
version=2.4.2
[options]
Note that the current "version" string must be "2.4.2" or "3.0.0". If it's not,
then StarDict will refuse to read the file.
If version is "3.0.0", StarDict will parse the "idxoffsetbits" option.
[options]
---------
In the example above, [options] expands to any of the following lines
specifying information about the dictionary. Each option is a keyword
followed by an equal sign, then the value of that option, then a
newline. The options may appear in any order.
Note that the dictionary must have at least a bookname, a wordcount and an
idxfilesize, or the load will fail. All other information is optional. All
strings should be encoded in UTF-8.
Available options:
bookname= // required
wordcount= // required
synwordcount= // required if ".syn" file exists.
idxfilesize= // required
idxoffsetbits= // New in 3.0.0
author=
email=
website=
description= // You can use <br> for new line.
date=
sametypesequence= // very important.
dicttype=
wordcount is the number of word entries in the .idx file; it must be correct.
idxfilesize is the size (in bytes) of the .idx file. Even if the .idx is compressed
to a .idx.gz file, this entry must record the original .idx file's size, and it
must be correct too. The .gz file does not provide its original size up front
(gzip records it only in the trailer), but knowing the original size speeds up
extraction to memory, as you don't need to call realloc() many times.
idxoffsetbits can be 64 or 32. If "idxoffsetbits=64", the offset field of the
.idx file will be 64 bits.
dicttype is used by some special dictionary plugins, such as wordnet. At present
its value can be "wordnet".
The "sametypesequence" option is described in further detail below.
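As a rough illustration of the "keyword=value" layout described above, here is a minimal sketch in C. The names ifo_get() and ifo_has_required() are hypothetical helpers made up for this example, not part of StarDict, and the sketch assumes the whole .ifo file has already been read into a NUL-terminated string with '\n' line endings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of "key=" from .ifo text into out.
 * Returns 1 if the key was found, 0 otherwise. */
int ifo_get(const char *text, const char *key, char *out, size_t outsz)
{
    size_t klen = strlen(key);
    const char *line = text;

    if (outsz == 0)
        return 0;

    while (line != NULL && *line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* A matching line starts with the key immediately followed by '='. */
        if (len > klen && strncmp(line, key, klen) == 0 && line[klen] == '=') {
            size_t vlen = len - klen - 1;
            if (vlen >= outsz)
                vlen = outsz - 1;   /* truncate values that do not fit */
            memcpy(out, line + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}

/* Hypothetical check: the three required options must all be present. */
int ifo_has_required(const char *text)
{
    char buf[256];

    return ifo_get(text, "bookname", buf, sizeof buf)
        && ifo_get(text, "wordcount", buf, sizeof buf)
        && ifo_get(text, "idxfilesize", buf, sizeof buf);
}
```

Because a match must start at the beginning of a line, looking up "wordcount" does not accidentally match a "synwordcount=" line.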
***
sametypesequence
You should first familiarize yourself with the .dict file format
described in the next section so that you can understand what effect
this option has on the .dict file.
If the sametypesequence option is set, it tells StarDict that each
word's data in the .dict file will have the same sequence of datatypes.
In this case, we expect a .dict file that's been optimized in two
ways: the type identifiers should be omitted, and the size marker for
the last data entry of each word should be omitted.
Let's consider some concrete examples of the sametypesequence option.
Suppose that a dictionary records many .wav files, and so sets:
sametypesequence=W
In this case, each word's entry in the .dict file consists solely of a
wav file. In the .dict file, you would leave out the 'W' character
before each entry, and you would also omit the 32-bit integer at the
front of each .wav entry that would normally give the entry's length.
You can do this since the length is known from the information in the
.idx file.
As another example, suppose a dictionary contains phonetic information
and a meaning for each word. The sametypesequence option for this
dictionary would be:
sametypesequence=tm
Once again, you can omit the 't' and 'm' characters before each data
entry in the .dict file. In addition, you should omit the terminating
'\0' for the 'm' entry for each word in the .dict file, as the length
of the meaning string can be inferred from the length of the phonetic
string (still indicated by a terminating '\0') and the length of the
entire word entry (listed in the .idx file).
So for cases where the last data entry for each word normally requires
a terminating '\0' character, you should omit this character in the
.dict file. And for cases where the last data entry for each word
normally requires an initial 32-bit number giving the length of the
field (such as WAV and PNG entries), you must omit this number in the
dictionary.
Every dictionary should try to use the sametypesequence feature to
save disk space.
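The "sametypesequence=tm" case above can be sketched as follows. This is only an illustration under stated assumptions: struct tm_entry and split_tm() are hypothetical names, and the caller is assumed to have already located the word's data using word_data_offset and word_data_size from the .idx file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one word's data under sametypesequence=tm. */
struct tm_entry {
    const char *phonetic;   /* 't' field, NUL-terminated inside the buffer */
    const char *meaning;    /* 'm' field, NOT NUL-terminated */
    size_t meaning_len;     /* inferred from word_data_size */
};

/* data points at the word's data; size is word_data_size from the .idx file.
 * Returns 0 on success, -1 if the entry is malformed. */
int split_tm(const char *data, size_t size, struct tm_entry *out)
{
    /* The 't' field keeps its '\0'; look for it within this entry only. */
    const char *nul = memchr(data, '\0', size);

    if (nul == NULL)
        return -1;

    out->phonetic = data;
    out->meaning = nul + 1;
    /* The last field has no terminator: it runs to the end of the entry. */
    out->meaning_len = size - (size_t)(nul + 1 - data);
    return 0;
}
```

Since the meaning is not NUL-terminated, a caller has to use meaning_len (for example with printf("%.*s", ...)) rather than treat it as a C string.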
***
{3}. The ".idx" file's format.
The .idx file is just a word list.
The word list is a sorted list of word entries.
Each entry in the word list contains three fields, one after the other:
word_str; // a utf-8 string terminated by '\0'.
word_data_offset; // word data's offset in .dict file
word_data_size; // word data's total size in .dict file
word_str gives the string representing this word. It's the string
that is "looked up" by StarDict.
Two or more entries may have the same "word_str" with different
word_data_offset and word_data_size. This may be useful for some
dictionaries. But this feature is only well supported by
StarDict-2.4.8 and newer.
The length of "word_str" should be less than 256. In other words,
(strlen(word) < 256).
If the version is "3.0.0" and "idxoffsetbits=64", word_data_offset will
be a 64-bit unsigned number in network byte order. Otherwise it will be
32 bits.
word_data_size should be a 32-bit unsigned number in network byte order.
It is possible for different word_str values to have the same word_data_offset and
word_data_size, so that multiple index entries point to the same definition.
But this is not recommended. When multiple words have the same definition,
you may create a ".syn" file for them; see section 4 below.
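The three-field entry layout described above could be read roughly like this. It is a minimal sketch, not StarDict's own loader: struct idx_entry, read_be() and idx_next() are hypothetical names, and the whole uncompressed .idx file is assumed to be in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of one .idx entry. */
struct idx_entry {
    const char *word;   /* word_str, points into the buffer */
    uint64_t offset;    /* word_data_offset */
    uint32_t size;      /* word_data_size */
};

/* Read an unsigned big-endian (network byte order) integer of n bytes. */
uint64_t read_be(const unsigned char *p, size_t n)
{
    uint64_t v = 0;

    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Parse the entry at buf[pos]; offset_bits is 32 or 64 (idxoffsetbits).
 * Returns the position of the next entry, or 0 if the data is truncated. */
size_t idx_next(const unsigned char *buf, size_t len, size_t pos,
                int offset_bits, struct idx_entry *out)
{
    size_t off_bytes = (offset_bits == 64) ? 8 : 4;
    const unsigned char *nul;
    size_t after_word;

    if (pos >= len)
        return 0;
    nul = memchr(buf + pos, '\0', len - pos);
    if (nul == NULL)
        return 0;

    after_word = (size_t)(nul - buf) + 1;
    if (len - after_word < off_bytes + 4)
        return 0;

    out->word = (const char *)(buf + pos);
    out->offset = read_be(buf + after_word, off_bytes);
    out->size = (uint32_t)read_be(buf + after_word + off_bytes, 4);
    return after_word + off_bytes + 4;
}
```

Calling idx_next() in a loop, starting at position 0 and feeding each return value back in, walks the whole word list until it returns 0.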
The word list must be sorted by calling stardict_strcmp() on the "word_str"
fields. If the word list order is wrong, StarDict will fail to function
correctly!
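============
#include <string.h>
#include <glib.h>

/* Case-insensitive ASCII comparison first; exact strcmp() breaks ties. */
gint stardict_strcmp(const gchar *s1, const gchar *s2)
{
    gint a;
    a = g_ascii_strcasecmp(s1, s2);
    if (a == 0)
        return strcmp(s1, s2);
    else
        return a;
}
============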
g_ascii_strcasecmp() is a glib function:
Unlike the BSD strcasecmp() function, this only recognizes standard
ASCII letters and ignores the locale, treating all non-ASCII characters
as if they are not letters.
stardict_strcmp() works fine with English characters, but it does not sort
characters from other locales well. In that case you can enable
the collation feature; see section 6.
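If you write a dictionary tool that does not link against glib, the same ordering can be approximated by hand. The sketch below is an assumption-laden stand-in, not StarDict code: ascii_lower(), ascii_strcasecmp(), stardict_strcmp_sketch() and is_sorted() are hypothetical names, and ascii_strcasecmp() only imitates the documented behaviour of g_ascii_strcasecmp() (ASCII letters folded, all other bytes compared as-is).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Lower-case ASCII letters only; every other byte is returned unchanged. */
int ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Stand-in for g_ascii_strcasecmp(), for builds without glib. */
int ascii_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    while (*a && *b) {
        int ca = ascii_lower(*a);
        int cb = ascii_lower(*b);

        if (ca != cb)
            return ca - cb;
        a++;
        b++;
    }
    return (int)*a - (int)*b;
}

/* Same two-step comparison as stardict_strcmp(), using the stand-in. */
int stardict_strcmp_sketch(const char *s1, const char *s2)
{
    int a = ascii_strcasecmp(s1, s2);

    return a != 0 ? a : strcmp(s1, s2);
}

/* Returns 1 if words[0..n-1] is in the order the .idx file requires. */
int is_sorted(const char *const *words, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (stardict_strcmp_sketch(words[i - 1], words[i]) > 0)
            return 0;
    return 1;
}
```

A check like is_sorted() is cheap to run before writing the .idx file, which helps catch the wrong-order problem early.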
{4}. The ".syn" file's format.
This file is optional. Note that a tree dictionary does not need this file.
Only StarDict-2.4.8 and newer support this file.
The .syn file contains information about synonyms: when you input a
synonym, StarDict will look up the word it is related to.
The format is simple. Each item contains one string and one number.
synonym_word; // a utf-8 string terminated by '\0'.
original_word_index; // original word's index in .idx file.
Items follow one another without any separator.
When you input synonym_word, StarDict will look up original_word.
The length of "synonym_word" should be less than 256. In other
words, (strlen(word) < 256).
original_word_index is a 32-bit unsigned number in network byte order.
Two or more items may have the same "synonym_word" with different
original_word_index.
The items must be sorted by stardict_strcmp() with synonym_word.
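For the writing side, one .syn item as described above could be serialized like this. It is a minimal sketch: syn_write_item() is a hypothetical helper, and the caller is assumed to emit the items already sorted by stardict_strcmp().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one .syn item into out. Returns the number of bytes written, or 0
 * if the item does not fit or the synonym is 256 bytes or longer. */
size_t syn_write_item(unsigned char *out, size_t cap,
                      const char *synonym, uint32_t original_word_index)
{
    size_t n = strlen(synonym);

    if (n >= 256 || cap < n + 1 + 4)
        return 0;

    memcpy(out, synonym, n + 1);   /* the string plus its '\0' */
    /* 32-bit index in network byte order: most significant byte first. */
    out[n + 1] = (unsigned char)(original_word_index >> 24);
    out[n + 2] = (unsigned char)(original_word_index >> 16);
    out[n + 3] = (unsigned char)(original_word_index >> 8);
    out[n + 4] = (unsigned char)original_word_index;
    return n + 5;
}
```

Writing the bytes one at a time avoids depending on the host's byte order.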
{5}. The offset cache file's format.
StarDict-2.4.8 and newer support cache files. This feature can speed up
loading and save memory, because the cache file is mapped with mmap(). The cache
file names are .idx.oft and .syn.oft. The format is:
First a utf-8 string terminated by '\0', then many 32-bit numbers that form
the wordoffset index. This index is sparse, with "ENTR_PER_PAGE=32", and
the numbers are not stored in network byte order.
The string must begin with:
=====
StarDict's oft file
version=2.4.8
=====
Then a line like this:
url=/usr/share/stardict/dic/stardict-somedict-2.4.2/somedict.idx
This line should have an ending '\n'.
StarDict will first try to create the .oft file in the same directory as
the .ifo file. If that fails, it tries to create it in
~/.cache/stardict/, where ~/.cache is obtained from g_get_user_cache_dir().
If two or more dictionaries have the same file name, StarDict will
create somedict.idx.oft, somedict(2).idx.oft, somedict(3).idx.oft,
etc. for them respectively, each with different "url=" in the
beginning string.
{6}. The collation file's format.
StarDict-2.4.8 and newer support collation, which sorts the word
list with a collate function. StarDict creates collation files
named .idx.clt and .syn.clt. The format is a little like the offset
cache file:
First a utf-8 string terminated by '\0', then many 32-bit numbers that form
the index sorted by the collate function. They are not stored
in network byte order.
The string must begin with:
=====
StarDict's clt file
version=2.4.8
=====
Then two lines like this:
url=/usr/share/stardict/dic/stardict-somedict-2.4.2/somedict.idx
func=0
The second line should have an ending '\n' too.
StarDict currently supports these collate functions:
typedef enum {
    UTF8_GENERAL_CI = 0,
    UTF8_UNICODE_CI,
    UTF8_BIN,
    UTF8_CZECH_CI,
    UTF8_DANISH_CI,
    UTF8_ESPERANTO_CI,
    UTF8_ESTONIAN_CI,
    UTF8_HUNGARIAN_CI,
    UTF8_ICELANDIC_CI,
    UTF8_LATVIAN_CI,
    UTF8_LITHUANIAN_CI,
    UTF8_PERSIAN_CI,
    UTF8_POLISH_CI,
    UTF8_ROMAN_CI,
    UTF8_ROMANIAN_CI,
    UTF8_SLOVAK_CI,
    UTF8_SLOVENIAN_CI,
    UTF8_SPANISH_CI,
    UTF8_SPANISH2_CI,
    UTF8_SWEDISH_CI,
    UTF8_TURKISH_CI,
    COLLATE_FUNC_NUMS
} CollateFunctions;
These UTF8_*_CI functions in fact come from MySQL.
The file is located in the same way as the .oft file.
Note that for a "somedict.idx.gz" file, the corresponding collation
file is somedict.idx.clt, not somedict.idx.gz.clt, and the
"url=" is somedict.idx, not somedict.idx.gz. So after you gzip
the .idx file, StarDict does not need to create the .clt file again.
{7}. The ".dict" file's format.
The .dict file is a pure data sequence, as the offset and size of each
word is recorded in the corresponding .idx file.
If the "sametypesequence" option is not used in the .ifo file, then
the .dict file has fields in the following order:
==============
word_1_data_1_type; // a single char identifying the data type
word_1_data_1_data; // the data
word_1_data_2_type;
word_1_data_2_data;
...... // the number of data entries for each word is determined by
// word_data_size in .idx file
word_2_data_1_type;
word_2_data_1_data;
......
==============
It's important to note that each field in each word indicates its
own length, as described below. The number of possible fields per
word is also not fixed, and is determined by simply reading data until
you've read word_data_size bytes for that word.
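Reading fields until word_data_size bytes are consumed can be sketched as a small iterator. This is an illustration only: struct dict_field and next_field() are hypothetical names, and the lower-case/upper-case size rule used here is the one given in the "Type identifiers" part of this section.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one field inside a word's data. */
struct dict_field {
    char type;                  /* single-character type identifier */
    const unsigned char *data;  /* payload, points into the buffer */
    size_t len;                 /* payload length, without '\0' or size prefix */
};

/* p/size describe one word's data (word_data_size bytes from the .dict file).
 * Returns 1 and advances *pos if a field was read, 0 at the end of the
 * word's data, -1 if the data is malformed. */
int next_field(const unsigned char *p, size_t size, size_t *pos,
               struct dict_field *f)
{
    size_t i = *pos;

    if (i >= size)
        return 0;
    f->type = (char)p[i++];

    if (isupper((unsigned char)f->type)) {
        /* Upper case: 32-bit network byte order size, then the payload. */
        uint32_t n;

        if (size - i < 4)
            return -1;
        n = ((uint32_t)p[i] << 24) | ((uint32_t)p[i + 1] << 16)
          | ((uint32_t)p[i + 2] << 8) | (uint32_t)p[i + 3];
        i += 4;
        if (size - i < n)
            return -1;
        f->data = p + i;
        f->len = n;
        i += n;
    } else {
        /* Lower case: the payload runs up to a terminating '\0'. */
        const unsigned char *nul = memchr(p + i, '\0', size - i);

        if (nul == NULL)
            return -1;
        f->data = p + i;
        f->len = (size_t)(nul - (p + i));
        i += f->len + 1;
    }
    *pos = i;
    return 1;
}
```

Start with *pos set to 0 and call next_field() until it returns 0; a return of -1 means the entry does not match the layout described here.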
Suppose the "sametypesequence" option is used in the .ifo file, and
the option is set like this:
sametypesequence=tm
Then the .dict file will look like this:
==============
word_1_data_1_data
word_1_data_2_data
word_2_data_1_data
word_2_data_2_data
......
==============
The first data entry for each word will have a terminating '\0', but
the second entry will not have a terminating '\0'. The omissions of
the type chars and of the last field's size information are the
optimizations required by the "sametypesequence" option described
above.
If "idxoffsetbits=64", the .dict file is expected to be bigger
than 4 GB. StarDict often needs to mmap this large file, and a
process on a 32-bit computer has at most 4 GB of virtual memory space,
so the mapping would fail. A dictionary with "idxoffsetbits=64"
therefore cannot actually be loaded on a 32-bit machine; StarDict will
simply print a warning in this case when loading. 64-bit computers
should not have this limit.
Type identifiers
----------------
Here are the single-character type identifiers that may be used with
the "sametypesequence" option in the .ifo file, or may appear in the
.dict file itself if the "sametypesequence" option is not used.
Lower-case characters signify that a field's size is determined by a
terminating '\0', while upper-case characters indicate that the data
begins with a network byte-ordered guint32 that gives the size of
the data that follows (NOT the whole size, which is 4 bytes bigger).
'm'
Word's pure text meaning.
The data should be a utf-8 string ending with '\0'.
'l'
Word's pure text meaning.
The data is NOT a utf-8 string, but is instead a string in locale
encoding, ending with '\0'. Sometimes using this type will save disk
space, but its use is discouraged.
'g'
A utf-8 string which is marked up with the Pango text markup language.
For more information about this markup language, See the "Pango
Reference Manual."
You might have it installed locally at:
file:///usr/share/gtk-doc/html/pango/PangoMarkupFormat.html
't'
English phonetic string.
The data should be a utf-8 string ending with '\0'.
Here are some utf-8 phonetic characters:
θʃŋʧðʒæıʌʊɒɛəɑɜɔˌˈːˑṃṇḷ
æɑɒʌәєŋvθðʃʒɚːɡˏˊˋ
'x'
A utf-8 string which is marked up with the xdxf language.
See http://xdxf.sourceforge.net
StarDict has these extensions:
<rref> can have a "type" attribute, which can be "image", "sound", "video"
or "attach".
<kref> can have a "k" attribute.
'y'
Chinese YinBiao or Japanese KANA.
The data should be a utf-8 string ending with '\0'.
'k'
KingSoft PowerWord's data. The data is a utf-8 string ending with '\0'.
It is in XML format.
'h'
HTML code.
'n'
WordNet data.
'r'
Resource file list.
The content can be:
img:pic/example.jpg // Image file
snd:apple.wav // Sound file
vdo:film.avi // Video file
att:file.bin // Attachment file
More than one line is supported as a list of available files.
StarDict will find the files in the Resource Storage.
The image will be shown, the sound file will have a play button.
You can "save as" the attachment file and so on.
'W'
wav file.
The data begins with a network byte-ordered guint32 to identify the wav
file's size, immediately followed by the file's content.
'P'
Picture file.
The data begins with a network byte-ordered guint32 to identify the picture
file's size, immediately followed by the file's content.
'X'
this type identifier is reserved for experimental extensions.
{8}. Resource Storage
Resource Storage stores the external files named in the 'r' resource file list, the
images in html code, and the image, media and other files in wiki tags.
It has two forms:
1. Direct directory and files in the "res" sub-directory.
2. The res.rifo, res.ridx and res.rdic database.
Direct files may have file name encoding problems, as Linux uses UTF-8 and
Windows uses the local encoding, so you had better use only ASCII file names, or
use the database to store UTF-8 file names.
The database may need to extract a file (such as a .wav file) to a temporary
file, so it is less efficient than direct files. But the database has the
advantage of compression.
You can convert between the res directory and the res database with
the dir2resdatabase and resdatabase2dir tools.
StarDict will try to load the storage database first, then try the direct
files form.
The format of the res.rifo file:
StarDict's storage ifo file
version=3.0.0
filecount= // required.
idxoffsetbits= // optional.
The format of the res.ridx file:
filename; // A string ending with '\0'.
offset; // 32 or 64 bits unsigned number in network byte order.
size; // 32 bits unsigned number in network byte order.
filename can include a path too, such as "pic/example.png". filename is
case sensitive, and no two entries should have the same filename.
If "idxoffsetbits=64", then offset is 64 bits.
These three items are repeated for each entry.
The entries are sorted by the strcmp() function on the filename field.
It is possible for different filenames to have the same offset and size.
The format of the res.rdic file:
It is just the concatenation of all the resource files.
You can dictzip this file as res.rdic.dz
{9}. Tree Dictionary
The tree dictionary support is used for information viewing, etc.
A tree dictionary contains three files: sometreedict.ifo, sometreedict.tdx.gz
and sometreedict.dict.dz.
It is better to compress the .tdx file, as it is always loaded into memory.
The .ifo file has the following format:
StarDict's treedict ifo file
version=2.4.2
[options]
Available options:
bookname= // required
tdxfilesize= // required
wordcount=
author=
email=
website=
description=
date=
sametypesequence=
wordcount is only used for info view in the dict manage dialog, so it is not
important in tree dictionary.
The .tdx file is just the word list.
-----------
The word list is a tree list of word entries.
Each entry in the word list contains four fields, one after the other:
word_str; // a utf-8 string terminated by '\0'.
word_data_offset; // word data's offset in .dict file
word_data_size; // word data's total size in .dict file. it can be 0.
word_subentry_count; //how many sub word this entry has, 0 means none.
Subentries immediately follow their parent entry. The resulting order is
the same as a tree list with all of its nodes expanded, read from top to
bottom.
word_data_offset, word_data_size and word_subentry_count should be 32-bit
unsigned numbers in network byte order.
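Because subentries directly follow their parent, the .tdx word list can be walked with a simple recursion. The sketch below only illustrates that layout: be32(), struct tdx_stats and tdx_walk() are hypothetical names, the uncompressed .tdx data is assumed to be in memory, and the function handles one entry plus its subtree (whether the file holds a single root entry or several top-level entries is left to the caller).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit unsigned number stored in network byte order. */
uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical tally filled in while walking the tree. */
struct tdx_stats {
    size_t nodes;      /* total entries visited */
    size_t max_depth;  /* depth of the deepest entry; the start entry is 0 */
};

/* Visit the entry at buf[pos] and, recursively, all of its subentries.
 * Returns the position just past this subtree, or 0 on truncated data. */
size_t tdx_walk(const unsigned char *buf, size_t len, size_t pos,
                size_t depth, struct tdx_stats *st)
{
    const unsigned char *nul;
    uint32_t children;

    if (pos >= len)
        return 0;
    nul = memchr(buf + pos, '\0', len - pos);
    if (nul == NULL || (size_t)(buf + len - (nul + 1)) < 12)
        return 0;

    /* After word_str: offset (4 bytes), size (4 bytes), subentry count (4 bytes). */
    children = be32(nul + 1 + 8);
    pos = (size_t)(nul + 1 - buf) + 12;

    st->nodes++;
    if (depth > st->max_depth)
        st->max_depth = depth;

    for (uint32_t i = 0; i < children; i++) {
        pos = tdx_walk(buf, len, pos, depth + 1, st);
        if (pos == 0)
            return 0;
    }
    return pos;
}
```

The recursion depth equals the tree depth, so a tool that reads untrusted files may want to cap it.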
The .dict file's format is the same as the normal dictionary.
{10}. More information.
You can read "src/lib.cpp", "src/dictmanagedlg.cpp" and
"src/tools/*.cpp" for more information.
After you have built a dictionary, you can use "stardict_verify" to verify the
dictionary files. You can find it at "src/tools/".
If you have any questions, email me. :)
Thanks to Will Robinson for cleaning up this file's
English.
Hu Zheng 
http://forlinux.yeah.net
2007.4.24