IndiaDivine.org

RE: Manipulating, sorting, converting, editing, searching, translating Chinese text digitally


Hi Bob & Helene,

 

Bob wrote

> A very large compilation is available as a download that can be burned

> to CD from Nigel's site at Chang Gung Medical College:

> http://memo.cgu.edu.tw/wiseman/WebPublished.htm It is not necessary to

> copy characters from a database to use them.

 

With respect, Bob, I disagree. If we can copy text in Western languages, why not

copy text in Eastern languages?

 

> It is likely that an electronic ... PDCM will be available on Pleco and

> Wenlin this year. Nonetheless, people will want to properly set their

> OS for Chinese.

 

That would be great! Meanwhile, I am frustrated by WWW sites and some TCM CDs
that HAVE the Chinese characters (and great info) but use PDF or another
(especially graphic) format that prevents capture of the text. For example,
Nigel gave me a copy of his shorter Chinese dictionary, but it is of little
value to my research because it prevents copying/saving in the formats that I
need.

 

It is very useful to be able to COPY the Chinese characters in digital (not
graphic) form, for many reasons. For example, one can file the characters and
linked fields in MS Word, Excel or a database. This allows one to sort, search
and compare them quickly. It also allows one to use them in searches of one's
hard drive (via Google Desktop) or of the WWW (via any Chinese-enabled search
engine). Also, it is futile to attempt machine translation if the characters
are in graphic (jpeg or bmp-type) format.

 

Re your advice on loading and configuring: I use Windows XP Professional and

my PC can display and store Chinese, Japanese and Korean characters in the

Browser and MS Office files.

 

However, my PC was not configured to input Chinese, or to translate it, or to
convert simple to formal script, or to convert Hanzi to pinyin (or vice versa).
I have copies of Wenlin and NJStar software installed but I cannot use them
very efficiently.
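
For what it is worth, converting simple (simplified) to formal (traditional) script is, at its core, a character-by-character table lookup, which is only possible once the characters exist as digital text. The Python sketch below illustrates the idea with a five-character toy table; the table and the function name are illustrative assumptions, not part of Wenlin, NJStar or any MS tool. A real converter needs thousands of entries and context rules, because some simplified characters correspond to more than one traditional character.

```python
# Toy simplified-to-traditional table (illustrative only). A real converter
# needs a far larger table plus rules for one-to-many characters.
S2T = str.maketrans({"医": "醫", "药": "藥", "针": "針", "经": "經", "汤": "湯"})


def to_traditional(text: str) -> str:
    """Replace each simplified character found in the table; leave the rest unchanged."""
    return text.translate(S2T)


if __name__ == "__main__":
    for term in ["中医", "针灸", "桂枝汤"]:
        print(term, "->", to_traditional(term))
```

Characters that are absent from the table, including pinyin and English, pass through untouched.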

 

Therefore, I was excited when you (Bob & Helene) wrote that I might be able to

add plug-ins that would enable me to input and transform Chinese characters or

pinyin in MS Office.

 

My PC has 55Gb of hard drive, 1Gb memory and a Pentium 4 (3GHz) processor.

This afternoon, I spent several hours searching for and installing the MS XP

Global IMEs for Chinese text:

 

(a) Simplified format (imechs.exe; I could not locate MSIME5SC.exe on the MS
site or elsewhere on the WWW);

(b) Traditional format (imecht.exe; I could not locate MSIME5TC.exe on the MS
site or elsewhere on the WWW).

(c) I also installed the MS Office XP Tool: Simplified Chinese Language Pack
(ie_zhc.exe).

 

> To use Global IME, enter one of the supported applications and select

> from the menu the language you desire. Bob

 

Helene wrote:

> If it is any help I had this installed- it is very useful. However if

> you are using classical characters make sure you have this in both pin

> yin (PRC) and classical characters (Taiwan). I use this along with a

> program Powerword 2003 (which has probably been updated since) which I

> use as a dictionary and for checking text and translations into and

> from either language. Powerword however is only in pinyin so when

> working on old texts one is likely to have to use a classical chinese

> dictionary alongside it. Helene

 

Despite several attempts (including reloading the three .exe files) and having
selected PRC Chinese or Taiwanese Chinese via MS Word's Tools > Language > Set
Language procedure, I could not get the system to translate or transform
Traditional Chinese characters taken from the WWW, nor could I get it to input
Chinese characters.

 

My hair lies in handfuls on my desk!! I wish that one of you (Bob or Helene)
lived nearby to help me get MS Word set up to handle input, conversion &
translation of Chinese characters!

 

Any advice on how I can proceed from here?

 

Best regards,

Phil

 

 

 

 

 


Thought I'd let you know: in Adobe Reader 7.0, you can select and copy

image or text data in a pdf file.

 

SE

 


Phil,

 

As I understand it, you have installed the language pack but not the input
method.

You can read about installing input methods here:
http://newton.uor.edu/Departments & Programs/AsianStudiesDept/Language/asianlanguageinstallation_P.html

There's also some info about the typing itself:
http://newton.uor.edu/Departments & Programs/AsianStudiesDept/Language/chinese_write.htm

 

I believe it is a good idea to learn how to type Chinese, you will be less

bound to the appearance of characters in files. However, this does require

us to know the pinyin and to recognize and select the character from a list

(during the input process).

 

I think Phil was more interested in copying and saving WITHOUT studying the
pinyin and the typing, right? The only solution to that problem would be for
databases to exist in the form Phil asked for.

 

Has anyone tried those PDF converters? Do they convert Chinese PDF files

into a Chinese .doc file?

 

Tom.

 

----

01/21/06 03:56:26
Chinese Medicine
RE: Manipulating, sorting, converting, editing, searching, translating Chinese text digitally

<snip>

Chinese Medicine , " "

<@e...> wrote:

 

> (especially graphic) format that prevents capture of the text. For example,
> Nigel gave me a copy of his shorter Chinese dictionary, but it is of little
> value to my research because it prevents copying/saving in formats that I
> need.

 

 

Phil

 

Try the latest version from Nigel's previously mentioned website.

This latest version does allow you to copy the Chinese characters as Asian
characters (not graphics).

I had the same experience as you with a previous version, which prevented
copying, but this one does allow it. I use Acrobat Reader 7.05, Wenlin and all
the MS Office tools for writing and editing. It all works together now.

 

I use his dictionary regularly now in my study especially for writing

and translating the formula names.

 

Best wishes

 

Alwin


Tom Verhaeghe wrote:

<snip>

> Has anyone tried those PDF converters? Do they convert Chinese PDF files

> into a Chinese .doc file?

 

Hi Tom!

 

For Windows users the Gold Standard for handling PDF files is Adobe

Acrobat Professional. If you are a Linux user there are ways to convert

PDF by writing it to a Postscript file and then opening that file in a

text editor that supports postscript.

 

That said, I have no idea if the Chinese characters would work. I think

you would have to have the Chinese version of whatever Operating System

and whatever programs you were using.

 

There is an Adobe online converter which I have heard about but I have

not tried. Make a copy of the file you want to work with and try it.

 

<http://www.adobe.com/products/acrobat/access_onlinetools.html>

 

Regards,

 

Pete


> <http://www.adobe.com/products/acrobat/access_onlinetools.html>
>
> Regards,
>
> Pete

 

 

Hi Pete,

 

From the description on the page you linked, I understand that this converter
will only convert English and other West European languages.

Another tip that may help sometimes: in Mozilla Firefox, there is a PDF
download extension that gives you the option of opening the file as a PDF
file OR as an HTML file. The HTML file often allows copying, whereas the PDF
file often doesn't. This trick doesn't always work, though.

If you have Powerword 2006 installed (the newer version of the software a
forum member mentioned the other day), you can highlight terms to get an
automatic translation (no TCM terms, though). Studying the TCM terms is not
that hard; Western medical terms in Chinese are harder, imho.

 

Regards,

 

Tom.

 

 

 


Tom Verhaeghe wrote:

> <http://www.adobe.com/products/acrobat/access_onlinetools.html>

>

> Regards,

>

> Pete

>

>

> Hi Pete,

>

> from the description on the page you linked I understand that this

> converter will only convert English and other West-European

> languages.

 

Hi Tom!

 

Try a Google search in Chinese if you can; perhaps they have a Chinese
version.

 

Regards,

 

Pete


Hi Alwin & All,

 

Alwin van Egmond wrote:

> Phil, try the latest version from the previously mentioned website of
> Nigel. This latest version does allow you to copy the Chinese character
> as Asian characters (not graphics). I had the same experience as you
> had with a previous version which prevented copying but this one does
> allow copying. I use Acrobat Reader 7.05 and Wenlin and all MS Office
> tools for writing and editing. It all works together now. I use his
> dictionary regularly now in my study especially for writing and
> translating the formula names. Best wishes, Alwin

 

I updated to Adobe 7.0.5 today but still cannot capture Chinese text from, for
example:

http://www.paradigm-pubs.com/paradigm/refs/wiseman/AllChineseRef.pdf

 

http://www.paradigm-pubs.com/paradigm/refs/wiseman/drugname.pdf

 

I can see the text, select it and press Copy, but the text does not transfer
to Wenlin.

 

Best regards,

 

HOME + WORK: 1 Esker Lawns, Lucan, Dublin, Ireland

"Man who says it can't be done should not interrupt man doing it" -
Chinese Proverb


Hi Tom, & All,

 

Tom Verhaeghe wrote:

 

> Another tip that may help sometimes: in Mozilla Firefox, there is a

> PDF download extension that gives you the option of opening the file

> as a PDF file OR as a html file.

 

Tom, the pile of pulled-out hair on my desk is growing!

 

I downloaded Mozilla Firefox today. Yes, it has an option to save as
text, but that does not solve the problem of locked PDF files; the saved
text does not retain the Chinese characters.

 

So I pose this question to any of you with experience of capturing

Chinese text from PDF files - Can YOU capture into an MS Office file

(Word, Excel or FrontPage):

 

(a) the SECOND LINE (Chinese characters) of the file:

 

http://www.paradigm-pubs.com/paradigm/refs/wiseman/drugname.pdf

 

and

 

(b) the first few lines of the Chinese text at:

www.ihp.sinica.edu.tw/~medicine/ashm/lectures/paper/paper22.pdf

 

If you can do so, please email me off-list.

 

Many thanks,

Best regards,

 


Hi Phil and others

 

Chinese Medicine , " "

<@e...> wrote:

>

> Hi ALwin & ALl,

>

> I updated to Adobe 7.0.5 today but still cannot capture Chinese text for,
> for example:
> http://www.paradigm-pubs.com/paradigm/refs/wiseman/AllChineseRef.pdf
>
> http://www.paradigm-pubs.com/paradigm/refs/wiseman/drugname.pdf
>
> I can see the text, block it and press Copy, but the text does not transfer
> to Wenlin.

 

 

 

It all depends on how the PDF file was created; you cannot always copy the
text as Asian characters.

But you CAN copy Asian characters from the C-E Dictionary of . It is
available from Nigel's website at:
http://memo.cgu.edu.tw/wiseman/dict_v4.zip (be aware that this file is 60 MB).

You can only copy Asian characters when the PDF file has embedded Asian
fonts. This C-E Dictionary PDF document has embedded Asian fonts (you can see
them in the PDF file under Document Properties), and therefore you can copy
the characters as true characters.

 

The other two files you mentioned do not have this:

 

> http://www.paradigm-pubs.com/paradigm/refs/wiseman/AllChineseRef.pdf

is a scanned document (let's say a photocopy); it is only an image and
does not have any characters in it, not even Western characters.

 

> http://www.paradigm-pubs.com/paradigm/refs/wiseman/drugname.pdf

does not have Asian fonts embedded, and therefore you cannot copy its text
as true characters.

 

So it all depends on the way the PDF file was created and on whether the
security settings chosen at the time of creation allow you to copy the text.
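
A quick way to tell which case you are in is to paste whatever the copy produced into a small script and count the real Chinese characters. The Python sketch below assumes only that true Hanzi fall in the basic CJK Unified Ideographs range (U+4E00 to U+9FFF); the function names are made up for illustration, and the garbage string in the example is an invented sample of the kind of symbols that can come out of a file without usable character data.

```python
def cjk_count(text: str) -> int:
    """Count characters in the basic CJK Unified Ideographs range (U+4E00-U+9FFF)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def has_real_hanzi(text: str, minimum: int = 1) -> bool:
    """True if the copied text contains at least `minimum` true Chinese characters."""
    return cjk_count(text) >= minimum


if __name__ == "__main__":
    samples = {
        "copy that worked": "桂枝汤 gui zhi tang",
        "copy that failed": "¹ð¤ ªK´ö",
        "empty clipboard": "",
    }
    for label, text in samples.items():
        print(label, "->", cjk_count(text), "Hanzi")
```

If the count is zero for text that looks Chinese on screen, the file is either a scanned image or carries no character data that the viewer can hand to the clipboard, and no amount of re-copying will help.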

 

But as I said, the new version of his Chinese-English medical
dictionary does allow you to copy, and as Asian characters.

I hope this was helpful for you.

 

Best wishes

 

Alwin


Hi Alwin & All,

 

Alwin van Egmond wrote:

> It all depends on how the pdf-file was created, you cannot in general

> always copy as asian characters. But you CAN copy as asian characters

> from the C-E Dictionary of . It is available from

> Nigel's website at: http://memo.cgu.edu.tw/wiseman/dict_v4.zip (Be

> aware this file is 60 MB)

 

Alwin, many thanks for your note.

 

I downloaded Nigel's Dictionary from
http://memo.cgu.edu.tw/wiseman/dict_v4.zip and have started to
capture the data from the main file (#3), using Adobe Acrobat
Professional 7.05.

 

#3 is a HUGE file - 914 pages long. It is slow, tedious work, but is the

most comprehensive data that I have seen.

 

There are two minor problems (both fixable) with file#3:

 

(a) My Adobe can capture only 2-4 pages each time, so 250-450
chunk-by-chunk copies are needed to reconstruct the entire file in a Wenlin
file.

 

(b) Many lines are broken, i.e. have follow-on parts in the following
line(s). These broken lines must be fixed before I begin to break the lines
into fields. Once that is done, I can construct a table for insertion into a
sortable file, such as Excel.
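
The repair in (b) could probably be scripted rather than done by hand. The Python sketch below rests on a loud assumption about the layout, which has not been checked against the dictionary file itself: every entry starts with a Chinese character, and any line that starts with something else is a follow-on part of the entry above it. The field split (leading Hanzi, then everything else) and the sample lines are likewise only illustrative.

```python
import csv
import io


def is_hanzi(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs range."""
    return "\u4e00" <= ch <= "\u9fff"


def join_broken_lines(lines):
    """Merge follow-on lines into the entry above them.

    ASSUMPTION: a new entry starts with a Chinese character; any other
    non-empty line continues the previous entry.
    """
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if is_hanzi(line[0]) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def split_fields(entry: str):
    """Split an entry into (hanzi, rest) at the first non-Hanzi character."""
    i = 0
    while i < len(entry) and is_hanzi(entry[i]):
        i += 1
    return entry[:i], entry[i:].strip()


def to_csv(lines) -> str:
    """Return CSV text with one row per repaired entry."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["hanzi", "rest"])
    for entry in join_broken_lines(lines):
        writer.writerow(split_fields(entry))
    return out.getvalue()


if __name__ == "__main__":
    # Invented sample lines, not copied from the dictionary.
    pasted = [
        "桂枝汤 guì zhī tāng Cinnamon Twig",
        "Decoction",
        "四君子汤 sì jūn zǐ tāng Four Gentlemen Decoction",
    ]
    print(to_csv(pasted))
```

When saving the result for Excel, writing the file with `encoding="utf-8-sig"` usually helps Excel recognise the Chinese characters on opening.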

 

> You can only copy asian characters when the pdf-file has embedded

> asian characters. This C-E Dictionary pdf-document has embedded asian

> fonts (you can see in the pdf-file under Document Properties) and

> therefore you can copy the characters as true characters. The other two

> files you mentioned do not have this:

> http://www.paradigm-pubs.com/paradigm/refs/wiseman/AllChineseRef.pdf

> is a scanned document (let say a photocopy) and is only an image and

> does not have any characters in it, not even western characters.

> http://www.paradigm-pubs.com/paradigm/refs/wiseman/drugname.pdf does

> not have asian fonts embedded and therefore you cannot copy them as

> true characters. So it all depends on the way the pdf-file was created

> and whether the security settings at time of creation allows you to

> copy from the text.

 

As you indicated, none of the PDF Converters that I have tried can do

the job on those files.

 

> But as I said, the new version of his Chinese-English medical
> dictionary does allow you to copy and as Asian characters. I hope this
> was helpful for you. Best wishes, Alwin

 

Most helpful.

Many thanks,

Phil

 

 

 

 
