Discussion Groups / ESL Teaching / May 2005




Software to analyze digitized texts

Steve - 13 May 2005 09:28 GMT
Hi,

Occasionally I go to Project Gutenberg ( http://www.gutenberg.org ) for
public domain books that have been digitized and made available for
downloading. I thought it might be interesting to examine them using text
analysis software that would give me an idea of the difficulty level and
frequency count of words and phrases. I would also like to have the word
count and frequency results graphed or made available for export to Excel.

Is anyone familiar with a software package, freeware or otherwise, that can
handle this level of analysis and charting?

I did find the web site textalyser (
http://www.lexicool.com/text_analyzer.asp ) that has some of these
capabilities but no way to specify varying phrase lengths or export options
for charting nor does it handle book size jobs.

Thanks for the help, Steve.
credoquaabsurdum - 13 May 2005 14:37 GMT
> Is anyone familiar with a software package, freeware or otherwise, that can
> handle this level of analysis and charting?

I think the WordSmith Tools that Oxford distributes at oup.com are what
you're looking for, but you should check them out carefully before you
sign up. I might not have a real handle on what you need.

You might want to hit postgraduate student discussion groups at major
applied linguistics places like U. Edinburgh and Nottingham.

Good luck.
John  Ings - 13 May 2005 15:41 GMT
>Hi,
>
[quoted text clipped - 12 lines]
>capabilities but no way to specify varying phrase lengths or export options
>for charting nor does it handle book size jobs.

Is there any possibility that you might have the stones to try a
little programming? There are a variety of computer languages that do
this sort of thing. You might be on a learning curve for a while, but
you would end up having complete control of your results. Compilers,
interpreters and user manuals for these languages are available on the
net for free, so all you need invest is your time. PERL used to be the
language of choice: http://www.perl.com/ but it has been surpassed in
my opinion by PYTHON http://www.python.org

Chances are any software package that offers what you want will have
been written in one of those two, since such parsing and analysis is
their forte.
Lee Sau Dan - 13 May 2005 16:31 GMT
>>>>> "Steve" == Steve  <abc@123.com> writes:

   Steve> Hi, Occasionally I go to Project Gutenberg (
   Steve> http://www.gutenberg.org ) for public domain books that
   Steve> have been digitized and made available for downloading.

Digitized?  An *image* (e.g. in JPEG, GIF or PNG format) of scanned
book pages and documents is also in digitized form.  I hope you don't
mean that.  If you're talking about plain-text format, then you're in
better luck.

   Steve> I thought it might be interesting to examine them using
   Steve> text analysis software that would give me an idea of the
   Steve> difficulty level

No.  Unless you define  "difficulty level".  Computers can't read your
mind.  They  won't know  what you mean  by "difficult".  If  you don't
give them a clear and unambiguous mathematical formula for "difficulty
level", they can't do it.
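One example of such a formula is the Flesch Reading Ease score,
206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words),
where a higher score means easier text. Below is a minimal Python
sketch of it. The function names are made up for illustration, and the
syllable counter is only a crude vowel-group heuristic, so treat the
scores as rough estimates rather than exact values.

```python
import re

def count_syllables(word):
    # Crude heuristic (an assumption, not a dictionary lookup):
    # count runs of consecutive vowels, with a minimum of one.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    # Sentences are approximated as runs of text ending in . ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))
```

Running flesch_reading_ease() over the full text of two books gives two
numbers that can be compared, which is about as much as "difficulty
level" can mean to a program.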

   Steve> and frequency count of words and phrases.  

Frequency count of words is trivial.

Phrases...  that's more difficult, especially with ambiguous sentences.
("Time flies like an arrow; fruit flies like a banana."  How do you
break them up into phrases?)  Computers are pretty incapable of
dealing with ambiguities.
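One way to sidestep the parsing problem, if a "phrase" is allowed to
mean simply n consecutive words (an n-gram), is to count every run of
n words and let the frequent ones surface. A minimal Python sketch
under that assumption follows; the function name ngram_counts is
hypothetical, and the runs ignore sentence boundaries and punctuation.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    # Lower-case the text and keep only letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of n words over the list: zip stops at the
    # shortest slice, so only complete n-word runs are produced.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)
```

Calling it with n = 2, 3, 4 and so on gives the "varying phrase
lengths" asked for; ngram_counts(text, 2).most_common(20) lists the
twenty most frequent two-word runs.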

   Steve> I would also like to have the word count and frequency
   Steve> results graphed or made available for export to Excel.

   Steve> Is anyone familiar with a software package, freeware or
   Steve> otherwise, that can handle this level of analysis and
   Steve> charting?

Word frequency is trivial.  A simple unix command does it:

       cat *.txt | tr -cs 'a-zA-Z' '\n' | sort | uniq -c | sort -nr

would give you  the frequency of words, listed  in descending order of
frequency, appearing in the  files matching "*.txt".  Pipe the results
into a charting software (e.g. gnuplot) and you're done.
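For anyone not on Unix, roughly the same count can be sketched in
Python. This is a minimal illustration, not a finished tool: the
function name count_words is made up, words are taken to be runs of the
letters a-z after lower-casing, and accented characters are simply
dropped.

```python
import re
from collections import Counter

def count_words(lines, counts=None):
    # `lines` is any iterable of strings; an open text file works,
    # because iterating over a file yields one line at a time.
    counts = Counter() if counts is None else counts
    for line in lines:
        counts.update(re.findall(r"[a-z]+", line.lower()))
    return counts
```

To cover several files, create one Counter, open each file in turn and
pass it to count_words() together with that Counter; afterwards
counts.most_common() returns the words in descending order of
frequency.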

   Steve> I did find the web site textalyser (
   Steve> http://www.lexicool.com/text_analyzer.asp ) that has some
   Steve> of these capabilities but no way to specify varying phrase
   Steve> lengths or export options for charting nor does it handle
   Steve> book size jobs.

And I guess it won't do "difficulty level".  Right?

Lee Sau Dan

E-mail: danlee@informatik.uni-freiburg.de
Home page: http://www.informatik.uni-freiburg.de/~danlee

Jan - 14 May 2005 09:34 GMT
You can try

http://taporware.mcmaster.ca/~taporware/

for one-off jobs over the Internet, but I don't think it does
book-length texts.

For 'charting', if you're not into programming, find a program that
spits an index or word frequency count out in a text file and import it
into a spreadsheet (e.g. Excel).
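If you do end up with word counts inside a program, the file to import
can be a plain CSV, which Excel opens directly. Here is a minimal
Python sketch; the function name write_frequency_csv and the column
headers are just illustrative choices, and it assumes the counts are
already held in a dictionary of word -> count.

```python
import csv
from collections import Counter

def write_frequency_csv(counts, path):
    # "utf-8-sig" writes a byte-order mark so that Excel recognises
    # the file as UTF-8; newline="" is what the csv module expects.
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        # most_common() yields (word, count) pairs, highest count first.
        writer.writerows(Counter(counts).most_common())
```

For example, write_frequency_csv({"the": 3, "cat": 1}, "freq.csv")
produces a two-column file that can be charted in the spreadsheet.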

Do you have a Mac? I have a free program I downloaded that does word
counts, indexing, concordancing etc. and has managed reasonable-length
English teaching coursebooks, though it can't do right/left sorting. I
can never remember the name, but when I get to my Mac on Monday I can
let you have it if it'll help.

And finally, if you have a bit of money, you can find commercial
programs quite easily by searching Google.

Jan
Jan - 14 May 2005 09:38 GMT
PS You mentioned 'difficulty level'. Have a look at TextLadder:

http://www.readingenglish.net/software/

I have never had the time to sit down and plod through what it does,
but it sounds interesting enough. It does some kind of ordering of
texts to even out the number of new words in each text a student
encounters over a reading program.

Jan
steverossiter@sbcglobal.net - 16 May 2005 09:08 GMT
Thank you everyone for your help, Steve.
 