Sunday, April 19, 2009

Text Analysis

Thanks to Matt Bell, who today brought up the subject of the repetitive use of words, and to someone else who mentioned text analyzers, I was curious to run an analysis on a flash that I recently published at elimae, Tree Reader. This short text has attracted a bit of interest beyond elimae; hopefully more on that later.

The data below is not in table format (meaning no vertical alignment), so it is confusing but interesting. For one thing, out of 104 words, 93 are shown here as unique (I have not done any further work to verify this). For another, the Gunning-Fog Index for readability seems to be off the chart.

The most frequently used word in the piece is 'fell'.
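
For anyone who wants to try a count like this at home, here is a rough Python sketch. The stop-word list and the function name top_words are my own assumptions for illustration; I do not know what filter Textalyser actually applies, so its numbers may differ.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption, not Textalyser's real filter).
STOP_WORDS = {"the", "and", "of", "as", "in", "he", "to", "she",
              "or", "his", "on", "they", "a", "i"}

def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most common words in text, ignoring stop words."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Two fragments excerpted from the piece, joined for a quick check.
sample = "she fell in the aisle he fell over and slept and the ride fell quiet"
print(top_words(sample, n=1))
```

Run on the full text of the piece, a count along these lines should also put 'fell' first, though the exact ranking of the two-occurrence words depends on the stop-word list chosen.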

For comparison, a much longer nonfiction essay (1,800 words), composed as an educational tool, came back with a Gunning-Fog Index of 10.
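
Why the gap between the two scores? The standard Gunning-Fog formula is 0.4 times the sum of the average sentence length and the percentage of "complex" words (three or more syllables). The sketch below applies that formula to figures read off the report: one sentence of 197 words, and 13 + 2 = 15 words of three or more syllables. This is only an approximation under those assumptions; Textalyser's own counting rules are unknown to me, so it will not reproduce the reported 79.2 exactly.

```python
def gunning_fog(total_words, total_sentences, complex_words):
    """Standard Gunning-Fog formula:
    0.4 * (average sentence length + percentage of complex words)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("need at least one word and one sentence")
    avg_sentence_length = total_words / total_sentences
    percent_complex = 100 * complex_words / total_words
    return 0.4 * (avg_sentence_length + percent_complex)

# Figures taken from the report: a single 197-word sentence,
# with 15 words counted as having three or more syllables.
print(round(gunning_fog(197, 1, 15), 1))

# The same words, hypothetically broken into ten sentences.
print(round(gunning_fog(197, 10, 15), 1))
```

With a single unpunctuated sentence, the average sentence length alone pushes the index to roughly 82. Splitting the same words into ten sentences would bring it down to about 11, close to the essay's score, which suggests the off-the-chart result reflects the missing punctuation more than the vocabulary.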

~~~

Textalyser Results
The complete results, including complexity factor, and other features

Total word count : 104

Number of different words : 93
Complexity factor (Lexical Density) : 89.4%
Readability (Gunning-Fog Index) : (6-easy 20-hard) 79.2
Total number of characters : 1006
Number of characters without spaces : 567
Average Syllables per Word : 1.4
Sentence count : 1
Average sentence length (words) : 197
Max sentence length (words) : 197

(in furrowed bark of the old basswood tree as i read an older brother say ten with a younger sister on the train she was all over the place in and out of the seat she fell in the aisle and taunted him then bumped her elbows into commuters who smiled or winced or stared defiantly or shut their eyes to retreat to sleep if they saw nuisance or themselves reflected in their memory of childhood as a climber of trees as he said eat and they ate cold fries and paper wrapped hamburgers she yelped and whined they spilled dark brown soda while all he wanted was his own seat to sit he pushed her down and off of his head where she grabbed the leaves away from his own room and as the commuter train slid on the iron line further east the passengers thinned out stop by station stop until eventually the young boy got to sit alone he fell over and slept and the ride fell quiet as it passed the old basswood tree in the lawn of the cemetery as i read in the metallic flicker of sunlight on the afternoon window)

Min sentence length (words) : 0

Readability (Alternative) beta : (100-easy 20-hard, optimal 60-70) -111.5
Frequency and top words :
Word Occurrences Frequency Rank
fell 3 2.9% 1
sit 2 1.9% 2
train 2 1.9% 2
own 2 1.9% 2
stop 2 1.9% 2
seat 2 1.9% 2
tree 2 1.9% 2
old 2 1.9% 2
read 2 1.9% 2
basswood 2 1.9% 2
Word Length :
Word Length (characters) Word count Frequency
3 56 28.4%
2 38 19.3%
4 37 18.8%
5 19 9.6%
6 14 7.1%
7 12 6.1%
8 8 4.1%
9 5 2.5%
10 4 2%
1 4 2%
Syllable count :
Syllable count Word count Frequency
1 133 68.9%
2 45 23.3%
3 13 6.7%
4 2 1%
2 word phrases frequency :
Expression Expression count Frequency Prominence
in the 3 1.5% 31.6
on the 3 1.5% 40.3
of the 3 1.5% 63.3
to sit 2 1% 31.4
his own 2 1% 39.5
i read 2 1% 50.5
as i 2 1% 51
basswood tree 2 1% 53.6
old basswood 2 1% 54.1
the old 2 1% 54.6
3 word phrases frequency :
Expression Expression count Frequency Prominence
as i read 2 1% 50.8
old basswood tree 2 1% 53.8
the old basswood 2 1% 54.4
4 word phrases frequency :
Expression Expression count Frequency Prominence
the old basswood tree 2 1% 54.1
Unfiltered wordcount :
Expression Expression count Frequency Prominence
the 16 8.1% 39.6
and 9 4.6% 48.1
of 7 3.6% 50.8
as 6 3% 44.8
in 6 3% 57.5
he 4 2% 41.5
to 4 2% 50.4
she 4 2% 65.1
or 4 2% 71.1
fell 3 1.5% 37.9
his 3 1.5% 39.6
on 3 1.5% 40.6
they 3 1.5% 57.9
stop 2 1% 24.1
sit 2 1% 31.2
own 2 1% 39.3
read 2 1% 50.3
i 2 1% 50.8
over 2 1% 52
tree 2 1% 53.3
basswood 2 1% 53.8
old 2 1% 54.3
out 2 1% 54.8
her 2 1% 59.4
train 2 1% 59.9
seat 2 1% 63.5
was 2 1% 66.8
their 2 1% 67
all 2 1% 67.3
a 2 1% 76.1