KRISTEN'S BOARD
KB - a better class of pervert


Word Frequency Analysis Tool

Sweetums · 953


Sweetums

  • Total freak
  • *****
    • Posts: 762
    • Woos/Boos: +97/-1
    • Gender: Male
  • Disclaimer: I am an idiot. Don't listen to me.
on: December 23, 2020, 05:06:40 PM
I recently wrote a computer program that lists the words in a story from most frequent to least frequent, along with how many times each one appears. I've started using it to check my stories for overuse of words like "again," "however," "orgasm," and "cooter." Okay, I'm kidding about "cooter."

It skips pronouns, conjunctions, etc., so you can see the "meaty words."
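To give a feel for what a tool like this does, here is a minimal Python sketch. It is not Sweetums' code: the letters-only tokenizer and the tiny `STOPWORDS` set are assumptions for illustration, and a real filter list would be much longer.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tool would use a longer one.
STOPWORDS = {"a", "an", "and", "the", "he", "she", "it", "i", "you", "they",
             "we", "his", "her", "of", "to", "in", "on", "but", "or", "was", "is"}

def word_frequencies(text):
    """Return (word, count) pairs from most to least frequent, skipping stopwords."""
    # Assumption: a "word" is a run of letters, optionally with an internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

if __name__ == "__main__":
    story = "She sighed again. However, he smiled again, and she laughed again."
    for word, count in word_frequencies(story):
        print(f"{count:4d}  {word}")
```

On the sample sentence, "again" comes out on top with 3, and "she," "he," and "and" never appear because they are filtered out.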

As of this writing, I only have a binary for Ubuntu 18.04, but I'd be happy to share the source code or port it to other platforms if people want to use it. Also, you could just DM me your story and I'll send you back the histogram.


msslave

  • Co-POY 2019
  • Burnt at the stake
  • *******
    • Posts: 8,821
    • Woos/Boos: +1376/-3
    • Gender: Male
Reply #1 on: December 23, 2020, 05:11:10 PM
So...now there is an app for that. :D

Congrats on creating an aid for writers. You, sir, have skills.

WOO

Well trained and been made compliant....by my cat Neville


Hilda

  • Guest
Reply #2 on: February 06, 2022, 01:48:25 PM
Quote from: Sweetums on December 23, 2020, 05:06:40 PM
I recently wrote a computer program that produces a list of the words in a story in order of most frequent to least frequent and how many times they appear
[snip]
As of this writing, I only have a binary for Ubuntu 18.04, but I'd be happy to share the source code or port it to other platforms if people want to use it.

Apologies for the late response to this valuable message.

If any KB members don't use Ubuntu and can't take advantage of Sweetums' generous offer, there are quite a few word-count and concordance programs on GitHub, written in a number of languages (C++, Java, Python, PHP, Go, etc.). A web search will also turn up several apps for Windows.

I haven't used such tools myself, but I know many linguists who use them regularly for analyzing language patterns and keeping track of changes in language usage.

It's well worth exploring for authors who want to stand back and see their writing from a more objective point of view.



Vela Nanashi

  • Global Moderator
  • Total freak
  • ******
    • Posts: 690
    • Woos/Boos: +391/-2
  • Imagination > Reality
Reply #3 on: February 06, 2022, 02:44:33 PM
Woo to Sweetums for coding a useful program :)

My rusty coder brain thinks the core of the program needs something to break the text into words, then a hashmap or dictionary from string to int to do the counting, and then a remap of that hashmap/dictionary into a sorted map from integer to sorted sets of strings :) I assume eliminating capitalized words that aren't at the beginning of sentences would remove names, but seeing names might be useful too. Also, when mapping, it is probably best to store each string in lowercase (or uppercase) and keep, alongside the integer, a set of strings holding the different capitalizations of that word. Not sure if that is how you did it, Sweetums :)
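Vela's outline maps fairly directly onto Python's dictionaries. The sketch below is one hypothetical way to do it, not necessarily how Sweetums did it: `count_words` and `invert` are made-up names, the regex tokenizer is an assumption, and since Python has no built-in sorted map, `invert` sorts the count keys explicitly (in Java, a `TreeMap<Integer, TreeSet<String>>` would play that role).

```python
import re
from collections import defaultdict

def count_words(text):
    """Count words case-insensitively, remembering every capitalization seen."""
    counts = defaultdict(int)    # lowercase word -> number of occurrences
    variants = defaultdict(set)  # lowercase word -> spellings as they appeared
    for token in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
        key = token.lower()
        counts[key] += 1
        variants[key].add(token)
    return counts, variants

def invert(counts):
    """Remap word -> count into count -> sorted list of words, highest count first."""
    by_count = defaultdict(set)
    for word, n in counts.items():
        by_count[n].add(word)
    # Sorting the keys stands in for a sorted map; dicts keep insertion order.
    return {n: sorted(by_count[n]) for n in sorted(by_count, reverse=True)}

if __name__ == "__main__":
    text = "Again and again, Mary said AGAIN. Mary laughed."
    counts, variants = count_words(text)
    for n, words in invert(counts).items():
        for w in words:
            print(n, w, sorted(variants[w]))
```

Running it prints `3 again ['AGAIN', 'Again', 'again']` first, then `2 mary ['Mary']`, then the one-off words. Keeping the variants set is what lets you notice that "Mary" only ever appears capitalized, which hints that it is a name.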



Pornhubby

  • POY 2013
  • Super Freak
  • Burnt at the stake
  • ******
    • Posts: 7,492
    • Woos/Boos: +1608/-24
  • Ph.D in Perversity a/k/a_ToeinH2O
Reply #4 on: February 06, 2022, 05:22:15 PM
I think the concept of deconstructing stories is fascinating. There is a writer at the New York Times who took all of her old diaries, several years' worth, put them into an Excel spreadsheet, and sorted her sentences alphabetically. Then she created new chapters, keeping the alphabetical sentence order. It makes for a fascinating read, because you see her patterns of speech used over and over again in so many different contexts. And even though one sentence might have been written years after the one before it, there is a sort of continuity to it that you would not expect.

“You can be mad as a mad dog at the way things went. You can swear and curse the fates. But when it comes to the end, you have to let go.” — The Curious Case of Benjamin Button


Sweetums

  • Total freak
  • *****
    • Posts: 762
    • Woos/Boos: +97/-1
    • Gender: Male
  • Disclaimer: I am an idiot. Don't listen to me.
Reply #5 on: May 26, 2022, 06:47:48 AM
Quote from: Vela Nanashi on February 06, 2022, 02:44:33 PM
Woo to Sweetums for coding a useful program :)

My rusty coder brain thinks that the core of the program requires something to break the text into words, then a hashmap or dictionary from string to int and then do counts that way, then remap the hashmap/dictionary to a sorted type of map, that maps integer to sorted sets of strings :) I assume eliminating capitalized words not in the beginning of sentences would allow removal of names, but seeing names might be useful too. Also when mapping it is probably best to store strings as to lowercase or uppercase, and keep with the integer a string set to hold the variations of that strings capitalizations. Not sure if that is how you did it Sweetums :)

That’s how I did it. I also have a list of common words like “and” that get filtered out.

Based on what this tool taught me about my writing, I wrote a more sophisticated tool that points out some of my own bad habits. It flags sentences that are more than twenty words long. It flags any proper name I use more than once in a paragraph. Things like that.
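A checker like the one described can be sketched in a few dozen lines. This is a hypothetical Python version, not Sweetums' tool: paragraphs are assumed to be separated by blank lines, sentence splitting is naive, and proper names are guessed with the heuristic Vela suggested (capitalized words that don't start a sentence), so dialogue and headings will produce some false positives.

```python
import re
from collections import Counter

MAX_SENTENCE_WORDS = 20  # "more than twenty words long", per the post

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def split_paragraphs(text):
    # Assumption: paragraphs are separated by blank lines.
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    # Naive: split after ., ! or ? plus whitespace. Abbreviations like "Dr." will fool it.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]

def guess_names(text):
    """Capitalized words that don't start a sentence are guessed to be names."""
    names = set()
    for paragraph in split_paragraphs(text):
        for sentence in split_sentences(paragraph):
            for w in WORD_RE.findall(sentence)[1:]:
                if w[0].isupper() and w != "I":
                    names.add(w)
    return names

def check(text):
    """Return (paragraph number, warning) pairs for long sentences and repeated names."""
    names = guess_names(text)
    warnings = []
    for p_num, paragraph in enumerate(split_paragraphs(text), start=1):
        name_counts = Counter()
        for sentence in split_sentences(paragraph):
            ws = WORD_RE.findall(sentence)
            if len(ws) > MAX_SENTENCE_WORDS:
                warnings.append((p_num, f"{len(ws)}-word sentence: {sentence[:30]}..."))
            # Count every occurrence of a guessed name, even at the start of a sentence.
            name_counts.update(w for w in ws if w in names)
        for name, n in sorted(name_counts.items()):
            if n > 1:
                warnings.append((p_num, f"'{name}' used {n} times"))
    return warnings

if __name__ == "__main__":
    sample = ("Mary smiled at Tom. Then Mary left.\n\n"
              "It was one of those long, lazy afternoons when nothing at all seemed to happen "
              "and everyone in the house just waited for the rain to stop.")
    for p_num, message in check(sample):
        print(f"paragraph {p_num}: {message}")
```

On the sample, it reports that "Mary" is used twice in paragraph 1 and that paragraph 2 contains a 27-word sentence. Names are collected from the whole text first so that a name opening a sentence ("Mary smiled...") still gets counted.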