Tag Archives: Blogging

A Brief History of ASR: Automatic Speech Recognition

Dear readers, I apologize for the month-long hiatus, but I assure you that it was much needed. Today’s post is a guest post. If you’re interested in writing one of these, please reach out to me via my contact page.

This article was originally published at Descript.

This moment has been a long time coming. The technology behind speech recognition has been in development for over half a century, going through several periods of intense promise — and disappointment. So what changed to make ASR viable in commercial applications? And what exactly could these systems accomplish, long before any of us had heard of Siri?

The story of speech recognition is as much about the application of different approaches as the development of raw technology, though the two are inextricably linked. Over a period of decades, researchers would conceive of myriad ways to dissect language: by sounds, by structure — and with statistics.

Early Days

Human interest in recognizing and synthesizing speech dates back hundreds of years (at least!) — but it wasn’t until the mid-20th century that our forebears built something recognizable as ASR.

1961 — IBM Shoebox

Among the earliest projects was a “digit recognizer” called Audrey, created by researchers at Bell Laboratories in 1952. Audrey could recognize spoken numerical digits by looking for audio fingerprints called formants — the distilled essences of sounds.

In the 1960s, IBM developed Shoebox — a system that could recognize digits and arithmetic commands like “plus” and “total”. Better yet, Shoebox could pass the math problem to an adding machine, which would calculate and print the answer.

Meanwhile, researchers in Japan built hardware that could recognize the constituent parts of speech, like vowels; other systems could evaluate the structure of speech to figure out where a word might end. And a team at University College London built a system that could recognize 4 vowels and 9 consonants by analyzing phonemes, the discrete sounds of a language.

But while the field was taking incremental steps forward, it wasn’t necessarily clear where the path was heading. And then: disaster.

October 1969 The Journal of the Acoustical Society of America

A Piercing Freeze

The turning point came in the form of a letter written by John R. Pierce in 1969.

Pierce had long since established himself as an engineer of international renown; among other achievements, he coined the word transistor (now ubiquitous in engineering) and helped launch Echo I, one of the first communications satellites. By 1969 he was an executive at Bell Labs, which had invested extensively in the development of speech recognition.

In an open letter³ published in The Journal of the Acoustical Society of America, Pierce laid out his concerns. Citing a “lush” funding environment in the aftermath of World War II and Sputnik, and the lack of accountability that came with it, Pierce admonished the field for its lack of scientific rigor, asserting that there was too much wild experimentation going on:

“We all believe that a science of speech is possible, despite the scarcity in the field of people who behave like scientists and of results that look like science.” — J.R. Pierce, 1969

Pierce put his employer’s money where his mouth was: he defunded Bell’s ASR programs, which wouldn’t be reinstated until after he resigned in 1971.

Progress Continues

Thankfully there was more optimism elsewhere. In the early 1970s, the U.S. Department of Defense’s ARPA (the agency now known as DARPA) funded a five-year program called Speech Understanding Research. This led to the creation of several new ASR systems, the most successful of which was Carnegie Mellon University’s Harpy, which could recognize just over 1000 words by 1976.

Meanwhile efforts from IBM and AT&T’s Bell Laboratories pushed the technology toward possible commercial applications. IBM prioritized speech transcription in the context of office correspondence, and Bell was concerned with ‘command and control’ scenarios: the precursors to the voice dialing and automated phone trees we know today.

Despite this progress, by the end of the 1970s ASR was still a long way from being viable for anything but highly specific use cases.

The ‘80s: Markovs and More

A key turning point came with the popularization of Hidden Markov Models (HMMs) in the mid-1980s. This approach represented a significant shift “from simple pattern recognition methods, based on templates and a spectral distance measure, to a statistical method for speech processing”, which translated to a leap forward in accuracy.

A large part of the improvement in speech recognition systems since the late 1960s is due to the power of this statistical approach, coupled with the advances in computer technology necessary to implement HMMs.
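To make the statistical idea a little more concrete, here is a minimal sketch of Viterbi decoding, a standard way of finding the most likely sequence of hidden states in an HMM. Everything in it is made up for illustration: the two vowel-like states, the coarse "low"/"high" acoustic observations, and all of the probabilities are assumptions, and it is not meant to represent how any of the systems described here were actually built.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # Each column maps state -> (log-probability of the best path ending
    # in that state, the previous state on that path).
    first = observations[0]
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]

    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the best predecessor state for landing in state s.
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)

    # Trace back from the best final state to recover the whole path.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


# Hypothetical toy model: two vowel sounds and a two-valued acoustic feature.
states = ["ah", "ee"]
start_p = {"ah": 0.5, "ee": 0.5}
trans_p = {"ah": {"ah": 0.8, "ee": 0.2}, "ee": {"ah": 0.2, "ee": 0.8}}
emit_p = {"ah": {"low": 0.9, "high": 0.1}, "ee": {"low": 0.2, "high": 0.8}}

print(viterbi(["low", "low", "high", "high"], states, start_p, trans_p, emit_p))
# ['ah', 'ah', 'ee', 'ee']
```

The point of the sketch is the shift the quote describes: rather than matching audio against a stored template, the model scores every possible state sequence by probability and keeps the best one.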

HMMs took the industry by storm — but they were no overnight success. Jim Baker first applied them to speech recognition in the early 1970s at CMU, and the models themselves had been described by Leonard E. Baum in the ‘60s. It wasn’t until 1980, when Jack Ferguson gave a set of illuminating lectures at the Institute for Defense Analyses, that the technique began to disseminate more widely.

The success of HMMs validated the work of Frederick Jelinek at IBM’s Watson Research Center, who since the early 1970s had advocated for the use of statistical models to interpret speech, rather than trying to get computers to mimic the way humans digest language: through meaning, syntax, and grammar (a common approach at the time). As Jelinek later put it: “Airplanes don’t flap their wings.”

These data-driven approaches also facilitated progress that had as much to do with industry collaboration and accountability as individual eureka moments. With the increasing popularity of statistical models, the ASR field began coalescing around a suite of tests that would provide a standardized benchmark to compare against. This was further encouraged by the release of shared data sets: large corpora of data that researchers could use to train and test their models.

In other words: finally, there was an (imperfect) way to measure and compare success.

November 1990, Infoworld

Consumer Availability — The ‘90s

For better and worse, the 90s introduced consumers to automatic speech recognition in a form we’d recognize today. Dragon Dictate launched in 1990 for a staggering $9,000, touting a dictionary of 80,000 words and features like natural language processing (see the Infoworld article above).

These tools were time-consuming (the article claims otherwise, but Dragon became known for prompting users to ‘train’ the dictation software to their own voice). They also required users to speak in a stilted manner: Dragon could initially recognize only 30–40 words a minute; people typically talk around four times faster than that.

But it worked well enough for Dragon to grow into a business with hundreds of employees and customers spanning healthcare, law, and more. In 1997 the company introduced Dragon NaturallySpeaking, which could capture words at a more fluid pace and, at $150, carried a much lower price tag.

Even so, there may have been as many grumbles as squeals of delight: to the degree that there is consumer skepticism around ASR today, some of the credit should go to the over-enthusiastic marketing of these early products. But without the efforts of industry pioneers James and Janet Baker (who founded Dragon Systems in 1982), the productization of ASR may have taken much longer.

November 1993, IEEE Communications Magazine

Whither Speech Recognition— The Sequel

25 years after J.R. Pierce’s paper was published, the IEEE published a follow-up titled Whither Speech Recognition: the Next 25 Years⁵, authored by two senior employees of Bell Laboratories (the same institution where Pierce worked).

The latter article surveys the state of the industry circa 1993, when the paper was published — and serves as a sort of rebuttal to the pessimism of the original. Among its takeaways:

  • The key issue with Pierce’s letter was his assumption that in order for speech recognition to become useful, computers would need to comprehend what words mean. Given the technology of the time, this was completely infeasible.
  • In a sense, Pierce was right: by 1993 computers had meager understanding of language—and in 2018, they’re still notoriously bad at discerning meaning.
  • Pierce’s mistake lay in his failure to anticipate the myriad ways speech recognition can be useful, even when the computer doesn’t know what the words actually mean.

The Whither sequel ends with a prognosis, forecasting where ASR would head in the years after 1993. The section is couched in cheeky hedges (“We confidently predict that at least one of these eight predictions will turn out to have been incorrect”) — but it’s intriguing all the same. Among their eight predictions:

  • “By the year 2000, more people will get remote information via voice dialogues than by typing commands on computer keyboards to access remote databases.”
  • “People will learn to modify their speech habits to use speech recognition devices, just as they have changed their speaking behavior to leave messages on answering machines. Even though they will learn how to use this technology, people will always complain about speech recognizers.”

The Dark Horse

In a forthcoming installment in this series, we’ll be exploring more recent developments and the current state of automatic speech recognition. Spoiler alert: neural networks have played a starring role.

But neural networks are actually as old as most of the approaches described here — they were introduced in the 1950s! It wasn’t until the computational power of the modern era (along with much larger data sets) that they changed the landscape.

But we’re getting ahead of ourselves. Stay tuned for our next post on Automatic Speech Recognition by following Descript on Medium, Twitter, or Facebook.

Timeline via Juang & Rabiner

Regina Bethory is a fiction author. She graduated from Christopher Newport University with a Bachelor’s in Directing and Play Writing and from Newport News Shipbuilding’s Apprentice School as a Test Electrician. She also has a degree in Funeral Services. As an avid minimalist and traveler, she enjoys spending her time learning new things, seeking new experiences and de-cluttering. When she is not writing, she can often be found in comic book stores and early morning matinees.

July Camp NaNoWriMo 2018: Blog Challenge Complete

Dear readers, the July Camp NaNoWriMo 2018 has come to an end and with it, my self-imposed blog-a-day challenge. I have to say, when I first got the idea for this challenge it was about three days before the start of July. It seems like yesterday. I was so afraid that I wouldn’t be able to keep up or that I’d run out of ideas. However, thanks to a remarkable camp cabin and all of you, I’ve been able to persevere.

What I Learned During July Camp NaNoWriMo 2018

Above all, I learned that I am more than capable of writing over 50,000 words in a month. In fact, much like my high school years of running cross country, I find myself crossing the finish line thinking that I could have pushed myself harder. There were nights I came home from work and the last thing I wanted to do was sit in front of a computer screen, but I found a way. There were days that I could’ve gotten ahead by writing multiple blog posts in my spare time, but I didn’t.

This month has proved to me the importance of the phrase “slow and steady wins the race.” Too often I get the notion in my head that I can sit down and dictate an entire novel’s rough draft in a weekend. While I’m sure it’s possible, the result wouldn’t be the greatest to edit. There is something very satisfying about seeing that NaNoWriMo progress bar go up a little each day. (I’ve been trying to create my own spreadsheet in MS Excel to track my words off-season. Any suggestions are appreciated in the comments below!)

Overall, I had a blast this month and proved to myself that I am capable of accomplishing what I set my mind to. While it’s something that I’ve been aware of before, sometimes we all need a little reminding.

What’s Next?

While I do plan to regularly post on my blog, going forward I will no longer be posting every single day. I’m sure my subscribers will be thankful to give their inboxes a break! I do look forward to spending more time on my fiction and sharing pieces with my patrons.

My novel, In Articulo Mortis, will be released for Kindle at the end of August and in paperback in September. I will be making a few promotional posts and sharing excerpts on my Patreon page. Other than that, I plan to continue my travel and minimalism blog posts. I will also be accepting guest posts from other bloggers.

In addition, I’d like to start doing an “Author Spotlight” series. Perhaps once a month? Feel free to leave any suggestions or input in the comments below.

How Was Your July?

If you participated in July Camp NaNoWriMo 2018, how did it go for you? What did you learn from the experience? If you’re not a writer or didn’t participate, that’s OK! Please feel free to share your successes and stumbling blocks this month in the comments below!

Thanks for sticking with me!

-RB

Overcoming Writer’s Block with Automatic Transcription

This article was originally published by Descript.

If you’re a writer — of books, essays, scripts, blog posts, whatever — you’re familiar with the phenomenon: the blank screen, a looming deadline, and a sinking feeling in your gut that pairs poorly with the jug of coffee you drank earlier.

If you know that rumble all too well: this post is for you. Maybe it’ll help you get out of a rut; at the very least, it’s good for a few minutes of procrastination.

Here’s the core idea: thinking out loud is often less arduous than writing. And it’s now easier than ever to combine the two, thanks to recent advances in speech recognition technology.

Of course, dictation is nothing new, and plenty of writers have taken advantage of it. Carl Sagan’s voluminous output was facilitated by his process of speaking into an audio recorder, to be transcribed later by an assistant (you can listen to some of his dictations in the Library of Congress!). And software like Dragon NaturallySpeaking has offered automated transcription for people with the patience and budget to pursue it.

But it’s only in the last couple of years that automated transcription has reached a sweet spot — of convenience, affordability and accuracy—that makes it practical to use it more casually. And I’ve found it increasingly useful for generating a sort of proto-first draft: an alternative approach to the painful process of converting the nebulous wisps inside your head into something you can actually work with.

I call this process idea extraction (though these ideas may be more accurately dubbed brain droppings).

Part I: Extraction

Here’s how my process works. Borrow what works for you and forget the rest — and let me know how it goes!

  • Pick a voice recorder. Start talking. Try it with a topic you’ve been chewing on for weeks, or whenever an idea flits through your head. Don’t overthink it. Just start blabbing.
  • The goal is to tug on as many threads as you come across, and to follow them as far as they go. These threads may lead to meandering tangents— and you may discover new ideas along the way.
  • A lot of those new ideas will probably be embarrassingly bad. That’s fine. You’re already talking about the next thing! And unlike with text, your bad ideas aren’t staring you in the face.
  • Consider leaving comments to yourself as you go — e.g. “Maybe that’d work for the intro”. These will come in handy later.
  • For me, these recordings run anywhere from 20–80 minutes. Sometimes they’re much shorter, in quick succession. Whatever works.

Part II: Transcription

Once I’ve finished recording, it’s time to harness ⚡️The Power of Technology⚡️

A little background: over the last couple of years there’s been an explosion of tools related to automatic speech recognition (ASR) thanks to huge steps forward in the underlying technologies.

Here’s how ASR works: you import your audio into the software, and the software uses state-of-the-art machine learning to spit back a text transcript a few minutes later. That transcript won’t be perfect; the robots are currently in the ‘Write drunk’ phase of their careers. But for our purposes that’s fine: you just need it to be accurate enough that you can recognize your ideas.

Once you have your text transcript, your next step is up to you: maybe you’re exporting your transcript as a Word doc and revising from there. Maybe you’re firing up your voice recorder again to dictate a more polished take. Maybe only a few words in your audio journey are worth keeping — but that’s fine too. It probably didn’t cost you much (and good news: the price for this tech will continue to fall in the years ahead).

A few more tips:

  • Use a recorder/app that you trust. Losing a recording is painful — and the anxiety of losing another can derail your most exciting creative moments (“I hope this recorder is working. Good, it is… @#*! where was I?”)
  • Audio quality matters when it comes to automatic transcription. If your recording has a lot of background noise or you’re speaking far away from the mic, the accuracy is going to drop. Consider using earbuds (better yet: AirPods) so you can worry less about where you’re holding the recorder.
  • Find a comfortable space. Eventually you may get used to having people overhear your musings, but it’s a lot easier to let your mind “go for a walk” when you’re comfortable in your environment.
  • Speaking of walking: why not go for a stroll? The pains of writing can have just as much to do with being stationary and hunched over. Walking gets your blood flowing — and your ideas too.
  • I have a lot of ideas, good and bad, while I’m thinking out loud and playing music at the same time (in my case, guitar, though I suspect it applies more broadly). There’s something about playing the same four-chord song on autopilot for the thousandth time that keeps my hands busy and leaves my mind free to wander.

The old ways of doing things — whether it’s with a keyboard or pen — still have their advantages. Putting words to a page can force a sort of linear thinking that is otherwise difficult to maintain. And when it comes to editing, it’s no contest: QWERTY or bust.

But for getting those first crucial paragraphs down (and maybe a few keystone ideas to build towards)? Consider talking to yourself. Even if you wind up with a transcript full of nothing but profanity — well, have you ever seen a transcript full of profanity? You could do a lot worse.

Joining in on Writing Prompts: Organs

I’ve never been big on writing prompts, but I suppose that is ignorant of me to say, because I don’t think I’ve ever participated in one. During this month’s Camp NaNoWriMo, I’ve been privileged to have other bloggers in my cabin. One of my cabin mates, Amelia, runs a blog called You Can Always Start Now, in which she often participates in weekly writing prompts. Many of the prompts come from another author and blogger, Linda, on her blog Life in Progress.

This week’s writing prompt for “Stream of Consciousness Saturday” (#SoCS) focused on the topic of “organs.” Since I am a day behind already, I have read both of their responses, and while both are unique and interesting, my subconscious has led me down a third path. Here is what I wrote for the prompt:

Organs. The first thing that comes to mind is a book I’m reading about being a mortuary technician. Think of all the nasty stuff they have to take out. The book is called “Down Among the Dead Men” by Michelle Williams and I just finished a chapter where she wrote about being careful when slicing a body down the sternum because you don’t want to rupture the stomach and have all of that disgusting-ness spill out.

The second thought that comes to mind is Egyptian canopic jars. I love studying ancient cultures, especially ancient Egypt. They seemed so advanced and yet somehow, we lost all of that wisdom and technology. It baffles me as to how. It also baffles me as to why they thought that the lungs, intestines, stomach and liver were needed in the afterlife. At least those are the organs I think the jars were used to protect. Apparently digestion is important in the afterlife. Take note, mortals!

How about you, fellow mortal? Care to join in on a writing prompt?

Happy writing!

-RB

NaNoWriMo: How to Increase Word Count

The end of July’s NaNoWriMo Camp for 2018 is fast approaching. And with that in mind many writers are looking for ways to increase word count. Myself included. I don’t know what the weather is like where all of you are living but for me I am headed into a weekend of heavy downpours and cloudy skies. In other words, perfect writing weather.

I figured for today a good blog post would focus on ways that we could all increase word count. Next week is the final inning… The home stretch. Personally, I’m about 13,000 words away from my monthly goal of 50,000 words. However, I have spent most of my writing this month on my blog and my morning pages as opposed to working on my WIP. With that in mind, I’m hoping to have an overly productive weekend of words, words, and more words. But we all know how planning for a productive weekend goes. It often results in getting nothing done. With that being said, let’s help one another cross the finish line using some of these prompts and ideas.
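If you like to see the math behind the home stretch, here is a rough sketch of a daily-pace calculation. The 50,000-word goal and the 13,000 words still to go come from the paragraph above; the five remaining days and the function name `words_per_day` are assumptions for illustration only, so plug in your own numbers.

```python
import math


def words_per_day(goal, written, days_left):
    """Words needed on each remaining day to reach the goal, rounded up."""
    if days_left <= 0:
        raise ValueError("days_left must be positive")
    remaining = max(goal - written, 0)  # never negative once the goal is met
    return math.ceil(remaining / days_left)


# 50,000-word goal with about 13,000 words to go means roughly 37,000 written.
# The 5 days left is a hypothetical figure, not one from the post.
print(words_per_day(goal=50_000, written=37_000, days_left=5))  # 2600
```

Rounding up means the daily target never leaves you a few words short on the last day.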

Tips, Tools and Tricks to Increase Word Count

The Harry Potter Word Crawls

I should have saved the best for last but seriously, this one is just too good. If you’re a Harry Potter fan and you haven’t heard of these, you’re missing out. A forum on Reddit has a complete list of links to all of the word count crawls. There has never been a more magical way to increase word count.

Write from All 5 Senses

For real. Go back into every scene and use more description. What are the characters smelling? Is it pleasant? Is it malodorous? What are they seeing? Use adjectives like they are going out of style. You will come back to edit and clean it up later. For now, I expect you to be describing mole hairs. Describe every sound…even the quietest places have sound. For example, my home is quiet right now but I can hear the AC running, my fingers on the keyboard and water trickling from the turtle tank filter. Leave no stone unturned!

Kill a Character – Or Several!

I’m talking Game of Thrones style! Kill three main characters off at once. Take no prisoners. Sacrifice your lambs. BURN THEM ALL! Or you could settle for torturing one of them, brainwashing him, then castrating him. Your choice.

Introduce a Character – Or Several!

I guess this could also be Game of Thrones style as that series has so freakin’ many!

Word Sprints

Word sprints, as painful as they can be, really do help. Why? Because they don’t allow you any time to think about what you’re doing. Even as someone who is a “planner,” when I’m forced to try to write as much as I can within a certain time frame, I start coming up with all sorts of crazy goodness. And by the time the buzzer goes off, I usually want to keep going. Embrace that and run with it. That scene might not make it into the final draft but it counts for this month.

You don’t have to have an account on Twitter or Facebook to participate in them either. Host your own within your cabin! That’s what my kick-ass cabin does! I almost feel like we should have a team name…

Write from a Different Medium

Sometimes I type. Other times I write long-hand and sometimes I use dictation software. Each method has its own pros and cons. (Can we say new blog post topic?) All levity aside, don’t be afraid to switch things up. If I get tired of staring at the computer screen and feel stuck or don’t know what to write next or how to write what’s next, I get up and move. That’s when I go lie on the couch or the bed with a notebook and start writing by hand.

Perhaps I know what I want to write and I’ve got the whole scene worked out in my head but it’s so long and my fingers are exhausted. Then I sit at my desk and turn on the microphone. I use Dragon NaturallySpeaking. I’m sure there are many other dictation programs out there, but this is what works best for me. Sometimes I catch myself rambling but it’s a great way to get the words out quickly!

I hope these tips help you reach your goals for camp this year. And please check out those Harry Potter Word Crawls! They are entertaining.

Happy Writing!

-RB
