Tag Archives: Voice control

Windows speech recognition

This week, as I started my MBA, I was reminded of Windows Speech Recognition as a way of writing and commenting on documents. I had never really got into it, and people kept telling me it could save me a lot of time, so I thought I would try it out and give a short summary of my experience here.

Setting up speech recognition was very easy: it took me about 30 seconds to find the right settings and switch it on. The next step was the built-in tutorial, which took a lot longer (about 30 minutes). The tutorial walks you through many of the actions you can perform with speech recognition in Windows. Many of them are very intuitive, and while you go through them the system starts learning how to interpret your voice. I felt pretty confident after finishing the tutorial, but I have now been stuck on this post for over 10 minutes, as I was trying to post it using speech recognition alone. Admittedly, I have only been using voice recognition for about 45 minutes, and things are improving as I work through this post, but it is still not where I would like it to be. I will give it a fair trial for writing long documents, but I am not convinced yet.

I am keeping this post short so that I can honestly say I wrote it entirely using speech recognition. I just hope my experience will not stay as bad as it has been for the past hour, as I do not think I can keep my calm for much longer.

Android voice control

I have been playing with Android development for quite a while, but two weeks ago I finally finished my first application. I found Text-to-Speech and Speech-to-Text surprisingly easy to integrate and thought everyone could benefit from a few snippets, so here is my code (almost all of it is in a single class):


package com.findarato.cyanide;

import java.util.ArrayList;
import java.util.HashMap;

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.tts.TextToSpeech;
import android.speech.tts.TextToSpeech.OnInitListener;
import android.speech.tts.TextToSpeech.OnUtteranceCompletedListener;
import android.view.View;
import android.view.View.OnClickListener;
import android.widget.Button;
import android.widget.EditText;

// networkRequest (used below to send START/STOP to the robot server) is not part of this listing.
public class CyanideRobotActivity extends Activity implements OnClickListener, OnInitListener, OnUtteranceCompletedListener {

    private static final int VOICE_RECOGNITION_REQUEST_CODE = 12345;
    // Utterance ID used to detect when the spoken prompt has finished playing.
    private static final String UTTERANCE_START_RECOGNITION = "start voice recognition";

    EditText server = null;
    EditText port = null;

    TextToSpeech tts = null;

    /** Called when the activity is first created. */
    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.main_layout);

        final Button buttonStart = (Button) findViewById(R.id.button_start);
        final Button buttonStop = (Button) findViewById(R.id.button_stop);
        final Button buttonSpeech = (Button) findViewById(R.id.button_speech);
        // The TTS engine initialises asynchronously and calls onInit() when it is ready.
        tts = new TextToSpeech(this, this);
        server = (EditText) findViewById(R.id.text_ip);
        port = (EditText) findViewById(R.id.text_port);

        port.setText("9002");

        buttonStart.setOnClickListener(this);
        buttonStop.setOnClickListener(this);
        buttonSpeech.setOnClickListener(this);
    }

    @Override
    protected void onDestroy() {
        // Release the text-to-speech engine when the activity goes away.
        if (tts != null) {
            tts.shutdown();
        }
        super.onDestroy();
    }

    // Launches the system speech recognizer; the results arrive in onActivityResult().
    private void startVoiceRecognitionActivity() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, getClass().getPackage().getName());
        intent.putExtra(RecognizerIntent.EXTRA_PROMPT, "Please tell the robot what to do.");
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 20);
        startActivityForResult(intent, VOICE_RECOGNITION_REQUEST_CODE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);

        if (requestCode != VOICE_RECOGNITION_REQUEST_CODE || resultCode != RESULT_OK || data == null) {
            return;
        }

        String serverString = server.getText().toString();
        int portInt = Integer.parseInt(port.getText().toString());

        // The recognizer returns several candidate transcriptions, generally most confident first.
        // Act on the first one that matches a known command and then stop looking, so that
        // e.g. "start" and "start robot" in the same list do not send START twice.
        ArrayList<String> matches = data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
        if (matches == null) {
            return;
        }
        for (String match : matches) {
            if (match.equalsIgnoreCase("start robot") || match.equalsIgnoreCase("start") || match.equalsIgnoreCase("start cleaning")) {
                tts.speak("Starting cleaning now", TextToSpeech.QUEUE_FLUSH, null);
                new networkRequest(serverString, portInt, "START").execute();
                break;
            } else if (match.equalsIgnoreCase("stop robot") || match.equalsIgnoreCase("stop") || match.equalsIgnoreCase("stop cleaning")) {
                tts.speak("Stopping cleaning now, returning to my charging dock.", TextToSpeech.QUEUE_FLUSH, null);
                new networkRequest(serverString, portInt, "STOP", true).execute();
                break;
            }
        }
    }

    @Override
    public void onClick(View v) {
        String serverString = server.getText().toString();
        int portInt = Integer.parseInt(port.getText().toString());

        switch (v.getId()) {
            case R.id.button_start:
                new networkRequest(serverString, portInt, "START").execute();
                break;
            case R.id.button_stop:
                new networkRequest(serverString, portInt, "STOP").execute();
                break;
            case R.id.button_speech:
                // Speak a prompt first; onUtteranceCompleted() starts recognition once it has finished.
                HashMap<String, String> extra = new HashMap<String, String>();
                extra.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, UTTERANCE_START_RECOGNITION);
                tts.speak("Hello master, what would you like me to do?", TextToSpeech.QUEUE_ADD, extra);
                break;
            default:
                break;
        }
    }

    @Override
    public void onInit(int status) {
        // Register the completion listener once the engine has initialised successfully.
        if (status == TextToSpeech.SUCCESS) {
            tts.setOnUtteranceCompletedListener(this);
        }
    }

    @Override
    public void onUtteranceCompleted(String utteranceId) {
        // This callback is not guaranteed to run on the UI thread, so switch to it before starting an activity.
        if (UTTERANCE_START_RECOGNITION.equals(utteranceId)) {
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    startVoiceRecognitionActivity();
                }
            });
        }
    }
}
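The listing relies on networkRequest, whose source is not part of this post. Purely as an illustration, here is a hypothetical plain-Java stand-in. It assumes the robot server accepts a single newline-terminated ASCII command (such as START or STOP) per TCP connection; the class name, the timeout and the wire format are all assumptions, and the extra boolean argument passed for STOP in the voice branch is not modelled. On recent Android versions a blocking call like this must run off the main thread (the platform throws NetworkOnMainThreadException otherwise), which is presumably why networkRequest is started with execute().

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stand-in for the post's networkRequest class (not shown in the post).
 * Assumed protocol: one newline-terminated ASCII command per TCP connection.
 */
final class RobotCommandSender {

    // Assumed value; the original app's timeout (if any) is unknown.
    private static final int CONNECT_TIMEOUT_MS = 3000;

    private RobotCommandSender() {}

    /**
     * Connects to host:port, writes the command followed by '\n', and closes the connection.
     * This call blocks: on Android, run it on a background thread.
     */
    static void send(String host, int port, String command) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MS);
            Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.US_ASCII);
            out.write(command);
            out.write('\n');
            out.flush();
        } // closing the socket also closes its output stream
    }
}
```

Opening a fresh connection per command keeps the sketch stateless; a real robot server might instead expect one long-lived connection.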

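The interesting parts are the startVoiceRecognitionActivity() method and the R.id.button_speech case of the switch statement in onClick(View v). That case speaks a prompt tagged with an utterance ID; onUtteranceCompleted() launches the recognizer once the prompt has finished, and onActivityResult() compares the returned candidates against the known commands.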
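If you want to test the command matching without a device, the phrase comparison in onActivityResult() can be pulled out into plain Java. The sketch below is a hypothetical refactoring, not code from the app or the repo: VoiceCommandMatcher, Command and match() are illustrative names, the phrase lists are copied from the listing, and it assumes the candidate list is ordered from most to least confident, as the recognizer generally returns it. Unlike the original comparison it also ignores surrounding whitespace, and it needs Java 8 (or Android API 24+) for Optional.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper: maps speech recognizer candidates to a robot command. */
final class VoiceCommandMatcher {

    /** The two commands the app sends; the names mirror the "START"/"STOP" strings in the listing. */
    enum Command { START, STOP }

    // Phrases copied from onActivityResult() in the listing, stored in lower case.
    private static final List<String> START_PHRASES = Arrays.asList("start robot", "start", "start cleaning");
    private static final List<String> STOP_PHRASES = Arrays.asList("stop robot", "stop", "stop cleaning");

    private VoiceCommandMatcher() {}

    /**
     * Returns the command for the first candidate that matches a known phrase
     * (ignoring case and surrounding whitespace), or Optional.empty() if none does.
     */
    static Optional<Command> match(List<String> candidates) {
        for (String candidate : candidates) {
            String normalized = candidate.trim().toLowerCase(Locale.ROOT);
            if (START_PHRASES.contains(normalized)) {
                return Optional.of(Command.START);
            }
            if (STOP_PHRASES.contains(normalized)) {
                return Optional.of(Command.STOP);
            }
        }
        return Optional.empty();
    }
}
```

Because match() returns on the first hit, a candidate list containing both "start" and "start robot" yields a single START. The activity could then call match() on the EXTRA_RESULTS list and send the corresponding command, and adding a new phrase becomes a one-line change to the phrase lists.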

EDIT: I have created a GitHub repo for this, if anyone is interested: https://github.com/JoshuaWohle/Android-Voice-Control