app_confbridge: Update dsp_silence_threshold and dsp_talking_threshold docs.

The dsp_talking_threshold does not represent time in milliseconds. It represents the average magnitude per sample in the audio packets. This is what the DSP uses to determine if a packet is silence or talking/noise.

Change-Id: If6f939c100eb92a5ac6c21236559018eeaf58443
2018-01-30 15:00:32 -06:00 · b9024197ab
parent 6c5e3226ec
commit b9024197ab
7 changed files with 151 additions and 110 deletions
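
The distinction the commit message draws can be sketched in C. This is a minimal illustration under stated assumptions, not the Asterisk implementation: `struct talk_detector`, `frame_avg_magnitude()`, and `talk_detector_process()` are hypothetical names, and the samples are assumed to be signed 16-bit linear audio. The talking threshold is compared against the average absolute sample value of a frame, while the silence threshold is compared against accumulated milliseconds of silence.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical detector state; the field names mirror the two options. */
struct talk_detector {
	int talking_threshold;  /* minimum average magnitude per sample (not ms) */
	int silence_threshold;  /* ms of silence before talking is declared stopped */
	int total_silence_ms;   /* silence accumulated since the last talking frame */
	int talking;            /* non-zero while the user is considered talking */
};

/* Average of the absolute sample values in one frame. */
static int frame_avg_magnitude(const int16_t *samples, size_t count)
{
	long sum = 0;
	size_t i;

	if (count == 0) {
		return 0;
	}
	for (i = 0; i < count; i++) {
		sum += labs((long) samples[i]);
	}
	return (int) (sum / (long) count);
}

/*
 * Classify one frame and update the talking state.
 * Assumes sample_rate > 0. Returns non-zero while talking.
 */
static int talk_detector_process(struct talk_detector *td,
	const int16_t *samples, size_t count, int sample_rate)
{
	int frame_ms = (int) (count * 1000 / (size_t) sample_rate);

	if (frame_avg_magnitude(samples, count) >= td->talking_threshold) {
		/* At or above the magnitude threshold: talking/noise. */
		td->total_silence_ms = 0;
		td->talking = 1;
	} else {
		/* Below the magnitude threshold: accumulate silence time. */
		td->total_silence_ms += frame_ms;
		if (td->talking && td->total_silence_ms >= td->silence_threshold) {
			td->talking = 0;
		}
	}
	return td->talking;
}
```

With a talking threshold of 160 and 20 ms frames at 8000 Hz, a frame averaging magnitude 1000 starts talking at once, while frames averaging 10 only end talking after enough of them have accumulated to reach the silence threshold.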
--- a/apps/confbridge/conf_config_parser.c
+++ b/apps/confbridge/conf_config_parser.c
@@ -144,72 +144,66 @@
 					</para></description>
 				</configOption>
 				<configOption name="dsp_silence_threshold">
-					<synopsis>The number of milliseconds of detected silence necessary to trigger silence detection</synopsis>
-					<description><para>
-					The time in milliseconds of sound falling within the what
-					the dsp has established as baseline silence before a user
-					is considered be silent.  This value affects several
-					operations and should not be changed unless the impact
-					on call quality is fully understood.</para>
-					<para>What this value affects internally:</para>
-					<para>
-						1. When talk detection AMI events are enabled, this value
+					<synopsis>The number of milliseconds of silence necessary to declare talking stopped.</synopsis>
+					<description>
+						<para>The time in milliseconds of sound falling below the
+						<replaceable>dsp_talking_threshold</replaceable> option before
+						a user is considered to have stopped talking.  This value affects several
+						operations and should not be changed unless the impact on call
+						quality is fully understood.
+						</para>
+						<para>What this value affects internally:
+						</para>
+						<para>1. When talk detection AMI events are enabled, this value
 						determines when the user has stopped talking after a
 						period of talking.  If this value is set too low
 						AMI events indicating the user has stopped talking
 						may get falsely sent out when the user briefly pauses
 						during mid sentence.
-					</para>
-					<para>
-						2. The <replaceable>drop_silence</replaceable> option depends on this value to
-						determine when the user's audio should begin to be
-						dropped from the conference bridge after the user
+						</para>
+						<para>2. The <replaceable>drop_silence</replaceable> option
+						depends on this value to determine when the user's audio should
+						begin to be dropped from the conference bridge after the user
 						stops talking.  If this value is set too low the user's
-						audio stream may sound choppy to the other participants.
-						This is caused by the user transitioning constantly from
-						silence to talking during mid sentence.
-					</para>
-					<para>
-						The best way to approach this option is to set it slightly above
-						the maximum amount of ms of silence a user may generate during
-						natural speech.
-					</para>
-					<para>By default this value is 2500ms. Valid values are 1 through 2^31.</para>
+						audio stream may sound choppy to the other participants.  This
+						is caused by the user transitioning constantly from silence to
+						talking during mid sentence.
+						</para>
+						<para>The best way to approach this option is to set it slightly
+						above the maximum amount of milliseconds of silence a user may
+						generate during natural speech.
+						</para>
+						<para>Valid values are 1 through 2^31.</para>
 					</description>
 				</configOption>
 				<configOption name="dsp_talking_threshold">
-					<synopsis>The number of milliseconds of detected non-silence necessary to triger talk detection</synopsis>
-					<description><para>
-						The time in milliseconds of sound above what the dsp has
-						established as base line silence for a user before a user
-						is considered to be talking.  This value affects several
-						operations and should not be changed unless the impact on
-						call quality is fully understood.</para>
-						<para>
-						What this value affects internally:
+					<synopsis>Average magnitude threshold to determine talking.</synopsis>
+					<description>
+						<para>The minimum average magnitude per sample in a frame
+						for the DSP to consider talking/noise present.  A value below
+						this level is considered silence.  This value affects several
+						operations and should not be changed unless the impact on call
+						quality is fully understood.
 						</para>
-						<para>
-						1. Audio is only mixed out of a user's incoming audio stream
-						if talking is detected.  If this value is set too
-						loose the user will hear themselves briefly each
-						time they begin talking until the dsp has time to
-						establish that they are in fact talking.
+						<para>What this value affects internally:
 						</para>
-						<para>
-						2. When talk detection AMI events are enabled, this value
+						<para>1. Audio is only mixed out of a user's incoming audio
+						stream if talking is detected.  If this value is set too
+						high the user will hear themselves talking.
+						</para>
+						<para>2. When talk detection AMI events are enabled, this value
 						determines when talking has begun which results in
-						an AMI event to fire.  If this value is set too tight
+						an AMI event to fire.  If this value is set too low
 						AMI events may be falsely triggered by variants in
 						room noise.
 						</para>
-						<para>
-						3. The <replaceable>drop_silence</replaceable> option depends on this value to determine
-						when the user's audio should be mixed into the bridge
-						after periods of silence.  If this value is too loose
-						the beginning of a user's speech will get cut off as they
-						transition from silence to talking.
+						<para>3. The <replaceable>drop_silence</replaceable> option
+						depends on this value to determine when the user's audio should
+						be mixed into the bridge after periods of silence.  If this value
+						is too high the user's speech will get discarded as they will
+						be considered silent.
 						</para>
-						<para>By default this value is 160 ms. Valid values are 1 through 2^31</para>
+						<para>Valid values are 1 through 2^15.</para>
 					</description>
 				</configOption>
 				<configOption name="jitterbuffer">
@@ -1479,7 +1473,7 @@ static char *handle_cli_confbridge_show_user_profile(struct ast_cli_entry *e, in
 		"enabled" : "disabled");
 	ast_cli(a->fd,"Silence Threshold:       %ums\n",
 		u_profile.silence_threshold);
-	ast_cli(a->fd,"Talking Threshold:       %ums\n",
+	ast_cli(a->fd,"Talking Threshold:       %u\n",
 		u_profile.talking_threshold);
 	ast_cli(a->fd,"Denoise:                 %s\n",
 		u_profile.flags & USER_OPT_DENOISE ?
--- a/apps/confbridge/include/confbridge.h
+++ b/apps/confbridge/include/confbridge.h
@@ -41,7 +41,10 @@
 #define DEFAULT_BRIDGE_PROFILE "default_bridge"
 #define DEFAULT_MENU_PROFILE "default_menu"

+/*! Default minimum average magnitude threshold to determine talking by the DSP. */
 #define DEFAULT_TALKING_THRESHOLD 160
+
+/*! Default time in ms of silence necessary to declare talking stopped by the bridge. */
 #define DEFAULT_SILENCE_THRESHOLD 2500

 enum user_profile_flags {
@@ -140,9 +143,9 @@ struct user_profile {
 	char announcement[PATH_MAX];
 	unsigned int flags;
 	unsigned int announce_user_count_all_after;
-	/*! The time in ms of talking before a user is considered to be talking by the dsp. */
+	/*! Minimum average magnitude threshold to determine talking by the DSP. */
 	unsigned int talking_threshold;
-	/*! The time in ms of silence before a user is considered to be silent by the dsp. */
+	/*! Time in ms of silence necessary to declare talking stopped by the bridge. */
 	unsigned int silence_threshold;
 	/*! The time in ms the user may stay in the confbridge */
 	unsigned int timeout;
--- a/bridges/bridge_softmix.c
+++ b/bridges/bridge_softmix.c
@@ -53,9 +53,16 @@
 /*! \brief Number of mixing iterations to perform between gathering statistics. */
 #define SOFTMIX_STAT_INTERVAL 100

-/* This is the threshold in ms at which a channel's own audio will stop getting
- * mixed out its own write audio stream because it is not talking. */
+/*!
+ * \brief Default time in ms of silence necessary to declare talking stopped by the bridge.
+ *
+ * \details
+ * This is the time at which a channel's own audio will stop getting
+ * mixed out of its own write audio stream because it is no longer talking.
+ */
 #define DEFAULT_SOFTMIX_SILENCE_THRESHOLD 2500
+
+/*! Default minimum average magnitude threshold to determine talking by the DSP. */
 #define DEFAULT_SOFTMIX_TALKING_THRESHOLD 160

 #define SOFTBRIDGE_VIDEO_DEST_PREFIX "softbridge_dest"
--- a/configs/samples/confbridge.conf.sample
+++ b/configs/samples/confbridge.conf.sample
@@ -49,59 +49,67 @@ type=user
                       ; noise from the conference. Highly recommended for large conferences
                       ; due to its performance enhancements.

-;dsp_talking_threshold=128  ; The time in milliseconds of sound above what the dsp has
-                            ; established as base line silence for a user before a user
-                            ; is considered to be talking.  This value affects several
+;dsp_talking_threshold=128  ; Average magnitude threshold to determine talking.
+                            ;
+                            ; The minimum average magnitude per sample in a frame for the
+                            ; DSP to consider talking/noise present.  A value below this
+                            ; level is considered silence.  This value affects several
                            ; operations and should not be changed unless the impact on
                            ; call quality is fully understood.
                            ;
                            ; What this value affects internally:
                            ;
-                            ; 1. Audio is only mixed out of a user's incoming audio stream
-                            ;    if talking is detected.  If this value is set too
-                            ;    loose the user will hear themselves briefly each
-                            ;    time they begin talking until the dsp has time to
-                            ;    establish that they are in fact talking.
-                            ; 2. When talk detection AMI events are enabled, this value
-                            ;    determines when talking has begun which results in
-                            ;    an AMI event to fire.  If this value is set too tight
-                            ;    AMI events may be falsely triggered by variants in
-                            ;    room noise.
-                            ; 3. The drop_silence option depends on this value to determine
-                            ;    when the user's audio should be mixed into the bridge
-                            ;    after periods of silence.  If this value is too loose
-                            ;    the beginning of a user's speech will get cut off as they
-                            ;    transition from silence to talking.
+                            ; 1. Audio is only mixed out of a user's incoming audio
+                            ;    stream if talking is detected.  If this value is set too
+                            ;    high the user will hear themselves talking.
                            ;
-                            ; By default this value is 160 ms. Valid values are 1 through 2^31
+                            ; 2. When talk detection AMI events are enabled, this value
+                            ;    determines when talking has begun which results in an
+                            ;    AMI event to fire.  If this value is set too low AMI
+                            ;    events may be falsely triggered by variants in room
+                            ;    noise.
+                            ;
+                            ; 3. The 'drop_silence' option depends on this value to
+                            ;    determine when the user's audio should be mixed into the
+                            ;    bridge after periods of silence.  If this value is too
+                            ;    high the user's speech will get discarded as they will
+                            ;    be considered silent.
+                            ;
+                            ; Valid values are 1 through 2^15.
+                            ; By default this value is 160.

-;dsp_silence_threshold=2000 ; The time in milliseconds of sound falling within the what
-                            ; the dsp has established as baseline silence before a user
-                            ; is considered be silent.  This value affects several
-                            ; operations and should not be changed unless the impact
-                            ; on call quality is fully understood.
+;dsp_silence_threshold=2000 ; The number of milliseconds of silence necessary to declare
+                            ; talking stopped.
+                            ;
+                            ; The time in milliseconds of sound falling below the
+                            ; 'dsp_talking_threshold' option before a user is considered to
+                            ; have stopped talking.  This value affects several operations and
+                            ; should not be changed unless the impact on call quality is
+                            ; fully understood.
                            ;
                            ; What this value affects internally:
                            ;
                            ; 1. When talk detection AMI events are enabled, this value
                            ;    determines when the user has stopped talking after a
-                            ;    period of talking.  If this value is set too low
-                            ;    AMI events indicating the user has stopped talking
-                            ;    may get falsely sent out when the user briefly pauses
-                            ;    during mid sentence.
-                            ; 2. The drop_silence option depends on this value to
+                            ;    period of talking.  If this value is set too low AMI
+                            ;    events indicating the user has stopped talking may get
+                            ;    falsely sent out when the user briefly pauses during mid
+                            ;    sentence.
+                            ;
+                            ; 2. The 'drop_silence' option depends on this value to
                            ;    determine when the user's audio should begin to be
-                            ;    dropped from the conference bridge after the user
-                            ;    stops talking.  If this value is set too low the user's
-                            ;    audio stream may sound choppy to the other participants.
-                            ;    This is caused by the user transitioning constantly from
+                            ;    dropped from the conference bridge after the user stops
+                            ;    talking.  If this value is set too low the user's audio
+                            ;    stream may sound choppy to the other participants.  This
+                            ;    is caused by the user transitioning constantly from
                            ;    silence to talking during mid sentence.
                            ;
-                            ; The best way to approach this option is to set it slightly above
-                            ; the maximum amount of ms of silence a user may generate during
-                            ; natural speech.
+                            ; The best way to approach this option is to set it slightly
+                            ; above the maximum amount of milliseconds of silence a user
+                            ; may generate during natural speech.
                            ;
-                            ; By default this value is 2500ms. Valid values are 1 through 2^31
+                            ; Valid values are 1 through 2^31.
+                            ; By default this value is 2500ms.

 ;talk_detection_events=yes ; This option sets whether or not notifications of when a user
                           ; begins and ends talking should be sent out as events over AMI.
--- a/include/asterisk/bridge_technology.h
+++ b/include/asterisk/bridge_technology.h
@@ -46,11 +46,9 @@ enum ast_bridge_preference {
 * performing talking optimizations.
 */
 struct ast_bridge_tech_optimizations {
-	/*! The amount of time in ms that talking must be detected before
-	 *  the dsp determines that talking has occurred */
+	/*! Minimum average magnitude threshold to determine talking by the DSP. */
 	unsigned int talking_threshold;
-	/*! The amount of time in ms that silence must be detected before
-	 *  the dsp determines that talking has stopped */
+	/*! Time in ms of silence necessary to declare talking stopped by the bridge. */
 	unsigned int silence_threshold;
 	/*! Whether or not the bridging technology should drop audio
 	 *  detected as silence from the mix. */
--- a/include/asterisk/dsp.h
+++ b/include/asterisk/dsp.h
@@ -87,7 +87,7 @@ void ast_dsp_free(struct ast_dsp *dsp);
 * created with */
 unsigned int ast_dsp_get_sample_rate(const struct ast_dsp *dsp);

-/*! \brief Set threshold value for silence */
+/*! \brief Set the minimum average magnitude threshold to determine talking by the DSP. */
 void ast_dsp_set_threshold(struct ast_dsp *dsp, int threshold);

 /*! \brief Set number of required cadences for busy */
@@ -106,19 +106,41 @@ int ast_dsp_set_call_progress_zone(struct ast_dsp *dsp, char *zone);
   busies, and call progress, all dependent upon which features are enabled */
 struct ast_frame *ast_dsp_process(struct ast_channel *chan, struct ast_dsp *dsp, struct ast_frame *inf);

-/*! \brief Return non-zero if this is silence.  Updates "totalsilence" with the total
-   number of seconds of silence  */
+/*!
+ * \brief Process the audio frame for silence.
+ *
+ * \param dsp DSP processing audio media.
+ * \param f Audio frame to process.
+ * \param totalsilence Variable to set to the total accumulated silence in ms
+ * seen by the DSP since the last noise.
+ *
+ * \return Non-zero if the frame is silence.
+ */
 int ast_dsp_silence(struct ast_dsp *dsp, struct ast_frame *f, int *totalsilence);

-/*! \brief Return non-zero if this is silence.  Updates "totalsilence" with the total
-   number of seconds of silence. Returns the average energy of the samples in the frame
-   in frames_energy variable. */
+/*!
+ * \brief Process the audio frame for silence.
+ *
+ * \param dsp DSP processing audio media.
+ * \param f Audio frame to process.
+ * \param totalsilence Variable to set to the total accumulated silence in ms
+ * seen by the DSP since the last noise.
+ * \param frames_energy Variable to set to the average energy of the samples in the frame.
+ *
+ * \return Non-zero if the frame is silence.
+ */
 int ast_dsp_silence_with_energy(struct ast_dsp *dsp, struct ast_frame *f, int *totalsilence, int *frames_energy);

 /*!
- * \brief Return non-zero if this is noise.  Updates "totalnoise" with the total
- * number of seconds of noise
+ * \brief Process the audio frame for noise.
 * \since 1.6.1
+ *
+ * \param dsp DSP processing audio media.
+ * \param f Audio frame to process.
+ * \param totalnoise Variable to set to the total accumulated noise in ms
+ * seen by the DSP since the last silence.
+ *
+ * \return Non-zero if the frame is noise.
 */
 int ast_dsp_noise(struct ast_dsp *dsp, struct ast_frame *f, int *totalnoise);

--- a/main/dsp.c
+++ b/main/dsp.c
@@ -122,12 +122,19 @@ static struct progress {
 	{ GSAMP_SIZE_UK, { 350, 400, 440 } },				/*!< UK */
 };

-/*!\brief This value is the minimum threshold, calculated by averaging all
- * of the samples within a frame, for which a frame is determined to either
- * be silence (below the threshold) or noise (above the threshold).  Please
- * note that while the default threshold is an even exponent of 2, there is
- * no requirement that it be so.  The threshold will accept any value between
- * 0 and 32767.
+/*!
+ * \brief Default minimum average magnitude threshold to determine talking/noise by the DSP.
+ *
+ * \details
+ * The magnitude calculated for this threshold is determined by
+ * averaging the absolute value of all samples within a frame.
+ *
+ * This value is the threshold for which a frame's average magnitude
+ * is determined to either be silence (below the threshold) or
+ * noise/talking (at or above the threshold).  Please note that while
+ * the default threshold is an exact power of 2, there is no
+ * requirement that it be so.  The threshold will work for any value
+ * between 1 and 2^15.
 */
 #define DEFAULT_THRESHOLD	512

@@ -397,7 +404,9 @@ typedef struct {
 struct ast_dsp {
 	struct ast_frame f;
 	int threshold;
+	/*! Accumulated total silence in ms since last talking/noise. */
 	int totalsilence;
+	/*! Accumulated total talking/noise in ms since last silence. */
 	int totalnoise;
 	int features;
 	int ringtimeout;