Sample value distribution statistics

Release: 0.2



sampledist is intended to analyze audio data statistically.

Use cases

Program sequence

  1. The program reads two-channel, 16-bit PCM audio data from stdin or from a file, e.g. a RIFF WAVE file (see option psa for skipping its header).
  2. The data is cut into blocks of N samples.
  3. Each block is analyzed.
  4. The results are written to stderr.
  5. Optionally, the value distribution is written to a file.
  6. Optionally, a shell command is executed or a fixed command string is written to stdout.
  7. Continue with the next block.
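The program sequence can be sketched in C. This is not sampledist's source code, only a minimal illustration under stated assumptions: a "sample" is taken to mean one stereo frame (two little-endian 16-bit values, 4 bytes), the statistics are kept as per-channel histograms, and the names histogram, decode_le16, process_stream and the on_block callback are hypothetical. The maxblocks parameter stands in for option ln, with 0 meaning continuous mode (option loop).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical accumulated statistics: one bin per 16-bit value and channel. */
typedef struct {
    unsigned long count[2][65536]; /* [channel][value + 32768] */
    unsigned long frames;          /* stereo frames accumulated so far */
} histogram;

/* Decode a little-endian signed 16-bit value (RIFF WAVE byte order). */
static int decode_le16(const unsigned char *p)
{
    int v = p[0] | (p[1] << 8); /* 0 .. 65535 */
    return v >= 32768 ? v - 65536 : v;
}

/* Steps 2-7: read blocks of nframes stereo frames, add each complete block
   to h and call on_block. A trailing partial block is dropped. maxblocks
   plays the role of ln<count>; 0 means "run until end of input".
   Returns the number of complete blocks processed. */
unsigned long process_stream(FILE *in, size_t nframes, unsigned long maxblocks,
                             histogram *h, void (*on_block)(const histogram *))
{
    assert(in != NULL && h != NULL && nframes > 0);
    unsigned char *buf = malloc(nframes * 4); /* 4 bytes per stereo frame */
    unsigned long blocks = 0;
    if (buf == NULL)
        return 0;
    while (maxblocks == 0 || blocks < maxblocks) {
        if (fread(buf, 4, nframes, in) < nframes)
            break; /* end of input or read error: partial block ignored */
        for (size_t i = 0; i < nframes; ++i) {
            h->count[0][decode_le16(buf + 4 * i) + 32768]++;     /* left */
            h->count[1][decode_le16(buf + 4 * i + 2) + 32768]++; /* right */
        }
        h->frames += nframes;
        ++blocks;
        if (on_block != NULL)
            on_block(h); /* steps 4-6: report, write files, notify */
    }
    free(buf);
    return blocks;
}
```

Because fread behaves like repeated character reads, it only returns a short count at end of input or on error, so this loop also works for pipes, FIFOs and character devices that deliver data in small pieces.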

Command line options

in<filename> - name of input file - default stdin
Read PCM data from <filename> instead of stdin. The file may also be a transient source such as a FIFO or a character device.
bn<samples> - block length - 32768 by default
Number of samples read per block. Each time this many samples have been received, the block is analyzed and the results are written to the console or to a file. A trailing block with fewer samples produces no output.
ln<count> - number of cycles - 1 by default
Number of blocks to process before the program exits.
loop - infinite input
Switches the program to continuous mode. It then terminates only when it receives an interrupt signal or when the input stream is closed.
al<count> - accumulate blocks - infinite by default
Accumulate the results of <count> blocks before the statistics are cleared. By default the results are cumulative, i.e. the statistics cover all samples analyzed so far rather than only the last block. Use al1 to report statistics for the last block only.
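A minimal sketch of how block accumulation with al<count> could work, assuming the statistics are kept as a histogram per channel. The names accumulator and accumulate_block are hypothetical, and count = 0 stands in for the default "infinite".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-channel accumulator (one bin per 16-bit value). */
typedef struct {
    unsigned long hist[65536]; /* hist[v + 32768] */
    unsigned long blocks;      /* blocks added since the last clear */
} accumulator;

/* Add one block of samples of a single channel. count is the al<count>
   value, with 0 standing for the default "infinite". The statistics are
   cleared lazily, right before the first block of a new period, so the
   report after every block covers at most the last count blocks. */
void accumulate_block(accumulator *acc, const int16_t *samples, size_t n,
                      unsigned long count)
{
    assert(acc != NULL);
    if (count != 0 && acc->blocks == count) {
        memset(acc->hist, 0, sizeof acc->hist);
        acc->blocks = 0;
    }
    for (size_t i = 0; i < n; ++i)
        acc->hist[samples[i] + 32768]++;
    acc->blocks++;
}
```

Clearing before the next block rather than right after reporting keeps the statistics of the last period available until new data arrives.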
psa<num> - discard the first <num> samples
Use this to discard spikes at the start of the stream or to wait for a steady state. You may also use this option to skip file headers, e.g. the header of a RIFF WAVE file.
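Discarding the first samples can be sketched as below. It assumes that <num> counts stereo frames of 4 bytes; whether sampledist counts frames or individual 16-bit values is not stated here, and skip_frames is a hypothetical name. Reading instead of seeking keeps it usable on FIFOs and character devices, which cannot seek.

```c
#include <assert.h>
#include <stdio.h>

/* Discard the first num stereo frames (assumed 4 bytes each) of in.
   Returns the number of frames actually discarded, which is smaller
   than num only if the input ends early. */
unsigned long skip_frames(FILE *in, unsigned long num)
{
    assert(in != NULL);
    unsigned char buf[4096];
    unsigned long skipped = 0;
    while (skipped < num) {
        unsigned long want = num - skipped;
        if (want > sizeof buf / 4)
            want = sizeof buf / 4;
        size_t got = fread(buf, 4, (size_t)want, in);
        skipped += got;
        if (got < want)
            break; /* end of input */
    }
    return skipped;
}
```

For a canonical 44-byte RIFF WAVE header this would be 11 frames. WAVE files may contain additional chunks, though, so the actual header size can differ.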
df<filename> - write histogram data to file
Writes the sample value distribution to <filename>. See hist.dat under File formats.
wd - write histogram data to hist.dat - shortcut for dfhist.dat.
Writes the sample value distribution to hist.dat.
rf<filename> - raw data file
Writes the raw data to <filename>. This is the input data without any processing, except for samples discarded by option psa. It is intended for diagnostics only. See raw.dat under File formats.
wr - write raw data
Writes the raw data to the default file raw.dat; shortcut for rfraw.dat.
exec<command> - execute shell command
Each time a block has been completed and the data has been written, <command> is passed to system(). Note that sampledist waits for the command to complete. This gives the command exclusive access to the data files, but it may also interfere with the real-time processing of the input data. If you do not need this kind of synchronization, consider writing the command to stdout instead (option plot).
plot<command> - pipe command
Writes <command> to stdout each time a block has been completed. You can use this to notify a plot program reading from the pipe when new data arrives. Note that sampledist does not wait for the command to complete.
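The difference between options exec and plot can be illustrated with a small sketch. notify_block is a hypothetical name, and appending a newline to the plot command is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Called once per completed block (after the data files are written).
   exec_cmd and plot_cmd are NULL when the respective option is absent.
   Returns the status reported by system(), or 0 if nothing was run. */
int notify_block(const char *exec_cmd, const char *plot_cmd, FILE *out)
{
    int status = 0;
    if (exec_cmd != NULL)
        status = system(exec_cmd); /* exec: blocks until the command exits */
    if (plot_cmd != NULL) {
        assert(out != NULL);
        fprintf(out, "%s\n", plot_cmd); /* plot: trailing newline assumed */
        fflush(out); /* a piped stdout is fully buffered; flush so the
                        reading program sees the line immediately */
    }
    return status;
}
```

The fflush matters in the plot case: when stdout is a pipe rather than a terminal, the C library buffers it fully, and without flushing the plot program could receive its commands only after several blocks.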




For each block, the program writes a report like the following to stderr:

	samples 	dB
min	-30581	-23544	-0.6	-2.9
max	30736	24574	-0.6	-2.5
mean	1.90	-16.49
stddev	5995.40	3912.31
skew	-0.0087	-0.0189
kurtos.	0.00820	0.00918
crest	0.19506	0.15920	-14.2	-16.0
min, max
Minimum and maximum sample values of the left and right channels, in digits and as levels in dB relative to full scale (dBFSR).
mean
Average sample value of the left and right channels. A mean significantly different from zero indicates a DC offset in the audio data.
stddev
Standard deviation of the sample values of the left and right channels, in digits. It is related to the RMS level of the audio data (for a zero mean they are essentially equal), but it should not be confused with psychoacoustic loudness.
skew
Bias-corrected skewness of the sample value distribution of the left and right channels.
This value should be close to zero, indicating a symmetric distribution. Anything else likely indicates a significant non-linearity.
kurtos.
Bias-corrected excess kurtosis of the sample value distribution of the left and right channels.
This value is close to zero for normally distributed data such as Gaussian white noise. It increases with the dynamics of the audio data, i.e. when rare peaks lie far above the standard deviation. Negative values, on the other hand, indicate heavily compressed audio data, e.g. as a result of the loudness war.
crest
Crest factor of the audio data, i.e. the ratio of the peak value to the RMS value: √2 for a sine wave, roughly 4 for noise, and even more for typical audio. The program prints the reciprocal, RMS/peak, so the dB values are negative; for the left channel above, 5995.40/30736 ≈ 0.195, i.e. -14.2 dB.

File formats

hist.dat - value distribution

Column  Symbol  Description
[1]     v       sample value, [-32768, 32767]
[2]     hL      relative frequency of sample value v in the left channel
[3]     hR      relative frequency of sample value v in the right channel
[4]     HL      absolute frequency of sample value v in the left channel
[5]     HR      absolute frequency of sample value v in the right channel
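Writing hist.dat in the column layout above might look like this. Writing one line for every value in [-32768, 32767], the tab separator and the %g number format are assumptions; write_hist is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

#define NBINS 65536 /* hist[v + 32768] */

/* Write one line per sample value: v hL hR HL HR, where H is the
   absolute frequency and h = H / n the relative frequency.
   Returns 0 on success, -1 on a write error. */
int write_hist(FILE *out, const unsigned long histL[NBINS],
               const unsigned long histR[NBINS], unsigned long n)
{
    assert(out != NULL && n > 0);
    for (int i = 0; i < NBINS; ++i) {
        if (fprintf(out, "%d\t%g\t%g\t%lu\t%lu\n", i - 32768,
                    (double)histL[i] / n, (double)histR[i] / n,
                    histL[i], histR[i]) < 0)
            return -1;
    }
    return 0;
}
```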

raw.dat - raw data

Column  Symbol  Description
-       n       sample index (implied by the line number)
[1]     L(n)    channel 1 (left) sample value
[2]     R(n)    channel 2 (right) sample value
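A corresponding sketch for raw.dat, assuming interleaved frames in memory and a tab separator. write_raw is a hypothetical name, and the sample index n is not written since it follows from the line number.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write nframes interleaved stereo frames, one frame per line: L(n) R(n).
   Returns 0 on success, -1 on a write error. */
int write_raw(FILE *out, const int16_t *frames, size_t nframes)
{
    assert(out != NULL);
    for (size_t n = 0; n < nframes; ++n)
        if (fprintf(out, "%d\t%d\n", frames[2 * n], frames[2 * n + 1]) < 0)
            return -1;
    return 0;
}
```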

Change log

Version 0.2

Version 0.1

TODOs, known issues

Input formats
Currently only 16-bit PCM data is supported. Admittedly, analyzing the frequency of individual sample values makes little sense for 24- or 32-bit data, since a histogram would need 2^24 or 2^32 bins, most of them nearly empty. The moment analysis of the value distribution, however, remains meaningful.