Overview¶

Concatenator is a tool for synthesizing interpretations of a sound, through the analysis and synthesis of audio grains from a corpus database. The program works by analysing overlapping segments of audio (known as grains) from both the target sound and the source database, then searching for the closest matching grain in the source database to the target sound. Finally, the output is generated by overlap-adding the best matches.

To create the final output, there are three main operations to perform:

digraph a {
"Database Analysis" -> "Grain Matching" -> "Output Synthesis";
}

Analysis¶

First, the descriptor analyses are generated for each audio file in both the source and target database. Full details on the types of descriptor available and their function can be found in the Audio Descriptor Definitions section of this documentation. Analyses are then stored to a HDF5 file, ready for matching.

$digraph b { subgraph cluster0 { style=filled; color=lightgrey; node [shape=record,width=.1,height=.1]; node0 [label = "<f0> | <f1> | <f2> | <f3> | <f4> | <f5> | <f6> ",width=2.5] label = "Audio\nFiles"; labeljust="l"; } subgraph cluster2 { style=filled; color=lightgrey; node [shape=record,width=.1,height=.1]; node2 [label = "RMS | F0 | Centroid | Kurtosis | etc..."] label = "Analyses"; labeljust="l"; } database[shape=rectangle, label="Audio Directory"]; HDF[shape=rectangle, label="HDF5 File"]; database -> node0; node0 -> node2; node2 -> HDF }$

Matching¶

Both the source and target HDF5 files are loaded to compare the values of their analyses. Each audio file’s analyses are split into equally sized overlapping grains and averaged in the appropriate way to be compared to grains from the other database. The matching algorithm then calculates the grains that have the smallest overall difference, based on user defined weightings for each of the analysis types. This weighting of analyses allows for certain analyses to gain precedence over others based on user preference. The best match indexes are then saved to the output database ready for synthesis.

There are currently two implementations for the matching algorithm:

Brute Force
K-d Tree Search

Both will return similar results, however the K-d tree search algorithm is far more efficient when analysing large datasets so is the preferred method.

$digraph b { subgraph cluster0 { style=filled; color=lightgrey; node [shape=record,width=.1,height=.1]; node0 [label = "<f0> | <f1> | <f2>Source | <f3>Audio | <f4>Analysis | <f5> | <f6> ",width=2.5] labeljust="l"; } subgraph cluster1 { style=filled; color=lightgrey; node [shape=record,width=.1,height=.1]; node1 [label = "<f0> | <f1> | <f2> | <f3> | <f4> | <f5> | <f6> | <f7> | <f8> | <f9>Source | <f10>Analysis | <f11>Grains | <f12> | <f13> | <f14> | <f15> | <f16> | <f17> | <f18> | <f19> | <f20> ",width=2.5] label="\n\n\n\n"; labeljust="l"; } subgraph cluster2 { style=filled; color=lightgrey; node [shape=record,width=.1,height=.1]; node2 [label = "<f0>Target Audio Analysis"] labeljust="l"; } subgraph cluster3 { style=filled; color=lightgrey; node [shape=record,width=.1,height=.1]; node3 [label = "<f0> | <f1> | <f2>Target | <f3>Analysis | <f4>Grains | <f5> | <f6>",width=2.5] label="\n\n\n\n"; labeljust="l"; } database1[shape=rectangle, label="Source HDF5 File"]; database2[shape=rectangle, label="Target HDF5 File"]; database3[shape=rectangle, label="Output HDF5 File"]; matcher[shape=rectangle, label="Matching Algorithm"]; node0:f0 -> node1:f0 node0:f0 -> node1:f1 node0:f0 -> node1:f2 node0:f1 -> node1:f3 node0:f1 -> node1:f4 node0:f1 -> node1:f5 node0:f2 -> node1:f6 node0:f2 -> node1:f7 node0:f2 -> node1:f8 node0:f3 -> node1:f9 node0:f3 -> node1:f10 node0:f3 -> node1:f11 node0:f4 -> node1:f12 node0:f4 -> node1:f13 node0:f4 -> node1:f14 node0:f5 -> node1:f15 node0:f5 -> node1:f16 node0:f5 -> node1:f17 node0:f6 -> node1:f18 node0:f6 -> node1:f19 node0:f6 -> node1:f20 node2:f0 -> node3:f0 node2:f0 -> node3:f1 node2:f0 -> node3:f2 node2:f0 -> node3:f3 node2:f0 -> node3:f4 node2:f0 -> node3:f5 node2:f0 -> node3:f6 database1 -> node0; database2 -> node2; node1 -> matcher node3 -> matcher matcher -> database3 }$

Synthesis¶

The synthesis process involves loading the best match grains from the source database, performing any post-processing (such as pitch shifting and amplitude scaling) to improve the similarity of the match, then windowed overlap adding the grains to create the final output. The post-processing phase involves using the ratio difference between the source and target grain to artificially alter the source grain so that it better resembles the target. This is particularly useful when using small source databases as it improves the similarity of any match (important when best matches aren’t very close to the target.) The final output is saved to the output database’s audio directory.