Corpus Analysis Of Whole Data Essay

1421 Words May 25th, 2016 6 Pages
Corpus Analysis of Whole Data
The data as a whole contains a reasonably large number of taboo lexical items, which is to be expected given that the data was collected through a search for taboo lexical items. The total number of expletives, counting those that correspond to Thelwall’s (2008) categorisation, is 10809, which is just under 7.5% of the total number of words. This should not be considered representative of the average amount of taboo language on Twitter, as the data was specifically selected for the presence of taboo lexical items.
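
The count and percentage reported above could be obtained along the following lines. This is only a minimal sketch, not the procedure actually used for this corpus: the three-item `LEXICON`, the regex tokeniser, the `taboo_share` function and the sample tweets are all hypothetical stand-ins, and a real analysis would load the full word list from Thelwall’s (2008) categorisation and use a tokeniser suited to Twitter text.

```python
import re
from collections import Counter

# Hypothetical three-item subset of a taboo lexicon (an assumption for
# illustration); the full list would come from Thelwall's categorisation.
LEXICON = {"piss", "bastard", "wank"}


def taboo_share(tweets, lexicon):
    """Return (per-item counts, total taboo tokens, share of all tokens)."""
    counts = Counter()
    total_tokens = 0
    for tweet in tweets:
        # Simplistic tokenisation (an assumption): lowercase alphabetic runs.
        tokens = re.findall(r"[a-z]+", tweet.lower())
        total_tokens += len(tokens)
        # Count only tokens that appear in the lexicon.
        counts.update(t for t in tokens if t in lexicon)
    taboo_total = sum(counts.values())
    share = taboo_total / total_tokens if total_tokens else 0.0
    return counts, taboo_total, share


# Invented sample tweets, not drawn from the dataset.
sample = ["What a bastard of a day", "Do not piss me off", "Lovely weather today"]
counts, taboo_total, share = taboo_share(sample, LEXICON)
print(counts.most_common(), taboo_total, f"{share:.1%}")
```

Sorting the resulting counts with `most_common()` gives a frequency-ordered list of the same shape as a word-frequency table.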

It is particularly interesting, however, to investigate the inventiveness of Twitter users who are already using conventional taboo lexical items, as found in Thelwall’s classification. Table 2 below shows all of the conventional taboo lexical items used more than once in the entire dataset, ordered by frequency.
Table 2
All Taboo Lexical Items Used (According to Thelwall)
Word Frequency Strength
FUCK 3380 4
FUCKING 1830 4
NIGGA 1296 5
NIGGAS 774 5
GAY 731 3
FUCKED 504 4
DICK 492 3
PUSSY 435 3
FUCKIN 269 4
COCK 144 3
ASSHOLE 135 3
PISS 120 3
CUNT 73 5
BASTARD 57 3
PAKI 55 5
WHORE 52 3
FUCKER 39 4
QUEER 39 3
PISSING 38 3
NIGGAZ 35 5
PISSED 30 3
DICKS 29 3
PRICK 29 3
TWAT 26 3
CUNTS 25 5
JEW 24 5
WANK 19 3
NIGGAH 16 5
SHAG 16 3
MOTHERFUCKING 15 5
FUCKEN 14 4
WANKING 14 3
NIGGER 13 5
WANKER 10 3
AVERAGE 317 3.8

Each taboo item in Table 2 above is used an average of 317 times, quite a…
