Description
textstat_keyness compares two partitions of a corpus to determine the words that are 'key', that is, differentially occurring between the two partitions. To compare a target corpus to a baseline corpus, you would need to combine the two into a single dfm and then specify the target appropriately.
textstat_frequency produces counts and document frequency summaries of the features in a dfm, optionally grouped by a docvars variable or other supplied grouping variable.
Usage
Arguments
x
a dfm object
n
(optional) integer specifying the top n features to be returned, within group if groups is specified
groups
either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents. NA values of the grouping value are dropped. See groups for details.
ties_method
character string specifying how ties are treated. See data.table::frank() for details. Unlike that function, however, the default is 'min', so that frequencies of 11, 11, 10 would be ranked 1, 1, 3.
...
additional arguments passed to dfm_group(). This can be useful in passing force = TRUE, for instance, if you are grouping a dfm that has been weighted.
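The tie handling described for the ranking argument can be illustrated with a minimal base R sketch. It uses base rank() as a stand-in for data.table::frank(), and the freq vector is made up for illustration; frequencies are negated so that rank 1 goes to the greatest frequency.

```r
# Hypothetical feature frequencies (illustrative values only)
freq <- c(a = 11, b = 11, c = 10, d = 7)

# Negate the values so that the greatest frequency receives rank 1.
# ties.method = "min": tied values all receive the lowest rank of the tie.
rank_min <- rank(-freq, ties.method = "min")

# ties.method = "average" for comparison: tied values share the mean
# of the ranks they span.
rank_avg <- rank(-freq, ties.method = "average")

print(rank_min)
print(rank_avg)
```

With 'min', the two features tied at the top are both ranked 1 and the next feature is ranked 3; with 'average' the tied pair would each get 1.5.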
Value
a data.frame containing the following variables:
feature
(character) the feature
frequency
count of the feature
rank
rank of the feature, where 1 indicates the greatest frequency
docfreq
document frequency of the feature, as a count (the number of documents in which this feature occurred at least once)
group
(only if groups is specified) the label of the group. If the features have been grouped, then all counts, ranks, and document frequencies are within group. If groups is not specified, the group column is omitted from the returned data.frame.
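To make the distinction between frequency and docfreq concrete, here is a rough base R sketch that derives the same kinds of columns from a plain count matrix. It does not use quanteda; the counts matrix and the result object are hypothetical and only mirror the variables listed above.

```r
# Hypothetical document-feature counts: rows are documents, columns are features
counts <- matrix(
  c(2, 1, 1, 0,
    1, 2, 0, 0,
    0, 0, 3, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("d1", "d2", "d3"), c("a", "b", "c", "d"))
)

result <- data.frame(
  feature   = colnames(counts),
  frequency = colSums(counts),      # total count of each feature
  docfreq   = colSums(counts > 0),  # number of documents containing the feature
  row.names = NULL
)

# Rank 1 goes to the greatest frequency; ties share the lowest rank ("min")
result$rank <- rank(-result$frequency, ties.method = "min")

# Sort by rank and order the columns as in the list above
result <- result[order(result$rank), c("feature", "frequency", "rank", "docfreq")]
print(result)
```

Here feature c occurs four times in total but in only two documents, so its frequency is 4 and its docfreq is 2.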
textstat_frequency returns a data.frame of features and their term and document frequencies within groups.
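A short usage sketch follows. It assumes quanteda version 3 or later, where the textstat functions are provided by the separate quanteda.textstats package; the toy texts and object names are made up for illustration.

```r
library(quanteda)
library(quanteda.textstats)  # assumption: quanteda >= 3.0 keeps textstat_*() here

# Toy documents (illustrative only)
txt <- c(d1 = "a a b c", d2 = "a b b", d3 = "c c c d")
dfmat <- dfm(tokens(txt))

# All features with their frequency, rank and document frequency
freq_all <- textstat_frequency(dfmat)

# Only the top 2 features
freq_top <- textstat_frequency(dfmat, n = 2)

# Frequencies within groups, using one grouping value per document
freq_grp <- textstat_frequency(dfmat, groups = c("first", "first", "second"))

print(freq_all)
print(freq_grp)
```

When a grouping is supplied, counts, ranks and document frequencies in the result are computed within each group, as described above.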