V: {<V.*>} # Verb
PP: {<P> <NP>} # PP -> P NP
VP: {<V> <NP|PP>*} # VP -> V (NP|PP)*
* Output: Chunker (A Python dictionary containing the Chunker object and its arguments.)
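The behaviour of a chunker built from this grammar can be sketched with NLTK's ``RegexpParser``; the grammar below restates the rules above (the NP and P rules and the sample sentence are illustrative assumptions, not part of the widget):

```python
import nltk

# Cascaded chunk grammar: later rules (PP, VP) refer to chunks built by
# earlier rules (NP, P, V). The V/PP/VP rules mirror the grammar above.
grammar = r"""
  NP: {<DT>?<JJ>*<NN>}   # NP
  P: {<IN>}              # Preposition
  V: {<V.*>}             # Verb
  PP: {<P> <NP>}         # PP -> P NP
  VP: {<V> <NP|PP>*}     # VP -> V (NP|PP)*
"""
chunker = nltk.RegexpParser(grammar)

# Parse a POS-tagged sentence into a chunk tree.
sentence = [("the", "DT"), ("dog", "NN"), ("saw", "VBD"),
            ("a", "DT"), ("cat", "NN")]
tree = chunker.parse(sentence)
print(tree)
```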
Widget: Chunking Hub
---------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
TODO
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Chunker (Chunker which will be used to parse the text into chunks.)
* Parameter: Sentence's Annotation (System.String)
* Default value: Sentence
* Parameter: Element's Annotation (Tokens whose features will be used for tagging.)
* Default value: Token
* Parameter: POS Feature Name (Element Annotations' POS Tag Feature Name)
* Default value: POS Tag
* Parameter: Output Feature Name (System.String)
* Default value: IOB Tag
* Output: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
Widget: Extract Annotations from IOB tags
------------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
TODO
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Parameter: Sentence's Annotation (Tokens which will be used to group element annotations.)
* Default value: Sentence
* Parameter: Element's Annotation (Tokens whose features will be used in extraction.)
* Default value: Token
* Parameter: IOB Feature Name (Element Annotations' IOB Tag Feature Name)
* Default value: IOB Tag
* Parameter: POS Feature Name (Element Annotations' POS Tag Feature Name)
* Default value: POS Tag
* Parameter: Grammar Labels to be extracted (Grammar labels which will be extracted from the text as new annotations (e.g. NP, PP, VP), separated by commas. NP stands for noun phrases, VP for verb phrases.)
* Default value: NP,VP
* Parameter: Annotation to be produced (The prefix for annotations of newly discovered tokens. Annotation names are constructed as a combination of this prefix and the label type, e.g. "Chunk_NP".)
* Default value: Chunk
* Output: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
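The extraction step can be sketched with NLTK's ``conlltags2tree``, which rebuilds chunk trees from (word, POS tag, IOB tag) triples; the token triples below are hypothetical values of the "POS Tag" and "IOB Tag" features:

```python
from nltk.chunk import conlltags2tree

# Hypothetical IOB-tagged tokens: (word, POS tag, IOB tag).
iob_tagged = [("the", "DT", "B-NP"), ("little", "JJ", "I-NP"),
              ("dog", "NN", "I-NP"), ("barked", "VBD", "O")]

tree = conlltags2tree(iob_tagged)

# Collect the NP chunks, i.e. the spans that would become "Chunk_NP" annotations.
np_chunks = [" ".join(word for word, pos in subtree.leaves())
             for subtree in tree.subtrees()
             if subtree.label() == "NP"]
print(np_chunks)  # ['the little dog']
```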
Category Stemming
=================
Category Latino
---------------
Category Advanced
~~~~~~~~~~~~~~~~~
Widget: Stemming Tagger Hub (Text)
```````````````````````````````````
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function TagStringStemLemma in package latino. The original function signature: TagStringStemLemma.
* Input: Text (System.Object)
* Input: Token Tagger (System.Object)
* Parameter: Output Feature Name (System.String)
* Default value: stem
* Output: String (string or array of strings (based on the input))
Widget: Lemma Tagger LemmaGen
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructLemmaSharpLemmatizer in package latino. The original function signature: ConstructLemmaSharpLemmatizer.
* Parameter: Language (Latino.TextMining.Language)
* Possible values:
* Bulgarian
* Czech
* English
* Estonian
* French
* German
* Hungarian
* Italian
* Romanian
* Serbian
* Slovene
* Spanish
* Default value: English
* Output: Lemmatizer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Stem Tagger Snowball
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructSnowballStemmer in package latino. The original function signature: ConstructSnowballStemmer.
* Parameter: Language (Latino.TextMining.Language)
* Possible values:
* Danish
* Dutch
* English
* Finnish
* French
* German
* Italian
* Norwegian
* Portuguese
* Russian
* Spanish
* Swedish
* Default value: English
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Stemming Tagger Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Tags the given annotated document corpus with the given tagger.
* Input: Annotated Document Corpus (LatinoInterfaces.DocumentCorpus)
* Input: Token Tagger (Token annotation of the token to be tagged. If a feature name is also given, the feature value of the selected token is tagged.
Usage:
1. TokenName
2. TokenName/FeatureName
If multiple taggers are used, one line per tagger must be specified.)
* Parameter: Token Annotation (System.String)
* Default value: Token
* Parameter: Output Feature Name (System.String)
* Default value: stem
* Output: Annotated Document Corpus
Category Nltk
-------------
Widget: ISRI Stemmer
~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
ISRI Arabic stemmer, based on the algorithm described in: Arabic Stemming Without a Root Dictionary, Information Science Research Institute, University of Nevada, Las Vegas, USA. A few minor modifications have been made to the basic ISRI algorithm.
See the source code of this module for more information. isri.stem(token) returns the Arabic root for the given token. The ISRI Stemmer requires that all tokens are Unicode strings. If you use Python IDLE on Arabic Windows, you have to decode the text first using the Arabic 'cp1256' encoding.
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Regex Stemmer
~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
A stemmer that uses regular expressions to identify morphological affixes. Any substrings that match the regular expressions will be removed.
* Parameter: Pattern (The regular expression that should be used to
identify morphological affixes.)
* Parameter: Minimum length of string (The minimum length of string to stem.)
* Default value: 0
* Output: Stemmer (Tagger)
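A short sketch of such a stemmer, using NLTK's ``RegexpStemmer`` (the affix pattern and minimum length below are illustrative choices, not widget defaults):

```python
from nltk.stem import RegexpStemmer

# Strip common suffixes; min=4 leaves strings shorter than 4 characters alone.
stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

print(stemmer.stem('cars'))     # car
print(stemmer.stem('running'))  # runn
print(stemmer.stem('is'))       # is  (shorter than min, returned unchanged)
```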
Widget: RSLP Stemmer
~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
A stemmer for Portuguese.
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Snowball Stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The following languages are supported:
Danish, Dutch, English, Finnish, French, German,
Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian,
Spanish and Swedish.
The algorithm for English is documented here:
Porter, M. "An algorithm for suffix stripping."
Program 14.3 (1980): 130-137.
The algorithms have been developed by Martin Porter.
These stemmers are called Snowball, because Porter created
a programming language with this name for creating
new stemming algorithms. There is more information available
at http://snowball.tartarus.org/
* Parameter: Language (The following languages are supported: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish and Swedish.)
* Possible values:
* Danish
* Dutch
* English
* Finnish
* French
* German
* Hungarian
* Italian
* Norwegian
* Portuguese
* Romanian
* Russian
* Spanish
* Swedish
* Default value: danish
* Parameter: Ignore stopwords (If set to True, stopwords are
not stemmed and returned unchanged.
Set to False by default.)
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Default Lemmatizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Default Lemmatizer
Lemmatizer that can be used as a baseline: it does not do anything and returns each word unchanged.
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer extrinsic evaluation `_
Widget: Lemmagen Lemmatizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Lemmagen lemmatizer as implemented in Python
* Output: Stemmer (Tagger)
* Example usage: `Intrinsic lemmatizer evaluation `_
Widget: Pattern Lemmatizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Pattern Lemmatizer
Lemmatize using Pattern's library built-in stem function.
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer extrinsic evaluation `_
Widget: Pattern Porter Stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Porter stemmer from Pattern library.
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer extrinsic evaluation `_
Widget: WordNet Lemmatizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
WordNet Lemmatizer
Lemmatize using WordNet's built-in morphy function. Returns the input word unchanged if it cannot be found in WordNet.
* Parameter: POS Annotation (The name of the part-of-speech annotation from the ADC corpus that the WordNet lemmatizer will use when trying to lemmatize words.)
* Default value: POS Tag
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Lancaster Stemmer
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
A word stemmer based on the Lancaster stemming algorithm.
>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('maximum') # Remove "-um" when word is intact
'maxim'
>>> st.stem('presumably') # Don't remove "-um" when word is not intact
'presum'
>>> st.stem('multiply') # No action taken if word ends with "-ply"
'multiply'
>>> st.stem('provision') # Replace "-sion" with "-j" to trigger "j" set of rules
'provid'
>>> st.stem('owed') # Word starting with vowel must contain at least 2 letters
'ow'
>>> st.stem('ear') # ditto
'ear'
>>> st.stem('saying') # Words starting with consonant must contain at least 3
'say'
>>> st.stem('crying') # letters and one of those letters must be a vowel
'cry'
>>> st.stem('string') # ditto
'string'
>>> st.stem('meant') # ditto
'meant'
>>> st.stem('cement') # ditto
'cem'
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
Widget: Porter Stemmer
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
This is the Porter stemming algorithm, ported to Python from the version coded up in ANSI C by the author. It follows the algorithm presented in
Porter, M. "An algorithm for suffix stripping." Program 14.3 (1980): 130-137.
only differing from it at the points marked --DEPARTURE-- and --NEW--
below.
For a more faithful version of the Porter algorithm, see
http://www.tartarus.org/~martin/PorterStemmer/
* Output: Stemmer (Tagger)
* Example usage: `Stemmer and Lemmatizer classification evaluation `_
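The suffix stripping described above can be sketched with NLTK's ``PorterStemmer`` (the sample words are illustrative):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Plural and suffix stripping, as in Porter's paper.
print(stemmer.stem('caresses'))  # caress
print(stemmer.stem('ponies'))    # poni
print(stemmer.stem('running'))   # run
```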
Widget: Lemmatizer Evaluator
-----------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
This widget can be used to evaluate lemmatizers. Its inputs are a lemmatizer and a corpus on which you wish to evaluate it.
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Lemmatizer (Lemmatizer to be evaluated)
* Output: Actual and predicted labels (List of actual and predicted labels (see help for details))
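The output format can be illustrated with a small sketch: given parallel lists of actual and predicted labels (the values below are hypothetical), per-token accuracy is the fraction of matching pairs:

```python
# Hypothetical output of the evaluator: parallel lists of actual lemmas
# and the lemmas predicted by the lemmatizer under evaluation.
actual    = ['run', 'house', 'be', 'corpus']
predicted = ['run', 'hous',  'be', 'corpora']

# Per-token lemmatization accuracy.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)  # 0.5
```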
Widget: Stem/Lemma Tagger Hub
------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Tags the given annotated document corpus with the given tagger.
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Token Tagger (Token annotation of the token to be tagged. If a feature name is also given, the feature value of the selected token is tagged.
Usage:
1. TokenName
2. TokenName/FeatureName
If multiple taggers are used, one line per tagger must be specified.)
* Parameter: Token Annotation (System.String)
* Default value: Token
* Parameter: POS Annotation (Name of the part-of-speech annotation in the ADC corpus, if the corpus contains POS tags. Used by the WordNet lemmatizer, which relies on POS tags for lemma prediction.)
* Default value: POS Tag
* Parameter: Output Feature Name (System.String)
* Default value: Stem
* Output: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Example usage: `LBD workflows for outlier detection `_
Category Chunking
=================
Widget: Chunking Hub
---------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
TODO
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Chunker (TODO)
* Parameter: Input Feature Name (System.String)
* Default value: POS Tag
* Parameter: Output Feature Name (System.String)
* Default value: Chunk
* Output: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
Widget: Classifier based parser
--------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
TODO
* Output: classifier based chunker
Widget: Regex parser
---------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
TODO
* Parameter: Grammar (System.String)
* Default value: "NP: {<DT>?<JJ>*<NN>}"
* Output: regex chunker
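The resulting chunker can be sketched with NLTK's ``RegexpParser`` and the default grammar above (the tagged sentence is an illustrative assumption):

```python
import nltk

# Default grammar: chunk an optional determiner, any number of
# adjectives, and a noun into an NP.
chunker = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN>}")

tagged = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"),
          ("fox", "NN"), ("jumped", "VBD")]
tree = chunker.parse(tagged)
print(tree)
# (S (NP the/DT quick/JJ brown/JJ fox/NN) jumped/VBD)
```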
Category Dataset
================
Category Latino
---------------
Widget: Add Labels
~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function AddLabelsToDocumentVectors in package latino. The original function signature: AddLabelsToDocumentVectors.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Labels (Array of Strings) (System.Collections.Generic.List`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Output: Dataset
Widget: Extract Labels
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ExtractDatasetLabels in package latino. The original function signature: ExtractDatasetLabels.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Labels (Array of Strings)
Widget: Remove Labels
~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function RemoveDocumentVectorsLabels in package latino. The original function signature: RemoveDocumentVectorsLabels.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Dataset
Widget: Split
~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function DatasetSplitSimple in package latino. The original function signature: DatasetSplitSimple.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Parameter: Percentage (System.Double)
* Default value: 10
* Parameter: Random Seed (-1 for a random (time-dependent) seed)
* Default value: -1
* Output: Dataset with Extracted Set
* Output: Dataset of Remaining Sets
Widget: Split to Predefined Sets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function DatasetSplitPredefined in package latino. The original function signature: DatasetSplitPredefined.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Sets (List with predefined set numbers) (System.Int32[])
* Input: SetId (System.Int32)
* Output: Dataset with Extracted Set
* Output: Dataset of Remaining Sets
Widget: Dataset to Object
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function DatasetToObject in package latino. The original function signature: DatasetToObject.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Standard Object Representation of Dataset (List<Tuple<int, string, Dictionary<int, double>>> explained as: (List of Examples)<(Example Tuple)<(Id) int,(Label) string,(BOW Dictionary)<(Word Id) int,(Word Weight) double>>>)
Widget: Object to Dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ObjectToDataset in package latino. The original function signature: ObjectToDataset.
* Input: Standard Object Representation of Dataset (List<Tuple<int, string, Dictionary<int, double>>> explained as: (List of Examples)<(Example Tuple)<(Id) int,(Label) string,(BOW Dictionary)<(Word Id) int,(Word Weight) double>>>)
* Output: Dataset
Widget: Add Labels
-------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function AddLabelsToDocumentVectors in package latino. The original function signature: AddLabelsToDocumentVectors.
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Labels (Array of Strings) (System.Collections.Generic.List`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Output: Dataset
Category Stop Words
===================
Category Latino
---------------
Category Advanced
~~~~~~~~~~~~~~~~~
Widget: Stop Word Tagger Hub (Text)
````````````````````````````````````
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function TagStringStopwords in package latino. The original function signature: TagStringStopwords.
* Input: Text (System.Object)
* Input: Token Tagger (string or array of strings)
* Parameter: Output Feature Name (System.String)
* Default value: stopword
* Output: String (string or array of strings (based on the input))
Widget: Stop Word Sets
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function GetStopWords in package latino. The original function signature: GetStopWords.
* Parameter: Language (Latino.TextMining.Language)
* Possible values:
* Bulgarian
* Czech
* Danish
* Dutch
* English
* Finnish
* French
* German
* Hungarian
* Italian
* Norwegian
* Portuguese
* Romanian
* Russian
* Serbian
* Slovene
* Spanish
* Swedish
* Default value: English
* Output: StopWords
* Example usage: `Simple Document Preprocessing `_
Widget: Stop Word Tagger
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructStopWordsTagger in package latino. The original function signature: ConstructStopWordsTagger.
* Input: Stopwords (List of stopwords)
* Parameter: Ignore Case (If true, words are marked as stop words regardless of their casing.)
* Default value: true
* Output: Stop Word Tagger
Widget: Stop Word Tagger Hub
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function TagADCStopwords in package latino. The original function signature: TagADCStopwords.
* Input: Annotated Document Corpus (LatinoInterfaces.DocumentCorpus)
* Input: Token Tagger (System.Object)
* Parameter: Token Annotation (System.String)
* Default value: Token
* Parameter: Output Feature Name (System.String)
* Default value: stopword
* Output: Annotated Document Corpus
Category Nltk
-------------
Widget: Stop Word Tagger
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Constructs a Python stop word tagger object.
* Input: Stop Words (A list or string (stop words separated by new lines) of stop words.)
* Parameter: Ignore Case (If true, words are marked as stop words regardless of their casing.)
* Default value: true
* Output: Stop Word Tagger (A python dictionary containing the StopWordTagger object and its arguments.)
* Example usage: `Simple Document Preprocessing `_
Widget: Stop Word Tagger Hub
-----------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Applies the *stop_word_tagger* object to the Annotated Document Corpus (*adc*):
1. first select only annotations of type Token Annotation (*element_annotation*),
2. apply the stop word tagger,
3. create new annotations *output_feature* with the outputs of the stop word tagger.
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Stop Word Tagger (A python dictionary containing the stop word tagger object and its arguments.)
* Parameter: Token Annotation (Which annotated part of the document is searched for stop words.)
* Default value: Token
* Parameter: Output Feature Name (How to annotate the newly discovered stop word features.)
* Default value: StopWord
* Output: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Example usage: `LBD workflows for outlier detection `_
Category Similarity Matrix
==========================
Category Latino
---------------
Widget: Calculate Similarity Matrix
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function CalculateSimilarityMatrix in package latino. The original function signature: CalculateSimilarityMatrix.
* Input: Dataset (Latino.Model.IUnlabeledExampleCollection`1[[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Dataset (Latino.Model.IUnlabeledExampleCollection`1[[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Parameter: Similarity Threshold (System.Double)
* Default value: 0
* Parameter: Full Matrix (not only Lower Triangular) (System.Boolean)
* Default value: true
* Output: Similarity Matrix
Widget: Convert Matrix to Table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function SparseMatrixToTable in package latino. The original function signature: SparseMatrixToTable.
* Input: Sparse Matrix (Latino.SparseMatrix`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Output: Matrix Table
Widget: Calculate Similarity Matrix
------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function CalculateSimilarityMatrix in package latino. The original function signature: CalculateSimilarityMatrix.
* Input: Dataset (Latino.Model.IUnlabeledExampleCollection`1[[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Dataset (Latino.Model.IUnlabeledExampleCollection`1[[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Parameter: Similarity Threshold (System.Double)
* Default value: 0
* Parameter: Full Matrix (not only Lower Triangular) (System.Boolean)
* Default value: true
* Output: Similarity Matrix
Category Clustering
===================
Category Latino
---------------
Widget: KMeans Clusterer
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructKMeansClusterer in package latino. The original function signature: ConstructKMeansClusterer.
* Parameter: K (Number of Clusters) (System.Int32)
* Default value: 10
* Parameter: Centroid Type (Latino.Model.CentroidType)
* Possible values:
* Avg
* Nrm L2
* Sum
* Default value: NrmL2
* Parameter: Similarity Measure (LatinoInterfaces.SimilarityModel)
* Possible values:
* Cosine
* Dot Product
* Default value: Cosine
* Parameter: Random Seed (-1: Use Always Different) (System.Int32)
* Default value: -1
* Parameter: Eps (System.Double)
* Default value: 0.0005
* Parameter: Trials (Num of Initializations) (System.Int32)
* Default value: 1
* Output: Clusterer
Widget: KMeans Fast Clusterer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructKMeansFastClusterer in package latino. The original function signature: ConstructKMeansFastClusterer.
* Parameter: K (Number of Clusters) (System.Int32)
* Default value: 10
* Parameter: Random Seed (-1: Use Always Different) (System.Int32)
* Default value: -1
* Parameter: Eps (System.Double)
* Default value: 0.0005
* Parameter: Trials (Num of Initializations) (System.Int32)
* Default value: 1
* Output: Clusterer
Widget: Hierarchical Bisecting Clusterer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructHierarchicalBisectingClusterer in package latino. The original function signature: ConstructHierarchicalBisectingClusterer.
* Parameter: Min Quality (System.Double)
* Default value: 0.2
* Output: Clusterer
Widget: Clustering Results Info
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ClusteringResultsInfo in package latino. The original function signature: ClusteringResultsInfo.
* Input: Clustering Results (Latino.Model.ClusteringResult)
* Output: Document Labels (Array of Cluster Ids)
* Output: Clusters Tree
Widget: View Clusters
~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ViewClusters_PYTHON in package latino. The original function signature: ViewClusters_PYTHON.
* Input: Clustering Results (System.Object)
* Outputs: Popup window which shows widget's results
Category Scikit
---------------
Widget: k-Means
~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum of squares. This algorithm requires the number of clusters to be specified. It scales well to a large number of samples and has been used across a wide range of application areas in many different fields.
* Parameter: Number of clusters (The number of clusters to form as well as the number of centroids to generate.)
* Default value: 8
* Parameter: Max iterations (Maximum number of iterations of the k-means algorithm for a single run.)
* Default value: 300
* Parameter: Tolerance (Relative tolerance with regards to inertia to declare convergence.)
* Default value: 1e-4
* Output: Clustering
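A minimal sketch with scikit-learn's ``KMeans``; the widget's parameters correspond to ``n_clusters``, ``max_iter`` and ``tol`` (the toy vectors and ``random_state`` are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy two-dimensional "document vectors": two well-separated pairs.
X = np.array([[1.0, 0.0], [1.1, 0.1], [0.0, 1.0], [0.1, 0.9]])

km = KMeans(n_clusters=2, max_iter=300, tol=1e-4, n_init=10, random_state=0)
labels = km.fit_predict(X)

# The two nearby pairs end up in the same cluster.
print(labels[0] == labels[1], labels[2] == labels[3])  # True True
```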
Widget: Clustering Hub
-----------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ClusterDocumentVectors in package latino. The original function signature: ClusterDocumentVectors.
* Input: Clusterer (LatinoClowdFlows.IClusterer)
* Input: Dataset (Latino.Model.IUnlabeledExampleCollection`1[[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Clustering Results
Category Classification
=======================
Category Latino
---------------
Widget: Nearest Centroid Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructCentroidClassifier in package latino. The original function signature: ConstructCentroidClassifier.
* Parameter: Similarity Model (LatinoInterfaces.SimilarityModel)
* Possible values:
* Cosine
* Dot Product
* Default value: Cosine
* Parameter: Normalize Centroids (System.Boolean)
* Default value: false
* Output: Centroid Classifier
* Example usage: `Classifier evaluation `_
Widget: Naive Bayes Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructNaiveBayesClassifier in package latino. The original function signature: ConstructNaiveBayesClassifier.
* Parameter: Normalize (System.Boolean)
* Default value: false
* Parameter: Log Sum Exp Trick (System.Boolean)
* Default value: true
* Output: Classifier
* Example usage: `Classifier evaluation `_
Widget: SVM Binary Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructSvmBinaryClassifier in package latino. The original function signature: ConstructSvmBinaryClassifier.
* Parameter: C (zero implies default value ([avg. x*x]^-1))
* Default value: 0
* Parameter: Biased Hyperplane (System.Boolean)
* Default value: true
* Parameter: Kernel Type (Latino.Model.SvmLightKernelType)
* Possible values:
* Linear
* Polynomial
* Radial Basis Function
* Sigmoid
* Default value: Linear
* Parameter: Kernel Parameter Gamma (System.Double)
* Default value: 1
* Parameter: Kernel Parameter D (System.Double)
* Default value: 1
* Parameter: Kernel Parameter S (System.Double)
* Default value: 1
* Parameter: Kernel Parameter C (System.Double)
* Default value: 0
* Parameter: Eps (System.Double)
* Default value: 0.001
* Parameter: Max Iterations (System.Int32)
* Default value: 100000
* Parameter: Custom Parameter String (System.String)
* Output: Classifier
Widget: SVM Multiclass Fast Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructSvmMulticlassFast in package latino. The original function signature: ConstructSvmMulticlassFast.
* Parameter: C (System.Double)
* Default value: 5000
* Parameter: Eps (System.Double)
* Default value: 0.1
* Output: Classifier
Widget: Majority Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructMajorityClassifier in package latino. The original function signature: ConstructMajorityClassifier.
* Output: Classifier
Widget: Maximum Entropy Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructMaximumEntropyClassifier in package latino. The original function signature: ConstructMaximumEntropyClassifier.
* Parameter: Move Data (System.Boolean)
* Default value: false
* Parameter: Num of Iterations (System.Int32)
* Default value: 100
* Parameter: CutOff (System.Int32)
* Default value: 0
* Parameter: Num of Threads (System.Int32)
* Default value: 1
* Parameter: Normalize (System.Boolean)
* Default value: false
* Output: Classifier
* Example usage: `Classifier evaluation `_
Widget: Maximum Entropy Fast Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructMaximumEntropyClassifierFast in package latino. The original function signature: ConstructMaximumEntropyClassifierFast.
* Parameter: Move Data (System.Boolean)
* Default value: false
* Parameter: Num of Iterations (System.Int32)
* Default value: 100
* Parameter: CutOff (System.Int32)
* Default value: 0
* Parameter: Num of Threads (System.Int32)
* Default value: 1
* Parameter: Normalize (System.Boolean)
* Default value: false
* Output: Classifier
Widget: Knn Classifier
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructKnnClassifier in package latino. The original function signature: ConstructKnnClassifier.
* Parameter: Similarity Model (LatinoInterfaces.SimilarityModel)
* Possible values:
* Cosine
* Dot Product
* Default value: Cosine
* Parameter: K (Neighbourhood) (System.Int32)
* Default value: 10
* Parameter: Soft Voting (System.Boolean)
* Default value: true
* Output: Classifier
Widget: Knn Fast Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructKnnClassifierFast in package latino. The original function signature: ConstructKnnClassifierFast.
* Parameter: K (Neighbourhood) (System.Int32)
* Default value: 10
* Parameter: Soft Voting (System.Boolean)
* Default value: true
* Output: Classifier
* Example usage: `Classifier evaluation `_
Widget: Accuracy Calculation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function AccuracyClaculation in package latino. The original function signature: AccuracyClaculation.
* Input: True Labels (System.Collections.IList)
* Input: Predicted Labels (System.Collections.IList)
* Output: Accuracy
* Output: Statistics (Statistics:confusionMatrix: the first level of the confusion matrix dictionary represents true labels (first input), while the second, inner level represents predicted labels (second input).
Statistics:additionalScores: the dictionary's key is the label that was considered positive for the calculation, and the dictionary's value holds the actual additional scores.)
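The confusion-matrix layout described above (true labels on the first level, predicted labels on the inner level) can be sketched in plain Python. This is an illustrative reimplementation, not the widget's actual code:

```python
from collections import defaultdict

def confusion_matrix(true_labels, predicted_labels):
    # First-level keys are true labels, inner keys are predicted labels,
    # mirroring the Statistics:confusionMatrix layout described above.
    matrix = defaultdict(lambda: defaultdict(int))
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return {t: dict(row) for t, row in matrix.items()}

def accuracy(true_labels, predicted_labels):
    # Fraction of examples where the predicted label matches the true label.
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

true_y = ["pos", "pos", "neg", "neg"]
pred_y = ["pos", "neg", "neg", "neg"]
acc = accuracy(true_y, pred_y)        # 3 of 4 correct -> 0.75
cm = confusion_matrix(true_y, pred_y)
```

Here `cm["pos"]["neg"]` counts the examples whose true label was `pos` but which were predicted `neg`.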
Widget: Cross Validation
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function CrossValidation in package latino. The original function signature: CrossValidation.
* Input: Classifier (Latino.Model.IModel`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Parameter: Num of Sets (System.Int32)
* Default value: 10
* Parameter: Assign Sets Randomly (System.Boolean)
* Default value: true
* Parameter: Use Seed for Random (System.Boolean)
* Default value: false
* Parameter: Random Seed (System.Int32)
* Default value: 0
* Output: Data Object with results
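The fold assignment controlled by the Num of Sets, Assign Sets Randomly, Use Seed for Random and Random Seed parameters can be sketched in plain Python. The function below is illustrative (the names are ours, not the widget's):

```python
import random

def assign_folds(n_examples, n_sets=10, randomly=True, seed=None):
    # Element i of the result is the fold id (0 .. n_sets-1) of example i.
    # Round-robin assignment keeps the folds as evenly sized as possible.
    ids = [i % n_sets for i in range(n_examples)]
    if randomly:
        # A fixed seed makes the random split reproducible.
        rng = random.Random(seed)
        rng.shuffle(ids)
    return ids

folds = assign_folds(100, n_sets=10, randomly=True, seed=0)
```

Each cross-validation round then trains on the examples outside one fold and tests on the examples inside it.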
Widget: Cross Validation (Predefined Splits)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function CrossValidationPredefSplits in package latino. The original function signature: CrossValidationPredefSplits.
* Input: Classifier (Latino.Model.IModel`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Sets (List with predefined set numbers) (System.Collections.Generic.List`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Output: Data Object with results
Widget: Multiple Splits Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function CrossValidationPredefMultiSplits in package latino. The original function signature: CrossValidationPredefMultiSplits.
* Input: Classifier (Latino.Model.IModel`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Input: Multiple Set Indexes (Dictionary with multiple predefined split element indexes. {"train0":[1,2,3],"test0":[4,5],"train1":[2,3,4],"test1":[5,6]})
* Output: Data Object with results
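The Multiple Set Indexes dictionary shown above pairs a "trainN" key with a "testN" key for each split. A minimal illustrative sketch of consuming that format:

```python
splits = {"train0": [1, 2, 3], "test0": [4, 5],
          "train1": [2, 3, 4], "test1": [5, 6]}

def iter_splits(splits):
    # Yields (train_indexes, test_indexes) pairs for split 0, 1, ...
    # until a "trainN" key is missing.
    n = 0
    while "train%d" % n in splits:
        yield splits["train%d" % n], splits["test%d" % n]
        n += 1

pairs = list(iter_splits(splits))
```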
Widget: Predict Classification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function PredictClassification in package latino. The original function signature: PredictClassification.
* Input: Classifier (Latino.Model.IModel`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Prediction(s)
* Output: Labeled dataset
Widget: Prediction Info
~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function PredictionInfo in package latino. The original function signature: PredictionInfo.
* Input: Prediction(s) (System.Collections.Generic.List`1[[Latino.Model.Prediction`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Lable(s) (Array of Strings)
* Output: Prediction Info(s)
Widget: View Classifications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ViewClasssifications_PYTHON in package latino. The original function signature: ViewClasssifications_PYTHON.
* Input: Prediction(s) (System.Object)
* Outputs: Popup window which shows widget's results
Category Nltk
-------------
Widget: Naive Bayes Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
A classifier based on the Naive Bayes algorithm. In order to find the
probability for a label, this algorithm first uses the Bayes rule to
express P(label|features) in terms of P(label) and P(features|label):
| P(label) * P(features|label)
| P(label|features) = ------------------------------
| P(features)
The algorithm then makes the 'naive' assumption that all features are
independent, given the label:
| P(label) * P(f1|label) * ... * P(fn|label)
| P(label|features) = --------------------------------------------
| P(features)
Rather than computing P(features) explicitly, the algorithm just
calculates the numerator for each label, and normalizes the results so
they sum to one:
| P(label) * P(f1|label) * ... * P(fn|label)
| P(label|features) = --------------------------------------------
| SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) )
* Parameter: Normalize (System.Boolean)
* Default value: false
* Parameter: Log Sum Exp Trick (System.Boolean)
* Default value: true
* Output: Classifier
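The normalization step in the last formula above can be sketched in plain Python; the priors and likelihoods below are made-up illustration values, not output of the widget:

```python
from math import prod  # Python 3.8+

def naive_bayes_posterior(priors, likelihoods, features):
    # priors[label] = P(label); likelihoods[label][f] = P(f | label).
    # Numerator of the formula: P(label) * P(f1|label) * ... * P(fn|label).
    scores = {label: priors[label] *
                     prod(likelihoods[label][f] for f in features)
              for label in priors}
    # Denominator: SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) ),
    # so the posteriors sum to one.
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": {"offer": 0.8, "meeting": 0.1},
               "ham": {"offer": 0.2, "meeting": 0.7}}
posterior = naive_bayes_posterior(priors, likelihoods, ["offer"])
```

With these numbers the unnormalized scores are 0.32 (spam) and 0.12 (ham), so the posterior favours spam.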
Category Scikit
---------------
Widget: Decision Tree Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/scikit_Tree-icon.png
:width: 50
:height: 50
A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility.
* Parameter: Max features (The number of features to consider when looking for the best split:
If int, then consider max_features features at each split.
If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
If “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.)
* Default value: auto
* Parameter: Max depth (The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples. )
* Default value: 100
* Output: Classifier
* Example usage: `LBD workflows for outlier detection `_
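Since this widget wraps scikit-learn, the Max features and Max depth parameters map onto `DecisionTreeClassifier` directly. A minimal sketch (not the widget's exact wrapper code); note that in recent scikit-learn versions the "auto" option for `max_features` is spelled `"sqrt"`:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy one-feature dataset: small values are class 0, large values class 1.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

clf = DecisionTreeClassifier(max_features="sqrt",  # widget default "auto"
                             max_depth=100,        # widget default 100
                             random_state=0)
clf.fit(X, y)
pred = clf.predict([[1], [11]])
```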
Widget: Gaussian Naive Bayes Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/classifier_naive_bayes_image.png
:width: 50
:height: 50
Gaussian Naive Bayes. When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution.
* Output: Classifier
* Example usage: `Classifier evaluation `_
Widget: k-Nearest Neighbours Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/classifier_knn_image.png
:width: 50
:height: 50
Classifier implementing the k-nearest neighbors vote.
* Parameter: Number of neighbors (Number of neighbors to use by default for k_neighbors queries.)
* Default value: 5
* Parameter: Algorithm (Algorithm used to compute the nearest neighbors:
‘ball_tree’ will use BallTree
‘kd_tree’ will use KDTree
‘brute’ will use a brute-force search.
‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.)
* Possible values:
* ball tree
* brute
* kd tree
* most appropriate (automatically)
* Default value: auto
* Parameter: Weights (weight function used in prediction. Possible values:
‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
‘distance’ : weight points by the inverse of their distance. in this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
[callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Uniform weights are used by default.)
* Possible values:
* distance
* uniform
* Default value: uniform
* Output: Classifier
* Example usage: `Classifier evaluation `_
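The Number of neighbors, Algorithm and Weights parameters map onto scikit-learn's `KNeighborsClassifier`; a minimal sketch with the widget's default values:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy one-feature dataset: two well-separated clusters.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

clf = KNeighborsClassifier(n_neighbors=3,
                           weights="uniform",   # widget default
                           algorithm="auto")    # widget default
clf.fit(X, y)
pred = clf.predict([[1], [11]])
```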
Widget: Logistic regression Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/scikit_LogisticRegression.png
:width: 50
:height: 50
Logistic regression, despite its name, is a linear model for classification rather than regression. Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
* Parameter: Penalty (Used to specify the norm used in the penalization.)
* Possible values:
* l1
* l2
* Default value: l1
* Parameter: C (Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.)
* Default value: 1.0
* Output: Classifier
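The Penalty and C parameters map onto scikit-learn's `LogisticRegression`. A minimal sketch; note that in current scikit-learn the l1 penalty requires a compatible solver (e.g. `solver="liblinear"`), so the sketch uses l2:

```python
from sklearn.linear_model import LogisticRegression

# Toy one-feature dataset: two well-separated clusters.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

# Smaller C means stronger regularization, as described above.
clf = LogisticRegression(penalty="l2", C=1.0)
clf.fit(X, y)
pred = clf.predict([[0], [12]])
```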
Widget: Multinomial Naive Bayes Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/classifier_naive_bayes_image.png
:width: 50
:height: 50
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
* Parameter: Alpha (Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing). )
* Default value: 1.0
* Parameter: Fit prior (Whether or not to learn class prior probabilities.
If false, a uniform prior is used.)
* Output: Classifier
* Example usage: `Outlier document detection `_
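The Alpha and Fit prior parameters map onto scikit-learn's `MultinomialNB`; a minimal sketch on made-up word-count vectors:

```python
from sklearn.naive_bayes import MultinomialNB

# Toy word-count vectors: columns are counts of "offer" and "meeting".
X = [[3, 0], [4, 1], [0, 3], [1, 4]]
y = ["spam", "spam", "ham", "ham"]

clf = MultinomialNB(alpha=1.0,       # Laplace smoothing (widget default)
                    fit_prior=True)  # learn class priors from the data
clf.fit(X, y)
pred = clf.predict([[5, 0]])  # many "offer" counts, no "meeting"
```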
Widget: SVM Classifier
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/classifier_svm_image.png
:width: 50
:height: 50
Implementation of the Support Vector Machine classifier using libsvm: the kernel can be non-linear, but its SMO algorithm does not scale to large numbers of samples as LinearSVC does. Furthermore, SVC multi-class mode is implemented using a one-vs-one scheme, while LinearSVC uses one-vs-the-rest.
* Parameter: C (Penalty parameter C of the error term.)
* Default value: 1.0
* Parameter: Degree (Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.)
* Default value: 3
* Parameter: Kernel (Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.)
* Possible values:
* linear
* poly
* precomputed
* rbf
* sigmoid
* Default value: rbf
* Output: Classifier
* Example usage: `POS tagger intrinsic evaluation - experiment 1 `_
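The C, Degree and Kernel parameters map onto scikit-learn's `SVC`; a minimal sketch with the widget's defaults:

```python
from sklearn.svm import SVC

# Toy one-feature dataset: two well-separated clusters.
X = [[0], [1], [2], [10], [11], [12]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(C=1.0,          # widget default
          kernel="rbf",   # widget default
          degree=3)       # only used by the 'poly' kernel
clf.fit(X, y)
pred = clf.predict([[1], [11]])
```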
Widget: SVM Linear Classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/scikit_classifiers/static/scikit_classifiers/icons/widget/classifier_svm_image.png
:width: 50
:height: 50
Similar to Support Vector Classification with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better (to large numbers of samples).
* Parameter: C (Penalty parameter C of the error term.)
* Default value: 1.0
* Parameter: Loss (Specifies the loss function. ‘l1’ is the hinge loss (standard SVM) while ‘l2’ is the squared hinge loss.)
* Possible values:
* l1
* l2
* Default value: l2
* Parameter: Penalty (Specifies the norm used in the penalization. The ‘l2’ penalty is the standard used in SVC. The ‘l1’ leads to coef_ vectors that are sparse.)
* Possible values:
* l1
* l2
* Default value: l2
* Parameter: Multi class (Determines the multi-class strategy if y contains more than two classes. ovr trains n_classes one-vs-rest classifiers, while crammer_singer optimizes a joint objective over all classes. While crammer_singer is interesting from a theoretical perspective because it is consistent, it is seldom used in practice, rarely leads to better accuracy, and is more expensive to compute. If crammer_singer is chosen, the options loss, penalty and dual will be ignored.)
* Possible values:
* crammer singer
* ovr
* Default value: ovr
* Output: Classifier
* Example usage: `Classifier evaluation `_
Widget: Apply Classifier Hub
-----------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
TODO
* Input: Classifier (Latino.Model.IModel`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Parameter: Calculate class probabilities (Calculate classification class probabilities. May slow down algorithm prediction.)
* Default value: true
* Output: Prediction(s)
* Output: Labeled dataset
Widget: Train Classifier Hub
-----------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function TrainClassifier in package latino. The original function signature: TrainClassifier.
* Input: Classifier (Latino.Model.IModel`1[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Input: Dataset (Latino.Model.LabeledDataset`2[[System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[Latino.SparseVector`1[[System.Double, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]], Latino, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]])
* Output: Classifier
Widget: Extract Classifier Name
--------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Returns a string with pretty classifier name.
* Input: Classifier
* Output: Classifier Name
Widget: Extract Actual and Predicted Values
--------------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Takes as input an ADC object with already defined actual and predicted features that can be compared. Outputs a combined list of actual and predicted values which can be used e.g. by the Classification Statistics widget.
* Input: Predictions (Classification Predictions)
* Input: Dataset (BoW Dataset)
* Output: Actual and Predicted Values (List of Actual and Predicted Values)
Category Lexicology
===================
Category Controlled Vocabularies
--------------------------------
Widget: MeSH vocabulary builder
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Constructs vocabulary from selected top categories in MeSH hierarchy.
* Parameter: N-grams (Construct n-grams subsets of words from a MeSH term)
* Output: List of MeSH terms (List of MeSH terms.)
Category Literature Based Discovery
===================================
Category Heuristic Calculation
------------------------------
Widget: Exclude Terms that Appear in One Domain Only
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Bag of Words Model Constructor (Bag of Words Model Constructor )
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Output: Bag of Words Model Constructor with Filtered Vocabulary (Bag of Words Model Constructor (BowModelConstructor) gathers utilities to build feature vectors from annotated document corpus.)
* Output: BOW Model Dataset (Sparse BOW feature vectors.)
Widget: Calculate Term Heuristics Scores
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Calculate all input heuristics.
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Bag of Words Model (Bag of Words Model Constructor (BowModelConstructor) gathers utilities to build feature vectors from annotated document corpus.)
* Input: Heuristic or Heuristic list (List of heuristic names whose scores will be calculated.)
* Output: Heuristic Scores (Calculated B-Term Heuristic Scores)
* Example usage: `Literature Based Discovery (overview with vocab) `_
Widget: Actual and Predicted Values
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Prepare actual and predicted values for B-term Heuristics.
* Input: Bag of Words Model Constructor (Bag of Words Model Constructor (BowModelConstructor) gathers utilities to build feature vectors from annotated document corpus.)
* Input: B-terms (List of bridging terms)
* Input: Heuristic Scores (Calculated B-Term Heuristic Scores)
* Output: Actual and Predicted Values (List of actual and predicted values for every B-term Discovery Heuristic)
Category Heuristic Specification
--------------------------------
Widget: Frequency-based heuristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Interactive widget which allows specification of frequency-based bridging term discovery heuristics.
* Output: List of Selected Heuristics for Bridging Term Discovery
* Example usage: `Literature Based Discovery (overview) `_
Widget: TF-IDF-based heuristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Interactive widget which allows specification of TF-IDF based bridging term discovery heuristics.
* Output: List of Selected Heuristics for Bridging Term Discovery
* Example usage: `Literature Based Discovery (overview) `_
Widget: Similarity-based heuristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Interactive widget which allows specification of similarity-based bridging term discovery heuristics.
* Output: List of Selected Heuristics for Bridging Term Discovery
Widget: Outlier-based heuristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Interactive widget which allows specification of outlier-based bridging term discovery heuristics.
* Output: List of Selected Heuristics for Bridging Term Discovery
* Example usage: `Literature Based Discovery (overview) `_
Widget: Banded matrix-based heuristics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Interactive widget which allows specification of bridging term discovery heuristics based on banded matrices.
* Output: List of Selected Heuristics for Bridging Term Discovery
Widget: Outlier-based heuristic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Interactive widget which allows specification of a custom outlier-based bridging term discovery heuristic using the classifiers from the input.
* Input: Classifier
* Output: List of Selected Heuristics for Bridging Term Discovery
Widget: Heuristic Maximum
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Defines a calculated heuristic that is the maximum (for every term) of the input heuristics.
* Input: Heuristic or Heuristic list
* Output: Heuristic Max Specification (Heuristic Maximum Specification)
Widget: Heuristic Minimum
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Defines a calculated heuristic that is the minimum (for every term) of the input heuristics.
* Input: Heuristic or Heuristic list
* Output: Heuristic Min Specification (Heuristic Minimum Specification)
Widget: Heuristic Normalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Defines calculated heuristics where scores are scaled to [0,1] values using the minimum and maximum scores.
* Input: Heuristic or Heuristic list
* Output: Normalized Heuristic or Heuristic Specifications list (Normalized Heuristic Specification or Heuristic Specifications list)
* Example usage: `LBD workflows for outlier detection `_
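The [0,1] scaling described above is ordinary min-max normalization; a minimal illustrative sketch:

```python
def normalize_scores(scores):
    # Scale all heuristic scores to [0,1] using the minimum and
    # maximum scores, as described above.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores equal: no spread to normalize over.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

norm = normalize_scores([2.0, 5.0, 8.0])  # -> [0.0, 0.5, 1.0]
```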
Widget: Heuristic Sum
~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Defines a calculated heuristic that is the summation of the input heuristics.
* Input: Heuristic or Heuristic list
* Output: Heuristic Sum Specification (Heuristic Summation Specification)
* Example usage: `Literature Based Discovery (overview) `_
Widget: Ensemble Average Position
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The Ensemble Average Position score is calculated as an average of position scores of individual base heuristics.
* Input: Heuristic or Heuristic list
* Output: Ensemble Average Position Specification
* Example usage: `Literature Based Discovery (overview) `_
Widget: Ensemble Heuristic Vote
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Every term gets an integer score representing how many of the input heuristics voted for it. Each input heuristic gives one vote to each term that is in the first third of its ranked list of terms.
* Input: Heuristic or Heuristic list
* Output: Ensemble Heuristic Vote Specification
* Example usage: `LBD workflows for outlier detection `_
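The voting scheme above can be sketched in plain Python (illustrative, not the widget's actual code):

```python
def ensemble_vote(rankings):
    # rankings: one ranked term list per heuristic, best terms first.
    # Each heuristic votes for every term in the first third of its list.
    votes = {}
    for ranked in rankings:
        cutoff = len(ranked) // 3
        for term in ranked[:cutoff]:
            votes[term] = votes.get(term, 0) + 1
    return votes

rankings = [["a", "b", "c", "d", "e", "f"],
            ["b", "a", "c", "d", "e", "f"],
            ["a", "c", "b", "d", "e", "f"]]
votes = ensemble_vote(rankings)  # "a" is in the top third of all 3 lists
```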
Category Term ranking and Exploration
-------------------------------------
Widget: Explore in CrossBee
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Explore heuristic scores and terms in CrossBee.
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Input: Bag of Words Model Constructor (Bag of Words Model Constructor )
* Input: BOW Model Dataset (Sparse BOW feature vectors)
* Input: B-terms (List of bridging terms)
* Input: Heuristic Scores (Calculated B-term)
* Parameter: CrossBee API URL (URL of the CrossBee API for exploring external data. Data to be displayed in CrossBee will be available at TextFlows' URL. This URL will be sent to the CrossBee API by replacing the "{dataurl.json}" string in the supplied CrossBee API URL.)
* Default value: http://crossbee.ijs.si/Home/ImportFromJson
* Parameter: Primary Heuristic Index (Index of the primary heuristic to be analyzed as an ensemble)
* Default value: 0
* Output: Serialized Annotated Document Corpus (Serialized Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Output: Vocabulary
* Output: Heuristic Scores (Calculated B-Term Heuristic Scores)
* Output: B-terms (List of bridging terms)
* Output: Serialized BOW Model Dataset (Serialized sparse BOW feature vectors)
* Output: Primary Heuristic Index (Index of the primary heuristic to be analyzed as an ensemble)
* Example usage: `LBD workflows for outlier detection `_
Category Helpers
================
Category Tagging
----------------
Widget: Condition Tagger
~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ConstructConditionTagger in package latino. The original function signature: ConstructConditionTagger.
* Parameter: Feature Condition (Condition determining which tokens to include based on their features.
Format examples:
-Feature1 (don't include tokens with Feature1 set to any value)
-Feature1=Value1 (don't include tokens with Feature1 set to the value Value1)
-Feature1 +Feature2 (don't include tokens with Feature1 set unless they also have Feature2 set)
-Feature1=Value1 +Feature2 (don't include tokens with Feature1 set to Value1 unless they also have Feature2 set to any value)...)
* Parameter: output Feature Value (System.String)
* Default value: true
* Parameter: Put token/feature text as the output feature value (If set to true, the token's text or the token's feature text is assigned as the output feature value)
* Output: Tagger
Widget: Advanced Object Viewer
-------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Displays any input.
* Input: Object (Any type of object.)
* Parameter: Attribute (The depth of the object display)
* Parameter: Maximum Output Length (System.Int32)
* Default value: 5000
* Outputs: Popup window which shows widget's results
Widget: Random Cross Validation Sets
-------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function RandomCrossValidationSets in package latino. The original function signature: RandomCrossValidationSets.
* Input: Example List (Not required, but if set, then it overrides parameter 'numOfExamples' and len(examples) is used for 'numOfExamples'. This should be a type implementing Count, Count() or Length.)
* Parameter: Num of Examples (This determines the length of the set id list. If input 'examples' is set then len(examples) is used for 'numOfExamples' and this setting is overridden.)
* Default value: 100
* Parameter: Num of Sets (System.Int32)
* Default value: 10
* Parameter: Assign Sets Randomly (System.Boolean)
* Default value: true
* Parameter: Use Seed for Random (System.Boolean)
* Default value: false
* Parameter: Random Seed (System.Int32)
* Default value: 0
* Output: Example SetIds List
Widget: Random Sequential Validation Sets
------------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function RandomSequentialValidationSets in package latino. The original function signature: RandomSequentialValidationSets.
* Input: Example List (Not required, but if set, then it overrides parameter 'numOfExamples' and len(examples) is used for 'numOfExamples'. This should be a type implementing Count, Count() or Length.)
* Parameter: Num of Examples (This determines the length of the set id list. If input 'examples' is set then len(examples) is used for 'numOfExamples' and this setting is overridden.)
* Default value: 100
* Parameter: Num of Sets (System.Int32)
* Default value: 10
* Parameter: Assign Sets Randomly (If not set, then sets are distributed exactly evenly across the whole dataset.)
* Default value: true
* Parameter: Use Seed for Random (System.Boolean)
* Default value: false
* Parameter: Random Seed (System.Int32)
* Default value: 0
* Parameter: Size of Train Set (May be specified as an absolute number or a number followed by '%' to denote a percentage of the whole dataset.)
* Default value: 40%
* Parameter: Size of Test Set (May be specified as an absolute number or a number followed by '%' to denote a percentage of the whole dataset.)
* Default value: 10%
* Parameter: Size of Space Between Train and Test Set (May be specified as an absolute number or a number followed by '%' to denote a percentage of the whole dataset.)
* Default value: 1%
* Output: Multiple Set Indexes
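The three size parameters accept either an absolute count or a percentage string. A minimal sketch of how such a value could be resolved against the dataset size (the helper name is illustrative, not part of the widget):

```python
def resolve_size(spec, dataset_size):
    """Resolve a size given as an absolute number or as 'N%' of the dataset."""
    spec = str(spec).strip()
    if spec.endswith('%'):
        # Percentage of the whole dataset, rounded to the nearest example.
        return int(round(dataset_size * float(spec[:-1]) / 100.0))
    return int(spec)
```

For a dataset of 100 examples, the defaults above resolve to a train set of 40, a test set of 10, and a gap of 1 example.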
Widget: Advanced Object to String Converter
--------------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Displays any input.
* Input: Object (Any type of object.)
* Parameter: Attribute (The attribute of the object to display)
* Parameter: Maximum Output Length (System.Int32)
* Default value: 500000
* Output: Object String Representation
Widget: C#.NET Snippet
-----------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Runs a C#.NET snippet. Inputs are available inside the code as variables "in1" .. "inN". Whatever you want to output must be assigned to the variable "out1" before the code terminates.
* Input: Snippet Input Parameter(s) (Inputs can be accessed as variables "in1" .. "inN" inside the code.)
* Parameter: C# Snippet Code (Inputs can be accessed as variables "in1" .. "inN" inside the code and the output can be accessed/assigned as variable "out1" inside the code.)
* Default value: // This is the C#.NET Code Snippet where you can modify the data.
// Variables "in1" .. "inN" contain whatever you connected to the input ports.
// Input variables are correctly typed.
// Whatever is assigned to the variable "out1" will be transferred to the output port.
out1 = in1;
* Parameter: Namespace Section (using directives) (System.String)
* Default value: using System;
using System.Collections.Generic;
using System.Linq;
using Latino;
using Latino.TextMining;
using LatinoInterfaces;
* Parameter: Additional References (imports) (System.String)
* Default value: System.dll
System.Xml.dll
System.Core.dll
workflows\textflows_dot_net\bin\Latino.dll
workflows\textflows_dot_net\bin\LatinoWorkflows.dll
workflows\textflows_dot_net\bin\LatinoInterfaces.dll
* Output: out (The output can be accessed/assigned as variable "out1" inside the code.)
* Output: Console Output
* Output: Possible compile/runtime errors
* Output: Generated Code
Widget: Display Table
----------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function ShowTable_PYTHON in package latino. The original function signature: ShowTable_PYTHON.
* Input: Table (System.Object)
* Outputs: Popup window which shows widget's results
Widget: Get Multi Set Indexes
------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Generates multiple set indexes from a list of predefined set numbers. See the widgets "Cross Validation (Predefined Splits)" and "Multiple Splits Validation".
* Input: Sets (List with predefined set numbers) (System.Collections.Generic.List`1[[System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]])
* Output: Multiple Set Indexes
Widget: Flatten String Hierarchy
---------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function FlattenObjectToStringArray in package latino. The original function signature: FlattenObjectToStringArray.
* Input: data (System.Object)
* Output: flatData
Widget: Generate Integer Range
-------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function GenerateIntegerRange in package latino. The original function signature: GenerateIntegerRange.
* Parameter: Start (System.Int32)
* Default value: 0
* Parameter: Stop (System.Int32)
* Default value: 10
* Parameter: Step (System.Int32)
* Default value: 1
* Output: Range
Widget: Python Snippet
-----------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Runs a Python snippet. Inputs are available inside the code as variables "in1" .. "inN". Whatever you want to output must be assigned to the variable "out1" before the code terminates.
* Input: in (Inputs can be accessed as variables "in1" .. "inN" inside the code.)
* Parameter: Python Snippet Code (Inputs can be accessed as variables "in1" .. "inN" inside the code and the output can be accessed/assigned as variable "out1" inside the code.)
* Default value: # This is the Python Code Snippet where you can modify the data however is needed.
# Variables "in1" .. "inN" contain whatever you connected to the input ports.
# Whatever is assigned to the variable "out1" will be transferred to the output port.
out1 = in1
* Output: out (The output can be accessed/assigned as variable "out1" inside the code.)
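For example, a snippet that squares each element of a list input could read as follows. Here `in1` is pre-set only for illustration; inside the widget it is bound automatically to whatever is connected to the input port:

```python
# Hypothetical value standing in for the first input port:
in1 = [1, 2, 3, 4]

# The snippet body: whatever is assigned to out1 reaches the output port.
out1 = [x * x for x in in1]
```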
Widget: Split Object
---------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Automatically generated widget from function SplitObject_PYTHON in package latino. The original function signature: SplitObject_PYTHON.
* Input: object (System.Object)
* Parameter: Object Modifier (If you want to extract an object's attribute, use a leading dot.)
* Output: object
Category Noise Handling
=======================
Category Noise Filters
----------------------
Widget: Classification Filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/noise/static/noise/icons/widget/CF-filter-black.png
:width: 50
:height: 50
A widget which uses a classifier as a tool for detecting noisy instances in data.
* Input: Learner
* Input: Dataset
* Parameter: Timeout
* Default value: 300
* Parameter: Number of Folds for Cross-Validation
* Possible values:
* 2
* 3
* 4
* 5
* 6
* 7
* 8
* 9
* 10
* Default value: 10
* Output: Noise instances
* Example usage: `Outlier document detection `_
Widget: Matrix Factorization Filter
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/noise/static/noise/icons/widget/CF-filter-black.png
:width: 50
:height: 50
* Input: Dataset
* Parameter: Threshold
* Default value: 10
* Output: Noise instances
Widget: Saturation Filter
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/noise/static/noise/icons/widget/SF-filter_1.png
:width: 50
:height: 50
Widget implementing a saturation filter used to eliminate noisy training examples from labeled data.
Reference: http://www.researchgate.net/publication/228898399
* Input: Dataset
* Parameter: Type of Saturation Filtering
* Possible values:
* Normal
* Pre-pruned
* Default value: normal
* Output: Noise instances
Widget: HARF
-------------
.. image:: ../workflows/noise/static/noise/icons/widget/HARF_60-48-RF.png
:width: 50
:height: 50
High Agreement Random Forest
* Parameter: Agreement Level
* Possible values:
* 60
* 70
* 80
* 90
* Default value: 70
* Output: HARF Classifier
Widget: NoiseRank
------------------
.. image:: ../workflows/noise/static/noise/icons/widget/NoiseRank3.png
:width: 50
:height: 50
Widget implementing an ensemble-based noise ranking methodology for explicit noise and outlier identification.
Reference: http://dx.doi.org/10.1007/s10618-012-0299-1
* Input: Dataset
* Input: Noisy Instances
* Output: All Noise
* Output: Selected Instances
* Output: Selected Indices
* Example usage: `Outlier document detection `_
Category Performance Evaluation
===============================
Widget: Aggregate Detection Results
------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Aggregates the results of the detection of noisy instances in data.
* Input: Positive Indices
* Input: Detected Instances
* Output: Aggregated Detection Results
Widget: Classification statistics
----------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Calculates various classification statistics from true and predicted labels. Labels can be provided in two ways:
a) [y_true, y_predicted]
or for folds:
b) [[y_true_1, y_predicted_1], [y_true_2, y_predicted_2], ...]
* Input: True and predicted labels (List of true and predicted labels (see help for details))
* Output: Classification accuracy
* Output: Precision
* Output: Recall
* Output: F1 (F1 measure)
* Output: AUC
* Output: Confusion matrix
* Example usage: `COMTRADE demo `_
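A sketch of how accuracy could be computed for both input shapes (a) and (b) described above. The function name and shape-detection logic are illustrative, not the widget's actual code:

```python
def classification_accuracy(labels):
    """Accept either [y_true, y_predicted] or a list of such per-fold pairs."""
    if labels and isinstance(labels[0][0], (list, tuple)):
        pairs = labels        # form (b): one [y_true, y_predicted] pair per fold
    else:
        pairs = [labels]      # form (a): a single [y_true, y_predicted] pair
    correct = total = 0
    for y_true, y_pred in pairs:
        correct += sum(t == p for t, p in zip(y_true, y_pred))
        total += len(y_true)
    return correct / total
```

Precision, recall, F1, AUC, and the confusion matrix follow the same pattern, each reduced over the folds.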
Widget: Evaluate Detection Algorithms
--------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Noisy Instances
* Input: Detected Noise
* Parameter: Beta parameter for F-measure
* Default value: 1
* Output: Noise Detection Performance
Widget: Evaluate Repeated Detection
------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Algorithm Performances
* Parameter: F-measure Beta-parameter
* Default value: 1
* Output: Performance Results
Widget: Evaluation Results to 2d Table
---------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Table that can be used in workflows with nested loops. You can define the names on the x and y axes. You can also choose the evaluation metric that you want to show from a dropdown menu.
* Input: Evaluation Results
* Parameter: Evaluation metric (Choose the evaluation measurement you would like to show in the table.)
* Possible values:
* accuracy
* auc
* fscore
* precision
* recall
* Default value: accuracy
* Outputs: Popup window which shows widget's results
* Example usage: `POS tagger intrinsic evaluation - experiment 5 `_
Widget: Evaluation Results to Table
------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Evaluation Results
* Outputs: Popup window which shows widget's results
* Example usage: `POS tagging classification evaluation (copy) `_
Widget: Performance Chart
--------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Evaluation Results
* Outputs: Popup window which shows widget's results
* Example usage: `POS tagging classification evaluation (copy) `_
Widget: VIPER: Visual Performance Evaluation
---------------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
VIPER: performance evaluation in the Precision-Recall (PR) space. An interactive widget showing the PR plot, which can also be saved as an image or printed.
* Input: Algorithm Performance
* Parameter: eps-proximity evaluation parameter [%]
* Possible values:
* 1
* 2
* 3
* 4
* 5
* 6
* 7
* 8
* 9
* 10
* Do not use eps-proximity evaluation
* Default value: 0.05
* Outputs: Popup window which shows widget's results
* Example usage: `POS tagging classification evaluation (copy) `_
Widget: Extract Actual and Predicted features
----------------------------------------------
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Takes as input an ADC object with predicted features and an ADC object with actual features (gold standard). The output is a list containing a list of predicted features and a list containing actual features.
* Input: Annotated Document Corpus (Annotated Document Corpus (workflows.textflows.DocumentCorpus))
* Parameter: Predicted annotation (System.String)
* Default value: POS tag
* Parameter: Actual annotation (System.String)
* Default value: POS tag
* Parameter: Lowercase (Convert features to lowercase)
* Default value: False
* Output: Actual and Predicted Values (List of Actual and Predicted Values)
Category Visual performance evaluation (ViperCharts)
====================================================
Category Column charts
----------------------
Widget: Column chart
~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Standard graphical presentation of algorithm performance. Also referred to as a bar chart. Visualizes the values of one or more performance measures of the evaluated algorithms.
* Input: Performance results
* Outputs: Popup window which shows widget's results
Category Curve charts
---------------------
Widget: Lift curves
~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The Lift curve widget plots the true positive rate (also found in ROC and PR curves) against the predicted positive rate (the fraction of examples classified as positive). Each point represents the classifier performance for a given threshold or ranking cut-off point. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
Widget: ROC curves
~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
A widget which illustrates the trade-off between the true positive rate and the true negative rate of a classifier. Each point represents the classifier performance for a given threshold or ranking cut-off point. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
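Each point on a ROC curve corresponds to one score threshold. A minimal sketch of how such points can be derived from true labels and classifier scores (ties between equal scores are not merged here, and the function name is illustrative):

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per threshold, from labels and scores."""
    # Sweep the threshold from the highest score downward.
    pairs = sorted(zip(scores, y_true), reverse=True)
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above all scores: nothing predicted positive
    for score, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points
```

A perfect ranking passes through (0, 1), the top-left corner of the ROC space.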
Widget: ROC hull curves
~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The ROC Hull chart widget plots the upper convex hull of the ROC chart. Each point represents the classifier performance for a given threshold or ranking cut-off point. Points on the ROC Hull represent an optimal performance of the classifier for certain misclassification costs. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
Widget: PR curves
~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
A widget which provides the PR (precision-recall) curve. It presents the trade-off between the precision (the fraction of examples classified as positive that are truly positive) and the recall or true positive rate. Each point represents the classifier performance for a given threshold or ranking cut-off point. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
Widget: Cost curves
~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The Cost curve widget plots the normalized expected cost of the classifier as a function of the skew (fraction of positive examples multiplied by the cost of misclassifying a positive example) of the data on which it is deployed. Lines and points on the cost curve correspond to points and lines on the ROC curve of the classifier. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
Widget: Kendall curves
~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The Kendall chart widget presents the difference between the normalized expected cost of the classifier and the normalized expected cost of an ideal classifier. Costs for both classifiers are calculated using the rate-driven threshold choice method. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
Widget: Rate driven curves
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
The Rate-Driven chart widget plots the expected loss for the classifier as a function of the skew (fraction of positive examples multiplied by the cost of misclassifying a positive example) of the data on which it is deployed. The cost is calculated using the rate-driven threshold choice method. http://viper.ijs.si/types/curve/
* Input: Performance results
* Parameter: Chart title
* Outputs: Popup window which shows widget's results
Widget: Related results table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Performance results
* Outputs: Popup window which shows widget's results
Category Scatter charts
-----------------------
Widget: ROC space
~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Scatter chart - ROC space. Provides an easy and intuitive visual performance evaluation in terms of Recall, Precision and F-measure. By introducing the F-isolines into the precision-recall space the 2-dimensional graphic representation reveals information about an additional, third evaluation measure. http://viper.ijs.si/types/scatter/
* Input: Performance results
* Outputs: Popup window which shows widget's results
Widget: PR space
~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
Scatter chart - Precision-Recall space. Provides an easy and intuitive visual performance evaluation in terms of Recall, Precision and F-measure. By introducing the F-isolines into the precision-recall space the 2-dimensional graphic representation reveals information about an additional, third evaluation measure. http://viper.ijs.si/types/scatter/
* Input: Performance results
* Outputs: Popup window which shows widget's results
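An F-isoline in the precision-recall space is the set of (P, R) points with a fixed F-measure F = 2PR/(P + R). Solving that relation for precision gives a sketch like the following (the helper name is illustrative):

```python
def f_isoline_precision(f, recall):
    """Precision along the F-isoline F = 2PR/(P+R), solved for P."""
    # Only defined where 2*recall > f; below that, no precision attains F.
    return f * recall / (2 * recall - f)
```

Plotting this for several values of F is what lets the 2-dimensional PR chart convey the third measure at a glance.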
Category Utilities
------------------
Widget: Prepare performance curve data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. image:: ../workflows/static/widget-icons/question-mark.png
:width: 50
:height: 50
* Input: Actual and Predicted values
* Parameter: Prediction Type (Prediction scores or ranks)
* Possible values:
* Ranks
* Scores
* Default value: -score
* Output: Performance curve data