Semi-blind machine learning - ensemble learning (SML-EL)
-----------------------------------------------------------

The program **vsml** implements a new approach for predictive modeling based on fMRI.
The main idea is to supplement fMRI data with readily available non-imaging information so
that reliable predictive modeling becomes feasible even for smaller sample sizes.

The difference between *vsml* and *vsm_statistics* is that *vsm_statistics* evaluates a number
of randomly sampled training and test set pairs, so that statistics on the accuracy of the predictions
can be computed. The program *vsml*, on the other hand, applies SML to a single, specific training and
test set pair.


The input to **vsml** consists of collections of connectomes, together with
a text file containing the target variable of interest (e.g. intelligence)
and additional text files containing non-imaging supplementary information.

The program **vsml** expects as input a list of connectomes for training (parameter '-train'),
and a list of connectomes for testing (parameter '-test').
The connectomes must be in vista-format. The program **vreadconnectome** can be used to
convert those inputs into the required format.

Furthermore, the program **vsml** requires as input a text-file containing the target variable of
interest (e.g. IQ, parameter '-ytrain').
This file is used for training. It must contain one number per subject of the training set,
so that the number of rows in this file equals the number of training connectomes.

Optionally, a text-file containing the target variable of
interest for the test set can also be supplied (parameter '-ytest'). If available, this information can be used
to assess the accuracy of the prediction.

The order in which the connectomes are listed as input into the '-train' and '-test' parameters 
must coincide with the order of the rows in the respective text-files.
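
Before running **vsml**, it can help to confirm that the number of connectomes matches the number of rows
in the corresponding text-file. The following is a minimal sketch of such a check. The helper functions
``count_rows`` and ``check_rows`` are hypothetical and not part of **vsml**, and the sketch assumes
that blank lines in the text-file do not count as rows.

.. code-block:: shell

   # Hypothetical helper, not part of vsml:
   # count the non-blank rows of a text file.
   count_rows() {
       grep -c '[^[:space:]]' "$1"
   }

   # Hypothetical helper, not part of vsml:
   # compare the number of connectomes with the number of rows in a text file.
   # Usage: check_rows TEXTFILE FILE1 FILE2 ...
   check_rows() {
       textfile=$1
       shift
       nfiles=$#
       nrows=$(count_rows "$textfile")
       if [ "$nfiles" -ne "$nrows" ]; then
           echo "mismatch: $nfiles connectomes, $nrows rows in $textfile"
           return 1
       fi
       echo "ok: $nfiles connectomes, $nrows rows in $textfile"
   }

A possible call, using the file names of the example in this section::

   check_rows IQ_train.txt train_*.v
   ls train_*.v | paste - IQ_train.txt

The second command prints each file name next to its row, which allows a visual check of the order.
Note that the shell expands ``train_*.v`` in sorted order, so that ``train_10.v`` typically precedes ``train_2.v``.
The rows of the text-file have to follow that same order.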

Likewise, **vsml** requires as input text-files containing the supplementary information.
There should be one file for the training set (parameter '-xtrain') and one file for the test set (parameter '-xtest').

The parameters '-dimX' and '-npls' are used to control the partial least squares regression (PLS),
where '-dimX' determines the number of features (edges of the connectome) that are input into the PLS,
while '-npls' determines the number of latent components.
The parameter '-nensembles' determines the number of ensembles in the ensemble learning process.


The output is a text-file containing the predicted values of the target variable for the given test set.
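
If the target variable of the test set is known, the predictions can be compared against it after the run.
The following sketch computes the Pearson correlation and the mean absolute error between predicted and
observed values. The helper ``prediction_accuracy`` is hypothetical and not part of **vsml**. It assumes
that both files contain exactly one numeric value per line, in the same subject order. The exact layout
of the output file is not specified here, so this assumption should be checked first.

.. code-block:: shell

   # Hypothetical helper, not part of vsml.
   # Usage: prediction_accuracy PREDICTED.txt OBSERVED.txt
   # Assumes one numeric value per line in both files, same subject order.
   prediction_accuracy() {
       paste "$1" "$2" | awk '
           NF == 2 {
               n++
               sp += $1; so += $2                  # sums of predicted / observed
               spp += $1 * $1; soo += $2 * $2      # sums of squares
               spo += $1 * $2                      # sum of cross products
               d = $1 - $2; if (d < 0) d = -d
               sae += d                            # sum of absolute errors
           }
           END {
               if (n < 2) { print "need at least two value pairs"; exit 1 }
               cov = spo - sp * so / n
               vp = spp - sp * sp / n
               vo = soo - so * so / n
               if (vp <= 0 || vo <= 0) { print "zero variance, correlation undefined"; exit 1 }
               printf "n=%d r=%.3f mae=%.3f\n", n, cov / sqrt(vp * vo), sae / n
           }'
   }

A possible call, using the file names of the example in this section::

   prediction_accuracy results.txt IQ_test.txt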


Example:
```````````

 :: 


   vsml -train train_*.v -test test_*.v -ytrain IQ_train.txt -ytest IQ_test.txt \
    -xtrain Edu_train.txt -xtest Edu_test.txt -dimX 800 -npls 10 -nensembles 1000 -seed 12345  \
    -out results.txt


Parameters of 'vsml':
`````````````````````````

   -help       Prints usage information.
   -train      Input fMRI files, training set (Required).
   -test       Input fMRI files, test set (Optional).
   -out        Output textfile.
   -ytrain     Textfile containing the target variable of the training set.
   -ytest      Textfile containing the target variable of the test set.
   -xtrain     Textfile containing the supplementary info of the training set.
   -xtest      Textfile containing the supplementary info of the test set.
   -dimX       Number of features per ensemble.
   -npls       Number of components for PLS.
   -nensembles  Number of ensembles.
   -seed       Seed for random number generator.


.. index:: vsml


Reference:
^^^^^^^^^^^^^^^^^^^^^^

 Lohmann, G. et al. (2023). Improving the reliability of fMRI-based predictions of intelligence via semi-blind machine learning. bioRxiv. https://doi.org/10.1101/2023.11.03.565485