
05-Optimization: Enhancements and fixes #57

@manojneuro

The following items can be improved in this notebook:

  • The early sections "Recap" and "Dataset" are almost identical, so one of them is redundant

  • Exercise 1
    Presumably the expectation is to separate the train and test sets both for the classifier and for the voxel selection. It might be worth emphasizing that using all the data for voxel selection is a common but subtle error: the test labels leak into the selected features, so the cross-validated accuracy is biased upward. There are probably quite a few examples in the literature that got past less technical reviewers.
    In this example, I consistently get slightly below-chance performance. I believe this is driven by the cross-validation; see:
    Classification based hypothesis testing in neuroscience: Below‐chance level classification rates and overlooked statistical properties of linear parametric classifiers. Human Brain Mapping, 2016.
    Another subtle example of bias is given by Watts et al. 😊: Potholes and Molehills: Bias in the Diagnostic Performance of Diffusion-Tensor Imaging in Concussion. Radiology, 2014.

  • In 3.1 Grid search
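    Strictly, the number of combinations does not grow exponentially with the granularity of the grid search. With n steps per parameter and k parameters there are n^k combinations, which is polynomial in n; the growth is exponential only in the number of parameters k.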

  • 3.2 Regularization Example: L2 vs L1
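    The L1 penalty now requires an explicit solver that supports it, e.g. solver='saga', in the LogisticRegression call. This is probably because the default solver in scikit-learn changed (from 'liblinear' to 'lbfgs' in version 0.22), and 'lbfgs' does not support L1.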

  • 4. Build a Pipeline
    As with 3.1, many parameter settings seem to give perfect accuracy. Maybe classifying by blocks is too easy. The number of blocks is also relatively low, so accuracy can only change in big steps.

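  • c_steps = [10e-1, 10e0, 10e1, 10e2] is confusing notation for the exponents: 10e-1 is 1.0, not 0.1, so the list is actually [1, 10, 100, 1000]. Writing it as [1e0, 1e1, 1e2, 1e3] would be clearer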
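
To illustrate the Exercise 1 point about voxel selection, here is a minimal sketch on pure-noise data, so any accuracy above chance is bias. It is not the notebook's code: the data shapes, the choice of `SelectKBest` with `f_classif`, and the linear `SVC` are all assumptions made for illustration. The first estimate selects features using all samples before cross-validating; the second refits the selection inside each training fold via a pipeline.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical sizes: few samples, many "voxels", no real signal at all.
rng = np.random.default_rng(0)
n_samples, n_voxels, k = 40, 5000, 50
X = rng.standard_normal((n_samples, n_voxels))
y = np.repeat([0, 1], n_samples // 2)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Biased: the selector sees every label, including those of the test folds.
X_selected = SelectKBest(f_classif, k=k).fit_transform(X, y)
biased = cross_val_score(SVC(kernel="linear"), X_selected, y, cv=cv).mean()

# Unbiased: the selector is refit on the training portion of each fold only.
pipe = make_pipeline(SelectKBest(f_classif, k=k), SVC(kernel="linear"))
unbiased = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"selection on all data:    {biased:.2f}")
print(f"selection inside CV fold: {unbiased:.2f}")
```

Because the labels carry no information about the data here, the gap between the two numbers is entirely due to the leak from selecting on all samples.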
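
On the 3.1 grid search point, the counting can be checked directly. This sketch uses scikit-learn's `ParameterGrid` with placeholder parameter names (`param_0`, `param_1`, ... are hypothetical, not the notebook's parameters) and compares how the grid size changes with the number of steps versus the number of parameters.

```python
from sklearn.model_selection import ParameterGrid


def grid_size(steps, n_params):
    """Count the combinations in a grid with `steps` values on each of `n_params` axes."""
    grid = {f"param_{i}": list(range(steps)) for i in range(n_params)}
    return len(ParameterGrid(grid))


# Finer grid, fixed number of parameters: polynomial growth (steps ** 2).
sizes_by_steps = {s: grid_size(s, n_params=2) for s in (5, 10, 20)}

# More parameters, fixed granularity: exponential growth (10 ** n_params).
sizes_by_params = {p: grid_size(10, n_params=p) for p in (1, 2, 3)}

print(sizes_by_steps)   # {5: 25, 10: 100, 20: 400}
print(sizes_by_params)  # {1: 10, 2: 100, 3: 1000}
```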
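
For 3.2, a possible form of the fixed call is sketched below. The synthetic dataset, `C=0.1`, and `max_iter=5000` are assumptions chosen so the example runs quickly and converges; only `penalty="l1"` with `solver="saga"` reflects the change described above. Counting exactly-zero coefficients shows the sparsity that L1 is meant to demonstrate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook data (assumption, not the real dataset).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=5, random_state=0
)
X = StandardScaler().fit_transform(X)  # saga converges faster on scaled features

# L2 works with the default solver.
l2 = LogisticRegression(penalty="l2", C=0.1, max_iter=5000).fit(X, y)

# L1 needs a solver that supports it; 'saga' is one such solver.
l1 = LogisticRegression(penalty="l1", C=0.1, solver="saga", max_iter=5000).fit(X, y)

n_zero_l2 = int(np.sum(l2.coef_ == 0))
n_zero_l1 = int(np.sum(l1.coef_ == 0))
print(f"zero coefficients: L2={n_zero_l2}, L1={n_zero_l1}")
```

Passing `solver='liblinear'` should also work for L1 in a binary problem like this one, if matching the older default behavior is preferred.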
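
A quick check of the `c_steps` notation, with two alternative spellings that I think read more clearly. The names `c_steps_explicit` and `c_steps_logspace` are only for this illustration.

```python
import numpy as np

# As written in the notebook: the mantissa is 10, so each value is 10 * 10**exponent.
c_steps = [10e-1, 10e0, 10e1, 10e2]

# Same values with the exponent stated directly.
c_steps_explicit = [1e0, 1e1, 1e2, 1e3]

# Same values generated from the exponents 0..3.
c_steps_logspace = np.logspace(0, 3, num=4)

print(c_steps)  # [1.0, 10.0, 100.0, 1000.0]
print(c_steps == c_steps_explicit)
print(np.allclose(c_steps, c_steps_logspace))
```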
