
Windows and Linux 3 Script Process

To run the full docking workflow, from pulling ligands from PubChem to building a table of top DSX scores, use the following three scripts:

Script 1: PUGREST to Vina using Windows PowerShell

Important Note: Before running a downloaded PowerShell script, you must change the execution policy for Windows PowerShell. Once this is changed on a computer, the setting persists until it is changed again. You can check the current execution policy at any time by typing

   Get-ExecutionPolicy 

in a Windows PowerShell command line. To change the policy:

  1. Open Windows PowerShell as an Administrator (right-click and choose “Run as Administrator”)
  2. Type the command
    Set-ExecutionPolicy -ExecutionPolicy Unrestricted 
  3. Type “Y” to confirm

This allows any script to run, including unsigned ones, but PowerShell warns you before running a script that came from outside your computer's local intranet zone (for example, one downloaded from the internet). Make sure to run only trusted scripts.

First, on Windows, open Windows PowerShell (open a new window if you just changed permissions). By default, downloaded files are saved under the folder of the user who is currently logged in. Change to your desired folder using “cd”.

Program Prerequisites:

  1. The protein pdbqt file must be in the designated folder (e.g. 5BYY.pdbqt)
  2. A text file with the NSC numbers of the desired ligands, one number per line, must be in the designated folder (e.g. divset3_nsc_ids.zip)
  3. The program file itself must also be in the designated folder
  4. You must know the grid box information for the protein file, acquired manually using AutoDock Vina
  5. If you are running a protein file with flexible residues, also include the pdbqt file with these flexible residues in the designated folder

Program File: pugrest_to_vina.zip

To run the file:

  1. Type
    .\PUGREST_to_Vina.ps1  

    (.ps1 is the PowerShell script extension)

  2. You will be prompted for the name of the protein file and the grid box information. Type this information in, then check that it is correct.
  3. A configuration file is then created. If a file named PROTEIN_config.txt already exists, there will be an error. If there is no error, type “Y” when prompted.
  4. Type in the name of the text file with the NSC numbers. Leave off the .txt extension (it is added automatically).
  5. The first two ligands will be run through AutoDock Vina. Preferably, these two ligands should already have been docked manually so that the output can be confirmed. Open the Results files and check that the results are what you expect. If not, type “N” to exit the program. If everything looks fine, type “Y” and the program will run through all the numbers in the text file.
  6. At the end, the program reports the number of molecules successfully analyzed. It also gives two lists of unsuccessful numbers: those that could not be used as NSC identifiers to pull an SDF file from PubChem, and those that could not be analyzed through Vina.
  7. A success.txt file is also created at the end of the program. This file contains the NSC numbers of all the successful ligands (meaning that files were created for them), and can be used in the next stage to skip the missing files.
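
For reference, the configuration file from step 3 uses AutoDock Vina's plain "key = value" format. The sketch below is not taken from PUGREST_to_Vina.ps1. It is a small shell function (written in shell to match the Linux commands used later on this page) showing what such a file might contain and how an existing file can be refused. The function name write_vina_config, its argument order, and the output name <protein>_config.txt are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: write an AutoDock Vina configuration file for one protein.
# The function name and argument order are hypothetical.
write_vina_config() {
    local protein="$1"
    local cx="$2" cy="$3" cz="$4"    # grid box center (x, y, z)
    local sx="$5" sy="$6" sz="$7"    # grid box size (x, y, z)
    local config="${protein}_config.txt"

    # Refuse to overwrite an existing configuration file.
    if [ -e "$config" ]; then
        echo "error: $config already exists" >&2
        return 1
    fi

    # Vina reads one "key = value" pair per line.
    cat > "$config" <<EOF
receptor = ${protein}.pdbqt
center_x = ${cx}
center_y = ${cy}
center_z = ${cz}
size_x = ${sx}
size_y = ${sy}
size_z = ${sz}
EOF
}
```

For example, write_vina_config 5BYY 10 20 30 25 25 25 would create 5BYY_config.txt. The numbers here are placeholders, not the real grid box for 5BYY.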

Timing note: In the first run of 1592 compounds, the script took an average of ~5 seconds per compound (including the compounds not found using PUGREST and those that failed in Vina), or roughly 2.2 hours in total.

Script 2: Getting DSX results using Linux Terminal

First, open the Linux Terminal. We run Linux using Ubuntu on an Oracle VM VirtualBox virtual machine.

Program Prerequisites:

  1. All the NSC_###_Results.pdbqt files from Script 1 must be moved to the dsx folder (a folder that can be accessed from the virtual machine and that also contains the pdb_pot folder and lib files for dsx)
  2. The protein pdbqt file must also be in the dsx folder
  3. The file success.txt must also be copied to the dsx folder. This file must then be converted from Windows to Linux line endings using the command
     dos2unix success.txt 
  4. Make the bash file executable by changing its permissions (this must be done on a file-by-file basis) using the command
     chmod u+x ./getDSXscores.bash 
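
Before running the script, it can help to confirm that every ligand in success.txt has a Results file in the dsx folder. The following is a minimal sketch, not part of getDSXscores.bash. The function names are hypothetical, and it assumes the ### in NSC_###_Results.pdbqt is the NSC number exactly as listed in success.txt. It strips carriage returns itself, so it works whether or not dos2unix has been run.

```shell
#!/bin/sh
# Sketch only: list NSC numbers from success.txt that have no Results file.

# Remove Windows carriage returns (similar in effect to dos2unix),
# writing the cleaned text to standard output.
strip_cr() {
    tr -d '\r' < "$1"
}

# Print each NSC number whose NSC_<number>_Results.pdbqt file is absent
# from the current folder.
missing_results() {
    local list="$1"
    local id
    strip_cr "$list" | while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue                       # skip blank lines
        [ -e "NSC_${id}_Results.pdbqt" ] || echo "$id" # report missing file
    done
}
```

Run missing_results success.txt from inside the dsx folder. No output means no files are missing.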

Program File: getDSXscores.zip

To run the file:

  1. Type
     ./getDSXscores.bash 
  2. Enter the name of the protein as it is saved as a pdbqt file, without the extension (case sensitive). For example, if the file is 5BYY.pdbqt, enter “5BYY”

The results will automatically go into the dsx folder as text files.

Script 3: Pulling results from DSX text files and creating a spreadsheet using Windows PowerShell

Now, go back to Windows PowerShell to use this last script.

Program Prerequisites:

  1. Move all the DSX result text files to a folder of your choosing
  2. The Windows success.txt file (the original version) must also be in this folder
  3. The program file itself must also be in this folder
  4. The PowerShell working directory must be set to this folder (change it using cd)

Program File: pullTop3Scores.zip

To run the program:

  1. Type
     .\pullTop3Scores.ps1 
  2. Type in a name of your choosing for the spreadsheet that the program will create. Note that this program will overwrite any other file of the same name in the current directory.

The spreadsheet gives each ligand's NSC number, its top 3 modes, and the corresponding top 3 DSX scores. To report more results, raise the maximum value of i in the program, up to 1 + the number of modes used in Vina (default 9 modes, so a maximum of 10). It is currently set to 4, which gives the top 3 results.
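
The relationship between the maximum value of i and the number of results can be sketched as follows. This is not the code in pullTop3Scores.ps1; it is a shell illustration that assumes the script's loop starts at i = 1 and stops before i reaches the maximum. The function name mode_indices is hypothetical.

```shell
#!/bin/sh
# Sketch only: print the mode numbers visited by a loop that starts at 1
# and stops before reaching the given maximum value of i.
mode_indices() {
    local max_i="$1"
    local i=1
    local out=""
    while [ "$i" -lt "$max_i" ]; do
        out="${out:+$out }$i"   # append i, space-separated
        i=$((i + 1))
    done
    echo "$out"
}

mode_indices 4    # prints: 1 2 3
```

Under the same assumption, a maximum of 10 would cover all 9 modes of a default Vina run.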

windows_linux_instructions.txt · Last modified: 2019/03/30 21:13 by smparker