  1. #1

    Automating downloading/decompressing files from another site

    Hi,

    We have a database that requires updating once a day.

    To do this, we must manually log into another service with our account username and password, click on an access subscription link, and then select a zip file containing .csv lists from a form <select> drop-down. We then run the .csv files through a PHP script, which does the work of updating the database.

    Obviously I would like to automate the process, but as I am still a novice with PHP, I've had difficulty finding sample scripts that illustrate how this is done, even when browsing the manual.

    I understand the script would need to be scheduled with a cron job, but that is as far as I've gotten.

    Can anyone offer any advice?

    Thanks.

  2. #2
    Join Date
    Nov 2001
    Location
    Vancouver
    Posts
    2,422
    Basically, you have to act as a web client to the remote site. You may find it possible to use command-line tools such as curl or wget to do this; depending on the remote site and the authentication method it uses, you may have to do quite a bit of non-trivial coding.

    I would examine curl/wget first. It may not get you all the way there but will give you some insight. Check the man pages.

    Now, if you can't make that work out fully, you'll likely get all the way there with web testing tools. These are tools that pretend to be a "browser", allowing scripts to take actions on web sites and test for reactions. Automated testing is their principal purpose, but these suites work well for automating repetitive tasks too.

    http://twill.idyll.org/python-api.html

    The above link shows an interactive session - this sort of ability is invaluable while trying to figure out how you'll craft a solution. I would highly recommend downloading and trying twill (and some of the other web testing tools if twill does not fully meet your needs).

    http://webunit.sourceforge.net/ - look at this sample session (it is Python code but that doesn't matter... I think you'll get an appreciation for what is happening): http://webunit.sourceforge.net/session_example.py

    Don't let the fact that it is Python stop you from looking at this. Consider this just another application with a specific type of control file layout.
    “Even those who arrange and design shrubberies are under
    considerable economic stress at this period in history.”

  3. #3
    Join Date
    Feb 2008
    Location
    Houston, Texas, USA
    Posts
    3,262
    I did this a long time ago with Lynx, specifically with the -cmd_log and -cmd_script flags. It should still work, although Lynx has gone through some drastic source code changes since then. First, run the command-line browser with logging enabled. Let's say I want to record the actions from browsing https://www.unixy.net:

    lynx -cmd_log=/root/auto_login_download.chat https://www.unixy.net
    Perform the actions you would normally take. Lynx will record all of them as macros inside the file auto_login_download.chat. Once done, exit from Lynx.

    Schedule a cron job to run at 6am or whichever time is convenient:

    0 6 * * * /root/csv_download.sh
    The csv_download.sh script contains the following:

    #!/bin/bash

    lynx -cmd_script=/root/auto_login_download.chat https://www.unixy.net
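    The recorded session only covers the download. The remaining steps from the original question, unpacking the archive and passing each .csv file to the PHP updater, could be appended to the same script. What follows is a minimal sketch, assuming unzip and the php command-line binary are available on the host. The paths /root/lists.zip and /root/update_db.php are hypothetical placeholders; use wherever the recorded session saves the archive and wherever the updater script actually lives.

```shell
#!/bin/bash
# Run a command once for every .csv file in a directory.
# Usage: process_csvs DIR COMMAND [ARGS...]
process_csvs() {
    local dir="$1"; shift
    local csv
    for csv in "$dir"/*.csv; do
        [ -e "$csv" ] || continue      # the glob matched nothing
        "$@" "$csv" || return 1        # stop at the first failed update
    done
}

# Unpack ZIP into a fresh temporary directory, then process its .csv files.
# Usage: unpack_and_update ZIP COMMAND [ARGS...]
unpack_and_update() {
    local zip="$1"; shift
    local workdir rc
    workdir="$(mktemp -d)" || return 1
    unzip -q -o "$zip" -d "$workdir" && process_csvs "$workdir" "$@"
    rc=$?
    rm -rf "$workdir"                  # clean up the extracted files
    return "$rc"
}

# Example call (placeholder paths, commented out on purpose):
# unpack_and_update /root/lists.zip php /root/update_db.php
```

    Extracting into a temporary directory keeps yesterday's .csv files from being processed again if today's archive happens to contain fewer files.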
    Best
    UNIXy - Fully Managed Servers and Clusters - Established in 2006
    Server Management - Unlimited Servers. Unlimited Requests. One Plan!
    cPanel Varnish Plugin -- Seamless SSL Caching (Let's Encrypt, AutoSSL, etc)
    Slow Site or Server? Unable to handle traffic? Same day performance fix: joe@unixy

  4. #4
    Ok, I learned something new today - never knew Lynx had the playback functionality.

  5. #5
    Thanks for replies.

    I was afraid it would get a little messy.

    I'll have a further look and see what I can come up with.

  6. #6
    UNIXy, I neglected to mention that we're on a shared hosting package that does not offer shell access (only cPanel). Would I still be able to do what you described under such circumstances?

  7. #7
    That is a rather significant detail left out.

    Personally, for software development even of the scope you are faced with, I would not remain at a host that offers no shell access. It just makes getting on with the job too hard, and there are many good hosts out there that will give you all the tools you need to do this. You will almost certainly be frustrated at every turn if all you have is cPanel access.

    If you have access to a local Linux/Unix machine or another account you can record your command script via Lynx as UNIXy has suggested there, then upload the files and execute them via a cron job.

    However, Lynx is essentially a command-line, non-GUI web browser, one which most users run from within a login shell. If your host provides no shell access then it is less likely, though still possible, that they will have a copy of Lynx installed for you to call from a script.


    I should point out that if there is any logic in your actions (i.e. you are not selecting the same file over and over), simply recording a script, for Lynx or any other solution, is not going to be satisfactory.
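    To illustrate the point about logic: if the remote site names each day's archive after its date, a recorded session would keep requesting whichever file was current on the day of recording. Here is a minimal sketch of computing the name at run time instead, assuming a hypothetical naming pattern of lists-YYYYMMDD.zip (the real site's pattern is not known from this thread):

```shell
#!/bin/bash
# Build the archive name for a given date stamp (YYYYMMDD).
# The "lists-<date>.zip" pattern is an assumption for illustration only.
zip_name_for() {
    local stamp="${1:-$(date +%Y%m%d)}"   # default to today's date
    printf 'lists-%s.zip\n' "$stamp"
}

# With no argument, this prints the name of today's archive.
zip_name_for
```

    The result would then be passed as the value of the form's <select> field rather than being hard-coded in a recording.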

    You should also explore wget and curl. Both are command-line tools that are routinely used in scripts to download files, and both can pass username and password info to remote sites using http-basic, http-digest and in some cases form-based authentication. Really, your task might be very simple if the remote application supports http-basic, http-digest, or NTLM authentication.

    curl can post to forms, pass along specific header info, and handle authentication; it really is a very useful tool for web app interaction.

    Code:
    # -u (lowercase) sends credentials to the remote server; uppercase -U is for proxy credentials
    curl -X POST -d '{"somefield":"Somevalue"}' \
         -u username:password \
         http://theotherapp.com/want-it/
    You may need to pass some header fields as well; curl also has some more convenient ways of passing form data than I've shown here.

    curl and/or wget are somewhat more likely to be available on a host than Lynx.
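    For a form-based login, the usual pattern with curl is to post the credentials once, keep the session cookie in a cookie jar, and send that cookie along with the download request. This is a rough sketch only: the URLs and the form field names (username, password, file) are hypothetical, and the real ones would have to be read from the remote site's HTML forms.

```shell
#!/bin/bash
# All URLs and form field names below are placeholders, not taken from a real site.
LOGIN_URL="https://example.com/login"
DOWNLOAD_URL="https://example.com/subscription/download"

# Usage: fetch_zip USER PASS REMOTE_FILE OUTPUT_PATH
fetch_zip() {
    local user="$1" pass="$2" remote_file="$3" out="$4"
    local jar rc
    jar="$(mktemp)" || return 1

    # Step 1: submit the login form and store the session cookie in the jar.
    curl --fail --silent --show-error --location \
         --cookie-jar "$jar" \
         --data-urlencode "username=$user" \
         --data-urlencode "password=$pass" \
         --output /dev/null "$LOGIN_URL" || { rm -f "$jar"; return 1; }

    # Step 2: submit the <select> form with the cookie and save the response.
    curl --fail --silent --show-error --location \
         --cookie "$jar" \
         --data-urlencode "file=$remote_file" \
         --output "$out" "$DOWNLOAD_URL"
    rc=$?

    rm -f "$jar"   # the jar holds a live session, so do not leave it behind
    return "$rc"
}

# Example call (commented out so that sourcing this file makes no requests):
# fetch_zip "myuser" "mypassword" "lists.zip" "/tmp/lists.zip"
```

    The --fail option makes curl exit with a non-zero status on HTTP errors, so a cron job can detect a failed login or download instead of saving an error page as the zip file.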
    Last edited by mwatkins; 06-13-2009 at 10:40 PM.

  8. #8
    Join Date
    Jun 2008
    Location
    Mumbai, India
    Posts
    126
    Why don't you run a cron job instead? Would that take care of the update, or do you need to run the script manually?

  9. #9
    Quote: Originally posted by phaedarus
    UNIXy, I neglected to mention that we're on a shared hosting package that does not offer shell access (only CPANEL). Would I still be able to do what you described under such circumstances?
    Have you asked your provider for jailshell access? Some shared hosts will enable shell access for a one-time fee. If they still refuse to enable the shell, run the "recording" part of the command elsewhere and simply preserve the chat file (there's a Windows port of Lynx, so you can probably do the recording on your workstation). Then upload the chat file (e.g. auto_login_download.chat) into your home directory on the existing cPanel account. From cPanel, set up the cron job. -cmd_script will now point to /home/cpanel_account/auto_login_download.chat

    If your provider is that uncooperative, it might be time to re-evaluate the contract. A web host should be an enabler, especially when you intend to do no harm.

    Regards
