Posted in Data Imports
11:48 pm, February 22, 2022
 

Importing CSS Reference

I thought it would be kind of cool to have the CSS reference as some kind of database that I could refer to.

This is how to import it somewhere else.

Import CSS Reference Function

This is the initial function, which I had to modify later because it was crashing due to too many HTTP requests.

PHP

/* Import CSS Reference */
  public function import_css_reference (
    $loop_max = 10
  ) {
      /* This will be run on a loaded import item, so don't need to pass variables to the function */
      global $db;
      global $functions;
      $out = "";
      $for_counter = 0;
      $main_loop_tag = "#sect2 li";

      $db_table_name = $this->db->escapeString($this->db_table_name);
      $loop_max = $this->db->escapeString($loop_max);

      require_once("lib/simple_html_dom.php");
      $html = file_get_html($this->import_url);

        foreach($html->find($main_loop_tag) as $item) {

            $for_counter++;
            // note: continue only skips the one item where the counter equals $loop_max,
            // it does not stop the loop; break would be needed for a hard limit
            if($for_counter == $loop_max) {
              continue;
            }

            $css_link = $item->find('a',0)->href;
            $css_title = $item->find('a',0)->plaintext;

            $out .= "\$css_link:$css_link<br />";
            $out .= "\$css_title:$css_title<br />";
            // this is all we need for stage one, then open the page link and process.

            $html_source = file_get_html($css_link);

            $reply_count = 0;

            $out .= "<hr />";

            // returning inside the loop for now, so only the first item is output
            return $out;

        }
  }

This is returning the first item here, so it's working as intended so far.

Find each li inside the element with the id sect2 (the #sect2 li selector) and use it as the loop item.

Then for each of them grab the link and the title.

It's finding the first item here and getting its link and title.
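
A minimal sketch of this stage, using PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom so that it runs without the library. The sample markup and the extract_index_links name are made up for illustration, and the XPath expression is only meant as a rough equivalent of the #sect2 li selector. Unlike the function above, it uses break, so the loop really stops once $loop_max items have been collected.

```php
<?php
// Sample markup standing in for the reference index page (made up for illustration).
$sample_html = <<<HTML
<div id="sect2">
  <ul>
    <li><a href="/docs/css/color"> color </a></li>
    <li><a href="/docs/css/margin">margin</a></li>
    <li><a href="/docs/css/padding">padding</a></li>
  </ul>
</div>
HTML;

// Hypothetical helper: collect link + title for each li under #sect2.
function extract_index_links(string $html, int $loop_max = 10): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $items = [];
    foreach ($xpath->query('//*[@id="sect2"]//li') as $li) {
        if (count($items) >= $loop_max) {
            break; // hard stop at the limit
        }
        $anchor = $xpath->query('.//a', $li)->item(0);
        if ($anchor === null) {
            continue; // list item without a link, nothing to import
        }
        $items[] = [
            'link' => $anchor->getAttribute('href'),
            'title' => trim($anchor->textContent),
        ];
    }
    return $items;
}

$links = extract_index_links($sample_html, 2);
print_r($links);
```

With a limit of 2, only the first two list items are returned, and the title is trimmed the same way the later versions of the import do it.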

 

Prepend the Domain Link

Usually the links will not include the full domain, so it needs to be added manually in front of the path to get the full link.

PHP

$x_element_link = "https://the-domain.org".$x_element_link;
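
A slightly more defensive version of the same idea, as a sketch: only add the domain when the link does not already carry a scheme, and avoid a doubled or missing slash. The absolute_link name is hypothetical, and the base URL is the placeholder domain used above.

```php
<?php
// Hypothetical helper: turn a site-relative href into a full URL.
function absolute_link(string $link, string $base = "https://the-domain.org"): string
{
    // Links that already have a scheme (http, https, ...) are left alone.
    if (is_string(parse_url($link, PHP_URL_SCHEME))) {
        return $link;
    }
    // Join base and path with exactly one slash between them.
    return rtrim($base, "/") . "/" . ltrim($link, "/");
}

echo absolute_link("/docs/css/color") . "\n";
echo absolute_link("docs/css/margin") . "\n";
echo absolute_link("https://example.com/already/full") . "\n";
```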

 

Grab the Content

This part is causing a timeout on the server. Processing too many HTML requests in one loop can crash the server, so this part needs to move to a separate function and request.

PHP

$html_source = file_get_html($x_element_link);
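
One thing that might soften this, as a sketch only: give each remote request its own timeout so a single slow page cannot hold up the whole run. This uses plain file_get_contents with an HTTP stream context; build_fetch_context and fetch_page are hypothetical names, and the 5 second value is an arbitrary assumption. It does not reduce the number of requests, so splitting the import is still needed.

```php
<?php
// Build a stream context that gives up on a slow page after $timeout seconds.
function build_fetch_context(float $timeout = 5.0)
{
    return stream_context_create([
        'http' => [
            'timeout' => $timeout, // read timeout in seconds
        ],
    ]);
}

// Fetch one page; returns null when the request fails or times out.
function fetch_page(string $url, float $timeout = 5.0): ?string
{
    $body = @file_get_contents($url, false, build_fetch_context($timeout));
    return $body === false ? null : $body;
}

// Show the option that was set, without making a network request.
$options = stream_context_get_options(build_fetch_context(5.0));
echo "timeout: " . $options['http']['timeout'] . " seconds\n";
```

The returned string could then be handed to the HTML parser, and a null result is the signal to skip the item instead of calling methods on a failed download.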

 

Importing Problem: Server 504 Error Timeout

The problem is that it's only importing 11 items, so I will need to add each link to a temp table first and then run the details as a separate import for each item.

Split the import, so the 1st part just adds the titles and links, then another import can go through each of them and add the missing details from the second part of the function.

Increased the PHP memory limit from 128 MB to 256 MB, but the import is still timing out.

/etc/php/7.4/fpm$ sudo nano php.ini

# find memory_limit and change it to 256M
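
A quick way to confirm which limits PHP is actually running with, as a small sketch. Two things to keep in mind: a php.ini change for FPM only takes effect once the PHP-FPM service has been restarted, and on a Debian-style layout like the path above the command line usually reads a separate php.ini, so this is best run through the web server. Also, a 504 usually means the web server gave up waiting for PHP, which is a time problem rather than a memory one.

```php
<?php
// Print the limits PHP is currently running with.
$limits = [
    'memory_limit' => ini_get('memory_limit'),
    'max_execution_time' => ini_get('max_execution_time'),
];

foreach ($limits as $name => $value) {
    echo $name . " = " . $value . "\n";
}
```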

Import Full

Currently crashing the server, causing a 504 Gateway Time-out error.

PHP

/* Import CSS Reference */
  /* this import crashes after 10 items - so need to split into smaller import chunks */

  public function import_css_reference (
    $loop_max = 10
  ) {
      /* This will be run on a loaded import item, so don't need to pass variables to the function */
      global $db;
      global $functions;
      $out = "";
      $for_counter = 0;
      $main_loop_tag = "#sect2 li";

      $db_table_name = $this->db->escapeString($this->db_table_name);
      $loop_max = $this->db->escapeString($loop_max);

      require_once("lib/simple_html_dom.php");
      $html = file_get_html($this->import_url);

        foreach($html->find($main_loop_tag) as $item) {

            $for_counter++;
            // note: continue only skips the one item where the counter equals $loop_max; the loop carries on
            if($for_counter == $loop_max) {
              continue;
            }

            $x_element_link = $item->find('a',0)->href;
            $x_element_title = $item->find('a',0)->plaintext;
            $x_element_title = trim($x_element_title);

            $out .= "\$x_element_link:$x_element_link<br />";
            $out .= "\$x_element_title:$x_element_title<br />";

            $x_element_link = "https://the-domain.org".$x_element_link;
            $html_source = file_get_html($x_element_link);
            $reply_count = 0;
            $out .= "<hr />";

            foreach($html_source->find(".main-content") as $main_content) {

              $x_title = $main_content->find("h1",0)->plaintext;
              $x_title = trim($x_title);
              $out .= "\$x_title:$x_title<br />";

              $x_summary = $main_content->find("p",0)->innertext;
              $x_summary = trim($x_summary);
              $out .= "\$x_summary:$x_summary<br />";

              $x_summary_2 = $main_content->find("p",1)->innertext;
              $x_summary_2 = trim($x_summary_2);
              $out .= "\$x_summary_2:$x_summary_2<br />";

              $x_md5 = md5($x_element_title);
              $out .= "\$x_md5:$x_md5<br />";

              $x_category = "CSS";
              $out .= "\$x_category:$x_category<br />";

              $x_additional = $main_content->innertext;
              $out .= "\$x_additional:$x_additional<br />";


              // start the class
              $linked_class = new $this->linked_class;
              $linked_class->add_to_menu = false;
              $linked_class->start();

              // assign all vars
              $linked_class->title = $x_title;
              $linked_class->additional = $x_additional;
              $linked_class->category = $x_category;
              $linked_class->md5 = $x_md5;
              $linked_class->summary = $x_summary;
              $linked_class->summary_2 = $x_summary_2;
              $linked_class->element_title = $x_element_title;
              $linked_class->source_link = $x_element_link;

              // check if title md5 exists
              if(!$linked_class->md5_exists($x_md5)) {
                if($linked_class->add()) {
                    $out .= "Item $linked_class->title Added<br>";
                }
              }

            }

        }

        return $out;

  }
  /* Import CSS Reference */
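
The md5 check in the function above is what keeps a re-run from adding the same item twice. A minimal sketch of that idea, with an in-memory array standing in for the table; add_if_new and the array layout are made up for illustration and are not the real md5_exists or add methods.

```php
<?php
// Hypothetical in-memory stand-in for the table: md5 of the title => row.
$store = [];

// Add a row only if no row with the same title hash exists yet.
function add_if_new(array &$store, string $title, string $link): bool
{
    $title = trim($title);
    $key = md5($title);
    if (isset($store[$key])) {
        return false; // already imported on an earlier run
    }
    $store[$key] = ['title' => $title, 'source_link' => $link];
    return true;
}

var_dump(add_if_new($store, "color", "https://the-domain.org/docs/css/color"));
var_dump(add_if_new($store, " color ", "https://the-domain.org/docs/css/color"));
var_dump(count($store));
```

The second call is rejected because the trimmed title hashes to the same key, so only one row ends up in the store.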

 

Split the import into part 1 and part 2. The 1st part of the import should just be loading one page, but it's still giving me a 504 Gateway Time-out and only adding 4 items for some reason.

Even fewer items than the more complicated import.

 

Import Part 1

This is a smaller import that only grabs the 1st page and does not follow URLs, so it should be working better than the full import, but it only adds 4 items. Hmm...

PHP

/* Import CSS Reference - Part 1 */
  /* this import crashes after 10 items - so need to split into smaller import chunks */

  public function import_css_reference_part1 (
    $loop_max = 10
  ) {
      /* This will be run on a loaded import item, so don't need to pass variables to the function */
      global $db;
      global $functions;
      $out = "";
      $for_counter = 0;
      $main_loop_tag = "#sect2 li";

      $db_table_name = $this->db->escapeString($this->db_table_name);
      $loop_max = $this->db->escapeString($loop_max);

      require_once("lib/simple_html_dom.php");
      $html = file_get_html($this->import_url);

        foreach($html->find($main_loop_tag) as $item) {

            $for_counter++;
            // note: continue only skips the one item where the counter equals $loop_max; the loop carries on
            if($for_counter == $loop_max) {
              continue;
            }

            $x_element_link = $item->find('a',0)->href;
            $x_element_title = $item->find('a',0)->plaintext;
            $x_element_title = trim($x_element_title);

            $out .= "\$x_element_link:$x_element_link<br />";
            $out .= "\$x_element_title:$x_element_title<br />";

            $x_element_link = "https://the-domain.org".$x_element_link;
            $html_source = file_get_html($x_element_link);
            $reply_count = 0;
            $out .= "<hr />";

            // start the class
            $linked_class = new $this->linked_class;
            $linked_class->add_to_menu = false;
            $linked_class->start();

            $x_md5 = md5($x_element_title);
            $x_category = "CSS";

            // assign items
            $linked_class->title = $x_element_title;
            $linked_class->category = $x_category;
            $linked_class->md5 = $x_md5;
            $linked_class->source_link = $x_element_link;

            /* these following items can come from part 2 of the import */

            // $linked_class->additional = $x_additional;
            // $linked_class->summary = $x_summary;
            // $linked_class->summary_2 = $x_summary_2;
            // $linked_class->long_title = $long_title;

            // check if title md5 exists
            if(!$linked_class->md5_exists($x_md5)) {
              if($linked_class->add()) {
                  $out .= "Item $linked_class->title Added<br>";
              }
            }


        }

        return $out;

  }
  /* Import CSS Reference - Part 1 */

Still causing this timeout. 

Timeout Fixed

Actually, I see the issue now: I left the download source line in there. Doh!

PHP

// get rid of this line and it should run ok
$html_source = file_get_html($x_element_link);

 

Import Stage 1 Working Now

Just the titles and the links for now. 

Woo, 695 CSS attribute items, with no crash.

Now to get the second part of the import done.

 

Part 2 Import

The import will need a way to check if an item has already been processed.

Check through each item in css_reference: if the other field is blank, process it and then set it to processed. Just do one at a time and add it to a 1 minute cron, so in 695 minutes it should all be processed. That's a long time; maybe run it every 5 seconds and stop it after 700 x 5 seconds.
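
The one-at-a-time idea can be sketched like this, with a plain array standing in for the css_reference table. The next_unprocessed and run_once names and the sample rows are hypothetical; the real import loads the row through the class and fetches the page at the marked step.

```php
<?php
// Hypothetical stand-in for the css_reference table.
$rows = [
    ['title' => 'color',   'source_link' => 'https://the-domain.org/docs/css/color',   'other' => 'processed'],
    ['title' => 'margin',  'source_link' => 'https://the-domain.org/docs/css/margin',  'other' => ''],
    ['title' => 'padding', 'source_link' => 'https://the-domain.org/docs/css/padding', 'other' => ''],
];

// Return the index of the first row that has not been processed yet, or null.
function next_unprocessed(array $rows): ?int
{
    foreach ($rows as $index => $row) {
        if ($row['other'] === '') {
            return $index;
        }
    }
    return null;
}

// One run of the import: pick a single row, mark it, and stop.
function run_once(array &$rows): ?string
{
    $index = next_unprocessed($rows);
    if ($index === null) {
        return null; // nothing to load
    }
    // The real import would fetch and parse source_link here.
    $rows[$index]['other'] = 'processed';
    return $rows[$index]['title'];
}

echo run_once($rows) . "\n";
echo run_once($rows) . "\n";
var_dump(run_once($rows));
```

Each call handles exactly one row, so calling it from a scheduled request works through the table without any single request doing much work.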

Added a cron that runs every 3 minutes. Remove this after a day or so.

*/3 * * * * wget --spider https://the_import_url/ > /dev/null 2>&1
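
For a rough idea of how long the whole run takes at one item per request, here is the arithmetic as a small sketch. The helper names are made up. Standard cron cannot schedule anything more often than once a minute, so the 5 second figure would need a loop with a sleep rather than a cron entry.

```php
<?php
// Total run time in seconds when one item is processed per run.
function total_seconds(int $items, int $interval_seconds): int
{
    return $items * $interval_seconds;
}

// Format a number of seconds as "Hh Mm Ss".
function format_duration(int $seconds): string
{
    return sprintf(
        "%dh %dm %ds",
        intdiv($seconds, 3600),
        intdiv($seconds % 3600, 60),
        $seconds % 60
    );
}

$items = 695;
echo "every 1 minute:  " . format_duration(total_seconds($items, 60)) . "\n";
echo "every 3 minutes: " . format_duration(total_seconds($items, 180)) . "\n";
echo "every 5 seconds: " . format_duration(total_seconds($items, 5)) . "\n";
```

At the 3 minute interval used in the cron line, 695 items come to 34 hours and 45 minutes, which fits with leaving the cron in place for a day or so.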

PHP

/* Import CSS Reference - Part 2 */

  /*
  This one needs to load a single item from css_reference,
  grab the url, load the content and populate the missing items.
  When loading the item it also needs to add something to the other field, to mark it as processed.
  */

  public function import_css_reference_part2 (
    $loop_max = 10
  ) {
      /* This will be run on a loaded import item, so don't need to pass variables to the function */
      global $db;
      global $functions;
      $out = "";
      $for_counter = 0;
      $main_loop_tag = ".main-content";

      $css_reference = new css_reference;
      $css_reference->add_to_menu = false;
      $css_reference->start();

      // load item - using fields array
      $fields_array = [
        "other" => "",
      ];
      if(!$css_reference->load_from_fields_array($fields_array, $max = 1)) {
        return "nothing to load";
      }

      // new item should now be loaded

      $out .= $css_reference->title . "<br />";

      $db_table_name = $this->db->escapeString($this->db_table_name);
      $loop_max = $this->db->escapeString($loop_max);

      require_once("lib/simple_html_dom.php");
      $html = file_get_html($css_reference->source_link); // new url based on loaded item

        foreach($html->find($main_loop_tag) as $main_content) {

            if($for_counter == $loop_max) {
              continue;
            }
            $for_counter++;

            $x_title = $main_content->find("h1",0)->plaintext;
            $x_title = trim($x_title);
            $out .= "\$x_title:$x_title<br />";
            $css_reference->long_title = $x_title;

            $x_summary = $main_content->find("p",0)->innertext;
            $x_summary = trim($x_summary);
            $out .= "\$x_summary:$x_summary<br />";
            $css_reference->summary = $x_summary;

            //$x_summary_2 = $main_content->find("p",1)->innertext;
            $x_summary_2 = $main_content->find(".code-example",0)->innertext;
            $x_summary_2 = trim($x_summary_2);
            $out .= "\$x_summary_2:$x_summary_2<br />";
            $css_reference->summary_2 = $x_summary_2;

            $x_additional = $main_content->innertext;
            $out .= "\$x_additional:$x_additional<br />";
            $css_reference->additional = $x_additional;

            $css_reference->other = "processed";

            if($css_reference->update()) {
              $out .= "Item $css_reference->title Updated<br>";
            }

            // check if title md5 exists
            /*
            if(!$css_reference->md5_exists($x_md5)) {
              if($css_reference->add()) {
                  $out .= "Item $css_reference->title Updated<br>";
              }
            }
            */


        }

        return $out;

  }
  /* Import CSS Reference - Part 2 */

Ran this overnight and some of the items imported, but then the importer started timing out again, so I increased the PHP script processing time to 60 seconds, and now it seems to be working again, slowly. Maybe the source site is slow.

Found the reason it was crashing: the source link was not the correct doc link, so it was trying to import from an incorrect page, which was crashing the script somehow.

So go through and delete, or mark as processed, the items with incorrect source links and it should continue.

I think the issue was that some items had disabled links that were still being used as the link source. Removing these disabled links stopped the crashing. Yay!
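
A check like the following could catch those rows before the fetch, as a sketch under one assumption: that a disabled link on the index page had an empty or "#" href, so after the domain was added in front the stored link was just the bare domain. The is_importable_link name is hypothetical.

```php
<?php
// Decide whether a stored source link is worth fetching.
function is_importable_link(string $link): bool
{
    $parts = parse_url($link);
    if ($parts === false) {
        return false; // seriously malformed URL
    }
    $scheme = $parts['scheme'] ?? '';
    $host = $parts['host'] ?? '';
    $path = $parts['path'] ?? '';

    if (!in_array($scheme, ['http', 'https'], true) || $host === '') {
        return false;
    }
    // A bare domain has no document path, so there is no page to import.
    return $path !== '' && $path !== '/';
}

var_dump(is_importable_link("https://the-domain.org/docs/css/color"));
var_dump(is_importable_link("https://the-domain.org"));
var_dump(is_importable_link("#"));
```

Rows that fail the check can be marked as processed straight away, so they no longer block the rest of the queue.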
