AWS CloudFront duplicate content issue Solution

AWS CloudFront works like a proxy service: a request to CloudFront is served with content hosted on an origin server that you choose. In the process, CloudFront caches the files it has served and returns the cached version for future requests.

AWS CloudFront process:

Request -> CloudFront -> Origin Server

Origin Server -> CloudFront -> Response

 

Other solutions found online suggest using robots.txt, but that means making changes to your site, and it also blocks access to CSS and JS files. Googlebot now behaves like a modern browser and needs access to CSS files (for example, to detect whether your site is responsive/mobile friendly).

This solution assumes that you only want to serve static files such as CSS, JS and image files.

You can use CloudFront to cache a complete domain of your choosing, but this creates a duplicate of all the content served by the origin server.

For this example, suppose you have the domain static.example.com and you serve all of its content from XXXX.cloudfront.net. This is bad for SEO because search engines see the same content on two domains.

To get around the issue: once you have created your CloudFront distribution, go to the “Origins” tab and add a new origin pointing to a domain that does not exist, say “non-existent.example.com”.

Because this origin's domain doesn’t exist, CloudFront cannot serve any request that is routed to it.

Now go to the “Behaviors” tab, edit the default behavior and set its origin to the non-existent one. After this, all requests to XXXX.cloudfront.net should return an error (provided the content is not already cached by edge locations).

Now create a new behavior with “Path Pattern” set to something like *.css for CSS files and set its origin to static.example.com. Repeat this step for every path pattern that you actually want to resolve successfully.

The above setup ensures that your distribution only serves the path patterns that you have included.
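The routing that results from these behaviors can be sketched locally in shell. This is only a simulation of which origin a path ends up at, not a CloudFront API call; the function name route_origin and the list of extensions are assumptions chosen for illustration.

```shell
# Hypothetical helper: prints the origin a request path would be routed to
# under the behaviors described above. The patterns are examples only.
route_origin() {
    case "$1" in
        *.css|*.js|*.png|*.jpg|*.gif)
            # Explicit behaviors: static assets go to the real origin
            echo "static.example.com" ;;
        *)
            # Default (*) behavior: everything else goes to the dead origin
            echo "non-existent.example.com" ;;
    esac
}

route_origin "/css/site.css"   # prints static.example.com
route_origin "/index.html"     # prints non-existent.example.com
```

Any path that is not covered by an explicit pattern falls through to the default behavior and therefore fails, which is what keeps the rest of the site out of the CloudFront domain.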

 

How to clear EhCache on demand

I found the following solution on the Spring framework forum.

You can add the following controller to your admin app. Make sure the URL below is only available to you and not exposed to the public.

In the code below we inject a CacheManager object; the name I used (ehCacheManager) might be different in your code.

After deploying the code below, visit http://localhost:8080/list-ehcache-objects and click on the cache names to clear them.


package com.mycompany.controller;

import java.util.ArrayList;
import java.util.List;

import javax.annotation.Resource;
import javax.servlet.http.HttpServletRequest;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;

import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class EHCacheController {

    // The bean name "ehCacheManager" may be different in your configuration
    @Resource(name = "ehCacheManager")
    private CacheManager cacheManager;

    /**
     * Lists all active caches. If a "cacheName" request parameter is passed,
     * the cache with that name is cleared first.
     */
    @RequestMapping(value = "/list-ehcache-objects")
    protected String clear(Model viewModel, HttpServletRequest request) {
        String cacheName = request.getParameter("cacheName");

        // Get all the active caches
        String[] cacheNames = cacheManager.getCacheNames();
        List<Cache> caches = new ArrayList<Cache>(cacheNames.length);
        List<String> cacheNamesList = new ArrayList<String>(cacheNames.length);

        for (String name : cacheNames) {
            Cache cache = cacheManager.getCache(name);
            // If this cache name was passed in the request, remove all of its entries
            if (name.equalsIgnoreCase(cacheName)) {
                cache.removeAll();
            }
            caches.add(cache);
            cacheNamesList.add(name);
        }

        // Put the caches in the page model
        viewModel.addAttribute("caches", caches);
        viewModel.addAttribute("cacheNames", cacheNamesList);
        return "layout/clearehcache";
    }

    /**
     * Setter for the EHCacheManager
     * @param cacheManager
     */
    public void setCacheManager(CacheManager cacheManager) {
        this.cacheManager = cacheManager;
    }

}

For my controller view I used the following. Since I am using Thymeleaf, the syntax differs from that of JSP files.

Error connecting to a server with a self-signed SSL certificate

Every now and then I run into this issue, so I am adding it here for my own reference and for everyone else.

If you are getting the following error:

javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException

it means the server you are trying to connect to uses a self-signed certificate. To solve this, you need to tell the JRE/JDK to trust the certificate by importing it into your JRE.

In my case I was trying to connect to SMTP over SSL; a quick search for “download smtp certificate” gave me http://notepad2.blogspot.co.uk/2012/04/import-gmail-certificate-into-java.html

To download a certificate for HTTPS, you can use Firefox or Internet Explorer by clicking the secure icon in the address bar.

To download the certificate for SMTP, run the following in a console:

openssl s_client -connect smtp.gmail.com:465

which outputs the certificate, e.g.

-----BEGIN CERTIFICATE----- .......... -----END CERTIFICATE-----

Although I downloaded the certificate using port 465, my Java configuration only works on port 587 for SMTP with TLS enabled.

You can save the certificate into a file, e.g.

nano smtp.gmail.com.cert

Now import the certificate using:

keytool -import -trustcacerts -alias smtp.gmail.com -file /path/to/smtp.gmail.com.cert

Answer “yes” at the prompt; you should now be able to connect.

Because the command above does not specify a keystore, the certificate is imported into the default keystore, which for me is under the home directory. I use the same keystore, e.g. “~/.tomcat”, for my Tomcat configuration.

Also import the certificate into your JRE/JDK keystore using:

keytool -import -trustcacerts -alias smtp.gmail.com -file /path/to/smtp.gmail.com.cert -keystore $JAVA_HOME/jre/lib/security/cacerts
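Instead of copying the certificate out of the terminal by hand, the PEM block can be filtered out of the openssl output. The following is a minimal sketch, assuming a POSIX shell with sed; the helper name extract_pem is hypothetical.

```shell
# Reads openssl s_client output on stdin and prints only the PEM
# certificate block(s) found in it.
extract_pem() {
    sed -n '/-----BEGIN CERTIFICATE-----/,/-----END CERTIFICATE-----/p'
}

# Usage (needs network access; host and port are the ones used above):
#   openssl s_client -connect smtp.gmail.com:465 </dev/null 2>/dev/null \
#       | extract_pem > smtp.gmail.com.cert
```

For port 587, where the server expects STARTTLS before the TLS handshake, openssl s_client takes an additional -starttls smtp option.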

How to block/restrict max number of HTTP connection per IP address

I have written a simple PHP file to do that for me, and I run it every minute using a cron job.

The script needs to run as PHP CLI (from cron or the command prompt) because it runs system commands, which would be blocked if it ran as an Apache script.

Here is the code.
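As an illustration of the counting step only (this is not the PHP script itself), the idea can be sketched in shell. The sketch assumes Linux netstat column layout, IPv4 connections and a web server on port 80; the function name over_limit and the limit of 20 in the usage comment are made up for the example.

```shell
# Reads `netstat -nt` style lines on stdin and prints "count ip" for every
# remote IPv4 address with more than $1 connections to local port 80.
over_limit() {
    awk -v max="$1" '
        $1 == "tcp" && $4 ~ /:80$/ {
            split($5, remote, ":")      # remote[1] is the IPv4 address
            count[remote[1]]++
        }
        END {
            for (ip in count)
                if (count[ip] > max) print count[ip], ip
        }'
}

# Usage (hypothetical limit of 20 connections per IP):
#   netstat -nt | over_limit 20
# Each reported address could then be blocked, for example with
#   iptables -I INPUT -s "$ip" -j DROP
```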

How to fix Warning: is_readable(): open_basedir restriction in effect in Zend Framework >= 1.10

If you are getting loads of this error after upgrading to Zend Framework 1.10 or later, you can fix it by changing the Zend/Loader.php file.

Change the method isReadable() to the following.
Also make sure everything in your include path is in your open_basedir setting too.
Warning! Use the code at your own risk, as I have not tested it extensively.


    /**
     * Returns TRUE if the $filename is readable, or FALSE otherwise.
     * This function uses the PHP include_path, where PHP's is_readable()
     * does not.
     *
     * Note from ZF-2900:
     * If you use custom error handler, please check whether return value
     *  from error_reporting() is zero or not.
     * At mark of fopen() can not suppress warning if the handler is used.
     *
     * @param string   $filename
     * @return boolean
     */
    public static function isReadable($filename)
    {
        if (is_readable($filename)) {
            // Return early if the filename is readable without needing the 
            // include_path
            return true;
        }

        if (strtoupper(substr(PHP_OS, 0, 3)) == 'WIN'
            && preg_match('/^[a-z]:/i', $filename)
        ) {
            // If on windows, and path provided is clearly an absolute path, 
            // return false immediately
            return false;
        }

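        // Skip absolute paths. Use ===, because strpos() returns false when
        // there is no '/' at all, and false == 0 is true in PHP.
        if (strpos($filename, '/') === 0) {
            return false;
        }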

        foreach (self::explodeIncludePath() as $path) {
            if ($path == '.') {
                if (is_readable($filename)) {
                    return true;
                }
                continue;
            }
            $file = $path . '/' . $filename;
            if (is_readable(realpath($file))) {
                return true;
            }
        }
        return false;
    }

How to find and replace recursively in Linux from a command line script

The following script finds any phrase and replaces it with another.

for i in `grep --include="*.include" -rl 'phrase to search' /var/www/`; do
    sed --in-place=.bak -e 's/phrase to search/phrase to replace with/g' "$i"
done

The above script searches the folder /var/www and all of its subfolders for files with the extension .include that contain “phrase to search”, and replaces every occurrence with “phrase to replace with”. A backup of each changed file is kept with a .bak suffix.
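The for loop above splits the grep output on whitespace, so it breaks on file names that contain spaces. A variant that passes NUL-separated names to sed could look like the sketch below. It assumes GNU grep, sed and xargs; the function name replace_in_tree is made up for this example, and the search phrase must not contain “/” or regular expression metacharacters because sed treats it as a pattern.

```shell
# Replaces every occurrence of $2 with $3 in *.include files under $1,
# keeping a .bak copy of each changed file.
replace_in_tree() {
    # -l lists matching files, -Z separates the names with NUL bytes,
    # -F makes grep treat the phrase as a fixed string
    grep -rlZ --include='*.include' -F -- "$2" "$1" \
        | xargs -0 -r sed -i.bak -e "s/$2/$3/g"
}

# Usage:
#   replace_in_tree /var/www 'phrase to search' 'phrase to replace with'
```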

How to auto start Red5 on Ubuntu or any Linux distro

The following script monitors the RTMP port, which indicates whether Red5 is running or not.

if (netstat -na | grep -q :1935) ; then
    echo "Red5 running."
else
    bash /usr/local/red5/red5.sh
fi

If the RTMP port is open for connections nothing is done; otherwise the script tries to start the Red5 server by executing the red5.sh startup script.
Save the above script in the Red5 folder, say as /usr/local/red5/cron.sh.
Now we need to add a cron job to run cron.sh at a specified interval, e.g. every 2 minutes.
Edit the cron jobs with the following command:
crontab -e
Now add the following line to the cron jobs:
*/2 * * * * bash /usr/local/red5/cron.sh >/dev/null 2>&1
The above setup checks Red5 every two minutes and starts the server if it is not running.
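Note that grep -q :1935 also matches other ports that start with 1935 (such as 19350) and outgoing connections to a remote port 1935. A stricter check, sketched below under the assumption of Linux netstat column layout, only accepts a socket in LISTEN state on exactly that port. The helper name is_listening is hypothetical.

```shell
# Reads `netstat -nat` output on stdin; succeeds only if something is
# listening on exactly port $1 (so 19350 does not count as 1935).
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }'
}

# Usage inside cron.sh:
#   if netstat -nat | is_listening 1935; then
#       echo "Red5 running."
#   else
#       bash /usr/local/red5/red5.sh
#   fi
```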

The script whose uid is 33 is not allowed to access /tmp owned by uid 0

If safe mode is on in PHP, the owner ID of the PHP file needs to be the same as that of any folder or file it tries to access. So if you use sessions and your PHP is configured to save session data in the /tmp folder, you can get the error shown in the title of this blog post.

To fix it, check whether the following directive is set like this:

session.save_path = /tmp

If it is, create the folder

/var/lib/php5/session

and change the owner of that folder to the same user as the web server, e.g.

chown www-data:www-data /var/lib/php5/session

Now change /etc/php.ini or /etc/php5/apache2/php.ini to use the following setting:

session.save_path = /var/lib/php5/session

Now reload the web server, e.g.

service apache2 reload

or

/etc/init.d/apache2 reload

This should fix the problem.

Problems faced configuring LAMP

1. Every virtual host needs to use the private IP address in its VirtualHost directive rather than the elastic IP, or it will not work.

i.e.

Good

<VirtualHost 10.202.150.134:80>

Not

<VirtualHost *:80>

<VirtualHost 0.0.0.0:80>

<VirtualHost 184.72.230.132:80>

2. “Access denied for user 'www-data'@'localhost'” shows up if the first attempt to connect to the database has failed and the application calls mysql_query after that. PHP doesn’t find an active connection, so it tries to connect to the database with default settings. This makes the real cause quite hard to track down.

MySQL replication settings

http://www.softwareprojects.com/resources/programming/t-how-to-move-copy-a-live-mysql-database-and-what-1257.html

On the MySQL server (master)

1. Turn binary logging on and add a server id

CAUTION! Do not use “log-bin=/var/log/mysql/mysql-bin.log”

[mysqld]
log-bin=mysql-bin
server-id=1

2. Create a replication account on the server

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave-ip-address' IDENTIFIED BY 'password-here';

3. Keep the above mysql console open, start a new terminal and connect to the server from it

cd /var/lib/mysql

Now run dir; you should see all the database folders.

4. Time to lock all the databases on the server. In the mysql console enter

mysql> SET GLOBAL WAIT_TIMEOUT=600000; SET WAIT_TIMEOUT = 600000; FLUSH TABLES WITH READ LOCK;

Now in the new console enter

root@server:/var/lib/mysql# tar -cvf /tmp/mysql-snapshot.tar --exclude=mysql ./ &

Enter the following in the mysql console. It shows the server's current position in the binary log; keep this window open and note the File and Position columns.

SHOW MASTER STATUS;

Once the tar command has finished, enter the following in the mysql console. The read lock must stay in place until the copy is complete, otherwise the snapshot may not match the recorded log position.

UNLOCK TABLES;

5. Copy the tar file from master to slave server

scp -i permission-file.pem /tmp/mysql-snapshot.tar root@slave-server-ip:/root/

6. Extract the tar file on slave server

cd /var/lib/mysql

mv /root/mysql-snapshot.tar .

tar --extract --file=mysql-snapshot.tar

7. Change server id on slave mysql server, add following to config file /etc/my.cnf or /etc/mysql/my.cnf

[mysqld]

server-id=2

8. Start or restart Slave mysql server with

service mysql restart

9. Login to mysql console and enter

mysql>stop slave;

mysql> CHANGE MASTER TO
MASTER_HOST='ip-of-master-server',
MASTER_USER='repl',
MASTER_PASSWORD='replication_password-from-step-2',
MASTER_LOG_FILE='recorded_log_file_name-from-step-4',
MASTER_LOG_POS=recorded_log_position-from-step-4;

mysql>start slave;

If everything was done correctly, the slave should start replicating.

Once the master has been set and replication has been started on the slave, the slave will start replicating automatically after a restart.
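One way to confirm that replication is running is to look at the Slave_IO_Running and Slave_SQL_Running fields in the output of SHOW SLAVE STATUS. The sketch below checks both from the shell; the helper name slave_ok is hypothetical, and the usage line assumes the mysql client can log in without further options.

```shell
# Reads `SHOW SLAVE STATUS\G` output on stdin; succeeds only if both
# replication threads report "Yes".
slave_ok() {
    awk '
        /^ *Slave_IO_Running: Yes *$/  { io = 1 }
        /^ *Slave_SQL_Running: Yes *$/ { sql = 1 }
        END { exit !(io && sql) }'
}

# Usage on the slave:
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_ok && echo "replicating"
```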

If something is not right on the slave, log in to the slave mysql console and use the following:

stop slave;

reset slave;

Now, from a new bash console, re-extract the tar file, replacing the existing files, with the following command:

root@slave-server:/var/lib/mysql# tar --overwrite --extract --file=mysql-snapshot.tar

Now repeat step 9.